In a multi-node environment, we have seen cases where one of the nodes gets abandoned. The main cause is that one of the nodes is not reachable by ADOP, typically for one of the following reasons:
1. Misconfiguration on the secondary node.
2. The context XML file is not available at all.
3. SSH from the primary MT (middle tier) to the node is not working.
4. Wrong entries in /etc/hosts.
The scenario arises when a previous ADOP cycle has already failed on this node and the failure has been overlooked by the engineer. The failure is easy to miss because the overall status shows 0 even though the phase has failed on one of the nodes; unless each node is checked explicitly, this goes unnoticed.
Precautions to be taken in a Multi-Node environment:
1. Check the ADOP status for all the nodes (adop -status -detail).
2. Check whether the STATUS for any node shows FAILED, or F in ad_adop_sessions.
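The per-node check above can be scripted. The sketch below is a minimal illustration, not a tested utility: it parses sample text roughly in the shape that adop -status -detail produces (node name followed by status), and the node names and sample output are hypothetical.

```shell
# Hypothetical sketch: flag nodes whose last ADOP phase failed.
# On a live system you would capture real output instead:
#   status_output=$(adop -status -detail)
status_output="node1 COMPLETED
node2 FAILED
node3 COMPLETED"

# Collect every node whose status column reads FAILED.
failed_nodes=$(printf '%s\n' "$status_output" | awk '$2 == "FAILED" {print $1}')

if [ -n "$failed_nodes" ]; then
  echo "ADOP failed on: $failed_nodes"
else
  echo "All nodes completed"
fi
```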
A node gets ABANDONED/evicted when we ignore the incident above and continue with the cycle anyway. Suppose there are nodes 1, 2 and 3 in an instance:
1. Prepare completes successfully on nodes 1 and 2 but fails on node 3, and we then try to proceed with apply or cutover. In the next phase, ADOP asks the question shown below.
2. The patch gets applied successfully on nodes 1 and 3 but fails on node 2; node 2 is then out of sync with the other nodes. If we try to proceed with cutover on the remaining nodes, we get the WARNING message shown below.
3. Cutover is successful on nodes 1 and 3 but fails on node 2. After starting services, node 2 is marked as ABANDONED. If we next try an fs_clone or apply, the following WARNING is prompted:
Previous tasks have failed or are incomplete on node: N2
Do you want adop to continue with other completed nodes [y/n]?
If the user answers 'y' to the above prompt, then ADOP will proceed with the next phase on the completed nodes, and the node that FAILED in the previous phase will be marked as ABANDONED.
If the user answers 'n', then ADOP will exit with an error.
******** SAY "N" AND EXIT OUT OF ADOP AT THIS STAGE ************
TO RECOVER THE NODE:
We can retry the failed phase using the option allnodes=yes from the primary node, or allnodes=no from the failed node.
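As a rough sketch, the two recovery options above map to commands like the following. This is illustrative only: <failed_phase> is a placeholder for the phase that failed (e.g. prepare or apply), the commands are built as strings rather than executed, and the exact adop parameters should be confirmed against the documentation for your release.

```shell
# Sketch only: the retry commands are built as strings, not executed here.
# <failed_phase> is a placeholder for the phase that failed (e.g. prepare).
primary_cmd="adop phase=<failed_phase> allnodes=yes"
failed_node_cmd="adop phase=<failed_phase> allnodes=no"

echo "Run from the primary node: $primary_cmd"
echo "Run from the failed node : $failed_node_cmd"
```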
If, for any reason, we have said yes to the above prompt, then the steps below need to be followed to delete the node and add it back, as a resolution.
High level steps:
1. Delete the abandoned node
2. Run fs_clone on the primary node
3. Restore the abandoned node
Step-by-step process to restore an abandoned node:
1. Deleting a node.
Case 1: The secondary node to be deleted is accessible.
Log in to the secondary node to be deleted.
a) Source the run file system. Ensure that all application tier services from both the run and patch file systems for the node to be deleted are shut down. Execute the ebs-delete-node option of the adProvisionEBS.pl script as follows:
   $ perl <AD_TOP>/patch/115/bin/adProvisionEBS.pl ebs-delete-node \
     -contextfile=<CONTEXT_FILE> -logfile=<LOG_FILE>
Example:
   bash-4.1$ perl $AD_TOP/patch/115/bin/adProvisionEBS.pl ebs-delete-node -contextfile=/pczb1i/inst/fs2/inst/apps/PCZB1I_vmohsczbk012/appl/admin/PCZB1I_vmohsczbk012.xml -logfile=/pczb1i/applmgr/deletenode.log
   Enter the APPS Schema password:
   Enter the WebLogic AdminServer password:
   Node deleted successfully.
This deletes the managed servers, OHS instances and Node Manager on the current node from the run file system WebLogic domain.
b) This is needed only if the instance is on R12.TXK.C.Delta.4 or lower. Source the patch file system and execute the ebs-delete-node option of the adProvisionEBS.pl script from the patch file system, providing the patch CONTEXT_FILE.

Case 2: The secondary node to be deleted is not accessible.
Log in to the primary node and source the run file system. Execute the ebs-delete-node option of the adProvisionEBS.pl script as follows:
   $ perl <AD_TOP>/patch/115/bin/adProvisionEBS.pl ebs-delete-node \
     -contextfile=<CONTEXT_FILE> -hostname=<HOSTNAME> -logfile=<LOG_FILE>

2. Sync the OHS configuration on the other nodes to remove references to the deleted node. We do not need this step, as there is a separate cluster for each server on each machine in our environments.

3. Set s_shared_file_system on the primary node to false if the deleted node was the only node other than the primary node. Run AutoConfig.

4. Run AutoConfig on the DB tier. This step is required to sync up the tcp.invited_nodes attribute in sqlnet.ora, removing the deleted node from the value of this attribute. Re-login and bounce the DB listeners.

5. Delete the INST_TOP for the deleted node; only the directory of the Run Edition File System and the Patch Edition File System should be deleted.
Example: move the directory PCZB1I_vmohsczbk012 to PCZB1I_vmohsczbk012_bkp under /pczb1i/inst/fs2/inst/apps/. Please DO NOT delete anything; move/rename instead.

6. Run FS_CLONE on the primary node.
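As a quick sanity check after step 4, you can confirm that the deleted node no longer appears in tcp.invited_nodes in sqlnet.ora. The sketch below runs against an inline sample fragment rather than a real sqlnet.ora; the file contents and host names are illustrative.

```shell
# Sketch: check a sqlnet.ora-style fragment for a leftover reference
# to the deleted host. File contents and host names are illustrative.
deleted_host="vmohsczbk012"
sqlnet_sample=$(mktemp)
cat > "$sqlnet_sample" <<'EOF'
tcp.validnode_checking = yes
tcp.invited_nodes = (vmohsczbk017, vmohsczbk013)
EOF

if grep -q "$deleted_host" "$sqlnet_sample"; then
  status="present"
  echo "WARNING: $deleted_host still listed in tcp.invited_nodes"
else
  status="absent"
  echo "OK: $deleted_host no longer listed in tcp.invited_nodes"
fi
rm -f "$sqlnet_sample"
```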
7. Execute the adpreclone utility on the Run and Patch File Systems.
On the run file system:
   $ cd <INST_TOP>/admin/scripts
   $ ./adadminsrvctl.sh start
   $ ./adpreclone.pl appsTier
Once the utility completes, shut down the application tier processes:
   $ ./adstpall.sh <apps_user>/<apps_pwd>
On the patch file system:
   $ cd <INST_TOP>/admin/scripts
   $ ./adadminsrvctl.sh start forcepatchfs
   $ ./adpreclone.pl appsTier
Once the utility completes, shut down the application tier processes:
   $ ./adstpall.sh <apps_user>/<apps_pwd> forcepatchfs

A farm is a collection of components managed by Fusion Middleware Control. It can contain Oracle WebLogic Server domains, one Administration Server, one or more Managed Servers, and the Oracle Fusion Middleware components that are installed, configured, and running in the domain.

8. Add the Secondary Application Tier Node to the Farm.
8a) Prepare the pairsfile for configuring the run file system:
   mkdir -p /pczb1i/applmgr/pairsfile/run
   mkdir -p /pczb1i/applmgr/pairsfile/patch
   cd /pczb1i/applmgr/pairsfile/run
   cp $INST_TOP/appl/admin/PCZB1I_vmohsczbk012_run.txt myrunpairsfile.txt
8b) Make the necessary modifications to the pairsfile. Some of the inputs required for the Add Node API are automatically filled in. The sections that you need to fill in are the following:
[Instance Specific] - This is instance-specific information for the node you are going to add. Refer to the source context file for reference.
[Services]
On Node 2, modify the file 'myrunpairsfile.txt':
[Instance Specific]
s_temp=/pczb1i/inst/fs2/inst/apps/PCZB1I_vmohsczbk012/temp
s_contextname=PCZB1I_vmohsczbk012
s_hostname=vmohsczbk012
s_domainname=oracleoutsourcing.com
s_cphost=vmohsczbk012
s_webhost=vmohsczbk012
s_config_home=/pczb1i/inst/fs2/inst/apps/PCZB1I_vmohsczbk012
s_inst_base=/pczb1i/inst
s_display=vmohsczbk012:0.0
s_forms-c4ws_display=vmohsczbk012:0.0
s_ohs_instance=EBS_web_PCZB1I_OHS2
s_webport=8000
s_http_listen_parameter=8000
s_https_listen_parameter=4443
[Services To Be Enabled on the Secondary Application Tier Node]
s_web_applications_status=enabled
s_web_entry_status=enabled
s_apcstatus=enabled
s_root_status=enabled
s_batch_status=disabled
s_other_service_group_status=enabled
s_adminserverstatus=disabled

8c) Configure the run file system on the secondary node. Run the commands below on the secondary node, providing the primary node's context XML:
   bash-4.1$ export PATH=/pczb1i/applmgr/fs2/FMW_Home/webtier/perl/bin:$PATH
   bash-4.1$ /pczb1i/applmgr/fs2/FMW_Home/webtier/perl/bin/perl ./adclonectx.pl addnode contextfile=/pczb1i/inst/fs2/inst/apps/PCZB1I_vmohsczbk017/appl/admin/PCZB1I_vmohsczbk017.xml pairsfile=/pczb1i/applmgr/pairsfile/run/myrunpairsfile.txt outfile=/pczb1i/inst/fs2/inst/apps/PCZB1I_vmohsczbk012/appl/admin/PCZB1I_vmohsczbk012.xml

8d) Create the required directories and copy the pairsfile into a directory of your choice on the secondary application tier node:
   cd /pczb1i/applmgr/pairsfile/patch
   cp /pczb1i/inst/fs1/inst/apps/PCZB1I_vmohsczbk012/appl/admin/PCZB1I_vmohsczbk012_patch.txt mypatchpairsfile.txt

8e) Configure the patch file system on the secondary node:
   export PATH=/pczb1i/applmgr/fs1/FMW_Home/webtier/perl/bin:$PATH
   cd /pczb1i/applmgr/fs1/EBSapps/comn/clone/bin
   /pczb1i/applmgr/fs1/FMW_Home/webtier/perl/bin/perl ./adclonectx.pl addnode contextfile=/pczb1i/inst/fs1/inst/apps/PCZB1I_vmohsczbk017/appl/admin/PCZB1I_vmohsczbk017.xml pairsfile=/pczb1i/applmgr/pairsfile/patch/mypatchpairsfile.txt outfile=/pczb1i/inst/fs1/inst/apps/PCZB1I_vmohsczbk012/appl/admin/PCZB1I_vmohsczbk012.xml

9) Check and update the mod_wl_ohs.conf and apps.conf entries of the managed servers.
a) If any of these managed servers are not desired to be part of the cluster configuration on the current node, run txkSetAppsConf.pl with -configoption=removeMS to delete the managed server. The details of these managed servers are deleted from the OHS configuration files mod_wl_ohs.conf and apps.conf on the current node.
Example:
   $ perl <FND_TOP>/patch/115/bin/txkSetAppsConf.pl -contextfile=<CONTEXT_FILE> \
     -configoption=removeMS -oacore=testserver1.example.com:7201 -forms=testserver2.example.com:7601
b) If any of the managed servers from the newly added node are desired to be part of the cluster configuration on the current node, run txkSetAppsConf.pl with -configoption=addMS to add the managed server. The details of these managed servers are added into the OHS configuration files mod_wl_ohs.conf and apps.conf on the current node.
Example:
   $ perl <FND_TOP>/patch/115/bin/txkSetAppsConf.pl -contextfile=<CONTEXT_FILE> \
     -configoption=addMS -oacore=testserver1.example.com:7205 -oafm=testserver2.example.com:7605
10. a) Run AutoConfig on the RUN file system on all application tier nodes.
    b) Reload the Apps listeners.
11. a) Bring down the Node Manager on the patch FS of the newly added node if it is up:
       adnodemgrctl.sh stop
    b) Bring down the Admin Server and the Node Manager on the patch FS of the primary node:
       adadminsrvctl.sh stop
       adnodemgrctl.sh stop
12. Run AutoConfig on the DB tier and reload the listeners.
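After step 9, one way to confirm that a managed server made it into (or out of) the OHS configuration is to grep mod_wl_ohs.conf for the expected host:port entry. The sketch below runs against an inline sample fragment rather than a real configuration file; the Location block, host names and ports are illustrative.

```shell
# Sketch: verify a managed server entry appears in a mod_wl_ohs.conf-style
# fragment. The Location block, hosts and ports are illustrative.
conf_sample=$(mktemp)
cat > "$conf_sample" <<'EOF'
<Location /oacore>
  WLSRequest On
  WebLogicCluster testserver1.example.com:7205,testserver2.example.com:7605
</Location>
EOF

expected="testserver1.example.com:7205"
if grep -q "$expected" "$conf_sample"; then
  found="yes"
  echo "Found $expected in OHS config"
else
  found="no"
  echo "MISSING: $expected not in OHS config"
fi
rm -f "$conf_sample"
```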