Wednesday, April 15, 2015

Scenarios, causes, precautions and fix for an Abandoned Node in 12.2

In a multi-node environment, we have seen issues where one of the nodes gets abandoned due to a few causes.

The main cause is that one of the nodes is not reachable by ADOP, typically because of:

1. Misconfiguration on the secondary node.
2. The context XML file is not available at all.
3. Unable to SSH from the Primary MT.
4. Wrong entries in /etc/hosts.

The scenario arises when one of the previous ADOP cycles has already failed on this node and the failure has been overlooked by the engineer.
The failure is easy to miss because the overall status shows 0 even though the cycle has failed on one of the nodes; unless the per-node status is checked, the failure goes unnoticed.

Precautions to be taken in Multi-Node Env:

1. Check the ADOP status for all the nodes (adop -status -detail).
2. Check whether the STATUS for any node shows FAILED (or F in ad_adop_sessions).
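As a quick guard against missing a per-node failure, the per-node output of adop -status -detail can be scanned for FAILED entries. The sample output layout below (node name, type, status) is only an assumption for illustration; check it against your instance's actual output before relying on the parsing.

```shell
# Hypothetical per-node lines in the style of "adop -status -detail".
# NOTE: this layout is an assumption for illustration only.
status_output='node1  master  COMPLETED
node2  slave   COMPLETED
node3  slave   FAILED'

# Collect any node whose per-node status is FAILED.
failed_nodes=$(printf '%s\n' "$status_output" | awk '$3 == "FAILED" {print $1}')

if [ -n "$failed_nodes" ]; then
    echo "FAILED on: $failed_nodes"
else
    echo "All nodes completed"
fi
```

With the sample above, this prints "FAILED on: node3" and flags the node before any further phase is attempted.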

A node gets ABANDONED/evicted when we do the following after ignoring the above failure.

Possible Scenarios:

Suppose an instance has nodes 1, 2 and 3.

1. Prepare completes successfully on nodes 1 and 2 but fails on node 3. If we then try to proceed with apply or cutover, the next phase asks the question below.

2. The patch is applied successfully on nodes 1 and 3 but fails on node 2, leaving node 2 out of sync with the other nodes. If we then try to proceed with cutover on the remaining nodes, we get the following WARNING message:

3. Cutover is successful on nodes 1 and 3 but fails on node 2.
After services are started, node 2 is marked as ABANDONED. If we next try an fs_clone or apply, the WARNING below is prompted:

Previous tasks have failed or are incomplete on node: N2
Do you want adop to continue with other completed nodes [y/n]?

If the user answers 'y' to the above prompt, ADOP will proceed with the next phase, and the node that FAILED in the previous phase will be marked as ABANDONED.

If the user answers 'n', ADOP will exit with an error.

********  SAY "N" AND EXIT OUT OF ADOP AT THIS STAGE  ********


If, for any reason, we have answered yes to the above prompt, then the steps below need to be performed to delete the node and add it back, as a resolution.

High level steps:

1. Delete the abandoned node
2. Run fs_clone on the primary node
3. Restore the abandoned node

Step-by-step process to restore an abandoned node:

1. Deleting a node.

Case 1: If the secondary node to be deleted is accessible

a) Login to the secondary node to be deleted.
Source the run file system.
Ensure that all application tier services from the run and patch file systems for the node to be deleted are shut down.

Execute the ebs-delete-node option of the script as follows:
$ perl /patch/115/bin/ ebs-delete-node \
-contextfile= -logfile=

bash-4.1$ perl $AD_TOP/patch/115/bin/ ebs-delete-node 

Enter the APPS Schema password: 
Enter the WebLogic AdminServer password: 
Node deleted successfully.

This will delete the managed servers, OHS instances 
and Node Manager on the current node from the run file system WebLogic domain.

b) This is needed only if the instance is on R12.TXK.C.Delta.4 or lower.

Source the patch file system.
Execute the ebs-delete-node option of the script
from the patch file system, providing the PATCH CONTEXT_FILE.

Case 2: If the secondary node to be deleted is not accessible 
Login to the primary node.
Source the run file system.
Execute the ebs-delete-node option of the script as follows:

$ perl /patch/115/bin/ ebs-delete-node \
-contextfile= -hostname= -logfile= 

2. Sync the OHS configuration on the other nodes to remove references to the deleted node.

We don't need this step, as there is a separate cluster for each server on each machine in our environments.

3. If the deleted node is the only node other than the primary node, set s_shared_file_system on the primary to false.
Run autoconfig.
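A minimal sketch of verifying the change, using a stand-in context file. The real CONTEXT_FILE is a much larger XML document; only the oa_var attribute name s_shared_file_system below is taken from this note, and the element name around it is assumed.

```shell
# Stand-in for the applications context file; the element layout here
# is an assumption, only the oa_var name comes from this note.
ctx=$(mktemp)
cat > "$ctx" <<'EOF'
<oa_context>
  <s_shared_file_system oa_var="s_shared_file_system">false</s_shared_file_system>
</oa_context>
EOF

# Extract the current value of s_shared_file_system.
value=$(sed -n 's/.*oa_var="s_shared_file_system">\([^<]*\)<.*/\1/p' "$ctx")
echo "s_shared_file_system = $value"
rm -f "$ctx"
```

After editing the real context file and running autoconfig, the same check against your actual CONTEXT_FILE should report false.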

4. Run autoconfig on the DB tiers.
This step is required to sync up the tcp.invited_nodes attribute in sqlnet.ora,
removing the deleted node from the value of this attribute.

Re-login and bounce the DB listeners.
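To illustrate what autoconfig should achieve in sqlnet.ora, here is a sketch against a stand-in file. The hostnames are the illustrative ones used elsewhere in this note, and the sed is only a stand-in for the rewrite autoconfig performs.

```shell
# Stand-in sqlnet.ora line with the deleted node still listed
# (hostnames are illustrative, matching the ones used in this note).
sqlnet=$(mktemp)
echo 'tcp.invited_nodes=(vmohsczbk010, vmohsczbk012)' > "$sqlnet"

# Autoconfig rewrites this for you; the sed below only shows the
# end result we expect once the deleted node is dropped.
updated=$(sed 's/, vmohsczbk012//' "$sqlnet")
echo "$updated"
rm -f "$sqlnet"
```

After the DB-tier autoconfig run, verify that the deleted host no longer appears in tcp.invited_nodes before bouncing the listeners.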

5. Remove the INST_TOP for the deleted node;
only the directory of the Run Edition File System and the Patch Edition File System should be affected.

In our case, rather than deleting it, move the directory PCZB1I_vmohsczbk012 to PCZB1I_vmohsczbk012_bkp under /pczb1i/inst/fs2/inst/apps/.

Please DO NOT delete anything.
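The rename in step 5 can be sketched as follows, using a temporary directory in place of the real /pczb1i/inst/fs2/inst/apps/ parent, so nothing is destroyed and the INST_TOP remains available for inspection or restore.

```shell
# Use a throwaway parent directory in place of the real INST_TOP parent.
base=$(mktemp -d)
mkdir -p "$base/PCZB1I_vmohsczbk012"

# Rename rather than delete, keeping a _bkp copy of the node's INST_TOP.
mv "$base/PCZB1I_vmohsczbk012" "$base/PCZB1I_vmohsczbk012_bkp"
ls "$base"
```

On the real system the same mv is run under /pczb1i/inst/fs2/inst/apps/; the original directory name must be gone before FS_CLONE is run.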

6. Run FS_CLONE on the primary node.

7. Execute the adpreclone utility on the Run and Patch File Systems

On the run file system:
$ cd /admin/scripts
$ ./ start
$ ./ appsTier

Once the utility completes, shut down the application tier processes:
$ ./ /

On the patch file system:
$ cd /admin/scripts
$ ./ start forcepatchfs
$ ./ appsTier

Once the utility completes, shut down the application tier processes.
$ ./ / forcepatchfs

A farm is a collection of components managed by Fusion Middleware Control. 
It can contain Oracle WebLogic Server domains, one Administration Server, one or more Managed Servers, 
and the Oracle Fusion Middleware components that are installed, configured, and running in the domain.

8. Add the Secondary Application Tier Node to the Farm

8a)  Prepare the PairsFile for Configuring the Run File System 

mkdir -p /pczb1i/applmgr/pairsfile/run
mkdir -p /pczb1i/applmgr/pairsfile/patch

cd /pczb1i/applmgr/pairsfile/run
cp $INST_TOP/appl/admin/PCZB1I_vmohsczbk012_run.txt myrunpairsfile.txt

8b) Make the necessary modifications to the pairsfile.
Some of the inputs required for the Add Node API are automatically filled in.
The sections you need to fill in are the following:

[Instance Specific] - This is instance specific information for the node you are going to add. 
Refer to the source context file for reference.


On Node 2, modify the file 'myrunpairsfile.txt':

[Instance Specific]


[Services To be Enabled on the Secondary Application Tier Node]


8c) Configure the Run File System on the Secondary Node. Run the commands below on the secondary node, providing the primary node's XML.

bash-4.1$ export PATH=/pczb1i/applmgr/fs2/FMW_Home/webtier/perl/bin:$PATH

bash-4.1$ /pczb1i/applmgr/fs2/FMW_Home/webtier/perl/bin/perl ./ addnode 

8d) Create the required directories and copy the pairsfile into a directory of your choice on the secondary application tier node:

cd /pczb1i/applmgr/pairsfile/patch
cp /pczb1i/inst/fs1/inst/apps/PCZB1I_vmohsczbk012/appl/admin/PCZB1I_vmohsczbk012_patch.txt mypatchpairsfile.txt

8e) Configure the Patch File System on the Secondary Node

export PATH=/pczb1i/applmgr/fs1/FMW_Home/webtier/perl/bin:$PATH
cd /pczb1i/applmgr/fs1/EBSapps/comn/clone/bin

/pczb1i/applmgr/fs1/FMW_Home/webtier/perl/bin/perl ./ addnode
9. Check and update the mod_wl_ohs.conf and apps.conf entries of the managed servers.

a) If any of these managed servers should not be part of the cluster configuration on the current node,
run with -configoption=removeMS to delete the managed server.
The details of these managed servers are deleted from the OHS configuration files
mod_wl_ohs.conf and apps.conf on the current node.

$ perl /patch/115/bin/ -contextfile= \

b) If any of the managed servers from the newly added node should be part of the cluster configuration on the current node,
run with -configoption=addMS to add the managed server.
The details of these managed servers are added to the OHS configuration files mod_wl_ohs.conf and apps.conf
on the current node.

$ perl /patch/115/bin/ -contextfile= \
10. a) Run autoconfig on the RUN file system on all application tier nodes.
    b) Reload the Apps listeners.

11. a) Bring down the Node Manager on the patch FS of the newly added node if it is up: stop
    b) Bring down the Admin Server and the Node Manager on the patch FS of the primary node: stop stop

12. Run autoconfig on the DB tiers and reload the listeners.