Saturday, January 12, 2013

Install Oracle Cloud Control 12C in an Active/Passive Environment for High Availability


Sharing my action plan to install Oracle Cloud Control 12c in an active/passive environment for high availability, so that one node runs the database and the other runs the OEM application, and each can fail over to the other's node.

Database: Oracle 11gR2 [11.2.0.3]
OS      : RHEL 5.7
Machine : Two machines with at least two shared mountpoints on a Red Hat Cluster.


Overview and Requirements
-------------------------
The following conditions must be met for Cloud Control to fail over to a different host:

    1. The installation must be done using a Virtual Host Name and an associated unique IP address.
    >>> Use the DNS name and cluster IP of the OEM application; make sure the DNS name associated with the cluster IP resolves both forward and reverse
    
    2. Install on a shared disk/volume which holds the binaries and the gc_inst directory.
    >>> Install the OEM application on the shared disk (keep all ORACLE_HOMEs on shared storage managed by the cluster)

    3. The Inventory location must failover to the surviving node.
    >>> Keep the inventory on shared storage managed by the cluster

    4. The software owner and time zone parameters must be the same on all cluster member nodes that will host this Oracle Management Service (OMS)
    >>> Create the software owner with the same username and user ID on all nodes, and make sure NTP is configured and the same (correct) timezone is set on every node; a quick verification sketch follows this list
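
    A quick way to verify these prerequisites on both nodes is sketched below (the OMS virtual name is the one used in this setup; the cluster IP is a placeholder to replace with your own):

    $ nslookup oemapp.robi.com.bd          # forward lookup of the OMS virtual name
    $ nslookup <cluster_ip_of_oemapp>      # reverse lookup should return the same name
    $ id oracle                            # UID/GID of the software owner must match on every node
    $ /sbin/service ntpd status            # NTP must be running (RHEL 5)
    $ date; cat /etc/sysconfig/clock       # timezone must be identical on all nodes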

Setting Up Shared Inventory
---------------------------
Use the following steps to set up shared inventory:
    1. Create your new ORACLE_BASE directory on shared storage:
      $ mkdir /oemapp/data01/orabase/
      $ chown -R oracle:oinstall /oemapp/data01/orabase/

    2. Create the Oracle Inventory directory under the new ORACLE_BASE:
      $ mkdir /oemapp/data01/orabase/oraInventory

    3. Create the oraInst.loc file in shared location
      $ vi /oemapp/data01/orabase/oraInst.loc
      ##Enter the path information to the Oracle Inventory directory and specify the group of the software owner as the oinstall user
      inventory_loc=/oemapp/data01/orabase/oraInventory
      inst_group=oinstall

Installing the Software
-----------------------
Refer to the following steps when installing the software:
   
    1. Point to the inventory location file oraInst.loc (under the ORACLE_BASE in this case), specify the host name of the virtual group (the DNS name of the OEM application in this case),
       and install the oracle binaries on shared storage:
      $ ./runInstaller -invPtrloc /oemapp/data01/orabase/oraInst.loc ORACLE_HOSTNAME=oemapp.robi.com.bd -debug

When prompted for the Middleware Home location, enter:

/oemapp/data01/OEMAPP/middleware

WebLogic user: weblogic (password: oem4Robi)
Node Manager password: oem4Robi

Drop the existing DB Control repository from the OEM database with the following command:
emca -deconfig dbcontrol db -repos drop -SYS_PWD oem4Robi -SYSMAN_PWD oem4Robi -HOST oemdb.robi.com.bd

mkdir -p /oemdb/data01/OEMDB/data/mgmt/

Run the root script on the current node first, then switch the resource group over and run it from node 2:

/oemapp/data01/OEMAPP/middleware/oms/allroot.sh

    2. Install Oracle Management Services on cluster member Host1(oemapp in our case).

    3. Continue the remainder of the installation normally.

    4. Once completed, copy the files oraInst.loc and oratab to /etc on all cluster member hosts (Host2, Host3, ...)


Starting Up Services
--------------------
Ensure that you start your services in the proper order. Use the order listed below:

    1. Establish the IP address on the active node.

    2. Start the TNS listener (if it is part of the same failover group).

    3. Start the database (if it is part of the same failover group).

    4. Start Cloud Control using emctl start oms

    5. Test functionality
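
A minimal start sketch for the OMS failover group is below. It assumes the paths from this install, that the cluster software has already plumbed the virtual IP on the active node, and that the oracle user's profile sets the database environment; the listener/database lines apply only if they belong to the same failover group. In a Red Hat Cluster these steps would normally be wired into the service definition rather than run by hand.

    #!/bin/sh
    # 2./3. listener and database, only if they are part of this failover group
    su - oracle -c "lsnrctl start"
    su - oracle -c "echo startup | sqlplus / as sysdba"
    # 4. start Cloud Control
    su - oracle -c "/oemapp/data01/OEMAPP/middleware/oms/bin/emctl start oms"
    # 5. basic test
    su - oracle -c "/oemapp/data01/OEMAPP/middleware/oms/bin/emctl status oms"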


Configuring the Software Library
--------------------------------

The software library location must be accessible to all Management Services. The Software Library is not configured during installation and must be configured post-install using the Enterprise Manager Console:

    1. On the Enterprise Manager home page, from the Setup menu, select Provisioning and Patching, and then select Software Library.

    2. Click the Provisioning subtab.

    3. On the Provisioning page, click the Administration subtab.

    4. In the Software Library Configuration section, click Add to set the Software Library Directory Location to a shared storage that can be accessed by any host running the Management Service.
    
Start the OMS if it is not already running:

/oemapp/data01/OEMAPP/middleware/oms/bin/emctl start oms

This information is also available at: 

/oemapp/data01/OEMAPP/middleware/oms/install/setupinfo.txt

See below for information pertaining to your Enterprise Manager installation:


Use the following URL to access:
--------------------------------
1. Enterprise Manager Cloud Control URL: https://oemapp.robi.com.bd:7799/em
2. Admin Server URL: https://oemapp.robi.com.bd:7101/console

The following details need to be provided during the additional OMS install:

1. Admin Server Hostname: oemapp.robi.com.bd
2. Admin Server Port: 7101

 NOTE:
 An encryption key has been generated to encrypt sensitive data in the Management Repository. If this key is lost, all encrypted data in the Repository becomes unusable. Please run following command to backup the Management Service configuration including the emkey and keep the backup on another system:
 emctl exportconfig oms -dir  
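
 For example (the target directory below is just a placeholder; keep the export on another system or on shared storage):

 /oemapp/data01/OEMAPP/middleware/oms/bin/emctl exportconfig oms -dir /backup/oms_config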


http://docs.oracle.com/cd/E24628_01/doc.121/e24473/ha_setup.htm#CEGCCCDE

DBFS on Exadata, Normal Oracle Database and on Client


I prepared the action plan below while installing DBFS on Exadata and on an external ETL machine that mounts the DBFS from Exadata.


-- ---------------------------
-- serverside [if not exadata]
-- ---------------------------
1. Check to see if the "kernel-devel" package is installed.
#rpm -q kernel-devel
kernel-devel-2.6.18-128.el5
#If not, then install it

2. Determine the kernel directory.
# echo /usr/src/kernels/`uname -r`-`uname -p`
/usr/src/kernels/2.6.18-128.el5-x86_64

3. Install the FUSE package as the "root" user using the following commands, substituting your kernel directory.
--There may be WARNING regarding "--with-kernel",  just ignore it
--download fuse-2.7.3.tar.gz from "http://fuse.sourceforge.net/"
# tar -xzvf fuse-2.7.3.tar.gz
# cd fuse-2.7.3
# ./configure --prefix=/usr --with-kernel=/usr/src/kernels/2.6.18-128.el5-x86_64
# make
# make install
# /sbin/depmod
# /sbin/modprobe fuse
# chmod 666 /dev/fuse
# echo "/sbin/modprobe fuse" >> /etc/rc.modules
# chmod 744 /etc/rc.modules

4. Install the following RPMs (or later versions):
fuse-libs-2.7.4-8.el5
fuse-devel-2.7.4-8.el5
fuse-2.7.4-8.el5

5. create DBFS tablespace
SQL> create bigfile tablespace tbs_dbfs datafile '+DATA2' size 200G autoextend off;

6. create DBFS user
SQL> create user dbfs_user identified by dbfs_passwd default tablespace tbs_dbfs quota unlimited on tbs_dbfs;
SQL> grant create session, create table, create procedure, dbfs_role to dbfs_user;

7. create the file system in tablespace 
cd $ORACLE_HOME/rdbms/admin
sqlplus dbfs_user/dbfs_passwd@DB

SQL> @dbfs_create_filesystem.sql tbs_dbfs my_dbfs  
--tbs_dbfs= DBFS TABLESPACE, my_dbfs=USER GIVEN NAME OF FILESYSTEM
[
-- for advanced setting use below command 
SQL> start dbfs_create_filesystem_advanced tbs_dbfs DBFS nocompress nodeduplicate noencrypt non-partition
]

[
--to drop a file system, use the "dbfs_drop_filesystem.sql" script with the file system name.
cd $ORACLE_HOME/rdbms/admin
sqlplus dbfs_user/dbfs_passwd@DB

SQL> @dbfs_drop_filesystem.sql my_dbfs
--my_dbfs=USER GIVEN NAME OF FILESYSTEM which created in previous step
]
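
To sanity-check the file system created in step 7 without mounting it, the dbfs_client command interface can be used (a sketch; it prompts for the dbfs_user password and uses the same connect string as the mount commands below):

$ $ORACLE_HOME/bin/dbfs_client dbfs_user@DEVDW --command ls dbfs:/my_dbfs/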

8. add oracle to fuse group
# groupadd fuse
# usermod -a -G fuse oracle

9. create dbfs mount point 
# mkdir /data01/DBFS_DIR
# chown oracle:oinstall /data01/DBFS_DIR

10. to let "dbfs os user(oracle)" provide access to other and root
# echo user_allow_other > /etc/fuse.conf -- to let "dbfs os user(oracle)" provide access to other and root 

11. As the oracle user, note the group ID of the oinstall group on the DB server:
$cat /etc/group|grep oinstall|cut -d: -f3 

12. As the oracle user, create a password file:
$ echo dbfs_passwd > $HOME/passwd.txt
$ chmod 600 $HOME/passwd.txt   # readable by oracle only

13. mount DBFS for only mounter
export ORACLE_HOME=;export TNS_ADMIN=;export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ORACLE_HOME/bin/dbfs_client dbfs_user@DEVDW -o direct_io /data01/DBFS_DIR < $HOME/passwd.txt &
--or
-- for root also
export ORACLE_HOME=;export TNS_ADMIN=;export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ORACLE_HOME/bin/dbfs_client dbfs_user@DEVDW -o allow_root -o direct_io /data01/DBFS_DIR < $HOME/passwd.txt &

14. unmount
fusermount -u /data01/DBFS_DIR

-- -------------------
-- IF DB is on Exadata 
-- -------------------

1. OS level config
# usermod -a -G fuse oracle
# mkdir /u01/app/DBFS_DIR
# chown oracle:oinstall /u01/app/DBFS_DIR
# echo user_allow_other > /etc/fuse.conf

2. Create tablespace and user
create bigfile tablespace tbs_dbfs datafile '+DATA2' size 200G autoextend off;
create user dbfs_user identified by dbfs_passwd default tablespace tbs_dbfs quota unlimited on tbs_dbfs;
grant create session, create table, create procedure, dbfs_role to dbfs_user;
start dbfs_create_filesystem_advanced tbs_dbfs DBFS nocompress nodeduplicate noencrypt non-partition

3. Create a file with the password
$ echo dbfs_passwd > $ORACLE_HOME/passwd.txt
# chmod 700 /u01/app/oracle/product/11.2.0/dbhome_1/passwd.txt

4. Create mount script
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export TNS_ADMIN=/u01/app/oracle/product/11.2.0/dbhome_1/network/admin
export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0/dbhome_1/lib
nohup /u01/app/oracle/product/11.2.0/dbhome_1/bin/dbfs_client dbfs_user@DEVDW -o allow_other,direct_io /u01/app/DBFS_DIR < /u01/app/oracle/product/11.2.0/dbhome_1/passwd.txt &
-- for root also
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export TNS_ADMIN=/u01/app/oracle/product/11.2.0/dbhome_1/network/admin
export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0/dbhome_1/lib
nohup /u01/app/oracle/product/11.2.0/dbhome_1/bin/dbfs_client dbfs_user@DEVDW -o allow_root,direct_io /u01/app/DBFS_DIR < /u01/app/oracle/product/11.2.0/dbhome_1/passwd.txt &

5. Command to unmount
fusermount -u /u01/app/DBFS_DIR


-- -----------
-- client side
-- -----------


1. Check to see if the "kernel-devel" package is installed.
#rpm -q kernel-devel
kernel-devel-2.6.18-128.el5
#If not, then install it

2. Determine the kernel directory.
# echo /usr/src/kernels/`uname -r`-`uname -p`
/usr/src/kernels/2.6.18-128.el5-x86_64

3. Install the FUSE package as the "root" user using the following commands, substituting your kernel directory.
--There may be WARNING regarding "--with-kernel",  just ignore it
# tar -xzvf fuse-2.7.3.tar.gz
# cd fuse-2.7.3
# ./configure --prefix=/usr --with-kernel=/usr/src/kernels/2.6.18-128.el5-x86_64
# make
# make install
# /sbin/depmod
# /sbin/modprobe fuse
# chmod 666 /dev/fuse
# echo "/sbin/modprobe fuse" >> /etc/rc.modules
# chmod 744 /etc/rc.modules

4. Install the following RPMs (or later versions):
fuse-libs-2.7.4-8.el5
fuse-devel-2.7.4-8.el5
fuse-2.7.4-8.el5

5. permit other user to execute dbfs_client
#chmod 755 <$ORACLE_HOME>/bin/dbfs_client

6. permit other user to read tnsnames.ora
#chmod 755 <$ORACLE_HOME>/network/admin/tnsnames.ora
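
The client also needs a tnsnames.ora entry for the DBFS database; a sketch is below (host, port and service name are placeholders for this environment):

DEVDW =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = devdw-scan.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = DEVDW))
  )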

7. create OS dbfs user and group
#groupadd -g 101 fuse   [with the same group ID as on the server]
#useradd -g fuse dbfs_user

8. create DBFS mount point
#mkdir /data01/DBFS_DIR
#chown oracle:fuse /data01/DBFS_DIR
#chmod 775 /data01/DBFS_DIR

9. create password file for dbfs  
#echo dbfs_passwd > $HOME/passwd.txt
#chown oracle:fuse $HOME/passwd.txt
#chmod 640 $HOME/passwd.txt

10. mount DBFS for only mounter
export ORACLE_HOME=;export TNS_ADMIN=;export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ORACLE_HOME/bin/dbfs_client dbfs_user@DEVDW -o direct_io /data01/DBFS_DIR < $HOME/passwd.txt &
--or
-- for root also
export ORACLE_HOME=;export TNS_ADMIN=;export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ORACLE_HOME/bin/dbfs_client dbfs_user@DEVDW -o allow_root -o direct_io /data01/DBFS_DIR < $HOME/passwd.txt &

11. With DBFS mounted, create a directory fully accessible by both the client user and the server group
#mkdir /data01/DBFS_DIR/EXT_DIR/
#chown dbfs_user: /data01/DBFS_DIR/EXT_DIR   # here owner = local user, group = oinstall group of the DBFS DB

12. unmount
fusermount -u /data01/DBFS_DIR



Rename/Change Location Datafile in ASM


To rename or change the location of a datafile in ASM I followed the steps below:

*** Considering database in archivelog mode

1. Find the target file to rename.
select * from dba_data_files where tablespace_name='TBS_DW' order by file_name desc;

2. Take the datafile offline (from SQL*Plus)
SQL> ALTER DATABASE DATAFILE '+SYSTEMDG/oradb/datafile/tbs_dw.276.780864751' OFFLINE;

3. Execute the copy from RMAN to the new location
RMAN> 
run{
allocate channel c1 device type disk;
COPY DATAFILE '+SYSTEMDG/oradb/datafile/tbs_dw.276.780864751' TO '+DATA1';
release channel c1;
}
---------- OUTPUT ---------
Starting backup at 17-APR-12
channel c1: starting datafile copy
input datafile file number=00631 name=+SYSTEMDG/oradb/datafile/tbs_dw.276.780864751
output file name=+DATA1/oradb/datafile/tbs_dw.915.780866609 tag=TAG20120417T192328 RECID=3 STAMP=780870208 --<<< use name of output file in the next rename stage
channel c1: datafile copy complete, elapsed time: 00:00:25
Finished backup at 17-APR-12
---------------------------

4. Rename the file from RMAN
RMAN> run {
allocate channel c1 device type disk;
set newname for datafile '+SYSTEMDG/oradb/datafile/tbs_dw.276.780864751' to '+DATA1/oradb/datafile/tbs_dw.915.780866609';
switch datafile all;
release channel c1;
}

Then recover the copied datafile before bringing it online:
SQL> RECOVER DATAFILE '+DATA1/oradb/datafile/tbs_dw.915.780866609';
---------- OUTPUT ---------
ORA-00279: change 3110310780083 generated at 04/17/2012 19:42:29 needed for
thread 8
ORA-00289: suggestion :
+RECO/oradb/archivelog/2012_04_17/thread_8_seq_692.1501.780864785
ORA-00280: change 3110310780083 for thread 8 is in sequence #692

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}  -----<<< just press Enter to accept the suggestion
+RECO/oradb/archivelog/2012_04_17/thread_8_seq_692.1501.780864785
---------------------------

5. Bring the copied file online
SQL> ALTER DATABASE DATAFILE '+DATA1/oradb/datafile/tbs_dw.915.780866609' ONLINE;

6. check the file names again.
select * from dba_data_files where tablespace_name='TBS_DW' order by file_name desc;

7. Drop the old file from ASM using its old name (connected to the ASM instance)
ASM: SQL> ALTER DISKGROUP SYSTEMDG DROP FILE '+SYSTEMDG/oradb/datafile/tbs_dw.276.780864751';

Exadata Storage Cell Disk Change


I follow the steps below while changing a faulty disk on Exadata. We are using Exadata storage software version 11.2.2.4.0.

1. You have identified the following disk to be replaced:
       
$ cellcli -e list physicaldisk detail
         name:                   18:4
         deviceId:               10
         diskType:               HardDisk
         enclosureDeviceId:      18
         errMediaCount:          471
         errOtherCount:          17
         foreignState:           false
         luns:                   0_4
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       0805
         physicalInsertTime:     2010-03-26T05:05:46+06:00
         physicalInterface:      sas
         physicalSerial:         E08WD8
         physicalSize:           558.9109999993816G
         slotNumber:             4
         status:                 critical

2. Make sure that no other disks are offline or missing in the other cells (failgroups):

select count(*),failgroup from gv$asm_disk group by failgroup;


3. Check ASM_POWER_LIMIT; if it is 0, set it to a non-zero value so that rebalancing can happen after the disk replacement and data gets written to the newly added disks.
This parameter can be set to any value between 0 and 11.
--on asm instance
alter system set asm_power_limit=10;

4. Make sure there are no rebalance operation running on any instance on any node:

select * from gv$asm_operation;

5. Identify the ASM disks which are existing on the physical disk to be replaced:

a. On the cell node execute following:

- Find the lun number "0_4" in the list physicaldisk detail output
# cellcli -e list physicaldisk detail
.....
luns: 0_4
.....

- Find the celldisk created for this lun (physicaldisk)
# cellcli -e list celldisk where lun=0_4
         CD_04_axdwcel06         not present

- Find the griddisks created for this celldisk
# cellcli -e list griddisk where celldisk=CD_04_axdwcel06
         DATA1_CD_04_axdwcel06           not present
         DATA2_CD_04_axdwcel06           not present
         RECO_CD_04_axdwcel06            not present
         SYSTEMDG_CD_04_axdwcel06        not present



b. Execute following on any db node connecting to the ASM instance:

SQL> select name, substr(path,1,50) , mount_status, header_status, state from gv$asm_disk where name in ('DATA1_CD_04_AXDWCEL06','DATA2_CD_04_AXDWCEL06','SYSTEMDG_CD_04_AXDWCEL06','RECO_CD_04_AXDWCEL06');

NAME                  SUBSTR(PATH,1,50)                    MOUNT_S HEADER_STA STATE
--------------------- ------------------------------------ ------- ---------- ------
DATA_CD_11_DMORLCEL02 o/192.168.10.6/DATA_CD_11_dmorlcel02 CACHED  MEMBER     NORMAL
RECO_CD_11_DMORLCEL02 o/192.168.10.6/RECO_CD_11_dmorlcel02 CACHED  MEMBER     NORMAL



6. Now drop the disk from the diskgroup(s):

alter diskgroup DATA1 drop disk 'DATA1_CD_04_AXDWCEL06';
alter diskgroup DATA2 drop disk 'DATA2_CD_04_AXDWCEL06';
alter diskgroup SYSTEMDG drop disk 'SYSTEMDG_CD_04_AXDWCEL06';
alter diskgroup RECO drop disk 'RECO_CD_04_AXDWCEL06';

7. Wait until the Oracle ASM disks associated with the grid disks on the bad disk have been successfully dropped by querying the V$ASM_DISK_STAT view.

V$ASM_DISK_STAT displays disk statistics in the same way that V$ASM_DISK does, but without performing discovery of new disks. This results in a less expensive operation. However, since discovery is not performed, the output of this view does not include any data about disks that are new to the system.
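
A query along the following lines (a sketch using the disk names from this example) can be used; once the drop and rebalance are complete, the dropped disks should no longer be reported as MEMBER:

select name, header_status, mode_status, state
  from v$asm_disk_stat
 where name in ('DATA1_CD_04_AXDWCEL06','DATA2_CD_04_AXDWCEL06','SYSTEMDG_CD_04_AXDWCEL06','RECO_CD_04_AXDWCEL06');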


8. Mark this disk at the hardware level to be serviced:

cellcli -e 'alter physicaldisk 18:4 serviceled on'

Here disk_name should be replaced by the name of the physical disk '18:4'.

This step helps identify the disk to be replaced at the hardware level. Note that this command may not work in versions below 11.2.2.2, but it is not mandatory; it just helps in locating the disk.

9. Now replace the disk at the h/w level. The service LED will glow for this disk as done in the previous step.

10. After the disk is replaced, the Exadata software will automatically create the celldisk and griddisks. You can tail the cell alert.log and ASM alert.log to watch this activity. However, the disks will not be added back to ASM because we did not use the FORCE option while dropping them, so we need to add them manually:

alter diskgroup DATA1 add  disk 'o/10.10.0.10/DATA1_CD_04_AXDWCEL06';
alter diskgroup DATA2 add  disk 'o/10.10.0.10/DATA2_CD_04_AXDWCEL06';
alter diskgroup RECO add  disk 'o/10.10.0.10/RECO_CD_04_AXDWCEL06';
alter diskgroup SYSTEMDG add  disk 'o/10.10.0.10/SYSTEMDG_CD_04_AXDWCEL06';

-- Location of the cell alert.log file:
/opt/oracle/cell11.2.2.4.0_LINUX.X64_110929/log/diag/asm/cell/axdwcel06/trace/alert.log

-- Location of the ASM alert.log
/u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log


11. Check until the rebalance completes using the following SQL:
select * from gv$asm_operation;

12. Once the rebalance is over check the status of the ASM disks again and verify that everything is fine:

SQL> select name, substr(path,1,50) , mount_status, header_status, state from gv$asm_disk where name in ('DATA1_CD_04_AXDWCEL06','DATA2_CD_04_AXDWCEL06','SYSTEMDG_CD_04_AXDWCEL06','RECO_CD_04_AXDWCEL06');

select * from gv$asm_disk

13. Verify all grid disks have been successfully put online using the following command from the cell where disk change is done:
    (Wait until asmmodestatus is ONLINE for all grid disks
      and Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.)
     cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

14. Also check the alert history on the cell where the disk change was done
cellcli -e list alerthistory

Shutting Down Exadata Storage Cell


Normally I follow the steps below while shutting down an Exadata Storage Cell for maintenance purposes. (We were using Exadata storage software version 11.2.2.4.0.)


1. Check the disk repair time from the ASM instance and adjust it if required (an example of raising it follows the query).
select dg.name, a.value
  from v$asm_diskgroup dg, v$asm_attribute a
 where dg.group_number = a.group_number
   and a.name = 'disk_repair_time';
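
If the current value is too short for the planned maintenance window, it can be raised per diskgroup; the diskgroup name and value below are only an example:

alter diskgroup DATA1 set attribute 'disk_repair_time' = '8.5h';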

2. from both ASM & DB instance, make sure that all the DISKs are online

select header_status,mode_status,state,failgroup,count(*) from gv$asm_disk group by header_status,mode_status,state,failgroup;

3. From the ASM instance, wait until the query below returns zero rows
select * from gv$asm_operation;

4. Next you will need to check if ASM will be OK if the grid disks go OFFLINE. The following command should return 'Yes' for the grid disks being listed:
   cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome 


5. Inactivate all grid disks
cellcli -e ALTER GRIDDISK ALL INACTIVE

6. Confirm that the griddisks are now offline by performing the following actions:
   a. Execute the command below and the output should show asmmodestatus=UNUSED and asmdeactivationoutcome=Yes for all griddisks once the disks are offline in ASM. Only then is it safe to proceed with shutting down or restarting the cell:
      cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
   b. List the griddisk to confirm that all show inactive:
      cellcli -e LIST GRIDDISK    

7. from both ASM & DB instance, make sure that all the DISKs are online

select header_status,mode_status,state,failgroup,count(*) from gv$asm_disk group by header_status,mode_status,state,failgroup;     

8. The following command will reboot Oracle Exadata Storage Server immediately:
   (When powering off Oracle Exadata Storage Servers, all storage services are automatically stopped.)
        # shutdown -y now   
        sync 
        sync
        init 0   

9. Check the last 20 lines of the validation log; all entries should be PASSED, NOHUP RUN or BACKGROUND RUN
tail -20 /var/log/cellos/validations.log

10. Once the cell comes back online, you will need to reactivate the griddisks:

        cellcli -e alter griddisk all active

11. Issue the command below and all disks should show 'active':

        cellcli -e list griddisk

12. Verify all grid disks have been successfully put online using the following command:
    (Wait until asmmodestatus is ONLINE for all grid disks 
      and Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.)
     cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

13. from both ASM & DB instance, make sure that all the DISKs are online

select header_status,mode_status,state,failgroup,count(*) from gv$asm_disk group by header_status,mode_status,state,failgroup;

14. Next you will need to check if ASM will be OK if the grid disks go OFFLINE. The following command should return 'Yes' for the grid disks being listed:
   cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome 
-- >>> check that all the grid disks are online

15. From the ASM instance, wait until the query below returns zero rows
select * from gv$asm_operation;

16. check cell alert for any uncleared critical alerts
cellcli -e list alerthistory

17. check below from both DB & ASM 
select * from gv$instance;

Shutting Down an Exadata DB Node [same on normal RAC]


Whenever we need to shut down any of the Exadata DB nodes (the same applies to any RAC database node), I normally follow the checklist below.

1. From the DB and ASM instances, run the query below and check that all the nodes are in OPEN status
select * from gv$instance;
2. Log in to the target DB node as the oracle user and source the oracrs.sh environment

3. Now shut down the instance on the target node using the command below (a filled-in example follows)
/u01/app/11.2.0/grid/bin/srvctl stop instance -d ORADB -i
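
For example, to stop instance 1 of ORADB (the instance name here is illustrative):

/u01/app/11.2.0/grid/bin/srvctl stop instance -d ORADB -i ORADB1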

4. From a DB node run the query below to be sure that the target instance is now down
select * from gv$instance;

5. Now log in as root and run the command below to shut down CRS
/u01/app/11.2.0/grid/bin/crsctl stop crs

-- from DB node run below command and make sure that crs is down [CRS-0184]
/u01/app/11.2.0/grid/bin/crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

6. From ASM run the query below to make sure that ASM on the target host is down
select * from gv$instance;

7. shutdown the DB machine

8. startup the DB machine

9. Check with the command below that all the cluster services are online except .gsd and .oc4j
/u01/app/11.2.0/grid/bin/crs_stat -t|grep OFFLINE
--from the DB node, as the oracle user with the oracrs.sh environment set, run the command below to make sure that all the ASM disks are online
asmcmd
ASMCMD> ls

-- or source the environment and connect to the ASM instance directly:
. oracrs.sh
sqlplus / as sysasm

10. If any diskgroup is not mounted, then from the ASM instance of the target machine run the command below to mount that particular diskgroup
alter diskgroup mount;

11. Check whether the DB instance is online; if it did not come online automatically, use the command below to start it
srvctl start instance -d axdw -i

12. From the ASM and DB instances, use the query below to check that all the ASM and DB instances are online
select * from gv$instance;

13. From the ASM and DB instances, use the query below to check that all the ASM disks are online from all instances
select header_status,mode_status,state,failgroup,count(*) from gv$asm_disk group by header_status,mode_status,state,failgroup;

14. Check the alert logs of both the DB and ASM instances on the target node

Oracle 11gR2 RAC on HPUX Itanium 11.31


This was my RAC installation on HP-UX from around September last year.
Here I am sharing my action plan as I prepared it in Notepad.

Oracle Version : 11.2.0.3
OS : HPUX Itanium 11.31

1. configure NTP

2. configure DNS using SAM

3. Create OS groups using the commands below. Enter these commands as the 'root' user:
/usr/sbin/groupadd -g 503 oinstall
/usr/sbin/groupadd -g 504 dba
/usr/sbin/groupadd -g 505 asmadmin
/usr/sbin/groupadd -g 506 asmdba
/usr/sbin/groupadd -g 507 asmoper

4. Create the users that will own the Oracle software using the commands:

*** please use "/usr/bin/ksh" shell for oracle user
#/usr/sbin/useradd -u 503 -s /usr/bin/ksh -g oinstall -G dba,asmadmin,asmdba,asmoper -p /home/oracle oracle

5. Set the password for the oracle account. [Oracle6!7]

6. Sample /etc/hosts file will be as below: "**also configure DNS with reverse resolve":
127.0.0.1 localhost.localdomain localhost #don't add anything extra to this line
#lan1 - PUBLIC
10.101.4.57  ORAdb1.robi.com.bd ORAdb1
10.101.4.58  ORAdb2.robi.com.bd ORAdb2
#VIP
10.101.4.59  ORAdb1-vip.robi.com.bd ORAdb1-vip
10.101.4.60  ORAdb2-vip.robi.com.bd ORAdb2-vip
#lan3 - PRIVATE
172.16.4.57 ORAdb1-pvt
172.16.4.58 ORAdb2-pvt


7. Kernel parameters (can also be fixed during installation using runfixup.sh; a kctune example follows the list):
ksi_alloc_max 32768
executable_stack 0
max_thread_proc 1024
maxdsiz 1073741824 (1 GB)
maxdsiz_64bit 2147483648 (2 GB)
maxfiles 16384
maxfiles_lim 65536
maxssiz 134217728 (128 MB)
maxssiz_64bit 1073741824 (1 GB)
maxuprc 3686
msgmni 4096
msgtql 4096
ncsize 35840
nflocks 4096
ninode 34816
nkthread 7184
nproc 4096
semmni 4096
semmns 8192
semmnu 4096
semvmx 32767
shmmax 1073741824
shmmni 4096
shmseg 512
tcp_smallest_anon_port 9000
tcp_largest_anon_port 65500
udp_smallest_anon_port 9000
udp_largest_anon_port 65500
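
On HP-UX 11.31 these tunables can be checked and set with kctune; for example (one tunable shown, repeat for each parameter as needed):

# kctune maxfiles            # show the current value
# kctune maxfiles=16384      # set the value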


8. Install the packages below, or later versions:
Operating system: 
  HP-UX 11iV3 patch Bundle Sep/ 2008 (B.11.31.0809.326a)
HP-UX 11.31 packages and bundles:
  PHCO_41479 (or later) 11.31 diskowner(1M) cumulative patch
  PHKL_38038 vm cumulative patch
  PHKL_38938 11.31 SCSI cumulative I/O patch 
  PHKL_40941  Scheduler patch : post wait hang 
  PHSS_36354 11.31 assembler patch
  PHSS_37042  11.31 hppac (packed decimal)
  PHSS_37959  Libcl patch for alternate stack issue fix (QXCR1000818011)
  PHSS_39094 11.31 linker + fdp cumulative patch
  PHSS_39100 11.31 Math Library Cumulative Patch
  PHSS_39102 11.31 Integrity Unwind Library
  PHSS_38141 11.31 aC++ Runtime
  
To avoid the node evictions:  
  PHKL_40208
  PHKL_40372
  
HP-UX Logical Volume Manager (LVM): 
  PHCO_41479 (or later) 11.31 
C / C++ compiler: 
  HP C/aC++ A.06.20 (Swlist Bundle - C.11.31.04) - September 2008
C Compiler Patches:
  PHSS_39824 11.31 HP C/aC++ Compiler (A.06.23)
  PHSS_39826 11.31 u2comp/be/plug-in (C.06.23)
Gcc Compiler:
  Gcc 4.2.3
Motif 2.1 Development Environment package (optional):
  X11MotifDevKit.MOTIF21-PRG
Oracle JDBC/OCI Drivers:
  HPUX JDK 6.0.05
  HPUX JDK 5.0.15
Oracle ODBC:
  You can download and install the Driver Manager from the following URL: http://www.unixodbc.org
Oracle Messaging Gateway (optional):
  IBM MQ Series V. 6.0, client and server:
    MQSERIES.MQM-CL-HPUX
    MQSERIES.MQM-SERVER
  TIBCO Rendezvous 7.2
Perl:
Perl 5.8.8
Programming Software (optional):
  Pro*Cobol - Micro Focus Server Express 5.1
  Pro*FORTRAN - HP FORTRAN/90 - Sep 2008 release
SSH: 
  The required SSH software is the default SSH shipped with your operating system.


10. Creating Required Symbolic Links
This task is required only if the Motif 2.1 Development Environment package (X11MotifDevKit.MOTIF21-PRG) is not installed.
# cd /usr/lib
ln -s libX11.3 libX11.sl
ln -s libXIE.2 libXIE.sl
ln -s libXext.3 libXext.sl
ln -s libXhp11.3 libXhp11.sl
ln -s libXi.3 libXi.sl
ln -s libXm.4 libXm.sl
ln -s libXp.2 libXp.sl
ln -s libXt.3 libXt.sl
ln -s libXtst.2 libXtst.sl

11. Create the Oracle Inventory Directory
mkdir -p /u01/app/oraInventory
chown -R oracle:oinstall /u01/app/oraInventory
chmod -R 775 /u01/app/oraInventory

12. Creating the Oracle Grid Infrastructure Home Directory
mkdir -p /u01/app/11.2.0/grid
chown -R oracle:oinstall /u01/app/11.2.0/grid
chmod -R 775 /u01/app/11.2.0/grid

vi oracrs.sh
export ORACLE_SID=+ASM1/+ASM2
export ORACLE_HOME=/u01/app/11.2.0/grid
export PATH=$ORACLE_HOME/bin:$PATH

13. Creating the Oracle Base Directory
mkdir -p /u01/app/oracle/product/11.2.0/dbhome_1
chown -R oracle:oinstall /u01/app/oracle/
chmod -R 775 /u01/app

vi oradb.sh
export ORACLE_SID=ORADB1/ORADB2
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export PATH=$ORACLE_HOME/bin:$PATH

su - oracle
vi .profile and append below on both node:
export ORACLE_HOSTNAME=ORAdb1.robi.com.bd

14. ASM DIsk Group sizing
DATA01     200G
SYSTEMDG   100G
RECO       100G


15. Configuring Disk Devices for Oracle ASM    
    
    a. To make sure that the disks are available, enter the following command on every node:
       # ioscan -fNnC disk
              
    b. If the ioscan command does not display device name information for a device, enter the following command to install the special device files for any new devices:

       # /usr/sbin/insf -e
    c. For each disk to add to a disk group, enter the following command on any node to verify that it is not already part of an LVM volume group:

       # /sbin/pvdisplay /dev/dsk/cxtydz

       If this command displays volume group information, the disk is already part of a volume group. 
       The disks that you choose must not be part of an LVM volume group.

    d. The disk names for the same disk can be different on each node. A disk can be identified as the same
      one via the WWID. The WWID of a disk can be checked via the following command:
        # scsimgr lun_map -D /dev/rdisk/disk29 | grep WWID

        ORADB1:/home/root #scsimgr lun_map -D /dev/rdisk/disk29 | grep WWID
        World Wide Identifier(WWID)    = 0x6006016001102d00b81efc7444e1e111
        ORADB2:/home/root #scsimgr lun_map -D /dev/rdisk/disk30 | grep WWID
        World Wide Identifier(WWID)    = 0x6006016001102d00b81efc7444e1e111

        ORADB1:/home/root #scsimgr lun_map -D /dev/rdisk/disk30 | grep WWID
        World Wide Identifier(WWID)    = 0x6006016001102d00b21efc7444e1e111
        ORADB2:/home/root #scsimgr lun_map -D /dev/rdisk/disk29 | grep WWID
        World Wide Identifier(WWID)    = 0x6006016001102d00b21efc7444e1e111

        ORADB1:/home/root #scsimgr lun_map -D /dev/rdisk/disk31 | grep WWID
        World Wide Identifier(WWID)    = 0x6006016001102d00b61efc7444e1e111
        ORADB2:/home/root #scsimgr lun_map -D /dev/rdisk/disk31 | grep WWID
        World Wide Identifier(WWID)    = 0x6006016001102d00b61efc7444e1e111

        ORADB1:/home/root #scsimgr lun_map -D /dev/rdisk/disk32 | grep WWID
        World Wide Identifier(WWID)    = 0x6006016001102d00ba1efc7444e1e111
        ORADB2:/home/root #scsimgr lun_map -D /dev/rdisk/disk32 | grep WWID
        World Wide Identifier(WWID)    = 0x6006016001102d00ba1efc7444e1e111

        
    e. create a special Oracle device directory and to use mknod to create device paths in this special Oracle folder
       # mkdir /dev/oracle
       # ll /dev/rdisk/disk29
       crw-r----- 1 bin sys 23 0x000019 Jan 16 12:16 /dev/dsk/cxtydz

       # mknod /dev/oracle/asmdisk1 c 23 0x000019
        
        on ORAdb1
        ---------
        #ll /dev/rdisk/disk29
        #mknod /dev/oracle/asm_systemdg_01 c 13 0x000013
        #ll /dev/rdisk/disk30
        #mknod /dev/oracle/asm_data01_01 c 13 0x000014
        #ll /dev/rdisk/disk31
        #mknod /dev/oracle/asm_data01_02 c 13 0x000015
        #ll /dev/rdisk/disk32
        #mknod /dev/oracle/asm_reco_01 c 13 0x000016

       Later during set-up, for the ASM instance, set the ASM_DISKSTRING parameter to /dev/oracle/* (a post-install sketch of this follows item f)
    f. Modify the owner, group, and permissions on the character raw device files (ASM and Voting Disks) on all nodes:
       # chown -R oracle:asmadmin /dev/oracle
       # chmod -R 660 /dev/oracle
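
    Post-install, the discovery string can also be set from the ASM instance if it was not specified during the grid installation (a sketch, assuming the ASM instance uses an spfile):

    SQL> alter system set asm_diskstring='/dev/oracle/*' scope=both sid='*';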
  

16. Setting the Minor Number for Device Files

    a. Log in as the root user.

    b.Determine whether /dev/async exists. If the device does not exist, then use the following command to create it:

    # /sbin/mknod /dev/async c 101 0x4

    c. Alternatively, you can set the minor number value to 0x104 using the following command:

    # /sbin/mknod /dev/async c 101 0x104

    d. If /dev/async exists, then determine the current value of the minor number, as shown in the following example:

       #ll  -l /dev/async
       crw-rw-rw-   1 bin        bin        101 0x000000 Jul 26 15:45 /dev/async

    e. If the existing minor number of the file is not 0x4 or 0x104, then change it to an expected value using one of the following commands:
    # rm /dev/async
    # mknod /dev/async c 101 0x4
    or
    # mknod /dev/async c 101 0x104

       
17. Oracle Grid Infrastructure Install
a. As the grid user (Grid Infrastructure software owner) start the installer by running "runInstaller" from the staged installation media
b. Choose Skip Software Updates
c. Select radio button 'Install and Configure Grid Infrastructure for a Cluster' and click ' Next> '
d. Select radio button 'Advanced Installation' and click ' Next> 
e. Accept 'English' as language' and click  Next> 
f. Specify your cluster name and the SCAN name you want to use and click  Next> 
g. Uncheck Configure GNS
h. Use the Edit and Add buttons to specify the node names and virtual IP addresses you configured previously in your /etc/hosts file. 
   Use the 'SSH Connectivity' button to configure/test the passwordless SSH connectivity between your nodes.
   *** sometimes you need to configure SSH manually
i. Type in the OS password for the user 'oracle' and press 'Setup' and test and ok 'Next>'
j. Click on 'Interface Type' next to the Interfaces you want to use for your cluster and select the correct values for 'Public', 'Private' and 'Do Not Use' 
   When finished click ' Next> ' [we can use multiple NIC as private or public]
k. Select radio button Automatic Storage Management (ASM) and click ' Next> '
l. Select the 'DiskGroup Name' specify the 'Redundancy' and tick the disks you want to use,[if candidate disks are not shown then check permission for the /dev... folders] when done click ' Next> '
  [if no candidate disks found, then change discovery path to /dev/oracleasm/disks* or ORCL:*]
  NOTE: The number of voting disks that will be created depend on the redundancy level you specify:
  EXTERNAL will create 1 voting disk, NORMAL will create 3 voting disks, HIGH will create 5 voting disks   
  *** if the disks are OK on both nodes and it is certain that the "/dev/oracle/.." names point to the same disk devices, then ignore the "PRVF-5150" error
m. Specify and confirm the password you want to use and click ' Next> '
n. Select NOT to use IPMI and click ' Next> '
o. Assign groups as below:
   Oracle ASM DBA: asmdba
   Oracle ASM Operator: asmoper
   Oracle ASM Administrator: asmadmin
p. Specify the locations for your ORACLE_BASE and for the Software location and click ' Next> '
q. Specify the locations for your Inventory directory and click ' Next>'
r. Check that status of all checks is Succeeded and click ' Next> '
s. Wait for the OUI to complete its tasks
t. At this point you may need to run oraInstRoot.sh on all cluster nodes (if this is the first installation of an Oracle product on this system)
   Run them on the first node alone then go for other nodes
   Then press ok
   NOTE: root.sh should be run on one node at a time.
   if root.sh fails, then resolve it and run again
u. Wait for the OUI to finish the cluster configuration.

check OCR,Voting & OLR :
ocrcheck
crsctl query css votedisk
grid_home/cdata/.olr 


18. RDBMS Software Install   

a. #su - oracle
change into the directory where you staged the RDBMS software
./runInstaller

b. Provide your e-mail address, tick the check box and provide your Oracle Support Password 
   if you want to receive Security Updates from Oracle Support and click ' Next> ' 
c. Select the option 'Install Database software only' and click ' Next> '
d. Select 'Real Application Clusters database installation', and select all nodes. 
   Use the 'SSH Connectivity' button to configure/test the passwordless SSH connectivity between your nodes 
e. Type in the OS password for the oracle user and click 'Setup'   
f. To confirm English as selected language click ' Next> '
g. Make sure radio button 'Enterprise Edition'/'Standard Edition' is ticked, click ' Next> '
h. Specify path to your Oracle Base and below to the location where you want to store the software (Oraclehome). Click ' Next> '
i. Specify groups:
   Database Administrator: dba
   Database Operator:oinstall
j. Oracle Universal Installer performs prerequisite checks.
k. Check that the status of all checks is 'Succeeded' and click ' Next> '
l. Log in to a terminal window as root user and run the root.sh script on the first node. 
   When finished do the same for all other nodes in your cluster as well. When finished click 'OK'
   NOTE: root.sh should be run on one node at a time.
m. Click ' Close ' to finish the installation of the RDBMS Software.

19. Run ASMCA to create diskgroups

a. #su - grid
   cd /u01/app/11.2.0/grid/bin
   ./asmca
b. Click 'Create' to create a new diskgroup
c. Type in a name for the diskgroup, select the redundancy you want to provide and mark the tick box for the disks you want to assign to the new diskgroup   
d. Click 'OK'
e. Click 'Create' to create the diskgroup for the flash recovery area
f. Type in a name for the diskgroup, select the redundancy you want to provide and mark the tick box for the disks you want to assign to the new diskgroup.
g. Click 'OK'
h. Click 'Exit'


20. It is Oracle's best practice to have an OCR mirror stored in a second diskgroup. 
    To follow this recommendation add an OCR mirror. Mind that you can only have one OCR in a diskgroup.
Action:
a. To add OCR mirror to an Oracle ASM diskgroup, ensure that the Oracle Clusterware stack is running and
run the following command as root from the Grid Infrastructure home /bin directory:
b. # ocrconfig -add +DATA1
c. # ocrcheck

21. Run DBCA to create the database
a. #su - oracle
   cd /u01/app/oracle/product/11.2.0/db_1/bin
   ./dbca
b. Select 'Oracle Real Application Clusters database' and click 'Next'
c. choose option 'Create a Database' and click 'Next'
d. Select the database template that you want to use for your database and click 'Next'
e. Type in the name you want to use for your database and select all nodes before you click 'Next'
f. select the options you want to use to manage your database and click 'Next'
g. Type in the passwords you want to use and click 'Next'
h. Select the diskgroup you created for the database files and click 'Multiplex Redo Logs and Control Files'. 
   In the popup window define the diskgroup that should contain controlfiles and 
   redo logfiles and the diskgroup that should contain the mirrored files.   
   When all file destinations are correct click 'Next'
i. Specify the diskgroup that was created for the flash recovery area and define the size. 
   If the size is smaller than recommended a warning will popup.   
j. Select if you want to have sample schemas created in your database and click 'Next'
k. Review and change the settings for memory allocation, characterset etc. according to your needs and click 'Next'
l. Review the database storage settings and click 'Next'
m. Make sure the tickbox 'Create Database' is ticked and click 'Finish'
n. Review the database configuration details again and click 'OK'
o. The database is now created, you can either change or unlock your passwords or just click Exit to finish the database creation

22. For all the LUNs added to ASM, set the LV timeout value = (# of paths * PV timeout) + 10 seconds, using the "lvchange -t <time_in_sec> /dev/dsk/cxtydz" command.
       In our case, EMC recommended PV timeout = 90 and # of paths = 4.
       Then check with # lvdisplay -v /dev/dsk/cxtydz
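
With the values above the timeout works out to 4 * 90 + 10 = 370 seconds; a concrete command might look like the following (the logical volume path is a placeholder, since lvchange normally operates on an LV device rather than a raw disk path):

# lvchange -t 370 /dev/vg_oradata/lvol1
# lvdisplay -v /dev/vg_oradata/lvol1 | grep -i timeout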


Sample cluster resource changes
------------------------
ex: change listener port
------------------------
emca -deconfig dbcontrol db -repos drop -cluster
- srvctl modify listener -l -o -p   (a filled-in example follows this block)
- restart the full cluster [ crsctl stop/start crs]
- check with "lsnrctl status", from both DB & AMS "show parameter local"
emca -config dbcontrol db -repos create -cluster
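
Filled in with illustrative values (the listener name, Grid home and port below are placeholders), the modify step might look like:

srvctl modify listener -l LISTENER -o /u01/app/11.2.0/grid -p "TCP:1525"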


-----------------------------
ex: change scan listener port
-----------------------------
emca -deconfig dbcontrol db -repos drop -cluster
- srvctl modify scan_listener -p 1521
- alter system set remote_listener='ORAdb-scan.robi.com.bd:1521' scope=both;--from DB & ASM on all the db nodes
- Change the necessary files in $ORACLE_HOME/network/admin on server & client
- restart the full cluster [ crsctl stop/start crs]
- check with "lsnrctl status", from both DB & AMS "show parameter remote"
- from all DB nodes "lsnrctl status LISTENER", also check same for all scan lsitener
emca -config dbcontrol db -repos create -cluster
---------------
emca -deconfig dbcontrol db -repos drop -cluster
emca -config dbcontrol db -repos create -cluster



http://docs.oracle.com/cd/E11882_01/install.112/e10815/typinstl.htm#BABFEGHH
http://docs.oracle.com/cd/E11882_01/install.112/e10815/pre_hpux.htm#CIHFICFD
http://docs.oracle.com/cd/E11882_01/install.112/e10815/manpreins.htm#BABCHAED
http://docs.oracle.com/cd/E11882_01/install.112/e10815/pre_hpux.htm#CEGGJDIG
http://docs.oracle.com/cd/E11882_01/install.112/e10815/storage.htm#CDEECFGI
http://gjilevski.wordpress.com/2010/10/03/fresh-oracle-11-2-0-2-grid-infrastructure-installation-prvf-5150-prvf-5184/
MOS Doc: RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent) [ID 810394.1]