Tuesday, May 19, 2020

kfod commands for ASM





kfod disks=all   >>>>> this will will work only as grid/root user
kfod  op=groups status=true asmcompatibility=true dbcompatibility=true verbose=true

Note: from both OS accounts o/p should come as expected, otherwise Oracle database may fail to come up.


[grid@XXXXXX ~]$ kfod disks=all
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group
================================================================================
   1:     307196 MB /dev/oracleasm/disks/VOL1                grid     asmadmin
   2:     307199 MB /dev/oracleasm/disks/VOL2                grid     asmadmin
   3:     307196 MB /dev/oracleasm/disks/VOL3                grid     asmadmin
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME
================================================================================
      +ASM /opt/app/19.5/grid_1
[grid@XXXXX ~]$ kfod  op=groups status=true asmcompatibility=true dbcompatibility=true verbose=true
--------------------------------------------------------------------------------
Group          Size          Free Redundancy Name           Compatibility ASM Compatibility
================================================================================
   1:     921588 MB     504980 MB     EXTERN DG1               10.1.0.0.0        19.0.0.0.0
[grid@ol7pt360dods3 ~]$




[oracle@XXXXX ~]$ kfod  op=groups status=true asmcompatibility=true dbcompatibility=true verbose=true
--------------------------------------------------------------------------------
Group          Size          Free Redundancy Name           Compatibility ASM Compatibility
================================================================================
   1:     921588 MB     504980 MB     EXTERN DG1               10.1.0.0.0        19.0.0.0.0


Also ensure the permissions on bin/oracle as as per below,

[grid@ol7pt360dods3 ~]$ cd $ORACLE_HOME/bin
[grid@ol7pt360dods3 bin]$ ls -ltr oracle
-rwsr-s--x 1 grid oinstall 442645304 May  8 07:28 oracle   >>> it's group should grid user primary ownership

[oracle@ol7pt360dods3 bin]$ ls -ltr oracle
-rwsr-s--x 1 oracle asmadmin 442641136 May  8 07:20 oracle  >>> group should always point to asm luns group only.
[oracle@ol7pt360dods3 bin]$

Friday, August 12, 2016


Storage addition (LUNs)  in RAC - 4 node setup
Brief Steps:-
·         Storage team will assign the LUNs to all the nodes in the cluster.
·         Unix team will re-scan the multipath config... to identify the newly added LUNs
è In case of non-visibility of these LUNs, there are 2 options
1.       Restart the multipath daemon services.
2.       Restart the complete server
We can check the same under /dev/mapper (if it’s multipath config) Otherwise you can check under /dev/disk/by-id
 DBA Activities:-
è cross checking the LUN’s with LUN ids in all nodes
è Disk partioning using fdisk (in any node ex:abc1 in my case)
è After partioning they won’t visible in other nodes, then we have to use ‘kpartx –a’ command to add them.
è Using ASM libraries... labeling the disks


Upon getting confirmation from both storage and UNIX team… we must crosscheck below things on all nodes in the cluster.
Under /dev/mapper -> check all the LUNs having same multipath names and LUN Id’s below is the example for it

First Node: abc1:root@[/dev/mapper]
------------
# multipath -ll|grep mpath2
mpath225 (3600601607c4133005386b1d6cf5ee611) dm-151 DGC,VRAID
mpath224 (3600601607c413300e8b4e365d35ee611) dm-158 DGC,VRAID
mpath223 (3600601607c413300e8394011d35ee611) dm-157 DGC,VRAID
mpath222 (3600601607c4133002e63b6c2d25ee611) dm-156 DGC,VRAID

Second Node: abc2:root@[/dev/mapper]
# multipath -ll | grep mpath2
mpath225 (3600601607c4133005386b1d6cf5ee611) dm-158 DGC,VRAID
mpath224 (3600601607c413300e8b4e365d35ee611) dm-157 DGC,VRAID
mpath223 (3600601607c413300e8394011d35ee611) dm-156 DGC,VRAID
mpath222 (3600601607c4133002e63b6c2d25ee611) dm-155 DGC,VRAID

Third node: abc3:root@[/dev/mapper]
# multipath -ll|grep mpath2
mpath225 (3600601607c4133005386b1d6cf5ee611) dm-158 DGC,VRAID
mpath224 (3600601607c413300e8b4e365d35ee611) dm-157 DGC,VRAID
mpath223 (3600601607c413300e8394011d35ee611) dm-156 DGC,VRAID
mpath222 (3600601607c4133002e63b6c2d25ee611) dm-155 DGC,VRAID

Fourth node: abc4:root@[/dev/mapper]
# multipath -ll|grep mpath2
mpath225 (3600601607c4133005386b1d6cf5ee611) dm-159 DGC,VRAID
mpath224 (3600601607c413300e8b4e365d35ee611) dm-158 DGC,VRAID
mpath223 (3600601607c413300e8394011d35ee611) dm-157 DGC,VRAID
mpath222 (3600601607c4133002e63b6c2d25ee611) dm-156 DGC,VRAID

Upon crosschecking all the LUNs are having matching LUN ids in all the nodes, Proceed with disk portioning,
Only disk partitioning and disk creation… I’m demonstrating here
ASM partitioning

# fdisk -l /dev/mapper/mpath222 ( this is to double check.. if partition already available or not)

Disk /dev/mapper/mpath222: 53.6 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/mapper/mpath222 doesn't contain a valid partition table



# fdisk /dev/mapper/mpath222
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 6527.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First sector (63-104857599, default 63): 2048
Last sector or +size or +sizeM or +sizeK (2048-104857599, default 104857599):
Using default value 104857599

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 22: Invalid argument.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

To make the partition to be visible on each node, perform below steps on every node :-

kpartx -a /dev/mapper/mpath222
kpartx -a /dev/mapper/mpath223
kpartx -a /dev/mapper/mpath224
kpartx -a /dev/mapper/mpath225

Disk Creation:
# /usr/sbin/oracleasm createdisk ORADISK74   /dev/mapper/mpath222p1
Writing disk header: done
Instantiating disk: done
--
Upon creation... scan all the nodes:
14:48:47 root@abc2: /dev/mapper
# /usr/sbin/oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "ORADISK74"
Instantiating disk "ORADISK75"
Instantiating disk "ORADISK76"
Instantiating disk "ORADISK77"
14:50:52 root@itsolx0002: /dev/mapper

Just cross-verify
# /usr/sbin/oracleasm querydisk -d ORADISK74
Disk "ORADISK74" is a valid ASM disk on device /dev/dm-164[253,164]
15:52:06 root@itsolx0008: /dev/mapper

SQL> select PATH,HEADER_STATUS from v$asm_disk where HEADER_STATUS='PROVISIONED';

PATH                HEADER_STATU
------------------- ------------
ORCL:ORADISK77      PROVISIONED
ORCL:ORADISK74      PROVISIONED
ORCL:ORADISK75      PROVISIONED
ORCL:ORADISK76      PROVISIONED




Thanks.!

Sunday, May 10, 2015

root.sh script failed on first node

Today  tried to install 11gr2 rac ... as i wanted to test some scenario, but came across couple of errors and thought of sharing it here


Error : -

/u01/grid/bin/srvctl start nodeapps -n krish1 ... failed
FirstNode configuration failed at /u01/grid/crs/install/crsconfig_lib.pm line 9379.
/u01/grid/perl/bin/perl -I/u01/grid/perl/lib -I/u01/grid/crs/install /u01/grid/crs/install/rootcrs.pl execution failed



Root.sh output on first node :-

[root@krish1 ~]# /u01/grid/root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/grid/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
  root wallet
  root wallet cert
  root cert export
  peer wallet
  profile reader wallet
  pa wallet
  peer wallet keys
  pa wallet keys
  peer cert request
  pa cert request
  peer cert
  pa cert
  peer root cert TP
  profile reader root cert TP
  pa root cert TP
  peer pa cert TP
  pa peer cert TP
  profile reader pa cert TP
  profile reader peer cert TP
  peer user cert
  pa user cert
Adding Clusterware entries to upstart
CRS-2672: Attempting to start 'ora.mdnsd' on 'krish1'
CRS-2676: Start of 'ora.mdnsd' on 'krish1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'krish1'
CRS-2676: Start of 'ora.gpnpd' on 'krish1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'krish1'
CRS-2672: Attempting to start 'ora.gipcd' on 'krish1'
CRS-2676: Start of 'ora.cssdmonitor' on 'krish1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'krish1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'krish1'
CRS-2672: Attempting to start 'ora.diskmon' on 'krish1'
CRS-2676: Start of 'ora.diskmon' on 'krish1' succeeded
CRS-2676: Start of 'ora.cssd' on 'krish1' succeeded

ASM created and started successfully.

Disk Group DATA created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 6845b39c31064fc8bfa54c7bbaea4ffd.
Successfully replaced voting disk group with +DATA.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6845b39c31064fc8bfa54c7bbaea4ffd (/dev/oracleasm/disks/VOL1) [DATA]
Located 1 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'krish1'
CRS-2676: Start of 'ora.asm' on 'krish1' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'krish1'
CRS-2676: Start of 'ora.DATA.dg' on 'krish1' succeeded
/u01/grid/bin/srvctl start nodeapps -n krish1 ... failed
FirstNode configuration failed at /u01/grid/crs/install/crsconfig_lib.pm line 9379.
/u01/grid/perl/bin/perl -I/u01/grid/perl/lib -I/u01/grid/crs/install /u01/grid/crs/install/rootcrs.pl execution failed



Additional logs needs to be looked at, rootcrs_hostname.log

check the log and look for the errors, you will see below error
PRCR-1013 : Failed to start resource ora.ons 
PRCR-1064 : Failed to start resource ora.ons on node krish1


 cat rootcrs_krish1.log |grep  CRS

Error desc :-
2015-05-10 22:07:49: output for start nodeapps is  PRCR-1013 : Failed to start resource ora.ons PRCR-1064 : Failed to start resource ora.ons on node krish1 CRS-5016: Process "/u01/grid/opmn/bin/onsctli" spawned by agent "/u01/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/grid/log/krish1/agent/crsd/oraagent_oracle/oraagent_oracle.log" CRS-2674: Start of 'ora.ons' on 'krish1' failed
2015-05-10 22:07:49: output of startnodeapp after removing already started mesgs is PRCR-1013 : Failed to start resource ora.ons PRCR-1064 : Failed to start resource ora.ons on node krish1 CRS-5016: Process "/u01/grid/opmn/bin/onsctli" spawned by agent "/u01/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/grid/log/krish1/agent/crsd/oraagent_oracle/oraagent_oracle.log" CRS-2674: Start of 'ora.ons' on 'krish1' failed
2015-05-10 22:07:49: Running as user oracle: /u01/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_NODECONFIG -state FAIL
2015-05-10 22:07:49: s_run_as_user2: Running /bin/su oracle -c ' /u01/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_NODECONFIG -state FAIL '
2015-05-10 22:07:49: Succeeded in writing the checkpoint:'ROOTCRS_NODECONFIG' with status:FAIL
2015-05-10 22:07:49: 'ROOTCRS_NODECONFIG' checkpoint has failed
2015-05-10 22:07:49: Running as user oracle: /u01/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_NODECONFIG -state FAIL
2015-05-10 22:07:49: s_run_as_user2: Running /bin/su oracle -c ' /u01/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_NODECONFIG -state FAIL '
2015-05-10 22:07:50: Succeeded in writing the checkpoint:'ROOTCRS_NODECONFIG' with status:FAIL

----
Cause :
Incorrect value used for localhost in /etc/hosts
IP address 127.0.0.1 should only map to localhost and/or localhost.localdomain, not anything else

Solution :
Edit localhost and ensure that entry for localhost is
127.0.0.1 localhost.localdomain localhost


For now, you have to do fresh installation only.. by making the necessary changes to /etc/hosts file.


Wednesday, April 15, 2015

How to monitor restore/recovery operation..?

Tips and techniques for monitoring a restore/recovery operation and determine whether it is indeed working, slow or hung.

In general, a restore should take approximately the same time as a backup, if not longer. Therefore,
if your backup took 10 hours to complete, it will take at least 10 hours to restore to the same host.
Another good indicator is to determine the duration of your previous restore/recovery operations.
Monitor the logs and views and observe the rate of change. Restore and recovery operations are quite resource intensive so it's important to understand if the process is working or hung.





RMAN Log

By default the result of an RMAN operation is written to the standard output. There is no default log file.
You will need to capture the result using either the SPOOL LOG option, or re-direct the standard output to a file.

Example of a restore datafile session:


RMAN> spool log to res5.out
RMAN> restore datafile 5;
RMAN> spool log off
$ cat res5.out

RMAN> restore datafile 5;

Starting restore at 27 DEC 2011 14:05:03 [1]
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=143 devtype=DISK [2]

channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00005 to /opt/app/oracle/oradata/ORA102/example01.dbf [3]
channel ORA_DISK_1: reading from backup piece /opt/app/oracle/fra/ORA102/backupset/2011_12_27/o1_mf_nnndf_TAG20111227T122122_7hl7dm2n_.bkp [4]
channel ORA_DISK_1: restored backup piece 1
piece handle=/opt/app/oracle/fra/ORA102/backupset/2011_12_27/o1_mf_nnndf_TAG20111227T122122_7hl7dm2n_.bkp tag=TAG20111227T122122
channel ORA_DISK_1: restore complete, elapsed time: 00:00:15 [5]
Finished restore at 27 DEC 2011 14:05:19 [6]



From the above we can see:
[1] date and time when the restore started
[2] the database's session ID in v$session - 143
[3] the datafile number and name where file will be restored to
[4] the backuppiece name and TAG
[5] the time taken for the restore of this datafile, and the channel used
[6] date and time when restore completed


Example of a recovery session:


RMAN> recover datafile 5;

Starting recover at 27 DEC 2011 14:05:55
using channel ORA_DISK_1

starting media recovery

channel ORA_DISK_1: starting archive log restore to default destination [1]
channel ORA_DISK_1: restoring archive log [2]
archive log thread=1 sequence=77
...
channel ORA_DISK_1: restoring archive log
archive log thread=1 sequence=89
channel ORA_DISK_1: reading from backup piece /opt/app/oracle/fra/ORA102/backupset/2011_12_27/o1_mf_annnn_TAG20111227T135926_7hlf4jrk_.bkp
channel ORA_DISK_1: restored backup piece 1
piece handle=/opt/app/oracle/fra/ORA102/backupset/2011_12_27/o1_mf_annnn_TAG20111227T135926_7hlf4jrk_.bkp tag=TAG20111227T135926
channel ORA_DISK_1: restore complete, elapsed time: 00:01:35
...
channel default: deleting archive log(s)
archive log filename=/opt/app/oracle/fra/ORA102/archivelog/2011_12_27/o1_mf_1_88_7hlfmkr4_.arc recid=116 stamp=770998049
channel default: deleting archive log(s) [3]
archive log filename=/opt/app/oracle/fra/ORA102/archivelog/2011_12_27/o1_mf_1_89_7hlfmbd8_.arc recid=104 stamp=770998043
media recovery complete, elapsed time: 00:00:01 [4]
Finished recover at 27 DEC 2011 14:07:32 [5]

[1] the archivelogs are restored to default archive destination. Prior to the restore you must ensure that there is space available for the restore of these archivelogs
[2] the archivelogs are restored from backup if they are not already on disk
[3] once recovery is completed RMAN will automatically remove them from disk
[4] time taken to recover this datafile
[5] date and time when recover completed


Alert.log

Only RMAN restore operations are written to the alert.log. User-managed restore sessions will not appear in the alert.log as they are performed outside of Oracle.


Tue Dec 27 14:05:14 EST 2011
Full restore complete of datafile 5 /opt/app/oracle/oradata/ORA102/example01.dbf.  Elapsed time: 0:00:07
  checkpoint is 2989194
  last deallocation scn is 399417



All recovery sessions, whether user-managed or RMAN will also be written to the alert.log. This is an example of a RMAN recovery session:

Tue Dec 27 14:05:55 EST 2011
alter database recover datafile list clear
Tue Dec 27 14:05:55 EST 2011Completed: alter database recover datafile list clear
Tue Dec 27 14:05:55 EST 2011
alter database recover if needed
 datafile 5
Media Recovery Start
 parallel recovery started with 2 processesORA-279 signalled during: alter database recover if needed
 datafile 5
...
Tue Dec 27 14:05:56 EST 2011
The input backup piece /opt/app/oracle/fra/ORA102/backupset/2011_12_27/o1_mf_annnn_TAG20111227T135926_7hlf4jrk_.bkp is in compressed format.Tue Dec 27 14:07:23 EST 2011 [1]
Archivelog restore complete. Elapsed time: 0:00:01 [2]
Archivelog restore complete. Elapsed time: 0:00:01
...
Tue Dec 27 14:07:31 EST 2011
alter database recover logfile '/opt/app/oracle/fra/ORA102/archivelog/2011_12_27/o1_mf_1_77_7hlfmdc7_.arc'
...
Tue Dec 27 14:07:31 EST 2011
Media Recovery Log /opt/app/oracle/fra/ORA102/archivelog/2011_12_27/o1_mf_1_77_7hlfmdc7_.arc
ORA-279 signalled during: alter database recover logfile '/opt/app/oracle/fra/ORA102/archivelog/2011_12_27 /o1_mf_1_77_7hlfmdc7_.arc' [3]
...
Tue Dec 27 14:07:31 EST 2011
Media Recovery Log /opt/app/oracle/fra/ORA102/archivelog/2011_12_27/o1_mf_1_87_7hlfmk96_.arc
Tue Dec 27 14:07:31 EST 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 88 Reading mem 0  Mem# 0: /opt/app/oracle/oradata/ORA102/redo01.log [4]
Tue Dec 27 14:07:31 EST 2011
Recovery of Online Redo Log: Thread 1 Group 2 Seq 89 Reading mem 0
  Mem# 0: /opt/app/oracle/oradata/ORA102/redo02.log
Tue Dec 27 14:07:31 EST 2011
Media Recovery Complete (ORA102)[5]

[1] this was in fact a compressed backuppiece. This information is only displayed in the alert.log rather than the RMAN restore log
[2] the time taken to restore the archivelog
[3] ORA-279 is informational - confirming the archivelog required for recovery
[4] for complete recovery, Oracle will also need to apply the redo from the online logs
[5] end of recovery



User Managed Recovery Log

This is an example of a user-managed recovery. We are performing the recovery via SQL*Plus:



$ sqlplus

SQL*Plus: Release 10.2.0.5.0 - Production on Wed Dec 28 09:59:41 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Enter user-name: / as sysdba

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Data Mining and Real Application Testing options

SQL> alter database datafile 5 offline; [1]

Database altered.

SQL> recover datafile 5; [2]
ORA-00279: change 2989857 generated at 12/27/2011 12:50:00 needed for thread 1
ORA-00289: suggestion :
/opt/app/oracle/fra/ORA102/archivelog/2011_12_28/o1_mf_1_79_7hnmjl2y_.arc
ORA-00280: change 2989857 for thread 1 is in sequence #79 [3]


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00279: change 2989860 generated at 12/27/2011 12:50:01 needed for thread 1
ORA-00289: suggestion :
/opt/app/oracle/fra/ORA102/archivelog/2011_12_28/o1_mf_1_80_7hnmjmbc_.arc
ORA-00280: change 2989860 for thread 1 is in sequence #80
ORA-00278: log file
'/opt/app/oracle/fra/ORA102/archivelog/2011_12_28/o1_mf_1_79_7hnmjl2y_.arc' no
longer needed for this recovery [4]


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto [5]
ORA-00279: change 2989874 generated at 12/27/2011 12:50:39 needed for thread 1
ORA-00289: suggestion :
/opt/app/oracle/fra/ORA102/archivelog/2011_12_28/o1_mf_1_81_7hnmjmkf_.arc
ORA-00280: change 2989874 for thread 1 is in sequence #81
ORA-00278: log file
'/opt/app/oracle/fra/ORA102/archivelog/2011_12_28/o1_mf_1_80_7hnmjmbc_.arc' no
longer needed for this recovery

...
ORA-00279: change 2991001 generated at 12/27/2011 12:58:00 needed for thread 1
ORA-00289: suggestion :
/opt/app/oracle/fra/ORA102/archivelog/2011_12_28/o1_mf_1_87_7hnmjoc7_.arc
ORA-00280: change 2991001 for thread 1 is in sequence #87
ORA-00278: log file
'/opt/app/oracle/fra/ORA102/archivelog/2011_12_28/o1_mf_1_86_7hnmjkm0_.arc' no
longer needed for this recovery


Log applied.
Media recovery complete. [6]
SQL>

SQL> alter database datafile 5 online; [7]

Database altered.

[1] take the datafile offline in preparation for a restore from your user-managed backup

[2] once the datafile is restored from your user-managed backup, recover it

[3] the first archivelog required to recover this file

[4] we have pressed the ENTER key, hence asking Oracle to apply the log requested

[5] if there are a lot of archivelogs to apply and they are all in the archive directory, use the AUTO option for Oracle to apply the rest of the required archivelogs. Otherwise you will need to manually specify each archivelog requested, or press ENTER when prompted for each archivelog

[6] recovery is now complete

[7] place the datafile online, thus making it available for use again



Operating System Utilities

The file being restored should be increasing in size until its actual size. The timestamp should also be changing as the file is being updated by Oracle. Use an operating system utility such as "ls -lt" to see this information.


$ ls -ltr <full path and file name being restored>

eg:
$ ls -ltr /database/db251/asbs/BLOB_DOC_IMAGES_B12.dbf


You can also use operating system utilities such as vmstat, sar and iostat to monitor resource utilization. Is the hardware working to its full capacity? Where is the bottleneck? Are there other I/O intensive operations happening on the host? Install Oracle's OSWatcher utility for more information if required.

Note: In Unix, if the job is started interactively from a terminal without NOHUP or CRON, ensure that the Unix job itself is still running. Sometimes your terminal may terminate due to idle timeouts and leave the job in an orphaned status. If the restore operation is going to take a long time you should consider using NOHUP or CRON to avoid this problem.


If the file is being restored to ASM you should also be able to monitor its file size in ASM. Utilities such as "ls -l" will work also work in ASM.

Please also be aware of this issue:
    Note 882555.1 - RMAN is Not Restoring OMF Datafiles in Their Original Location



Data Dictionary Views

All RMAN operations will have a corresponding database session. Therefore you can query the data dictionary to check its progress.

1) RMAN sessions

SQL> -- RMAN sessions

set linesize 100 trimspool on

COLUMN sid      FORMAT 9999
COLUMN serial#  ALIAS SER# FORMAT 99999
COLUMN spid  FORMAT 9999
COLUMN username FORMAT a10
COLUMN status   FORMAT a2
COLUMN program  FORMAT a32
COLUMN logon_time form a15
COLUMN module form a30
COLUMN action form a35
COLUMN process form a14

SELECT
       s.sid ,
       s.serial# "ser#",
       s.username,
       to_char(s.logon_time,'DD-MM-RR hh24:mi') logon_time,
       s.osuser,
       s.process,
       p.spid ,
       s.machine,
       substr(s.status,1,1) status,
       s.program
FROM v$session s, v$process p
WHERE upper(s.program) like '%RMAN%'
AND   s.paddr = p.addr (+)
ORDER by s.logon_time, s.sid
/


2) The percentage of work completed


Run the following queries at least 3 times, at 5 minute intervals to see progress/change.

SQL>set echo on feedback on
    column path format a50
    set header off
    select
           sl.sofar, sl.totalwork,
           round(sl.sofar/sl.totalwork*100,2) "% Complete"
    from   v$session_longops sl, v$session s, v$process p
    where  p.addr = s.paddr
    and    sl.sid=s.sid
    and    sl.serial#=s.serial#
    and    opname LIKE 'RMAN%'
    and    opname NOT LIKE '%aggregate%'
    and    totalwork != 0
    and    sofar <> totalwork;


3) Session Waits

Are there any sessions in wait and what is it waiting for?


Run the following queries at least 3 times, at 5 minute intervals to see progress/change.


set linesize 200 trimspool on

col event form a25
col p1text form a15
col p1 form 999999
col p2text form a15
col p2 form 999999
col p3text form a10
col p3 form 9999
col waited form 9999
col waiting form 9999

select sid, event, p1text, p1, p2text, p2, p3text, p3,
wait_time waited, seconds_in_wait waiting
from gv$session_wait
where event not like 'SQL*Net%'
and event not like '%timer%'
and event not like 'rdbms%'
and event not like 'pipe%'
and event not like 'DIAG%'
and event not like 'Streams AQ%'
and event not like 'VKTM%'
and state = 'WAITING'
order by seconds_in_wait
/



4) Recovery Progress


What is the recovery progress?  V$RECOVERY_PROGRESS is only populated when RECOVERY is in progress. A restore operation will not populate this view. So if you think a recovery process is slow - is it really at the recovery phase, or still restoring from RMAN backuppieces?

This is an example of a recovery progress:



22:27:38 SQL> select START_TIME,TYPE,ITEM,UNITS,SOFAR,TOTAL from v$recovery_progress;

START_TIME                  TYPE            ITEM                             UNITS                         SOFAR      TOTAL
--------------------------- --------------- -------------------------------- ------------------------ ---------- ----------
12-nov-14 16:08:10          Media Recovery  Average Apply Rate               KB/sec                        29713          0
12-nov-14 16:08:10          Media Recovery  Redo Applied                     Megabytes                    660747          0
12-nov-14 16:08:10          Media Recovery  Last Applied Redo                SCN+Time                          0          0
12-nov-14 11:28:16          Media Recovery  Checkpoint Time per Log          Seconds                           6          6
12-nov-14 11:28:16          Media Recovery  Standby Apply Lag                Seconds                           0          0







Redo recovery rate is determined by a number of factors:

1) PARALLEL_EXECUTION_MESSAGE_SIZE

The default value for this parameter may not be big enough, thus consider increasing to its maximum operating system dependent value:

SQL> show parameter PARALLEL_EXECUTION_MESSAGE_SIZE
SQL> alter system set parallel_execution_message_size=65535 scope=spfile;
This parameter change needs the database to be restarted/remounted.



2) native I/O rates at the hardware level - consult with your system administrator/hardware vendor for this

3) recovery parallelism

This is operating system dependent - Oracle will startup the required number of parallel processes to perform this task.

If you feel the need to manually specify this, then the commands are:

SQL> RECOVER datafile x,y,z parallel (degree 32);

OR

SQL> recover parallel 32;



4) if after tuning the above, redo apply rate is still not acceptable then you can temporarily set db_block_checking to false to try and boost recovery performance.


Media Management Logs


If restoring from tape, confirm that it is indeed restoring from tape rather than waiting on the media manager to service the request. Is the tape busy or idle? Ask your media management support team to confirm the rate at which data is being read from tape.

Monday, March 9, 2015

Log Files in RAC Environment

Cluster Ready Services Daemon (crsd) Log Files

Log files for the CRSD process (crsd) can be found in the following directories:
ORA_CRS_HOME/log/hostname/crsd
the crsd.log file is archived every 10MB as crsd.101, crsd.102 ...
ORACLE_CRS_HOME/log//alert.log

Oracle Cluster Registry (OCR) Log Files

The Oracle Cluster Registry (OCR) records log information in the following location:
ORA_CRS_HOME/log/hostname/client

Cluster Synchronization Services (CSS) Log Files

You can find CSS information that the OCSSD generates in log files in the following locations:
ORA_CRS_HOME/log/hostname/cssd
OCSSD responsible for inter node health monitoring and instance endpoint recovery.
It runs as oracle user.  
The cssd.log file is archived every 20MB as cssd.101, cssd.102....

Event Manager (EVM) Log Files

Event Manager (EVM) information generated by evmd is recorded in log files in the following locations:
ORA_CRS_HOME/log/hostname/evmd

RACG Log Files

The Oracle RAC high availability trace files are located in the following two locations:
ORA_CRS_HOME/log/hostname/racg
$ORACLE_HOME/log/hostname/racg
Core files are in the sub-directories of the log directories. Each RACG executable has a sub-directory assigned exclusively for that executable. The name of the RACG executable sub-directory is the same as the name of the executable.

VIP Log Files

You can find VIP related log files under following location :
ORA_CRS_HOME/log/nodename/racg

OCR logs
OCR logs (ocrdump, ocrconfig, ocrcheck) log files are stored in $ORA_CRS_HOME/log//client/ directory. 

SRVCTL logs
srvctl logs are stored in two locations, $ORA_CRS_HOME/log//client/ and in $ORACLE_HOME/log//client/ directories.

Tuesday, January 20, 2015

12c agent deployment in silent mode
First ensure that we have the agent software available in our OMS home, to ensure that take a look at the below snaps,
Firstly goto SetupàExtensibilityàSelf Update:-
You should get a console like this :- Then click on Agent Software
Make sure that the required agent software is in applied state ( If not download it from the same page by clicking on appropriate host and version )

Tille now, we have ensured the agent software is availble in the OMS host,
Now proceed with getting the agent image from OMS host (Here onwards steps will be in command line only)

Login to OMS host  on putty:-
Step-1: On the OMS host launch the EMCLI client
emcli login -username=sysman -password=XXXXX
Step-2: To reflect the changes/newly downloaded software, we must synchronize OMS using
emcli sync

Step-3: Get the supported platforms list (It’s gonna display the info of hosts based on the agent software available in OMS host )
emcli get_supported_platforms

Step-4: Retrieve the agent image from OMS to some temporary location say /tmp
emcli get_agentimage -destination=/tmp  -platform=”Linux x86-64″ -version=”12.1.0.2.0″
Step-5: Move the same extracted file to target server where we need to deploy the agent
scp -r 12.1.0.2.0_AgentCore_226.zip oracle@lin001racd03a.txi.com:/tmp/agent_software

Step-6: unzip the same file on target server
Step-7: Edit the agent.rsp response file
OMS_HOST=” name.domain.com”
EM_UPLOAD_PORT=”4902″
AGENT_REGISTRATION_PASSWORD=”xxxx”
AGENT_INSTANCE_HOME=”/u01/app/oracle/agent12c/agent_inst”
AGENT_PORT=”3872″
ORACLE_HOSTNAME=” yyyy.domain.com”

Step-8: Run the agent deployment script

oracle:XXXXX@yyyyyy:~ $./agentDeploy.sh AGENT_BASE_DIR=/u01/app/oracle/agent12c RESPONSE_FILE=/tmp/agent_software/agent.rsp


Step-9: Once agent deployment done successfully run root.sh script and then check the status of the agent and configure the same in OEM console.

Monday, September 22, 2014

Troubleshooting OEM12c,

This  is quite natural  error at the time of configuring oms server / at the time of maintenance of server.
The behavior is something like below.
[oracle@oem12c bin]$./emctl status oms
WebTier is Up
Oracle Management Server is Down

even after we try couple of start/stop operations also, may not helpful in bring the oms up.

$./emctl start oms
Starting WebTier...
WebTier Successfully Started
Starting Oracle Management Server...
Oracle Management Server is Down



In emctl.log we can see the below errors
·         Failed to connect to em login page
·         Encountered error while hitting page. Status codes are 404 for console and 500 for PBS
·         Connection exception when trying to hit OMS page. OMS is down


So all you have to do is, need to follow a sequence to bring it up,
 ·         stop the oms repository forcibly
       emctl stop oms -force
·         Bring the repository database listener first.
·         Check the listener services
       lsncrctl services
·         Next bring the repository database up.
·         Check the connectivity for sysman user from sqlplus,  in case of any problems with the password/account lock resolve them .
·         And then start the oms .
       emctl start oms



Now it's succefully started and you are supposed to get acces to the login page.

Thanks for reading the posting...!







Tuesday, July 8, 2014

oracle 11gR2 RAC “oc4j” and “gsd” resource offline



Oracle RAC GSD process

GSD (Global Services Daemon) is a cluserware process that was used in Oracle RAC 9i.
It runs on each node and coordinates with the cluster manager to receive requests from clients such as the DBCA, EM and the SRVCTL utility to execute tasks such as instance startup or shutdown.

With Oracle Clusterware 10g and 11g, the GSD resource does not have a function and will be disabled by default
It can be enabled in Oracle 10g or 11g if any Oracle 9i RAC components are present.


[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac1        
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1       
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac2       
ora.asm        ora.asm.type   ONLINE    ONLINE    rac1       
ora.eons       ora.eons.type  ONLINE    ONLINE    rac1       
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              
ora....network ora....rk.type ONLINE    ONLINE    rac1       
ora.oc4j       ora.oc4j.type  OFFLINE   OFFLINE              
ora.ons        ora.ons.type   ONLINE    ONLINE    rac1       
ora.orcl.db    ora....se.type ONLINE    ONLINE    rac1       
ora....SM1.asm application    ONLINE    ONLINE    rac1       
ora....C1.lsnr application    ONLINE    ONLINE    rac1       
ora.rac1.gsd   application    OFFLINE   OFFLINE              
ora.rac1.ons   application    ONLINE    ONLINE    rac1       
ora.rac1.vip   ora....t1.type ONLINE    ONLINE    rac1       
ora....SM2.asm application    ONLINE    ONLINE    rac2       
ora....C2.lsnr application    ONLINE    OFFLINE              
ora.rac2.gsd   application    OFFLINE   OFFLINE              
ora.rac2.ons   application    ONLINE    ONLINE    rac2       
ora.rac2.vip   ora....t1.type ONLINE    ONLINE    rac2       
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac2   

OC4J Resource Troubleshooting
[oracle@rac1 ~]$ srvctl start oc4j
OC4J could not be started
PRCR-1079 : Failed to start resource ora.oc4j
CRS-2501: Resource 'ora.oc4j' is disabled
we need to enable and start the resource using
[oracle@rac1 ~]$ srvctl enable oc4j
[oracle@rac1 ~]$ srvctl start oc4j
Check the resource status
[oracle@rac1 ~]$ crs_stat -t|grep oc4j
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac1    

Now take a look at our GSD resource
[oracle@rac1 ~]$ srvctl status nodeapps
VIP rac1-vip is enabled
VIP rac1-vip is running on node: rac1
VIP rac2-vip is enabled
VIP rac2-vip is running on node: rac2
Network is enabled
Network is running on node: rac1
Network is running on node: rac2
GSD is disabled
GSD is not running on node: rac1
GSD is not running on node: rac2
ONS is enabled
ONS daemon is running on node: rac1
ONS daemon is running on node: rac2
eONS is enabled
eONS daemon is running on node: rac1
eONS daemon is running on node: rac2  
From the above output it is evident that gsd is disable so, enable that and start it 
[oracle@rac1 ~]$ srvctl enable nodeapps
PRKO-2414 : GSD is already enabled on node(s): rac1,rac2
PRKO-2415 : VIP is already enabled on node(s): rac1,rac2
PRKO-2416 : Network resource is already enabled.
PRKO-2417 : ONS is already enabled on node(s): rac1,rac2
PRKO-2418 : eONS is already enabled on node(s): rac1,rac2
[oracle@rac1 archive]$ srvctl start nodeapps
PRKO-2421 : Network resource is already started on node(s): rac1,rac2
PRKO-2420 : VIP is already started on node(s): rac1,rac2
PRKO-2420 : VIP is already started on node(s): rac1,rac2
PRKO-2422 : ONS is already started on node(s): rac1,rac2
PRKO-2423 : eONS is already started on node(s): rac1,rac2


Finally Check the resources status
[oracle@rac1 ~]$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac1       
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1       
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac2       
ora.asm        ora.asm.type   ONLINE    ONLINE    rac1       
ora.eons       ora.eons.type  ONLINE    ONLINE    rac1       
ora.gsd        ora.gsd.type   ONLINE    ONLINE    rac1       
ora....network ora....rk.type ONLINE    ONLINE    rac1       
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac1       
ora.ons        ora.ons.type   ONLINE    ONLINE    rac1       
ora.orcl.db    ora....se.type ONLINE    ONLINE    rac1       
ora....SM1.asm application    ONLINE    ONLINE    rac1       
ora....C1.lsnr application    ONLINE    ONLINE    rac1       
ora.rac1.gsd   application    ONLINE    ONLINE    rac1       
ora.rac1.ons   application    ONLINE    ONLINE    rac1       
ora.rac1.vip   ora....t1.type ONLINE    ONLINE    rac1       
ora....SM2.asm application    ONLINE    ONLINE    rac2       
ora....C2.lsnr application    ONLINE    ONLINE    rac2       
ora.rac2.gsd   application    ONLINE    ONLINE    rac2       
ora.rac2.ons   application    ONLINE    ONLINE    rac2       
ora.rac2.vip   ora....t1.type ONLINE    ONLINE    rac2       
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac2