This post had been pending for a long time. As I am a little free these days, I thought of writing it down. A little background on the issue –
The RDBMS and ASM homes were on version 10.2.0.3 on AIX (the OS version I don't remember). A new 100GB disk had to be added to an ASM diskgroup. As part of the initial checks, the consultant on site verified the ownership and permissions of the provided disk on both nodes of the RAC setup. The ownership on node 2 was incorrectly set (it showed root:system). It was changed by the sysadmin, and after re-checking the OS visibility of the disk and its ownership and permissions on both nodes, the add disk command was fired (from node 1) and it failed. The alert log (+ASM2 instance) showed:
Fri Oct 22 15:35:52 2010
NOTE: reconfiguration of group 2/0x1771b7a6 (ASMDB), full=1
NOTE: disk validation pending for group 2/0x1771b7a6 (ASMDB)
ERROR: group 2/0x1771b7a6 (ASMDB): could not validate disk 3
SUCCESS: validated disks for 2/0x1771b7a6 (ASMDB)
NOTE: PST refresh pending for group 2/0x1771b7a6 (ASMDB)
NOTE: PST update: grp = 2, dsk = 3, mode = 0x4
Fri Oct 22 15:35:57 2010
ERROR: too many offline disks in PST (grp 2)
Fri Oct 22 15:35:57 2010
NOTE: PST not enabling heartbeating (grp 2): group dismounted
Fri Oct 22 15:35:57 2010
SUCCESS: refreshed PST for 2/0x1771b7a6 (ASMDB)
ERROR: ORA-15040 thrown in RBAL for group number 2
Fri Oct 22 15:35:57 2010
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm2_rbal_1159174.trc:
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "3" is missing
Fri Oct 22 15:35:57 2010
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm2_ckpt_1060874.trc:
ORA-00600: internal error code, arguments: [kfcbCloseCIC10], [2], [3], [7], [], [], [], []
Fri Oct 22 15:36:00 2010
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm2_ckpt_1060874.trc:
ORA-00600: internal error code, arguments: [kfcbCloseCIC10], [2], [3], [7], [], [], [], []
Fri Oct 22 15:36:00 2010
CKPT: terminating instance due to error 469
Fri Oct 22 15:36:00 2010
Trace dumping is performing id=[cdmp_20101022153600]
Fri Oct 22 15:36:02 2010
Shutting down instance (abort)
License high water mark = 9
Fri Oct 22 15:36:05 2010
Instance terminated by CKPT, pid = 1060874
Fri Oct 22 15:36:07 2010
Instance terminated by USER, pid = 1536124
Fri Oct 22 15:36:11 2010
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Now things get interesting, isn't it ;). So what all is mandatory to check before adding a disk to a diskgroup :-
1. The OS visibility of the disk across all the nodes.
2. The ownership of the disk across all the nodes.
3. The permissions of the disk across all the nodes.
4. The visibility of the disk in the GV$ASM_DISK view.
The above 4 are the key areas (I suppose) to look at before adding the disk. For the disk to be added, the column HEADER_STATUS in GV$ASM_DISK must show CANDIDATE, and the disk must be discovered by every instance; this is what was not checked. I was called in to find out what could have been the reason for the failure, and also to add 2 more disks to the diskgroup. A quick way of running these checks is sketched below.
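The checks themselves are simple (a minimal sketch; /dev/rhdisk15 is just the device from this setup, substitute your own disk). On every node of the cluster:

$ ls -l /dev/rhdisk15      <-- the character device must exist on ALL nodes, owned by oracle:oinstall with 660 permissions

Then, from any instance, GV$ASM_DISK should return one row per ASM instance, each showing HEADER_STATUS = CANDIDATE:

SQL> select inst_id, path, mount_status, header_status
  2  from gv$asm_disk
  3  where path = '/dev/rhdisk15';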
After arriving at the client's site, I checked the above 4 points and saw that point 4 was where things had gone wrong. The view gv$asm_disk showed :-
SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk15';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         1 /dev/rhdisk15             CLOSED  MEMBER
The above confirms that the disk (/dev/rhdisk15) was not discovered by the ASM instance on node 2, hence the error ORA-15042: ASM disk "3" is missing. The only reason I could think of was that the disk had been acquired exclusively by node 1, as the ownership and permissions were correctly set. To confirm this, I had to do a few MOS and Google searches. MOS note 400005.1 describes the same issue. I also found this useful link: http://oraclue.com/2009/03/10/kernel-files-editor-kfed-and-kfod-osm-discovery-utility/.
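As an aside, the same header information can be dumped straight off the disk with the kfed utility mentioned in the link above (a sketch only; the field values below are illustrative of a disk that is already a MEMBER of the ASMDB diskgroup):

$ kfed read /dev/rhdisk15 | grep -E 'kfbh.type|kfdhdb.hdrsts|kfdhdb.grpname'
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.grpname:                  ASMDB ; 0x048: length=5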
As I had to add 2 more disks, I asked the sysadmin team to provide them. After they were made available, I started checking the pre-requisites, and again point 4 seemed problematic. The disk was getting recognized only by node 1.
SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         1 /dev/rhdisk17             CLOSED  CANDIDATE
The newly added disk showed CANDIDATE, but was recognized by node 1 only. To confirm that the disk had been acquired exclusively by one of the nodes, I used the kfod utility along with truss from node 1, as suggested by the MOS note.
truss -aefo kfod_17.txt kfod disks=all

For the disk (rhdisk17) the output showed :-

1192140: 2490391: statx("/dev/rhdisk17", 0x0FFFFFFFFFFFCF00, 176, 010) = 0
1192140: 2490391: open("/dev/rhdisk17", O_RDONLY|O_LARGEFILE) Err#16 EBUSY
1192140: 2490391: open("/dev/rhdisk17", O_RDONLY|O_LARGEFILE) Err#16 EBUSY
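The Err#16 EBUSY on the open() call is the tell-tale sign: the device node is visible and the permissions are fine, but the open is refused because the LUN is reserved by another host. On AIX, the attribute controlling this behaviour can be checked with lsattr (the same command used further down):

$ lsattr -El hdisk17 -a reserve_policy
reserve_policy single_path Reserve Policy True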
So, it was time to catch the sysadmin. It was hard to make him understand the problem because he had only one sentence: 'Everything is OK from my side'. Finally, after a few discussions, he agreed to analyze the problem. While he was diagnosing it, the output of gv$asm_disk kept changing.
17:03:12 SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         1 /dev/rhdisk17             CLOSED  CANDIDATE

17:08:13 SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         2 /dev/rhdisk17             CLOSED  CANDIDATE
         1 /dev/rhdisk17             CLOSED  CANDIDATE

17:38:07 SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         1 /dev/rhdisk17             CLOSED  CANDIDATE
To resolve it, the steps below were taken on the AIX server (luckily I was part of doing this) :-
1. Logged in as root user on the server.

2. root@server:/ $ cd /dev

3. root@server:/dev $ ls -l | grep rhdisk17
crw-------    1 root     system       40, 19 Oct 25 18:03 rhdisk17

root@server:/dev $ chmod 660 rhdisk17
root@server:/dev $ ls -lrt rhdisk17
crw-rw----    1 root     system       40, 19 Oct 25 18:03 rhdisk17

root@server:/dev $ chown -R oracle:oinstall rhdisk17
root@server:/dev $ ls -lrt rhdisk17
crw-rw----    1 oracle   oinstall     40, 19 Oct 25 18:03 rhdisk17

root@server:/dev $ ls -lrt hdisk17
brw-------    1 root     system       40, 20 Oct 25 18:03 hdisk17

root@server:/dev $ lsattr -El hdisk17
PR_key_value    none                               Persistant Reserve Key Value            True
cache_method    fast_write                         Write Caching method                    False
ieee_volname    600A0B800042193E000009984CC4D6EB   IEEE Unique volume name                 False
lun_id          0x0010000000000000                 Logical Unit Number                     False
max_transfer    0x100000                           Maximum TRANSFER Size                   True
prefetch_mult   1                                  Multiple of blocks to prefetch on read  False
pvid            none                               Physical volume identifier              False
q_type          simple                             Queuing Type                            False
queue_depth     10                                 Queue Depth                             True
raid_level      1                                  RAID Level                              False
reassign_to     120                                Reassign Timeout value                  True
reserve_policy  single_path                        Reserve Policy                          True
rw_timeout      30                                 Read/Write Timeout value                True
scsi_id         0x10000                            SCSI ID                                 False
size            51200                              Size in Mbytes                          False
write_cache     yes                                Write Caching enabled                   False
The reserve_policy was set to single_path (the default). So whichever node the disk was made visible on later ended up holding an exclusive lock on it. To counter this, the disk's attributes must be changed so that it can be shared by all the nodes.
root@server:/dev $ lsattr -El hdisk17 | grep lun_id
lun_id          0x0010000000000000                 Logical Unit Number                     False

root@server:/dev $ chdev -l hdisk17 -a reserve_policy=no_reserve
hdisk17 changed

root@server:/dev $ lsattr -El hdisk17 | grep reserve_policy
reserve_policy  no_reserve                         Reserve Policy                          True

root@server:/dev $ lspv | grep hdisk17
hdisk17         none                               None
The same steps were performed on node 2 to change the reserve_policy of the disk to no_reserve and make it sharable.
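Where more than one disk has to be fixed on more than one node, the same change can be scripted along these lines (a minimal sketch, assuming the new disks are hdisk17 and hdisk18; run as root on each node, and note that chdev will fail if the disk is currently held open):

# disk names are illustrative; adjust for your environment
for d in hdisk17 hdisk18
do
    chdev -l $d -a reserve_policy=no_reserve     # allow the LUN to be opened from all nodes
    chown oracle:oinstall /dev/r$d               # ASM uses the raw character device
    chmod 660 /dev/r$d
    lsattr -El $d -a reserve_policy              # verify the change
done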
Once that was completed, I checked the output of gv$asm_disk :-
18:28:29 SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         2 /dev/rhdisk17             CLOSED  CANDIDATE
         1 /dev/rhdisk17             CLOSED  CANDIDATE
The ‘alter diskgroup add disk’ for rhdisk17 and rhdisk18 succeeded.
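For completeness, the statement was along these lines (reconstructed for illustration; ASMDB is the diskgroup name seen in the alert log above):

SQL> alter diskgroup ASMDB add disk '/dev/rhdisk17', '/dev/rhdisk18';

Diskgroup altered.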
The issue with ‘rhdisk15’ was resolved by Oracle Support using the ‘dd’ command.
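I do not have the exact steps Oracle Support followed, but the usual idea behind the 'dd' fix is to wipe the stale ASM header off the disk so that it shows up as CANDIDATE again and can be re-added. Purely as an illustration (never run this against a disk that still belongs to a mounted diskgroup):

$ dd if=/dev/zero of=/dev/rhdisk15 bs=1024k count=100     # overwrite the start of the disk, clearing the old ASM header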
References :-
1. Disks Added to the ASM From One Node Are Not Discovered on the Other Node [ID 400005.1]
2. http://askdba.org/weblog/2008/05/ora-15063-asm-discovered-insufficient-amount-of-disks-2/
3. http://oraclue.com/2009/03/10/kernel-files-editor-kfed-and-kfod-osm-discovery-utility/
4. http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/iphcg/lspv.htm
5 thoughts on “ORA-15042: ASM disk "N" is missing”
Hi!
I just wanted to say that these troubleshooting steps are still relevant for 11gR2. One more item you can check, to see if multi-pathing is set correctly:
‘oracleasm listdisks’
If your new disk is missing there, then have your sysadmin run the scandisks command. That is what fixed my ora-15042 error.
Thanks much!
Kevin
Hey Anand, excellent explanation…however I have seen this issue somewhere… 😉
Thanks buddy 😉
Regards,
Anand
Good One.
Thanks mate 🙂
Regards,
Anand