Martin Klier

usn-it.de

Oracle Grid Infrastructure / ASM – ERROR: failed to update diskgroup resource ora.DATA.dg

Recently, while I was creating a database on Grid Infrastructure, dbca failed because it could not establish the dependency between the database and the Grid Infrastructure resource of DATA, the ASM disk group I was using for my data files. That resource would have been named “ora.DATA.dg”. (ERROR: failed to establish dependency between database MYDB and diskgroup resource ora.DATA.dg)

The Problem

Running

crsctl stat res -t

showed me that there was no “ora.DATA.dg” resource, so no wonder. But a SELECT on v$asm_diskgroup confirmed that the disk group WAS there, and it was even mounted!
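The first of these two checks can be scripted. Below is a minimal sketch, assuming the resource name sits on a line of its own in the "crsctl stat res -t" output (as in the listing further down); has_dg_resource is a hypothetical helper name, not an Oracle tool. The ASM side can be checked separately in sqlplus with a query on the NAME and STATE columns of v$asm_diskgroup.

```shell
#!/bin/sh
# Sketch: report whether the CRS resource of an ASM disk group is listed.
# has_dg_resource is a hypothetical helper; it reads the output of
# "crsctl stat res -t" on stdin, so it also works on saved output.
has_dg_resource() {
    dg="$1"
    # The resource name is expected as the only field on its line,
    # e.g. "ora.DATA.dg". awk exits 0 if such a line exists, 1 otherwise.
    awk -v name="ora.${dg}.dg" \
        'NF == 1 && $1 == name { found = 1 } END { exit !found }'
}

# Usage on a cluster node (not executed here):
#   crsctl stat res -t | has_dg_resource DATA && echo present || echo missing
```

Reading from stdin keeps the helper independent of the environment, so it can be tried on a pasted listing before running it against a live cluster.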

Exploring possibilities

Okay, whatever might have gone wrong when the disk group was created, let’s just create the resource and off we go. But Oracle refused to cooperate. For a disk group, SRVCTL only offers operations such as START, STOP, STATUS, ENABLE, DISABLE and REMOVE. I could not find anything in the official documentation that states definitively how a resource for a disk group is created. All sources pointed to something automagic.

Testing with GI and ASM on my test cluster made it clear that the resource ora.DATA.dg is added to the resource list when the disk group DATA is mounted by ASM for the very first time. (Some folks on the net say this happens when the RDBMS uses the DG for the first time. They are mistaken, probably misled by the fact that the DB access triggers ASM to mount the disk group.) But knowing this did not help at first, because on my new production box this DID NOT happen, and nobody seemed to know why.

This was my manual mount command:

(+ASM1)$ sqlplus / as sysasm
<...>
SQL> alter diskgroup DATA mount;
Diskgroup altered.
SQL>

No error, but as expected, no such disk group resource in crsctl. Now let’s look into the alert log of my first ASM instance (alert_+ASM1.log):

(+ASM1)$ tail -1000f $ORACLE_BASE/diag/+asm/+ASM1/trace/alert_+ASM1.log
<...>
NOTE: diskgroup resource ora.DATA.dg is online
ERROR: failed to update diskgroup resource ora.DATA.dg

Ah, here we go. But no trace file shed any light on this; in fact, a grep for ora.DATA.dg in my trace directory returned nothing. Still, this error message leaves two options open: a) ASM tried to switch the state of an existing resource and failed because the resource is not there, or b) ASM tried to create the resource and failed for an unknown reason. Option a) leads in a circle, so what about b)?
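To spot this symptom without scrolling through a long tail, the alert log can be searched for the error text directly. This is a small sketch; find_dg_update_errors is a hypothetical helper name, and the message text is taken verbatim from the log excerpt above.

```shell
#!/bin/sh
# Sketch: print every alert-log line that reports a failed disk group
# resource update, prefixed with its line number.
# find_dg_update_errors is a hypothetical helper name.
find_dg_update_errors() {
    logfile="$1"
    # -F treats the pattern as a fixed string, -n adds line numbers.
    grep -nF 'ERROR: failed to update diskgroup resource' "$logfile"
}

# Usage with the path from this post (not executed here):
#   find_dg_update_errors "$ORACLE_BASE/diag/+asm/+ASM1/trace/alert_+ASM1.log"
```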

A Theory

Facts:

  • All ASM resources depend on ASM, ASM depends on a listener, and a listener depends on the network. Oracle Grid Infrastructure usually checks for possibly failing dependencies BEFORE doing anything, to avoid causing needless outages.
  • All ASM processes are running as the OS user oracle (in my setup).
  • My manually created admin VIP “admin.vip1.res” was created by root (to bring along enough permissions for ifconfig) and oracle has no rights there (not even READ).
    (+ASM1)$ crsctl getperm resource admin.vip1.res
    Name: admin.vip1.res
    owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x

Maybe GI tries to look up something and can’t…?
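To make the ACL fact above checkable, here is a minimal sketch that extracts the permission string of an explicit user entry from an ACL line in the format shown by crsctl getperm. acl_user_perm is a hypothetical helper name; it deliberately looks only at "user:" entries and ignores owner, pgrp and other.

```shell
#!/bin/sh
# Sketch: print the permissions of an explicit "user:<name>:<perm>" entry
# in an ACL line such as
#   owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x
# Prints nothing if the user has no explicit entry.
# acl_user_perm is a hypothetical helper name.
acl_user_perm() {
    user="$1"
    acl="$2"
    # One ACL entry per line, then split each entry on ":".
    printf '%s\n' "$acl" | tr ',' '\n' | while IFS=: read -r kind name perm; do
        if [ "$kind" = "user" ] && [ "$name" = "$user" ]; then
            printf '%s\n' "$perm"
        fi
    done
}

# Usage, assuming the ACL is the last line of the getperm output
# (as in the listing above; not executed here):
#   acl_user_perm oracle "$(crsctl getperm resource admin.vip1.res | tail -n 1)"
```

With the ACL shown above, the helper prints nothing for oracle, which matches the observation that oracle has no entry of its own on the VIP.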

Check

So let’s check something. Give oracle the right to fuss around with my handmade VIP:

[root@db_node1 ~]$ (+ASM1) crsctl setperm resource admin.vip1.res -u user:oracle:r-x
[root@db_node1 ~]$ (+ASM1) crsctl getperm resource admin.vip1.res
Name: admin.vip1.res
owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x,user:oracle:r-x

Dismount/mount the disk group: same problem. 🙁 But at the last second, before dismissing this direction: why not restart the VIP before trying again?

[root@db_node1 ~]$ (+ASM1) crsctl stop res admin.vip1.res
CRS-2673: Attempting to stop 'admin.vip1.res' on 'db_node1'
CRS-2677: Stop of 'admin.vip1.res' on 'db_node1' succeeded

[root@db_node1 ~]$ (+ASM1) crsctl start res admin.vip1.res
CRS-2672: Attempting to start 'admin.vip1.res' on 'db_node1'
CRS-2676: Start of 'admin.vip1.res' on 'db_node1' succeeded

And retry:

(+ASM1)$ sqlplus / as sysasm
<...>
SQL> alter diskgroup DATA dismount;
Diskgroup altered.
SQL>

Still no resource in the list.

SQL> alter diskgroup DATA mount;
Diskgroup altered.
SQL>

Check again!

[oracle@db_node1 ~]$ (+ASM1) crsctl stat res -t
-------------------------------------------------
NAME           TARGET  STATE        SERVER       
-------------------------------------------------
Local Resources
-------------------------------------------------
<...>
ora.DATA.dg
               ONLINE  ONLINE       db_node1
               OFFLINE OFFLINE      db_node2
<...>

YAY! OK, it’s still offline on node 2, but the mount was a local ASM command, so +ASM2 had no way of knowing about it.

Result

Give your admin-made VIPs at least the ACL rights “-u user:oracle:r-x” (I tried “r--” as well, but it’s not enough) so that ASM can hook its resources into Grid Infrastructure. Definitely not documented, and thus maybe nice to know.
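As a recap, the sequence that worked for me can be written down as a checklist. The sketch below only prints the commands and does not run anything; print_fix_steps is a hypothetical helper name, and the resource, disk group and OS user names are parameters. On my system the crsctl lines were run as root and the two ALTER DISKGROUP statements in sqlplus / as sysasm on the local ASM instance.

```shell
#!/bin/sh
# Sketch: print (not execute) the steps from this post, in order.
# print_fix_steps is a hypothetical helper name.
#   $1 = VIP resource name, $2 = disk group name, $3 = OS user running ASM
print_fix_steps() {
    vip="$1"
    dg="$2"
    osuser="$3"
    # 1. Grant the ASM OS user read and execute on the handmade VIP.
    printf 'crsctl setperm resource %s -u user:%s:r-x\n' "$vip" "$osuser"
    # 2. Restart the VIP; the new ACL alone was not enough in my case.
    printf 'crsctl stop res %s\n' "$vip"
    printf 'crsctl start res %s\n' "$vip"
    # 3. Remount the disk group so ASM registers its resource.
    printf 'alter diskgroup %s dismount;\n' "$dg"
    printf 'alter diskgroup %s mount;\n' "$dg"
}

# Usage with the names from this post:
#   print_fix_steps admin.vip1.res DATA oracle
```

Printing instead of executing is a deliberate choice here: dismounting a disk group is nothing to do blindly, so review each line before pasting it.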

Carry on
Martin

 
