Oracle Grid Infrastructure: How to recover from a messed-up ASM/CSSD diskstring

Oracle Grid Infrastructure 11.2 with voting files and OCR in an ASM diskgroup can be a little tricky if you mess up the voting file voodoo. Do you know the basic situation?

With Oracle Grid Infrastructure, aka Oracle Clusterware, the cluster quorum (the voting files) and the configuration repository (OCR) can be stored in an ASM disk group. But CSSD needs the voting files before ASM is online: at startup, CSSD scans the headers of every device matched by the disk string (configured by you at initial ASM setup time). If it finds a majority of the configured voting files (at least two out of three in a typical setup), the party takes place. Otherwise, CSSD keeps cycling and writes the corresponding error messages to $GRID_HOME/log/<hostname>/cssd/ocssd.log on each loop.
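
To make the discovery step a bit more tangible, here is a toy model in Python. It is only a sketch under assumptions, not what CSSD really does: the device names and both disk strings are made up, pattern matching is done with Python's fnmatch rather than Oracle's own discovery logic, and real CSSD reads the device headers instead of comparing path names. The quorum check assumes the usual majority rule (more than half of the configured voting files must be found).

```python
from fnmatch import fnmatch


def discover(diskstring, device_paths):
    """Return the device paths matched by a comma-separated discovery string."""
    patterns = [p.strip() for p in diskstring.split(",") if p.strip()]
    return [d for d in device_paths if any(fnmatch(d, p) for p in patterns)]


def has_quorum(found_voting_files, configured_voting_files):
    """True if strictly more than half of the configured voting files were found."""
    return found_voting_files > configured_voting_files // 2


# Hypothetical setup: three voting files on three ASM disks, plus one unrelated device.
devices = ["/dev/asm-disk1", "/dev/asm-disk2", "/dev/asm-disk3", "/dev/sda1"]
voting_devices = {"/dev/asm-disk1", "/dev/asm-disk2", "/dev/asm-disk3"}

good = discover("/dev/asm-disk*", devices)  # a sane disk string
bad = discover("/dev/oracle/*", devices)    # an "insane" disk string matching nothing

print(has_quorum(len(voting_devices & set(good)), 3))
print(has_quorum(len(voting_devices & set(bad)), 3))
```

With the sane disk string all three voting files are visible and the check passes. With the broken one nothing is found, which corresponds to the cycling CSSD described here.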

This is where I found myself today: I changed the ASM diskstring to an insane value, and whoops, at the next reboot my Node1 restarted its CSSD every few minutes, forever, and Node2 was caught in a rock-solid reboot loop. Looking at the CSSD logfile, I saw that CSSD had trouble identifying its voting files. (In fact, multiple device names were pointing to the same physical device. Interestingly, CSSD therefore dropped both of them. But this is not the issue of this post.)

Now, tell me: how do you change the ASM disk string (the asm_diskstring parameter) back without ASM running, and with no CSSD available, which ASM needs in order to start? And how do we tell CSSD, which starts well ahead of ASM, to scan the right devices?

By Martin Klier in Cluster and RAC, Oracle

