Martin Klier

usn-it.de

Oracle Grid Infrastructure: UDEV fixing ORA-15081: failed to submit an I/O operation to a disk

If you plan to install Oracle products, be prepared for surprises of a random kind. 🙂 The new Grid Infrastructure is no exception to this rule: I tried to install it on SUSE Linux Enterprise Server (SLES) 10.

Quick facts about the setup:

  • SLES 10 SP 2 64bit
  • Oracle Grid Infrastructure 11.2.0.1 x86_64
  • all devices are specified via UDEV symlinks like /dev/disk/by-id/scsi-1HITACHI_732342670003

Problem description

Nearly everything went well, but $GRID_HOME/root.sh failed this way:

ORA-27091: unable to queue I/O
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at line 4
PROT-1: Failed to initialize ocrconfig

(By the way, see “Oracle Grid Infrastructure 11gR2: How to clean up to repeat root.sh” for how to repeat root.sh.)

This tastes like catastrophe: why would a full-fledged SAN-provided LUN have I/O errors? A cross-check with “dd” calmed me down immediately: it showed no problems, reading and writing ran at full speed, with no stalls and no errors. Calmly searching My Oracle Support (formerly Metalink) brought me to Article ID 955550.1, which talks about ASMLIB. I do not use ASMLIB because I don’t like it, but they solved their problem by changing the owner of the device used for OCR. The trouble described there is that Oracle Clusterware does not accept the OS user “oracle” being merely a member of the Unix group “disk”, even though that group has rw permissions on the device. No, user “oracle” has to be the owner of the device!
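The difference between group access and ownership is easy to check from the shell. The following is a minimal sketch, assuming GNU coreutils “stat” (for the -c format option). The function names owner_uid and owned_by are made up for this illustration, and the user name is whatever account your Grid installation runs as.

```shell
# Print the numeric UID that owns a path, following symlinks (-L).
# Assumes GNU coreutils stat.
owner_uid() {
    stat -L -c '%u' "$1"
}

# Succeed only if user $2 is the OWNER of path $1.
# Group membership with rw permission is deliberately not enough here.
owned_by() {
    uid=$(id -u "$2" 2>/dev/null) || return 1
    [ "$(owner_uid "$1")" = "$uid" ]
}

# Example (hypothetical call, run on the cluster node):
#   owned_by /dev/disk/by-id/scsi-1HITACHI_732342670003 oracle && echo "owner ok"
```

Because of -L, the check looks at the device node behind a symlink, not at the symlink itself.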

The way to a solution

I guessed that might be the problem here as well, and indeed, changing the owner of my device “/dev/sdb” (the target of symlink “/dev/disk/by-id/scsi-1HITACHI_732342670003”) to oracle:disk worked like a charm. But there’s always a “but”: my ASM disk for the OCR volume was, as stated above, specified via symlink to avoid device mikado (= pick-up sticks) 🙂 problems. So a new problem arose: how do I tell Linux’s UDEV to change the owner of a single device node for a specific SAN LUN?
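To see which device node a persistent symlink currently points to, something like the following sketch can help. It assumes GNU coreutils “readlink”; the function name resolve_device is hypothetical.

```shell
# Print the real device node behind a persistent symlink.
# readlink -f (GNU coreutils) follows the whole symlink chain.
resolve_device() {
    readlink -f "$1"
}

# Example (hypothetical call, as root):
#   chown oracle:disk "$(resolve_device /dev/disk/by-id/scsi-1HITACHI_732342670003)"
```

A manual chown like the one in the comment is only good as a quick test: udev recreates the device nodes at boot, so the ownership change does not survive a reboot.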

It took some time to research, but the solution is really simple and thus easy to implement: write a udev rule that references ID_SERIAL. The command

udevinfo -q all -p /sys/block/sdb

prints all the metadata available for the device in question. This is a sample output:

P: /block/sdb
N: sdb
S: disk/by-id/scsi-1HITACHI_732342670003
S: disk/by-path/pci-0000:08:01.0-fc-0x50060e801042ad70:0x0001000000000000
S: disk/by-id/ASMSYS
E: ID_VENDOR=HITACHI
E: ID_MODEL=DF600F
E: ID_REVISION=0000
E: ID_SERIAL=1HITACHI_732342670003
E: ID_TYPE=disk
E: ID_BUS=scsi
E: ID_PATH=pci-0000:08:01.0-fc-0x50060e801042ad70:0x0001000000000000

As you can see, all I need to distinguish between LUNs is ID_SERIAL. The ID_* values are available as environment variables while rules are processed, and a udev rule can have a match key like ENV{ID_SERIAL}=="something", so I just adapted a rule from /etc/udev/rules.d/60-persistent-storage.rules. As a goodie, it’s really easy to create custom symlinks, for example to beautify your ASM_DISKSTRING parameter!
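If you have many LUNs, picking ID_SERIAL out of the output by eye gets tedious. Here is a small sketch that filters one key from “udevinfo -q all” style output; the function name udev_env_value is made up for this example.

```shell
# Read "udevinfo -q all" style output on stdin and print the value of
# the environment key named in $1 (lines look like "E: KEY=value").
udev_env_value() {
    sed -n "s/^E: $1=//p"
}

# Example (hypothetical call on a system that ships udevinfo):
#   udevinfo -q all -p /sys/block/sdb | udev_env_value ID_SERIAL
```

The trailing “=” in the pattern keeps a key like ID_SERIAL from also matching longer key names that merely start with the same text.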

Finish run

My rule now looks like:
KERNEL=="sd*[!0-9]", \
ENV{ID_SERIAL}=="1HITACHI_732342670003", \
SYMLINK+="disk/by-id/ASMSYS", \
OWNER="oracle"

(If you are using my example, keep in mind that a udev rule has to be on a single line; no line breaks are allowed!)
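If you prefer to keep a readable multi-line draft of a rule, it can be flattened before it goes into the rules file. This is only a sketch using sed and tr; the function name join_rule is hypothetical.

```shell
# Read a rule written with trailing backslashes on stdin and print it
# as one single line, as required for a rules file.
join_rule() {
    # 1) drop the trailing backslash (and spaces before it) on each line
    # 2) replace every newline with a space
    # 3) trim the space left over from the final newline
    sed -e 's/[[:space:]]*\\$//' | tr '\n' ' ' | sed -e 's/[[:space:]]*$//'
}

# Example (hypothetical file name):
#   join_rule < my-rule-draft.txt
```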

As a last step, I created a new rules file, “99-my-storage.rules”, in the rules.d directory, which hopefully will not be overwritten by updates.
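With more than one ASM disk, the rule lines can be generated instead of typed. The sketch below builds the same kind of one-line rule from three parameters. The function name make_asm_rule is an invention for this example, and the serial and symlink name in the usage comment are simply the ones from the sample output above.

```shell
# Print a one-line udev rule that matches a whole disk by ID_SERIAL,
# adds a symlink and sets the owner.
#   $1 = ID_SERIAL value, $2 = symlink path below /dev, $3 = owner
make_asm_rule() {
    printf 'KERNEL=="sd*[!0-9]", ENV{ID_SERIAL}=="%s", SYMLINK+="%s", OWNER="%s"\n' "$1" "$2" "$3"
}

# Example (hypothetical call, as root):
#   make_asm_rule 1HITACHI_732342670003 disk/by-id/ASMSYS oracle \
#       >> /etc/udev/rules.d/99-my-storage.rules
```

How udev picks up a changed rules file depends on the udev version, so that step is left out here on purpose.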

Now all is well, and I think the solution is storage-admin-proof, reboot-proof and foolproof. Thanks to Mane for the udev basics!

Just as always, be careful!
Martin Klier

Here are some useful links:

Oracle Grid Infrastructure 11gR2: How to clean up to repeat root.sh
Oracle IMPDP, ORA-1555 and UNDO_RETENTION
