Oracle on SLES12 SP2 – Avoiding Cgroup Task Limit

Once upon a time, there was an Oracle RAC 12.2.0.1 cluster on SUSE Linux Enterprise Server (SLES) 12 SP2 that did not do well. Its database and ASM instances kept failing with:

ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3

Of course, in such a case you will check ulimit -u / limits.conf (nproc) and sysctl.conf (kernel.pid_max). But what if this does not help?

With SP2, SUSE introduced a new task limit (it counts processes and threads) enforced by the PID cgroup controller. Its default value is 512, which is obviously too small for any production RAC I have ever seen. If your ohasd cgroup exceeds it, you will get one single, tiny error message in dmesg, and many of the vague ORA-27300 “fork failed with status: 11” errors shown above. And did you know what “11” is? It is EAGAIN, which means “Try again”. :) Thank you. The dmesg line looks like this:

cgroup: fork rejected by pids controller in /system.slice/ohasd.service
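To check quickly whether a system is hitting this limit, the kernel log can be filtered for that message. The following is only a minimal sketch: the helper name rejected_cgroups is made up for illustration, and it assumes the message format is exactly the one shown above.

```shell
# Print the cgroup paths in which the pids controller rejected a fork.
# Reads kernel log text on stdin. Hypothetical helper, not part of any tool.
rejected_cgroups() {
    grep 'fork rejected by pids controller in ' \
        | sed 's/.*fork rejected by pids controller in //' \
        | sort -u
}
```

Typical use would be `dmesg | rejected_cgroups`. If ohasd.service shows up, `systemctl show ohasd.service -p TasksMax` should print the limit systemd currently applies to that unit.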

Next time you run Oracle on SLES12 SP2, you may want to check /etc/systemd/system.conf, too. It has a parameter that sets the default maximum number of tasks per unit (cgroup):

#DefaultTasksMax=512

Since it’s the default, it is commented out. In my case, I wanted 65535, so now I am happy with:

DefaultTasksMax=65535
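To confirm that the setting is really active and not still commented out, a small check like the one below can help. It is a sketch under assumptions: the function name active_default_tasks_max is hypothetical, and it only looks at the one file you pass in, not at any drop-in directories.

```shell
# Print the active (uncommented) DefaultTasksMax value from a systemd
# config file. Prints nothing if the setting is commented out or absent.
# If the setting appears more than once, the last occurrence is printed.
active_default_tasks_max() {
    sed -n 's/^[[:space:]]*DefaultTasksMax[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}
```

For example, `active_default_tasks_max /etc/systemd/system.conf` should print 65535 after the change above. systemd has to re-read the file before the new default applies; a reboot is the safe way, and `systemctl daemon-reexec` followed by a restart of the Oracle stack should also work.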

I hope you come across this blog post before spending too many hours on this pointless issue, as I did. :)
Regards
Martin Klier

PS: That’s the link that says “Yes we changed a detail in our system with SP2”:
https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12-SP2/#fate-320358

PS2: Thank you to Stefan Köhler and many other idea-dispensers who helped to narrow down the issue!





One Response to “Oracle on SLES12 SP2 – Avoiding Cgroup Task Limit”

  1. Rafael Says:

    Hello Martin,

    I came across your blog after being stuck for a while following an SLE 12.2 upgrade, watching my Oracle RAC processes die immediately after startup. Your article saved me a lot of time and nerves. Thank you, great post!!!

    Greetings,
    Rafael
