Once again, there was a lesson to be learned about the Oracle RDBMS. The occaison was a direct upgrade from Enterprise Edition 10gR2 to EE 11gR2, the application was kept untouched for good reason.
Problem
After running the new version for two hours, everything became incredibly slow, and the CPU load on the DB server was hitting 100%. Research in the Oracle Wait Interface quickly showed me the wait event: “Cursor: mutex S”. The mutex S is a serialization mutex for the cursor cache. It is involved, as soon as two sessions try to (hard/soft) parse the same statement in sense of SQL_IDs.
A quick look into the cursor cache with v$sql immediately showed an UPDATE statement that has more than 1000 child cursors for the same SQL_ID, it was very likely to have found the culprit, because creating new child cursors means expensive hard parsing, and searching the list of existing children means CPU load as well.
During the next hours, it turned out that on 11gR2 for this machine and constellation off peak hours (thus with low DB load), the critical mass was around 1500 cursors. Beyond this point, the server became slow as described above: The mutex S ate up all CPU power available. In peak time, with lots of DB activity and high basic load, the critical number of child cursors for this UPDATE was around 300, so about 1/5 of “peacetime”. With a growth of 200 new child cursors per minute, expect no fun!
So what can we do about it?
Read more…