Effecting Oracle Miracles With Standard Edition And Statspack (=without AWR)

Most of the time, I have to work with Oracle Standard Edition database systems, without any chance to use AWR, ASH or the Diagnostics Pack in general. But performance problems have to be analysed in budget environments as well, and many people complain about Oracle’s policy in this area. In my experience, though, it’s possible to solve most response time issues with SE, too. Maybe you have to go without a GUI, or know a little bit more than the average Click-And-Buy EE DBA. (pardon!) :)

From my position, I can’t change the first fact for you, but maybe the latter: here comes my personal Oracle SE grimoire. It’s not much, but it works like a charm.

Read more…



R.I.P. Oracle Database 10g: “Oracle depends on humidity”

Or: “Why we should be happy about Adaptive Cursor Sharing”
Premier Support for 10g R2 ended in July 2010, and Extended Support will end in July 2013. 11gR2 is widely used, and 12c is on the horizon. Maybe it’s time to write a kind of obituary for my first Oracle love, and to share a salutary anecdote Tom Kyte told me years ago.

 

The Story

There is a smallish company, with around one hundred people working there, but very profitable. Their business depends on a busy (near-) real time booking system, the kind that needs one of those smaller OLTP database systems most of you will know. As users usually are, they are not aware that this system exists at all.

One rainy summer Friday, all booking user masks were suddenly dead slow. They had no full time DBA, and the on-site application supporter didn’t find the cause, apart from his impression that it must be the database, what else? But it was a short Friday, and the users did not worry too much. They just worked off the old stuff you never finish during normal business. On Monday, the issue was forgotten.

Some weeks later, on the first Thursday in fall, the same issue occurred. Now it was the middle of the week, the business impact was more serious, and management winced. The application supporter was forced to look deeper, and now he was sure: the database! The next day, everything was fine. Having a so-called “one time issue”, they postponed a closer and expensive investigation to “next time, if it happens again at all”.

Two days later, they were nearly frozen again, and in the following weeks they experienced more and more bad days like that. The IT department didn’t find anything, so they started grasping at the usual straws of any kind: random rebooting, hardware pimping, and so on. The most useful measure they took was to collect information about everything unusual and write it down. The secretary who was told to do so doubted that it was useful, and as a joke, she started adding the day’s weather on top of the daily record. To her surprise, and this made her the hero of those days, she always had to start each bad day’s report with bad weather news: “rain”, “thunderstorm”, “snowing”. This was the point when IT started asking for reinforcements, and they asked me to come on-site on the next bad day.

I didn’t have to wait long: three days later I was called to come ASAP, and thus arrived at the company in heavy morning rain. The first thing they told me was that they had found out that afternoon rain does not impact the performance. Crazy folks, aren’t they?

Read more…



Speaking at Oracle OpenWorld 2012: “Resolving child cursor issues resulting in mutex waits”

On behalf of Klug GmbH integrierte Systeme and the IOUG, I will speak in the US for a second time this year: “Resolving child cursor issues resulting in mutex waits” at Oracle OpenWorld 2012 in San Francisco, Moscone Center.

It’s Session UGF10573 (User Group Forums)
Date and Time: Sunday, 9/30/12, 10:30 – 11:30
Venue / Room: Moscone West – 2016

Track:

DATABASE

Description:

In special situations, the Oracle Database generates too many child cursors for particular SQL-IDs. This results in high CPU load on the DB server, coming from heavy mutex access. This is visible as mutex wait events.

Read more…



Oracle: RHEL6 and Oracle Enterprise Linux 6 certified for Database 11.2.0.3

Hi Linux-DBAs,
Red Hat Enterprise Linux 6 and Oracle Enterprise Linux 6 are now certified for Oracle Database 11.2.0.3.

Please see this link:
https://support.oracle.com/CSP/main/article?cmd=show&id=1441282.1&type=NOT

Best regards
Martin



IOUG 2012 presentation: RESOLVING CHILD CURSOR ISSUES RESULTING IN MUTEX WAITS

Hi folks, thanks for attending my lesson!

Here come the paper and the presentation. If there are further questions, feel free to contact me; email and stuff are in the documents.

2012_893_Whitepaper (pdf)

2012_893_Presentation (pdf)

Regards
Martin



Upgrading Oracle Clusterware: [INS-40406] The installer detects no existing Oracle Grid Infrastructure software on the system

Upgrading Oracle Grid Infrastructure 11.2 (it’s just another marketing name for Oracle Clusterware, formerly known as Cluster Ready Services, CRS) is usually easy: runInstaller offers an upgrade mode, and discovers nodes and versions without further effort.

But what if it doesn’t? The message will be “[INS-40406] The installer detects no existing Oracle Grid Infrastructure software on the system”, and runInstaller simply refuses to continue.

Finally I found out that the culprit may be the
“$ORACLE_BASE/../oraInventory/ContentsXML/inventory.xml” file. The Clusterware home may not be flagged as CRS there (CRS="true" missing). A correctly flagged entry looks like this:

<HOME NAME="Ora11g_gridinfrahome2" LOC="/u01/app/11.2.0/grid"
                                 TYPE="O" IDX="6" CRS="true">
   <NODE_LIST>
      <NODE NAME="bs-klugdb1"/>
      <NODE NAME="bs-klugdb2"/>
   </NODE_LIST>

To repair it, the GI runInstaller has an -updateNodeList option, so the command in this case looked like this:

./runInstaller -updateNodeList \
              ORACLE_HOME="/u01/app/11.2.0/grid" CRS=true

Well, that’s it. Easy if you know, and worth 3 hours of nighttime research, if not.
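
Before calling runInstaller, it can save time to check which homes in the inventory carry the CRS flag at all. The following is only a minimal sketch in Python, not an Oracle tool: the function name is made up, the sample inventory is shortened and assumes the usual INVENTORY/HOME_LIST wrapper, and the database home entry (OraDb11g_home1 and its path) is purely hypothetical. Only the grid home entry is taken from the snippet above.

```python
import xml.etree.ElementTree as ET


def homes_with_crs_flag(inventory_xml):
    """Return (name, location, is_crs) for every HOME element in the inventory."""
    root = ET.fromstring(inventory_xml)
    result = []
    for home in root.iter("HOME"):
        # A home counts as Clusterware home only if CRS="true" is present.
        is_crs = home.get("CRS", "").lower() == "true"
        result.append((home.get("NAME"), home.get("LOC"), is_crs))
    return result


# Shortened, partly hypothetical sample; a real file has more elements.
SAMPLE = """<INVENTORY>
  <HOME_LIST>
    <HOME NAME="OraDb11g_home1" LOC="/u01/app/oracle/product/11.2.0/db_1" TYPE="O" IDX="1"/>
    <HOME NAME="Ora11g_gridinfrahome2" LOC="/u01/app/11.2.0/grid" TYPE="O" IDX="6" CRS="true">
      <NODE_LIST>
        <NODE NAME="bs-klugdb1"/>
        <NODE NAME="bs-klugdb2"/>
      </NODE_LIST>
    </HOME>
  </HOME_LIST>
</INVENTORY>"""

if __name__ == "__main__":
    for name, loc, is_crs in homes_with_crs_flag(SAMPLE):
        flag = "true" if is_crs else "missing"
        print(f"{name:25} {loc:40} CRS={flag}")
```

To use it on a real system, read the content of inventory.xml into a string and pass it to the function. If the grid home shows up with the flag missing, that matches the situation described above.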

Have a good uptime
Martin



Speaking at IOUG COLLABORATE 12: “Resolving child cursor issues resulting in mutex waits”

On April 22-26, 2012, the Independent Oracle Users Group (IOUG) is holding the COLLABORATE 12 forum at the Mandalay Bay Convention Center, Las Vegas (US-NV).

COLLABORATE is a widely known event in the Oracle community, and attracts lots of Oracle guys and girls from all over the world. I feel honored to contribute a presentation about how to detect and resolve child cursor issues resulting in mutex wait events. It will be Lesson #893, part of the “Oracle Internals & Performance Bootcamp”, which is maintained by Craig Shallahamer. Here comes the official excerpt from the IOUG session planner:

 

Thursday, April 26, 2012

#893 – “Resolving child cursor issues resulting in mutex waits”

Read more…



DOAG Conference 2011 – Impressions and Look-at’s

Once again this year, the German Oracle Users Group holds its annual conference and exhibition in Nuremberg (DOAG Konferenz und Ausstellung 2011, Nürnberg). Being there is nearly a must for Oracle guys in German-speaking countries.

As usual, here comes my unordered, incomplete and ad-hoc list of things I wrote down during, or picked up from, the random talks I attended, to have a closer look at in the next year.

Day 1 (Tue 15.11.2011)

  • AVG_ROW_LENGTH of a table vs. Blocksize
  • CHAIN_CNT
  • analyze table X validate structure cascade
  • Linux: Transcendent Memory
  • Linux: CleanCache and zcache
  • Linux: Cgroups
  • Linux: Transparent Huge Pages (wow!)
  • Linux: DTrace
  • Linux proprietary: Ksplice
  • View: registry$history for the REAL version number
  • Rolling-upgradable patches mean minimal downtime on single instance DBs (in combination with out-of-place upgrade)
  • Bug 10187168 in PSU 11.2.0.2.2 (_cursor_features_enabled=1026)
  • Character set conversions: CSSCAN and DMU utilities
  • AIX patch following note 1246995 (Memory Footprint)

Read more…



Oracle Clusterware 11.2: ASM crashes at startup

Recently, a customer’s Oracle Clusterware (2 nodes) crashed one ASM instance at every startup.

More Facts:

  • It was not possible to start it manually, either.
  • The CSSD was running.
  • For obvious reasons, CRSD did not start.
  • The other ASM instance in the cluster recognized CLUSTER RECONFIGURATION for a short period of time.

The ASM alert log file looked like this:

Sun Nov 13 13:44:08 2011
 MMNL started with pid=21, OS id=7783
 lmon registered with NM - instance number 2 (internal mem no 1)
 Sun Nov 13 13:46:05 2011
 System state dump requested by (instance=2, osid=7684 (PMON)),
         summary=[abnormal instance termination].
 System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_diag_7706.trc
 Sun Nov 13 13:46:05 2011
 PMON (ospid: 7684): terminating the instance due to error 481
 Dumping diagnostic data in directory=[cdmp_20111113134605], requested by (instance=2, osid=7684 (PMON)),
         summary=[abnormal instance termination].
 Instance terminated by PMON, pid = 7684

Strange problem. Checking device permissions, read/write tests, rebooting the cluster in a downtime window: nothing.

To make a long story short: The NTP daemon was running, but did not get its time synchronisation. Thus, CTSS was in observer mode, and the server clocks started drifting apart. Fixing NTP fixed the cluster.
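
A daemon that is running but not synchronised is easy to overlook, so here is a small sketch of how one might check for it. It relies on the fact that `ntpq -p` marks the peer currently selected for synchronisation with a leading `*`. The function name is an assumption of mine, and the two sample outputs are illustrative only (documentation addresses), not taken from the customer’s servers.

```python
def has_sync_peer(ntpq_output):
    """True if any peer line of `ntpq -p` output carries the '*' tally code."""
    # Header lines start with blanks or '=', so only a selected peer starts with '*'.
    return any(line.startswith("*") for line in ntpq_output.splitlines())


# Illustrative output of a synchronised daemon (made-up values).
SYNCED = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.512    0.123   0.045
+192.0.2.11      192.0.2.10       2 u   12   64  377    0.731   -0.210   0.060
"""

# Illustrative output of a daemon that runs but reaches no time source.
UNSYNCED = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.0.2.10      .INIT.          16 u    -   64    0    0.000    0.000   0.000
 192.0.2.11      .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""

if __name__ == "__main__":
    for label, text in (("synced", SYNCED), ("unsynced", UNSYNCED)):
        state = "OK" if has_sync_peer(text) else "NOT synchronised"
        print(f"{label}: {state}")
```

Feeding the real `ntpq -p` output of each cluster node into the function would show whether the NTP daemon on that node actually has a time source, and not only a process.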

Nota bene
Martin



Linux Network bonding – setup guide

After looking up Linux bonding stuff for the third time, I planned to write an article about it. But there are lots of good blog posts on this, so just click here at unixfoo:

Linux Network bonding – setup guide

(strange link, I know, but it works)

Hope it helps for your next high availability project, like Oracle RAC, Oracle Grid Infrastructure or Oracle Data Guard.

Take care
Martin