Slides for COLLABORATE 2014 IOUG forum talk #C14LV

Hi folks,

here are my slides for “YOUR machine and MY databases – a performing relationship?!” at COLLABORATE 2014 in Las Vegas. Any feedback is greatly appreciated!


2014_141_Klier_odp_v1

2014_141_Klier_v1_doc

Some pictures to come soon!

You may want to follow the #C14LV hashtag to see what’s going on. :)

 



SQL Server Analysis Look At’s

At the end of March, I had a chance to dig a bit deeper into MS SQL Server analysis. The German chapter of PASS (http://www.sqlpass.org) organized one of their “Essentials”, a condensed training day on a specific topic. This one was named “SQL Server Analysis” and was held by Andreas Wolter (MCSM) in Nuremberg. Here are my “look-at’s” – the usual public scratchpad of what I want to review as a follow-up to an event.

  • There are two SQL Server related IO benchmark tools (like ORION that I’m familiar with, by Oracle): SQLIOSIM and SQLIO.
  • Event Tracing for Windows (ETW) is supported for MSSQL since version 2008; use Windows Performance Recorder for that.
  • Try “perfmon /report”
  • Windows Server 2008 R2 and above support “mountpoints” in addition to drive letters. Nice, finally aligning with POSIX?
  • Profiler can load PERFMON data and correlate them (old style / deprecated)
  • SQL Server reports: Disk Usage Report shows, for example, resizing operations. So check your file sizing policy this way.
  • Extended Events (XEvents) take 2µs, Profiler events take 4ms (=> Profiler is factor 2000 slower; source: SQL Server & BI blog)
  • SQL Server index growth is different from Oracle: They have no 90/10 split on the growth end of the leaf list, SQL Server adds empty leaf nodes
  • The Extended Events editor can be found under “Administration” in SQL Server Management Studio (SSMS)
  • Setting up a Performance Data Warehouse brings historical performance data. Do the following:
    1. Create a Data Collection Set
    2. Create a Performance DWH
    3. Create a Data Collector
  • Look at tools:
    1. Event Notifications
    2. SQL Diag (Log/Trace collector)
    3. RML Utilities

For sure, this is not everything – the more you dig into an RDBMS, the more surprises and fields of knowledge-to-learn you will find. Let’s go ahead, next time.

Good luck
Martin Klier

Edit: Version supporting ETW



DOAG Würzburg: “Resolving child cursor issues resulting in mutex waits”

I feel honored that DOAG asked me once more to speak at an event – this time it’s DOAG Regionalgruppe Würzburg. I will give my (updated) C12LV and Oracle Open World 2012 talk “Resolving child cursor issues resulting in mutex waits” there, supported by my new business Performing Databases GmbH.

My abstract:
In special situations, the Oracle Database generates too many child cursors for particular SQL-IDs. This results in high CPU load on the DB server, coming from heavy mutex access, and shows up as mutex wait events. The lecture will show how this situation arises, how the DBA can try to quick-fix it and how long-term solutions can be found. Additionally, we will take a closer look at the Oracle internals: Why does the DB use mutexes here, and how?

When?
Thursday (Donnerstag) 27.03.2014
5 p.m. (17 Uhr)

Where?
Würzburger Versicherungs AG
Bahnhofstraße 11
97070 Würzburg
(Map)

The presentation will be available after the talk at www.performing-databases.com and on the DOAG online-resource page.



Oracle Dataguard: ORA-00600 [krsu_upi_atc.7] – crash when primary ships the first time

After setting up a new Oracle Dataguard system (primary plus one standby DB), everything looked promising.

But after activating the log shipping from primary, and after archiving a redo log for the first time, the primary instance crashed with ORA-00600 [krsu_upi_atc.7]. Without the standby system available (DB idle or listener off), no error occurred.

******************************************************************
 LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
 ******************************************************************
 LGWR: Archival destination is a Primary RAC instance: 'ITWMT2'
 Errors in file /u01/app/oracle/diag/rdbms/itwmt/ITWMT/trace/ITWMT_lgwr_22151.trc  (incident=18089):
 ORA-00600: internal error code, arguments: [krsu_upi_atc.7], [], [], [], [], [], [], [], [], [], [], []
 Incident details in: /u01/app/oracle/diag/rdbms/itwmt/ITWMT/incident/incdir_18089/ITWMT_lgwr_22151_i18089.trc
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.
 Errors in file /u01/app/oracle/diag/rdbms/itwmt/ITWMT/trace/ITWMT_lgwr_22151.trc:
 ORA-00600: internal error code, arguments: [krsu_upi_atc.7], [], [], [], [], [], [], [], [], [], [], []
 LGWR (ospid: 22151): terminating the instance due to error 470
 Wed Mar 19 15:27:07 2014
 ORA-1092 : opitsk aborting process
 Wed Mar 19 15:27:07 2014
 System state dump requested by (instance=1, osid=22151 (LGWR)), summary=[abnormal instance termination].
 System State dumped to trace file /u01/app/oracle/diag/rdbms/itwmt/ITWMT/trace/ITWMT_diag_22141_20140319152707.trc
 Dumping diagnostic data in directory=[cdmp_20140319152707], requested by (instance=1, osid=22151 (LGWR)), summary=[abnormal instance termination].
 Instance terminated by LGWR, pid = 22151

“Instance terminated by LGWR” did not look promising. Plus, there were no search-engine-of-choice hits and no MOS search results. But re-reading the configuration revealed a very basic mistake: The DB_UNIQUE_NAME of the two databases (primary and standby) was the SAME – not exactly the purpose of a UNIQUE name… I changed it on the standby side, and off it went.
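
A check like this could have caught the mistake before log shipping was enabled. The following is only a minimal sketch: it assumes the parameters of both databases have been exported to text pfiles (for example with CREATE PFILE FROM SPFILE), it compares the names case-insensitively, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare DB_UNIQUE_NAME in two text pfiles.
# Assumes lines of the form  *.db_unique_name='NAME'  (helper names are hypothetical).

# Print the db_unique_name value from a text pfile, upper-cased.
# The pattern is anchored so that "db_unique_name=..." inside a
# log_archive_dest_n value is not picked up by accident.
unique_name_of() {
    grep -i '^[^=]*db_unique_name *=' "$1" \
        | head -n 1 \
        | sed -e "s/^[^=]*=//" -e "s/['\" ]//g" \
        | tr '[:lower:]' '[:upper:]'
}

# Return 0 if the two pfiles carry different unique names, 1 otherwise.
check_unique_names() {
    primary_name=$(unique_name_of "$1")
    standby_name=$(unique_name_of "$2")
    if [ -z "$primary_name" ] || [ -z "$standby_name" ]; then
        echo "db_unique_name missing in one of the files"
        return 1
    fi
    if [ "$primary_name" = "$standby_name" ]; then
        echo "CONFLICT: both databases use DB_UNIQUE_NAME=$primary_name"
        return 1
    fi
    echo "OK: $primary_name vs. $standby_name"
    return 0
}
```

Usage would be something like check_unique_names /tmp/primary.ora /tmp/standby.ora, with both paths being placeholders.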

Let me tell you, read carefully.
Martin Klier



DOAG Database Conference Düsseldorf: “YOUR machine and MY database” was accepted

DOAG just informed me that my talk, “YOUR machine and MY database – a performing relationship!?” was accepted for DOAG Database Conference Düsseldorf. Last year, the first Database Conference at Düsseldorf was a great success, and I enjoyed it very much. I’m glad to be part of the speakers’ corps this year.

Hope to see you there!

June 3rd, 2014
Van der Valk Airport Hotel
Düsseldorf

Event: http://www.doag.org/de/events/konferenzen/doag-2014-datenbank.html

 

 




Klug GmbH integrierte Systeme wins Oracle Excellence Award Germany 2013 category “ISV”

Klug wins Oracle Excellence Award

I’m proud to announce that my employer, Klug GmbH integrierte Systeme, Teunz (Germany), won the Oracle Excellence Award Germany 2013 in the category of Independent Software Vendors (ISVs).

Read more…



Oracle Clusterware root.sh issue: USM driver install actions failed (oracleoks.ko)

As I already said in my last post about “Can’t install ohasd service“, setting up Oracle Clusterware 11.2.0.4 on SuSE Linux Enterprise Server (SLES) SP2 should work flawlessly, but sometimes it does not. :) This time, it was about the USM drivers.

USM driver install actions failed
/u01/app/grid/11.2.0/perl/bin/perl -I/u01/app/grid/11.2.0/perl/lib 
-I/u01/app/grid/11.2.0/crs/install
/u01/app/grid/11.2.0/crs/install/rootcrs.pl execution failed

USM drivers are components (kernel object files, extension .ko) enabling ACFS – I don’t use it on this system, but root.sh (in fact, rootcrs.pl) needs a decent directory structure related to the Linux kernel version. Again, the log file “$GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<hostname>.log” was my friend: It revealed that the problem was somehow related to loading oracleoks.ko. This file lives in the directory “$GRID_HOME/install/usm/Novell/SLES11/x86_64/<your-kernel-version>/default/bin”. The trouble is that good old SLES 11 SP2 has a kernel that was not foreseen by the Oracle folks implementing this piece of software.
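
To see quickly whether the Grid home ships drivers for the running kernel, a check along these lines may help. It is only a sketch: the directory layout is taken from the path quoted above, the function name is hypothetical, and stripping the “-default” flavor suffix from the output of uname -r is my assumption about how the version directories are named.

```shell
#!/bin/sh
# Sketch: does the Grid home ship a USM driver directory for this kernel?
# Layout follows the path quoted in the text; stripping the "-default"
# flavor suffix from the kernel release is an assumption.

# usm_driver_dir BASE KERNEL_RELEASE
# BASE is e.g. $GRID_HOME/install/usm/Novell/SLES11/x86_64
usm_driver_dir() {
    base=$1
    version=${2%-default}              # X.Y.Z-N-default -> X.Y.Z-N
    dir="$base/$version/default/bin"
    if [ -f "$dir/oracleoks.ko" ]; then
        echo "found: $dir"
        return 0
    fi
    echo "missing: $dir"
    echo "shipped kernel versions:"
    ls "$base" 2>/dev/null             # show what the installer does provide
    return 1
}
```

A call would look like usm_driver_dir "$GRID_HOME/install/usm/Novell/SLES11/x86_64" "$(uname -r)".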

Read more…



Oracle Clusterware root.sh fails: Can’t install ohasd service: Inappropriate ioctl for device crsconfig_lib.pm line 5427

Setting up Oracle Clusterware 11.2.0.4 on SuSE Linux Enterprise Server (SLES) SP2 should work flawlessly, but sometimes it does not. :) It turned out that this would become a pair of blog entries. The second one is about “USM driver install actions failed (oracleoks.ko)“. But step by step. On Saturday morning, root.sh failed with the following error:

Failed to install ohasd startup script, error: Can’t install ohasd service: Inappropriate IOCTL (I/O-Control) for device

Can’t install ohasd service: Inappropriate IOCTL (I/O-Control) for device at /u01/app/grid/11.2.0/crs/install/crsconfig_lib.pm line 5427.

/u01/app/grid/11.2.0/perl/bin/perl -I/u01/app/grid/11.2.0/perl/lib -I/u01/app/grid/11.2.0/crs/install /u01/app/grid/11.2.0/crs/install/rootcrs.pl execution failed

There are several “My-Oracle-Support” (MOS) entries (bug notes and documents) for root.sh failing in crsconfig_lib.pm, but not for line 5427 – and the line really matters! This script does a lot, and usually different things in different lines. :)

Whenever dealing with root.sh malfunctions, the rootcrs logfile ($GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<hostname>.log) is your best friend. It appears in a not-too-verbose style, and if rootcrs.pl invokes OS- or third party commands, it quotes those outputs in a useful way – Bravo Zulu for the Oracle scripters here.

In my particular case, the problem was related to the Linux insserv command, which is used to integrate ohasd into the SysV startup script structure. My IBM Storage Manager Agent (service SMagent) and Oracle’s Trace File Analyzer (service init.tfa) had a dependency loop (dumbass SMagent depends on $all, /*NO COMMENT*/). In my case, I happily removed the $all dependency, and off it went.
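
If you suspect the same kind of loop, listing the init scripts that declare $all is a cheap first step. This is a sketch only: it assumes the dependency is declared in the LSB header (Required-Start or Should-Start) of the scripts in /etc/init.d, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list init scripts whose LSB header declares a dependency on $all.
# The directory defaults to /etc/init.d; the function name is hypothetical.

scripts_requiring_all() {
    init_dir=${1:-/etc/init.d}
    # LSB headers look like:  # Required-Start: $network $all
    grep -l -E '^#[[:space:]]*(Required|Should)-Start:.*\$all([[:space:]]|$)' \
        "$init_dir"/* 2>/dev/null
}
```

Running scripts_requiring_all without arguments prints the matching file names, one per line. Whether removing $all from a listed script is safe depends on the service, so treat the output as a list of candidates.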

Good luck with your GI
Martin



Oracle on AIX: How to find out the process memory usage

Calculating memory usage on Unix is tricky business, especially when complex software like Oracle Database has shared memory segments like the SGA and the Code Area.

One might be tempted to use the following construction to calculate the overall memory footprint of Oracle processes running on this machine:

ps -elf | egrep " oracle* | ora_.*_* " | grep -v egrep \
| awk '{sum += $10} END {print sum/1024/1024}'

But that’s bad, since the sum is based on the SZ column of the “ps -elf” command. Unfortunately, SZ displays the full core image, but most of it is shared (remember the Oracle Code Area from the architecture diagram). So we greatly overestimate the memory footprint this way.

aix-memory-calculation

When you use “ps v” for a given PID, you get more detail: SIZE is the non-shared data portion, TSIZ the shared text component of the image. Together, they roughly add up to SZ.
(All units are KB.)

I tried to find a solution. This is the original, overestimated version:

# ps -elf | egrep " oracle* | ora_.*_* " | grep -v egrep \
| awk '{sum += $10} END {print sum/1024/1024}'
19.0745
(GB)

This one extracts the PIDs from “ps -ef”, executes “ps v” for each and adds up the results. The greps might be a bit ugly, but it works for Oracle. :)

# for X in $(ps -ef | egrep " oracle* | ora_.*_*  " | grep -v egrep | awk '{print $2}'); \
do ps v $X | grep ora | awk '{print $6}'; done \
| awk '{sizesum += $1} END {print sizesum/1024/1024}'
1.57206
(GB)
1.57206
(GB)

I ran both commands on the same prod database system within the same second, so the difference should be realistic.
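
The summing step can also be done in a single awk pass, without the “grep ora” filter. This is a sketch under assumptions: it expects “ps v” output in the AIX column order (PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ …) on stdin, with values in KB, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: total the non-shared SIZE column of AIX "ps v" output read from stdin.
# Assumes SIZE is column 6 and is given in KB; the function name is hypothetical.

sum_ps_v() {
    awk '
        $1 ~ /^[0-9]+$/ {      # skip header lines, keep rows that start with a PID
            size += $6         # SIZE: non-shared data portion in KB
            procs++
        }
        END {
            printf "%d processes, %.2f GB private\n", procs, size/1024/1024
        }'
}
```

To use it, pipe the loop from above into it: for X in $(…); do ps v $X; done | sum_ps_v, keeping the PID selection as it is.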

Stay safe
Martin

Thanks to Maxym’s old blog entry for great impressions!

Additional reading:
https://www.ibm.com/developerworks/community/blogs/aixpert/entry/aix_memory_usage_or_who_is_using_the_memory_and_how20?lang=en



Speaking at COLLABORATE 14: “YOUR machine and MY database – a performing relationship!?”

I’m excited to announce that IOUG accepted my talk

“YOUR machine and MY database – a performing relationship!?”

for COLLABORATE 14 in Las Vegas.


I’d love to see you there – for tech talk, gossip and meeting old and new friends!

Abstract:

Databases affect machines, machines affect databases. Optimizing one is pointless without knowing the other. System administrators and database administrators will not necessarily have the same opinion – often because they know little about the other side’s needs. This lecture was made to promote understanding – showing how the database can stress the server, and how the server can limit the database. And why two admins sometimes don’t speak the same language, not even with a developer as an interpreter.

  • Recall the different needs of different technical layers underneath a database system.
  • Understand the technical collaboration of hardware, operating system and database.
  • Plot ways to avoid collisions, competition and concurrency.
  • Promote collaboration!

Date, time and location:

Thu, Apr 10, 2014
01:00 p.m. – 02:00 p.m.

Level 3, Lido 3003

The Venetian and Sands Expo Center
201 Sands Ave
Las Vegas, NV 89169
USA

Presentation and papers

2014_141_Klier_odp_v1
2014_141_Klier_v1_doc