Archive for the 'IT' Category

Oracle RAC 12.2 High Load on CPU from gdb when Node Missing

Recently I had to battle a new issue with the quite-new Oracle Database RAC Version 12.2.0.1 on Linux x86_64. The idea of RAC is, to compensate the loss of a node or service by restarting services on other nodes. But in my case, when one node in a two-node-Cluster was down (or the crs stack stopped with crsctl stop crs), there was very high CPU load on the surviving node. In a pattern of three minutes 100% on four threads, every five minutes. That’s a bit disappointing – if we loose a node, we want all the CPU power of the survivor for our services, not for debugging, as interesting the debug results may be afterwards – but the service must stay up!

Oracle RAC 12.2 High Load on CPU from GDB when Node Missing

So where does that load come from? A quick research with top allows some insight: There are several GDB processes running.

Oracle RAC 12.2. high CPU load from GDB (top)

Oracle RAC 12.2. high CPU load from GDB (top)

Digging deeper with pstree, shows their origin.

Oracle RAC 12.2. high CPU load from GDB (pstree)

Oracle RAC 12.2. high CPU load from GDB (pstree)

Seems like osysmond starts diagsnap.pl due to an error condition. And likewise, diagsnap is running the debugger. Okay, once or twice would be ok, but up to six gdb’s at the same time as in our case, and every five minutes – thanks, but no thanks.

Diagsnap is used by the Oracle 12.2 Autonomous Health Framework to create diagnostics information in case of cluster issues. Mostly that means data for the management repository database.

I was able to reproduce the issue on all Linux RACs with 12.2 I have running, also verified with at least two customer systems.

Workaround

The Cluster Health Monitor can be configured not to collect this kind of information – you will reduce the amount of data Oracle Support can use to help you diagnosing cluster outages. But on the pro side, you will have the CPU power of the surviving node for your services, and that was the plan.

Activate all nodes in the cluster, and on one node, run

~$ oclumon manage -diable diagsnap

Here we go:
oclumon disabling diagsnap

oclumon disabling diagsnap

After that, I was not able to reproduce the issue.

Solution

Oracle told me, that a real fix will be part of the first 12.2 PSU, so I estimate to see it working after July 17th, 2017.

Readme

You may not be familiar with Cluster Health Monitor and the oclumon CLI. This is a good point for starting to read about the topic:
https://docs.oracle.com/database/121/CWADD/troubleshoot.htm#CWADD92242

 

 

Hope this is helpful – do it like we did and test thoroughly before commissioning your new RAC! :)
Martin Klier



Restarting Oracle for Performance is like a Coitus for Virginity

Today I had another case of that … total breakdown of a live environment by infrastructure defect, bad enough. Customer rushing up the database and all apps services, just to see thousands of locks and hanging transactions. Big panic, frequent restarting of databases and applications. Getting worse, finally calling me.

Told them I’ll manage that, asked them to shut down all apps until I tell them to start up. Started the DB, read the newspaper for 10 minutes, then they started calling me. Told them, that I am working on it. After 20mins of reading the paper and answering calls, asked them to start the apps. => Voila, all was well.

 

The answer? Just let your SMON do its job.

 

Ok, this initially was not exactly about performance, but the initial mistake was old Windows-like thinking: Reboot it, a reboot will activate magic self-healing powers. Fail, crashing Oracle (what the service infrastructure did in this case) is a very bad thing to do, and even worse, it “registers a mortgage” on the system: As soon as you start up an instance for this database later, it has not only to crash-recover, but ALSO to roll back all open transactions after the DB was opened. The latter was the problem in the above case: SMON kicks in late. Now the swarm of busy application bees buzzing around after such a crash, has to share resources with SMON. And has to wait for still-open row locks, spinning on them, eating more resources… and now it is a performance issue – a pointless, home made performance catastrophe.

Keep calm
Martin



Oracle RMAN destination with variable restrains “not backed up” clause

Today I was a bit surprised – being always under the impression that Oracle RMAN should accept “backup backupset not backed up to destination ‘xyz'” AND work with it successfully. But I learned that in 11.2.0.4 the Oracle RMAN “to destination” containing variables restrains the “not backed up” clause. No error thrown, but just does not work as I would expect it to. Could not find something in the docs, guess we just have to accept it as a fact.

Oracle RMAN destination with variable restrains “not backed up” clause

What I tried is, to back up the same backupset twice, but with a “not backed up” clause at work. I’d expect that to work once at max, and a skip for the second turn. But as you can see, not:

RMAN> backup backupset 4954 not backed up to destination '/u03/orabackupNAS/sun/backupset/%Y_%M_%D';

Starting backup at 13-FEB-17
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=17295, stamp=935838125, piece=1
channel ORA_DISK_1: starting piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece +FRADG1/sun/autobackup/2017_02_13/s_935838125.6660.935838127
piece handle=/u03/orabackupNAS/sun/backupset/2017_02_13/SUN/backupset/2017_02_13/o1_mf_ncsnf_TAG20170213T110205_db31gdh2_.bkp comment=NONE
channel ORA_DISK_1: finished piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece complete, elapsed time: 00:00:03
Finished backup at 13-FEB-17

Starting Control File and SPFILE Autobackup at 13-FEB-17
piece handle=+FRADG1/sun/autobackup/2017_02_13/s_935838557.6796.935838559 comment=NONE
Finished Control File and SPFILE Autobackup at 13-FEB-17

RMAN> backup backupset 4954 not backed up to destination '/u03/orabackupNAS/sun/backupset/%Y_%M_%D';

Starting backup at 13-FEB-17
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=17295, stamp=935838125, piece=1
channel ORA_DISK_1: starting piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece +FRADG1/sun/autobackup/2017_02_13/s_935838125.6660.935838127
piece handle=/u03/orabackupNAS/sun/backupset/2017_02_13/SUN/backupset/2017_02_13/o1_mf_ncsnf_TAG20170213T110205_db31h5yd_.bkp comment=NONE
channel ORA_DISK_1: finished piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece complete, elapsed time: 00:00:03
Finished backup at 13-FEB-17

Starting Control File and SPFILE Autobackup at 13-FEB-17
piece handle=+FRADG1/sun/autobackup/2017_02_13/s_935838583.5200.935838585 comment=NONE
Finished Control File and SPFILE Autobackup at 13-FEB-17

Solution – without dynamic path name

Read more…



Oracle Database 12.2 on-premises release dates

Oracle recently announced the availability of Oracle Database 12c release 2 (12.2) for on-premises servers.

https://www.performing-databases.com/en/2017/01/oracle-database-12-2-on-premises-release-dates/

Linux x86-64, Solaris SPARC and Solaris x86-64: 15-Mar-2017

Other supported platforms (Windows x86-64, IBM AIX on POWER, IBM Linux on System z, HP UX Itanium): Q2CY2017

Source:
MOS DOc ID 742060.1



Oracle Enterprise Linux 7: How to stop bash tab completion from escaping the dollar $

Gnu bash logo - By Justindorfman - Own work, CC BY-SA 4.0

Did you ever wonder why bash in Oracle Linux, Red Hat Enterprise Linux and other recent Linuxes escapes the dollar sign ($) when completing paths with the tabulator (tab) key like that:

[oracle@oratest01 ~]$ (KLM12102) ls -l \$ORACLE_BASE/diag

Answer

In bash >= 4.2.29 you have a “direxpand” option, which is NOT set by default. The bash manual explains it this way:

direxpand
If set, bash replaces directory names with the results of word expansion when performing filename completion. This changes the contents of the readline editing buffer. If not set, bash attempts to preserve what the user typed.

Solution

Set this option with

shopt -s direxpand

For example in your profile or wherever you feel it is useful.

Bye and be careful, as usual
– Martin



Oracle: How to identify SQL doing TABLE ACCESS FULL on a given partition

In my last post, I described a quick show case for Oracle 12c Automatic Big Table Caching. But ABTC is just a crock, nevertheless a useful one. The biggest help from Automatic Big Table Caching for “sustainable” database performance management is the temperature of segments, heated up by TABLE ACCESS FULL. In my case, after enabling ABTC the hottest segment was one I did not expect: A sink containing lots of passive data, a LIST partition holding the passive status. (Not) nice, and now the next question is, who does that?

Finding SQL_IDs doing Full Table Scans on a given table is no magic, V$SQL_PLAN allows it in a very simple way:

select distinct sql_id, plan_hash_value
 from v$sql_plan
 where
 object_owner='ME'
 and operation='TABLE ACCESS'
 and options like '%FULL%'
 and object_name='MYTABLE'
 group by sql_id,plan_hash_value
 order by sql_id,plan_hash_value;

But the challenge is, to find every TABLE ACCESS FULL that comes from a PARTITION LIST operation. Thus, we need a query that allows looking back one row in V$SQL_PLAN, to decide if the TABLE ACCESS FULL is relevant:

How to identify SQL doing TABLE ACCESS FULL on a given partition

Read more…



Oracle Automatic Big Table Caching – an improvement, not a fix

Too much TABLE ACCESS FULL in your Oracle Database? Thus, SQL elapsed time too slow for the demand? Plenty of Buffer Cache to create a temporary fix? maybe you want to consider Automatic Big Table Caching. Usually, Oracle only does Full Table Caching for small tables. Big ones will only use the Buffer Cache for the current chunk of blocks that’s transported (depending on the access method). Now Oracle 12c’s Automatic Big Table Caching will reserve a part (by percent) of the Buffer cache for full table scans, its filling priority is based on a heat map for segments: The more full table scans you have, the higher the “temperature” will get, and the higher the priority becomes. I calculated the target size of the cache by simply using the size of the segment I hoped to get cached.

Activation is simple and needs no restart:

alter system set db_big_table_cache_percent_target=20 scope=memory;

See the success:

select * from V$BT_SCAN_OBJ_TEMPS;

And after a while, you can see which segments are operaed using the cache, and why (by temperature):

select ot.*,
 round((ot.cached_in_mem/size_in_blks) * 100,0) as Pct_in_memory,
 o.*
from dba_objects o, V$BT_SCAN_OBJ_TEMPS ot
where ot.dataobj#=o.object_id
order by temperature desc;

That’s amazing, and so simple and so intriguing and so SEDUCTIVE…! But after all it’s just a full table scan, and if we can get rid of it, we should still get rid of it, at least for OLTP environments. But for a quick fix (instead of using a hint) I think Automatic Big Table Caching has a real value.

Keep your eyes wide open and your head on
Martin Klier



Oracle 11g unable to extend datafile but ASM disk group shows free space

Sometimes Oracle Database 11gR2 complains about not being able to extend a (lob) segment with ORA-1691, but ASM monitoring based on USABLE_FILE_MB did not fire. Adding a new data file fails with ASM ORA-15041. I was under the impression this behaviour of ASM deserved some explanation.

Situation

Database Alert Log complains with ORA-1691:

ORA-1691: unable to extend lobsegment MYUSER.SYS_LOB0013128030C00003$$ 
  by 128 in tablespace USERS
ORA-1691: unable to extend lobsegment MYUSER.SYS_LOB0013128030C00003$$ 
  by 8192 in tablespace USERS

Ok, a quick look comparing dba_segments and the tablespace size – it’s full. So let’s extend it with one more datafile:

SQL> alter tablespace USERS add datafile size 2G autoextend on next 1G maxsize 32G;
 alter tablespace IWACS add datafile size 2G autoextend on next 1G maxsize 32G
 *
 FEHLER in Zeile 1:
 ORA-01119: Fehler bei der Erstellung der Datenbankdatei '+ORADATA'
 ORA-17502: ksfdcre:4 konnte Datei +ORADATA nicht erstellen
 ORA-15041: diskgroup "ORADATA" space exhausted

Uh-oh, ORA-15041? But as I well know, the customer is monitoring FREE_MB and USABLE_FILE_MB in v$asm_diskgroup…? Maybe something nasty is going on, let’s check if ASM has to work on balancing:

SQL> select * from v$asm_operation;
no rows selected

Read more…



My COLLABORATE16 look-at’s – an aftermath

From April 10th to April 14th 2016 many Oracle professionals congregated in Las Vegas. The three big U.S. user groups, the IOUG, the OAUG and the Quest User Group once again formed one of the best Oracle Conferences in the world: COLLABORATE16. I was honored to be there as a speaker, and glad to perform two lectures in front of packed rooms (downloads here). Thank you all who made it possible and who attended!

But this post is about what I took home from COLLABORATE16. I never take the time to write down everything. But wherever I go, I take notes what to look at when I am back. And as often, I’d like to share my very personal best-of list of souvenirs from Las Vegas. It’s ordered by time – first session first, and all of them are top notch state-of-the-tech speakers, no quality comments necessary. It’s always a privilege to meet you, thank you all for sharing your experience!

Linux/Unix tools for the DBA

by Tim Gorman

  • sar -w context switching
  • sar -W (virt mem swapping)
  • sar -B (virt mem paging)
  • sar %steal = time stolen by hypervisor
  • sar = sadc (collector) + sadf (formatter)
  • mpstat (per processor stats)
  • mpstat column icsw (involuntary context switches)
  • ipcs interprocess comm data structures or resources
    • shared mem
    • semaphores
    • message queues
  • Oracle sysresv = IPC resources for $ORACLE_SID
  • pmap = process mem mapping display
  • pmap -x for all oracle processes / max of shared plus sum of the privat comp = mem oracle uses
  • jstat = Java VM mem usage displays
  • jstat Stop The World STW means cleaning up old objects due to dependencies / inheritance chains (vs. young ones w/o stopping)
  • jstat -gc (FGC column -> STW events since process start)
  • dtrace – very powerful performance analysis tool
  • adb – obtain stack trace from core dump
  • Oracle alter system dump
  • Oracle sosreport
  • Oracle OSwatcher

Read more…



Oracle SQL showing current expensive queries plus most expensive object from execution plan

Sometimes, it’s useful to see what’s currently going on in your Oracle Database. This query is meant to help with this task, that haunts every DBA from time to time. It’s just using v$sql and v$sql_plan, so no need for Enterprise Edition and Diagnostics Pack.

Key features:

  • find top buffer getters per execution from v$sql
  • add most costly object access from corresponding execution plan (from all cursors in question), using v$sql_plan
  • shows possible duplicates of the latter with same cost in a list aggregate, to reduce number of lines
  • shows the percentage of how much cost impact the most expensive object access has. related to the overall cost of the execution plan.

Exclusions:

  • does not show plans without SQL and no SQL without plan
  • has a limited view on object impact: Accessing objects is rated, but not that they cause bigger join costs etc.

Read more…




You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.