Archive for the 'IT' Category

Oracle 12.2 sysctl.conf Default Preinstall Settings

This are the currently active settings, the oracle-database-server-12cR2-preinstall.rpm delivers on Oracle Linux 7. Maybe it’s useful for you when setting up Oracle 12.2 on SUSE / SLES where you are on your own.

Maybe you also want to add Huge Pages:

# Performing Databases Huge Page Settings 
# for SGA size 200GB and huge page size of 2MB each
vm.nr_hugepages = 102401

# oracle-database-server-12cR2-preinstall setting for fs.file-max is 6815744
fs.file-max = 6815744

# oracle-database-server-12cR2-preinstall setting for kernel.sem is '250 32000 100 128'
kernel.sem = 250 32000 100 128

# oracle-database-server-12cR2-preinstall setting for kernel.shmmni is 4096
kernel.shmmni = 4096

# oracle-database-server-12cR2-preinstall setting for kernel.shmall is 1073741824 on x86_64
kernel.shmall = 1073741824

# oracle-database-server-12cR2-preinstall setting for kernel.shmmax is 4398046511104 on x86_64
kernel.shmmax = 4398046511104

# oracle-database-server-12cR2-preinstall setting for kernel.panic_on_oops is 1 per Orabug 19212317
kernel.panic_on_oops = 1

# oracle-database-server-12cR2-preinstall setting for net.core.rmem_default is 262144
net.core.rmem_default = 262144

# oracle-database-server-12cR2-preinstall setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304

# oracle-database-server-12cR2-preinstall setting for net.core.wmem_default is 262144
net.core.wmem_default = 262144

# oracle-database-server-12cR2-preinstall setting for net.core.wmem_max is 1048576
net.core.wmem_max = 1048576

# oracle-database-server-12cR2-preinstall setting for net.ipv4.conf.all.rp_filter is 2
net.ipv4.conf.all.rp_filter = 2

# oracle-database-server-12cR2-preinstall setting for net.ipv4.conf.default.rp_filter is 2
net.ipv4.conf.default.rp_filter = 2

# oracle-database-server-12cR2-preinstall setting for fs.aio-max-nr is 1048576
fs.aio-max-nr = 1048576

# oracle-database-server-12cR2-preinstall setting for net.ipv4.ip_local_port_range is 9000 65500
net.ipv4.ip_local_port_range = 9000 65500

 



Oracle RAC 12.2 on SLES12 – How to disable Hardware Lock Elision Support

The Problem

In the last week, I had to set up an Oracle RAC (and thus, the Clusterware) version 12.2.0.1 on SuSE Linux Enterprise Server 12 SP2 (SLES12 Sp2). The gridSetup.sh failed latest in root.sh of the first node with a kinda non-intuitive error:

CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
2017/07/10 10:18:49 CLSRSC-119: Start of the exclusive mode cluster failed
Died at /u01/app/12.2.0/grid_1/crs/install/crsinstall.pm line 2053.
The command ‘/u01/app/12.2.0/grid_1/perl/bin/perl -I/u01/app/12.2.0/grid_1/perl/lib -I/u01/app/12.2.0/grid_1/crs/install /u01/app/12.2.0/grid_1/crs/install/rootcrs.pl ‘ execution failed

Thank you – for nothing.

The rootcrs.log in /tmp/GridAction<date> directory was a bit more enlightening, but not much:

CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
The exlusive mode cluster start failed, see Clusterware alert log for more information
Executing cmd: /u01/app/12.2.0/grid_1/bin/clsecho -p has -f clsrsc -m 119
Command output:
> CLSRSC-119: Start of the exclusive mode cluster failed 
>End Command output
CLSRSC-119: Start of the exclusive mode cluster failed
###### Begin DIE Stack Trace ######
Package File Line Calling 
--------------- -------------------- ---- ----------
1: main rootcrs.pl 287 crsutils::dietrap
2: crsinstall crsinstall.pm 2053 main::__ANON__
3: crsinstall crsinstall.pm 1963 crsinstall::perform_initial_config
4: crsinstall crsinstall.pm 653 crsinstall::perform_init_config
5: crsinstall crsinstall.pm 813 crsinstall::init_config
6: crsinstall crsinstall.pm 380 crsinstall::CRSInstall
7: main rootcrs.pl 446 crsinstall::new
####### End DIE Stack Trace #######

Eh… yes.

The error in the cluster alert log finally was better, and made me curious:

[ORAROOTAGENT(20948)]
CRS-8503: Oracle Clusterware process ORAROOTAGENT with operating system process ID 20948 experienced fatal signal or exception code 11.
Errors in file /u01/app/oracle/diag/crs/myserver08/crs/trace/ohasd_orarootagent_root.trc (incident=1):
CRS-8503 [__lll_unlock_elision()+48] [Signal/Exception: 11] [Instruction Addr: 0x7f5df148a4a0] [Memory Addr: (nil)] [] [] [] [] [] [] [] []

Ah. Of course. :)

Read more…



Oracle RAC 12.2 High Load on CPU from gdb when Node Missing

Recently I had to battle a new issue with the quite-new Oracle Database RAC Version 12.2.0.1 on Linux x86_64. The idea of RAC is, to compensate the loss of a node or service by restarting services on other nodes. But in my case, when one node in a two-node-Cluster was down (or the crs stack stopped with crsctl stop crs), there was very high CPU load on the surviving node. In a pattern of three minutes 100% on four threads, every five minutes. That’s a bit disappointing – if we loose a node, we want all the CPU power of the survivor for our services, not for debugging, as interesting the debug results may be afterwards – but the service must stay up!

Oracle RAC 12.2 High Load on CPU from GDB when Node Missing

So where does that load come from? A quick research with top allows some insight: There are several GDB processes running.

Oracle RAC 12.2. high CPU load from GDB (top)

Oracle RAC 12.2. high CPU load from GDB (top)

Digging deeper with pstree, shows their origin.

Oracle RAC 12.2. high CPU load from GDB (pstree)

Oracle RAC 12.2. high CPU load from GDB (pstree)

Seems like osysmond starts diagsnap.pl due to an error condition. And likewise, diagsnap is running the debugger. Okay, once or twice would be ok, but up to six gdb’s at the same time as in our case, and every five minutes – thanks, but no thanks.

Diagsnap is used by the Oracle 12.2 Autonomous Health Framework to create diagnostics information in case of cluster issues. Mostly that means data for the management repository database.

I was able to reproduce the issue on all Linux RACs with 12.2 I have running, also verified with at least two customer systems.

Workaround

The Cluster Health Monitor can be configured not to collect this kind of information – you will reduce the amount of data Oracle Support can use to help you diagnosing cluster outages. But on the pro side, you will have the CPU power of the surviving node for your services, and that was the plan.

Activate all nodes in the cluster, and on one node, run

~$ oclumon manage -disable diagsnap

Here we go:
oclumon disabling diagsnap

oclumon disabling diagsnap

After that, I was not able to reproduce the issue.

Solution

Oracle told me, that a real fix will be part of the first 12.2 PSU, so I estimate to see it working after July 17th, 2017.

Readme

You may not be familiar with Cluster Health Monitor and the oclumon CLI. This is a good point for starting to read about the topic:
https://docs.oracle.com/database/121/CWADD/troubleshoot.htm#CWADD92242

 

 

Hope this is helpful – do it like we did and test thoroughly before commissioning your new RAC! :)
Martin Klier



Restarting Oracle for Performance is like a Coitus for Virginity

Today I had another case of that … total breakdown of a live environment by infrastructure defect, bad enough. Customer rushing up the database and all apps services, just to see thousands of locks and hanging transactions. Big panic, frequent restarting of databases and applications. Getting worse, finally calling me.

Told them I’ll manage that, asked them to shut down all apps until I tell them to start up. Started the DB, read the newspaper for 10 minutes, then they started calling me. Told them, that I am working on it. After 20mins of reading the paper and answering calls, asked them to start the apps. => Voila, all was well.

 

The answer? Just let your SMON do its job.

 

Ok, this initially was not exactly about performance, but the initial mistake was old Windows-like thinking: Reboot it, a reboot will activate magic self-healing powers. Fail, crashing Oracle (what the service infrastructure did in this case) is a very bad thing to do, and even worse, it “registers a mortgage” on the system: As soon as you start up an instance for this database later, it has not only to crash-recover, but ALSO to roll back all open transactions after the DB was opened. The latter was the problem in the above case: SMON kicks in late. Now the swarm of busy application bees buzzing around after such a crash, has to share resources with SMON. And has to wait for still-open row locks, spinning on them, eating more resources… and now it is a performance issue – a pointless, home made performance catastrophe.

Keep calm
Martin



Oracle RMAN destination with variable restrains “not backed up” clause

Today I was a bit surprised – being always under the impression that Oracle RMAN should accept “backup backupset not backed up to destination ‘xyz'” AND work with it successfully. But I learned that in 11.2.0.4 the Oracle RMAN “to destination” containing variables restrains the “not backed up” clause. No error thrown, but just does not work as I would expect it to. Could not find something in the docs, guess we just have to accept it as a fact.

Oracle RMAN destination with variable restrains “not backed up” clause

What I tried is, to back up the same backupset twice, but with a “not backed up” clause at work. I’d expect that to work once at max, and a skip for the second turn. But as you can see, not:

RMAN> backup backupset 4954 not backed up to destination '/u03/orabackupNAS/sun/backupset/%Y_%M_%D';

Starting backup at 13-FEB-17
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=17295, stamp=935838125, piece=1
channel ORA_DISK_1: starting piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece +FRADG1/sun/autobackup/2017_02_13/s_935838125.6660.935838127
piece handle=/u03/orabackupNAS/sun/backupset/2017_02_13/SUN/backupset/2017_02_13/o1_mf_ncsnf_TAG20170213T110205_db31gdh2_.bkp comment=NONE
channel ORA_DISK_1: finished piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece complete, elapsed time: 00:00:03
Finished backup at 13-FEB-17

Starting Control File and SPFILE Autobackup at 13-FEB-17
piece handle=+FRADG1/sun/autobackup/2017_02_13/s_935838557.6796.935838559 comment=NONE
Finished Control File and SPFILE Autobackup at 13-FEB-17

RMAN> backup backupset 4954 not backed up to destination '/u03/orabackupNAS/sun/backupset/%Y_%M_%D';

Starting backup at 13-FEB-17
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=17295, stamp=935838125, piece=1
channel ORA_DISK_1: starting piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece +FRADG1/sun/autobackup/2017_02_13/s_935838125.6660.935838127
piece handle=/u03/orabackupNAS/sun/backupset/2017_02_13/SUN/backupset/2017_02_13/o1_mf_ncsnf_TAG20170213T110205_db31h5yd_.bkp comment=NONE
channel ORA_DISK_1: finished piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece complete, elapsed time: 00:00:03
Finished backup at 13-FEB-17

Starting Control File and SPFILE Autobackup at 13-FEB-17
piece handle=+FRADG1/sun/autobackup/2017_02_13/s_935838583.5200.935838585 comment=NONE
Finished Control File and SPFILE Autobackup at 13-FEB-17

Solution – without dynamic path name

Read more…



Oracle Database 12.2 on-premises release dates

Oracle recently announced the availability of Oracle Database 12c release 2 (12.2) for on-premises servers.

https://www.performing-databases.com/en/2017/01/oracle-database-12-2-on-premises-release-dates/

Linux x86-64, Solaris SPARC and Solaris x86-64: 15-Mar-2017

Other supported platforms (Windows x86-64, IBM AIX on POWER, IBM Linux on System z, HP UX Itanium): Q2CY2017

Source:
MOS DOc ID 742060.1



Oracle Enterprise Linux 7: How to stop bash tab completion from escaping the dollar $

Gnu bash logo - By Justindorfman - Own work, CC BY-SA 4.0

Did you ever wonder why bash in Oracle Linux, Red Hat Enterprise Linux and other recent Linuxes escapes the dollar sign ($) when completing paths with the tabulator (tab) key like that:

[oracle@oratest01 ~]$ (KLM12102) ls -l \$ORACLE_BASE/diag

Answer

In bash >= 4.2.29 you have a “direxpand” option, which is NOT set by default. The bash manual explains it this way:

direxpand
If set, bash replaces directory names with the results of word expansion when performing filename completion. This changes the contents of the readline editing buffer. If not set, bash attempts to preserve what the user typed.

Solution

Set this option with

shopt -s direxpand

For example in your profile or wherever you feel it is useful.

Bye and be careful, as usual
– Martin



Oracle: How to identify SQL doing TABLE ACCESS FULL on a given partition

In my last post, I described a quick show case for Oracle 12c Automatic Big Table Caching. But ABTC is just a crock, nevertheless a useful one. The biggest help from Automatic Big Table Caching for “sustainable” database performance management is the temperature of segments, heated up by TABLE ACCESS FULL. In my case, after enabling ABTC the hottest segment was one I did not expect: A sink containing lots of passive data, a LIST partition holding the passive status. (Not) nice, and now the next question is, who does that?

Finding SQL_IDs doing Full Table Scans on a given table is no magic, V$SQL_PLAN allows it in a very simple way:

select distinct sql_id, plan_hash_value
 from v$sql_plan
 where
 object_owner='ME'
 and operation='TABLE ACCESS'
 and options like '%FULL%'
 and object_name='MYTABLE'
 group by sql_id,plan_hash_value
 order by sql_id,plan_hash_value;

But the challenge is, to find every TABLE ACCESS FULL that comes from a PARTITION LIST operation. Thus, we need a query that allows looking back one row in V$SQL_PLAN, to decide if the TABLE ACCESS FULL is relevant:

How to identify SQL doing TABLE ACCESS FULL on a given partition

Read more…



Oracle Automatic Big Table Caching – an improvement, not a fix

Too much TABLE ACCESS FULL in your Oracle Database? Thus, SQL elapsed time too slow for the demand? Plenty of Buffer Cache to create a temporary fix? maybe you want to consider Automatic Big Table Caching. Usually, Oracle only does Full Table Caching for small tables. Big ones will only use the Buffer Cache for the current chunk of blocks that’s transported (depending on the access method). Now Oracle 12c’s Automatic Big Table Caching will reserve a part (by percent) of the Buffer cache for full table scans, its filling priority is based on a heat map for segments: The more full table scans you have, the higher the “temperature” will get, and the higher the priority becomes. I calculated the target size of the cache by simply using the size of the segment I hoped to get cached.

Activation is simple and needs no restart:

alter system set db_big_table_cache_percent_target=20 scope=memory;

See the success:

select * from V$BT_SCAN_OBJ_TEMPS;

And after a while, you can see which segments are operaed using the cache, and why (by temperature):

select ot.*,
 round((ot.cached_in_mem/size_in_blks) * 100,0) as Pct_in_memory,
 o.*
from dba_objects o, V$BT_SCAN_OBJ_TEMPS ot
where ot.dataobj#=o.object_id
order by temperature desc;

That’s amazing, and so simple and so intriguing and so SEDUCTIVE…! But after all it’s just a full table scan, and if we can get rid of it, we should still get rid of it, at least for OLTP environments. But for a quick fix (instead of using a hint) I think Automatic Big Table Caching has a real value.

Keep your eyes wide open and your head on
Martin Klier



Oracle 11g unable to extend datafile but ASM disk group shows free space

Sometimes Oracle Database 11gR2 complains about not being able to extend a (lob) segment with ORA-1691, but ASM monitoring based on USABLE_FILE_MB did not fire. Adding a new data file fails with ASM ORA-15041. I was under the impression this behaviour of ASM deserved some explanation.

Situation

Database Alert Log complains with ORA-1691:

ORA-1691: unable to extend lobsegment MYUSER.SYS_LOB0013128030C00003$$ 
  by 128 in tablespace USERS
ORA-1691: unable to extend lobsegment MYUSER.SYS_LOB0013128030C00003$$ 
  by 8192 in tablespace USERS

Ok, a quick look comparing dba_segments and the tablespace size – it’s full. So let’s extend it with one more datafile:

SQL> alter tablespace USERS add datafile size 2G autoextend on next 1G maxsize 32G;
 alter tablespace IWACS add datafile size 2G autoextend on next 1G maxsize 32G
 *
 FEHLER in Zeile 1:
 ORA-01119: Fehler bei der Erstellung der Datenbankdatei '+ORADATA'
 ORA-17502: ksfdcre:4 konnte Datei +ORADATA nicht erstellen
 ORA-15041: diskgroup "ORADATA" space exhausted

Uh-oh, ORA-15041? But as I well know, the customer is monitoring FREE_MB and USABLE_FILE_MB in v$asm_diskgroup…? Maybe something nasty is going on, let’s check if ASM has to work on balancing:

SQL> select * from v$asm_operation;
no rows selected

Read more…




You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.