Archive for the 'Linux / Unix' Category

Oracle on SLES12 SP2 – Avoiding Cgroup Task Limit

Once upon a time, there was an Oracle RAC 12.2.0.1 cluster on SUSE Linux Enterprise Server (SLES) 12 SP2 that did not do well. Its database and ASM instances kept failing with:

ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3

Of course, in such a case you will check ulimit -u / limits.conf (nproc) and sysctl.conf (kernel.pid_max). But what if this does not help?
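A minimal sketch of those first checks, assuming a Linux system with procps installed. The function names are made up for illustration, and the cgroup path assumes the cgroup v1 pids controller is mounted under /sys/fs/cgroup/pids, which may differ on your system.

```shell
#!/bin/bash
# Print the per-user process limit, the kernel-wide PID limit and the number
# of threads currently in use (fork fails with status 11 when a limit is hit).
show_fork_limits() {
    echo "nproc (ulimit -u): $(ulimit -u 2>/dev/null || echo unknown)"
    echo "kernel.pid_max:    $(cat /proc/sys/kernel/pid_max 2>/dev/null || echo unknown)"
    echo "threads in use:    $(ps -eL -o lwp= 2>/dev/null | wc -l)"
}

# Print task limits of first-level cgroups, if the pids controller is mounted
# at the assumed location. Prints nothing otherwise.
show_cgroup_task_limits() {
    for f in /sys/fs/cgroup/pids/*/pids.max; do
        [ -r "$f" ] && echo "$f: $(cat "$f")"
    done
    return 0
}

show_fork_limits
show_cgroup_task_limits
```

Run it as the oracle or grid user, because ulimit -u reports the limit of the calling shell.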



Oracle 12.2 sysctl.conf Default Preinstall Settings

These are the currently active settings that the oracle-database-server-12cR2-preinstall.rpm delivers on Oracle Linux 7. Maybe they are useful for you when setting up Oracle 12.2 on SUSE / SLES, where you are on your own.

Maybe you also want to add Huge Pages:

# Performing Databases Huge Page Settings 
# for SGA size 200GB and huge page size of 2MB each
vm.nr_hugepages = 102401

# oracle-database-server-12cR2-preinstall setting for fs.file-max is 6815744
fs.file-max = 6815744

# oracle-database-server-12cR2-preinstall setting for kernel.sem is '250 32000 100 128'
kernel.sem = 250 32000 100 128

# oracle-database-server-12cR2-preinstall setting for kernel.shmmni is 4096
kernel.shmmni = 4096

# oracle-database-server-12cR2-preinstall setting for kernel.shmall is 1073741824 on x86_64
kernel.shmall = 1073741824

# oracle-database-server-12cR2-preinstall setting for kernel.shmmax is 4398046511104 on x86_64
kernel.shmmax = 4398046511104

# oracle-database-server-12cR2-preinstall setting for kernel.panic_on_oops is 1 per Orabug 19212317
kernel.panic_on_oops = 1

# oracle-database-server-12cR2-preinstall setting for net.core.rmem_default is 262144
net.core.rmem_default = 262144

# oracle-database-server-12cR2-preinstall setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304

# oracle-database-server-12cR2-preinstall setting for net.core.wmem_default is 262144
net.core.wmem_default = 262144

# oracle-database-server-12cR2-preinstall setting for net.core.wmem_max is 1048576
net.core.wmem_max = 1048576

# oracle-database-server-12cR2-preinstall setting for net.ipv4.conf.all.rp_filter is 2
net.ipv4.conf.all.rp_filter = 2

# oracle-database-server-12cR2-preinstall setting for net.ipv4.conf.default.rp_filter is 2
net.ipv4.conf.default.rp_filter = 2

# oracle-database-server-12cR2-preinstall setting for fs.aio-max-nr is 1048576
fs.aio-max-nr = 1048576

# oracle-database-server-12cR2-preinstall setting for net.ipv4.ip_local_port_range is 9000 65500
net.ipv4.ip_local_port_range = 9000 65500
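The huge page count in the listing follows from simple arithmetic: SGA size divided by huge page size. Here is a small sketch of that calculation; the function name is hypothetical. Note that plain division gives 102400 for 200 GB and 2 MB pages, while the listing uses 102401. The reason for the extra page is not stated, so treat it as headroom.

```shell
#!/bin/bash
# hugepages_for_sga SGA_GB PAGE_MB: number of huge pages needed to hold an SGA
# of SGA_GB gigabytes with huge pages of PAGE_MB megabytes each.
hugepages_for_sga() {
    sga_gb=$1
    page_mb=$2
    echo $(( sga_gb * 1024 / page_mb ))
}

hugepages_for_sga 200 2

# After loading the settings, the kernel's view can be checked with:
#   grep Huge /proc/meminfo
```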

 



Oracle RAC 12.2 on SLES12 – How to disable Hardware Lock Elision Support

The Problem

Last week, I had to set up an Oracle RAC (and thus, the Clusterware) version 12.2.0.1 on SUSE Linux Enterprise Server 12 SP2 (SLES12 SP2). gridSetup.sh failed in root.sh on the first node with a kinda non-intuitive error:

CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
2017/07/10 10:18:49 CLSRSC-119: Start of the exclusive mode cluster failed
Died at /u01/app/12.2.0/grid_1/crs/install/crsinstall.pm line 2053.
The command ‘/u01/app/12.2.0/grid_1/perl/bin/perl -I/u01/app/12.2.0/grid_1/perl/lib -I/u01/app/12.2.0/grid_1/crs/install /u01/app/12.2.0/grid_1/crs/install/rootcrs.pl ‘ execution failed

Thank you – for nothing.

The rootcrs.log in the /tmp/GridAction<date> directory was a bit more enlightening, but not much:

CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
The exlusive mode cluster start failed, see Clusterware alert log for more information
Executing cmd: /u01/app/12.2.0/grid_1/bin/clsecho -p has -f clsrsc -m 119
Command output:
> CLSRSC-119: Start of the exclusive mode cluster failed 
>End Command output
CLSRSC-119: Start of the exclusive mode cluster failed
###### Begin DIE Stack Trace ######
Package File Line Calling 
--------------- -------------------- ---- ----------
1: main rootcrs.pl 287 crsutils::dietrap
2: crsinstall crsinstall.pm 2053 main::__ANON__
3: crsinstall crsinstall.pm 1963 crsinstall::perform_initial_config
4: crsinstall crsinstall.pm 653 crsinstall::perform_init_config
5: crsinstall crsinstall.pm 813 crsinstall::init_config
6: crsinstall crsinstall.pm 380 crsinstall::CRSInstall
7: main rootcrs.pl 446 crsinstall::new
####### End DIE Stack Trace #######

Eh… yes.

The error in the cluster alert log finally was better, and made me curious:

[ORAROOTAGENT(20948)]
CRS-8503: Oracle Clusterware process ORAROOTAGENT with operating system process ID 20948 experienced fatal signal or exception code 11.
Errors in file /u01/app/oracle/diag/crs/myserver08/crs/trace/ohasd_orarootagent_root.trc (incident=1):
CRS-8503 [__lll_unlock_elision()+48] [Signal/Exception: 11] [Instruction Addr: 0x7f5df148a4a0] [Memory Addr: (nil)] [] [] [] [] [] [] [] []

Ah. Of course. :)
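The function name __lll_unlock_elision in the trace points at the lock elision code in glibc, which relies on the CPU's transactional memory extensions. As a hedged first check, assuming a Linux system that reports the hle and rtm CPU flags in /proc/cpuinfo, the sketch below shows whether the processor advertises them. The function name tsx_flags is made up for illustration.

```shell
#!/bin/bash
# tsx_flags FILE: print which of the CPU flags hle and rtm appear in the first
# "flags" line of FILE (normally /proc/cpuinfo). Prints nothing if neither is set.
tsx_flags() {
    grep -m1 '^flags' "$1" | tr ' \t' '\n\n' | grep -x -e hle -e rtm | sort | tr '\n' ' '
}

# Usage:
#   tsx_flags /proc/cpuinfo
```

An empty result suggests the CPU does not advertise these extensions, so the crash would need another explanation.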



Oracle RAC 12.2 High Load on CPU from gdb when Node Missing

Recently I had to battle a new issue with the quite new Oracle Database RAC version 12.2.0.1 on Linux x86_64. The idea of RAC is to compensate for the loss of a node or service by restarting services on other nodes. But in my case, when one node in a two-node cluster was down (or its CRS stack was stopped with crsctl stop crs), there was very high CPU load on the surviving node: three minutes of 100% on four threads, every five minutes. That’s a bit disappointing – if we lose a node, we want all the CPU power of the survivor for our services, not for debugging. However interesting the debug results may be afterwards, the service must stay up!

So where does that load come from? A quick look with top gives some insight: there are several gdb processes running.

Oracle RAC 12.2. high CPU load from GDB (top)

Digging deeper with pstree shows their origin.

Oracle RAC 12.2. high CPU load from GDB (pstree)

Seems like osysmond starts diagsnap.pl due to an error condition, and diagsnap in turn runs the debugger. Okay, once or twice would be fine, but up to six gdb processes at the same time, as in our case, and every five minutes – thanks, but no thanks.
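To watch for this pattern without staring at top, a small sketch like the following can count the debugger processes. It assumes procps-style ps and that the processes show up under the command name gdb. The helper name count_procs is hypothetical.

```shell
#!/bin/bash
# count_procs NAME: read one command name per line on stdin and print how many
# of them are exactly NAME.
count_procs() {
    grep -cx -- "$1" || true
}

# Usage on the surviving node:
#   ps -eo comm= | count_procs gdb            # how many gdb processes right now
#   ps -o pid,ppid,pcpu,args -C gdb           # who they are and who started them
#   pstree -aps <pid-of-one-gdb>              # full ancestry up to osysmond
```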

Diagsnap is used by the Oracle 12.2 Autonomous Health Framework to create diagnostics information in case of cluster issues. Mostly that means data for the management repository database.

I was able to reproduce the issue on all Linux RACs with 12.2 I have running, also verified with at least two customer systems.

Workaround

The Cluster Health Monitor can be configured not to collect this kind of information. The downside: you reduce the amount of data Oracle Support can use to help you diagnose cluster outages. But on the pro side, you will have the CPU power of the surviving node for your services, and that was the plan.

Activate all nodes in the cluster, and on one node, run

~$ oclumon manage -disable diagsnap

Here we go:
oclumon disabling diagsnap

After that, I was not able to reproduce the issue.
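Since the command should be run while all nodes are up, a guard like the one below may help in scripted setups. It is only a sketch: it assumes that olsnodes -s prints one line per node with the node name in the first column and Active or Inactive in the second, and the function name all_nodes_active is made up for illustration.

```shell
#!/bin/bash
# all_nodes_active: read "name status" lines on stdin and succeed only if
# there is at least one node and every node reports the status "Active".
all_nodes_active() {
    awk 'BEGIN { n = 0; bad = 0 }
         NF >= 2 { n++; if ($2 != "Active") bad++ }
         END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}

# Usage on one cluster node (assumes the Grid Infrastructure bin directory is in PATH):
#   olsnodes -s | all_nodes_active && oclumon manage -disable diagsnap
```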

Solution

Oracle told me that a real fix will be part of the first 12.2 PSU, so I expect to see it working after July 17th, 2017.

Readme

You may not be familiar with the Cluster Health Monitor and the oclumon CLI. Here is a good starting point for reading up on the topic:
https://docs.oracle.com/database/121/CWADD/troubleshoot.htm#CWADD92242

 

 

Hope this is helpful – do it like we did and test thoroughly before commissioning your new RAC! :)
Martin Klier



Oracle Enterprise Linux 7: How to stop bash tab completion from escaping the dollar $

Gnu bash logo - By Justindorfman - Own work, CC BY-SA 4.0

Did you ever wonder why bash in Oracle Linux, Red Hat Enterprise Linux and other recent Linuxes escapes the dollar sign ($) when completing paths with the Tab key, like this?

[oracle@oratest01 ~]$ (KLM12102) ls -l \$ORACLE_BASE/diag

Answer

In bash >= 4.2.29 you have a “direxpand” option, which is NOT set by default. The bash manual explains it this way:

direxpand
If set, bash replaces directory names with the results of word expansion when performing filename completion. This changes the contents of the readline editing buffer. If not set, bash attempts to preserve what the user typed.

Solution

Set this option with

shopt -s direxpand

For example in your profile or wherever you feel it is useful.
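If the same profile is shared with hosts running an older bash, the option may not exist there and shopt would print an error at every login. A guarded variant, as a sketch (the variable name direxpand_state is only for illustration):

```shell
#!/bin/bash
# Enable direxpand only where the running shell knows the option.
# On shells without it, stay silent instead of printing an error.
if shopt -s direxpand 2>/dev/null; then
    direxpand_state=on
else
    direxpand_state=unsupported
fi
```

Whether it took effect can be checked with shopt direxpand in an interactive session.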

Bye and be careful, as usual
– Martin



How to disable Oracle ACFS drivers / registry resource

Sometimes an installed ACFS can cause trouble, especially if we don’t or can’t use it (e.g. when not using a UEK Linux kernel, as with SUSE Linux Enterprise Server, SLES). There’s lots of documentation on how to create and maintain ACFS file systems, but how to get rid of ACFS altogether wasn’t so easy to guess. I had to find out how to disable the Oracle ACFS drivers, because the grid infrastructure did not stop successfully when using “crsctl stop crs” or “/etc/init.d/ohasd stop”:
CRS-2799: Failed to shut down resource ‘ora.drivers.acfs’
Thus, it was not possible to upgrade the grid infrastructure from 11.2.0.3 to version 12.1.0.2 (rootupgrade.sh fails on the first node for the same reason, when it tries to stop the CRS).

How to disable Oracle ACFS drivers - runInstaller during upgrade



Chemnitzer Linux Tage 2015

Last weekend, I had the chance to attend the Linux Days Chemnitz. It’s an annual meeting of the German Linux family with roughly 2,500 attendees and a FANTASTIC atmosphere. It was two days of hearing tech talks, enjoying rich nerd content and talkin’ shop.

Great atmosphere in the Great Hall

Great entrance gift: “Free Culture” by Lawrence Lessig. Thank you very much!



New German Linux Forum (forum.linux-talk.de)

Over the last few weeks, some folks have been busy building a new German Linux forum, “forum.linux-talk.de”, since its predecessor was systematically ruined by the commercial owners.

Especially Jean (wdp) and Hendrik (Nilpferd) invested much time and money into building the new environment. So the new forum is completely free of ads and commercials, and the content is QA’ed by a team of experienced Linux admins as moderators.

Please hang out there, and help us to (re)build a cool community.

Cheers
Martin Klier (Usn)



Oracle on Linux: How to hide your password when using a wrapper script

Sometimes, a DBA has to write an externally called wrapper script for one of the various Oracle-supplied commands that accept password input. A prominent and simple example is SQL*Plus (sqlplus).

The Problem

The process list shows all parameters of a command that is currently being executed.

wrapper1.sh

Accepts all connection information on the command line:

#!/bin/bash
 cmdstring="sqlplus ${1}/${2}@${3}"
 echo "Executed command: $cmdstring"
 sleep 999

Called by:

$ ./wrapper1.sh system manager mydb
 Executed command: sqlplus system/manager@mydb

But now, the password is visible in the process list:

$ ps aux | grep wrapper1.sh | grep -v grep
 1000     20769  0.0  0.0  13808  1444 pts/1    S+   15:19   0:00 /bin/bash ./wrapper1.sh system manager mydb
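One common way around this, shown here only as a sketch and not necessarily the approach the full post takes, is to keep the password off every command line: the wrapper takes user and database as arguments, reads the password from standard input, and hands the connect statement to sqlplus through a pipe. The function name build_connect is hypothetical.

```shell
#!/bin/bash
# build_connect USER DB: read the password from stdin and print a SQL*Plus
# connect statement. printf is a shell builtin here, so the password never
# appears as an argument of a separate process.
build_connect() {
    local user="$1" db="$2" password
    IFS= read -r password
    printf 'connect %s/%s@%s\n' "$user" "$password" "$db"
}

# Usage (assumes sqlplus is in PATH):
#   read -rs -p 'Password: ' pw; echo
#   printf '%s\n' "$pw" | build_connect system mydb | sqlplus -s /nolog
```

With sqlplus started as /nolog, the process list only shows "sqlplus -s /nolog".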


Oracle Clusterware root.sh issue: USM driver install actions failed (oracleoks.ko)

As I already said in my last post about “Can’t install ohasd service”, setting up Oracle Clusterware 11.2.0.4 on SUSE Linux Enterprise Server (SLES) 11 SP2 should work flawlessly, but sometimes it does not. :) This time, it was about the USM drivers.

USM driver install actions failed
/u01/app/grid/11.2.0/perl/bin/perl -I/u01/app/grid/11.2.0/perl/lib 
-I/u01/app/grid/11.2.0/crs/install
/u01/app/grid/11.2.0/crs/install/rootcrs.pl execution failed

USM drivers are components (kernel object files, extension .ko) that enable ACFS. I don’t use it on this system, but root.sh (in fact, rootcrs.pl) needs a proper directory structure matching the Linux kernel version. Again, the log file “$GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<hostname>.log” was my friend: it revealed that the problem was related to loading oracleoks.ko. This file lives in the directory “$GRID_HOME/install/usm/Novell/SLES11/x86_64/<your-kernel-version>/default/bin”. The trouble is that good old SLES 11 SP2 has a kernel that was not foreseen by the Oracle folks implementing this piece of software.
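To see which kernel versions the installer actually ships drivers for, the directory can simply be listed and compared with the running kernel. The sketch below uses a hypothetical helper, list_usm_kernels. How the directory names map to the output of uname -r is an assumption you need to verify by eye.

```shell
#!/bin/bash
# list_usm_kernels DIR: print the names of the kernel-version subdirectories
# found directly under DIR, one per line. Prints nothing if there are none.
list_usm_kernels() {
    local dir="$1" entry
    for entry in "$dir"/*/; do
        [ -d "$entry" ] && basename "$entry"
    done
    return 0
}

# Usage (GRID_HOME must be set):
#   list_usm_kernels "$GRID_HOME/install/usm/Novell/SLES11/x86_64"
#   uname -r        # compare with the running kernel
```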



