Oracle: Create a Materialized View with Fast Refresh on commit and debug upcoming errors

Sometimes, our users and customers need rapid access to complex data structures. Materialized Views in their various forms are a very potent way to shift efforts from the query to insert/update time of base data. That’s the nature of a Materialized View: One way or another, we need to put the data into a redundant structure. With Fast Refresh, this way goes through Materialized View Logs, causing additional round trips and buffer modifications during transactions, or correctly, during DML operations.

Anyway, sometimes it’s just the best way to solve a problem, so this is how to create a Materialized View in Oracle Database 12.1 and after, with Fast Refresh on commit including debugging with DBMS_MVIEW.EXPLAIN_MVIEW.

Create a Materialized View with Fast Refresh on commit

Let’s drop the Base View first

drop view MYVIEW;

Define a Base View (not mandatory, direct queries would also work, even better)

create view MYVIEW as 
select a.id,
       b.id,
       c.id,
       a.rowid, -- ROWID is necessary if table has not Primary Key Constraint defined
       b.rowid,
       c.rowid
 from MYTABLE_A a, MYTABLE_B b, MYTABLE_C c
 where a.id=b.aid
       and b.id=c.bid
;

Read more…



Oracle on Windows: ASM instance terminated by LMON / ORA-27300 IPC_TCPConnectCheck failed with status -1

Recently I had an issue with a two-node Grid Infrastructure on Windows 2012R2. After an infrastructure-caused cluster restart (irresponsible SAN hardware patching :) ), everything was running on Node 2, and Node 1 could not join the cluster any more.

No easy solution: On CSSD level, there was no issue (network and disk heartbeat worked, according to ocssd.log). It turned out, that the ASM instance on Node 1 started, but its LMON could not communicate with the already-running ASM on Node 2: Instance terminated by LMON. No really speaking ORA error messages in its alert log.

But on the working Node 2, the ASM alert log shows
ORA-27300: OS system dependent operation:IPC_TCPConnectCheck failed with status: -1

Guessing from the module name, I started thinking about the network – and yes, somebody activated the Windows Firewall on Node 1. Strange that the errors did not show up on the node causing the error, but I was glad to have found the culprit.

How to check Windows firewall:
netsh advfirewall show currentprofile

Syntax that will always help with annoying firewalls, but has to be clarified by security:
netsh advfirewall set currentprofile state off

Lessons learned:

  1. People tend to introducing new problems during fixing others (in this case, messing with the Windows Firewall config during looking for a SAN problem), so DBAs, adapt your thinking to that.
  2. Obvious, but easy to forget: When diagnosing RAC / Clusterware issues, look into logs on all nodes (or build a central ADR)

Us usual, take care and think about the (other) box
Martin



Oracle on SLES12 SP2 – Avoiding Cgroup Task Limit

Once upon a time, there was an Oracle RAC cluster 12.2.0.1 on SUSE Linux Enterprise Server (SLES) 12 SP2, that did not do well. Its database and ASM instances used to fail with:

ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3

Of course, in such a case you will check ulimits -u / limits.conf (nproc) and sysctl.conf (kernel.pid_max). But what if this does not help?

Read more…



Oracle 12.2 sysctl.conf Default Preinstall Settings

This are the currently active settings, the oracle-database-server-12cR2-preinstall.rpm delivers on Oracle Linux 7. Maybe it’s useful for you when setting up Oracle 12.2 on SUSE / SLES where you are on your own.

Maybe you also want to add Huge Pages:

# Performing Databases Huge Page Settings 
# for SGA size 200GB and huge page size of 2MB each
vm.nr_hugepages = 102401

# oracle-database-server-12cR2-preinstall setting for fs.file-max is 6815744
fs.file-max = 6815744

# oracle-database-server-12cR2-preinstall setting for kernel.sem is '250 32000 100 128'
kernel.sem = 250 32000 100 128

# oracle-database-server-12cR2-preinstall setting for kernel.shmmni is 4096
kernel.shmmni = 4096

# oracle-database-server-12cR2-preinstall setting for kernel.shmall is 1073741824 on x86_64
kernel.shmall = 1073741824

# oracle-database-server-12cR2-preinstall setting for kernel.shmmax is 4398046511104 on x86_64
kernel.shmmax = 4398046511104

# oracle-database-server-12cR2-preinstall setting for kernel.panic_on_oops is 1 per Orabug 19212317
kernel.panic_on_oops = 1

# oracle-database-server-12cR2-preinstall setting for net.core.rmem_default is 262144
net.core.rmem_default = 262144

# oracle-database-server-12cR2-preinstall setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304

# oracle-database-server-12cR2-preinstall setting for net.core.wmem_default is 262144
net.core.wmem_default = 262144

# oracle-database-server-12cR2-preinstall setting for net.core.wmem_max is 1048576
net.core.wmem_max = 1048576

# oracle-database-server-12cR2-preinstall setting for net.ipv4.conf.all.rp_filter is 2
net.ipv4.conf.all.rp_filter = 2

# oracle-database-server-12cR2-preinstall setting for net.ipv4.conf.default.rp_filter is 2
net.ipv4.conf.default.rp_filter = 2

# oracle-database-server-12cR2-preinstall setting for fs.aio-max-nr is 1048576
fs.aio-max-nr = 1048576

# oracle-database-server-12cR2-preinstall setting for net.ipv4.ip_local_port_range is 9000 65500
net.ipv4.ip_local_port_range = 9000 65500

 



Oracle RAC 12.2 on SLES12 – How to disable Hardware Lock Elision Support

The Problem

In the last week, I had to set up an Oracle RAC (and thus, the Clusterware) version 12.2.0.1 on SuSE Linux Enterprise Server 12 SP2 (SLES12 Sp2). The gridSetup.sh failed latest in root.sh of the first node with a kinda non-intuitive error:

CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
2017/07/10 10:18:49 CLSRSC-119: Start of the exclusive mode cluster failed
Died at /u01/app/12.2.0/grid_1/crs/install/crsinstall.pm line 2053.
The command ‘/u01/app/12.2.0/grid_1/perl/bin/perl -I/u01/app/12.2.0/grid_1/perl/lib -I/u01/app/12.2.0/grid_1/crs/install /u01/app/12.2.0/grid_1/crs/install/rootcrs.pl ‘ execution failed

Thank you – for nothing.

The rootcrs.log in /tmp/GridAction<date> directory was a bit more enlightening, but not much:

CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
The exlusive mode cluster start failed, see Clusterware alert log for more information
Executing cmd: /u01/app/12.2.0/grid_1/bin/clsecho -p has -f clsrsc -m 119
Command output:
> CLSRSC-119: Start of the exclusive mode cluster failed 
>End Command output
CLSRSC-119: Start of the exclusive mode cluster failed
###### Begin DIE Stack Trace ######
Package File Line Calling 
--------------- -------------------- ---- ----------
1: main rootcrs.pl 287 crsutils::dietrap
2: crsinstall crsinstall.pm 2053 main::__ANON__
3: crsinstall crsinstall.pm 1963 crsinstall::perform_initial_config
4: crsinstall crsinstall.pm 653 crsinstall::perform_init_config
5: crsinstall crsinstall.pm 813 crsinstall::init_config
6: crsinstall crsinstall.pm 380 crsinstall::CRSInstall
7: main rootcrs.pl 446 crsinstall::new
####### End DIE Stack Trace #######

Eh… yes.

The error in the cluster alert log finally was better, and made me curious:

[ORAROOTAGENT(20948)]
CRS-8503: Oracle Clusterware process ORAROOTAGENT with operating system process ID 20948 experienced fatal signal or exception code 11.
Errors in file /u01/app/oracle/diag/crs/myserver08/crs/trace/ohasd_orarootagent_root.trc (incident=1):
CRS-8503 [__lll_unlock_elision()+48] [Signal/Exception: 11] [Instruction Addr: 0x7f5df148a4a0] [Memory Addr: (nil)] [] [] [] [] [] [] [] []

Ah. Of course. :)

Read more…



Oracle RAC 12.2 High Load on CPU from gdb when Node Missing

Recently I had to battle a new issue with the quite-new Oracle Database RAC Version 12.2.0.1 on Linux x86_64. The idea of RAC is, to compensate the loss of a node or service by restarting services on other nodes. But in my case, when one node in a two-node-Cluster was down (or the crs stack stopped with crsctl stop crs), there was very high CPU load on the surviving node. In a pattern of three minutes 100% on four threads, every five minutes. That’s a bit disappointing – if we loose a node, we want all the CPU power of the survivor for our services, not for debugging, as interesting the debug results may be afterwards – but the service must stay up!

Oracle RAC 12.2 High Load on CPU from GDB when Node Missing

So where does that load come from? A quick research with top allows some insight: There are several GDB processes running.

Oracle RAC 12.2. high CPU load from GDB (top)

Oracle RAC 12.2. high CPU load from GDB (top)

Digging deeper with pstree, shows their origin.

Oracle RAC 12.2. high CPU load from GDB (pstree)

Oracle RAC 12.2. high CPU load from GDB (pstree)

Seems like osysmond starts diagsnap.pl due to an error condition. And likewise, diagsnap is running the debugger. Okay, once or twice would be ok, but up to six gdb’s at the same time as in our case, and every five minutes – thanks, but no thanks.

Diagsnap is used by the Oracle 12.2 Autonomous Health Framework to create diagnostics information in case of cluster issues. Mostly that means data for the management repository database.

I was able to reproduce the issue on all Linux RACs with 12.2 I have running, also verified with at least two customer systems.

Workaround

The Cluster Health Monitor can be configured not to collect this kind of information – you will reduce the amount of data Oracle Support can use to help you diagnosing cluster outages. But on the pro side, you will have the CPU power of the surviving node for your services, and that was the plan.

Activate all nodes in the cluster, and on one node, run

~$ oclumon manage -disable diagsnap

Here we go:
oclumon disabling diagsnap

oclumon disabling diagsnap

After that, I was not able to reproduce the issue.

Solution

Oracle told me, that a real fix will be part of the first 12.2 PSU, so I estimate to see it working after July 17th, 2017.

Readme

You may not be familiar with Cluster Health Monitor and the oclumon CLI. This is a good point for starting to read about the topic:
https://docs.oracle.com/database/121/CWADD/troubleshoot.htm#CWADD92242

 

 

Hope this is helpful – do it like we did and test thoroughly before commissioning your new RAC! :)
Martin Klier



Restarting Oracle for Performance is like a Coitus for Virginity

Today I had another case of that … total breakdown of a live environment by infrastructure defect, bad enough. Customer rushing up the database and all apps services, just to see thousands of locks and hanging transactions. Big panic, frequent restarting of databases and applications. Getting worse, finally calling me.

Told them I’ll manage that, asked them to shut down all apps until I tell them to start up. Started the DB, read the newspaper for 10 minutes, then they started calling me. Told them, that I am working on it. After 20mins of reading the paper and answering calls, asked them to start the apps. => Voila, all was well.

 

The answer? Just let your SMON do its job.

 

Ok, this initially was not exactly about performance, but the initial mistake was old Windows-like thinking: Reboot it, a reboot will activate magic self-healing powers. Fail, crashing Oracle (what the service infrastructure did in this case) is a very bad thing to do, and even worse, it “registers a mortgage” on the system: As soon as you start up an instance for this database later, it has not only to crash-recover, but ALSO to roll back all open transactions after the DB was opened. The latter was the problem in the above case: SMON kicks in late. Now the swarm of busy application bees buzzing around after such a crash, has to share resources with SMON. And has to wait for still-open row locks, spinning on them, eating more resources… and now it is a performance issue – a pointless, home made performance catastrophe.

Keep calm
Martin



Oracle RMAN destination with variable restrains “not backed up” clause

Today I was a bit surprised – being always under the impression that Oracle RMAN should accept “backup backupset not backed up to destination ‘xyz'” AND work with it successfully. But I learned that in 11.2.0.4 the Oracle RMAN “to destination” containing variables restrains the “not backed up” clause. No error thrown, but just does not work as I would expect it to. Could not find something in the docs, guess we just have to accept it as a fact.

Oracle RMAN destination with variable restrains “not backed up” clause

What I tried is, to back up the same backupset twice, but with a “not backed up” clause at work. I’d expect that to work once at max, and a skip for the second turn. But as you can see, not:

RMAN> backup backupset 4954 not backed up to destination '/u03/orabackupNAS/sun/backupset/%Y_%M_%D';

Starting backup at 13-FEB-17
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=17295, stamp=935838125, piece=1
channel ORA_DISK_1: starting piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece +FRADG1/sun/autobackup/2017_02_13/s_935838125.6660.935838127
piece handle=/u03/orabackupNAS/sun/backupset/2017_02_13/SUN/backupset/2017_02_13/o1_mf_ncsnf_TAG20170213T110205_db31gdh2_.bkp comment=NONE
channel ORA_DISK_1: finished piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece complete, elapsed time: 00:00:03
Finished backup at 13-FEB-17

Starting Control File and SPFILE Autobackup at 13-FEB-17
piece handle=+FRADG1/sun/autobackup/2017_02_13/s_935838557.6796.935838559 comment=NONE
Finished Control File and SPFILE Autobackup at 13-FEB-17

RMAN> backup backupset 4954 not backed up to destination '/u03/orabackupNAS/sun/backupset/%Y_%M_%D';

Starting backup at 13-FEB-17
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=17295, stamp=935838125, piece=1
channel ORA_DISK_1: starting piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece +FRADG1/sun/autobackup/2017_02_13/s_935838125.6660.935838127
piece handle=/u03/orabackupNAS/sun/backupset/2017_02_13/SUN/backupset/2017_02_13/o1_mf_ncsnf_TAG20170213T110205_db31h5yd_.bkp comment=NONE
channel ORA_DISK_1: finished piece 1 at 13-FEB-17
channel ORA_DISK_1: backup piece complete, elapsed time: 00:00:03
Finished backup at 13-FEB-17

Starting Control File and SPFILE Autobackup at 13-FEB-17
piece handle=+FRADG1/sun/autobackup/2017_02_13/s_935838583.5200.935838585 comment=NONE
Finished Control File and SPFILE Autobackup at 13-FEB-17

Solution – without dynamic path name

Read more…



Oracle Database 12.2 on-premises release dates

Oracle recently announced the availability of Oracle Database 12c release 2 (12.2) for on-premises servers.

https://www.performing-databases.com/en/2017/01/oracle-database-12-2-on-premises-release-dates/

Linux x86-64, Solaris SPARC and Solaris x86-64: 15-Mar-2017

Other supported platforms (Windows x86-64, IBM AIX on POWER, IBM Linux on System z, HP UX Itanium): Q2CY2017

Source:
MOS DOc ID 742060.1



Oracle Enterprise Linux 7: How to stop bash tab completion from escaping the dollar $

Gnu bash logo - By Justindorfman - Own work, CC BY-SA 4.0

Did you ever wonder why bash in Oracle Linux, Red Hat Enterprise Linux and other recent Linuxes escapes the dollar sign ($) when completing paths with the tabulator (tab) key like that:

[oracle@oratest01 ~]$ (KLM12102) ls -l \$ORACLE_BASE/diag

Answer

In bash >= 4.2.29 you have a “direxpand” option, which is NOT set by default. The bash manual explains it this way:

direxpand
If set, bash replaces directory names with the results of word expansion when performing filename completion. This changes the contents of the readline editing buffer. If not set, bash attempts to preserve what the user typed.

Solution

Set this option with

shopt -s direxpand

For example in your profile or wherever you feel it is useful.

Bye and be careful, as usual
– Martin