About

Martin Klier

usn-it.de

Oracle Database 18c changed MRP checkpointing from log boundary to time-based

The gap

The story started with an always-unpleasant alert log message:

2019-08-08T09:46:32.915070+02:00 
PR00 (PID:10646): Media Recovery Waiting for T-1.S-2306 
PR00 (PID:10646): Fetching gap from T-1.S-2306 to T-1.S-2405 
2019-08-08T09:46:32.925843+02:00 
Errors in file /u01/app/oracle/diag/rdbms/xxxp2/XXXP2/trace/XXXP2_mz00_10650.trc: 
ORA-01110: data file 1: '+ORADATA/XXXP2/DATAFILE/system.263.1009981381'

<... for all my data files ...>

2019-08-08T09:48:27.919844+02:00 
PR00 (PID:10646): FAL: Failed to request gap sequence 
PR00 (PID:10646): GAP - thread 1 sequence 2306-2405 
PR00 (PID:10646): DBID 123456789 branch 994167271 
PR00 (PID:10646): FAL: All defined FAL servers have been attempted 
PR00 (PID:10646): ------------------------------------------------------------------------- 
PR00 (PID:10646): Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization 
PR00 (PID:10646): parameter is defined to a value that's sufficiently large 
PR00 (PID:10646): enough to maintain adequate log switch information to resolve 
PR00 (PID:10646): archived redo log gaps. 
PR00 (PID:10646): -------------------------------------------------------------------------

The reason

Looking into v$archived_log: The log sequences 2306-2405 ARE applied on standby. And thus deleted by policy from the Fast Recovery Area.

Okay, this explains the effect, but not the root cause. So we had to open a SR with Oracle Support.

The real reason

Seems we have  BUG 29056767 – DATAFILES CHECKPOINT NOT UPDATED AT STANDBY DATABASE WHEN MRP IS ENABLED 
As a workaround to clarify if this really is the issue, MOS suggested to set an underscore parameter: _time_based_rcv_ckpt_target”=0
The fix for it is included in RU 19.4 and 20.1. FOr 18.x you have to request a backport as of today.

Yes, it worked as far as I can see as of today, but what does it do? It re-sets the so-called “time-based recovery checkpoint target” from 180 seconds to zero. What does THIS mean?

The change in 18c

Oracle changed the checkpointing behaviour of the Media Recovery Process (MRP) from “logfile boundary” to “time-based, every 180 seconds”. (And obviously messed up to catch all cases in conjunction with Data Guard…)

For my understanding, that’s a big change in the way an Oracle Database instance works – the check point at log switch is one of the fundamentals taught to DBAs all over the world. So is the similarity of work being done within Primary and Standby in a Data Guard environment. In this case, Oracle changed both and made the Standby side (again) a bit more specialized for its purpose. I think it’s time (again) to re-think our understanding of the Big Red DB Engine, at least for a part of it.

I can only guess the reasons for a change like this: Maybe they wanted to reduce the impact at log switch time, and reduce potential data loss to maximum of 180 seconds?

For sure we can expect much change in Oracle in the future – they went out of the monolith corner, and had a tendency into highly specialized, but smaller background processes for some time now. Next logical step is to adapt old structures and procedures to being more dynamic and specialized. I am looking forward to it!

Take care
Martin Klier

DOAG Regionalgruppe Nürnberg Anfahrt / Map

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.