How to fail over a client transparently in a dataguard switchover/failover scenario

Hi,

with using dataguard (Oracle’s hot/cold standby database solution) you have the challenge to make a failover or a roleswitch between the primary and the standby database transparent to the clients. If you don’t, you will have a (partially) loss of service: Clients tnsnames.ora’s will need reconfiguration or the application has to use another TNS connection name now. This paper shows a different, very smart way. If you don’t want to read the whole thing and just have OCI and sql*plus clients to service, have a look at my quick summary. I tested it and use it for production now.

You have:

  1. a working dataguard setup with two database servers, one is primary, the other one is (physical) standby
  2. one or more client(s) with oracle client installed – I used 10.2.0.3, don’t know for sure if older versions are possible, too.

Use this as service entry in your client’s tnsnames.ora:

TESTDG =
 (DESCRIPTION =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = host1)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = host2)(PORT = 1521))
    (LOAD_BALANCE = no)
   )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = HA-SERVICE)
    )
  )

With this syntax, you make sure that the client always uses the ADDRESS entry that provides the specified SERVICE_NAME. If not, the entry will be left out.

Now we have to specify the SERVICE_NAME parameter on the database dynamically: If the system is the primary, please add “HA-SERVICE” to the SERVICE_NAME parameter, and if it is the standby, remove the entry from there.
Since I don’t like to do that by myself, I talked Oracle into doing it for me. So first, we create a service – why? Because if a service is started, its network name is added to the SERVICE_NAME parameter list! :)

Execute this on your primary DB (you might need to execute

?/rdbms/admin/dbmssrv.sql

first to create DBMS_SERVICE):

exec DBMS_SERVICE.CREATE_SERVICE(
   service_name => 'HA-SERVICE',
   network_name => 'HA-SERVICE',
   aq_ha_notifications => true,
   failover_type => 'SELECT',
   failover_retries => 180,
   failover_delay=> 1);

The procedure should work out-of-the-box. Now check, if you can start and stop the service, and if its NETWORK_NAME shows up in the SERVICE_NAME parameter of your primary DB:

exec DBMS_SERVICE.START_SERVICE('HA-SERVICE');
show parameter service_name;
exec DBMS_SERVICE.STOP_SERVICE('HA-SERVICE');

If you made a mistake, DBMS_SERVICE.DELETE_SERVICE or DBMS_SERVICE.MODIFY_SERVICE might be your friend. But if all went well, let’s proceed.
Now, the joke continues: You can set up a trigger “after startup on database” in schema SYS – do you hear me?

create or replace trigger manage_HASERVICE
after startup on database
DECLARE
   role VARCHAR(30);
BEGIN
   select database_role into role from v$database;
   if role = 'PRIMARY' then
      DBMS_SERVICE.START_SERVICE('HA-SERVICE');
   else
      DBMS_SERVICE.STOP_SERVICE('HA-SERVICE');
   end if;
END;

We are asking V$DATABASE for the instance’s role, and decide whether to start the freshly created service or not. We are done – I love it. Don’t forget to fire a log switch if you don’t use real time apply – it’s no harm if you do it anyway:

alter system switch logfile;
alter system checkpoint;

Now test the scenario with a graceful switchover and check if the SERVICE_NAME parameter is ste as expected. If yes, perform the end to end test with your client(s). With a recent client and

failover_type => 'SELECT'

specified (it is if you did what I suggested above), you can start a SQL select on node1 as primary, perform a dataguard switchover while it runs, and get your statement finished on the new primary node2.

You’re done!

Additionally just a litte backgroud to the TNS (Transparent Network Substrate): It’s a classical layer 5 protocol. That’s good, because during the switchover/failover process your TCP connections will be terminated by option

aq_ha_notifications => true

mentioned above. But your SQL session will persist as long as your client-specific timeouts in SQLNET.ORA are allowing that. So TNS makes a session failover possible at all: The client does a reconnect on layer 4 (tcp), finds HA-SERVICE not provided by node1 and continues to node2. Smart thing, all you need to do is waiting for the new primary machine. By the way, this is exactly how client failover in RAC environments works.

Have a good time and take care,
Usn




You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

4 Responses to “How to fail over a client transparently in a dataguard switchover/failover scenario”

  1. Anju Garg Says:

    Here are two related links:

    Automatic client failover
    http://oracleinaction.com/client-failover-dg/

    Automatic job failover
    http://oracleinaction.com/job-failover-dg/

  2. Dave Gregoire Says:

    This worked perfectly with an 11.2.0.2 database and listener. It also worked when the 11.2 listener was shutdown and a 10.2.0.4 listener was brought up. But, when I create a 10.2.0.4 database and rant through the exact scenario, the clients will not connect using the service name. I wondered if the ‘-’ in the service name could be a problem, removed and recreated the service name without the dash but still no connection. Yes, the database is up, and I can connect remotely using a connect string that connected via SID.

    Any thoughts or comments would be appreciated as this really is great with the 11.2 database…

    -dave

  3. Vish Says:

    Hi, We have 2-node RAC primary and single node physical stby. We need to test failover where we want to pull the plug from Primary site (invoking site failure). Could you please guide me how to convert the new Primary (stby) to Stby again? How to make Primary site work as primary again. Your speedy reponse would be much appreciated

  4. usn Says:

    Hi Vish,

    the easiest way is to have flashback logs active on primary, and use them to flashback the DB back to before failover.

    Regards
    Martin

Leave a Reply