Convert a Physical Standby to a Snapshot Standby Database in Oracle Data Guard

A snapshot standby database still receives redo data from the primary, but it does not apply that redo until it is converted back to a physical standby. Keep in mind that a snapshot standby database cannot be the target of a switchover or failover; it must be converted back to a physical standby before performing a role transition. Flashback Database technology is used in the conversion process, so the fast recovery area (formerly the flash recovery area) must be configured.

This document will detail the steps to manually convert a physical standby to a snapshot standby.
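Before starting, you can confirm that the fast recovery area is configured on the standby (a quick check; the values shown will depend on your environment):

SQL> show parameter db_recovery_file_dest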

Convert the Physical Standby Database into a Snapshot Standby Database

On the standby database, stop redo apply.

SQL> alter database recover managed standby database cancel;

Database altered.

SQL> 
Next, convert the standby database to a snapshot standby.


SQL> alter database convert to snapshot standby;

Database altered.

SQL> 
Once the conversion is complete, all that is left is to open the database.


SQL> alter database open;

Database altered.

SQL> 
You can verify the role change by querying DATABASE_ROLE in V$DATABASE.


SQL> select database_role from v$database;

DATABASE_ROLE
----------------
SNAPSHOT STANDBY

SQL> 
While the standby is in snapshot standby mode, you are free to run transactions against it.
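For example, you could create and populate a throwaway test table on the snapshot standby (an illustration only; the table name is made up):

SQL> create table test_snap (id number, note varchar2(50));

Table created.

SQL> insert into test_snap values (1, 'updated on the snapshot standby');

1 row created.

SQL> commit;

Commit complete.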

While in snapshot standby mode, the database still continues to receive redo data from the primary, but it does not apply it. You can verify redo transport by switching logs on the primary and checking the alert log on the standby.
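For example, force a log switch on the primary:

SQL> alter system switch logfile;

System altered.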


Thu Jun 17 11:15:41 2010
RFS[6]: Selected log 5 for thread 1 sequence 981 dbid 459961910 branch 719914169
Thu Jun 17 11:15:41 2010
Archived Log entry 1513 added for thread 1 sequence 980 ID 0x1b7c5492 dest 2:
RFS[6]: Selected log 4 for thread 1 sequence 982 dbid 459961910 branch 719914169
Thu Jun 17 11:15:42 2010
When the standby was converted to a snapshot standby, a guaranteed restore point was created. You can see this in the alert log for the standby.


Thu Jun 17 09:44:44 2010
RVWR started with pid=30, OS id=9171
Allocated 3981120 bytes in shared pool for flashback generation buffer
Created guaranteed restore point SNAPSHOT_STANDBY_REQUIRED_06/17/2010 09:44:44
krsv_proc_kill: Killing 3 processes (all RFS)
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival

When the snapshot standby is converted back into a physical standby, this restore point is used to flash the standby back to its state prior to the conversion. Any operation performed on the snapshot standby that cannot be reversed with Flashback Database will prevent it from being converted back to a physical standby.
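You can also confirm the guaranteed restore point from SQL on the standby; the name will match the one recorded in the alert log:

SQL> select name, guarantee_flashback_database from v$restore_point;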

Convert the Snapshot Standby Database back to a Physical Standby Database

Shut down the snapshot standby database and bring it back up in the mount state.


SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount 
ORACLE instance started.

Total System Global Area  830930944 bytes
Fixed Size                  2217912 bytes
Variable Size             603981896 bytes
Database Buffers          222298112 bytes
Redo Buffers                2433024 bytes
Database mounted.
SQL> 

Next convert the snapshot to a physical standby.

SQL> alter database convert to physical standby;

Database altered.

SQL>

In the standby alert log you can see that Flashback restore completed and the restore point was dropped.


Thu Jun 17 11:48:29 2010
alter database convert to physical standby
ALTER DATABASE CONVERT TO PHYSICAL STANDBY (standby)
krsv_proc_kill: Killing 4 processes (all RFS)
Flashback Restore Start
Flashback Restore Complete
Stopping background process RVWR
Deleted Oracle managed file /u01/app/flash_recovery_area/STANDBY/flashback/o1_mf_61nf6w8g_.flb
Deleted Oracle managed file /u01/app/flash_recovery_area/STANDBY/flashback/o1_mf_61ngk05r_.flb
Guaranteed restore point  dropped
Clearing standby activation ID 461943091 (0x1b88b133)
The primary database controlfile was created using the
'MAXLOGFILES 16' clause.
There is space for up to 13 standby redo logfiles
Use the following SQL commands on the standby database to create
standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 52428800;
Completed: alter database convert to physical standby

Shut down the database and bring it back to the mount state.


SQL> shutdown immediate
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area  830930944 bytes
Fixed Size                  2217912 bytes
Variable Size             603981896 bytes
Database Buffers          222298112 bytes
Redo Buffers                2433024 bytes
Database mounted.
SQL> 
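If redo apply does not resume automatically (for example, when the configuration is not managed by the Data Guard broker), you can restart it manually:

SQL> alter database recover managed standby database using current logfile disconnect;

Database altered.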
If you take a look at the alert log, you will see that the archive logs shipped while the standby was a snapshot standby are now applied.


Media Recovery Log /u01/app/oracle/oradata/standby/arch/1_970_719914169.dbf
Media Recovery Log /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_971_719914169.dbf
Media Recovery Log /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_972_719914169.dbf
Media Recovery Log /u01/app/oracle/oradata/standby/arch/1_973_719914169.dbf
Media Recovery Log /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_974_719914169.dbf
Media Recovery Log /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_975_719914169.dbf
Media Recovery Log /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_976_719914169.dbf
Media Recovery Log /u01/app/oracle/oradata/standby/arch/1_977_719914169.dbf

Using a snapshot standby, you can temporarily leverage your standby for testing or other special purposes while still protecting your primary database.

Oracle Data Guard Switchover and Failover

1. [PRIMARY] Switch log file on primary database.

SQL> alter system switch logfile;
2. [PRIMARY] Check switchover status before switching database.

SQL> select switchover_status from v$database;
You should see “TO_STANDBY” as the result.

3. [PRIMARY] Switch primary database to standby database.

SQL> alter database commit to switchover to physical standby with session shutdown;

SQL> shutdown immediate;

SQL> startup nomount;

SQL> alter database mount standby database;
4. [PRIMARY] Defer redo shipping to the standby, since the standby database has not yet been switched to the primary role.

SQL> alter system set log_archive_dest_state_2=defer;
5. [Standby] Switch standby database to primary. Check switchover status before switching database.

SQL> select switchover_status from v$database;
You should see “TO_PRIMARY” as the result. Now let’s switch:

SQL> alter database commit to switchover to primary;

SQL> shutdown immediate;

SQL> startup;
The switchover process is now successfully complete.
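At this point you can verify the new role on each side, and on the new primary make sure the redo transport destination pointing at the new standby is enabled (the destination number is an assumption; adjust it to your configuration):

SQL> select database_role, open_mode from v$database;

SQL> alter system set log_archive_dest_state_2=enable;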
6. [PRIMARY] Start the real-time redo apply process (the original primary is now the standby).

SQL> recover managed standby database using current logfile disconnect;
Finally, let’s open the database in “Read Only with Apply” mode:

SQL> recover managed standby database cancel;

SQL> alter database open;

SQL> recover managed standby database using current logfile disconnect;
FAILOVER:

In short, a failover is performed when the production (primary) database is lost and the standby database is activated as the new primary. It is not reversible: after a failover, the old primary must be re-created as a standby database. The steps to take in case of a failover are:

(Important note: [PRIMARY] denotes the primary server and [STANDBY] the standby server.)

1. [PRIMARY] If the primary database is still accessible and running, flush any unsent redo to the standby database (standby_db_name is the DB_UNIQUE_NAME of the target standby).

SQL> alter system flush redo to standby_db_name;

SQL> alter system archive log current;
If you don’t receive an error, you can continue with step 5; in this case, the system can be opened with zero data loss. If you receive an error, continue with step 2 to open the system with minimal data loss.

2. [STANDBY] Run the following query to find the highest received archive log sequence number for each thread.

SQL> SELECT UNIQUE THREAD# AS THREAD, MAX(SEQUENCE#) OVER (PARTITION BY thread#) AS LAST from V$ARCHIVED_LOG;
3. [PRIMARY to STANDBY] If you can access archive logs that were not shipped to the standby, copy them over. After copying, register the archive log files with the standby database. This must be done for every thread.

SQL> alter database register physical logfile '/oracle/ora11g/dbs/arch/TALIP_991834413_1_102.arc';
4. [STANDBY] Check the standby database for a redo gap. If there is a gap, copy the missing archive log files and register them.

SQL> SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;

SQL> alter database register physical logfile '/oracle/ora11g/dbs/arch/TALIP_991834413_1_101.arc';
Repeat until the gap query above returns no rows.

5. [STANDBY] Stop the redo apply process on the standby database.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
6. [STANDBY] Finish applying the archive logs copied from the primary.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
If you get an error, it means some redo logs were not applied; revisit steps 2 and 4. Alternatively, you can continue with the following command:

SQL> ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE;
In that case, you can open the database in step 8. If you get no error, continue with step 7.

7. [STANDBY] Switch the standby database to the primary role.

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
8. [STANDBY] Open the database.

SQL> ALTER DATABASE OPEN;
After opening the standby database as the primary via failover, you must take a full backup.
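For example, a full database backup with RMAN (a minimal sketch):

RMAN> backup database plus archivelog;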

Oracle Data Guard Protection Modes

Maximum Protection
In Maximum Protection mode, a transaction is only confirmed as committed when its redo has been written both locally and to at least one standby redo log file. If the standby database or the network between the databases breaks down, transactions can no longer be performed and the primary database shuts down automatically. Oracle recommends using Maximum Protection mode only if at least two standby databases exist.

For the Maximum Protection mode the following parameters must be set for the Redo transport:

AFFIRM
SYNC
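A redo transport destination for Maximum Protection might be configured like the following sketch; the service name and DB_UNIQUE_NAME are placeholders for your environment, and raising the mode to Maximum Protection requires the primary to be mounted but not open:

SQL> alter system set log_archive_dest_2='SERVICE=standby_tns SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby' scope=both;

SQL> alter database set standby database to maximize protection;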
Maximum Availability
This mode is a compromise between data security and performance. At first, Maximum Availability mode works just like Maximum Protection mode: transactions are transmitted synchronously, and the commit is only confirmed when the redo is saved both locally and in at least one standby redo log file. With the Oracle 12c “Fast Sync” feature (SYNC NOAFFIRM), performance can be increased a bit by confirming the transaction as soon as the redo reaches the standby’s memory, before it is written to disk.

Unlike Maximum Protection mode, the primary database continues working after a short time if the standby breaks down or a network error occurs: it temporarily behaves like Maximum Performance mode, meaning transactions are committed immediately. Once the standby database is available again, redo transport automatically resynchronizes and synchronous operation resumes.

The following parameters are responsible for the redo log transport in Maximum Availability mode:

AFFIRM
SYNC
NET_TIMEOUT
The NET_TIMEOUT attribute (default 30) specifies the time in seconds the primary waits for an acknowledgement from the standby before it continues as in Maximum Performance mode.
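For Maximum Availability the destination looks similar, with NET_TIMEOUT added (again, service name and DB_UNIQUE_NAME are placeholders):

SQL> alter system set log_archive_dest_2='SERVICE=standby_tns SYNC AFFIRM NET_TIMEOUT=30 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby' scope=both;

SQL> alter database set standby database to maximize availability;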

Maximum Performance
The Maximum Performance mode is used when the performance of the primary database must not be compromised. Transactions are confirmed as soon as they are saved in the local redo log files, and the redo is transmitted asynchronously to the standby database. In case of a breakdown of the primary database, you must therefore expect a loss of transactions.

The following parameters are responsible for the redo log transport here:

NOAFFIRM
ASYNC
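An asynchronous destination for Maximum Performance might look like this (placeholders as before), and you can confirm the active mode in V$DATABASE:

SQL> alter system set log_archive_dest_2='SERVICE=standby_tns ASYNC NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby' scope=both;

SQL> select protection_mode, protection_level from v$database;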

Oracle RAC Grid Daemons and Background Processes


Oracle Cluster Specific Daemons:
Crsd :
The CRS daemon (crsd) manages cluster resources based on configuration information that is stored in Oracle Cluster Registry (OCR) for each resource. This includes start, stop, monitor, and failover operations. The crsd process generates events when the status of a resource changes.

Cssd :
Cluster Synchronization Service (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using certified third-party clusterware, then CSS interfaces with your clusterware to manage node membership information. CSS has three separate processes: the CSS daemon (ocssd), the CSS Agent (cssdagent), and the CSS Monitor (cssdmonitor). The cssdagent process monitors the cluster and provides input/output fencing. This service was formerly provided by the Oracle Process Monitor daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure results in Oracle Clusterware restarting the node.
Diskmon :
Disk Monitor daemon (diskmon): Monitors and performs input/output fencing for Oracle Exadata Storage Server. As Exadata storage can be added to any Oracle RAC node at any point in time, the diskmon daemon is always started when ocssd is started. 
Evmd :
Event Manager (EVM): Is a background process that publishes Oracle Clusterware events 
Mdnsd :
Multicast domain name service (mDNS): Allows DNS requests. The mDNS process is a background process on Linux and UNIX, and a service on Windows. 
Gnsd :
Oracle Grid Naming Service (GNS): Is a gateway between the cluster mDNS and external DNS servers. The GNS process performs name resolution within the cluster. 
Ons :
Oracle Notification Service (ONS): Is a publish-and-subscribe service for communicating Fast Application Notification (FAN) events 
Oraagent :
oraagent: Extends clusterware to support Oracle-specific requirements and complex resources. It runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1). 
Orarootagent :
Oracle root agent (orarootagent): Is a specialized oraagent process that helps CRSD manage resources owned by root, such as the network, and the Grid virtual IP address 
Oclskd :
Cluster kill daemon (oclskd): Handles instance/node evictions requests that have been escalated to CSS .
Gipcd :
Grid IPC daemon (gipcd): Is a helper daemon for the communications infrastructure 
Ctssd :
Cluster Time Synchronization Service daemon (ctssd): Manages time synchronization between nodes, rather than depending on NTP.
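Most of these daemons run as OHASD-managed resources, and you can list their status with crsctl (run as root or the Grid Infrastructure owner):

$ crsctl stat res -t -init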

 RAC Background Process:

LMSn — Global Cache Service Process: Mainly handles the Cache Fusion part. It maintains the consistent copies of blocks that are transferred between instances and services lock requests received from LMD. It rolls back any uncommitted transactions. Up to 10 LMS processes can run, and they can be started dynamically if demand requires. It also handles global deadlock detection and monitors for lock conversion timeouts.

LMON    — Global Enqueue Service Monitor: This process manages the GES and maintains the consistency of GCS memory in case of process death. It is also responsible for cluster reconfiguration and lock reconfiguration.

LMD     — Global Enqueue Service Daemon: Manages enqueue manager service requests for the GCS. It also handles deadlock detection and remote resource requests from other instances.

LCK0    — Instance Enqueue Process: Manages instance resource requests and cross-instance call operations for shared resources. It builds a list of invalid lock elements and validates lock elements during recovery.

DIAG    — Diagnosability Daemon: Captures diagnostic data for instance and process failures.

GCS ensures a single system image of the data even though the data is accessed by multiple instances.

GES maintains or handles the synchronization of the dictionary cache, library cache, transaction locks, and DDL locks. In other words, GES manages enqueues other than data blocks. To synchronize access to the data dictionary cache, latches are used in exclusive (X) mode in single-node databases, while global enqueues are used in cluster database mode.

RAC/Grid Startup Sequence in Oracle 11gR2


OHASD has access to the OLR (Oracle Local Registry). OHASD reads the OLR contents and initializes accordingly.

OHASD brings up the GPnP daemon (ora.gpnpd) and the CSS daemon (ora.cssd).

The CSS daemon has access to the GPnP profile stored on the local file system. I even found a copy of the GPnP profile stored directly in the OLR (in Oracle 12c Release 2).

The voting file locations on ASM disks are accessed by CSSD via well-known pointers in the ASM disk headers, so CSSD is able to complete initialization and start or join an existing cluster.

OHASD starts an ASM instance. The ASM instance uses special code to locate the contents of the ASM SPFILE, if it is stored in a Diskgroup. 

With an ASM instance operating and its Diskgroups mounted, access to Clusterware’s OCR is available to CRS.

OHASD then starts the CRSD daemon (ora.crsd) with access to the OCR in an ASM diskgroup.

And thus Clusterware completes initialization and brings up other cluster managed resources defined in OCR.

Level 1: OHASD Spawns:
  • cssdagent – Agent responsible for spawning CSSD.
  • orarootagent – Agent responsible for managing all root owned ohasd resources.
  • oraagent – Agent responsible for managing all oracle owned ohasd resources.
  • cssdmonitor – Monitors CSSD and node health (along with the cssdagent).
Level 2: OHASD rootagent spawns:
  • CRSD – Primary daemon responsible for managing cluster resources.
  • CTSSD – Cluster Time Synchronization Services Daemon
  • Diskmon
  • ACFS (ASM Cluster File System) Drivers
Level 2: OHASD oraagent spawns:
  • MDNSD – Used for DNS lookup
  • GIPCD – Used for inter-process and inter-node communication
  • GPNPD – Grid Plug & Play Profile Daemon
  • EVMD – Event Monitor Daemon
  • ASM – Resource for monitoring ASM instances
Level 3: CRSD spawns:
  • orarootagent – Agent responsible for managing all root owned crsd resources.
  • oraagent – Agent responsible for managing all oracle owned crsd resources.
Level 4: CRSD rootagent spawns:
  • Network resource – To monitor the public network
  • SCAN VIP(s) – Single Client Access Name Virtual IPs
  • Node VIPs – One per node
  • ACFS Registry – For mounting ASM Cluster File System
  • GNS VIP (optional) – VIP for GNS
Level 4: CRSD oraagent spawns:
  • ASM Resource – ASM Instance(s) resource
  • Diskgroup – Used for managing/monitoring ASM diskgroups.
  • DB Resource – Used for monitoring and managing the DB and instances
  • SCAN Listener – Listener for single client access name, listening on SCAN VIP
  • Listener – Node listener listening on the Node VIP
  • Services – Used for monitoring and managing services
  • ONS – Oracle Notification Service
  • eONS – Enhanced Oracle Notification Service
  • GSD – For 9i backward compatibility
  • GNS (optional) – Grid Naming Service – Performs name resolution
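
Once all levels are up, you can verify the health of the stack and list the cluster-managed resources with crsctl (output varies by configuration):

$ crsctl check crs
$ crsctl stat res -t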