| Oracle® Real Application Clusters Administrator's Guide 10g Release 1 (10.1) Part Number B10765-01 |
This chapter explains instance recovery and how to use Recovery Manager (RMAN) to back up and restore Oracle Real Application Clusters (RAC) databases. It also describes parallel backup and recovery with SQL*Plus.
Instance failure occurs when software or hardware problems disable an instance. After instance failure, Oracle automatically uses the online redo logs to perform recovery as described in this section.
Instance recovery in RAC does not include the recovery of applications that were running on the failed instance. Oracle clusterware restarts the instance automatically. You can also use callout programs, as described in the example in the Oracle Real Application Clusters Deployment and Performance Guide, to trigger application recovery.
Applications that were running continue by using failure recognition and recovery. This provides consistent and uninterrupted service in the event of hardware or software failures. When one instance performs recovery for another instance, the surviving instance reads online redo logs generated by the failed instance and uses that information to ensure that committed transactions are recorded in the database. Thus, data from committed transactions is not lost. The instance performing recovery rolls back transactions that were active at the time of the failure and releases resources used by those transactions.
Note: All online redo logs must be accessible for instance recovery. Therefore, Oracle recommends that you mirror your online redo logs.
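One way to check whether the online redo logs are mirrored is to count the members of each group. The following is a minimal SQL*Plus sketch using the V$LOG and V$LOGFILE views; the file path and group number in the ALTER DATABASE statement are hypothetical placeholders, and in a RAC database the new member would need to reside on storage that every instance can access.

```sql
-- Count the members of each online redo log group, by redo thread.
SELECT thread#, group#, members, status
  FROM v$log
 ORDER BY thread#, group#;

-- List the member files to confirm that they are on separate devices.
SELECT group#, member
  FROM v$logfile
 ORDER BY group#;

-- Add a second member to a group that has only one.
-- The path and group number are illustrative only.
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/rac/redo01b.log' TO GROUP 1;
```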
When multiple node failures occur, as long as one instance survives, RAC performs instance recovery for any other instances that fail. If all instances of a RAC database fail, then Oracle automatically recovers the instances the next time one instance opens the database. The instance performing recovery can mount the database in either shared or exclusive mode from any node of a RAC database. This recovery procedure is the same for Oracle running in shared mode as it is for Oracle running in exclusive mode, except that one instance performs instance recovery for all the failed instances.
Oracle provides RMAN for backing up and restoring the database. RMAN enables you to back up, copy, restore, and recover datafiles, control files, SPFILEs, and archived redo logs. RMAN is included with the Oracle server and it is installed by default. You can run RMAN from the command line or you can use it from the Backup Manager in Oracle Enterprise Manager. The procedures for using RMAN in RAC environments do not differ substantially from those for Oracle single-instance environments. Refer to the Oracle Backup and Recovery documentation set for more information about single-instance RMAN backup procedures.
RMAN can restore the server parameter file either to the default location or to a location that you specify. This procedure is described in Oracle Database Backup and Recovery Basics.
You cannot specify a net service name that uses Oracle Net to distribute RMAN connections to more than one instance. In any RMAN connection made through a net service name, each net service name must specify only one instance. This applies to all RMAN connections, whether from the command line or through the CONNECT clause in ALLOCATE CHANNEL or CONFIGURE CHANNEL commands. For example:
% rman TARGET SYS/oracle@node2 CATALOG rman/cat@catdb
When making backups using channels connected to different instances, each allocated channel can connect to a different instance in the cluster, and each channel connection must resolve to only one instance. For example, configure automatic channels as follows:
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node3';
During a backup, the instances to which the channels connect must be either all mounted or all open. For example, if the node1 instance has the database mounted while the node2 and node3 instances have the database open, then the backup fails.
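Before starting a backup, it can help to confirm that every instance used by a channel reports the same state. The following SQL*Plus sketch queries V$INSTANCE on each node in turn; it assumes the net service names node1, node2, and node3 and the SYS credentials used in the channel configuration shown earlier.

```sql
-- Connect to each instance in turn and report its state.
-- STATUS should be MOUNTED on every instance or OPEN on every instance.
CONNECT SYS/oracle@node1 AS SYSDBA
SELECT instance_name, status FROM v$instance;

CONNECT SYS/oracle@node2 AS SYSDBA
SELECT instance_name, status FROM v$instance;

CONNECT SYS/oracle@node3 AS SYSDBA
SELECT instance_name, status FROM v$instance;
```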
In some cluster database configurations, some nodes of the cluster have faster access to certain datafiles than to other datafiles. RMAN automatically detects this, which is known as node affinity awareness. When deciding which channel to use to back up a particular datafile, RMAN gives preference to the nodes with faster access to the datafiles that you want to back up. For example, if you have a three-node cluster, and if node 1 has faster read/write access to datafiles 7, 8, and 9 than the other nodes, then node 1 has greater node affinity to those files than nodes 2 and 3.
To use node affinity, configure RMAN channels on the nodes of the cluster that have affinity to the datafiles you want to back up. For example, use the syntax:
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
Refer to Oracle Database Recovery Manager Reference for more information about the CONNECT clause of the CONFIGURE CHANNEL statement.
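After configuring channels, you may want to confirm what RMAN has stored. The following sketch uses the RMAN SHOW command at the RMAN prompt to display the persistent settings that the CONFIGURE commands create.

```sql
# Display the default device type, the configured parallelism,
# and the per-channel settings (including any CONNECT strings).
SHOW DEFAULT DEVICE TYPE;
SHOW DEVICE TYPE;
SHOW CHANNEL;
```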
BACKUP command. For example, assume that you run the following command on node 1 of a three-node cluster:
BACKUP DATABASE PLUS ARCHIVELOG;
In this case, RMAN attempts to back up all datafiles, archived redo logs, and SPFILEs. Because the datafiles are either cluster file system files or files on a shared disk, RMAN can read them. However, RMAN cannot back up any of the archived redo logs that the local node cannot read. The archiving scenarios in Chapter 6, "Configuring Recovery Manager and Archiving", explain how to configure the environment so that all archived redo logs are accessible by the node performing the backup.
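To see where each instance has written its archived redo logs, and therefore which of them the backup node needs to be able to read, you can query the control file records. This is a minimal SQL*Plus sketch using the V$ARCHIVED_LOG view; the NAME column holds the path as seen by the archiving instance.

```sql
-- List archived redo logs that are still recorded as present on disk,
-- grouped by redo thread (one thread for each instance).
SELECT thread#, sequence#, name
  FROM v$archived_log
 WHERE deleted = 'NO'
 ORDER BY thread#, sequence#;
```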
The BACKUP command must be able to delete the archived redo logs from disk after backing them up. The following script is an example of one method for deleting the archived redo logs from each node after backing them up:
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node1';
DELETE ARCHIVELOG LIKE '%arc_dest_1%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node2';
DELETE ARCHIVELOG LIKE '%arc_dest_2%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node3';
DELETE ARCHIVELOG LIKE '%arc_dest_3%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
When configuring the backup options for RAC, you have three possible configurations:
Network Backup Server. A dedicated backup server performs and manages backups for the cluster and the cluster database. None of the nodes have local backup appliances.
One Local Drive. One node has access to a local backup appliance and performs and manages backups for the cluster database. All nodes of the cluster should be on a cluster file system to be able to read all datafiles, archived redo logs, and SPFILEs. Oracle recommends that you do not use the non-cluster file system archiving scheme if you have backup media on only one local drive.
Multiple Drives. Each node has access to a local backup appliance and can write to its own local backup media.
In the cluster file system scheme, any node can access all the datafiles, archived redo logs, and SPFILEs. In the non-cluster file system scheme, you must write the backup script so that the backup is distributed to the correct drive and path for each node. For example, node 1 can back up the archived redo logs whose path names begin with /arc_dest_1, node 2 can back up the archived redo logs whose path names begin with /arc_dest_2, and node 3 can back up the archived redo logs whose path names begin with /arc_dest_3.
RMAN automatically performs autolocation of all files that it needs to back up or restore. This feature is automatically enabled whenever the allocated channels use different CONNECT or PARMS settings.
If you use the non-cluster file system local archiving scheme, then a node can only read the archived redo logs that were generated by an instance on that node. RMAN never attempts to back up archived redo logs on a channel it cannot read.
During a restore operation, RMAN automatically performs the autolocation of backups. A channel connected to a specific node only attempts to restore files that were backed up to the node. For example, assume that log sequence 1001 is backed up to the drive attached to node 1, while log 1002 is backed up to the drive attached to node 2. If you then allocate channels that connect to each node, then the channel connected to node 1 can restore log 1001 (but not 1002), and the channel connected to node 2 can restore log 1002 (but not 1001).
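The restore example above might be sketched as the following RMAN job. The channel names t1 and t2 are arbitrary, the connect strings reuse the SYS/oracle credentials from the earlier examples, and the THREAD 1 clause is an assumption that both logs belong to redo thread 1; adjust it to match the thread that generated the logs.

```sql
RUN {
  # One channel per node; each connect string must resolve to a single instance.
  ALLOCATE CHANNEL t1 DEVICE TYPE sbt CONNECT 'SYS/oracle@node1';
  ALLOCATE CHANNEL t2 DEVICE TYPE sbt CONNECT 'SYS/oracle@node2';
  # Autolocation lets each channel restore only the backups on its own drive.
  # Assumption: sequences 1001 and 1002 both belong to thread 1.
  RESTORE ARCHIVELOG FROM SEQUENCE 1001 UNTIL SEQUENCE 1002 THREAD 1;
}
```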
This section describes the options for backup scenarios.
In a cluster file system backup scheme, each node in the cluster has read access to all the datafiles, archived redo logs, and SPFILEs. This includes Automatic Storage Management (ASM), cluster file systems, and Network Attached Storage (NAS).
This scheme assumes that only one node in the cluster has a local backup appliance such as a tape drive. In this case, run the following one-time configuration commands:
CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
Because any node performing the backup has read/write access to the archived redo logs written by the other nodes, the backup script for any node is simple:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
In this case, the tape drive receives all datafiles, archived redo logs, and SPFILEs.
This scheme assumes that each node in the cluster has its own local tape drive. Perform the following one-time configuration so that one channel is configured for each node in the cluster. For example, enter the following at the RMAN prompt:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
Similarly, you can perform this configuration for a device type of DISK. The following backup script, which you can run from any node in the cluster, distributes the datafiles, archived redo logs, and SPFILE backups among the backup drives:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
For example, if the database contains 10 datafiles and 100 archived redo logs are on disk, then the node 1 backup drive can back up datafiles 1, 3, and 7 and logs 1-33. Node 2 can back up datafiles 2, 5, and 10 and logs 34-66. The node 3 backup drive can back up datafiles 4, 6, 8, and 9 as well as archived redo logs 67-100.
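After a distributed backup completes, you can review what was written and to which device type. The following sketch uses the RMAN LIST command; the actual split of files among drives is chosen by RMAN and will differ from the illustrative numbers above.

```sql
# One line per backup set, including device type and number of pieces.
LIST BACKUP SUMMARY;

# Detailed listings for the datafile and archived redo log backups.
LIST BACKUP OF DATABASE;
LIST BACKUP OF ARCHIVELOG ALL;
```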
In a non-cluster file system environment, each node can back up only its own local archived redo logs. For example, node 1 cannot access the archived redo logs on node 2 or node 3 unless you configure the network file system for remote access. To configure NFS, distribute the backup to multiple drives. However, if you configure NFS for backups, then you can only back up to one drive.
This scheme assumes that each node in the cluster has its own local tape drive. Perform the following one-time configuration to configure one channel for each node in the cluster. For example, enter the following at the RMAN prompt:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
Similarly, you can perform this configuration for a device type of DISK.
Develop a production backup script for whole database backups that you can run from any node. The RMAN autolocation feature ensures that the channel allocated on each node only backs up the archived redo logs that are located on that node. The following example uses automatic channels to make a database and archived redo log backup:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
In this example, the datafile backups, archived redo logs, and SPFILE backups are distributed among the different tape drives. However, channel 1 can only read the logs archived locally on /arc_dest_1, so the autolocation feature restricts channel 1 to backing up only the archived redo logs in the /arc_dest_1 directory. Likewise, because node 2 can only read files in the /arc_dest_2 directory, channel 2 can only back up the archived redo logs in the /arc_dest_2 directory, and so on. The important point is that all logs are backed up, but they are distributed among the different drives.
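If automatic channels are not configured, the same distributed backup could be sketched with manually allocated channels inside a RUN block. The channel names t1 to t3 are arbitrary, and the connect strings are the placeholder credentials used in the configuration example.

```sql
RUN {
  # One channel per node, so that each node's local archiving directory
  # and local tape drive are reachable through its own channel.
  ALLOCATE CHANNEL t1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
  ALLOCATE CHANNEL t2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
  ALLOCATE CHANNEL t3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
  BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
}
```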
Media recovery must be user-initiated through a client application, whereas instance recovery is automatically performed by the database. In these situations, use RMAN to restore backups of the datafiles and then recover the database. The procedures for RMAN media recovery in RAC environments do not differ substantially from the media recovery procedures for single-instance environments. The node that performs the recovery must be able to restore all the required datafiles. That node must also be able to either read all the required archived redo logs on disk or be able to restore them from backups.
This section describes the RMAN restore scenarios.
The restore and recovery procedures in a cluster file system scheme do not differ substantially from Oracle single-instance scenarios.
First, refer to "Backing Up to One Local Drive in the Cluster File System Archiving Scheme" to perform the one-time configuration.
In this example, assume that node 3 performs the backups. If node 3 is available for the restore and recovery processing, and if all the existing archived redo logs have been backed up or are on disk, then run the following commands to perform complete recovery:
RESTORE DATABASE;
RECOVER DATABASE;
If node 3 performed the backups but is unavailable, then configure a media management device for one of the remaining nodes and make the backup media from node 3 available to this device.
First, refer to "Backing Up to Multiple Drives in the Cluster File System Archiving Scheme" to perform the one-time configuration so that one channel is configured for each node in the cluster. If all existing archived redo logs have been backed up or are on disk, then run the following commands for complete recovery from any node in the cluster:
RESTORE DATABASE; RECOVER DATABASE;
Because RMAN autolocates the backups before restoring them, the channel connected to each node only restores the files that were backed up to the tape drive attached to the node.
In this scheme, each node archives locally to a different directory. For example, node 1 archives to /arc_dest_1, node 2 archives to /arc_dest_2, and node 3 archives to /arc_dest_3. You must configure NFS so that the recovery node can read the archiving directories on the remaining nodes. The restore and recovery procedure depends on whether the backups are distributed or nondistributed.
First, refer to "Backing Up to Multiple Drives in a Non-Cluster File System Backup Scheme". If all nodes are available and if all archived redo logs have been backed up, then you can perform a complete restore and recovery by mounting the database and running the following commands from any node:
RESTORE DATABASE;
RECOVER DATABASE;
The recovery node begins a server session on each node in the cluster. Because this example assumes that database backups are distributed, the server sessions restore the backup datafiles from the tape drives attached to each node. Because the NFS configuration enables each node read access to the other nodes, the recovery node can read and apply the archived redo logs located on the local and remote disks. No manual transfer of archived redo logs is required.
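Put together, a complete restore and recovery session at the RMAN prompt might look like the following sketch. It assumes that the control file and server parameter file are intact, that the per-node channels are already configured, and that recovery is complete, so the database can be opened without RESETLOGS.

```sql
# Mount the database on the recovery node.
STARTUP MOUNT;
# Restore datafiles; autolocation directs each channel to its own backups.
RESTORE DATABASE;
# Apply archived and online redo to bring the datafiles up to date.
RECOVER DATABASE;
# Open the database after complete recovery.
ALTER DATABASE OPEN;
```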
You can use parallel instance recovery, parallel failure recovery, and parallel media recovery in RAC databases. Refer to Oracle Database Backup and Recovery Advanced User's Guide for more information on these topics.
With RMAN's RESTORE and RECOVER commands, Oracle automatically parallelizes the following three stages of recovery:
When restoring datafiles, the number of channels you allocate in the RMAN recover script effectively sets the parallelism that RMAN uses. For example, if you allocate five channels, you can have up to five parallel streams restoring datafiles.
Similarly, when you are applying incremental backups, the number of channels you allocate determines the potential parallelism.
RMAN applies archived redo logs using a specific number of parallel processes as determined by the setting for the RECOVERY_PARALLELISM initialization parameter. This is described under the topic "Setting the RECOVERY_PARALLELISM Parameter".
Media recovery parallelism is controlled by the PARALLEL clause of the ALTER DATABASE RECOVER statement.
If you have user-managed methods to back up and recover your database, then you can parallelize instance and media recovery using either of the procedures described in this section.
The RECOVERY_PARALLELISM initialization parameter specifies the number of processes that participate in instance and crash recovery. One process reads the archived redo log files sequentially and dispatches redo information to several recovery processes. The recovery processes then apply the changes from the archived redo log files to the datafiles. A value of 0 or 1 indicates that Oracle performs recovery serially by one process. The value of this parameter cannot exceed the value of the parameter.
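As an illustration, the parameter could be set for all instances as follows. This sketch assumes that the database uses a server parameter file; the value 4 is an arbitrary example, and because the parameter is static the change takes effect only after the instances restart.

```sql
-- Record the new value in the server parameter file for every instance.
ALTER SYSTEM SET RECOVERY_PARALLELISM = 4 SCOPE = SPFILE SID = '*';

-- SQL*Plus command to display the value currently in effect on this instance.
SHOW PARAMETER recovery_parallelism
```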
On multiple-CPU systems, the default for instance, crash, and media recovery is to operate in parallel mode. You can, however, enforce the use of serial recovery by using either the RECOVERY_PARALLELISM parameter or the NOPARALLEL clause of the ALTER DATABASE RECOVER statement.
When you use the RECOVER statement to parallelize instance and media recovery, the allocation of recovery processes to instances is operating system-specific. The DEGREE keyword of the PARALLEL clause can either signify the number of processes on each instance of a RAC database or the number of processes to distribute across all instances.
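For user-managed media recovery, the choice between parallel and serial recovery might be expressed with the SQL*Plus RECOVER command as sketched below. The degree of 4 is an arbitrary example, the database is assumed to be mounted with the required datafiles restored, and the exact form of the PARALLEL clause can vary by release, so check the reference for your version.

```sql
-- Request parallel media recovery with four recovery processes.
RECOVER DATABASE PARALLEL 4;

-- Alternatively, force serial media recovery (use one form or the other).
RECOVER DATABASE NOPARALLEL;
```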
To use a flash recovery area in RAC, you must place it on an ASM disk, a Cluster File System, or on a shared directory that is configured through NFS for each RAC instance. In other words, the flash recovery area must be shared among all the instances of a RAC database. In addition, set the parameter DB_RECOVERY_FILE_DEST to the same value on all instances.
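A minimal sketch of setting the same flash recovery area on every instance follows. The ASM disk group name +FRA and the 20G size are hypothetical values, and the statements assume a server parameter file. The size limit is set first because DB_RECOVERY_FILE_DEST cannot be set without it.

```sql
-- Set the space limit first, then the shared location, for all instances.
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE = 20G SCOPE = BOTH SID = '*';
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+FRA' SCOPE = BOTH SID = '*';

-- SQL*Plus command to confirm the settings on the current instance.
SHOW PARAMETER db_recovery_file_dest
```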