| Oracle® High Availability Architecture and Best Practices 10g Release 1 (10.1) Part Number B10726-01 |
|
![]() Previous |
![]() Next< /font> |
This chapter describes the detailed recovery operations that are referred to in the outages and solutions tables in Chapter 9, "Recovering from Outages". It includes the following topics:
This chapter describes the detailed reco very operations that are referred to in the outages and solutions tables in Chapter 9, "Recovering from Outages". Table 10-1 summarizes the recovery operations that are described in this c hapter.
| Recovery Operation | Description |
|---|---|
|
In the complete site fai lover scenario, existing connections fail and new connections are routed to a secondary or failover site. This occurs when there is a true disaster and where the application stack is replicated. In the partial site failover s cenario, the primary site is intact, and the middle-tier applications need to be redirected after the database has failed over or swi tched over to a standby database on the secondary site. This configuration is not recommended if performance decreases dramatically b ecause of the greater latency between the application servers and the database. | |
|
A Data Guard failover is invoked in the database tier because of an unscheduled outa ge. If complete recovery is attempted, then there is minimal or no data loss. A subsequent complete or partial site failover must occ ur. The previous production database can be converted to a standby database by flashing back the database. | |
|
A Data Guard switchover occurs in the database tier. Th e previous production database becomes the new standby database while the previous standby database becomes the new production databa se. A subsequent complete or partial site failover must occur. This is a scheduled or planned outage. | |
|
RAC Recovery< /p> |
RAC automatically handles instance and node failures at a given sit e to provide continued access to the backend database. The reconnection to available instances can be transparent from an application standpoint and occurs only on the primary site. The application service migrates to the available or designated instances. | <
/tr>
|
When a standby node requires maint enance, you can switch to another standby instance to avoid any impact on the production database and to ensure that the standby reco very does not fall behind. When the standby cluster requires maintenance or the standby cluster fails, if the maximum protection mode is enabled, then the production database needs to be downgraded to maximum availability mode or maximum performance mode. |
tr>
|
|
Application failover td> |
The client or application tier automatically fails over to one or more s urviving RAC instances when an instance or node failure occurs at the primary site. |
|
When data failure occurs due to media corruption or media damage, you can use different recovery options that use the flash reco very area or switch over to the standby database. Alternatively, you can rebuild tables by fast imports or rebuild indexes in paralle l. | |
|
When a user error causes transactional or logical data inconsistencies, you can resolve these problems by using flashback erro r correction technology at the row, transaction, table, or database levels. | |
|
With RAC, you can apply patches for specific customer issues incrementally to one node or instance at a time, which enables continual application and database availability. | |
|
Data Guard enables you to upgrade the database on the sta ndby database and perform a switchover, which minimizes the overall scheduled outage time for applying patch sets or software upgrade s for the database. | |
|
Many scheduled outages related to the data server involve some reorganization of the database objects. They need to be accomplished w ith continued availability of the database. Oracle online object reorganization is used to manage the scheduled outages. |
This section describes the fo llowing client failover scenarios:
In the complete site failover scenario, existing connections fail and new connections are routed to a secondary or failover site. This occurs when t here is a true disaster and where the application stack is replicated.
In the partial site failover scenario, the primary site is intact, and the middle-tier applications need to be redirected after the database has failed o ver or switched over to a standby database on the secondary site. This configuration is not recommended if performance decreases sign ificantly because of the greater latency between the application servers and the database.
A wide-area traffic manager is implemented on the primary and secondary sites to provide the sit e failover function. The wide-area traffic manager can redirect traffic automatically if the primary site or a specific application o n the primary site is not accessible. It can also be triggered manually to switch to the secondary site for switchovers. Traffic is d irected to the secondary site only when the primary site cannot provide service due to an outage or after a switchover. If the primar y site fails, then user traffic is directed to the secondary site.
Figure 10-1 illustrates the network routes before site failover. Client requests enter the client tier of the primary site and travel by the WAN traffic manager. Client requests are sent through the firewall into the demilitarized zone (DMZ) to the ap plication server tier. Requests are then forwarded through the active load balancer to the application servers. They are then sent th rough another firewall and into the database server tier. The application requests, if required, are routed to a RAC instance. Respon ses are sent back to the application and clients by a similar path.
Text description of the illustration maxav013. gif
Figure 10-2 illustrates the network routes a fter site failover. Client or application requests enter the secondary site at the client tier and follow exactly the same path on th e secondary site that they followed on the primary site.
Text description of the illustration maxav028.gif
The following steps describe what happens to network traffic during a failover or switchover. p>
Failover also depends on the cl ient's web browser. Most browser applications cache the DNS entry for a period of time. Consequently, sessions in progress during an outage may not fail over until the cache timeout expires. The only way to resume service to such clients is to close the browser and restart it.
This usually occurs after the database has been failed over or switched over to the secondary site and the middle- tier applications remain on the primary site. The following steps describe what happens to network traffic during a partial site fail over:
Figure 10-3 shows the network routes after partial site failover. Client and application requ ests enter the primary site at the client tier and follow the same path to the database server tier as in Figure 10-1. When the requests enter the database server tier, they are routed to the database tier of the secondary site through any additional switches, routers, and possible firewalls.
Text description of the illustration max av031.gif
Failover is the operation of taking the produ ction database offline on one site and bringing one of the standby databases online as the new production database. A failover operat ion can be invoked when an unplanned catastrophic failure occurs on the production database, and there is no possibility of recoverin g the production database in a timely manner.
Data Guard enables you to fail over by issuin g the SQL statements described in subsequent sections, by using Oracle Enterprise Manager, or by using the Oracle Data Guard broker c ommand-line interface.
| See Also:
Oracle Data Guard Broker for information about using Enterprise Manager or the Data Guard broker command-line for database failover |
Data Guard failover is a series of steps to convert a sta ndby database into a production database. The standby database essentially assumes the role of production. A Data Guard failover is a ccompanied by a site failover to fail over the users to the new site and database. After the failover, the secondary site contains th e production database. The former production database needs to be re-created as a new standby database to restore resiliency. The sta ndby database can be quickly re-created by using Flashback Database. See "Restoring the Standby Databas e After a Failover".
During a failover operation, lit tle or no data loss may be experienced. The complete description of a failover can be found in Oracle Data Guard Concepts and Administration.
The rest of this section includes the following topics:
Data Guard failover should be used only in the case of an emergency and should be initiated due to an unplanned outage such as:
< p class="BP">A failover requires that the initial production database be re-created as a standby database to restore fault tolerance to your environment. The standby database can be quickly re-created by using Flashback Database. See "R estoring the Standby Database After a Failover".Do not use Data Guard failover when the problem can be fixed locally in a timely manner or when Data Guard switchover can be used. For failover with complete recovery scenarios, either the production database is not acce ssible or cannot be restarted. Data Guard failover should not be used where object recovery or flashback technology solutions provide a faster and more efficient alternative.
This section includes the following topics:
SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;
| See Also:
Oracle Data Guard Concepts and Administration f or more information about what to do if a gap exists |
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
SELECT SWITCHOVER_STATUS FROM V$DATABASE;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE STOP LOGICAL STANDBY APPLY;
ALTER DATABASE REGISTER L OGICAL LOGFILE 'file_name';
NODELAY and FINISH clauses.
ALTER DATABASE START LOGICAL STANDBY APPLY NODELAY FINISH;
ALTER DATABASE ACT IVATE LOGICAL STANDBY DATABASE;
Failover has completed, and the new production database is available to process transactions.
A database switchover performed by Oracle Data Guard is a planned transition that includes a series of steps to switch roles between a standby database and a production database. Thus, following a successful switchover operation, the standby database a ssumes the production role and the production database becomes a standby database. In a RAC environment, a switchover requires that o nly one instance is active for each database, production and standby. At times the term "switchback" is also used within the scope of database role management. A switchback operation is a subsequent switchover operation to return the roles to their original state. p>
Data Guard enables you to change these roles dynamically by issuing the SQL statements descr ibed in subsequent sections, or by using Oracle Enterprise Manager, or by using the Oracle Data Guard broker command-line interface. Using Oracle Enterprise Manager or the Oracle Data Guard broker command-line interface is described in Oracle Data Guard Broker.
Th is section includes the following topics:
Sw itchover is a planned operation. Switchover is the capability to switch database roles between the production and standby databases w ithout needing to instantiate any of the databases. Switchover can occur whenever a production database is started, the target standb y database is available, and all the archived redo logs are available. It is useful in the following situations:
Switchover is not possible or practical u nder the following circumstances:
Do not use Data G uard switchover when local recovery solutions provide a faster and more efficient alternative. The complete description of a switchov er can be found in Oracle Data Guard Concepts and Administration.
If you are not using Oracle Enterprise Manager, then the high-level steps in this section can be executed with SQL*Plus. These steps are described in detail in Oracle Data Guard Concepts and Admin istration.
This section includes the following topics:
To identify active se ssions, execute the following query:
SELECT SID, PROCESS, PROGRAM a> FROM V$SESSION WHERE TYPE = 'USER' AND SID <> (SELECT DISTINCT SID FROM V$MYSTAT);
'TO STANDBY'.
SELECT SWITCHOVER_STATUS FROM V$DATABASE;
ALTER DATABASE COMMIT TO SWITCHOVER TO STAN DBY [WITH SESSION SHUTDOWN];
STARTUP MOUNT; RECOVER MANAGED STANDBY DATA BASE USING CURRENT LOGFILE DISCONNECT;
ALTER DATABASE COMMIT TO SWIT CHOVER TO PRIMARY [WITH SESSION SHUTDOWN];
ALTER DATABASE PREPARE TO SWITCHOVER TO LOGICAL STANDBY;
Following this step, logs start to ship in both directions, although the current production database does n ot process the logs coming from the current logical standby database.
ALTER DATABASE PR EPARE TO SWITCHOVER TO PRIMARY;
This is the phase where current transactions on the production database are cancelled. All DML-related cursors are invalidated, preventing new records from b eing applied. The end of redo (EOR) marker is recorded in the online redo log and then shipped (immediately if using real-time apply) to the logical standby database and registered.
ALTER DATABASE COMMIT TO SWITCHOVER TO LOGICAL STANDBY;
ALTER DATABASE COMMIT TO SWITCHOVER TO PR IMARY;
If real-time apply is required, execute the following statement:
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
Otherwise execute the following statement:
ALTER DATABASE START LOGICAL STANDBY APPLY;
This section includes the following topics:
This section includes the following topics:
Instance failure occurs when software or hardware problems disable an instance. After ins tance failure, Oracle automatically uses the online redo log file to perform database recovery as described in this section.
Instance recovery in RAC does not in clude restarting the failed instance or the recovery of applications that were running on the failed instance. Applications that were running continue by using failure recognition and recovery as described in Oracle Real Application Clusters Installation and Configuration Guide. This provides consistent an d uninterrupted service in the event of hardware or software failures. When one instance performs recovery for another instance, the surviving instance reads redo log entries generated by the failed instance and uses that information to ensure that committed transac tions are recorded in the database. Thus, data from committed transactions is not lost. The instance that is performing recovery roll s back uncommitted transactions that were active at the time of the failure and releases resources used by those transactions.
When multiple node failures occ ur, as long as one instance survives, RAC performs instance recovery for any other instances that fail. If all instances of a RAC dat abase fail, then Oracle automatically recovers the instances the next time one instance opens the database. The instance that is perf orming recovery can mount the database in either shared or exclusive mode from any node of a RAC database. This recovery procedure is the same for Oracle running in shared mode as it is for Oracle running in exclusive mode, except that one instance performs instance recovery for all the failed instances in exclusive node.
Service reliability is achieved by configuring and failing over among redundant instances. More instances are enabled to pr ovide a service than would otherwise be needed. If a hardware failure occurs and adversely affects a RAC database instance, then RAC automatically moves any services on that instance to another available instance. Then Cluster Ready Services (CRS) attempts to restar t the failed nodes and instances.
An installation can specify the "preferred" and "availabl
e" configuration for each service. This configuration describes the preferred way to run the system, and is used when the service fir
st starts up. For example, the ERP service runs on instance1 and instance2, and the HR
code> service runs on instance3 when the system first starts. instance2 is available to run HR
in the event of a failure or planned outage, and instance3 and instance4 are available to run ERP. The service configuration can be designed several ways.
RAC recognizes when a failure affects a service and automatically fails over the service and redistributes the clients across the surviving instance supporting th e service. In parallel, CRS attempts to restart and integrate the failed instances and dependent resources back into the system. Noti fication of failures occur at various levels including notifying external parties through Enterprise Manager and callouts, recording the fault for tracking, event logging, and interrupting applications. Notification occurs from a surviving fault domain when the fail ed domain is out of service. The location and number of fault domains serving a service is transparent to the applications. Auto rest art and recovery are automatic, including all the subsystems, not just database.
< h3 class="H2">RAC Recovery for Scheduled OutagesThis section includes the following topics:
When an outage occurs, RAC automatically restarts essential components. Components that are eligible f
or automatic restart include instances, listeners, and the database as well as several subcomponents. Some scheduled administrative t
asks require that you prevent components from automatically restarting. To perform scheduled maintenance that requires a CRS-managed
component to be down during the operation, the resource must be disabled to prevent CRS from trying to automatically restart the comp
onent. For example, to take a node and all of its instances and services offline for maintenance purposes, disable the instance and i
ts services using either Enterprise Manager or SRVCTL, and then perform the required maintenance. Otherwise, if the node
fails and then restarts, then CRS attempts to restart the instance during the administrative operation.
For a scheduled outage that requires an instance, node, or other component to be isolated, RAC provides the ability to relocate, disable, and enable services. Relocation migrates the service to another instance. The sessions can also be relocated. These interfaces also allow services, instances and databases to be selectively disabled while a repair, change, or upgrade is made and re-enabled after the change is complete. This ensures that the service is not started at the instance being repaired because of a dependency or a start operation on the service. The service is disabled on the instance at the b eginning of the planned outage. It is then enabled at the end of the maintenance outage.
Fo
r example, to relocate the SALES service from instance1 to instance3 in order to perform sched
uled maintenance on node1,the tasks can be performed using Enterprise Manager or SRVCTL commands. The follo
wing shows how to use SRVCTL commands:
SALES service to instance3.
srvctl relocate s ervice -d PROD -s SALES -i instance1 -t instance3
SALES service on instance1 to prevent it from being relocated to instance1 while maintenance is performed.
srvctl disable service -d PROD -s SALES -i instance1
instance1.
srvctl stop instance -d PROD -i instance1
instance1.
srvctl start instance -D PROD -i instance1
SALES service on instance1
code>.
srvctl enable service -d PROD -s SALES -i instance1
<
/li>
- If desired, relocate the
SALES service running on in
stance3 back to instance1.
srvctl relocate service -d PROD -s SALES -i
instance3 -t instance1
This section applies to MAA, with RAC and Data Guard on each site.
A standby database can have multiple standby instances. Only one instance can have the managed recovery proc ess (MRP) or the logical standby apply process (LSP). The instance with the MRP or LSP is called the apply instance.
When you have a RAC-enabled standby database, you can fail over the apply instance of the standby RAC envir onment. Failing over to another apply instance may be necessary when incurring a planned or unplanned outage that affects the apply i nstance or node. Note the difference between apply instance failover, which utilizes multiple instances of the standby database at th e secondary site, and Data Guard failover or Data Guard switchover, which converts the standby database into a production database. T he following occurs as a result of apply instance failover:
For apply failover to work correctly, "Configuration Best Practices for MAA"< a href="configbp.htm#1013204"> must be followed:
W hen you follow these configuration recommendations, apply instance failover is automatic for a scheduled or unscheduled outage on the primary instance, and all standby instances have access to archived redo logs. By definition, all RAC standby instances already have access to standby redo logs because they must reside on shared storage.
The method of rest arting the physical standby managed recovery process (MRP) or the logical standby apply process (LSP) depends on whether Data Guard B roker is being used. If the Data Guard Broker is in use, then the MRP or LSP is automatically restarted on the first available standb y instance if the primary standby instance fails. If the Data Guard Broker is not being used, then the MRP or LSP must be manually re started on the new standby instance. Consider using a shared file system, such as a clustered file system or a global file system, fo r the archived redo logs. A shared file system enables you to avoid reshipment of any unapplied archived redo logs that were already shipped to the standby.
| See Also:
Oracle Data Guard Concepts and Administration for details about setting up cross-instance archiving |
If apply instance fai lover does not happen automatically, then follow these steps to restart your production database, if necessary, and restart MRP or LS P following an unscheduled apply instance or node outage:
From the targeted stand by instance, run the following query.
SELECT OPEN_MODE, DATABASE_ROLE FROM V$DATABASE;
% lsnrctl status listener_name
% tnsping standby_database_connection_service_name
If the connection cannot be made, then consult Oracle Net Services Administrator's Guide for further troubleshooting.
< /a>Use the following statements for a physical st andby database:
RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
Use the following statements for a logical standby database:
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
Optionally, copy the archived redo logs to the new apply host.
The copy is not necessary for a physical standby database. For a physical standby database, when the managed recovery process detects an archive gap, it requests the production archived redo logs to be resent automatically.
For a logical standby database, unapplied archive file names that have already been registered ar e not resent automatically to the logical standby database. These archived redo logs must be sent manaully to the same directory stru cture in the new apply host. You can identify the registered unapplied archived redo logs by executing a statement similar to the fol lowing:
SELECT LL.FILE_NAME, LL.THREAD#, LL.SEQUENCE#, LL.FIRST_CHANGE#, LL.NEXT_CHANGE#, LP.APPLIED_SCN, LP.READ_SCN FROM DBA_LOGSTDBY_LOG LL, DBA_LOGSTDBY_PROGRESS LP WHERE LEAST(LP.APPLIED_SCN, LP.READ_SCN) <= LL.NEXT_CHANGE#;
< /a>
Compare the results of the statement to the contents of the STANDBY_ARCHIVE_DEST directory.
| See Also:
a>
"Oracle9i Data Guard: SQL Apply Best Practices" at |
Query V$ARCHIVE_STATUS and V$ARCHIVED_DEST_STATUS.
SELECT NAME_SPACE, STATUS, TARGET, LOG_SEQUENCE , TYPE,PROCESS, REGISTER , ERROR FROM V$ARCHIVE_DEST WHERE STATUS!='INACTIV E'; SELECT * FROM V$ARCHIVE_DEST_STATUS WHERE STATUS!='INACTIVE';
Issue the following queries to ensure that the sequence n umber is advancing over time.
Use the following statements for a physical standby database :
SELECT MAX(SEQUENCE#), THREAD# FROM V$LOG_HISTORY GROUP BY THREAD#; SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, CLIENT_PROCESS FROM V$MANAGED_ STANDBY;
Use the following statements for a logical standby database:
a> SELECT MAX(SEQUENCE#), THREAD# FROM DBA_LOGSTDBY_LOG GROUP BY THREAD#; SELECT APPLIED_SCN FROM DBA_LO GSTDBY_PROGRESS;
Recovering from a data failure is an unscheduled outage scenario. A data failure is usually, but not always, caused by some activity or failure that occurs outside the database, even though the problem may be evident within the database.
Data failure can affect the following types of database objects:
UNDO tablespace, temporary tablespaceSPFILE)Data failure can be categorized as eit her datafile block corruption or media failure:
In all environments, you can resolve a data failure outage by one of the following methods:
In a Data Guard environment, you can also use a Data Guard switchover or failover to a sta ndby database to recover from data failures.
Another category of related outages that resul t in database objects becoming unavailable or inconsistent are caused by user error, such as dropping a table or erroneously updating table data. Information about recovering from user error can be found in "Recovering from User Error w ith Flashback Technology".
The rest of this section i ncludes the following topics:
A corrupt datafile block can be accessed, but the contents within the block are invalid or inconsistent. The typi cal cause of datafile corruption is a faulty hardware or software component in the I/O stack, which includes, but is not limited to, the file system, volume manager, device driver, host bus adapter, storage controller, and disk drive.
The database usually remains available when corrupt blocks have been detected, but some corrupt blocks may cause widesprea d problems, such as corruption in a file header or with a data dictionary object, or corruption in a critical table that renders an a pplication unusable.
See Also:
|
The rest of this section includes the following topics:
A data fault is detected when it is recognized by the user, administrator, RMAN backup, or application becaus e it has affected the availability of the application. For example:
SYSTEM tablespace caused by a failing disk controllerRegul arly monitor application logs (which may be distributed across the data server, middle-tier and the client machines), the alert log, and Oracle trace files for errors such as ORA-1578 and ORA-1110
ORA-01578: ORACLE data bloc k corrupted (file # 4, block # 26) ORA-01110: data file 4: '/u01/oradata/objrs/obj_corr.dbf'
After you have identified datafile block corruptio n, follow these steps:
Use the following methods to determine the extent of the corruption:
Gather the file number, file name, and block number from the error messages. For example:
ORA- 01578: ORACLE data block corrupted (file # 22, block # 12698) ORA-01110: data file 22: '/oradata/SALES/user s01.dbf'
The file number is 22, the block number is 12698, and t
he file name is /oradata/SALES/users01.dbf.
Record additional error messages that appear in the alert log, Oracle trace files, or application logs. Note that log files may be distributed across the data server, middle tier, and client machines.
Use Oracle detection tools to find other data failure problems that may exist on the same disk
or set of disks that have not yet been reported. For example, if the file number 22 has corrupt blocks, then it is prudent to run the
RMAN BACKUP VALIDATE DATAFILE 22 command to detect additional corruption. Table 10-2
summarizes the Oracle tools that are available to detect datafile block corruption.
Some corruption problems are caused by faulty hardware. If ther e is a hardware fault or a suspect component, then it is sensible to either repair the problem, or make disk space available on a sep arate disk subsystem before proceeding with a recovery option.
If there are multiple errors , if there are operating system-level errors against the affected file, or if the errors are transient and keep moving about, then th ere is little point in proceeding until the underlying problem has been addressed or space is available on alternative disks. Ask you r hardware vendor to verify system integrity.
In a Data Guard environment, a switchover can be performed to bring up the application and restore service quickly while the corruption problem is handled offline.
Using the file ID (fid) and bloc
k ID (bid) gathered from error messages and the output from Oracle block checking utilities, determine which da
tabase objects are affected by the corruption by using a query similar to the following:
SE LECT tablespace_name, partition_name, segment_type, owner, segment_name FROM dba_e xtents WHERE file_id = fid AND bid BET WEEN block_id AND block_id + blocks - 1;
The following is an exam ple of the query and its resulting output:
SQL> select tablespace_name, partition_name, segment_type, 2 owner, segment_name from dba_extents 3 where file_id=4 and 11 betw een block_id and block_id + blocks -1; TABLESPACE_NAME PARTITION_NAME SEGMENT_TYPE OWNER SEGMENT_NAME --------------- -------------- ------------ ----- ------------ USERS TABLE SCOTT EMP
The integrity of a table or index can be determined by using the ANALYZE statement.
The recommended recovery methods are summarized in Table 10-3 and Table 10-4. The recovery methods depend on whether Data Guard is being used.
Table 10-3 summarizes recovery methods for data failure when Data Guard is not used.
| Object Affected | Extent of Problem | < strong>Action | ||
|---|---|---|---|---|
|
Data dictionary or |
N/A |
|||
|
Applicat ion segment (user table, index, cluster) |
Widespread or unknown < /td> | |||
|
Applica tion segment (user table, index, cluster) |
Localized |
td> | ||
|
|
N/A |
No impact to permanent objects. Re-create temporary tablespace if required. | ||
| Object Affected | Extent of Problem | Impact to Applica tion | Cost of Local Recovery | Action |
|
Data dictionary or |
N/A |
N/A |
N/A |
|
|
Application segment (user table, index, cluster) |
Low |
Low |
||
|
Application segment (user table, index, cluster) |
N/A |
High |
N/A |
|
|
Application segment (user table, index, cluster) |
N/A |
N/A |
High |
|
|
Application segment (user table, index, cluster) |
Localized |
Low |
Low |
|
|
|
N/A |
N/A |
N/A |
No impact to permanen t objects. Re-create temporary tablespace if required. |
The proper recovery method to use depends on the following criteria, as indicated in Table 10-3 and Table 10-4:
UNDO segment: The object affected is owned by SY
S or is part of UNDO tablespaceAn object may be critical for the application to function. This includes objects that are critical for the performance and usability of the application. It could also be a history, reporting, or logging table, which may not be as critical. It could also be an object that is no longer in use or a te mporary segment.
This criterion only applies to a Data Guard environment and should be use d to decide between recovering the affected object locally and using Data Guard failover. Possible values are:
This criterion only applies to a Data Guard environment and should be used to decide between recovering the affect ed object locally and using Data Guard failover. This is not a business cost which is assumed to be implicit in deciding how critical an object is to the application but cost in terms of feasibility of recovery, resources required and their impact on performance and total time taken.
Cost of local recovery should include the time to restore and recover t he object from a valid source; the time to recover other dependent objects like indexes, constraints, and related tables and its inde xes and constraints; availability of resources like disk space, data or index tablespace, temporary tablespace; and impact on perform ance and functionality of current normal application functions due to absence of the corrupt object.
Corruption may be localized so that it affects a known number of blocks within one or a few objects, or it may be widespread so that it affects a large portion of an object .
When media failure occurs, follow t hese steps:
Use the following methods to determine the extent of the media failure:
Gather the file number a nd file name from the error messages reported. Typical error messages are ORA-1157, ORA-1110 and ORA-1115, ORA-1110.
For example, from the following error message:
ORA-01115: IO error reading block from file 22 (block # 12698) ORA-01110: data file 22: '/oradata/SALES/users01.dbf'
Record additional error messages that appear in the system logs, volume manager logs, alert log, Oracle trace files, or application logs. Note that log files may be distributed across the data server, middle tier, and the cl ient machines.
If there i s a hardware fault or a suspect component, then it is sensible to either repair the problem or make disk space available on a separat e disk subsystem before proceeding with a recovery option.
If there are multiple errors, if there are operating system-level errors against the affected file, or if the errors are transient and keep moving about, then there is little point in proceeding until the underlying problem has been addressed or space is available on alternative disks. Ask your ha rdware vendor to verify system integrity.
The appropriate recovery action depends on what type of file is affected by the media failure. Table& nbsp;10-5 shows the type of file and the appropriate recovery.
| Type of File | Recovery Acti on |
|---|---|
|
Datafile |
Media failure of a datafile is resolved in the same manner in which widespread datafile block corruption is handled. |
|
Control file |
Loss of a control file causes the primary database to shut down. The steps to recover from control
file failure include making a copy of a good control file, restoring a backup copy of the control file, or manually creating the cont
rol file with the
See Also: "Performing User-Managed Flashback and Recovery" in Oracle Database Backup and Recovery Advanced User's Guide |
|
Standby c ontrol file |
Loss of a standby control file causes the standby datab ase to shut down. It may also, depending on the primary database protection mode, cause the primary database to shut down. To recover from a standby control file failure, a new standby control file must be created from the primary database and transferred to the sta ndby system. See Also: "Creating a Physical Standby Database" in Oracle Data Guard Concepts and Administration< /a> |
|
Online redo log file |
If a media failure has affected the online redo logs of a database, then the appropriate recovery procedure depends on the following:
See Also: "Advanced User-Managed Recovery Scenarios" in Oracle Database Backup and Recovery Advanced User's Guide If the online redo log failure causes the primary database to shut down and incomplete recovery must be used to ma ke the database operational again, then Flashback Database can be used instead of restoring all datafiles. Use Flashback Database to take the database back to an SCN before the SCN of the lost online redo log group. The resetlogs operation that is done as part of th e Flashback Database procedure reinitializes all online redo log files. Using Flashback Database is faster than restoring all datafil es. If the online redo log failure causes the primary database to shut down in a Data Guard environment, it may be desirable to perform a Data Guard failover to reduce the time it takes to restore service to users and to redu ce the amount of data loss incurred (when using the proper database protection mode). The decision to perform a failover (instead of recovering locally at the primary site with Flashback Database, for example) depends on the estimated time to recover from the outage at the primary site, the expected amount of data loss, and the impact the recovery procedures taken at the primary site may have on the standby database. For example, if the decision is to recover at the primary site, then t he recovery steps may require a Flashback Database and open resetlogs, which may incur a full redo log file of lost data. A standby d atabase will have less data loss in most cases than recovering at the primary site because all redo data is available to the standby database. If recovery is done at the primary site and the standby database is ahead of the point to which the primary database is rec overed, then the standby database must be re-created or flashed back to a point before the resetlogs SCN on the primary database. See Also: "Creating a Physical Standby Database" in Oracle Data Guard Concepts and Administration |
|
Standby redo log file p> |
Standby redo log failure affects only the standby database in a Data Guard environment. Most standby redo log failures are handled automatically by the standby database without affecting the primary da tabase. However, if a standby redo log file fails while being archived to, then the primary database treats it as a log archive desti nation failure. See Also: "Dete rmine the Data Protection Mode" |
|
Loss of an archived redo log does not affect availability of the primary database directly, but it may significantly affec t availability if another media failure occurs before the next scheduled backup, or if Data Guard is being used and the archived redo log had not been fully received by the standby system and applied to the standby database before losing the file. See Also: "Advanced User-Managed Recovery Scenarios" in Oracle Database Backup and Recovery Advanced User's Guide If an archived redo log is lost in a Data Guard environment and the log has already been applied t o the standby database, then there is no impact. If there is no valid backup copy of the lost file, then a backup should be taken imm ediately of either the primary or standby database because the lost log will be unavailable for media recovery that may be required f or some other outage. If the lost archived redo log has not yet been applied to the standby
database, then a backup copy of the file must be restored and made available to the standby database. If there is no valid backup cop
y of the lost archived redo log, then the standby database must be reinstantiated from a backup of the primary database taken after t
he | |
|
Server parameter file ( |
Loss of the server parameter file does not affect availability of the database. See Als o: "Performing Recovery" of Oracle Datab ase Backup and Recovery Basics |
|
Oracle Cluster Registry (OCR) |
Loss of the O
racle Cluster Registry file affects the availability of RAC and Cluster Ready Services. The OCR file can be restored from a physical
backup that is automatically created or from an export file that is manually created by using the See Also: "Administering Storage in Real Application Clusters" in |
The fo llowing recovery methods can be used in all environments:
Always use local recovery methods when Data Guard is not being used. Local recovery methods may also be appropriate in a Data Guard environment. This section also includes the following topic:
Datafile media
recovery recovers an entire datafile or set of datafiles for a database by using the RMAN RECOVER command. When a large
or unknown number of data blocks are marked media-corrupt and require media recovery, or when an entire file is lost, the affected da
tafiles must be restored and recovered.
Use RMAN file media recovery when the following con ditions are true:
| See Also:
"Advanced User-Managed Recovery Scenarios" in Oracle Database Backup and Recovery Advanced User's Guide |
Block media recovery (BMR) recovers one or a set of data blocks marked "medi
a corrupt" within a datafile by using the RMAN BLOCKRECOVER command. When a small number of data blocks are marked media
corrupt and require media recovery, you can selectively restore and recover damaged blocks rather than whole datafiles. This results
in lower mean time to recovery (MTTR) because only blocks that need recovery are restored and only necessary corrupt blocks undergo
recovery. Block media recovery minimizes redo application time and avoids I/O overhead during recovery. It also enables affected data
files to remain online during recovery of the corrupt blocks. The corrupt blocks, however, remain unavailable until they are complete
ly recovered.
Use block media recovery when:
BACKUP VALIDATE co
mmand) and only when complete recovery is required.Block media recovery cannot be us ed to recover from the following:
The following are useful practices when using bloc k media recovery:
BACKUP VALIDATE is run once a week, then the flash recovery area should have a rete
ntion policy greater than one week. This ensures that corrupt blocks can be recovered quickly using block media recovery with disk-ba
sed backups.V$DATABASE_BLOCK_CORRUPTION view has a list of blocks validated as corrupt by
RMAN. RMAN can be instructed to recover all corrupt blocks listed in V$DATABASE_BLOCK_CORRUPTION using block media reco
very.
Some database objects, such as small look-up tables or indexes, can be recovered quickly by ma nually re-creating the object instead of doing media recovery.
Use manual object re-creatio n when:
Use Data Guard switchover or failover for data failure when:
Oracle flashback technology revolution izes data recovery. In the past it took seconds to damage a database but hours to days to recover it. With flashback technology, the time to correct errors can be as short as the time it took to make the error. Fixing user errors that require rewinding the database, table, transaction, or row level changes to a previous point in time is easy and does not require any database or object restoration . Flashback technology provides fine-grained analysis and repair for localized damage such as erroneous row deletion. Flashback techn ology also enables correction of more widespread damage such as accidentally running the wrong application batch job. Furthermore, fl ashback technology is exponentially faster than a database restoration.
Flashback technolog ies are applicable only to repairing the following user errors:
DROP TABLE statementsFlashback technologies cannot be used for media or da ta corruption such as block corruption, bad disks, or file deletions. See "Recovery Solutions for Data Failures" and "Database Failover" to repair these outages.
Table 10-6 summarize s the flashback solutions for each type of outage.
| Impact of Outage | Examples of User Errors< /th> | |
|---|---|---|
| < a name="1007487"> |
|
Use a combination of: See Also: "Flashback Query" |
|
See Also:"Resolving Table Inconsistencies" |
| |
|
See Also : "Resolving Database-Wide Inconsistencies" |
|
Table 10-7 summarizes e ach flashback feature.
| Flashback Feature | < /a> Description |
|---|---|
| td> |
Flashback Query enables you to view data at a point in time in the past. It can be used to view and reconstruct lost data that was deleted or changed by accident. Developers can use this feature to build s elf-service error correction into their applications, empowering end users to undo and correct their errors. p> Note: Changes are propagated to physical and logical standby databases. |
|
Flashback Version Query uses undo data stored in the database to view the changes to one or more rows along with all the metadata of the changes. Note: Changes are propagated to physical and logical standby databases. | |
|
Flashback Transaction Query enables you to examine changes to the d atabase at the transaction level. As a result, you can diagnose problems, perform analysis, and audit transactions. Note: Changes are propagated to physical and logical standby databases. | |
|
Flashback Drop provides a way to restore accidentally dropped tables. Note: Changes are propagated to p hysical standby databases. | |
|
Flashb ack Table enables you to quickly recover a table to a point in time in the past without restoring a backup. Note: Changes are propagated to physical and logical standby databases. |
Flashback Database enables you to quickly ret urn the database to an earlier point in time by undoing all of the changes that have taken place since that time. This operation is f ast because you do not need to restore the backups. |
Flashback Dat abase uses the Oracle Database flashback logs, while all other features of flashback technology use the Oracle Database unique undo a nd multiversion read consistency capabilities. See "Configuration Best Practices for the Database" for configuring flashback technologies to ensure that the resources from these solutions are avail able at a time of failure.
The rest of this section includes the following topics:
| See Also:
Oracle Database Administ rator's Guide, Oracle Database Backup and Recovery Basics, and Oracle Database Concepts for mor e information about flashback technology and automatic undo management |
Resolving row and transaction inconsistencies may require a combination of Flashback Query, Flashback Version Query, Flashback Transaction Query, and the suggested undo statements to rectify t he problem. The following sections describe a general approach using a human resources example to resolve row and transaction inconsi stencies caused by erroneous or malicious user errors.
This section includes the following topics:
Flashback Query, a feature introduced in the Oracle9i Database, enables an administrator or user to query any data at some point in time in the past. This powerful feature can be used to view and reconstruct data that may have been deleted or changed by accident. For example:
SELECT * FROM EMPLOYEES AS OF TIMESTAMP TO_DATE('28-Aug-03 14:00' ,'DD-Mon-YY HH24:MI') WHERE ...
This parti
al statement displays rows from the EMPLOYEES table starting from 2 p.m. on August 28, 2003. Developers can use this fea
ture to build self-service error correction into their applications, empowering end users to undo and correct their errors without de
lay, rather than burdening administrators to perform this task. Flashback Query is very simple to manage, because the database automa
tically keeps the necessary information to reconstruct data for a configurable time into the past.
Flashback Version Query provides a way to view changes made to the database at the row level. It is an extension to SQL and enables the retrieval of all the different versions of a row across a specified time interval. F or example:
SELECT * FROM EMPLOYEES VERSIONS BETWEEN TIMESTAMP TO_DATE('28-Aug-03 14:00','dd-Mon-YY hh24:mi') AND TO_DATE('28-Aug-03 15:0 0','dd-Mon-YY hh24:mi') WHERE ...
This stat ement displays each version of the row, each entry changed by a different transaction, between 2 and 3 p.m. today. A DBA can use this to pinpoint when and how data is changed and trace it back to the user, application, or transaction. This enables the DBA to track d own the source of a logical corruption in the database and correct it. It also enables application developers to debug their code.
Flashback Transaction Query provides a way to view changes made to the database at the transaction level. It is an extension to SQL that enables you to see all changes made by a transa ction. For example:
SELECT UNDO_SQL FOMR DBA_TRANSACTION_QUERY WHERE XID = '000200030000002D';
This statement s hows all of the changes that resulted from this transaction. In addition, compensating SQL statements are returned and can be used to undo changes made to all rows by this transaction. Using a precision tool like this, the DBA and application developer can precisely diagnose and correct logical problems in the database or application.
Consider a human resources (HR) example involving the SCOTT schema. the HR manager reports to the DBA that there is a potential discrepancy in Ward's salary. Sometime before 9:00 a.m., Ward
's salary was increased to $1875. The HR manager is uncertain how this occurred and wishes to know when the employee's salary was inc
reased. In addition, he has instructed his staff to reset the salary to the previous level of $1250, and this was completed around 9:
15 a.m.
The following steps show how to approach the problem.
Fortunate ly, the HR manager has provided information about the time when the change occurred. We can query the information as it was at 9:00 a .m. with Flashback Query.
SELECT EMPNO, ENAME, SAL FROM EMP < a name="1012425"> AS OF TIMESTAMP TO_DATE('03-SEP-03 09:00','dd-Mon-yy hh24:mi') WHERE ENAME = 'WARD '; EMPNO ENAME SAL ---------- ---------- ---------- 7521 WARD 1875
We can confirm we have the correct employee by the fact that Ward's salary was $1875 at 09:00 a.m. Rather than using Ward's nam e, we can now use the employee number for subsequent investigation.
Although i t is possible to restrict the row version information to a specific date or SCN range, we decide to query all the row information tha t we have available for the employee WARD using Flashback Version Query.
SELECT EMPNO, ENAME, SAL, VERSIONS_STARTTIME, VERSIONS_ENDTIME FROM EMP VERSIONS BETWEEN TIMEST AMP MINVALUE AND MAXVALUE WHERE EMPNO = 7521 ORDER BY NVL(VERSIONS_STARTSCN,1); < a name="1012447"> EMPNO ENAME SAL VERSIONS_STARTTIME VERSIONS_ENDTIME -------- ---------- ---------- ---------------------- ---------------------- 7521 WARD 1250 03-SEP-03 08.48.43 AM 03-SEP-03 08.54.49 AM 7521 WARD 1875 03-SEP-03 08.54 .49 AM 03-SEP-03 09.10.09 AM 7521 WARD 1250 03-SEP-03 09.10.09 AM
We can see that WARD's salary was increased from $1250 to $1875 at 08:54:49 the same mo rning and was subsequently reset to $1250 at approximately 09:10:09.
Also, we can modify t
he query to determine the transaction information for each of the changes effecting WARD using a similar Flashback Version Query. Thi
s time we use the VERSIONS_XID pseudocolumn.
SELECT EMPNO, ENAME, SAL, VER SIONS_XID FROM EMP VERSIONS BETWEEN TIMESTAMP MINVALUE AND MAXVALUE WHERE EMPNO = 7521 ORDER BY NVL(VERSIONS_STARTSCN,1); EMPNO ENAME SAL VERSIONS_XID ---------- ---------- ---------- ---------------- 7521 WARD 1250 0006000800000086 7521 WARD 1875 0009000500000089 7521 WARD 1250 000800050000008B
With the transaction information (VERSIONS_XID pseudocolumn), we can now query the database t
o determine the scope of the transaction, using Flashback Transaction Query.
SELECT UND O_SQL FROM FLASHBACK_TRANSACTION_QUERY WHERE XID = HEXTORAW('0009000500000089'); UNDO_SQL ---------------------------------------------------------------------------- update "SCO TT"."EMP" set "SAL" = '950' where ROWID = 'AAACV4AAFAAAAKtAAL'; update "SCOTT"."EMP" set "SAL" = '150 0' where ROWID = 'AAACV4AAFAAAAKtAAJ'; update "SCOTT"."EMP" set "SAL" = '2850' where ROWID = 'AAACV4AA FAAAAKtAAF'; update "SCOTT"."EMP" set "SAL" = '1250' where ROWID = 'AAACV4AAFAAAAKtAAE'; update "SCOTT"."EMP" set "SAL" = '1600' where ROWID = 'AAACV4AAFAAAAKtAAB'; 6 rows selected.
We can see that WARD's salary was not the only change that occurred in the transactio n. The information that was changed for the other four employees at the same time as WARD can now be passed back to the HR manager fo r review.
If the HR manager decides that the corrective changes suggested by the UNDO_SQL
column are correct, then the DBA can execute these statements individually.
Oracle provides a FLASHBACK DROP statement to recover from an accidenta
l DROP TABLE statement, and a FLASHBACK TABLE statement to restore a table to a previous point in the datab
ase.
This section includes the following topics:
Flash back Table provides the DBA the ability to recover a table, or a set of tables, to a specified point in time quickly and easily. In m any cases, Flashback Table alleviates the need to perform more complicated point in time recovery operations. For example:
FLASHBACK TABLE orders, order_items TO TIMESTAMP TO_DATE('29-AUG-03 14.00.00','dd-Mon-yy hh24:mi:ss');
This
statement rewinds any updates to the ORDERS and ORDER_ITEMS tables that have been done between the current
time and specified timestamp in the past. Flashback Table performs this operation online and in place, and it maintains referential i
ntegrity constraints between the tables.
Dropping or deleting database objects by accident is a common mistake. Users soon realize their mistake, but by then it is too late and there has been no way to easily recover the dropped tables and its indexes, constraints, and triggers. Objects once dropped were dropped forever. Loss of very important tables or other objects (like indexes, partitions or clusters) required DBAs to perform a point-in-time recovery, w hich can be time-consuming and lead to loss of recent transactions.
Flashback Drop provides
a safety net when dropping objects in Oracle Database 10g. When a user drops a table, Oracle places it in a
recycle bin. Objects in the recycle bin remain there until the user decides to permanently remove them or until space limitations beg
in to occur on the tablespace containing the table. The recycle bin is a virtual container where all dropped objects reside. Users ca
n look in the recycle bin and undrop the dropped table and its dependent objects. For example, the employees table and a
ll its dependent objects would be undropped by the following statement:
FLASHBACK TABLE emp loyees TO BEFORE DROP;
Oracle prov ides Flashback Database to rewind the entire database to a previous point in time. This section includes the following topics:
To bring an Oracle database to a previous point in time , the traditional method is point-in-time recovery. However, point-in-time recovery can take hours or even days, since it requires th e whole database to be restored from backup and recovered to the point in time just before the error was introduced into the database . With the size of databases constantly growing, it will take hours or even days just to restore the whole database.
Flashback Database is a new strategy for doing point-in-time recovery. It quickly rewinds an Oracle databas
e to a previous time to correct any problems caused by logical data corruption or user error. Flashback logs are used to capture old
versions of changed blocks. One way to think of it is as a continuous backup or storage snapshot. When recovery needs to be performed
the flashback logs are quickly replayed to restore the database to a point in time before the error and just the changed blocks are
restored. It is extremely fast and reduces recovery time from hours to minutes. In addition, it is easy to use. A database can be rec
overed to 2:05 p.m. by issuing a single statement. Before the database can be recovered, all instances of the database must be shut d
own and one of the instances subsequently mounted. The following is an example of a FLASHBACK DATABASE statement.
FLASHBACK DATABASE TO TIMESTAMP TIMESTAMP'2002-11-05 14:00:00';
No restoration from tape, no lengthy downtime, and no complicated recovery procedures are required
to use it. You can also use Flashback Database and then open the database in read-only mode and examine its contents. If you determi
ne that you flashed back too far or not far enough, then you can reissue the FLASHBACK DATABASE statement or continue re
covery to a later time to find the proper point in time before the database was damaged. Flashback Database works with a production d
atabase, a physical standby database, and a logical standby database.
These steps are recom mended for using Flashback Database:
SELECT OLDEST_FLASHBACK_SCN, TO_CHAR(OLDEST_FLASHBACK_TIME, 'mon-dd-yyyy HH:MI:SS') FROM V$FLASHBACK_DATABASE_LOG; < a name="1007337">
FLASHBACK DATABA SE TO SCN scn;
or
FLASHBACK DATABASE TO TIMESTAMP TO_DATE date;
ALTER DATABASE OPEN READ ONLY;
If more flashback data is required, then issue another FLASHBACK DATABASE statement. (The database must be mou
nted to perform a Flashback Database.)
If you want to move forward in time, issue a statem ent similar to the following:
RECOVER DATABASE UNTIL [TIME | CHANGE] date | scn;
ALTER DATAB ASE OPEN RESETLOGS;
Other considerations when using Fla shback Database are as follows:
Flashback Database does not automatically fix this problem, but it can be used to dramatically reduce the downtime. You can flash back the production database to a point before the tablespace was dropped and then restore a backu p of the corresponding datafiles from the affected tablespace and recover to a time before the tablespace was dropped.
Follow these recommended steps to use Flashback Database to repair a dropped tablespace:
FLASHBACK DATABASE TO BEFORE SCN drop_scn;
ALTER DATABASE RENAME FILE '.../UNNAMED00005' to 'restored_file';
ALTER DATABASE DATAFILE 'name' ONLINE; < /a>
SELECT CHECKPOINT_CHANGE# FROM V$DATAFILE_HEADER WHERE FILE#=1; RECOVER DATABASE UNTIL CHANGE checkpoint_change#;
ALTER DATABASE OPEN RESETLOG S;
"One-off" patches or interim patches to database so ftware are usually applied to implement known fixes for software problems an installation has encountered or to apply diagnostic patc hes to gather information regarding a problem. Such patch application is often carried out during a scheduled maintenance outage.
Oracle now provides the capability to do rolling patch upgrades with Real Application Clusters
with little or no database downtime. The tool used to achieve this is the opatch command-line utility.
The advantage of a RAC rolling upgrade is that it enables at least some instances of the RAC installation t
o be available during the scheduled outage required for patch upgrades. Only the RAC instance that is currently being patched needs t
o be brought down. The other instances can continue to remain available. This means that the impact on the application downtime requi
red for such scheduled outages is further minimized. Oracle's opatch utility enables the user to apply the patch success
ively to the different instances of the RAC installation.
Rolling upgrade is available only for patches that have been certified by Oracle to be eligible for rolling upgrades. Typically, patches that can be installed in a ro lling upgrade include:
Rolling upgrade of patches is currently available for one-off patches only. It is not available for pat ch sets.
Rolling patch upgrades are not available for deployments where the Oracle Database software is shared across the different nodes. This is the case where the Oracle home is on Cluster File System (CFS) or on shared v olumes provided by file servers or NFS-mounted drives. The feature is only available where each node has its own copy of the Oracle D atabase software.
This section includes the following topics:
The opatch utility applies a patch successivel
y to nodes of the RAC cluster. The nature of the patch enables a RAC installation to run in a mixed environment. Different instances
of the database may be operating at the same time, and the patch may have been applied to some instances and not others. The op
atch utility automatically detects the nodes of the cluster on which a specific RAC deployment has been implemented. The patch
is applied to each node, one at a time. For each node, the DBA is prompted to shut down the instance. The patch is applied to the da
tabase software install on that node. After the current node has been patched, the instance can be restarted. After the patch is appl
ied on the current node, the DBA is allowed to choose the next RAC node to apply the patch to. The cycle of instance shutdown, patch
application, and instance startup is repeated. Thus, at any time during the patch application, only one node needs to be down.
To check if a patch is a rolling patch, execute the following on UNIX platforms:
opatch query -is_rolling
(On Windows, e
xecute opatch.bat.)
Enter the patch location after the prompt.
To apply a patch to all nodes of the RAC cluster, execute the following command:
opatch apply patch_location
opatch automatically recognizes the patch to be a rolling patch and provides the required behavior.
To apply a patch to only the local node, enter the following command:
opatch apply -local patch_location
T o check the results of a patch application, check the logs in the following location:
$ORAC LE_HOME/.patch_storage/patch_id/patch_id_Apply_timestamp.log
Patches can be rolled back with the opatch utility. This enables the DBA to remove a troublesome patch or a patch that is no longer required. This can be done as
a rolling procedure.
To roll back a patch across all nodes of a RAC cluster, execute the f ollowing command:
opatch rollback -id patch_id -ph patch_location
To roll back a patch on the local node onl y, enter the following command:
opatch rollback -local -id patch_id -ph patch_location
To check the results of a patch rollback, check the logs in the following location:
$ORACLE_HOME/.patch_storage/ patch_id/patch_id_RollBack_timestamp.log
The opatch utility
also provides an option to list the installed software components as well as the installed patches. Enter the following command:
opatch lsinventory
For de
tails on usage and the other options to these commands, see MetaLink Notes 242993.1 and 189489.1 at http://metalink.oracle.com.
The following are recommended practices for all database patch upgrades:
The following are additional recommended practices for RAC rolling upgrades.
However, if
this was not done or is not feasible for some reason, adding information about an existing Oracle database software installation to
the Oracle inventory can be done with the attach option of the opatch utility. Node information can be also
added with this option.
-local option is the recommended way to do this.
In the interest of keeping all instances of the RAC cluster at the same patch level, it is strongly recommended that aft er a patch has been validated, it should be applied to all nodes of the RAC installation. When instances of a RAC cluster have simila r patch software, services can be migrated among instances without running into the problem a patch may have fixed.
The patches should be stored in a location that is accessible by all nodes of the cluster. Thus all nodes of the cluster are equivalent in their capability to apply or roll back a patch.
minimize_downtime option of the
apply command.Using a logical standby database enables you to accomplish upgrades for database software and patch sets with almost no downtime.
If a logical s tandby database does not currently exist, then verify that a logical standby database supports all of the essential datatypes of your application.
| See Also:
Oracle Data Guard Concepts and Administration for a list of datatypes supported by the logical standby database < /a> If you cannot use a logical standby database because of the datatypes in your application, then perform the upg rade as documented in Oracle Database Upgrade Gui de |
First, create or establish a logical standby database. Figure 10-4 shows a production database and a logical standby database, which are both version X databases.
Text description of the illustration maxav023.gif
Seco nd, stop the SQL Apply process and upgrade the database software on the logical standby database to version X+1. Figure 10-5 shows the production database, version X, and the logical standby database, version X+1.
Text description of the illustration maxav026.gif
Third, restart SQL Appl y and operate with version X on the production database and version X+1 on the standby database. The configuration can run in the mix ed mode shown in Figure 10-6 for an arbitrary period to validate the upgrade in the production environment.
Text description of the illustration maxav024.gif
When you are satisf ied that the upgraded software is operating properly, you can reverse the database roles by performing a switchover. This may take on ly a few seconds. Switch the database clients to the new production database, so that the application becomes active. If application service levels degrade for some reason, then you can open the previous production database again, switch users back, and terminate th e previous steps.
Figure 10-7 shows that the former standby database (version X+1) is now the production database, and the former production database (version X) is now the standby data base. The clients are connected to the new production database.
Text description of the illustration maxav027.gif
Upgrade the new standby database. Figure 10-8 s hows the system after both databases have been upgraded to version X+1.
Text description of the illustration maxav025 .gif
The role reversal that was just described includes the following steps:
Oracle's online obj ect reorganization capabilities have been available since Oracle8i. These capabilities enable object reorgani zation to be performed even while the underlying data is being modified.
Table 10-8 describes a few of the object reorganization capabilities available with Oracle Database 10g.
| Object Type | Example of Object Reorganization Solution | Description of Sol ution |
|---|---|---|
|
a>
Table |
|
A PL/SQL package that provides a mechanism to redefine tables on line |
|
Index td> |
Rebuild index |
Rebuild an index that has previously been marked as unusable |
|
Tablespace |
Ren ame tablespace |
Enables an existing tablespace to be renamed without rebuilding the tablespace and its contents |
In highly available systems, it is occasionally necessary to redefine large tables that are constantly accessed
to improve the performance of queries or DML performed. Oracle provides the DBMS_REDEFINITION PL/SQL package to redefine
tables online. This package provides a significant increase in availability compared to traditional methods of redefining tables tha
t require tables to be taken offline.
An index can be rebuilt online using the previous index definition, optionally moving the index to a new tablespace.
Oracle Database 10g introduces the ability to ren ame a tablespace, similar to the ability to rename a column, table and datafile. Previously, the only way to change a tablespace name was to drop and re-create the tablespace, but this meant that the contents of the tablespace had to be dropped and rebuilt later. Wi th the ability to rename a tablespace online, there is no interruption to the users.
ALTER TABLESPACE USERS RENAME TO new_tablespace_name; Tablespace alter ed.