Oracle® Database Concepts 10g Release 1 (10.1), Part Number B10743-01
Computing environments configured to provide nearly full-time availability are known as high availability systems. Such systems typically have redundant hardware and software that keep the system available despite failures. Well-designed high availability systems avoid having single points of failure.
Oracle has a number of products and features that provide high availability in cases of unplanned downtime or planned downtime.
Various things can cause unplanned downtime, including system failures, data failures, disasters, and human errors. Oracle offers features to maintain high availability during each of these kinds of unplanned downtime.
This section covers some Oracle solutions to system failures: fast-start fault recovery and Real Application Clusters.
Oracle Enterprise Edition features include a fast-start fault recovery functionality to control instance recovery. This reduces the time required for cache recovery and makes the recovery bounded and predictable by limiting the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint.
The foundation of fast-start recovery is the fast-start checkpointing architecture. Instead of conventional event-driven (that is, log switching) checkpointing, which does bulk writes, fast-start checkpointing occurs incrementally. Each DBWn process periodically writes buffers to disk to advance the checkpoint position. The oldest modified blocks are written first to ensure that every write lets the checkpoint advance. Fast-start checkpointing eliminates bulk writes and the resultant I/O spikes that occur with conventional checkpointing.
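Why writing the oldest modified blocks first lets every write advance the checkpoint can be shown with a small simulation. This is a minimal Python sketch under stated assumptions, not Oracle's implementation: the `DirtyBuffer` and `CheckpointQueue` names, the use of a single integer as a redo position, and the `write_oldest` batch call are all hypothetical simplifications. The checkpoint position is modeled as the redo position of the oldest change still sitting in an unwritten buffer.

```python
import heapq
from dataclasses import dataclass


@dataclass(order=True)
class DirtyBuffer:
    first_dirty_rba: int  # hypothetical redo position of the buffer's first unwritten change
    block_id: int


class CheckpointQueue:
    """Hypothetical model of incremental checkpointing (not Oracle internals)."""

    def __init__(self) -> None:
        self._heap: list = []     # dirty buffers, ordered oldest change first
        self._dirty: set = set()  # block ids currently dirty
        self.current_rba = 0      # position of the latest redo record

    def modify(self, block_id: int) -> None:
        # Each change generates one redo record; an already-dirty buffer keeps
        # the position of its first change.
        self.current_rba += 1
        if block_id not in self._dirty:
            self._dirty.add(block_id)
            heapq.heappush(self._heap, DirtyBuffer(self.current_rba, block_id))

    def checkpoint_position(self) -> int:
        # Recovery must start at the oldest change not yet written to disk.
        return self._heap[0].first_dirty_rba if self._heap else self.current_rba + 1

    def redo_to_apply(self) -> int:
        # Redo records that instance recovery would have to read after a crash.
        return self.current_rba - self.checkpoint_position() + 1

    def write_oldest(self, n: int) -> list:
        # Incremental write: the n oldest buffers go first, so every write
        # moves the checkpoint forward.
        written = []
        while self._heap and len(written) < n:
            buf = heapq.heappop(self._heap)
            self._dirty.discard(buf.block_id)
            written.append(buf.block_id)
        return written
```

For example, after modifying blocks 10, 11, 10, 12, and 13, recovery would have to start at the first redo record (five records to apply). Writing the two oldest buffers (10 and 11) moves the checkpoint to record 4, leaving only two records to apply.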
With fast-start fault recovery, the Oracle database is opened for access by applications without having to wait for the undo, or rollback, phase to complete. The rollback of data locked by uncommitted transactions is done dynamically, on an as-needed basis. If a user process encounters a row locked by a crashed transaction, then it rolls back just that row. The impact of rolling back the rows requested by a query is negligible.
Fast-start fault recovery is very fast, because undo data is stored in the database, not in the log files. Undoing a block does not require an expensive sequential scan of a log file. It is simply a matter of locating the right version of the data block within the database.
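The on-demand rollback described in the two paragraphs above can be sketched as follows. This is a hypothetical Python model, not Oracle code: the `Row` and `Database` classes and the `update`, `crash`, and `access` methods are invented for illustration. Before images are kept in a structure inside the database, and a row locked by a crashed transaction is rolled back only when some session actually touches it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Row:
    value: str
    locked_by: Optional[int] = None  # id of the transaction holding the row lock


@dataclass
class Database:
    """Hypothetical model: undo (before images) lives in the database itself."""
    rows: Dict[int, Row]
    undo: Dict[int, Dict[int, str]] = field(default_factory=dict)  # txn -> row -> before image
    dead_txns: Set[int] = field(default_factory=set)
    rows_rolled_back: int = 0

    def update(self, txn: int, row_id: int, new_value: str) -> None:
        row = self.rows[row_id]
        # Keep only the before image from the transaction's first change to the row.
        self.undo.setdefault(txn, {}).setdefault(row_id, row.value)
        row.value, row.locked_by = new_value, txn

    def crash(self, txn: int) -> None:
        # The owning process died; its rows stay locked until someone needs them.
        self.dead_txns.add(txn)

    def access(self, row_id: int) -> str:
        row = self.rows[row_id]
        if row.locked_by in self.dead_txns:
            # Roll back only this row by looking up its before image in the database.
            row.value = self.undo[row.locked_by].pop(row_id)
            row.locked_by = None
            self.rows_rolled_back += 1
        return row.value
```

Rows of the crashed transaction that nobody touches keep their uncommitted value until accessed, which is why the database can open without first completing a full rollback phase.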
Fast-start recovery can greatly reduce mean time to recover (MTTR) with minimal effects on online application performance. Oracle continuously estimates the recovery time and automatically adjusts the checkpointing rate to meet the target recovery time.
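The feedback between a recovery-time target and the checkpointing rate can be illustrated with a toy cost model. In Oracle the target is set with the `FAST_START_MTTR_TARGET` initialization parameter, in seconds; everything else here is an assumption. The function name, the fixed cost per redo record, and the integer redo positions are hypothetical, and Oracle's real estimator is considerably more detailed.

```python
from typing import List


def buffers_to_write(first_dirty_rbas: List[int], current_rba: int,
                     secs_per_record: float, target_secs: float) -> int:
    """Hypothetical cost model: the fewest oldest dirty buffers to write so that
    estimated recovery time (redo records to apply times cost per record)
    meets the target."""
    pending = sorted(first_dirty_rbas)
    for written in range(len(pending) + 1):
        # After writing the `written` oldest buffers, recovery starts at the next one.
        start = pending[written] if written < len(pending) else current_rba + 1
        estimate = (current_rba - start + 1) * secs_per_record
        if estimate <= target_secs:
            return written
    return len(pending)  # only reached if target_secs is negative
```

A tighter target forces more buffers to be written, which is the sense in which the checkpointing rate is adjusted to meet the target.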
Real Application Clusters (RAC) databases are inherently high availability systems. The clusters that are typical of RAC environments can provide continuous service for both planned and unplanned outages. RAC builds higher levels of availability on top of the standard Oracle features. All single-instance high availability features, such as fast-start recovery and online reorganizations, apply to RAC as well.
In addition to all the regular Oracle features, RAC exploits the redundancy provided by clustering to deliver availability with n-1 node failures in an n-node cluster. In other words, all users have access to all data as long as there is one available node in the cluster.
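The n-1 argument rests on every node being able to reach all of the shared data. A minimal Python sketch, assuming a hypothetical `Cluster` class with a simple hash-based `route` method (not how RAC actually assigns work), shows that requests keep being served until the last node is gone.

```python
import zlib


class Cluster:
    """Hypothetical model: every node can reach all shared data, so any
    surviving node can serve any request."""

    def __init__(self, nodes):
        self.up = set(nodes)

    def fail(self, node) -> None:
        self.up.discard(node)

    def route(self, key: str) -> str:
        survivors = sorted(self.up)
        if not survivors:
            raise RuntimeError("cluster down: no surviving node")
        # Spread requests over the survivors with a stable hash of the request key.
        return survivors[zlib.crc32(key.encode()) % len(survivors)]
```

With four nodes, three may fail and every request still reaches the remaining node. Only the failure of the fourth makes data unavailable.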
This section covers some Oracle solutions to data failures: backup and recovery features, partitioning, and Transparent Application Failover.
In addition to fast-start fault recovery and its control over mean time to recover, Oracle provides several solutions to protect against and recover from data and media failures. A system or network fault may prevent users from accessing data, but media failures without proper backups can lead to lost data that cannot be recovered. Oracle's backup and recovery solutions include the following:
Recovery Manager (RMAN) is Oracle's utility to manage the backup and recovery of the database. It determines the most efficient method of running the requested backup, restore, or recovery operation. RMAN and the server automatically identify modifications to the structure of the database and dynamically adjust the required operation to adapt to the changes. You can specify the maximum disk space to use when restoring logs during media recovery, which enables efficient space management during the recovery process.
Oracle Flashback Database lets you quickly recover an Oracle database to a previous time to correct problems caused by logical data corruptions or user errors.
Oracle Flashback Query lets you view data at a point-in-time in the past. This can be used to view and reconstruct lost data that was deleted or changed by accident. Developers can use this feature to build self-service error correction into their applications, empowering end-users to undo and correct their errors.
Backup information can be stored in an independent flash recovery area. This increases the resilience of the information and allows easy querying of backup information. It also acts as a central repository for backup information for all databases across the enterprise, providing a single point of management.
When performing a point-in-time recovery, you can query the database without terminating recovery. This helps determine whether errors affect critical data or non-critical structures, such as indexes. Oracle also provides trial recovery, in which recovery continues but can be backed out if an error occurs. It can also be used to "undo" recovery if point-in-time recovery has gone on for too long.
With Oracle's block-level media recovery, if only a single block is damaged, then only that block needs to be recovered. The rest of the file, and thus the table containing the block, remains online and accessible.
LogMiner lets a DBA find and correct unwanted changes. Its simple SQL interface allows searching by user, table, time, type of update, value in update, or any combination of these. LogMiner provides the SQL statements needed to undo the erroneous operation, and its graphical interface shows the change history. Damaged log files can also be searched with LogMiner, which recovers some of the transactions recorded in them.
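Block-level media recovery, described in the list above, touches only the damaged block: restore it from backup, then roll it forward with the redo that applies to it. The Python sketch below is hypothetical throughout. The `recover_blocks` function, the dictionary layout of the datafile and backup, and the use of CRC32 checksums are assumptions, and redo is simplified to whole-block images.

```python
import zlib


def checksum(data: bytes) -> int:
    return zlib.crc32(data)


def recover_blocks(datafile: dict, backup: dict, redo: list) -> list:
    """Hypothetical sketch of block media recovery.

    datafile, backup: block_id -> (bytes, stored checksum)
    redo: (block_id, new block image) pairs in log order
    Only blocks whose checksum fails are restored and rolled forward;
    every other block stays in place and available.
    """
    damaged = [b for b, (data, cks) in datafile.items() if checksum(data) != cks]
    for block_id in damaged:
        data = backup[block_id][0]           # restore just this block from backup
        for redo_block, new_data in redo:
            if redo_block == block_id:       # apply only this block's redo
                data = new_data
        datafile[block_id] = (data, checksum(data))
    return damaged
```

The rest of the file is never restored, which is why the table containing the damaged block can stay online during the repair.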
Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions. SQL queries and DML statements do not need to be modified to access partitioned tables. However, after partitions are defined, DDL statements can access and manipulate individual partitions rather than entire tables or indexes. This is how partitioning can simplify the manageability of large database objects while remaining entirely transparent to applications.
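Range partitioning can be modeled in a few lines. This Python sketch is hypothetical. The `RangePartitionedTable` class and its methods are invented names that mimic the semantics of Oracle's `VALUES LESS THAN` range partitions. It shows DML going through the table as a whole while a DDL-style operation targets a single partition.

```python
import bisect
from typing import Dict, List


class RangePartitionedTable:
    """Hypothetical range-partitioned table: a row goes to the first partition
    whose exclusive upper bound is greater than its key (VALUES LESS THAN)."""

    def __init__(self, bounds: Dict[str, int]):
        self.names = list(bounds)            # partition names, in ascending bound order
        self.bounds = list(bounds.values())  # exclusive upper bounds
        self.parts: Dict[str, list] = {name: [] for name in self.names}

    def partition_for(self, key: int) -> str:
        i = bisect.bisect_right(self.bounds, key)
        if i == len(self.bounds):
            raise ValueError(f"key {key} maps to no partition")
        return self.names[i]

    # DML and queries address the table; routing to partitions is transparent.
    def insert(self, key: int, row) -> None:
        self.parts[self.partition_for(key)].append((key, row))

    def select(self, lo: int, hi: int) -> List:
        return [r for rows in self.parts.values() for k, r in rows if lo <= k < hi]

    # DDL can target one partition and leave the others untouched.
    def truncate_partition(self, name: str) -> None:
        self.parts[name].clear()
```

A caller inserts and selects without naming partitions, yet a maintenance operation such as `truncate_partition` affects only one piece of the table.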
Transparent Application Failover enables an application user to automatically reconnect to a database if the connection fails. Active transactions roll back, but the new database connection, made by way of a different node, is identical to the original. This is true regardless of how the connection fails.
With Transparent Application Failover, a client notices no loss of connection as long as there is one instance left serving the application. The database administrator controls which applications run on which instances and also creates a failover order for each application. This works best with Real Application Clusters: If one node dies, then you can quickly reconnect to another node in the cluster.
During normal client/server database operations, the client maintains a connection to the database so the client and server can communicate. If the server fails, so does the connection. The next time the client tries to use the connection, the client reports an error. At this point, the user must log in to the database again.
With Transparent Application Failover, however, Oracle automatically obtains a new connection to the database. This enables users to continue working as if the original connection had never failed.
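The reconnection behavior can be sketched from the client side. This is a hypothetical Python model, not the Oracle client library. The `FailoverClient` and `ConnectionLost` names and the instance dictionaries with `"up"` and `"run"` keys are invented. Instances are tried in an administrator-defined failover order. A query in progress is re-executed on the new connection, and rows the user already fetched are skipped, which assumes the query returns rows in the same order each time.

```python
class ConnectionLost(Exception):
    pass


class FailoverClient:
    """Hypothetical client-side failover: try instances in failover order and
    resume an interrupted query by re-executing it and skipping fetched rows."""

    def __init__(self, instances):
        self.instances = list(instances)  # administrator-defined failover order
        self.current = 0

    def _connect(self):
        # Move to the next instance in the failover order that is still up.
        while self.current < len(self.instances):
            inst = self.instances[self.current]
            if inst["up"]:
                return inst
            self.current += 1
        raise ConnectionLost("no instance left serving the application")

    def fetch_all(self, query):
        fetched = []
        while True:
            inst = self._connect()
            try:
                for i, row in enumerate(inst["run"](query)):
                    if i < len(fetched):
                        continue  # already returned to the user before failover
                    fetched.append(row)
                return fetched
            except ConnectionLost:
                inst["up"] = False  # mark the instance failed, then fail over
```

The user sees one uninterrupted result set as long as at least one instance in the failover order is still serving the application.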
There are several elements associated with active database connections. These include:
Client/server database connections
Users' database sessions executing commands
Open cursors used for fetching
Active transactions
Server-side program variables
Transparent Application Failover can be used to restore client/server database connections, users' database sessions, and optionally an active query. To restore other elements of an active database connection, such as active transactions and server-side package state, the application code must be capable of re-running statements that occurred after the last commit.
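The application-side requirement above can be met by remembering what was issued since the last commit. The following Python sketch is hypothetical. `ReplayingSession` and its methods are invented, and `execute` stands in for whatever call the application uses to send a statement. Because the active transaction is rolled back during failover, its statements are re-run on the new connection.

```python
class ReplayingSession:
    """Hypothetical helper: records statements issued since the last commit so
    they can be re-run after failover rolls back the active transaction."""

    def __init__(self, execute):
        self.execute = execute  # callable that sends one statement to the server
        self.pending = []       # statements since the last commit

    def run(self, stmt: str) -> None:
        self.pending.append(stmt)
        self.execute(stmt)

    def commit(self) -> None:
        self.execute("COMMIT")
        self.pending.clear()    # committed work survives failover; nothing to replay

    def after_failover(self, new_execute) -> None:
        # The uncommitted transaction was rolled back: re-run it on the new connection.
        self.execute = new_execute
        for stmt in self.pending:
            self.execute(stmt)
```

Server-side package state is not covered by this sketch. An application that depends on it would also need to re-establish that state after failover.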
Oracle's primary solution to disasters is the Oracle Data Guard product.
Oracle Data Guard maintains up to nine standby databases, each of which is a real-time copy of the production database, to protect against all threats: corruptions, data failures, human errors, and disasters. If a failure occurs on the production (primary) database, then you can fail over to one of the standby databases, which becomes the new primary database. In addition, planned downtime for maintenance can be reduced, because you can quickly and easily move (switch over) production processing from the current primary database to a standby database, and then back again.
An Oracle Data Guard configuration is a collection of loosely connected systems, consisting of a single primary database and up to nine standby databases that can include a mix of both physical and logical standby databases. The databases in a Data Guard configuration can be connected by a LAN in the same data center or, for maximum disaster protection, geographically dispersed over a WAN and connected by Oracle Net Services.
A Data Guard configuration can be deployed for any database. This is possible because its use is transparent to applications; no application code changes are required to accommodate a standby database. Moreover, Data Guard lets you tune the configuration to balance data protection levels and application performance impact; you can configure the protection mode to maximize data protection, maximize availability, or maximize performance.
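The trade-off between the three protection modes comes down to when a commit is acknowledged and what happens if no standby can receive redo. The Python sketch below encodes that decision as a function. The `Mode` enum, the `commit` function, and its string return values are hypothetical simplifications, not a Data Guard interface. In maximum protection mode, the primary shuts down rather than acknowledge a commit that no standby has received. Maximum availability mode waits for a standby while one is reachable and otherwise continues unprotected. Maximum performance mode acknowledges after the local redo write and ships redo asynchronously.

```python
from enum import Enum


class Mode(Enum):
    MAXIMUM_PROTECTION = "protection"
    MAXIMUM_AVAILABILITY = "availability"
    MAXIMUM_PERFORMANCE = "performance"


class PrimaryShutdown(Exception):
    pass


def commit(mode: Mode, standby_reachable: bool) -> str:
    """Hypothetical sketch of commit acknowledgement under each protection mode."""
    if mode is Mode.MAXIMUM_PERFORMANCE:
        # Acknowledge after the local redo write; redo ships asynchronously.
        return "local"
    if standby_reachable:
        # Protection and availability modes wait for a standby to receive the redo.
        return "local+standby"
    if mode is Mode.MAXIMUM_AVAILABILITY:
        # Keep the application running, unprotected, until the standby returns.
        return "local"
    # Maximum protection: never acknowledge a commit that no standby has received.
    raise PrimaryShutdown("no standby can receive redo")
```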
As application transactions make changes to the primary database, the changes are logged locally in redo logs, and the redo is shipped to the standby sites. For physical standby databases, the changes are applied to each physical standby database that is running in managed recovery mode. For logical standby databases, the changes are applied using SQL regenerated from the archived redo logs.
A physical standby database is physically identical to the primary database. While the primary database is open and active, a physical standby database is either performing recovery (by applying logs) or open for reporting access. A physical standby database can be opened read-only for queries when it is not performing recovery, while the production database continues to ship redo data to the physical standby site.
The on-disk database structures of a physical standby must be identical to those of the primary database on a block-for-block basis, because a recovery operation applies changes block-for-block using the physical rowid. The database schema, including indexes, must be the same, and the database cannot be opened other than for read-only access. If it were opened read/write, the physical standby database would have different rowids, making continued recovery impossible.
A logical standby database takes standard Oracle archived redo logs, transforms the redo records they contain into SQL transactions, and then applies them to an open standby database. Although changes can be applied concurrently with end-user access, the tables being maintained through regenerated SQL transactions allow read-only access to users of the logical standby database. Because the database is open, it is physically different from the primary database. The database tables can have different indexes and physical characteristics from their primary database peers, but must maintain logical consistency from an application access perspective to fulfill their role as a standby data source.
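The difference between the two apply methods is how a change finds its row. In the Python sketch below, a physical apply writes to the same block and slot as on the primary, while a logical apply regenerates a statement that locates the row by key. The `RedoRecord` fields, the three functions, and the `id` key column are hypothetical simplifications, not Oracle's redo format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RedoRecord:
    """Hypothetical, heavily simplified redo record for one column change."""
    block: int    # physical location of the row on the primary
    slot: int
    table: str
    key: int      # primary key value of the changed row (hypothetical "id" column)
    column: str
    value: int


def apply_physical(blocks: Dict[int, Dict[int, dict]], r: RedoRecord) -> None:
    # Physical standby: write the change at the same block and slot as on the
    # primary, which is only correct if the standby is identical block-for-block.
    blocks[r.block][r.slot][r.column] = r.value


def to_sql(r: RedoRecord) -> str:
    # Logical standby: regenerate a statement that finds the row by key,
    # independent of where the standby physically stores it.
    return f"UPDATE {r.table} SET {r.column} = {r.value} WHERE id = {r.key}"


def apply_logical(tables: Dict[str, Dict[int, dict]], r: RedoRecord) -> None:
    # Stand-in for executing to_sql(r) against the open logical standby.
    tables[r.table][r.key][r.column] = r.value
```

If a physical standby's rows had moved, `apply_physical` would write to the wrong place. The key-based logical apply does not care where rows are stored.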
Oracle Data Guard Broker automates complex creation and maintenance tasks and provides dramatically enhanced monitoring, alert, and control mechanisms. It uses background agent processes that are integrated with the Oracle database server and associated with each Data Guard site to provide a unified monitoring and management infrastructure for an entire Data Guard configuration.
Two user interfaces are provided to interact with the Data Guard configuration: a command-line interface (DGMGRL) and a graphical user interface called Data Guard Manager.
Oracle Data Guard Manager, which is integrated with Oracle Enterprise Manager, provides wizards to help you easily create, manage, and monitor the configuration. This integration lets you take advantage of other Enterprise Manager features, such as the event service for alerts, the discovery service for easier setup, and the job service to ease maintenance.
This section covers some Oracle solutions to human errors: Oracle Flashback features and LogMiner.
If a major error occurs, such as a batch job being run twice in succession, the database administrator can request a Flashback operation that quickly recovers the entire database to a previous point in time, eliminating the need to restore backups and do a point-in-time recovery. In addition to Flashback operations at the database level, it is also possible to flash back an entire table. Similarly, the database can recover tables that have been inadvertently dropped by a user.
Oracle Flashback Database lets you quickly bring your database to a prior point in time by undoing all the changes that have taken place since that time. This operation is fast, because you do not need to restore the backups. This in turn results in much less downtime following data corruption or human error.
Oracle Flashback Table lets you quickly recover a table to a point in time in the past without restoring a backup.
Oracle Flashback Drop provides a way to restore accidentally dropped tables.
Oracle Flashback Query lets you view data at a point-in-time in the past. This can be used to view and reconstruct lost data that was deleted or changed by accident. Developers can use this feature to build self-service error correction into their applications, empowering end-users to undo and correct their errors.
Oracle Flashback Version Query uses undo data stored in the database to view the changes to one or more rows along with all the metadata of the changes.
Oracle Flashback Transaction Query lets you examine changes to the database at the transaction level. As a result, you can diagnose problems, perform analysis, and audit transactions.
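Flashback Query and Flashback Version Query both work backward from the current data through undo. The Python sketch below models one row with its undo chain. The `VersionedRow` class, its methods, and the use of integer commit times (standing in for Oracle's internal change ordering) are hypothetical, and real undo is organized per transaction rather than per row.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRow:
    """Hypothetical row with undo: each committed change records its commit
    time, its transaction id, and the value it replaced."""
    value: object
    undo: list = field(default_factory=list)  # (commit_time, txn_id, old_value), oldest first

    def update(self, commit_time: int, txn_id: str, new_value) -> None:
        self.undo.append((commit_time, txn_id, self.value))
        self.value = new_value

    def as_of(self, t: int):
        # Flashback Query: undo every change committed after time t, newest first.
        value = self.value
        for commit_time, _txn, old in reversed(self.undo):
            if commit_time <= t:
                break
            value = old
        return value

    def versions(self):
        # Flashback Version Query: each version with the metadata of the change
        # that created it (commit time, transaction id, resulting value).
        result = []
        for i, (commit_time, txn_id, _old) in enumerate(self.undo):
            new = self.undo[i + 1][2] if i + 1 < len(self.undo) else self.value
            result.append((commit_time, txn_id, new))
        return result
```

For a salary of 1000 raised to 1100 at time 10 and then zeroed by mistake at time 20, `as_of(15)` recovers 1100, and `versions()` shows which transaction made each change.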
Oracle LogMiner lets you query redo log files through a SQL interface. Redo log files contain information about the history of activity on a database. Oracle Enterprise Manager includes the Oracle LogMiner Viewer graphical user interface (GUI).
All changes made to user data or to the database dictionary are recorded in the Oracle redo log files. Therefore, redo log files contain all the necessary information to perform recovery operations. Because redo log file data is often kept in archived files, the data is already available. To take full advantage of all the features LogMiner offers, you should enable supplemental logging.
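Searching mined changes by any combination of attributes and producing undo SQL can be sketched as follows. LogMiner itself exposes its results through the `V$LOGMNR_CONTENTS` view, whose `SQL_REDO` and `SQL_UNDO` columns hold the reconstructed statements. The Python below does not use that interface. The `Change` record, `search`, `undo_sql`, and the ROWID-based `WHERE` clauses are hypothetical simplifications of what LogMiner derives from redo.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical decoded redo record (not LogMiner's actual format)."""
    username: str
    table: str
    operation: str  # "INSERT", "UPDATE" or "DELETE"
    row_id: str
    old: dict       # column values before the change (empty for INSERT)
    new: dict       # column values after the change (empty for DELETE)


def lit(v) -> str:
    # Render a Python value as a SQL literal, doubling embedded quotes.
    if v is None:
        return "NULL"
    if isinstance(v, (int, float)):
        return str(v)
    return "'" + str(v).replace("'", "''") + "'"


def search(changes, **criteria):
    # Filter by any combination of attributes, e.g. username="SCOTT", table="EMP".
    return [c for c in changes if all(getattr(c, k) == v for k, v in criteria.items())]


def undo_sql(c: Change) -> str:
    # Build the statement that reverses one change.
    if c.operation == "INSERT":
        return f"DELETE FROM {c.table} WHERE ROWID = {lit(c.row_id)};"
    if c.operation == "DELETE":
        cols = ", ".join(c.old)
        vals = ", ".join(lit(v) for v in c.old.values())
        return f"INSERT INTO {c.table} ({cols}) VALUES ({vals});"
    if c.operation == "UPDATE":
        sets = ", ".join(f"{k} = {lit(v)}" for k, v in c.old.items())
        return f"UPDATE {c.table} SET {sets} WHERE ROWID = {lit(c.row_id)};"
    raise ValueError(f"unsupported operation {c.operation}")
```

Once the offending change is found, running its undo statement reverses it without restoring anything from backup.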
See Also: Chapter 11, "Oracle Utilities"