| Oracle® High Availability Architecture and Best Practices
10g Release 1 (10.1) Part Number B10726-01 |
|
|
View PDF |
This chapter describes Oracle configuration best practices. It includes the following sections:
| <
/a>See Also:
Appendix B, "Database SPFILE and Oracle Net Configuration File Samples" for a complete example of database parameter settings |
The practices that are recommended in this section affect the performance, availability, and MTTR of y our system. These practices apply to the single-instance database, RAC-only, Data Guard-only, and Maximum Availability architectures described in Chapter 4, "High Availability Architectures". The recommendations in this s ection are identical for the primary and standby databases when Oracle Data Guard is used. Some of these practices may reduce perform ance, but they are necessary to reduce or avoid outages. The minimal performance impact is outweighed by the reduced risk of corrupti on or the performance improvement for recovery.
This section includes the following recomme ndations:
Maintain two copies of the control file. If a single control file is damaged, then any Oracle instance fails when it atte mpts to access the damaged or missing control file. If another copy of the current control file is available, then an instance can be easily restarted after copying the good control file to the location of the bad control file. Database recovery is unnecessary.
Set the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter to a value that enables all on-disk backu
p information to be retained in the control file. Allocate 200 MB for each control file. CONTROL_FILE_RECORD_KEEP_TIME s
pecifies the number of days that records are kept within the control file before becoming a candidate for reuse. Set the CONTRO
L_FILE_RECORD_KEEP_TIME value to slightly longer than the oldest backup file that you intend to keep on disk, as determined by
the size of the flash recovery area. For example, if the flash recovery area is sized to maintain two full backups that are taken ev
ery 7 days, as well as daily incremental backups and archived redo log files, then set CONTROL_FILE_RECORD_KEEP_TIME to
a value like 21 or 30. Records older than this will be reused. However, the backup metadata will still be available in the RMAN recov
ery catalog.
All online redo log files should be the same size and configured to switch approximately once an hour during normal activity. They should switch no more frequently than every 20 minutes during peak activity.
There should be a minimum of four online log groups to prevent LGWR
from waiting for a group to be available following a log switch. A group may be unavailable because a checkpoint has not yet complete
d or the group has not yet been archived.
Use Oracle log multiplexing to create multiple redo log members i n each redo group. This protects against a failure involving the redo log, such as a disk corruption that exists on both sides of the disk mirror for one of the members, or a user error that accidentally removes a member. If at least one redo log member is available , then the instance can continue to function.
ARCHIVELOG mode enables the database to be backed up while it is online and is
necessary to recover the database to a point in time later than what has already been restored.
Architectures that include Oracle Data Guard require that the production database run in ARCHIVELOG mode before a
standby database is instantiated. ARCHIVELOG mode is required to maintain a standby database.
| See Also:
Oracle Database Administrator's Guide for more information about using automatic archiving |
By default, Oracle always tests the data blocks that it reads from disk. Enabling data and log block checksums
by setting DB_BLOCK_CHECKSUM to TRUE enables Oracle to detect other types of corruption caused by underlyi
ng disks, storage systems, or I/O systems. Before a data block is written to disk, a checksum is computed and stored in the block. Wh
en the block is subsequently read from disk, the checksum is computed again and compared with the stored checksum. Any difference is
treated as a media error and an ORA-1578 error is signaled. Block checksums are always maintained for the SYSTEM tablesp
ace.
In addition to enabling data block checksums, Oracle also calculates a checksum for ev ery redo log block before writing it to the current log. Redo record corruptions are found as soon as the log is archived. Without th is option, a corruption in a redo log can go unnoticed until the log is applied to a standby database or until a backup is restored a nd rolled forward through the log containing the corrupt log block.
RMAN also calculates ch ecksums when taking backups to ensure that all blocks being backed up are validated.
Turnin g on this feature typically has minimal overhead. Measure the performance impact with your workload on a test system and ensure that the performance impact is acceptable before introducing this feature on an active database.
| See Also: |
See Also:
|
With automatic undo management, the Oracle server effectively and efficient ly manages undo space, leading to lower administrative complexity and cost. When Oracle internally manages undo segments, undo block and consistent read contention are eliminated because the size and number of undo segments are automatically adjusted to meet the cur rent workload requirement.
To use automatic undo management, set the following parameters:< /p>
UNDO_MANAGEMENT = AUTOUNDO_RETENTION is the desired time in seconds to retain undo data. It must be the same on a
ll instances.UNDO_TABLESPACE should specify a unique undo tables
pace for each instance.Advanced object recovery features, such as Flashback Query, F
lashback Version Query, Flashback Transaction Query, and Flashback Table, require automatic undo management. These features depend on
the UNDO_RETENTION setting. Retention is specified in units of seconds. By default, Oracle automatically tunes undo ret
ention by collecting database use statistics and estimating undo capacity needs. You can affect this automatic tuning by setting the
UNDO_RETENTION initialization parameter. The default value of UNDO_RETENTION is 900. You do no
t need to set this parameter if you want Oracle to tune undo retention. The UNDO_RETENTION value can be changed dynamica
lly at any time by using the ALTER SYSTEM statement.
Setting UNDO_RETENT
ION does not guarantee that undo data will be retained for the specified period of time. If undo data is needed for transactio
ns, then the UNDO_RETENTION period is reduced so that transactions can receive the necessary undo data.
You can guarantee that unexpired undo data is not overwritten even if it means that future operations that
need to generate undo data will fail. This is done by specifying the RETENTION GUARANTEE clause for the undo tablespace
when it is created by either the CREATE DATABASE or CREATE UNDO TABLESPACE statement. Alternatively, you ca
n later specify this clause in an ALTER TABLESPACE statement.
With the retenti on guarantee option, the undo guarantee is preserved even if there is need for DML activity. (DDL statements are still allowed.) If t he tablespace is configured with less space than the transaction throughput requires, the following four things will occur in this se quence:
| See Also:
Oracle Database Administr
ator's Guide for more information about the |
Locally managed tab lespaces perform better than dictionary-managed tablespaces, are easier to manage, and eliminate space fragmentation concerns. Locall y managed tablespaces use bitmaps stored in the datafile headers and, unlike dictionary managed tablespaces, do not contend for centr ally managed resources for space allocations and de-allocations.
Automatic segment space management simplifies space admi nistration tasks, thus reducing the chance of human error. An added benefit is the elimination of performance tuning related to space management. It facilitates management of free space within objects such as tables or indexes, improves space utilization, and provid es significantly better performance and scalability with simplified administration. The automatic segment space management feature is available only with permanent locally managed tablespaces.
< /a>Temporary tablespaces improve t he concurrency of multiple sort operations, reduce sort operation overhead, and avoid data dictionary space management operations alt ogether. This is a more efficient way of handling temporary segments, from the perspective of both system resource usage and database performance.
A default temporary tablespace should be specified for the entire database to
prevent accidental exclusion of the temporary tablespace clause. This can be done at database creation time by using the DEFAU
LT TEMPORARY TABLESPACE clause of the CREATE DATABASE statement or after database creation by the ALTER DAT
ABASE statement. Using the default temporary tablespace ensures that all disk sorting occurs in a temporary tablespace and tha
t other tablespaces are not mistakenly used for sorting.
Resumable space allocation provides a way to suspend and later resume da tabase operations if there are space allocation failures. The affected operation is suspended instead of the database returning an er ror. No processes need to be restarted. When the space problem is resolved, the suspended operation is automatically resumed.
Set the RESUMABLE_TIMEOUT initialization parameter to the number of seconds of the re
try time.
The flash recovery area is an Oracle-managed directory, file system, or automatic storage management disk group that provide s a centralized disk location for backup and recovery files. The flash recovery area is defined by setting the following database ini tialization parameters:
DB_RECOVERY_FILE_DEST: De
fault location for the flash recovery areaDB_RECOVERY_FILE_DEST_SIZE: Specifies (in bytes) the hard limit on the total space to be used by target database recovery files created in the recovery are
a locationThe bigger the flash recovery area, the more beneficial it becomes. The mi nimum recommended disk limit is the sum of the database size, the size of incremental backups, the size of all archived redo logs tha t have not been copied to tape, and the size of the flashback logs.
| See Also:
Oracle Database Backup and Recovery Advanced User's Guide and Oracle Database Backup and Recovery Basics for detailed information about sizing the flash recovery area and setting the retention period |
Flashback Database is a revolutionary recovery feature that operates on only the changed data, thereby making the time to correct an error equal to the time to cause the error without recovery time being a func tion of the database size. You can flash back a database from both RMAN and SQL*Plus with a single command instead of a complex proce dure. Flashback Database is similar to conventional point-in-time recovery in its effects, enabling you to return a database to its s tate at a time in the recent past. However, Flashback Database is much faster than point-in-time recovery, because it does not requir e restoring datafiles from backup or extensive application of redo data.
To enable Flashbac
k Database, set up a flash recovery area, and set a flashback retention target (DB_FLASHBACK_RETENTION_TARGET initializa
tion parameter), to specify how far back into the past in minutes you want to be able to restore your database using Flashback Databa
se. To enable Flashback Database, execute the ALTER DATABASE FLASHBACK ON statement. It is important to note that the fl
ashback retention target is a target, not an absolute guarantee that Flashback Database will be available. If your flash recovery are
a is not large enough to hold required files such as archived redo logs and other backups, then flashback logs may be deleted to make
room in the flash recovery area for these required files. To determine how far you can flash back at any one time, query the V
$FLASHBACK_DATABASE_LOG view. If you have a standby database, then set FLASHBACK_RETENTION_TIME to be the same fo
r both primary and standby databases.
The biggest threat to corporate data comes from employees an d contractors with internal access to networks and facilities. Corporate data is one of a company's most valuable assets that can be at grave risk if placed on a system or database that does not have proper security measures in place. A well-defined security policy can help protect your systems from unwanted access and protect sensitive corporate information from sabotage. Proper data protection reduces the chance of outages due to security breeches.
In addition to the "High Availabili ty" section in Chapter 9, "Oracle Security Products and Features", the Oracle Security Overview manual is a h igh-level guide to technical security solutions for the data security challenge. Consult Part II, "Technical Solutions to Security Ri sks" of the Oracle Security Overview for an overview of techniques for implementing security best practices. For a much more detailed view of security policies, checklists, guidelines, and features, see the Oracle Database Security Guide
The Database Resource Manager gives database adminis trators more control over resource management decisions, so that resource allocation can be aligned with the business objectives of a n enterprise. The Database Resource Manager provides the ability to prioritize work within the Oracle system. Availability of the dat abase encompasses both its functionality and performance. If the database is available but users are not getting the level of perform ance they need, then availability and service level objectives are not being met. Application performance, to a large extent, is affe cted by how resources are distributed among the applications that access the database. The main goal of the Database Resource Manager is to give the Oracle database server more control over resource management decisions, thus circumventing problems resulting from in efficient operating system management and operating system resource managers.
< a name="1016346">The server parameter file (SPFILE) enable
s a single, central parameter file to hold all of the database initialization parameters associated with all of the instances associa
ted with a database. This provides a simple, persistent, and robust environment for managing database parameters.
An SPFILE is required when using the Data Guard Broker.
The practices that are recommended in this section affect the performance, avail ability, and MTTR of your system. These practices build on the single instance database configuration best practices. The practices a re identical for the primary and standby databases if they are used with Data Guard in the MAA architecture. Some of these practices may reduce performance levels, but they are necessary to reduce or avoid outages. The minimal performance impact is outweighed by the reduced risk of corruption or the performance improvement for recovery.
The rest of this s ection includes the following topics:
The listen
ers should be cross-registered by using the REMOTE_LISTENER parameter so that all listeners know about all services and
in which instances the services are running. The listeners should use server-side load balancing, which can be based on session count
for connection. The listeners must be listening on the virtual IP addresses and on the cluster alias, when it is available. The list
eners must not listen on the hostname. Listening on the hostname will result in disconnected sessions when vi
rtual IPs are relocated automatically back to their owning nodes.
| See Also: |
| Questions | Recom mendations |
|---|---|
|
Run the following query: SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED ORDER BY OWNER, TABLE_NAME; Rows returned - Use a physical standby database or investigate changing to supported datatypes No rows returned - Go to next question | |
|
See Also: "Oracle9i Data Guard: SQL Apply Best Practices" at |
Yes - Evaluate a logical standby database No - Use a physical standby database |
Table 7-2 shows the recommendatio ns for configuration that apply to both logical and physical standby databases and those that apply to only logical and only physical .
| Recommendations for Both Physical and Logical Standby Databases | Recommendations for Physical Standby D atabases Only | Recommendations for Logical Standby Databases Only |
|---|---|---|
|
Use Multiplexed Standby Redo Logs and C onfigure Size Appropriately |
- |
|
|
- |
||
|
- |
||
|
Configure the Database and Listener for Dynamic Service Registration |
- |
|
|
- |
- | |
|
- |
-< /p> | |
|
Conduct a Performance Assessment with the Proposed Network Configuration |
- |
- |
|
Use a LAN or MAN for Ma ximum Availability or Maximum Protection Modes |
- |
- |
|
- |
- | |
|
- |
||
|
Use the ASYNC Attribute with a 50 MB Buffer for Maximum Perfo rmance Mode |
- |
- |
|
- |
- | |
|
- |
- | |
|
- |
- | |
|
- |
- | |
|
- |
- |
This archiving strategy is based on the following assumptions:
Table 7-3 describes the recommendations for a robust archiving strategy.< /p>
| Recommendatio n | Description |
|---|---|
|
Archiving must be started on the primary database |
Maintaining a standby database requires archiving to be enabled and started on the primary database. SQL> SHUTDOWN IMMEDIATE SQL> STARTUP MOUNT; a>SQL> ALTER DATABASE ARCHIVELOG; SQL> ALTER DATABASE OPEN; |
|
Remote archiving must be enabled. |
|
|
Use a consistent log format ( |
If the flash recovery is used, then this format is ignor ed. For example: |
|
Local archiving is done first by the arch
iver process ( |
Using |
|
Remote archiving should be done to only one standby instance and node for each standby RAC database. |
All production instances archive to one standby destination, using the same net service name. Oracle Net Services connect-t ime failover is used if you want to automatically switch to the secondary standby host when the primary standby instance has an outag e. |
|
The standby a rchive destination should use the flash recovery area. |
For simplici
ty, the standby archive destination ( |
|
The logic al standby archive destination cannot use the flash recovery area. |
For a logical standby database, |
|
Specify role-based destinations with the |
The See Also: Appendix B, "Database SPFILE and Oracle Net Configuration File Samples" |
The following example illustrates the recommended initialization parameters for a primary database commun
icating to a physical standby database. There are two instances, SALES1 and SALES2, running in maximum prot
ection mode.
*.DB_RECOVERY_FILE_DEST=/recoveryarea *LOG_ARCHIVE_DEST_ 1='SERVICE=SALES LGWR SYNC=NOPARALLEL AFFIRM REOPEN=15 MAX_FAILURE=10 VALID_FOR=(ONLINE+LOGFILES, ALL ROLES )' *.LOG_ARCHIVE_DEST_STATE_1='ENABLE' *.STANDBY_ARCHIVE_DEST='USE_DB_RECOVERY_FILE_DEST'
Note the following observations for this example:
PARALLEL attribute is used when there are multiple standby
destinations. When SYNC is set to PARALLEL, the LGWR process initiates an I/O operation to eac
h standby destination at the same time. Because there is a single standby destination, the NOPARALLEL option is set to r
educe overhead.REOPEN=15 MAX_FAILURE=10 setting denotes that
if there is a connection failure, then the connection is reopened after 15 seconds and is retried up to 10 times.VALID_FOR clause is used to designate the role for a destination. When the data
base is in a physical standby role, remote destination LOG_ARCHIVE_DEST_1 is not used because a physical standby databas
e does not use online log files.The flash recovery area must be accessible to any no de within the cluster and use a shared file system technology such as automatic storage management (ASM), a cluster file system, a gl obal file system, or high availability network file system (HA NFS). You can also mount the file system manually to any node within t he cluster very quickly. This is necessary for recovery because all archived redo log files must be accessible on all nodes.
On the standby database nodes, recovery from a different node is required when Node 1 fails and can not be restarted. In that case, any of the existing standby instances residing on a different node can initiate managed recovery. In the worst case, when the standby archived redo log files are inaccessible, the new managed recovery process (MRP) or logical standby process (LSP) on the different node fetches the archived redo log files using the FAL server to retrieve from the production nodes di rectly.
When configuring hardware vendor shared file system technology, verify the performa nce and availability implications. Investigate the following issues before adopting this strategy:
Standby redo logs (SRLs) should be used on b oth sites. Use Oracle log multiplexing to create multiple standby redo log members in each standby redo group. This protects against a failure involving the redo log, such as disk corruption that exists on both sides of the disk mirror for one of the members or a us er error that accidentally removed a member.
Use this formula to determine the number of SR Ls:
# of SRLs = sum of all production online log groups per thread + number of threadsFor example, if a primary database has two i nstances (threads) and each thread has four online log groups, then there should be ten SRLs. Having one more standby log group for e ach thread than the number of the online redo log groups for the production database reduces the likelihood that the
LGWRfor the production instance is blocked because an SRL cannot be allocated on the standby.The following are additional guidelines for creating SRLs:
ALTER DATA BASE ADD STANDBY LOGFILE THREAD 1 GROUP 10 '/dev/vx/rdsk/ha10-dg/DGFUN stbyredo10 01.log' SIZE 50M REUS E;
The remote file server (RFS) process for the standby database writes only to an SRL whose size is identical to the size of an online redo log for the production database. If it cannot f ind an appropriately sized SRL, then RFS creates an archived redo log file directly instead and logs the following message in the ale rt log:
No standby redo log files of size <#> blocks available.
When the production database is in FORCE LOGGING mode, a
ll database changes are logged except for those in temporary tablespaces and temporary segments. FORCE LOGGING mode ensu
res that the standby database remains consistent with the production database. If this is not possible because you require the load p
erformance with NOLOGGING operations, then you must ensure that the corresponding standby datafiles are subsequently syn
chronized. After completing the nologging operations, a production backup of the affected datafiles needs to replace the correspondin
g standby datafiles. Before the file transfer, the physical standby database must stop recovery and the logical standby database must
temporarily take the affected tablespaces offline.
You can enable force logging immediatel
y by issuing an ALTER DATABASE FORCE LOGGING statement. If you specify FORCE LOGGING, then Oracle waits for
all ongoing unlogged operations to finish.
Using real time apply enables the log apply services to apply redo data (physical standby database) or SQL (logi cal standby database) as it is received without waiting for the current standby redo log file to be archived. This results in faster switchover and failover times because the standby redo log files are applied to the standby database before failover or switchover be gins.
For a physical standby database, use the following SQL statement
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE;
For a logical standby database, use the following SQL statement:
a>ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
The setting for the LOG_ARCHIVE_DEST_2 initialization parameter and the settings for the FAL_SERVER<
/code> and FAL_CLIENT initialization parameters depend on a proper Oracle Net configuration. For the Oracle Data Guard t
ransport service and the gap resolution feature to work, the SPFILE, listener.ora, tnsnames.ora, and sqlnet.ora files must be consistent.
See Also:
|
The remote archive destination FAL_CLIENT and FAL_SERVER parameters require an Oracle Net service. This service is represented as a net service name entry in t
he local tnsnames.ora file. Notice that the FAL_SERVER and FAL_CLIENT reference the same Oracl
e network service name. This is possible because the FAL_SERVER service is defined in the standby tnsnames.ora file, whereas the FAL_CLIENT service is defined in the primary tnsnames.ora file. This works only when
you use the Oracle Network Service local naming method. If you are not using the local naming method, then you must have different se
rvice names. Furthermore, Oracle recommends using dynamic service registration instead of a static SID list in the listener configura
tion. To ensure that service registration works properly, the server parameter file should contain the following parameters:
SERVICE_NAMES for the database service nameINSTANCE_NAME for the instance nameLOCAL_LISTENER to specify a nondefault listener addressPMON attempts to resolve LOCAL_LISTENER
using some naming method. In the case described here, PMON finds the corresponding name in the local tnsna
mes.ora file.
For example:
SALES1.INSTANC E_NAME='SALES1' SALES2.INSTANCE_NAME='SALES2' *.LOG_ARCHIVE_DEST_2='SERVICE=SALES LGWR SY NC=NOPARALLEL AFFIRM REOPEN=15 MAX_FAILURE=10' *.LOCAL_LISTENER='SALES_lsnr' *.SERVICE_NA MES='SALES' # required for service registration *.FAL_SERVER='SALES' *.FAL_CLIENT='SAL ES'
The listener.ora file sh
ould be identical for each primary and secondary host except for HOST settings. Because service registration is used, th
ere is no need for statically configured information.
The local tnsnames.ora f
ile should contain the net service names and the local listener name translation. To use the same service name on each node, you must
use a locally managed tnsnames.ora file for the production and standby databases. On the primary cluster, the tns
names.ora entry, SERVICE_NAME, should equal the setting of the SERVICE_NAMES SPFILE parameter. If th
e listener is started after the instance, then service registration does not happen immediately. In this case, issue the ALTER
SYSTEM REGISTER statement on the database to instruct the PMON background process to register the instance with t
he listeners immediately.
Reducing the number of round trips across the network is essential for optimizing the trans portation of redo log data to a standby site. With Oracle Net Services it is possible to control data transfer by adjusting the size of the Oracle Net setting for the session data unit (SDU). In a WAN environment, setting the SDU to 32K can improve performance. The SDU parameter designates the size of an Oracle Net buffer before it delivers each buffer to the TCP/IP network layer for transmission across the network. Oracle Net sends the data in the buffer either when requested or when it is full. Oracle internal testing of Ora cle Data Guard on a WAN has demonstrated that the maximum setting of 32K (32768) performs best on a WAN. The primary gain in performa nce when setting the SDU is a result of the reduced number of calls to packet the data.
In
addition to setting the SDU parameter, network throughput can often be substantially improved by using the SQLNET.
SEND_BUF_SIZE and SQLNET.RECV_BUF_SIZE Oracle Net parameters to increase the size of the network TCP send and rec
eive I/O buffers.
In some situations, a business cannot afford to lose data at any cost. In other situations, the availa bility of the database may be more important than protecting f data. Some applications require maximum database performance and can t olerate a potential loss of data if a disaster occurs.
Choose one of the following protecti on modes:
This section includes the following topics:
| See Also:
Oracle Data Guard Concepts and Administration for more i nformation about data protection modes |
To determine the correct data protection mode for your application, ask the questions in Table 7-4.
Is data loss acceptable if the prim ary site fails?
No: Use maximum protection or maximum availability modes.
How much data loss is tolerated if a site is lost?
None: Use maximum protection or maximum availability modes.
Some: Use maximum performance mode with ASYNC=blocks. The value for the number of bl
ocks determines the maximum amount of possible redo data loss if a site fails.
Is potential data loss between the production and the standby database s tolerated when a standby host or network connection is temporarily unavailable?
Yes: Use maximum performance or maximum availability modes.
No: Use maximum p rotection mode.
How far away should the disaster recovery site be from the primary site?
The distance between sites and the network infrastructure between the sites determines network latency. In general, the latency increases with distance. Determine the minimum distance between sites to provide for outage isolation and minimal network latency. A ssess what data centers are available for your company, or assess Oracle outsourcing services.
What is the current or proposed network bandwidth and latency between sites?
Bandwidth must be greater than maximum redo generation rate. A guideline for two-way communication is for bandwidth to be 50 percent of the stated line capacity, but you must co nsider network usage of other applications.
Using maximum performance mode with asynchronous log transport or the archiver mitigates the effect on performance.
The default data protection mode is maximum performance mode. After a failover to the standby database, the protection mode automatically changes to maximum performance mode. Switchover operations do not change the protection mode.
To change the data protection mode from maximum performance to maximum ava ilability or maximum protection, perform the following steps:
LGWR SYNC option and a valid net service name during
startup. For maximum performance mode, use the LGWR ASYNC option with a valid net service name.SHUTDOWN IMMEDIATE; STARTUP MOUNT EXCLUSIVE;
ALTER DATABASE SET STANDBY TO MAXIMIZE [AVAILABILITY | PROTECTION];
To change the protection mode from maximum protection to maximum performance or maximum availability, use a statement similar to the following:
ALTER DATABASE SET STANDBY TO MAXIMIZE [PERFORMANCE | AVAILABILITY];
<
font face="Arial, Helvetica, sans-serif">See Also:
|
Oracle recommends that you con duct a performance assessment with your proposed network configuration and current (or anticipated) peak redo rate. The network impac t between the production and standby databases and the impact on the primary database throughput needs to be understood. Because the network between the production and standby databases is essential for the two databases to remain synchronized, the infrastructure mu st have the following characteristics:
The required bandwidth of a dedicated network connection is determined by
the maximum redo rate of the production database. You also need to account for actual network efficiency. Depending on the data prot
ection mode, there are other recommended practices and performance considerations. Maximum protection mode and maximum availability m
ode require LGWR SYNC transport. Maximum performance protection mode uses the ASYNC transport option or the
archiver (ARCHn) instead of LGWR to transfer the redo. These recommendations were de
rived from an Oracle internal performance study that measured the impact of network latency on primary database throughput for each O
racle Data Guard transport option: ARCH, LGWR ASYNC, and LGWR SYNC.
| See Also:
|
The network infrastructure betwe en the primary and secondary sites must be able to accommodate the redo traffic because the production database redo data is updating the physical standby database. If your maximum redo traffic at peak load is 8 MB/second, then your network infrastructure must have sufficient bandwidth to handle this load. Furthermore, network latency affects overall throughput and response time for OLTP and batc h operations.
When you compare maximum protection mode or maximum availability mode with LGWR ASYNC operations, measure whether performance or
throughput will be degraded due to the incurred latency. You should also check whether the new throughput and response time are with
in your application performance requirements. Distance and the network configuration directly influence latency, while high latency m
ay slow down your potential transaction throughput and increase response time. The network configuration, number of repeaters, the ov
erhead of protocol conversions, and the number of routers also affect the overall network latency and transaction response time.
Maximum ava
ilability mode or maximum protection mode require the Oracle Data Guard transport service to use the LGWR SYNC transport
option. Network latency is an additional overhead for each LGWR SYNC I/O operation. Figu
re 7-1 shows that LGWR SYNC writes both locally to the online redo log and remotely through the network to the
RFS process to the standby redo logs.
Text description of the illustration maxav005.gif
The following formulas emphasize that the remote write is always slower than the local write and is the limiting facto
r when LGWR synchronous writes are occurring.
Local write = local write I/O t ime Remote write = network round trip time (RTT) + local write I/O time (on standby machine)
Using an example in which the network round trip time (RTT) is 20 milliseconds and
LGWR synchronous write is configured, every transaction commit time increases by 20 milliseconds. This overhead impacts
response time and may affect primary database throughput. Because of the additional overhead incurred by the RTT, a local area networ
k (LAN) or a metropolitan area network (MAN) with an RTT less than or equal to 10 milliseconds should be used for applications that c
annot tolerate a change in performance or response time. Whether to use a LAN or MAN depends on the results of the performance assess
ment.
With only one remote standby
destination within a LAN, with sufficient bandwidth and low latency, Oracle recommends the LGWR SYNC=NOPARALLEL AFFIRM
option for the best performance with maximum data protection capabilities. When no data loss is required and there is only one remote
archive destination, SYNC=NOPARALLEL performs better than SYNC=PARALLEL (the default) with a single standb
y destination.
If SYNC=PARALLEL is used, then the network I/O is initiated asy
nchronously, so that I/O to multiple destinations can be initiated in parallel. However, after the I/O is initiated, LGWR waits for each I/O operation to complete before continuing. This is, in effect, the same as performing multiple synchronous I/O ope
rations simultaneously. If the Data Guard configuration has more than one standby database, then use a cascading standby architecture
to minimize overhead to the primary database. If business requirements stipulate that the production database must transfer to multi
ple standby databases, then set SYNC=PARALLEL.
See Also:
|
The ARCH attribute of the LOG_ARCHIVE_
DEST_n initialization parameter provides the greatest performance throughput but the greatest data loss
potential. ARCH does not affect primary performance when latency increases as long as the redo logs are configured corre
ctly as described in "Configure the Size of Redo Log Files and Groups Appropriately". This is recommended for maximum performance data protection mode and is the default.
The effects of latency on primary throughput are detailed in the following white paper.
| See Also:
"Oracle9i Data Guard: Primary Site and Network Configuration Best Practices" at |
The largest LGWR ASYNC buffer of 50 MB (ASYNC=102400) performs best in a WAN. In LAN tests, different asynchronous buffer sizes did not impact the primary database throughput. Using
the maximum buffer size also increases the chance of avoiding timeout messages that result from an "async buffer full" condition in
a WAN.
If the network buffer becomes full and remains full for 5 seconds, then the transpor
t times out and converts to the ARCH transport. This condition indicates that the network to the standby destination can
not keep up with the redo generation rate on the primary database. This indicated in the alert log by the following message:
'Timing out on NetServer %d prod=%d,cons=%d,threshold=%d"
This message indicates that the standby destination configured with the LGWR ASYNC attributes
encountered an "async buffer full" condition. When this occurs, log transport services automatically stop using the network server p
rocess to transmit the redo data and convert to using the archiver process (ARCn) until a log swit
ch occurs. The next log transmission reverts to the ASYNC transport. This change occurs automatically. Using the largest
asynchronous network buffer, 30MB, increases the chance of avoiding the transport converting to ARCH and potentially lo
sing more data.
Figure 7-2 shows the architecture w
hen the standby protection mode is set to maximum performance with LGWR ASYNC configuration. The LGWR reque
st is buffered if sufficient space is available in the network buffer. If the production database fails and is inaccessible, then the
data in the network buffer is lost, which usually means seconds of data loss in high OLTP applications.
Text description o f the illustration maxav006.gif
Evaluate SSH port forwarding with compression for maximum performance mode over a high-latency WAN (RTT greater than 100 millise
conds). Coupled with using LGWR ASYNC, the maximum buffer size, SSH with compression reduces the chance of receiving an
"async buffer full" timeout. It also reduces network traffic.
See Also:
|
Setting LOG_ARCHIVE_LOCAL_FIRST to TRUE enables the archiver processes to archive the local online redo
log files on the primary database before transmitting the redo data to remote standby destinations. This is especially useful when t
he network to the standby databases is slow.
This is the default setting for LOG_ARCH
IVE_LOCAL_FIRST.
Because a la
ck of security can directly affect availability, Data Guard provides a secure environment and prevents tampering with redo data as it
is being transferred to the standby database. To enable secure transmission of redo data, set up every database in the Data Guard co
nfiguration to use a password file, and set the password for the SYS user identically on every system. The following is
a summary of steps needed for each database in the Data Guard configuration:
REMOTE_LOGIN_PASSWORDFILE=[EXCLUSIVE | SHARED] initialization parameter on eac
h instance.After you have performed these steps to set up security on every database
in the Data Guard configuration, Data Guard transmits redo data only after the appropriate authentication checks using SYS credentials are successful. This authentication can be performed even if Oracle Advanced Security is not installed and provides s
ome level of security when shipping redo data. To further protect redo data (for example, to encrypt redo data or to compute an integ
rity checksum value for redo traffic over the network to disallow redo tampering on the network), Oracle recommends that you install
and use Oracle Advanced Security.
Se
e Also:
|
Specify a unique name for the standby database.
The name does not change even if the primary and standby databases reverse roles. The DB_UNIQUE_NAME parameter defaults
to the value of the DB_NAME parameter.
Specify the DG_CONFIG attribute of the LOG_ARCH
IVE_CONFIG initialization parameter so that it lists the DB_UNIQUE_NAME for the primary database and each standby
database in the Data Guard configuration. By default, this parameter enables the primary database to send redo data to remote destin
ations and enables standby databases to receive redo data. The DG_CONFIG attribute must be set to enable the dynamic add
ition of a standby database to a Data Guard configuration that has a RAC primary database running in either maximum protection or max
imum availability mode.
The following recommendation applies only to the physical standby databas e:
To use Oracle Data Guard with a physical standby database or to use any media recovery operation effectively, you need to tune your database recovery.
| See Also:
"Oracle9i Media Recovery Best Practices" at |
The following recommendations apply only to the logical standby database:
Use supplemental logging and primary key constraints on all prod uction tables.
If your application ensures that the rows in a table are unique, then you ca
n create a disabled primary key RELY constraint on the table. This avoids the overhead of maintaining a primary key on t
he primary database. To create a disabled RELY constraint on a primary database table, use the ALTER TABLE
statement with a RELY DISABLE clause.
To improve the performance of SQL Apply, add an index to the columns that uniquely identify the row on the logical standby database. Failure to do this results in full table scans.
If the logical standby d
atabase is being used to remove reporting or decision support operations from the primary database, then you should probably reserve
some of the parallel query slaves for such operations. Because the SQL Apply process by default uses all the parallel query slaves, s
etting the MAX_SERVERS initialization parameter enables a specified number of parallel query slaves to be reserved.
Table 7-5 shows examples of MAX_SERVERS va
lues.
It is recommended that MAX_SERVERS be set initially to the larger of the following values: 9 or 3 plus 3 times CPU.
Increase the PA
RALLEL_MAX_SERVERS initialization parameter by the larger of 9 or 3 times CPU on both the primary and standby instances:
The PARALLEL_MAX_SERVERS initialization parameter specifies the maximum number of parallel
query processes that can be created on the database instance. With the exception of the coordinator process, all the processes that
constitute the SQL Apply engine are created from the pool of parallel query processes. The SQL Apply engine, by default, uses all the
parallel query processes available on the database instance. This behavior can be overridden by using the logical standby parameters
It is recommended that PARALLEL_MAX_SERVERS be increased by the value of
The logical standby database supports the following methods of data application:
FULL or READ_ONLY transaction consiste
ncy.TRANSACTION_CONSISTENCY to NONE.If the logical standby database will be used for reporting or decision support operations, then:
FULL.READ_ONLY.Database objects that do not need to be repli
cated to the standby database should be skipped by using the DBMS_LOGSTDBY.SKIP procedure. Skipping such objects reduces
the processing of the the SQL Apply engine. Consider this recommendation in a decision support environment.
This section recommends configuration practices in addition to t he ones that are discussed for the single-instance database, RAC, and Data Guard. These practices are recommended when MAA is employe d (RAC and Data Guard are used on both sites).
This section includes the following topics:< /p>
In an MAA environment, the standby database uses RAC, and multiple standby instances are associated with the same standby database. Havi ng multiple standby instances is not the same as having multiple standby databases. Only one instance can have the managed recovery p rocess (MRP) or the logical standby apply process (LSP). The standby instance with the MRP or LSP is called the primary standby insta nce. All other standby instances are called secondary standby instances.
Having multiple st andby instances for the same database on the cluster provides the following benefits:
D ata Guard connect-time failover occurs when a connection request is forwarded to another listener if the connection fails. Connect-ti me failover is enabled by service registration, because the listener knows which available Oracle instance provides the requested ser vice.
The following is an Oracle Net connection descriptor in the tnsnames.ora
file:
sales.us.acme.com= (DESCRIPTION= (ADD RESS_LIST= (ADDRESS=(PROTOCOL=tcp)(HOST=sales1-server)(PORT=1521)) (ADDRESS=(PROT OCOL=tcp)(HOST=sales2-server)(PORT=1521))) (CONNECT_DATA= (SERVICE_NAME=sales.us.ac me.com)))
Note that the SALES net service name conta
ins multiple address lists (two because it is a two-node cluster) for the production and standby clusters. The second address list en
ables connect-time failover if the first connection fails. This works for all protection modes.
To add a network protocol address to an existing net service name or database service, use either Oracle Enterprise Manager or O racle Net Manager.
While it is prudent that every database have a good backup, recovery using a backup is not alway s the fastest solution. Other available Oracle technologies, such as RAC, Data Guard, and flashback technology often provide faster m eans of recovering from an outage than restoring from backups.
A good backup and recovery s trategy is still vital to the overall high availability solution and ensures that specific outages are recovered from in an acceptabl e amount of time. The following topics are included in this section:
See Also:
|
| See Also: |
Oracle's change tracking feature for incremental backups improves incremental backup performance by recording changed blocks in each datafile in a change tracking fi le. If change tracking is enabled, then RMAN uses the change tracking file to identify changed blocks for incremental backup, thus av oiding the need to scan every block in the datafile.
Using automatic disk-based backup and recov ery, you can create a flash recovery area, which automates management of backup-related files. Choose a location on disk and an upper bound for storage space and set a retention policy that governs how long backup files are needed for recovery. Oracle manages the st orage used for backup, archived redo logs, and other recovery-related files for your database within that space. Files no longer need ed are eligible for deletion when RMAN needs to reclaim space for new files.
| See Also: |
Use the BACKUP RECOVERY
FILE DESTINATION RMAN command to move disk backups created in the flash recovery area to tape. Tape backups are used for offs
ite and long-term storage and are used to handle certain outage scenarios.
The backup retention policy is t he rule set regarding which backups must be retained (on disk or other backup media) to meet recovery and other requirements. It may be safe to delete a specific backup because it is old enough to be superseded by more recent backups or because it has been stored on tape. You may also need to retain a specific backup on disk for other reasons such as archival requirements. A backup that is no lon ger needed to satisfy the backup retention policy is said to be obsolete.
< p class="BP">Backup retention policy can be based on redundancy or a recovery wind ow. In a redundancy-based retention policy, you specify a number n such that you always keep at leas t n distinct backups of each file in your database. In a recovery window-based retention policy, you specify a time interval in the past (for example, one week or one month) and keep all backups required to let you perform point-in-time recov ery to any point during that window.Frequent backups are essential for any recovery scheme . Base the frequency of backups on the rate or frequency of database changes such as:
The more frequently your database is updated, the more often you should perform database backups. If da tabase updates are relatively infrequent, then you can make whole database backups infrequently and supplement them with incremental backups, which will be relatively small because few blocks have changed.
< a name="1016801">Configuring the size of the flash recovery area properly enables fast recovery from user error with Flashback Database and fast recovery from data failure with file or block media recovery from disk. The appropriate size of the flash recovery area depends on the following: retention policy, b ackup frequency, size of the database, rate and number of changes to the database. Specific formulas for determining the proper size of the flash recovery area for different backup scenarios are provided in Oracle Database Backup and Recovery Basics.
Take backups at the primary and secondary sites. The ad vantages of this practice are as follows:
Co nsider a scenario in which backups are done only at the secondary site. Suppose there is a site outage at the secondary site where th e estimated time to recover is three days. The primary site is completely vulnerable to an outage that is typically resolved by a fai lover, also to any outage that could be resolved by having a local backup (such as a data failure outage resolved by block media reco very). In this scenario, a production database outage can be resolved only by physically shipping the off-site tape backups that were taken at the standby site. If primary site backups were available, then restoring locally would be an available option in place of t he failover than cannot be done. Data may be lost, but having primary site backups significantly shortens the MTTR.
Another undesirable approach is to start taking primary site backups at the time that there is a secondary s ite outage. However, this approach should be avoided because it is introducing new processes and procedures at a time when the enviro nment is already under duress and the impact of a mistake by staff will be magnified. Also, it is not a time to learn that backups ca nnot be taken at the primary site.
In addition, primary site disk backups are necessary to ensure a reasonable MTTR when using RMAN file or block media recovery. Without a local on-disk backup, a backup taken at the standby site must be restored to the primary site, significantly lengthening the MTTR for this type of outage.
During backups, use the target
database control file as the RMAN repository and resynchronize afterward with the RMAN RESYNC CATALOG command.
When creating backups to disk or tape, use the target database control file as the RMAN repository s
o that the ability to back up or the success of the backup does not depend on the availability of the RMAN catalog in the manageabili
ty database. This is accomplished by running RMAN with the NOCATALOG option. After the backup is complete, the new backu
p information stored in the target database control file can be resynchronized with the recovery catalog using the RESYNC CATAL
OG command.
Using the V$DATABASE_BLOCK_CORRUPTION view, which can b
e used by RMAN block media recovery.
If BLOCK CHANGE TRACKING is enabled, then
do not use the INCREMENTAL LEVEL option with BACKUP VALIDATE to ensure that all data blocks are read and v
erified.
To detect all types of corruption that are possible to detect:
MAXCORRUPT optionNOCHECKSUM optionCHECK LOGICAL optionComplete, successful, and tested backups are fundamental to the success of any recovery. Create test plans for the differ
ent outage types. Start with the most common outage types and progress to the least probable. Issuing backup procedures does not ensu
re that the backups are successful; they must be rehearsed. Monitor the backup procedure for errors, and validate backups by testing
your recovery procedures periodically. Also, validate the ability to do backups and restores by using the RMAN commands BACKUP
VALIDATE and RESTORE... VALIDATE commands.
The Oracle Cluster Registry (OCR) contains cluster and da
tabase configuration information for RAC and Cluster Ready Services (CRS), such as the cluster database node list, CRS application re
source profiles, and Event Manager (EVM) authorizations. Using the ocrconfig tool, there are two methods of copying OCR
content and using the content for recovery. The first method uses automatically generated physical OCR file copies. The second method
uses manually created logical OCR export files. The backup file created with ocrconfig should be backed as part of the
operating system backup using standard operating system or third-party tools.
| See Also: |