Skip Headers

Oracle® Database Data Warehousing Guide
10g Release 1 (10.1)

Part Number B10736-01
Go to Documentation Home
Home
Go to Book
 List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Feedback

Go to previous page
Previous
Go to next page
Next
View PDF

24 Using Parallel Execution

This chapter covers tuning in a parallel execution environment and discusses:

Introduction to Parallel Execution Tuning

Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with decision support system s (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and h ybrid systems. Parallel execution improves processing for:

  • Queries requiring large table scans, joins, o r partitioned index scans

  • Creation of large indexes

  • Creation of large table s (including materialized views)

  • Bulk inserts, updates, merges, and deletes

You can als o use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access larg e objects (LOBs).

Parallel execution benefits systems with all of the following chara cteristics:

  • Symmetric multiprocessors (SMPs), clusters, or massively parallel systems

  • Sufficient I/O bandwidth

  • Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)

  • Sufficient memory to support additional memory-intensive processes, such as sorts, hashing, and I/O buffers

If your system lacks any of these characteristics, parallel execution might not significantly improve performance. In fact, parallel execution may reduce system performance on overutilized systems or systems with small I/O bandwidth.

When to Implement Parallel Execution

The benefits of parallel execution can be seen in DSS and data warehousing envi ronments. OLTP systems can also benefit from parallel execution during batch processing and during schema maintenance operations such as creation of indexes. The average simple DML or SELECT statements that characterize OLTP applications would not see a ny benefit from being executed in parallel.

When Not to Implement Parallel Execution

Parallel execution is not normally useful for:

  • Environments in which the typical query or transaction is very short (a few seconds or less). This includes most online transaction systems. Parallel execution is not useful in these environments because there is a cost associated with coordinating the parallel execution servers; for short transactions, the cost of this coordination may out weigh the benefits of parallelism.

  • Environments in which the CPU, memory, or I/O resources are already h eavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are availabl e, then parallel execution will not yield any benefits and indeed may be detrimental to performance.

Operations That Can Be Parallelized

You can use parallel execution for any of the following:

  • Access methods

    Some examples are table s cans, index full scans, and partitioned index range scans.

  • Join methods

    Some examples are nested loop, sort merge, hash, and star transformation.

  • DDL statements

    Some examples are CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX , REBUILD INDEX PARTITION, and MOVE/SPLIT/COALESCE PARTITION.

    You can normally use parallel DDL where you use regular DDL. There are, however, some additional de tails to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB columns.

    All of these DDL operations can be performed in NOLOGGING mode f or either parallel or serial execution.

    The CREATE TABLE statement for an index-organized table can be paralle lized either with or without an AS SELECT clause.

    Different parallelism is used for different operat ions. Parallel create (partitioned) table as select and parallel create (partitioned) index run with a degree of parallelism equal to the number of partitions.

    Parallel operations require accurate statistics to perform optimally.

  • DML statements

    Some examples are INSERT AS SELECT, updates, deletes, and MERGE operations.

    Parallel DML (parallel insert, update, merge, and delete) uses parallel execution mechanisms to speed up or s cale up large DML operations against large database tables and indexes. You can also use INSERT ... SELECT statements to insert rows into multiple tables as part of a single DML statement. You can normally use parallel DML where you use reg ular DML.

    Although data manipulation language (DML) normally includes queries, the term parallel DML refers only to inserts, u pdates, upserts and deletes done in parallel.

  • Miscellaneous SQL operations

    Some examples are GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, as well as aggregate and table functions.

  • Parallel query

    You can parallelize queries and subqueries in SELECT statements, as well as the que ry portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE).

  • SQL*Loader

    You can parallelize the use of SQL*Loader, where large amounts of data are routinely encountered. To speed up your loads, you can use a parallel direct-path load as in the following example:

    sqlldr USERI
    D=SCOTT/TIGER CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE
    sqlldr USERID=SCOTT/TIGER CONTROL=LOAD2.CTL DIRECT=TRUE PARALLEL=TRUE
    sqlld
    r USERID=SCOTT/TIGER CONTROL=LOAD3.CTL DIRECT=TRUE PARALLEL=TRUE
    
    

    You can also use a parameter file to achieve the same th ing.

    An important point to remember is that indexes are not maintained during a parallel load.

How Parallel Execution Works

Parallel executio n divides the task of executing a SQL statement into multiple small units, each of which is executed by a separate process. Also the incoming data (tables, indexes, partitions) can be divided into parts called granules. The user shadow process that wants to execute a query in parallel takes on the role as parallel execution coordinator or query coordinator. The query coordinator does the followin g:

  • Parses the query and determines the degree of parallelism

  • Allocates one or two set of slaves (threads or processes)

  • Controls the query and sends instructions to the PQ slaves

  • Determines which tables or indexes need to be scanned by the PQ slaves

  • Produ ces the final output to the user

Degree of Parallelism

The parallel execution coordinator may enlist two or more of the instance's parallel execution servers to process a SQL statement. The number of parallel execution servers associated with a s ingle operation is known as the degree of parallelism.

A single operation is a part of a SQL statement such as an order by, a full table scan to perform a join on a nonindexed column table.

Note that the degree of parallelism applies directly only to i ntra-operation parallelism. If inter-operation parallelism is possible, the total number of parallel execution servers for a statemen t can be twice the specified degree of parallelism. No more than two sets of parallel execution servers can run simultaneously. Each set of parallel execution servers may process multiple operations. Only two sets of parallel execution servers need to be active to g uarantee optimal inter-operation parallelism.

Parallel execution is designed to effectively use multiple CPUs and disks to ans wer queries quickly. When multiple users use parallel execution at the same time, it is easy to quickly exhaust available CPU, memory , and disk resources.

Oracle Database provides several ways to manage resource utilization in conjunction with parallel execut ion environments, including:

  • The adaptive multiuser algorithm, which is enabled by default, reduces the degree of parallelism as the load on the system increases.

  • User resource limits and profiles, which allo w you to set limits on the amount of various system resources available to each user as part of a user's security domain.

  • The Database Resource Manager, which lets you allocate resources to different groups of users.

The Parallel Execution Server Pool

When an instance starts up, Oracle creat es a pool of parallel execution servers which are available for any parallel operation. The initialization parameter PARAL LEL_MIN_SERVERS specifies the number of parallel execution servers that Oracle Database creates at instance startup.

Wh en executing a parallel operation, the parallel execution coordinator obtains parallel execution servers from the pool and assigns th em to the operation. If necessary, Oracle can create additional parallel execution servers for the operation. These parallel exe cution servers remain with the operation throughout job execution, then become available for other operations. After the statement ha s been processed completely, the parallel execution servers return to the pool.

Note that the parallel execution coordinator a nd the parallel execution servers can only service one statement at a time. A parallel execution coordinator cannot coordinate, for e xample, a parallel query and a parallel DML statement at the same time.

When a user issues a SQL statement, the optimizer deci des whether to execute the operations in parallel and determines the degree of parallelism (DOP) for each operation. You can specify the number of parallel execution servers required for an operation in various ways.

If the optimizer targets the statement for parallel processing, the following sequence of events takes place:

  1. The SQL statement's foregroun d process becomes a parallel execution coordinator.

  2. The parallel execution coordinator obtains as many parallel exec ution servers as needed (determined by the DOP) from the server pool or creates new parallel execution servers as needed.

  3. Oracle executes the statement as a sequence of operations. Each operation is performed in parallel, if possible.

  4. When statement processing is completed, the coordinator returns any resulting data to the user process that issued the statement a nd returns the parallel execution servers to the server pool.

The parallel execution coordinator calls upon the parallel execution servers during the execution of the SQL statement, not during the parsing of the statement. Therefore, when parallel execution is used with the shared server, the server process that processes the EXECUTE call of a user's statement becomes the parallel execution coordinator for the statem ent. See "Setting the Degree of Parallelism for Parallel Execution" for more information.

Variations in the Number of Parallel Execution Servers

If the number of parallel operations processed concurrently by an instance changes sign ificantly, Oracle automatically changes the number of parallel execution servers in the pool.

If the number of parallel operations increases, Oracle creates additional parallel execution servers to handle incoming requests. However, Oracle never creates more parallel execution servers for an instance than the value specified by the initialization parameter PARALLEL_MAX_SERVERS.

< a id="sthref1898" name="sthref1898">If the number of parallel operations decreases, Oracle terminates any parallel execution serv ers that have been idle for a threshold period of time. Oracle does not reduce the size of the pool less than the value of PARA LLEL_MIN_SERVERS, no matter how long the parallel execution servers have been idle.

Processing Without Enough Parallel Execution Servers

Oracle can process a parallel operation with fewer than the requested number of pro cesses.

If all parallel execution servers in the pool are occupied and the maximum number of parallel execution servers has be en started, the parallel execution coordinator switches to serial processing.

See "Minimum Number of Paral lel Execution Servers" for information about using the initialization parameter PARALLEL_MIN_PERCENT and "Tuning General Parameters for Parallel Execution" for information about the PARALLEL_MIN_PERCENT and PARALLEL_MAX_SERVERS initialization parameters.

How Parallel Execution Servers Communicate

To execute a query in parallel, Oracle generally creates a producer queue server and a consumer se rver. The producer queue server retrieves rows from tables and the consumer server performs operations such as join, sort, DML, and D DL on these rows. Each server in the producer execution process set has a connection to each server in the consumer set. This means t hat the number of virtual connections between parallel execution servers increases as the square of the DOP.

Each communicatio n channel has at least one, and sometimes up to four memory buffers. Multiple memory buffers facilitate asynchronous communication am ong the parallel execution servers.

A single-instance environment uses at most three buffers for each communication channel. A n Oracle Real Application Clusters environment uses at most four buffers for each channel. Figure 24-1 illust rates message buffers and how producer parallel execution servers connect to consumer parallel execution servers.

Figure 24-1 Parallel Execution Server Connections and Buffers

Descri
ption of dwhsg096.gif follows
Description of the illust ration dwhsg096.gif

When a connection is between two processes on the same instance , the servers communicate by passing the buffers back and forth. When the connection is between processes in different instances, the messages are sent using external high-speed network protocols. In Figure 24-1, the DOP is equal to the numbe r of parallel execution servers, which in this case is n. Figure 24-1 does not show the parallel exe cution coordinator. Each parallel execution server actually has an additional connection to the parallel execution coordinator.

Parallelizing SQL Statements

Each SQL statement undergoes an optimizat ion and parallelization process when it is parsed. When the data changes, if a more optimal execution or parallelization plan becomes available, Oracle can automatically adapt to the new situation.

After the optimizer determines the execution plan of a statem ent, the parallel execution coordinator determines the parallelization method for each operation in the plan. For example, the parall elization method might be to parallelize a full table scan by block range or parallelize an index range scan by partition. The coordi nator must decide whether an operation can be performed in parallel and, if so, how many parallel execution servers to enlist. The nu mber of parallel execution servers in one set is the DOP. See "Setting the Degree of Parallelism for Parallel Exe cution" for more information.

Dividing Work Among Parallel Execution Servers

T he parallel execution coordinator examines the redistribution requirements of each operation. An operation's redistribution requireme nt is the way in which the rows operated on by the operation must be divided or redistributed among the parallel execution servers.

After determining the redistribution requirement for each operation in the execution plan, the optimizer determines the order i n which the operations must be performed. With this information, the optimizer determines the data flow of the statement.

As a n example of parallel query with intra- and inter-operation parallelism, consider the following, more complex query:

SELECT /*+ PARALLEL(employees 4) PARALLEL(departments 4) 
       USE_HASH(employees) ORDERED */ MAX(salary), AVG(salary
)
FROM employees, departments
WHERE employees.department_id = departments.department_id 
GROUP BY employees.department_id;

< p>Note that hints have been used in the query to force the join order and join method, and to specify the DOP of the tables emp loyees and departments. In general, you should let the optimizer determine the order and method.

Figure 24-2 illustrates the data flow graph or query plan for this query.

Figure 24-2 Data Flow Diagram for Joining Tables

Description of cncpt019.gif follows
Description of the illustration cncpt019.gif

Parallelism Between Operations

< p>Given two sets of parallel execution servers SS1 and SS2 for the query plan illustrated in Figure 24-2, the execution will proceed as follows: each server set (SS1 and SS2) will have four execution processes because of the PARALLEL hint in the query that specifies the DOP. In other words, the DOP will be four because each set of parallel execution servers wi ll have four processes.

Slave set SS1 first scans the table employees while SS2 will fetch rows from SS1 and buil d a hash table on the rows. In other words, the parent servers in SS2 and the child servers in SS2 work concurrently: one in scanning employees in parallel, the other in consuming rows sent to it from SS1 and building the hash table for the hash join in parallel. This is an example of inter-operation parallelism.

After SS1 has finished scanning the entire table employees (that is, all granules or task units for employees are exhausted), it scans the table departments i n parallel. It sends its rows to servers in SS2, which then perform the probes to finish the hash-join in parallel. After SS1 is done scanning the table departments in parallel and sending the rows to SS2, it switches to performing the GROUP BY in parallel. This is how two server sets run concurrently to achieve inter-operation parallelism across various ope rators in the query tree while achieving intra-operation parallelism in executing each operation in parallel.

Another importan t aspect of parallel execution is the re-partitioning of rows while they are sent from servers in one server set to another. For the query plan in Figure 24-2, after a server process in SS1 scans a row of employees, which server process of SS2 should it send it to? The partitioning of rows flowing up the query tree is decided by the operator into which the row s are flowing into. In this case, the partitioning of rows flowing up from SS1 performing the parallel scan of employees into SS2 performing the parallel hash-join is done by hash partitioning on the join column value. That is, a server process scanning employees computes a hash function of the value of the column employees.employee_id to decide the number o f the server process in SS2 to send it to. The partitioning method used in parallel queries is explicitly shown in the EXPLAIN< /code> PLAN of the query. Note that the partitioning of rows being sent between sets of execution servers should not be confused with Oracle's partitioning feature whereby tables can be partitioned using hash, range, and other methods.

Producer Operations

Operations that require the output of other operations are known as consumer operations. In Figure 24-2, the GROUP BY SORT operation is the producer of the HASH JOIN operation because GROUP BY SORT requires the HASH JOIN output.

Producer operations can begin consuming rows as soon as the producer operations have produced rows. In the previous example, while the parallel execution servers are producing rows in the FULL SCAN dep artments operation, another set of parallel execution servers can begin to perform the HASH JOIN ope ration to consume the rows.

Each of the two operations performed concurrently is given its own set of parallel execution servers. Therefore, both query operations and the data flow tree i tself have parallelism. The parallelism of an individual operation is called intraoperation parallelism and the parallelism between o perations in a data flow tree is called interoperation parallelism. Due to the producer-consumer nature of the Oracle server's operat ions, only two operations in a given tree need to be performed simultaneously to minimize execution time. To illustrate intraoperatio n and interoperation parallelism, consider the following statement:

SELECT * FROM employees ORDER BY la
st_name;

The execution plan implements a full scan of the employees table. This operation is followed by a s orting of the retrieved rows, based on the value of the last_name column. For the sake of this example, assume the last_name column is not indexed. Also assume that the DOP for the query is set to 4, which means that four parallel execution servers can be active for any given operation.

Figure 24-3 illustrates the parallel execution of the example query.

Figure 24-3 Interoperation Parallelism and Dynamic Partitioning

< img src="./dwhsg090.gif" alt="Description of dwhsg090.gif follows" />
Description of the illustration dwhsg090.gif

As you can see from Figure 24-3, there are actually eight parallel execution servers involved in the query even though the DOP is 4. Thi s is because a parent and child operator can be performed at the same time (interoperation parallelism).

Also note that all of the parallel execution servers involved in the scan operation send rows to the appropriate parallel execution server performing the SORT operation. If a row scanned by a parallel execution server contains a value for the last_name column b etween A and G, that row gets sent to the first ORDER BY parallel execution serve r. When the scan operation is complete, the sorting processes can return the sorted results to the coordinator, which, in turn, retur ns the complete query results to the user.

Parallel Queries on Index-Organized Tables

The following parallel scan methods are supported on index-organized tables:

  • Parallel fast f ull scan of a nonpartitioned index-organized table

  • Parallel fast full scan of a partitioned index-organi zed table

  • Parallel index range scan of a partitioned index-organized table

These scan m ethods can be used for index-organized tables with overflow areas and for index-organized tables that contain LOBs.

Nonpartitioned Index-Organized Tables

Parallel query on a nonpartitioned index-organized table uses parallel fa st full scan. The DOP is determined, in decreasing order of priority, by:

  1. A PARALLEL hint (if present)

  2. An ALTER SESSION FORCE PARALLEL QUERY statement

  3. The parallel degree associated with the table, if the parallel degree is specified in the CREA TE TABLE or ALTER TABLE statement

The allocation of work is done by di viding the index segment into a sufficiently large number of block ranges and then assigning the block ranges to parallel execution s ervers in a demand-driven manner. The overflow blocks corresponding to any row are accessed in a demand-driven manner only by the pro cess which owns that row.

Partitioned Index-Organized Tables

Both index range scan and fast fu ll scan can be performed in parallel. For parallel fast full scan, parallelization is exactly the same as for nonpartitioned index-or ganized tables. For parallel index range scan on partitioned index-organized tables, the DOP is the minimum of the degree picked up f rom the previous priority list (like in parallel fast full scan) and the number of partitions in the index-organized table. Depending on the DOP, each parallel execution server gets one or more partitions (assigned in a demand-driven manner), each of which contains the primary key index segment and the associated overflow segment, if any.

Parallel Queries on Object Types

Parallel queries can be performed on object type tables and tables containing object type columns. Parallel query for object types supports all of the features that are a vailable for sequential queries on object types, including:

  • Methods on object types

  • Attribute access of object types

  • Constructors to create object type instances

  • Object views

  • PL/SQL and OCI queries for object types

There are no limitatio ns on the size of the object types for parallel queries.

The following restrictions apply to using parallel query for object types.

  • A MAP function is needed to parallelize queries involving joins and sorts (through ORDER BY, GROUP BY, or set operations). In the absence of a MAP function, the query will automatically be executed serial ly.

  • Parallel DML and p arallel DDL are not supported with object types. DML and DDL statements are always performed serially.

In all cases where the query cannot execute in parallel because of any of these restrictions, the whole query executes serially without giving an error message.

Parallel DDL

This section includes the following topics on parallelism for DDL statements:

DDL State ments That Can Be Parallelized

You can parallelize DDL statements for tables and indexes that are nonpartitioned or pa rtitioned. Table 24-3 summarizes the operations that can be parallelized in DDL statements.

The parall el DDL statements for nonpartitioned tables and indexes are:

  • CREATE INDEX

    < /li>
  • CREATE TABLE ... AS SELECT

  • ALTER INDEX ... REBUILD

The parallel DDL statements for partitioned tables and indexes are:

  • C REATE INDEX

  • CREATE TABLE ... AS SELECT< /code>

  • ALTER TABLE ... [MOVE|SPLIT|COALESCE] PARTITION

  • ALTER INDEX ... [REBUILD|SPLIT] PARTITION

    • This statement can be executed in parallel only if the (global) index partition being split is usable.

All of these DDL operations can be performed in no-logging mode for either parallel or serial execution.

CREATE TABLE for an index-organized t able can be parallelized either with or without an AS SELECT clause.

Different parallelism is used f or different operations (see Table 24-3). Parallel CREATE TABLE ... AS SELECT statements on partitioned tables and parallel CREATE INDEX statements on partitioned i ndexes execute with a DOP equal to the number of partitions.

Partition parallel analyze table is made less necessary by the ANALYZE {TABLE, INDEX} PARTITION statements, since parallel analyze of an entire partitioned tab le can be constructed with multiple user sessions.

Par allel DDL cannot occur on tables with object columns. Parallel DDL cannot occur on non-partitioned tables with LOB colum ns.

< /a>

CREATE TABLE ... AS SELECT in Parallel

For performa nce reasons, < /a>decision support applications often require large amounts of data to be summarized or rolled up int o smaller tables for use with ad hoc, decision support queries. Rollup occurs regularly (such as nightly or weekly) during a short pe riod of system inactively.

Parallel execution lets you parallelize the query and crea te operations of creating a table as a subquery from another table or set of tables.

Clustered tables cannot be created and po pulated in parallel.

Figure 24-4 illustrates creating a table from a subquery in parallel.

Figure 24-4 Creating a Summary Table in Parallel

Description
 of dwhsg088.gif follows
Description of the illustratio n dwhsg088.gif

Recoverabil ity and Parallel DDL

When summary ta ble data is derived from other tables' data, recoverability from media failure for the smaller summary table may not be important and can be turned off during creation of the summary table.

If you disable logging during parallel table creation (or any other p arallel DDL operation), you should back up the tablespace containing the table once the table is created to avoid loss of the table d ue to media failure.

Use the NOLOGGING clause of the CREATE TABLE, CREATE INDEX, ALTER TABLE, and ALTER INDEX statements to disable undo and r edo log generation.

Space Management for Parallel DDL

< a id="sthref1965" name="sthref1965">Creating a table or index in parallel has space management implications that affect both the storage space required during a parallel operation and the free space available after a table or index has been created.

Storage Space When Using Dictionary-Managed Tablespaces

When creating a table or index in parallel, each parallel ex ecution server uses the values in the STORAGE clause of the CREATE statement to create temporary segments t o store the rows. Therefore, a table created with a NEXT setting of 5 MB and a PARALLEL DEGREE of 12 consumes at least 60 megabytes (MB) of storage during table creation because each process starts with an extent of 5 MB. When the parallel execution coordinator combines the segments, some of the segments may be trimmed, and the resulting table may be smaller than the requested 60 MB.

Free Space and Parallel DDL

When you create indexes and tables in parallel, each parallel execution server allocates a new extent and fills the extent with the table or index data. Thus, if you create an index with a DOP o f 3, the index will have at least three extents initially. Allocation of extents is the same for rebuilding indexes in parallel and f or moving, splitting, or rebuilding partitions in parallel.

Serial operations require the schema object to have at least one e xtent. Parallel creations require that tables or indexes have at least as many extents as there are parallel execution servers creati ng the schema object.

When you create a table or index in parallel, it is possible to create pockets of free space—either externa l or internal fragmentation. This occurs when the temporary segments used by the parallel execution servers are larger than what is n eeded to store the rows.

For example, if you specify a DOP of 3 for a CREATE TABLE ... AS SELECT statement, but there is only one datafile in the tablespace, then internal fragmentation may occur, as shown in Figure 24-5. The pockets of free space within the internal table extents of a datafile cannot be coalesced with oth er free space and cannot be allocated as extents.

See Ora cle Database Performance Tuning Guide for more information about creating tables and indexes in parallel.

Figure 24-5 Unusable Free Space (Internal Fragmentation)

Descriptio
n of dwhsg091.gif follows
Description of the illustrati on dwhsg091.gif

Parallel DML

Par allel DML (PARALLEL INSERT, UPDATE, DELETE, and MERGE) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes.


Note:

Although DML generally includes queries, i n this chapter the term DML refers only to inserts, updates, merges, and deletes.

This section discusses the following parallel DML topics:

Advantages of Parallel DML over Manual Parallelism

You can parallelize DML operations manually by issuing multiple DML statements simultaneously again st different sets of data. For example, you can parallelize manually by:

  • Issuing multiple INSERT statements to multiple instances of an Oracle Real Application Clusters to make use of free space from multiple free list block s.

  • Issuing multiple UPDATE and DELETE statements with different key value rang es or rowid ranges.

However, manual parallelism has the following disadvantages:

  • It is difficult to use. You have to open multiple sessions (possibly on different instances) and issue multiple statements.

  • There is a lack of transactional properties. The DML statements are issued at different times; and, as a result, the ch anges are done with inconsistent snapshots of the database. To get atomicity, the commit or rollback of the various statements must b e coordinated manually (maybe across instances).

  • The work division is complex. You may have to query the table in order to find out the rowid or key value ranges to correctly divide the work.

  • The calculation is complex. The calculation of the degree of parallelism can be complex.

  • There is a lack of affinity and resource information. You need to know affinity information to issue the right DML statement at the right instance when running an O racle Real Application Clusters. You also have to find out about current resource usage to balance workload across instances.

Parallel DML removes these disadvantages by performing inserts, updates, and deletes in parallel automatically.

< !-- class="sect3" -->

When to Use Parallel DML

< a id="sthref1992" name="sthref1992">Parallel DML operations are mainly used to speed up large DML operations against large database objects . Parallel DML is useful in a DSS environment where the performance and scalability of accessing large objects are important. Paralle l DML complements parallel query in providing you with both querying and updating capabilities for your DSS databases.

The overhead of setting up parallelism makes parallel DML operations infeasible for short OLTP tran sactions. However, parallel DML operations can speed up batch jobs running in an OLTP database.

Some of the scenarios where pa rallel DML is used include:

Refreshing Tables in a Data Warehouse System

In a data warehouse system, large tables need to be refreshed (updated) periodically with new or modified data from the productio n system. You can do this efficiently by using parallel DML combined with updatable join views. You can also use the MERGE statement.

The data that needs to be refreshed is generally loaded into a temporary table before starting the refresh proce ss. This table contains either new rows or rows that have been updated since the last refresh of the data warehouse. You can use an u pdatable join view with parallel UPDATE to refresh the updated rows, and you can use an anti-hash join with parallel INSERT to refresh the new rows.

Creating Intermediate Summar y Tables

In a DSS environment, many applications require complex computations that involve constructing and manipulati ng many large intermediate summary tables. These summary tables are often temporary and frequently do not need to be logged. Parallel DML can speed up the operations against these large intermediate tables. One benefit is that you can put incremental results in the intermediate tables and perform parallel update.

In addition, the summary tables may contain cumulative or comparison informat ion which has to persist beyond application sessions; thus, temporary tables are not feasible. Parallel DML operations can speed up t he changes to these large summary tables.

Using Scoring Tables< /h5>

Many DSS applications score customers periodically based on a set of criter ia. The scores are usually stored in large DSS tables. The score information is then used in making a decision, for example, inclusio n in a mailing list.

This scoring activity queries and updates a large number of rows in the large table. Parallel DML can spe ed up the operations against these large tables.

Updating Historical T ables

Historical tables describe the business transactions of an enterprise over a recent time interval. Periodically, the DBA deletes the set of oldest rows and inserts a set of new rows into the table. Parallel INSERT ... SELECT and parallel DELETE operations c an speed up this rollover task.

Although you can also use parallel direct loader (SQL*Loader) to insert bulk data from an exte rnal source, parallel INSERT ... SELECT is faster for inserting data that already exists in another table i n the database.

Dropping a partition can also be used to delete old rows. However, to do this, the table has to be partitioned by date and with the appropriate time interval.

Running Batch Jobs

Batch jobs executed in an OLTP databas e during off hours have a fixed time window in which the jobs must complete. A good way to ensure timely job completion is to paralle lize their operations. As the work load increases, more machine resources can be added; the scaleup property of parallel operations e nsures that the time constraint can be met.

Enabling Parallel DML

A DML statement can be parallelized only if you have explicitly enabled parallel DML in the session wi th the ENABLE PARALLEL DML clause of the ALTER SESSION statement. Th is mode is required because parallel DML and serial DML have different locking, transaction, and disk space requirements.

The default mode of a session is DISABLE PARALLEL DML. When parallel DML is disabled, no DML will be executed in parallel even if the PARALLEL hint is used.

When parallel DML is enabled in a session, all DML sta tements in this session will be considered for parallel execution. However, even if parallel DML is enabled, the DML operation may st ill execute serially if there are no parallel hints or no tables with a parallel attribute or if restrictions on parallel operations are violated.

The session's PARALLEL DML mode does not influence the parallelism of SELECT statements, DDL statements, and the query portions of DML statements. Thus, if this mode is not set, the DML operation is not par allelized, but scans or join operations within the DML statement may still be parallelized.

Transaction Restrictions fo r Parallel DML

To execute a DML oper ation in parallel, the parallel execution coordinator acquires or spawns parallel execution servers, and each parallel execution serv er executes a portion of the work under its own parallel process transaction.

  • Each parallel execution se rver creates a different parallel process transaction.

  • If you use rollback segments instead of Automatic Undo Management, you may want to reduce contention on the rollback segments. To do this, only a few parallel process transactions sh ould reside in the same rollback segment. See Oracle Database SQL Reference for more information.

The coordinator also has its own coordinator transaction, which can have its own rollback segment. In order to ensure user-level transactional atomicity, the coordinator uses a t wo-phase commit protocol to commit the changes performed by the parallel process transactions.

A session that is enabled for p arallel DML may put transactions in the session in a special mode: If any DML statement in a transaction modifies a table in parallel , no subsequent serial or parallel query or DML statement can access the same table again in that transaction. This means that the re sults of parallel modifications cannot be seen during the transaction.

Serial or parallel statements that attempt to access a table that has already been modified in parallel within the same transaction are rejected with an error message.

If a PL/SQL p rocedure or block is executed in a parallel DML enabled session, then this rule applies to statements in the procedure or block.

< /div>

Rollback Segments

If you use rollback segments instead of A utomatic Undo Management, there are some restrictions when using parallel DML. See Oracle Database SQL Reference for information about restrictions for parallel DML and rollback segments.

Recovery for Parallel DML

< a id="sthref2019" name="sthref2019">The time required to roll back a parallel DML operat ion is roughly equal to the time it takes to perform the forward operation.

Oracle supports parallel rollback after transactio n and process failures, and after instance and system failures. Oracle can parallelize both the rolling forward stage and the rolling back stage of transaction recovery.

See Oracle Database Backup and Recovery Basics for details about parallel rollback.

Transaction Recovery for User-Issued Rollback

A us er-issued rollback in a transaction failure due to statement error is performed in parallel by the parallel execution coordinator and the parallel execution servers. The rollback takes approximately the same amount of time as the forward transaction.

Process Recovery

Recovery from the failure of a parallel execution c oordinator or parallel execution server is performed by the PMON process. If a parallel execution server or a parallel execution coor dinator fails, PMON rolls back the work from that process and all other processes in the transaction roll back their changes.

System Recovery

Recovery from a system failure requires a ne w startup. Recovery is performed by the SMON process and any recovery server processes spawned by SMON. Parallel DML statements may b e recovered using parallel rollback. If the initialization parameter COMPATIBLE is set to 8.1.3 or greater, Fast-Start O n-Demand Rollback enables terminated transactions to be recovered, on demand one block at a time.

Space Considerations for Parallel DML

Parallel UPDATE< /code> uses the space in the existing object, while direct-path INSERT gets new segments for the data.

Space usag e characteristics may be different in parallel than sequential execution because multiple concurrent child transactions modify the ob ject.

Lock and Enqueue Resources for Parallel DML

A parallel DML operation's lock and enqueue resource req uirements are very different from the serial DML requirements. Parallel DML holds many more locks, so you should increase the startin g value of the ENQUEUE_RESOURCES and DML_LOCKS parameters. See "DML_LOCKS" for more information.

Restrictions on Parallel DML

The following restrictions apply to parallel DML (including direct-path INSERT):

  • Intra-partition parallelism for UPDATE, MERGE, and DELETE operations require that the COMPATIBLE initialization parameter be set to 9.2 or greater.

  • INSERT, U PDATE, MERGE, and DELETE operations on nonpartitioned tables are not parallelized if there is a bitm ap index on the table. If the table is partitioned and there is a bitmap index on the table, the degree of parallelism will be restri cted to at most the number of partitions accessed.

  • A transaction can contain multiple parallel DML state ments that modify different tables, but after a parallel DML statement modifies a table, no subsequent serial or parallel statement ( DML or query) can access the same table again in that transaction.

    • This restriction also exists after a serial direct-path INSERT statement: no subsequent SQL statement (DML or query) can access the modified table during t hat transaction.

    • Queries that access the same table are allowed before a parallel DML or direct-path < code>INSERT statement, but not after.

    • Any serial or parallel statements attempting to access a table that has already been modified by a parallel UPDATE, DELETE, or MERGE, or a direct-path INSERT during the same transaction are rejected with an error message.

  • Parallel DML operations cannot be done on tables with triggers.

  • Replication functionality is not supported for parallel DML.

  • Parallel DM L cannot occur in the presence of certain constraints: self-referential integrity, delete cascade, and deferred integrity. In additio n, for direct-path INSERT, there is no support for any referential integrity.

  • Parallel DML can be done on tables with object columns provided you are not touching the object columns.

  • < li type="disc">

    Parallel DML can be done on tables with LOB columns provided the table is partitioned. However, intra-partition parallelism is not supported.

  • A transaction involved in a para llel DML operation cannot be or become a distributed transaction.

  • Clustered tables are not supported.

Violations of these restrictions cause the statement to execute serially without warnings or error messages (except fo r the restriction on statements accessing the same table in a transaction, which can cause error messages). For example, an update is serialized if it is on a nonpartitioned table.

Partitioning Key Restriction

You can only update the partitioning key of a partitioned table to a new value if the update does not cause the row to move to a new partition. The update is possible if the table is defined with the row movement clause enabled.

Function Restrictions

The function restricti ons for parallel DML are the same as those for parallel DDL and parallel query. See "Parallel Execution of Functi ons" for more information.

Data Integrity Restrictions

This section describes the interactions of integrity con straints and parallel DML statements.

NOT NULL and CHECK

These types of integrity constraints are allowed. They are not a p roblem for parallel DML because they are enforced on the column and row level, respectively.

UNIQUE and PRIMA RY KEY

These types of integrity constraints are allowed.

FOREIGN KEY (Referential Integrity)

Restrictions for referential integrity occur whenever a DML operation on one table could cause a recursive DML operatio n on another table. These restrictions also apply when, in order to perform an integrity check, it is necessary to see simultaneously all changes made to the object being modified.

Table 24-1 lists all of the operations that are possib le on tables that are involved in referential integrity constraints.

Table 24-1 Referential Integrity Restrictions

DML Statement Issued on Parent Issued on Child Self-Referential
INSERT (Not applicable) Not parallelize d Not parallelized
MERGE (Not applicable) Not parallelized Not parallelized
UPDATE No Action Supported Supported Not parallel ized
DELETE No Action Supported Supported Not parallelized
DELETE Cascade Not parallelized (Not applicable) Not parallelized

Delete Cascade

Delete on tables having a foreign key with delete cascade is not parallelized because parallel execution servers will try to delete rows from multiple partitions (parent and child tables).

Self-Referential Integrity

DML on tables with self-referential integrity constraints is not paralleliz ed if the referenced keys (primary keys) are involved. For DML on all other columns, parallelism is possible.

Deferrable Integrity Constraints

If any deferrable constraints apply to the table being operated on, the DML operatio n will not be parallelized.

Trigger Restrictions

A DML operation will not be parallelized if the affected tables contain enabled triggers that may get fired as a result of the statement. This implies that DML statements on tables that are being replicated will not be parallelized.

Relevant triggers must be disabled in order to parallelize DML on the table. Note that, if you enable or disable triggers, the depend ent shared cursors are invalidated.

Distributed Transaction Restrictio ns

A DML operation cannot be parallelized if it is in a distributed transaction or if t he DML or the query operation is against a remote object.

Examples of Distributed Transaction Parallelization

This section contains several examples of distributed transaction processing.

Example 24-1 Distributed Transaction Paralleliz ation

In this example, the DML statement queries a remote object:

INSERT /*
 APPEND PARALLEL (t3,2) */ INTO t3 SELECT * FROM t4@dblink;

The query operation is executed serially without notification because it references a remote object.

Example 24-2 Distributed Transaction Parallelization

In this example, the DML operation is applied to a remote object:

DELETE /*+ PARALLEL (t1, 2
) */ FROM t1@dblink;

The DELETE operation is not parallelized because it references a remote object.

Example 24-3 Distributed Transaction Parallelization

In this example, the DML operation i s in a distributed transaction:

SELECT * FROM t1@dblink; 
DELETE /*+ PARALLEL (t2,2) */ FROM t2;
COMMIT
; 

The DELETE operation is not parallelized because it occurs in a distributed transaction (which is started by the SELECT statement).

Functions in Parallel DM L and DDL Statements

In a parallel DML or DDL statement, as in a parallel query, a user-written function may be execut ed in parallel if it has been declared with the PARALLEL_ENABLE keyword, if it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES that indicates all of RNDS, WNDS, RNPS, and WNPS, or if it is declared with CREATE FUNCTION and the system can analyze the body of th e PL/SQL code and determine that the code neither reads nor writes to the database or reads nor modifies package variables.

Fo r a parallel DML statement, any function call that cannot be executed in parallel causes the entire DML statement to be executed seri ally.

For an INSERT ... SELECT or CREATE TABLE ... AS S ELECT statement, function calls in the query portion are parallelized according to the parallel query rules in the prior parag raph. The query may be parallelized even if the remainder of the statement must execute serially, or vice versa.

Other Types of Parallelism

In addition to par allel SQL execution, Oracle can use parallelism for the following types of operations:

  • Parallel recovery

  • Parallel propagation (replication)

  • Parallel load (the SQL*Loader utility)< /p>

Like parallel SQL, parallel recovery and propagation are performed by a parallel execution coordinator and multiple parallel execution servers. Parallel load, however, uses a different mechanism.

The behavior of the parallel execution coordin ator and parallel execution servers may differ, depending on what kind of operation they perform (SQL, recovery, or propagation). For example, if all parallel execution servers in the pool are occupied and the maximum number of parallel execution servers has been st arted:

  • In parallel SQL, the parallel execution coordinator switches to serial processing.

  • In parallel propagation, the parallel execution coordinator returns an error.

For a given session, th e parallel execution coordinator coordinates only one kind of operation. A parallel execution coordinator cannot coordinate, for exam ple, parallel SQL and parallel recovery or propagation at the same time.


See Also:


Initializing and Tuning Parameters for Parallel Execution

Parallel execution is enabled by default. The initial computed values of the parallel execution parameters should be acc eptable for the majority of installations. These parameters affect memory usage and the degree of parallelism used for parallel opera tions.

Oracle Database computes defaults for these parameters based on the value at database startup of CPU_COUNT and PARALLEL_THREADS_PER_CPU. The parameters can also be manually tuned, increasing or decreasing their values to suit specific system configurations or performance goals.

For example:

  • On systems where parallel execu tion will never be used, PARALLEL_MAX_SERVERS can be set to zero.

  • On large systems with abu ndant SGA memory, PARALLEL_EXECUTION_MESSAGE_SIZE can be increased to improve throughput.

You can also manually tune parallel execution parameters; however, Oracle recommends using default settings for parallel execution. Manual tuning of parallel execution is more complex than using default settings for two reasons: manual parallel execution tuning requires more att entive administration than automated tuning, and manual tuning is prone to user-load and system-resource miscalculations.

Init ializing and tuning parallel execution involves the following steps:

Using Default Parameter Settings

By de fault, Oracle automatically sets parallel execution parameters, as shown in Table 24-2. For most systems, you do not need to make further adjustments to have an adequately tuned parallel execution environment.

Table 24-2 Parameters and Their Defaults

Parameter Default Comments
PARALLEL_ADAPTIVE_MULTI_USER TRUE Causes parallel execution SQL to throttle DOP re quests to prevent system overload.
PAR ALLEL_MAX_SERVERS CPU_COUNT x PARALLEL_THREADS_PRE_CPU x (2 if PGA_AGGREGATE_TARGET > 0; otherwise 1) x 5 Use this limit to maximize the number of processes that parallel execution uses.
PARALLEL_EXECUTION_MESSAGE_SIZE 2 KB (port specifi c) Increase to 4k or 8k to improve parallel execution performance if sufficient SGA m emory exists.

Note that you can set some parameters in su ch a way that Oracle will be constrained. For example, if you set PROCESSES to 20, you will not be able to get 25 slaves .

Setting the Degree of Parallelism for Parallel Execution

The parallel execution coordinator may enlist two or more of the instance's parallel execution servers to process a SQL statement . The number of parallel execution servers associated with a single operation is known as the degree of parallelism.

The DOP is specified in the following ways:

  • At the statement level with hints and with the PARALLEL clause

  • At the session level by issuing the ALTER SESSION FORCE PARALLEL statement

  • At the table level in the table's definition

  • At the index level in the index's definition

The following example shows a statement that sets the DOP to 4 on a table:

ALTER TABLE order
s PARALLEL 4;

This next example sets the DOP on an index to 4:

ALTER INDEX iorders PARALLEL
4;

This last example sets a hint to 4 on a query:

SELECT /*+ PARALLEL(orders, 4) */ COUNT(*)
 FROM orders;

Note that the DOP applies directly only to intraoperation parallelism. If interoperation parallelism is poss ible, the total number of parallel execution servers for a statement can be twice the specified DOP. No more than two operations can be performed simultaneously.

Parallel execution is designed to effectively use multiple CPUs and disks to answer queries quick ly. When multiple users employ parallel execution at the same time, available CPU, memory, and disk resources may be quickly exhauste d. Oracle provides several ways to deal with resource utilization in conjunction with parallel execution, including:

  • The adaptive multiuser algorithm, which reduces the DOP as the load on the system increases. By default, the adaptive mu ltiuser algorithm is enabled, which optimizes the performance of systems with concurrent parallel SQL execution operations.

  • User resource limits and profiles, which allow you to set limits on the amount of various system resources availa ble to each user as part of a user's security domain.

  • The Database Resource Manager, which enables you t o allocate resources to different groups of users.

How Oracl e Determines the Degree of Parallelism for Operations

The parallel execution coordinator determines the DOP by considering several specifications. The coordinator:

< ul>
  • Checks for hints or a PARALLEL clause specified in the SQL statement itself.

  • Checks for a session value set by the ALTER SESSION FORCE PARALLEL stat ement.

  • Looks at the table's or index's definition.

  • After a DOP is found in one of these specifications, it becomes the DOP for the operation.

    Hints, PARALLEL clauses, table or index definitions, and d efault values only determine the number of parallel execution servers that the coordinator requests for a given operation. The actual number of parallel execution servers used depends upon how many processes are available in the parallel execution server pool and wh ether interoperation parallelism is possible.

    Hints and Degree of Parallelism

    You can speci fy hints in a SQL statement to set the DOP for a table or index and for the caching behavior of the operation.

    • The PARALLEL hint is used only for operations on tables. You can use it to parallelize queries and DML statements (INSERT, UPDATE, M ERGE, and DELETE).

    • The PARALLEL_INDEX hint parallelizes an index range scan of a partitioned index. (In an index operation , the PARALLEL hint is not valid and is ignored.)

    See Oracle Database Performance Tuning Guide for information about using hints in SQL statements and the spe cific syntax for the PARALLEL, NO_PARALLEL, PARALLEL_INDEX, CACHE, and NOCA CHE hints.

    Table and Index Definitions

    You can specify the DOP within a table or index definition by using one of the following statements: CREATE TABLE, ALTER TABLE, < code>CREATE INDEX, or ALTER INDEX.

    Default Degree of Parallelism

    The default DOP is used when you ask to parallelize an operation but you do n ot specify a DOP in a hint or within the definition of a table or index. The default DOP is appropriate for most applications.

    The default DOP for a SQL statement is determined by the following factors:

    • The value of the parameter CPU_COUNT, which is, by default, the number of CPUs on the system, the number of RAC instances, and the value of the par ameter PARALLEL_THREADS_PER_CPU.

    • For parallelizing by partition, the number of partitions t hat will be accessed, based on partition pruning.

    • For parallel DML operations with global index maintena nce, the minimum number of transaction free lists among all the global indexes to be updated. The minimum number of transaction free lists for a partitioned global index is the minimum number across all index partitions. This is a requirement to prevent self-deadloc k.

    These factors determine the default number of parallel execution servers to use. However, the actual number of pr ocesses used is limited by their availability on the requested instances during run time. The initialization parameter PARALLEL _MAX_SERVERS sets an upper limit on the total number of parallel execution servers that an instance can have.

    If a mini mum fraction of the desired parallel execution servers is not available (specified by the initialization parameter PARALLEL_MIN _PERCENT), a user error is produced. You can retry the query when the system is less busy.

    Adaptive Multiuser Algorithm

    With the adaptive multiuser algorithm, the parallel execution coordinator varies the DOP according to the system load. Oracle determines the load by calculating the number of active Oracle Serve r processes. If the number of server processes currently allocated is larger than the optimal number of server processes, given the n umber of available CPUs, the algorithm reduces the DOP. This reduction improves total system throughput by avoiding overallocation of resources.

    Minimum Number of Parallel Execution Servers

    < p>Oracle can perform an operation in parallel as long as at least two parallel execution servers are available. If too few parallel e xecution servers are available, your SQL statement may execute slower than expected. You can specify the minimum percentage of reques ted parallel execution servers that must be available in order for the operation to execute. This strategy ensures that your SQL stat ement executes with a minimum acceptable parallel performance. If the minimum percentage of requested parallel execution servers is n ot available, the SQL statement does not execute and returns an error ora 12827.

    The initialization parameter PARALLEL_MIN_PERCENT specifies the desired minimum per centage of requested parallel execution servers. This parameter affects DML and DDL operations as well as queries.

    For example , if you specify 50 for this parameter, then at least 50 percent of the parallel execution servers requested for any parallel operati on must be available in order for the operation to succeed. If 20 parallel execution servers are requested, then at least 10 must be available or an error is returned to the user. If PARALLEL_MIN_PERCENT is set to null, then all parallel operations will proceed as long as at least two parallel execution servers are available for processing.

    Limiting the Number of Available Instances

    In Oracle Real Applica tion Clusters, instance groups can be used to limit the number of instances that participate in a parallel operation. You can create any number of instance groups, each consisting of one or more instances. You can then specify which instance group is to be used for any or all parallel operations. Parallel execution servers will only be used on instances which are members of the specified instance group. See Oracle Real Application Clusters Administrator's Guide< /a> and Oracle Real Application Clusters Deployment and Perfor mance Guide for more information about instance groups.

    Balancing the Workload

    To optimize performance, all parallel execution servers should have equal workloads. For SQL statements parallel ized by block range or by parallel execution servers, the workload is dynamically divided among the parallel execution servers. This minimizes workload skewing, which occurs when some parallel execution servers perform significantly more work than the other processe s.

    For the relatively few SQL statements parallelized by partitions, if the workload is evenly distributed among the partition s, you can optimize performance by matching the number of parallel execution servers to the number of partitions or by choosing a DOP in which the number of partitions is a multiple of the number of processes. This applies to partition-wise joins and PDML on tables created before Oracle9i. See "Limitation on the Degree of Parallelism" for details regarding this to pic.

    For example, suppose a table has 10 partition, and a parallel operation divides the work evenly among them. You can use 1 0 parallel execution servers (DOP equals 10) to do the work in approximately one-tenth the time that one process would take. You migh t also use five processes to do the work in one-fifth the time, or two processes to do the work in one-half the time.

    If, howe ver, you use nine processes to work on 10 partitions, the first process to finish its work on one partition then begins work on the 1 0th partition; and as the other processes finish their work, they become idle. This configuration does not provide good performance w hen the work is evenly divided among partitions. When the work is unevenly divided, the performance varies depending on whether the p artition that is left for last has more or less work than the other partitions.

    Similarly, suppose you use four processes to w ork on 10 partitions and the work is evenly divided. In this case, each process works on a second partition after finishing its first partition, but only two of the processes work on a third partition while the other two remain idle.

    In general, you cannot as sume that the time taken to perform a parallel operation on a given number of partitions (N) with a given number of parallel executio n servers (P) will be N/P. This formula does not take into account the possibility that some processes might have to wait while other s finish working on the last partitions. By choosing an appropriate DOP, however, you can minimize the workload skew and optimize per formance.

    Parallelization Rules for SQL Statements

    A SQL statement can be p arallelized if it includes a parallel hint or if the table or index being operated on has been declared PARALLEL with a CREATE or ALTER statement. In addition, a DDL statement can be parallelized by using the PARALLEL clause. However, not all of these methods apply to all types of SQL statements.

    Parallelization has two components: the de cision to parallelize and the DOP. These components are determined differently for queries, DDL operations, and DML operations.

    < p>To determine the DOP, Oracle looks at the referen ce objects:

    • Parallel query looks at each table and index, in the portion of the query being parallelized , to determine which is the reference table. The basic rule is to pick the table or index with the largest DOP.

    • For parallel DML (INSERT, UPDATE, MERGE, and DELETE), the reference objec t that determines the DOP is the table being modified by an insert, update, or delete operation. Parallel DML also adds some limits t o the DOP to prevent deadlock. If the parallel DML statement includes a subquery, the subquery's DOP is the same as the DML operation .

    • For parallel DDL, the reference object that determines the DOP is the table, index, or partition being created, rebuilt, split, or moved. If the parallel DDL statement includes a subquery, the subquery's DOP is the same as the DDL oper ation.

    Rules for Parallelizing Queries

    This section discusses some rules f or parallelizing queries.

    Decision to Parallelize

    A SELECT statement ca n be parallelized only if the following conditions are satisfied:

    • The query includes a parallel hint spe cification (PARALLEL or PARALLEL_INDEX) or the schema objects referred to in the query have a PARALLE L declaration associated with them.

    • At least one of the tables specified in the query requir es one of the following:

      • A full table scan

      • An index range scan spanning mul tiple partitions

    • No scalar subqueries are in the SELECT list

    Degree of Parallelism

    The DOP for a query is determined by the following rules:

    • The query uses the maximum DOP taken from all of the table declarations involved in the query and all of the potential indexes t hat are candidates to satisfy the query (the reference objects). That is, the table or index that has the greatest DOP determines the query's DOP (maximum query directive).

    • If a table has both a parallel hint specification in the query a nd a parallel declaration in its table specification, the hint specification takes precedence over parallel declaration specification . See Table 24-3 for precedence rules.

    Degree of Parallelism

    The DOP is determined by the same rules as for the queries. Note that in the case of UPDATE and DELETE< /code> operations, only the target table to be modified (the only reference object) is involved. Thus, the UPDATE or DELETE parallel hint specification takes precedence over the parallel declaration specification of the target table. In oth er words, the precedence order is: MERGE, UPDATE, DELETE hint > Session > Parallel decla ration specification of target table. See Table 24-3 for precedence rules.

    A parallel execution server can update or merge into, or delete from multiple partitions, but each partition can only be updated or deleted by one parallel exec ution server.

    If the DOP is less than the nu mber of partitions, then the first process to finish work on one partition continues working on another partition, and so on until th e work is finished on all partitions. If the DOP is greater than the number of partitions involved in the operation, then the excess parallel execution servers will have no work to do.

    Example 24-4 Parallelization: Example 1

    UPDATE tbl_1 SET c1=c1+1 WHERE c1>100;
    
    

    If tbl_1 is a partitioned table and its table definition has a p arallel clause, then the update operation is parallelized even if the scan on the table is serial (such as an index scan), assuming t hat the table has more than one partition with c1 greater than 100.

    Example 24-5 Parallelization: Example 2

    UPDATE /*+ PARALLEL(tbl_2,4) */ tbl_2 SET c1=c1+1;
    
    

    Both the scan and update operations on tbl_2 will be parallelized with degree four.

    Rules for INSERT ... SELECT

    An INSERT ... SELECT statement parallelizes its INSERT and SELECT operations independently, except for the DOP.

    You can specify a parallel hint after the INSERT k eyword in an INSERT ... SELECT statement. Because the tables being queried are usually not the same as the table being inserted into, the hint enables you to specify parallel directives specifically for the insert operation.

    You have the following ways to specify parallel directives for an INSERT ... SELECT statement (assuming that PARALLEL DML mode is enabled):

    • SELECT parallel hints specified at the s tatement

    • Parallel clauses specified in the definition of tables being selected

    • INSERT parallel hint specified at the statement

    • Parallel clause specified in the defini tion of tables being inserted into

    You can use the ALTER SESSION FORCE PARALLEL DML statement to override parallel clauses for subsequent INSERT operations in a session. Parallel hints in insert operations override the ALTER SESSION FORCE PARALLEL DML statement.

    Decision to Parallelize

    The following rule determines whet her the INSERT operation should be parallelized in an INSERT ... SELECT statement:

    The INSERT operation will be parallelized if and only if at least one of the following is true:

    • The PARALLEL hint is specified after the INSERT in the DML statement.

    • The tab le being inserted into (the reference object) has a PARALLEL declaration specification.

    • An ALTER SESSION FORCE PARALLEL DML statement has been issued previously during the session.

    The decision to parallelize the INSERT o peration is made independently of the SELECT operation, and vice versa.

    Degree of Parallelism

    Once the decision to parallelize the SELECT or INSERT operation is made, one parallel directiv e is picked for deciding the DOP of the whole statement, using the following precedence rule Insert hint directive > Session> P arallel declaration specification of the inserting table > Maximum query directive.

    In this context, maximum query directiv e means that among multiple tables and indexes, the table or index that has the maximum DOP determines the parallelism for the query operation.

    The chosen parallel directive is applied to both the SELECT and INSERT operations.

    Example 24-6 P arallelization: Example 3

    The DOP used is 2, as specified in the INSERT hint:

    INSERT /*+ PARALLEL(tbl_ins,2) */ INTO tbl_ins
    SELECT /*+ PARALLEL(tbl_sel,4) */ * FROM tbl_sel;
    

    Rules for DDL Statements

    You need to keep the fo llowing in mind when parallelizing DDL statements.

    Decision to Parallelize

    DDL operations can be parallelized if a PA RALLEL clause (declaration) is specified in the syntax. In the case of CREATE INDEX and ALTER< /code> INDEX ... REBUILD or ALTER INDEX ... REBUILD PARTITION< /code>, the parallel declaration is stored in the data dictionary.

    You can use the ALTER SESSION FORCE PARALLEL DDL statement to override the parallel cl auses of subsequent DDL statements in a session.

    Degree of Parallelism

    The DOP is determined b y the specification in the PARALLEL clause, unless it is overridden by an ALTER SESSION FORCE PARALLEL DDL statement. A rebuild of a partitioned index is never parallelized.

    Paralle l clauses in CREATE TABLE and ALTER TABLE statements specify table parallelism. I f a parallel clause exists in a table definition, it determines the parallelism of DDL statements as well as queries. If the DDL stat ement contains explicit parallel hints for a table, however, those hints override the effect of parallel clauses for that table. You can use the ALTER SESSION FORCE PARALLEL DDL statement to override p arallel clauses.

    Rules for [CREATE | REBUILD] INDEX or [MOVE | SPLIT] PARTITION

    The following rules apply:

    Parallel CREATE INDEX or ALTER INDEX ... REBUILD

    The CREATE INDEX and ALTER INDEX ... REBUILD statements can be paral lelized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.

    ALTER INDEX ... REBUILD can be parallelized only for a non partitioned index, but ALTER INDEX ... REBUILD PARTITION can be parallelized by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.

    The scan operation for ALTER INDEX ... REBUILD (nonpartitioned), ALTE R INDEX ... REBUILD PARTITION, and CREATE INDEX has the same parallelism as the REBUILD or CREATE operation and uses the same DOP. If the DOP is not specified for REBUILD or CREATE, the default is the number of CPUs.

    Parallel MOVE PARTITION or SPLIT P ARTITION

    The ALTERINDEX ... MOVE PARTITION and ALTERINDEX ...SPLIT PARTITION statements can be parallel ized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement. Their scan operations have the same parallelism as the corresponding MOVE or SPLIT operations. If the DOP is not specified, the default is the number of CPUs.

    R ules for CREATE TABLE AS SELECT

    The CREATE TABLE .. . AS SELECT statement contains two parts: a CREATE part (DDL) and a SELECT part ( query). Oracle can parallelize both parts of the statement. The CREATE part fol lows the same rules as other DDL operations.

    Decision to Parallelize (Query Part)

    The query part of a CREATE < code>TABLE ... AS SELECT statement can be parallelized only if the following conditions are satisfie d:

    • The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX) or the CREATE part of the statement has a PARALLEL clause specification or the schema objects referred t o in the query have a PARALLEL declaration associated with them.

    • At least one of the t ables specified in the query requires one of the following: a full table scan or an index range scan spanning multiple partitions.

    Degree of Parallelism (Query Part)

    The DOP for the query part of a CREATE TABLE ... AS SELECT statement is determined by one of the following rules:

    • The query part uses the values specified in the PARALLEL clause of the CREATE part.

    • If the PARALLEL clause is not specified, the default DOP is the number of CPUs.

    • If the CREATE is serial, then the DOP is determined by the query.

    Note that any values specified in a hint for parallelism are ignored.

    Decision to Parallelize (CREATE Part)

    The CREATE operation of CREATE TABLE ... AS SELECT< /code> can be parallelized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.

    When the CREATE operation of CREATE TABLE ... AS SELECT is parallelized, Oracle also parallelizes the scan operation if possible. The scan operati on cannot be parallelized if, for example:

    • The SELECT clause has a NO_PARALLEL hint

    • The operation scans an index of a nonpartitioned table

    When the CREATE operation is not parallelized, the SELECT can be parallelized if it has a PARALLEL hint or if the select ed table (or partitioned index) has a parallel declaration.

    Degree of Parallelism (CREATE Part)
    < p>The DOP for the CREATE operation, and for the SELECT operation if it is parallelized, is specified by the PARALLEL clause of the CREATE statement, unless it is overridden by an ALTER SESSION FORCE PARALLEL DDL statement. If the PARALLEL clause does not specify the DO P, the default is the number of CPUs.

    Summary of Parallelization Rules

    Table 24-3 shows how various types of SQL statements can be parallelized and indicates which methods o f specifying parallelism take precedence.

    • The priority (1) specification overrides priority (2) and prio rity (3).

    • The priority (2) specification overrides priority (3).

    Table 24-3 Parallelization Rules

    < td align="left" headers="r6c1-t10 r1c2-t10 r2c2-t10">(1) PARALLEL of insert
    Pa rallel Operation Parallelized by Clause, Hint, or Underlying Table/Index Declaration (priority order: 1, 2, 3)
    PARALLEL Hint PARALLEL Clause ALTER SESSION Parallel Declaration
    Parallel query table scan (partitioned or nonpartitioned table) (1) PARALLEL
    (2) FORCE PARALLEL QUERY (3) of table
    Parallel q uery index range scan (partitioned index) (1) PARALLEL_INDEX< /td>
    (2) FORCE PARALLEL QUERY (2) of index
    Parallel UPDATE or DELETE (partitioned table only) (1) PARALLEL
    (2) FORCE PARALLEL DML (3) of table being updated or deleted from
    INSERT operation of parallel INSERT... SELECT (partitioned or nonpartitioned table)
    (2) F ORCE PARALLEL DML (3) of table being inserted into
    SELECT operation of INSERT ... SELECT when INSERT is parallel Takes degree from INSERT statement
    SELECT operation of INSERT ... SELECT whe n INSERT is serial (1) PARALLEL

    (2) of table being selected from
    CREATE operation of parallel CREATE TABLE ... AS SELECT (partitioned or nonpartitioned table) (Note: Hint in select clause does not affect the create operation.) (2) (1) FORCE PARALLEL DDL
    SELECT operation of CREATE TABLE ... AS SELECT when CREATE is parallel Takes degree from CREATE statement
    SELECT operation of CREATE TABLE ... AS SELECT when CREATE is serial (1) PARALLEL or PARALLEL_INDEX

    (2) of querying tables or partitioned indexes
    Pa rallel CREATE INDEX (partitioned or nonpartitioned index)
    (2) (1) FORCE PARALLEL DDL
    Parallel REBUILD INDEX (nonpartitioned index)
    (2) (1) FORCE PARALLEL DDL
    REBUILD INDEX (partitioned index)—never parallelized



    Parallel REBUILD INDEX partition
    (2) (1) FORCE PARALLEL DDL
    Parallel MOVE or SPLIT par tition
    (2) (1) FORCE PARALLEL DDL

    < !-- class="tblformalwide" -->

    Degree of Parallelism and Adaptive Multiuser: How T hey Interact

    The DOP specifies the number of available processes, or threads, used in parallel operations. Each parallel thread can use one or two query processes, depending on the query's complexity.

    The adaptive multiuser feature adjusts the DOP based on user load. For example, you might have a table with a DOP of 5. This DOP might be acceptable with 10 users. However, if 10 more users enter the system and you enable the < code>PARALLEL_ADAPTIVE_MULTI_USER feature, Oracle reduces the DOP to spread resources more evenly according to the perceived O racle load.

    Once Oracle determines the DOP for a query, the DOP does not change for the duration of the query.

    It is be st to use the parallel adaptive multiuser feature w hen users process simultaneous parallel execution operations. By default, PARALLEL_ADAPTIVE_MULTI_USER is set to T RUE, which optimizes the performance of systems with concurrent parallel SQL execution operations. If PARALLEL_ADAPTIVE_ MULTI_USER is set to FALSE, each parallel SQL execution operation receives the requested number of parallel execu tion server processes regardless of the impact to the performance of the system as long as sufficient resources have been configured.

    How the Adaptive Multiuser Algorithm Works

    The adaptive multiuser algorithm has s everal inputs. The algorithm first considers the number of active Oracle Server processes as calculated by Oracle. The algorithm then considers the default settings for parallelism as set in the initialization parameter file, as well as parallelism options used in < code>CREATE TABLE and ALTER TABLE statements and SQL hints.

    When a system is ove rloaded and the input DOP is larger than the default DOP, the algorithm uses the default degree as input. The system then calculates a reduction factor that it applies to the input DOP. For example, using a 16-CPU system, when the first user enters the system and it is idle, it will be granted a DOP of 32. The next user will be give a DOP of eight, the next four, and so on. If the system settles into a steady state of eight users issuing queries, all the users will eventually be given a DOP of 4, thus dividing the system evenl y among all the parallel users.

    Forcing Parallel Execution for a Session

    If you are sure you want to execute in parallel and want to avoid setting the DOP for a table or modifying the queries involved, you can force parallelism with the following statement:

    ALT
    ER SESSION FORCE PARALLEL QUERY;
    
    

    All subsequent queries will be executed in parallel provided no restrictions are violate d. You can also force DML and DDL statements. This clause overrides any parallel clause specified in subsequent statements in the ses sion, but is overridden by a parallel hint.

    In typical OLTP environments, for example, the tables are not set parallel, but ni ghtly batch scripts may want to collect data from these tables in parallel. By setting the DOP in the session, the user avoids alteri ng each table in parallel and then altering it back to serial when finished.

    Controlling Performance with the Degree of Parallelism

    The initialization parameter PARALLEL_THREADS_PER_CPU affects algorithms controlling both the DOP and the adaptive multiuser feature. O racle multiplies the value of PARALLEL_THREADS_PER_CPU by the number of CPUs for each instance to derive the number of t hreads to use in parallel operations.

    The adaptive multiuser feature also uses the default DOP to compute the target number of query server processes that should exist in a system. When a system is running more processes than the target number, the adaptive a lgorithm reduces the DOP of new queries as required. Therefore, you can also use PARALLEL_THREADS_PER_CPU to control the adaptive algorithm.

    PARALLEL_THREADS_PER_CPU enables you to adjust for hardware configurations with I/O subsyste ms that are slow relative to the CPU speed and for application workloads that perform few computations relative to the amount of data involved. If the system is neither CPU-bound nor I/O-bound, then the PARALLEL_THREADS_PER_CPU value should be increased . This increases the default DOP and allow better utilization of hardware resources. The default for PARALLEL_THREADS_PER_CPU on most platforms is 2. However, the default for machines with relatively slow I/O subsystems can be as high as eight.

    Tuning General Parameters for Parallel Execution

    This section discusses the following topics:

    Par ameters Establishing Resource Limits for Parallel Operations

    The parameters that establish resource limits are:

  • PARALLEL_MAX_SERVERS

  • PARALLEL_MIN_SER VERS

  • SHARED_POOL_SIZE

  • PARAL LEL_MIN_PERCENT

  • PARALLEL_MAX_SERVERS

    The PARALLEL_MAX_SEVERS parameter sets a resource limit on the maximum number of processes available for parallel execution. Most parallel operations need at most twice the number of query server processe s as the maximum DOP attributed to any table in the operation.

    Oracle sets PARALLEL_MAX_SERVERS to a default valu e that is sufficient for most systems. The default value for PARALLEL_MAX_SERVERS is as follows:

    (CPU_COUNT x PARALLEL_THREADS_PER_CPU x (2 if PGA_AGGREGATE_TARGET > 0; otherwise 1) x 5)
    
    

    This might not be eno ugh for parallel queries on tables with higher DOP attributes. We recommend users who expects to run queries of higher DOP to set PARALLEL_MAX_SERVERS as follows:

    2 x DOP x NUMBER_OF_CONCURRENT_USERS
    
    

    For example , setting PARALLEL_MAX_SERVERS to 64 will allow you to run four parallel queries simultaneously, assuming that each quer y is using two slave sets with a DOP of eight for each set.

    If the hardware system is neither CPU bound nor I/O bound, then yo u can increase the number of concurrent parallel execution users on the system by adding more query server processes. When the system becomes CPU- or I/O-bound, however, adding more concurrent users becomes detrimental to the overall performance. Careful setting of PARALLEL_MAX_SERVERS is an effective method of restricting the number of concurrent parallel operations.

    If users initiate too many concurrent operations, Oracle might not have enough query server processes. In this case, Oracle executes the oper ations sequentially or displays an error if P ARALLEL_MIN_PERCENT is set to a value other than the default value of 0 (zero).

    This condition can be verified through the GV$SYSSTAT view by comparing the statistics for parallel operations not downgraded and parallel operations downgrade d to serial. For example:

    SELECT * FROM GV$SYSSTAT WHERE name LIKE 'Parallel operation%';
    
    When Users Have Too Many Processes

    When concurrent users have too many query server processe s, memory contention (paging), I/O contention, or excessive context switching can occur. This contention can reduce system throughput to a level lower than if parallel execution were not used. Increase the PARALLEL_MAX_SERVERS value only if the system has sufficient memory and I/O bandwidth for the resu lting load.

    You can use operating system performance monitoring tools to determine how much memory, swap space and I/O bandwid th are free. Look at the runq lengths for both your CPUs and disks, as well as the service time for I/Os on the system. Verify that t he machine has sufficient swap space exists on the machine to add more processes. Limiting the total number of query server processes might restrict the number of concurrent users who can execute parallel operations, but system throughput tends to remain stable.

    Increasing the Number of Concurrent Users

    To increase the number of concurrent users, you must restrict the resource usage of each individual user. You can ach ieve this by using the parallel adaptive multiuser feature or by using resource consumer groups. See Oracle Database Administrator's Guide and Oracle Database Concepts for more information about resource consumer groups and the Database Resour ce Manager.

    Limiting the Number of Resources for a User

    < /a>You can limit the amount of parallelism available to a given user by establishing a resou rce consumer group for the user. Do this to limit the number of sessions, concurrent logons, and the number of parallel processes tha t any one user or group of users can have.

    Each query server process working on a parallel execution statement is logged on wi th a session ID. Each process counts against the user's limit of concurrent sessions. For example, to limit a user to 10 parallel exe cution processes, set the user's limit to 11. One process is for the parallel coordinator and the other 10 consist of two sets of que ry server servers. This would allow one session for the parallel coordinator and 10 sessions for the parallel execution processes.

    See Oracle Database Administrator's Guide for more information about managing resources with user profiles and Oracle Real Application Clusters Administrator's Guide for more information on querying GV$ views.

    PARALL EL_MIN_SERVERS

    The recommended value for the PARALLEL_MIN_SERVERS parameter is 0 (zero), which is the def ault.

    This parameter lets you specify in a single instance the number of processes to be started and reserved for parallel ope rations. The syntax is:

    PARALLEL_MIN_SERVERS=n 
    
    

    The n variable is the num ber of processes you want to start and reserve for parallel operations.

    Setting PARALLEL_MIN_SERVERS balances the startup cost against memory usage. Processes started using PARALLEL_MIN_SERVERS do not exit until the database is shut down. This way, when a query is issued the processes are likely to be available. It is desirable, however, to recycle query server pr ocesses periodically since the memory these processes use can become fragmented and cause the high water mark to slowly increase. Whe n you do not set PARALLEL_MIN_SERVERS, processes exit after they are idle for five minutes.

    SHARED_POOL_SIZE

    Parallel execution requires memory resources in addition to those required by serial SQL execution. Additional memory is used for communication and passing data between query server processes and the query coordinator.

    Oracle Database allocates memory for query server processes from the shared pool. Tune the shared pool as follows:

      Allow for other clients of the shared pool, such as shared cursors and stored procedures.

    • Remember that larger values improve performance in multiuser systems, but smaller values use less memory.

    • You must also take into account that using parallel execution generates more cursors. Look at statistics in the V$SQLAREA view to determine how often Oracle recompiles cursors. If the cursor hit ratio is poor, increase the size of the pool. This h appens only when you have a large number of distinct queries.

      You can then monitor the number of buffers used by parallel exec ution and compare the shared pool PX msg pool to the current high wa ter mark reported in output from the view V$PX_PROCESS_SYSSTAT.


      Note:

      If you do not have enough memory available, error message 12853 occurs (insuffi cient memory for PX buffers: current stringK, max needed stringK). This is caused by having insufficient SGA memory available for PX buffers. You need to reconfigure the SGA to have at least (MAX - CURRENT) bytes of additi onal memory.

    By default, Oracle allocates parallel ex ecution buffers from the shared pool.

    If Oracle displays the following error on startup:

    ORA-271
    02: out of memory 
    SVR4 Error: 12: Not enough space 
    
    

    You should reduce the value for SHARED_POOL_SIZE low en ough so your database starts. After reducing the value of SHARED_POOL_SIZE, you might see the error:

    ORA-04031: unable to allocate 16084 bytes of shared memory 
       ("SHARED pool","unknown object","SHARED pool heap","PX msg
    pool") 
    
    

    If so, execute the following query to determine why Oracle could not allocate the 16,084 bytes:

    SELECT NAME, SUM(BYTES) FROM V$SGASTAT WHERE POOL='SHARED POOL' 
      GROUP BY ROLLUP (NAME); 
    
    

    Your output shou ld resemble the following:

    NAME                       SUM(BYTES) 
    -------------------------- ----------
     
    PX msg pool                   1474572 
    free memory                    562132
                                  2036704 
    
    

    If you specify SHARED_POOL_SIZE and the amount of memory you need to reserve is bigger than the pool, Oracle does not alloc ate all the memory it can get. Instead, it leaves some space. When the query runs, Oracle tries to get what it needs. Oracle uses the 560 KB and needs another 16KB when it fails. The error does not report the cumulative amount that is needed. The best way of determi ning how much more memory is needed is to use the formulas in "Adding Memory for Message Buffers".

    To resolve the problem in the current example, increase the value for SHARED_POOL_SIZE. As shown in the sample output, the SHARED_POOL_SIZE is about 2 MB. Depending on the amount of memory available, you could increase the value of SHARE D_POOL_SIZE to 4 MB and attempt to start your database. If Oracle continues to display an ORA-4031 message, gradu ally increase the value for SHARED_POOL_SIZE until startup is successful.

    Computing Additional M emory Requirements for Message Buffers

    After you determine the initial setting for the shared pool, you must calculate additional memory requirements for message buffers and determine how much additional space you need for cursors.

    Adding Memory for Message Buffers

    You must increase the value for the SHARED_POOL_SIZE par ameter to accommodate message buffers. The message buffers allow query server processes to communicate with each other.

    Oracle uses a fixed number of buffers for each virtual connection between producer query servers and consumer query servers. Connections in crease as the square of the DOP increases. For this reason, the maximum amount of memory used by parallel execution is bound by the h ighest DOP allowed on your system. You can control this value by using either the PARALLEL_MAX_SERVERS parameter or by u sing policies and profiles.

    To calculate the amount of memory required, use one of the following formulas:

    • For SMP systems:

      mem in bytes = (3 x size x users x groups x connections)
      
      
    • For SMP Real Application Clusters and MPP systems:

      mem in bytes = ((3 x local) + (2 x re
      mote)) x (size x users x groups) 
        / instances
      
      

    Each instance uses the memory computed by the formula.

    T he terms are:

    • SIZE = PARALLEL_EXECUTION_MESSAGE_SIZE

    • USERS = the number of concurrent parallel execution users that you expect to have running with the optimal DOP

    • GROUPS = the number of query server process groups used for each query

      A simple SQL state ment requires only one group. However, if your queries involve subqueries which will be processed in parallel, then Oracle uses an ad ditional group of query server processes.

    • CONNECTIONS = (DOP2 + 2 x DOP)

      If your system is a cluster or MPP, then you should account for the number of instances because this will increase the DOP. In other words, using a DOP of 4 on a two instance cluster results in a DOP of 8. A value of PARALLEL_MAX_SERVERS times the numbe r of instances divided by four is a conservative estimate to use as a starting point.

    • LOCAL = CONNECTIONS/INSTANCES

    • REMOTE = CONNECTIONS - LOCAL

    Add this amount to your original setting for the shared pool. However, before setting a value for eit her of these memory structures, you must also consider additional memory for cursors, as explained in the following section.

    Calculating Additional Memory for Cursors

    Parallel execution plans consume more space in the SQL area than serial execution plans. You should regularly monitor shared pool resource use to ensure that the memory used by both messages an d cursors can accommodate your system's processing requirements.

    Adjusting Memo ry After Processing Begins

    The formulas in this section are just starting points. Whether you are using automated or m anual tuning, you should monitor usage on an on-going basis to make sure the size of memory is not too large or too small. To do this , tune the shared pool using the following query:

    SELECT POOL, NAME, SUM(BYTES) FROM V$SGASTAT WHERE PO
    OL LIKE '%pool%'
      GROUP BY ROLLUP (POOL, NAME);
    
    

    Your output should resemble the following:

    POOL        NAME                       SUM(BYTES) 
    ----------- -------------------------- ---------- 
    shared pool Checkpoint queue
                  38496 
    shared pool KGFF heap                        1964 
    shared pool KGK heap                         4372 
    shared poo
    l KQLS heap                     1134432 
    shared pool LRMPD SGA Table                 23856 
    shared pool PLS non-lib hp
        2096 
    shared pool PX subheap                     186828 
    shared pool SYSTEM PARAMETERS               55756 
    shared pool State obj
    ects                 3907808 
    shared pool character set memory            30260 
    shared pool db_block_buffers               200000 
    s
    hared pool db_block_hash_buckets           33132 
    shared pool db_files                       122984 
    shared pool db_handles
                52416 
    shared pool dictionary cache               198216 
    shared pool dlm shared memory             5387924 
    shared pool
    enqueue_resources               29016 
    shared pool event statistics per sess      264768 
    shared pool fixed allocation callback
      1376 
    shared pool free memory                  26329104 
    shared pool gc_*                            64000 
    shared pool latch nowai
    t fails or sle       34944 
    shared pool library cache                 2176808 
    shared pool log_buffer                      24576 
    sha
    red pool log_checkpoint_timeout          24700 
    shared pool long op statistics array        30240 
    shared pool message pool freequeue
             116232 
    shared pool miscellaneous                  267624 
    shared pool processes                       76896 
    shared pool se
    ssion param values            41424 
    shared pool sessions                       170016 
    shared pool sql area                      954
    9116 
    shared pool table columns                  148104 
    shared pool trace_buffers_per_process     1476320 
    shared pool transactions
                       18480 
    shared pool trigger inform                  24684 
    shared pool                              52248968 
                                        90641768 
    
    

    Evaluate the memory used as shown in your output, and alter the setting for SHARED_POOL_SIZE based on your processing needs.

    To obtain more memory usage statistics, execute the following q uery:

    SELECT * FROM V$PX_PROCESS_SYSSTAT WHERE STATISTIC LIKE 'Buffers%';
    
    

    Your output shoul d resemble the following:

    STATISTIC                           VALUE 
    -------------------
      ----- 
    Buffers Allocated                   23225 
    Buffers Freed                       23225 
    Buffers Current
      0 
    Buffers HWM                          3620 
    
    

    The amount of memory used appears in the Buffers Curre nt and Buffers HWM statistics. Calculate a value in bytes by multiplying the number of buffers by th e value for PARALLEL_EXECUTION_MESSAGE_SIZE. Compare the high water mark to the parallel execution message pool size to determine if you allocated too much memory. For example, in the first output, the value for large pool as shown in px msg pool is 38,092,812 or 38 MB. The Buffers HWM from the second output is 3,620, w hich when multiplied by a parallel execution message size of 4,096 is 14,827,520, or approximately 15 MB. In this case, the high wate r mark has reached approximately 40 percent of its capacity.

    PARALLEL_ MIN_PERCENT

    The recommended value for the PARALLEL_MIN_PERCENT parameter is 0 (zero).

    This parameter enables users to wait for an acceptable DOP, dep ending on the application in use. Setting this parameter to values other than 0 (zero) causes Oracle to return an error when the requ ested DOP cannot be satisfied by the system at a given time. For example, if you set PARALLEL_MIN_PERCENT to 50, which t ranslates to 50 percent, and the DOP is reduced by 50 percent or greater because of the adaptive algorithm or because of a resource l imitation, then Oracle returns ORA-12827. For example:

    SELECT /*+ PARALLEL(e, 8, 1) */ d.d
    epartment_id, SUM(SALARY)
    FROM employees e, departments d WHERE e.department_id = d.department_id
    GROUP BY d.department_id ORDER BY d
    .department_id;
    
    

    Oracle responds with this message:

    ORA-12827: insufficient parallel query s
    laves available
    

    Parameters Affecting Resource Consumption

    The first group of parameters discussed in this section affects memory and resource consumption for all parallel operations, in particular, for parallel execution. These parameters are:

    A second subset of parameters discussed in this section explains parameters affecting parall el DML and DDL.

    To control resource consumption, you should configure memory at two levels:

    • At the Oracle l evel, so the system uses an appropriate amount of memory from the operating system.

    • At the operating sys tem level for consistency. On some platforms, you might need to set operating system parameters that control the total amount of virt ual memory available, summed across all processes.

    The SGA is typically part of real physical memory. The SGA is sta tic and of fixed size; if you want to change its size, shut down the database, make the change, and restart the database. Oracle allo cates the shared pool out of the SGA.

    A large percentage of the memory used in data warehousing operations is more dynamic. Th is memory comes from process memory (PGA), and both the size of process memory and the number of processes can vary greatly. Use the PGA_AGGREGATE_TARGET parameter to control both the process memory and the number of processes.

    PGA_AGGREGATE_TARGET

    You ca n simplify and improve the way PGA memory is allocated by enabling automatic PGA memory management. In this mode, Oracle dynamically adjusts the size of the portion of the PGA memory dedicated to work areas, based on an overall PGA memory target explicitly set by th e DBA. To enable automatic PGA memory management, you have to set the initialization parameter PGA_AGGREGATE_TARGET. See Oracle Database Performance Tuning Guide for descripti ons of how to use PGA_AGGREGATE_TARGET in different scenarios.

    HASH_AREA_SIZE

    HASH_AREA_SIZE has been depreca ted and you should use PGA_AGGREGATE_TARGET instead.

    < /a>
    SORT_AREA_SIZE

    SORT_AREA_SIZE has been deprecated and you should use PGA_AGGREGATE_TARGET instead.

    PARALLEL_EXECUTION_MESSAGE_SIZE

    The PARALLEL_EXECUTION_MESSAGE_SIZE parameter specifies the size of the buffer used for parallel execution messages. The default value is os specific, but is typically 2K. This value should be adequate for most applications, however, increasing this val ue can improve performance. Consider increasing this value if you have adequate free memory in the shared pool or if you have suffici ent operating system memory and can increase your shared pool size to accommodate the additional amount of memory required.

    Parameters Affecting Resource Consumption for Parallel DML and Parallel DDL

    The parameters that affect parallel DML and parallel DDL resource consumption are:

    Parallel inserts, updates, and deletes require more resources than seria l DML operations. Similarly, PARALLEL CREATE TABLE .. . AS SELECT and PARALLEL CREATE IN DEX can require more resources. For this reason, you may need to increase the value of several additional initialization param eters. These parameters do not affect resources for queries.

    < a id="sthref2236" name="sthref2236">
    TRANSACTIONS

    F or parallel DML and DDL, each query server process starts a transaction. The parallel coordinator uses the two-phase commit protocol to commit transactions; therefore, the number of transactions being processed increases by t he DOP. As a result, you might need to increase the value of the TRANSACTIONS initialization parameter.

    The TRANSACTIONS parameter specifies the maximu m number of concurrent transactions. The default assumes no parallelism. For example, if you have a DOP of 20, you will have 20 more new server transactions (or 40, if you have two server sets) and 1 coordinator transaction. In this case, you should increase T RANSACTIONS by 21 (or 41) if the transactions are running in the same instance. If you do not set this parameter, Oracle sets it to a value equal to 1.1 x SESSIONS. This discussion does not apply if you are using server-managed undo.

    FAST_START_PARALLEL_ROLLBACK

    If a system fails when there are uncom mitted parallel DML or DDL transactions, you can speed up transaction recovery during startup by using the FAST_START_PARALLEL_ROLLBACK parameter.

    This parameter co ntrols the DOP used when recovering terminated transactions. Terminated transactions are transactions that are active before a system failure. By default, the DOP is chosen to be at most two times the value of the CPU_COUNT parameter.

    If the defa ult DOP is insufficient, set the parameter to the HIGH. This gives a maximum DOP of at most four times the value of the CPU_COUNT parameter. This feature is available by default.

    LOG_BUFFER

    Check the statistic redo buffer allocation retries in the V$SYSSTAT view. If this value is high relative to redo blocks written, try to increase the LOG_BUFFER size. A common LOG_BUFFER size for a system genera ting numerous logs is 3 MB to 5 MB. If the number of retries is still high after increasing LOG_BUFFER size, a problem m ight exist with the disk on which the log files reside. In that case, tune the I/O subsystem to increase the I/O rates for redo. One way of doing this is to use fine-grained striping across multiple disks. For example, use a stripe size of 16 KB. A simpler approach is to isolate redo logs on their own disk.

    DML_LOCKS

    Th is parameter specifies the maximum number of DML locks. Its value should equal the total number of locks on all tables referenced by all users. A parallel DML operation's lock and enqueue resource requirement is very different from serial DML. Parallel DML holds man y more locks, so you should increase the value of the ENQUEUE_RESOURCES and DML_LOC KS parameters by equal amounts.

    Table 24-4 shows the types of locks acquired by coordinator and parallel execution server processes for different types of parallel DML statements. Using this information, you can determine the va lue required for these parameters.

    Table 24-4 Locks Acquired by Parallel DML Statements

    Type of statement Coordinator process acquires : Each parallel execution server acquires:
    Parallel UPDATE or DELETE into partitioned table; WHERE clause pruned to a s ubset of partitions or subpartitions 1 table lock SX

    1 partition lock X for eac h pruned (sub)partition

    1 table lock SX

    1 partition lock NULL for each pruned (sub)partition owned by the query server process

    1 partition-wait lock S for each pruned (sub)partition owned by the query server process

    Parallel row-migrating UPDATE into partitioned table; WHERE clause pruned to a subset of (sub)partitions< /td> 1 table lock SX 1 table lock SX
    1 partition X lock for each pruned (sub)partition 1 partition lock NULL for each pruned (sub)pa rtition owned by the query server process

    1 partition-wait lock S for each pruned partition owned by the query server process

    1 partition lock SX for all othe r (sub)partitions 1 partition lock SX for all other (sub)partitions
    Parallel UPDATE, MERGE , DELETE, or INSERT into partitioned table 1 table lock S X

    Partition locks X for all (sub)partitions

    1 table lock SX

    1 partition lock NULL for each (sub)partition

    1 partition-wait lock S for each (sub)partition

    Parallel INSERT into partitioned table; destination table with parti tion or subpartition clause 1 table lock SX

    1 partition lock X for each specifi ed (sub)partition

    1 table lock SX

    1 partition lock NULL for ea ch specified (sub)partition

    1 partition-wait lock S for each specified (sub)partition

    Parallel INSERT into nonpartitioned table 1 table lock X None



    Consider a table with 600 partitions running with a DOP of 100. Assume all parti tions are involved in a parallel UPDATE or DELETE statement with no row-migrations.

    The coordinator acquires:

    • 1 table lock SX

    • 600 partition locks X

    Total serv er processes acquires:

    • 100 table locks SX

    • 600 partition locks NULL

    • 600 partition-wait locks S

    ENQUEUE_RESO URCES

    This parameter sets the number of resources that can be locked by the lock manager. Parallel DML operations requ ire many more resources than serial DML. Oracle allocates more enqueue resources as needed.

    < !-- class="sect3" -->

    Parameters Related to I/O

    The parame ters that affect I/O are:

    These param eters also affect the optimizer which ensures optimal performance for parallel execution I/O operations.

    DB_CACHE_SIZE

    When you perform parallel updates, merges, and deletes, the buffer cache behavior is very similar to any OLTP system running a high volume of updates.

    DB_BLOCK_SIZE

    < p>The recommended value for this parameter is 8 KB or 16 KB.

    Set the database block size when you create the database. If you are creating a new database, use a large block size such as 8 KB or 16 KB.

    DB_FILE_MULTIBLOCK_READ_COUNT

    T he recommended value for this parameter is eight for 8 KB block size, or four for 16 KB block size. The default is 8.

    This par ameter determines how many database blocks are read with a single operating system READ call. The upper limit for this p arameter is platform-dependent. If you set DB_FILE_MULTIBLOCK_READ_COUNT to an excessively high value, your operating sy stem will lower the value to the highest allowable level when you start your database. In this case, each platform uses the highest v alue possible. Maximum values generally range from 64 KB to 1 MB.

    < /a>

    DISK_ASYNCH_IO and TAPE_ASYNCH_IO

    The recommended value for both of these parameters is TRUE.

    These parameters enable or disable the operating system's asynchronous I/O faci lity. They allow query server processes to overlap I/O requests with processing when performing table scans. If the operating system supports asynchronous I/O, leave these parameters at the default value of TRUE. Figure 24-6 illu strates how asynchronous read works.

    Figure 24-6 Asynchronous Read

    Description of dwhsg097.gif follows
    D escription of the illustration dwhsg097.gif

    Asynchronous operations are currently s upported for parallel table scans, hash joins, sorts, and serial table scans. However, this feature can require operating system spec ific configuration and may not be supported on all platforms. Check your Oracle operating system-specific documentation.

    Monitoring and Diagnos ing Parallel Execution Performance

    You should do the following tasks when diagnosing parallel execution performance pr oblems:

    • Quantify your performance expectations to determine whether there is a problem.

    • Determine whether a problem pertains to optimization, such as inefficient plans that might require reanalyzing tables or adding hints, or whether the problem pertains to execution, such as simple operations like scanning, loading, grouping, or indexing running much slower than published guidelines.

    • Determine whether the problem occurs when running in para llel, such as load imbalance or resource bottlenecks, or whether the problem is also present for serial operations.

    Performance expectations are based on either prior performance metrics (for example, the length of time a given query took last week or on the previous version of Oracle) or scaling and extrapolating from serial execution times (for example, serial execution took 10 minutes while parallel execution took 5 minutes). If the performance does not meet your expectations, consider the following questio ns:

    • Did the execution plan change?

      If so, you should gather statistics and decide whether to use index-only access and a CREATE TABLE AS SELECT statement. You should use index hi nts if your system is CPU-bound. You should also study the EXPLAIN PLAN output.

    • < p>Did the data set change?

      If so, you should gather statistics to evaluate any differences.

    • Is th e hardware overtaxed?

      If so, you should check CPU, I/O, and swap memory.

    After setting your basic goals and a nswering these questions, you need to consider the following topics:

    Is There Regression?

    Does parallel execution's actual performance deviate from what you expected? If performance is as you expected, could there be an underlying performance problem? Perhaps you have a desired outcome in mind to which you are comparing the current outcome. Perhaps yo u have justifiable performance expectations that the system does not achieve. You might have achieved this level of performance or a particular execution plan in the past, but now, with a similar environment and operation, the system is not meeting this goal.

    If performance is not as you expected, can you quantify the deviation? For data warehousing operations, the execution plan is key. F or critical data warehousing operations, save the EXPLAIN PLAN results. Then, as you analyze and reanalyze the data, upgrade Oracle, and load new data, over time you can compare new execution plans with old plans. Take this approach either proactively or reactively.

    Alternatively, you might find that plan performance improves if you use hints. You might want to un derstand why hints are necessary and determine how to get the optimizer to generate the desired plan without hints. Try increasing th e statistical sample size: better statistics can give you a better plan.

    See Oracle Database Performance Tuning Guide for information on preserving plans throughout changes to your system, using plan stability and outlines.

    Is There a Plan Change?

    If there has been a change in the execution plan, determine whether the plan is or should be parallel or serial.

    Is There a Parallel Plan?

    If the execution plan is or should be parallel, study the EXPLAIN PLAN output. Did you analyze all the tables? Perhaps you need to use hints in a few cases. Verify that the hint provides better performance. See utlxplp.sql in the rdbms/admin directory.

    Is There a Serial Plan?

    If the execution plan is or should be s erial, consider the following strategies:

    • Use an index. Sometimes adding an index can greatly improve pe rformance. Consider adding an extra column to the index. Perhaps your operation could obtain all its data from the index, and not req uire a table scan. Perhaps you need to use hints in a few cases. Verify that the hint provides better results.

    • Compute statistics. If you do not analyze often and you can spare the time, it is a good practice to compute statistics. This is particularly important if you are performing many joins, and it will result in better plans. Alternatively, you can estimate statistics. If you use different sample sizes, the plan may change. Generally, the higher the sam ple size, the better the plan.

    • Use histograms for nonuniform distributions.

    • Check initialization parameters to be sure the values are reasonable.

    • Replace bind variables with liter als unless CURSOR_SHARING is set to force or similar.

    • Determine w hether execution is I/O- or CPU-bound. Then check the optimizer cost model.

    • Convert subqueries to joins.

    • Use the CREATE TABLE ... AS SELECT statement to break a complex operation into smaller pieces. With a large query referencing five or six table s, it may be difficult to determine which part of the query is taking the most time. You can isolate bottlenecks in the query by brea king it into steps and analyzing each step.

    Is There Paralle l Execution?

    If the cause of regression cannot be traced to problems in the plan, the problem must be an execution iss ue. For data warehousing operations, both serial and parallel, consider how the plan uses memory. Check the paging rate and make sure the system is using memory as effectively as possible. Check buffer, sort, and hash area sizing. After you run a query or DML operat ion, look at the V$SESSTAT, V$PX_SESSTAT, and V$PQ_SYSSTAT views to see the number of server processes used and other information for the session and system.

    Is the Workload Evenly Distributed?

    If you are using parallel execution, is there unevenness in workload distribution? For example, if there are 10 CPUs and a single user, you can see whether the workload is evenly distributed across CPUs. This can vary over time, wi th periods that are more or less I/O intensive, but in general each CPU should have roughly the same amount of activity.

    The s tatistics in V$PQ_TQSTAT show rows produced and consumed for each parallel exec ution server. This is a good indication of skew and does not require single user operation.

    Operating system statistics show y ou the per-processor CPU utilization and per-disk I/O activity. Concurrently running tasks make it harder to see what is going on, ho wever. It may be useful to run in single-user mode and check operating system monitors that show system level CPU and I/O activity.

    If I/O problems occur, you might need to reorganize your data by spreading it over more devices. If parallel execution problems occur, check to be sure you have followed the recommendation to spread data over at least as many devices as CPUs.

    If there i s no skew in workload distribution, check for the following conditions:

    • Is there device contention?

      < /li>
    • Is there controller contention?

    • Is the system I/O-bound with too little paralle lism? If so, consider increasing parallelism up to the number of devices.

    • Is the system CPU-bound with t oo much parallelism? Check the operating system CPU monitor to see whether a lot of time is being spent in system calls. The resource might be overcommitted, and too much parallelism might cause processes to compete with themselves.

    • Are there more concurrent users than the system can support?

    Mon itoring Parallel Execution Performance with Dynamic Performance Views

    After your system has run for a few days, monitor parallel execution performance statistics to determine whether your parallel processing is optimal. Do this using any of the views discussed in this section.

    In Oracle Real Ap plication Clusters, global versions of the views described in this section aggregate statistics from multiple instances. The global v iews have names beginning with G, such as GV$FILESTAT for V$ FILESTAT, and so on.

    V$PX_BUFFER_ADVICE

    The V$PX_BUFFER_ADVICE view provides statistics on historical a nd projected maximum buffer usage by all parallel queries. You can consult this view to reconfigure SGA size in response to insuffici ent memory problems for parallel queries.

    V$PX_SESSION

    < p>The V$PX_SESSION view shows data about query server sessions, groups, sets, and server numbers. It also displays real- time data about the processes working on behalf of parallel execution. This table includes information about the requested DOP and th e actual DOP granted to the operation.

    V$PX_SESSTAT

    T he V$PX_SESSTAT view provides a join of the session information from V$PX_SESSION and the V$SESSTAT table. Thus, all session statistics available to a normal session are available for all sessions performed using parallel execu tion.

    V$PX_PROCESS

    The V$PX_PROCESS view contains information about the parallel processes, including status, session ID, process ID, and other information.

    V$PX_PROCESS_SYSSTAT

    The V$PX_PROCESS_SYSSTAT view shows the status of query servers and provides buffer allocation statistics.

    V$PQ_SESSTAT

    The V$PQ_SESS TAT view shows the status of all current server groups in the system such as data about how queries allocate processes and how the multiuser and load balancing algorithms are affecting the default and hinted values. V$PQ_SESSTAT will be obsolete in a future release.

    You might need to adjust some parameter settings to improve performance after reviewing data from these v iews. In this case, refer to the discussion of "Tuning General Parameters for Parallel Execution". Query thes e views periodically to monitor the progress of long-running parallel operations.

    For many dynamic performance views, you must set the parameter TIMED_STATISTICS to TRUE in order for Oracle to collect statistics for each view. You can use the ALTER SYSTEM or ALTER SESSION statements to turn TIMED_STATISTICS on and off.

    V$PARAMETER

    The V$PARAMETER view lists the name, current value, and default value of all system parameters. In addition , the view shows whether a parameter is a session parameter that you can modify online with an ALTER SYSTEM or ALTER SESSION statement.

    V$PQ_TQSTAT

    As a simple example, consider a hash jo in between two tables, with a join on a column with only two distinct values. At best, this hash function will have one hash value to parallel execution server A and the other to parallel execution server B. A DOP of two is fine, but, if it is four, then at least tw o parallel execution servers have no work. To discover this type of skew, use a query similar to the following example:

    SELECT dfo_number, tq_id, server_type, process, num_rows
    FROM V$PQ_TQSTAT ORDER BY dfo_number DESC, tq_id, server_ty
    pe, process;
    
    

    The best way to resolve this problem might be to choose a different join method; a nested loop join might be the best option. Alternatively, if one of the join tables is small relative to the other, a BROADCAST distribution meth od can be hinted using PQ_DISTRIBUTE hint. Note that the optimizer considers the BROADCAST distribution met hod, but requires OPTIMIZER_FEATURES_ENABLE set to 9.0.2 or higher.

    Now, assume that you have a join key with hig h cardinality, but one of the values contains most of the data, for example, lava lamp sales by year. The only year that had big sale s was 1968, and thus, the parallel execution server for the 1968 records will be overwhelmed. You should use the same corrective acti ons as described previously.

    The V$PQ_TQSTAT view provides a detailed report of message traffic at the table queu e level. V$PQ_TQSTAT data is valid only when queried from a session that is executing parallel SQL statements. A table queue is the pipeline between query server groups, between the parallel coordinator and a que ry server group, or between a query server group and the coordinator. The table queues are represented explicitly in the operation co lumn by PX SEND <partitioning type> (for example, PX SEND HASH) and PX RECEIVE. For backward compatibility, the row labels of PARALLEL_TO_PARALLEL, SERIAL_TO_PARALLEL, or PARAL LEL_TO_SERIAL will continue to have the same semantics as previous releases and can be used as before to infer the table queue allocation. In addition, the top of the parallel plan is marked by a new node with operation PX COORDINATOR.

    V$PQ_TQSTAT has a row for each query server process that reads from or writes to in each table queue. A table queue connecting 10 consumer processes to 10 producer processes has 20 rows in the view. Sum the bytes column and group by TQ_I D, the table queue identifier, to obtain the total number of bytes sent through each table queue. Compare this with the optimi zer estimates; large variations might indicate a need to analyze the data using a larger sample.

    Compute the variance of bytes grouped by TQ_ID. Large variances indicate workload imbalances. You should investigate large variances to determine whe ther the producers start out with unequal distributions of data, or whether the distribution itself is skewed. If the data itself is skewed, this might indicate a low cardinality, or low number of distinct values.

    Note that the V$PQ_TQSTAT view w ill be renamed in a future release to V$PX_TQSTSAT.

    V$SESSTAT and V$SYSSTAT

    The < a id="sthref2327" name="sthref2327">V$SESSTAT view provides parallel execut ion statistics for each session. The statistics include total number of queries, DML and DDL statements executed in a session and the total number of intrainstance and interinstance messages exchanged during parallel execution during the session.

    V$SYSSTAT provides the same statistics as V$ SESSTAT, but for the entire system.

    Monitoring Session Statistics

    These examples use the dynamic performance views described in "Mo nitoring Parallel Execution Performance with Dynamic Performance Views".

    Use GV$PX_SESSION to determine the c onfiguration of the server group executing in parallel. In this example, sessions 9 is the query coordinator, while sessions 7 and 21 are in the first group, first set. Sessions 18 and 20 are in the first group, second set. The requested and granted DOP for this que ry is 2, as shown by Oracle's response to the following query:

    SELECT QCSID, SID, INST_ID "Inst", SERVE
    R_GROUP "Group", SERVER_SET "Set",
      DEGREE "Degree", REQ_DEGREE "Req Degree"
    FROM GV$PX_SESSION ORDER BY QCSID, QCINST_ID, SERVER_GR
    OUP, SERVER_SET;
    
    

    Your output should resemble the following:

    QCSID      SID        Inst
      Group      Set        Degree     Req Degree 
    ---------- ---------- ---------- ---------- ---------- ---------- ---------- 
     9          9          1 
             9          7          1          1          1          2          2 
             9         21
      1          1          1          2          2 
             9         18          1          1          2          2          2 
       9         20          1          1          2          2          2 
    
    

    For a single instance, use SELECT FROM V$PX_SESSION and do not include the column name Instance ID.

    The proces ses shown in the output from the previous example using
    GV$PX_SESSION collaborate to complete the same task. The n ext example shows the execution of a join query to determine the progress of these processes in terms of physical reads. Use this que ry to track any specific statistic:

    SELECT QCSID, SID, INST_ID "Inst", SERVER_GROUP "Group", SERVER_SET
     "Set",
      NAME "Stat Name", VALUE
    FROM GV$PX_SESSTAT A, V$STATNAME B
    WHERE A.STATISTIC# = B.STATISTIC# AND NAME LIKE 'PHYSICAL READS'
    
      AND VALUE > 0 ORDER BY QCSID, QCINST_ID, SERVER_GROUP, SERVER_SET;
    
    

    Your output should resemble the following:

    < pre xml:space="preserve">QCSID SID Inst Group Set Stat Name VALUE ------ ----- ------ ------ ------ --------- --------- ---------- 9 9 1 physical reads 3863 9 7 1 1 1 physical rea ds 2 9 21 1 1 1 physical reads 2 9 18 1 1 2 physical reads 2 9 20 1 1 2 physical reads 2

    Use the previous type of query to track statistics in V$STATNAME. Repeat this query as often as required to observe the progress of the query server processes.< /p>

    The next query uses V$PX_PROCESS to check the status of the query servers.

    SELECT *
     FROM V$PX_PROCESS;
    
    

    Your output should resemble the following:

    SERV STATUS    PID    SPID
        SID    SERIAL 
    ---- --------- ------ --------- ------ ------ 
    P002 IN USE        16     16955     21   7729 
    P003 IN USE        1
    7     16957     20   2921 
    P004 AVAILABLE     18     16959              
    P005 AVAILABLE     19     16962             
    P000 IN USE
        12      6999     18   4720 
    P001 IN USE        13      7004      7    234
    
    

    Monitoring System Statisti cs

    The V$SYSSTAT and V$SESSTAT views contain several statistics for monitoring parallel exec ution. Use these statistics to track the number of parallel queries, DMLs, DDLs, data flow operators (DFOs), and operations. Each que ry, DML, or DDL can have multiple parallel operations and multiple DFOs.

    In addition, statistics also count the number of quer y operations for which the DOP was reduced, or downgraded, due to either the adaptive multiuser algorithm or the depletion of availab le parallel execution servers.

    Finally, statistics in these views also count the number of messages sent on behalf of parallel execution. The following syntax is an example of how to display these statistics:

    SELECT NAME, VALUE F
    ROM GV$SYSSTAT
    WHERE UPPER (NAME) LIKE '%PARALLEL OPERATIONS%'
    OR UPPER (NAME) LIKE '%PARALLELIZED%' OR UPPER (NAME) LIKE '%PX%';
    
    
    
    

    Your output should resemble the following:

    NAME
    VALUE      
    -------------------------------------------------- ---------- 
    queries parallelized
    347 
    DML statements parallelized                                 0 
    DDL statements parallelized                                 0 
    DF
    O trees parallelized                                    463 
    Parallel operations not downgraded                         28 
    Parallel
    operations downgraded to serial                   31 
    Parallel operations downgraded 75 to 99 pct               252 
    Parallel operati
    ons downgraded 50 to 75 pct               128 
    Parallel operations downgraded 25 to 50 pct                43 
    Parallel operations dow
    ngraded 1 to 25 pct                 12 
    PX local messages sent                                  74548 
    PX local messages recv'd
                              74128 
    PX remote messages sent                                     0 
    PX remote messages recv'd
                           0 
    
    

    The following query shows the current wait state of each slave and QC process on the system:

    SELECT px.SID "SID", p.PID, p.SPID "SPID", px.INST_ID "Inst",
           px.SERVER_GROUP "Group", px.SERVER_S
    ET "Set",
           px.DEGREE "Degree", px.REQ_DEGREE "Req Degree", w.event "Wait Event"
    FROM GV$SESSION s, GV$PX_SESSION px, GV$PROCESS
     p, GV$SESSION_WAIT w
    WHERE s.sid (+) = px.sid AND s.inst_id (+) = px.inst_id AND
          s.sid = w.sid (+) AND s.inst_id = w.inst_id (
    +) AND
          s.paddr = p.addr (+) AND s.inst_id = p.inst_id (+)
    ORDER BY DECODE(px.QCINST_ID,  NULL, px.INST_ID,  px.QCINST_ID), px.Q
    CSID, 
    DECODE(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP), px.SERVER_SET, px.INST_ID;
    

    Monitoring Operating System Statistics

    There is considerable overlap between information available in Oracle and information available though operating system utilities (such as sar and vmstat on UNIX-based systems). Operating systems provide performance st atistics on I/O, communication, CPU, memory and paging, scheduling, and synchronization primitives. The V$SESSTAT view provides the major categories of operating system statistics as well.

    Typically, operating system information about I/O devices and semap hore operations is harder to map back to database objects and operations than is Oracle information. However, some operating systems have good visualization tools and efficient means of collecting the data.

    Operating system information about CPU and memory us age is very important for assessing performance. Probably the most important statistic is CPU usage. The goal of low-level performanc e tuning is to become CPU bound on all CPUs. Once this is achieved, you can work at the SQL level to find an alternate plan that migh t be more I/O intensive but use less CPU.

    Operating system memory and paging information is valuable for fine tuning the many system parameters that control how memory is divided among memory-intensive data warehouse subsystems like parallel communication, so rt, and hash join.

    Affinity and Parallel Operations

    In a shared-disk cluster or MPP configuration, an instance of the Oracle Real Applica tion Clusters is said to have affinity for a device if the device is directly accessed from the processors on which the instance is r unning. Similarly, an instance has affinity for a file if it has affinity for the devices on which the file is stored.

    Determi nation of affinity may involve arbitrary determinations for files that are striped across multiple devices. Somewhat arbitrarily, an instance is said to have affinity for a tablespace (or a partition of a table or index within a tablespace) if the instance has affin ity for the first file in the tablespace.

    Oracle considers affinity when allocating work to parallel execution servers. The us e of affinity for parallel execution of SQL statements is transparent to users.

    Affinity and Parallel Queries

    Affinity in p arallel queries increases the speed of scanning data from disk by doing the scans on a processor that is near the data. This can prov ide a substantial performance increase for machines that do not naturally support shared disks.

    The most common use of affinity is for a table or index partition to be stored in one file on one device. This configuration provides the highest availability by limiting the damage done by a device failure and makes the best use of partition-parallel index scans.

    DSS customers might prefer to stripe table partitions over multiple devices (probably a subset of the total number of devices). This configuration enables some queries to prune the total amount of data being accessed using partitioning criteria and still obtain parallelism through rowid-range parallel table (partition) scans. If the devices are config ured as a RAID, availability can still be very good. Even when used for DSS, indexes should probably be partitioned on individual dev ices.

    Other configurations (for example, multiple partitions in one file striped over multiple devices) will yield correct que ry results, but you may need to use hints or explicitly set object attributes to select the correct DOP.

    Affinity and Parallel DML

    For parallel DML (inserts, updates, and deletes), affinity enhancements impro ve cache performance by routing the DML operation to the node that has affinity for the partition.

    Affinity determines how to distribute the work among the set of instances or parallel execution servers to perform the DML operation in parallel. Affinity can i mprove performance of queries in several ways:

    • For certain MPP architectures, Oracle uses device-to-node affinity information to determine on which nodes to spawn parallel execution servers (parallel process allocation) and which work gr anules (rowid ranges or partitions) to send to particular nodes (work assignment). Better performance is achieved by having nodes mai nly access local devices, giving a better buffer cache hit ratio for every node and reducing the network overhead and I/O latency.

    • For SMP, cluster, and MPP architectures, process-to-device affinity is used to achieve device isolation. T his reduces the chances of having multiple parallel execution servers accessing the same device simultaneously. This process-to-devic e affinity information is also used in implementing stealing between processes.

    For partitioned tables and indexes, partition-to-node affinity information determines process allocation and work assignment. For shared-nothing MPP systems, Oracle Real Application Clusters tries to assign partitions to instances, taking the disk affinity of the partitions into account. For shared-di sk MPP and cluster systems, partitions are assigned to instances in a round-robin manner.

    Affinity is only available for paral lel DML when running in an Oracle Real Application Clusters configuration. Affinity information which persists across statements impr oves buffer cache hit ratios and reduces block pings between instances.

    Overriding the Default Degree of Parallelism

    The default DOP is appropriate for reducin g response time while guaranteeing use of CPU and I/O resources for any parallel operations.

    If it is memory-bound, or if seve ral concurrent parallel operations are running, you might want to decrease the default DOP.

    Oracle uses the default DOP for ta bles that have PARALLEL attributed to them in the data dictionary or that have the PARALLEL hint specified. If a table does not have parallelism attributed to it, or has NO_PARALLEL (the default) attributed to it, and parallelism is not being forced through ALTER < code>SESSION FORCE PARALLEL, then that table is never scanned in parallel. This override occurs rega rdless of the default DOP indicated by the number of CPUs, instances, and devices storing that table.

    You can adjust the DOP b y using the following guidelines:

    • Modify the default DOP by changing the value for the PARALLEL_TH READS_PER_CPU parameter.

    • Adjust the DOP either by using ALTER TABLE, ALTER SESSION, or by using hints.

    • To increase the number of concurrent parallel operations, reduce the DOP, or set the parameter PARALLEL_ADAPTIVE_MULTI_USER to TRUE.

    Rewriting SQL Statements

    The most important issue for parallel execution is ensuring that all parts of the query plan that process a substantial amount of da ta execute in parallel. Use EXPLAIN PLAN to verify that all plan steps have an OTHER_TAG of PARALLEL_TO_PARALLEL, PARALLEL_TO_SERIAL, PARALLEL_COMBINED_WITH_PARENT, or PARALLEL_COMBI NED_WITH_CHILD. Any other keyword (or null) indicates serial execution and a possible bottleneck. Also verify that such plan s teps end in the operation PX SEND <partitioning type> node (for example, PX SEND H ASH).

    You can also use the utlxplp.sql script to present the EXPLAIN PLAN output with all relevant parallel information.

    You can increase the optimizer's ability to generate parallel plans converting subque ries, especially correlated subqueries, into joins. Oracle can parallelize joins more efficiently than subqueries. This also applies to updates. See "Updating the Table in Parallel" for more information.

    Creating and Populating Tables in Parallel

    Oracle cannot return results to a user process in parallel. If a query returns a large number of rows, execution of the query might indeed be faster. However, the user process can onl y receive the rows serially. To optimize parallel execution performance for queries that retrieve large result sets, use PARALL EL CREATE TABLE ... AS SELECT or direct-path INSERT to store the result set in the database. At a later time, users can view the result set serially.

    Performing the SELECT in parallel does not influence the CREATE statement. If the CREATE i s parallel, however, the optimizer tries to make the SELECT run in parallel also.

    When combined with the NOLOGGING option, the parallel version of CREATE TABLE ... AS SELECT provides a very efficient intermediate table facility, for example:

    C
    REATE TABLE summary PARALLEL NOLOGGING AS SELECT dim_1, dim_2 ..., 
    SUM (meas_1)
    FROM facts GROUP BY dim_1, dim_2;
    
    

    These tables can also be incrementally loaded with parallel INSERT. You can take advantage of intermediate tables using the fo llowing techniques:

    • Common subqueries can be computed once and referenced many times. This can allow som e queries against star schemas (in particular, queries without selective WHERE-clause predicates) to be better paralleli zed. Note that star queries with selective WHERE-clause predicates using the star-transformation technique can be effect ively parallelized automatically without any modification to the SQL.

    • Decompose complex queries into sim pler steps in order to provide application-level checkpoint or restart. For example, a complex multitable join on a database 1 teraby te in size could run for dozens of hours. A failure during this query would mean starting over from the beginning. Using CREATE TABLE ... AS SELECT or PARALLEL INSERT AS SELECT, you can rewrite the query as a sequence of simpler queries that run for a few hours each. If a system failure occurs, the query can be restarted from the last completed step.

    • Implement manual parallel deletes efficiently by creating a new table that omits the unwanted rows from the original table, and then dropping the original table. Alternatively, yo u can use the convenient parallel delete feature, which directly deletes rows from the original table.

    • C reate summary tables for efficient multidimensional drill-down analysis. For example, a summary table might store the sum of revenue grouped by month, brand, region, and salesman.

    • Reorganize tables, eliminating chained rows, compressing free space, and so on, by copying the old table to a new table. This is much faster than export/import and easier than reloading.

    Be sure to use the DBMS_STATS package on newly created tables. Also consider creating indexes. To avoid I/ O bottlenecks, specify a tablespace with at least as many devices as CPUs. To avoid fragmentation in allocating space, the number of files in a tablespace should be a multiple of the number of CPUs. See Chapter 4, " Hardware and I/O C onsiderations in Data Warehouses", for more information about bottlenecks.

    Creating Temporary Tablespaces for Parallel Sort and Hash Join

    For optimal space management performance, you should use locally managed temporary tablespaces. The following is an example:

    CREATE TEMPORARY TABLESP
    ACE TStemp TEMPFILE '/dev/D31'
    SIZE 4096MB REUSE EXTENT MANAGEMENT LOCAL UNIFORM SIZE 10m;
    
    

    You can associate temporary ta blespaces to a database by issuing a statement such as:

    ALTER DATABASE TEMPORARY TABLESPACE TStemp;
    
    
    
    

    Once this is done, explicit assignment of users to tablespaces is not needed.

    Size of Temporary Extents

    When using a locally managed temporary tablespace, extents are all the same size because this helps avoid fragmentation. As a general rule, tem porary extents should be smaller than permanent extents because there are more demands for temporary space, and parallel processes or other operations running concurrently must share the temporary tablespace. Normally, temporary extents should be in the range of 1MB to 10MB. Once you allocate an extent, it is available for the duration of an operation. If you allocate a large extent but only need to use a small amount of space, the unused space in the extent is unavailable.

    At the same time, temporary extents should be large enough that processes do not have to wait for space. Temporary tablespaces use less overhead than permanent tablespaces when al locating and freeing a new extent. However, obtaining a new temporary extent still requires the overhead of acquiring a latch and sea rching through the SGA structures, as well as SGA space consumption for the sort extent pool.

    See Oracle Database Performance Tuning Guide for information regarding locally-managed temporary tablespaces.

    Executing Paralle l SQL Statements

    After analyzing your tables and indexes, you should see perf ormance improvements based on the DOP used.

    As a general process, you should start with simple parallel operations and evaluat e their total I/O throughput with a SELECT COUNT(*) FROM facts statement. Then, e valuate total CPU power by adding a complex WHERE clause to the statement. An I/O imbalance might suggest a better physi cal database layout. After you understand how simple scans work, add aggregation, joins, and other operations that reflect individual aspects of the overall workload. In particular, you should look for bottlenecks.

    Besides query performance, you should also m onitor parallel load, parallel index creation, and parallel DML, and look for good utilization of I/O and CPU resources.

    Using EXPLAIN PLAN to Show Parallel Operations Plans

    Use the EXPLAIN PLAN statement to see the execution plans for parallel queries. EXPLAIN PLAN output shows optimizer information in the COST, BYTES, and CAR DINALITY columns. You can also use the utlxplp.sql script to present the EXPLAIN PLAN o utput with all relevant parallel information.

    There are several ways to optimize the parallel execution of join statements. Yo u can alter system configuration, adjust parameters as discussed earlier in this chapter, or use hints, such as the DISTRIBUTIO N hint.

    The key points when using EXPLAIN PLAN are to:

    • Verify op timizer selectivity estimates. If the optimizer thinks that only one row will be produced from a query, it tends to favor using a nes ted loop. This could be an indication that the tables are not analyzed or that the optimizer has made an incorrect estimate about the correlation of multiple predicates on the same table. A hint may be required to force the optimizer to use another join method. Cons equently, if the plan says only one row is produced from any particular stage and this is incorrect, consider hints or gather statist ics.

    • Use hash join on low cardinality join keys. If a join key has few distinct values, then a hash join may not be optimal. If the number of distinct values is less than the DOP, then some parallel query servers may be unable to work on the particular query.

    • Consider data skew. If a join key involves excessive data skew, a hash join may r equire some parallel query servers to work more than others. Consider using a hint to cause a BROADCAST distribution met hod if the optimizer did not choose it. Note that the optimizer will consider the BROADCAST distribution method only if the OPTIMIZER_FEATURES_ENABLE is set to 9.0.2 or higher. See "V$PQ_TQSTAT" for further details.< /p>

    Additional Considerations for Parallel DML

    Wh en you want to refresh your data warehouse database using parallel insert, update, or delete on a data warehouse, there are additiona l issues to consider when designing the physical database. These considerations do not affect parallel execution operations. These is sues are:

    PDML and Direct-Path Restrictions

    If a parallel restriction is violated, the operation is simply performed serially. If a direct-path INSERT rest riction is violated, then the APPEND hint is ignored and a conventional insert is performed. No error message is returne d.

    Using Local and Global Striping

    Parallel updates and deletes work only on partitioned tables. T hey can generate a high number of random I/O requests during index maintenance.

    For < a id="sthref2382" name="sthref2382">local index maintenance, local striping is most efficient in reducing I/O contention because one server process only goes to its own set of disk s and disk controllers. Local striping also increases availability in the event of one disk failing.

    For global index maintena nce (partitioned or nonpartitioned), globally striping the index across many disks and disk controllers is the best way to distribute the number of I/Os.

    Increasing INITRANS

    If you have gl obal indexes, a global index segment and global index blocks are shared by server processes of the same parallel DML statement. Even if the operations are not performed against the same row, the server processes can share the same index blocks. Each server transacti on needs one transaction entry in the index block header before it can make changes to a block. Therefore, in the CREATE INDEX or ALTER INDEX statements, you should set INITRANS, the initial number of transactions allocated within each data block, to a large value, such as the maximum DOP against this index.

    Limitation on Available Number of Transaction Free Lists for Segments

    There is a limitation on the available number of transaction free lists for segments in dictionary-managed tablespaces. Once a segment has bee n created, the number of process and transaction free lists is fixed and cannot be altered. If you specify a large number of process free lists in the segment header, you might find that this limits the number of transaction free lists that are available. You can ab ate this limitation the next time you re-create the segment header by decreasing the number of process free lists; this leaves more r oom for transaction free lists in the segment header.

    For UPDATE and DELETE operations, each server process can require its own transaction free list. The parallel DML DOP is thus effectively limited by the smallest number of transac tion free lists available on the table and on any of the global indexes the DML statement must maintain. For example, if the table ha s 25 transaction free lists and the table has two global indexes, one with 50 transaction free lists and one with 30 transaction free lists, the DOP is limited to 25. If the table had had 40 transaction free lists, the DOP would have been limited to 30.

    The < code>FREELISTS parameter of the STORAG E clause is used to set the number of process free lists. By default, no process free lists are created.

    The default nu mber of transaction free lists depends on the block size. For example, if the number of process free lists is not set explicitly, a 4 KB block has about 80 transaction free lists by default. The minimum number of transaction free lists is 25.

    Using Multiple Archivers

    Parallel DDL and parallel DML operations can generate a large amount of redo logs. A single ARCH process to archive these redo logs might not be able to keep up. To avoid this problem , you can spawn multiple archiver processes. This can be done manually or by using a job queue.

    Database Writer Process (DBWn) Workload

    Parallel DML operations dirty a large number of data , index, and undo blocks in the buffer cache during a short period of time. For example, suppose you see a high number of free_ buffer_waits after querying the V$SYSTEM_EVENT view, as in the following syntax:

    SELECT TOTAL_WAITS FROM V$SYSTEM_EVENT WHERE EVENT = 'FREE BUFFER WAITS';
    
    

    In this case, you should consider increasing the DBWn processes. If there are no waits for free buffers, the query will not return any rows.

    [NO]LOGGING Clause

    The [NO]LOGGING clause applies to tables, partitions, tablespaces, and indexes. Virtually no log is genera ted for certain operations (such as direct-path INSERT) if the NOLOGGING clause is used. The NOLOGGING attribute is not specified at the IN SERT statement level but is instead specified when using the ALTER or CREATE statement for a table, partition, index, or tablespace.

    When a table or index has NOLOGGING set, neither parallel nor serial direct-path INSERT operations generate redo logs. Processes running with the NOLOGGING option set run faster because n o redo is generated. However, after a NOLOGGING operation against a table, partition, or index, if a media failure occur s before a backup is taken, then all tables, partitions, and indexes that have been modified might be corrupted.

    Direct-path < code>INSERT operations (except for dictionary updates) never generate redo logs. The NOLOGGING attribute does not affect undo, only redo. To be precise, NOLOGGING allows the direct-path INSERT operation to generate a neg ligible amount of redo (range-invalidation redo, as opposed to full image redo).

    For backward compatibility, [UN]RECOVER ABLE is still supported as an alternate keyword with the CREATE TABLE statement. This alternate keyw ord might not be supported, however, in future releases.

    At the tablespace level, the logging clause specifies the default log ging attribute for all tables, indexes, and partitions created in the tablespace. When an existing tablespace logging attribute is ch anged by the ALTER TABLESPACE statement, then all tables, indexes, and partitions created after the A LTER statement will have the new logging attribute; existing ones will not change their logging attributes. The tablespace-lev el logging attribute can be overridden by the specifications at the table, index, or partition level.

    The default logging attr ibute is LOGGING. However, if you have put the database in NOARCHIVELOG mode, by issuing ALTER DATABASE NOARCHIVELOG, then all operations that can be done witho ut logging will not generate logs, regardless of the specified logging attribute.

    Creating Indexes in Parallel

    Multiple processes can work together simultaneously to cr eate an index. By dividing the work necessary to create an index among multiple server processes, Oracle Database can create the inde x more quickly than if a single server process created the index sequentially.

    Parallel index creation works in much the same way as a table scan with an ORDER BY clause. The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the DOP. A first set of qu ery processes scans the table, extracts key-rowid pairs, and sends each pair to a process in a second set of query processes based on key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the p arallel coordinator simply concatenates the pieces (which are ordered) to form the final index.

    Parallel local index creation uses a single server set. Each server process in the set is assigned a table partition to sc an and for which to build an index partition. Because half as many server processes are used for a given DOP, parallel local index cr eation can be run with a higher DOP. However, the DOP is restricted to be less than or equal to the number of index partitions you wi sh to create. To avoid this, you can use the DBMS_PCLXUTIL package.

    You can optionally specify that no redo and u ndo logging should occur during index creation. This can significantly improve performance but temporarily renders the index unrecove rable. Recoverability is restored after the new index is backed up. If your application can tolerate a window where recovery of the i ndex requires it to be re-created, then you should consider using the NOLOGGING clause.

    The PARALLEL clause in the CREATE INDEX statement is the only way in which you can specify the DOP for creating the index. If the DOP is not specified in the parallel cla use of CREATE INDEX, then the number of CPUs is used as the DOP. If there is no PARALLEL claus e, index creation is done serially.

    When creating an index in parallel, the STO RAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL of 5 MB and a DOP of 12 consumes at least 60 MB of storage during index creation because each process s tarts with an extent of 5 MB. When the query coordinator process combines the sorted subindexes, some of the extents might be trimmed , and the resulting index might be smaller than the requested 60 MB.

    When you add or enable a UNIQUE or PRIMARY KEY constraint on a table, you cannot automatically create the required index in parallel. Instead, manually create an index on the desired columns, using the CREATE INDEX statement and an appropriate PARALLEL clause, and then add or enable the constrai nt. Oracle then uses the existing index when enabling or adding the constraint.

    Multiple constraints on the same table can be enabled concurrently and in parallel if all the constraints are already in the ENABLE NOVALIDATE state. In the following example, the ALTER TABLE ... ENABLE CONSTRAINT statement performs t he table scan that checks the constraint in parallel:

    CREATE TABLE a (a1 NUMBER CONSTRAINT ach CHECK (a
    1 > 0) ENABLE NOVALIDATE)
    PARALLEL; 
    INSERT INTO a values (1);
    COMMIT;
    ALTER TABLE a ENABLE CONSTRAINT ach;
    

    Parallel DML Tips

    This section provides an overview of parallel DML funct ionality. The topics covered include:

    Parallel DML Tip 1: INSE RT

    Oracle INSERT functionality can be summarized as follows:

    Table 24-5 Summary of INSERT Features

    Note:

    Table, partition, and partition-wait DML locks all appear as TM locks in the V$LOCK view.
    Insert Type Parallel Serial NOLOGGING
    Conventional No Yes No
    Direct-path
    INSERT

    (APPEND)

    Yes , but requires:

    ALTER SESSION ENABLE PARALLEL DML Table PA RALLEL attribute or PARALLEL hint
    APPEND hint (optional)

    Yes, but requires:

    APPEND hint

    Yes, but requi res:

    NOLOGGING attribute set for partition or table


    If parallel DML is enabled and there is a PARALLEL hint or PARALLEL attribute set for the table in the data dictionary, then inserts are parallel and appended, unless a restriction applies. If either the PARALLEL hint or PARALLEL attribute is missing, the insert is performed serially.

    Parallel DML Tip 2: Direct-Path INSE RT

    The append mode is the default during a parallel insert: data is always inserted into a new block which is allocate d to the table. Therefore the APPEND hint is optional. You should use append mo de to increase the speed of INSERT operations, but not when space utilization needs to be optimized. You can use < a id="sthref2420" name="sthref2420">NOAPPEND to override append mode.

    The APPEND hint applies to both serial and parallel insert: even serial inserts are faster if you use this hint. APPEND, however, does require more spac e and locking overhead.

    You can use NOLOGGING with APPEND t o make the process even faster. NOLOGGING means that no redo log is generated for the operation. NOLOGGING is never the default; use it when you wish to optimize performance. It should not normally be used when recovery is needed for the ta ble or partition. If recovery is needed, be sure to take a backup immediately after the operation. Use the ALTER TABLE [NO]LOGGING statement to set the appropriate value.

    Parallel DML Tip 3: Parallelizing INSERT, MERGE, UPDATE, and DELETE

    Wh en the table or partition has the PARALLEL attribute in the data dictionary, th at attribute setting is used to determine parallelism of INSERT, UPDATE, and DELETE statements as well as queries. An explicit PARALLEL hint for a table in a statement overrides the effect of the PARALLEL attribute in the data dictionary.

    You can use the NO_PARALLEL hint to override a PARALLEL attrib ute for the table in the data dictionary. In general, hints take precedence over attributes.

    DML operations are considered for parallelization only if the session is in a PARALLEL DML enabled mode. (Use ALTER SESSI ON ENABLE PARALLEL DML to enter this mode.) The mode does not affect parallelization of queries or of the query portions of a DML statement.

    Parallelizing INSERT ... SELECT

    In the INSERT ... SELECT statement you can specify a PARALLEL hint after the INSERT keyword, in addition to the hint a fter the SELECT keyword. The PARALLEL hint after the INSERT keyword applies to the INSER T operation only, and the PARALLEL hint after the SELECT keyword applies to the SELECT operation only. Thus, parallelism of the INSERT and SELECT operations are independent of each other. If one operation cannot be performed in parallel, it has no effect on whether the other operation can be performed in parallel.

    The ability to parallelize inserts causes a change in existing behavior if the user has explicitly enabled the session for parallel DML a nd if the table in question has a PARALLEL attribute set in the data dictionary entry. In that case, existing INSE RT ... SELECT statements that have the select operation parallelized can also have their insert operation paralle lized.

    If you query multiple tables, you can specify multiple SELECT PARALLEL hints and multiple PARALLEL attributes.

    Example 24-7 Parallelizing INSERT ... SELECT

    Add the new employee s who were hired after the acquisition of ACME.

    INSERT /*+ PARALLEL(EMP) */ INTO employees
    
    SELECT /*+ PARALLEL(ACME_EMP) */ *  FROM ACME_EMP;
    
    

    The APPEND keyword is not required in this example becau se it is implied by the PARALLEL hint.

    Parallelizing UPDATE and DELETE

    The PARALLEL hint (placed immediately after the UPDATE or DELETE keyword ) applies not only to the underlying scan operation, but also to the UPDATE or DELETE operation. Alternativ ely, you can specify UPDATE or DELETE parallelism in the PARALLEL clause specified in the defi nition of the table to be modified.

    If you have explicitly enabled parallel DML for the session or transaction, UPDATE or DELETE statements that have their query operation parallelized can also have their UPDATE or DELETE operation parallelized. Any subqueries or updatable views in the statement can have their own separate PARALLEL< /code> hints or clauses, but these parallel directives do not affect the decision to parallelize the update or delete. If these opera tions cannot be performed in parallel, it has no effect on whether the UPDATE or DELETE portion can be perf ormed in parallel.

    Tables must be partitioned in order to support parallel UPDATE and DELETE.

    Example 24-8 < strong>Parallelizing UPDATE and DELETE

    Give a 10 percent salary raise to all clerks in Dallas.

    UPDATE /*+ PARALLEL(EMP) */ employees
    SET SAL=SAL * 1.1 WHERE JOB='CLERK' AND DEPTNO IN
      (SELECT DEPTNO
     FROM DEPT WHERE LOCATION='DALLAS');
    
    

    The PARALLEL hint is applied to the UPDATE operation as we ll as to the scan.

    Example 24-9 Parallelizing UPDATE and DELETE

    R emove all products in the grocery category because the grocery business line was recently spun off into a separate company.

    DELETE /*+ PARALLEL(PRODUCTS) */ FROM PRODUCTS 
    WHERE PRODUCT_CATEGORY ='GROCERY';
    
    

    Again, the parall elism is applied to the scan as well as UPDATE operation on table employees.

    Incremen tal Data Loading in Parallel

    Parallel DML combined with the updatable join views facility provides an efficient soluti on for refreshing the tables of a data warehouse system. To refresh tables is to update them with the differential data generated fro m the OLTP production system.

    In the following example, assume that you want to refresh a table named customer th at has columns c_key, c_name, and c_addr. The differential data contains either new rows or ro ws that have been updated since the last refresh of the data warehouse. In this example, the updated data is shipped from the product ion system to the data warehouse system by means of ASCII files. These files must be loaded into a temporary table, named diff_ customer, before starting the refresh process. You can use SQL*Loader with both the parallel and direct options to efficiently perform this task. You can use the APPEND hint when loading in parallel as well.

    Once diff_customer is loaded, the refresh process can be started. It can be performed in two phases or by merging in parallel, as demonstrated in the f ollowing:

    Updating the Table in Parallel

    The following statement is a straightfo rward SQL implementation of the update using subqueries:

    UPDATE customers SET(c_name, c_addr) = (SELECT
     c_name, c_addr
    FROM diff_customer WHERE diff_customer.c_key = customer.c_key)
    WHERE c_key IN(SELECT c_key FROM diff_customer);
    
    
    
    

    Unfortunately, the two subqueries in this statement affect performance.

    An alternative is to rewrite this query using u pdatable join views. To do this, you must first add a primary key constraint to the diff_customer table to ensure that t he modified columns map to a key-preserved table:

    CREATE UNIQUE INDEX diff_pkey_ind ON diff_customer(c_
    key) PARALLEL NOLOGGING;
    ALTER TABLE diff_customer ADD PRIMARY KEY (c_key);
    
    

    You can then update the customers table with the following SQL statement:

    UPDATE /*+ PARALLEL(cust_joinview) */
    (SELECT /*+ PARALLEL(cu
    stomers) PARALLEL(diff_customer) */
    CUSTOMER.c_name AS c_name CUSTOMER.c_addr AS c_addr,
    diff_customer.c_name AS c_newname, diff_cust
    omer.c_addr AS c_newaddr
       WHERE customers.c_key = diff_customer.c_key) cust_joinview
       SET c_name = c_newname, c_addr = c_newaddr;
    
    
    

    The base scans feeding the join view cust_joinview are done in parallel. You can then parallelize the updat e to further improve performance, but only if the customers table is partitioned.

    Inserting the New Rows into the Table in Parallel

    The last phase of the refresh process consi sts of inserting the new rows from the diff_customer temporary table to the customer table. Unlike the upda te case, you cannot avoid having a subquery in the INSERT statement:

    INSERT /*+PARALLEL(cu
    stomers)*/ INTO customers SELECT * FROM diff_customer s);
    
    

    However, you can guarantee that the subquery is transformed int o an anti-hash join by using the HASH_AJ hint. Doing so enables you to use parallel INSERT to execute the p receding statement efficiently. Parallel INSERT is applicable even if the table is not partitioned.

    Merging in Parallel

    You can combine updates and inserts into one statement, commonly known as a merge. The following statement achieves the same result as all of the statements in "Updating the Table in Parallel" and "Inserting the New Rows into the Table in Parallel":

    MERGE INTO customers USING diff_customer ON (diff_customer.c_key = customer.c_key) WHEN MATCHED THEN UPDATE SET (c_name, c_addr) = (SELECT c_name, c_addr FROM diff_customer WHERE diff_customer.c_key = customers.c_key) WHEN NOT MATCHED T HEN INSERT VALUES (diff_customer.c_key,diff_customer.c_data);

    Using Hints with Query Optimization

    Query optimization is a sophisticated approach to finding the best execution plan for SQL statements. Oracle automatically uses query optimization with parallel execution.

    You must use the < code>DBMS_STATS package to gather current statistics for cost-based optimization. In particular, tables used in parallel shoul d always be analyzed. Always keep your statistics current by using the DBMS_STATS package. Failure to do so may result i n degraded execution performance due to non-optimal execution plans.

    Use discretion in employing hints. If used, hints should come as a final step in tuning and only when they demonstrate a necessary and significant performance advantage. In such cases, begin with the execution plan recommended by query optimization, and go on to test the effect of hints only after you have quantified your performance expectations. Remember that hints are powerful. If you use them and the underlying data changes, you might need to chang e the hints. Otherwise, the effectiveness of your execution plans might deteriorate.

    FIRST_ROWS(n) Hint

    The FIRST_ROWS(n) hint enables the optimizer to use a new optimization mode to optimize the query to return n row s in the shortest amount of time. Oracle Corporation recommends that you use this new hint in place of the old FIRST_ROWS hint for online queries because the new optimization mode may improve the response time compared to the old optimization mode.

    Use the FIRST_ROWS(n) hint in cases where you want the first n number of rows in the shortest possible time. For example, to obtain the first 10 rows in the shortest possible time, use the hint as follows:

    SELEC
    T /*+ FIRST_ROWS(10) */ article_id
    FROM articles_tab WHERE CONTAINS(article, 'Oracle')>0 ORDER BY pub_date DESC;
    

    Enabling Dynamic Sampling

    Dynamic sampling allows Oracle to derive m ore accurate statistics and thereby improve query performance when statistics do not exist or are out of date. This is particularly u seful in data warehousing environments or when you expect long transactions or queries. In these situations, making sure that Oracle uses the best execution plan is important. Dynamic sampling does, however, have a small cost, so you should use it when that cost is likely to be a small fraction of the total execution time.

    If you enable dynamic statistic sampling, Oracle determines at comp ile time whether a query would benefit from dynamic sampling. If so, a recursive SQL statement is issued to scan a small, random samp le of the table's blocks, and to apply the relevant single table predicates to estimate predicate selectivities. Relevant table, inde x and column statistics are also estimated. More accurate selectivity and statistics estimates allow the optimizer to produce better performing plans.

    Dynamic sampling is controlled with the initialization parameter OPTIMIZER_DYNAMIC_SAMPLING, wh ich can be set to a value between 0 and 10, inclusive. Increasing the value of the parameter will result in more aggressive applicati on of dynamic sampling, in terms of both the type (unanalyzed/analyzed) of tables sampled and the amount of I/O spent on sampling.

    Oracle also provides the table-specific hint DYNAMIC_SAMPLING. If the table name is omitted, the hint is considered cursor-level. The table-level hint forces dynamic sampling for the table.

    See Oracle Database Performance Tuning Guide for more information regarding dynamic sampling.