Skip Headers

Oracle® Database C oncepts
10g Release 1 (10.1)

Part Number B10743-01
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Feedback< /a>

Go to previous page
Previous
Go to next page
Next
View PDF

23 Information Integration

This chapt er contains the following topics:

< /a>

Intr oduction to Oracle Information Integration

As a company evolves, it becomes increasingly important for it to be able t o share information among multiple databases and applications. Companies need to share OLTP updates, database events, and application messages, as customers place orders online, through the sales force, or even with a partner. This information must be routed to a va riety of destinations including heterogeneous replicated databases, message queuing systems, data warehouse staging areas, operationa l data stores, other applications, and a standby database.

There are three basic approaches to sharing information. You can co nsolidate the information into a single database, which eliminates the need for further integration. You can leave information distri buted, and provide tools to federate that information, making it appear to be in a single virtual database. Or, you can share informa tion, which lets you maintain the information in multiple data stores and applications. This chapter focuses on federating and sharin g information.


See Also:

Chapter 16, " Business Intelligence" for more information on features to consolidate information

Oracle provides distributed SQL for federating distributed information. Distributed SQL synchronously accesses and updates data distributed among multiple databases, whi le maintaining location transparency and data integrity.

Oracle Streams is the asynch ronous information sharing infrastructure in the Oracle database. Oracle Streams can mine the Oracle redo logs to capture DML and DDL changes to Oracle data, and it makes that changed data available to other applications and databases. Thus, Oracle Streams can provi de an extremely flexible asynchronous replication solution, as well as an event notification framework. Because Streams supports appl ications explicitly enqueuing and dequeuing messages, it also provides a complete asynchronous messaging solution. That solution, Ora cle Streams Advanced Queuing, can be used to exchange information with customers, partners, and suppliers, and to coordinate business processes. Both Streams and distributed SQL can access and update data in non-Oracle system s using Oracle Transparent Gateways, Generic Connec tivity, and the Messaging Gateway. Oracle can work with non-Oracle data sources, non-Oracle message queuing systems, and non-SQL applications, ensuring interoperabilility with other vendor's products and technologies. Each of the solutions are described in detail in the following sections.

A distributed environment is a network of d isparate systems that seamlessly communicate with each other. Each system in the distributed environment is called a node. The system to which a user is directly connected is called the local system. Any additional systems accessed by this user are called remote sys tems. A distributed environment allows applications to access and exchange data from the local and remote systems. All the data can b e simultaneously accessed and modified.

While a distributed environment enables increased access to a large amount of data acr oss a network, it must also hide the location of the data and the complexity of accessing it across the network.

In order for a company to operate successfully in a distributed environment, it must be able to do the following:

  • Exc hange data between Oracle databases

  • Communicate between applications

  • Exchan ge information with customers, partners, and suppliers

  • Replicate data between databases

  • Communicate with non-Oracle databases

Federated Acc ess

A homogeneous distributed database system is a network of two or more Oracle databases that reside on one or more machines.

Distributed SQL

Distributed SQL e nables applications and users to simultaneously access or modify the data in several databases as easily as they access or modify a s ingle database.

An Oracle distributed database system can be transparent to users, making it appear as though it is a single O racle database. Companies can use this distributed SQL feature to make all its Oracle databases look like one and thus reduce some of the complexity of the distributed system.

Oracle uses database links to enable users on one database to access objects in a r emote database. A local user can access a link to a remote database without having to be a user on the remote database.


See Also:

Oracle Database Administrator's Guide for more information o n database links

Location Transparen cy

An Oracle distributed database system lets application developers and administrators hide the physical location of database objects from applications and users. Location transparency exists when a user can universally refer to a database object, su ch as a table, regardless of the node to which an application connects. Location transparency has several benefits, including the fol lowing:

  • Access to remote data is simple, because database users do not need to know the physical locatio n of database objects.

  • Administrators can move database objects with no impact on users or existing data base applications. Typically, administrators and developers use synonyms to establish location transparency for the tables and suppor ting objects in an application schema.

In addition to synonyms, developers can use views and stored procedures to es tablish location transparency for applications that work in a distributed database system.


< p>See Also:


Distributed Query Optimization

Distri buted query optimization reduces the amount of data transfer required between sites when a transaction retrieves data from remote tab les referenced in a distributed SQL statement. Distributed query optimization uses Oracle's optimizer to find or generate SQL express ions that extract only the necessary data from remote tables, process that data at a remote site (or sometimes at the local site) and send the results to the local site for final processing.

This operation reduces the amount of required data transfer when com pared to the time it takes to transfer all the table data to the local site for processing. Using various optimizer hints, such as DRIVING_SITE, NO_MERGE, and INDEX, you can control where Oracle processes the data and how it ac cesses the data.


See Also:

Oracle Database Performance Tuning Guide for mo re information on the optimizer and hints

Information Sharing

At the heart of any integration is the sharing of data among various applications in the enterprise.

Replicati on is the maintenance of database objects in two or more databases. It provides a solution to the scalability, availability, and perf ormance issues facing many companies. For example, replication can improve the performance of a company's Web site. By locally replic ating remote tables that are frequently queried by local users, such as the inventory table, the amount of data going across the netw ork is greatly reduced. By having local users access the local copies instead of one central copy, the distributed database does not need to send information across a network repeatedly, thus helping to maximize the performance of the database application. Oracle St reams provides powerful replication features that can be used to keep multiple copies of distributed objects synchronized.

Man y companies have developed a variety of autonomous and distributed applications to automate business processes and manage business ta sks. However, these applications need to communicate with each other, coordinating business processes and tasks in a consistent manne r. They also need to exchange information efficiently with customers, partners, and suppliers over low-cost channels such as the Inte rnet, while preserving a traceable history of events—a requirement previously satisfied through now obsolete paper forms.For l oose application coupling, Oracle offers Oracle Streams Advanced Queuing, which is built on top of the flexible Oracle Streams infras tructure. Oracle Streams Advanced Queuing provides a unified framework for processing events.Events generated in applications, in wor kflow, or implicitly captured from redo logs or in database triggers can be captured and staged in a queue. These events can be consu med in a variety of ways. They can be applied automatically with a user-defined function or database table operation, or they can be dequeued explicitly. Also, notifications can be sent to the consuming application. These events can be transformed at any stage. If t he consuming application is on a different database, then the events can be propagated to the appropriate database automatically. Ope rations on these events can be automatically audited, and the history can be retained for the user-specified duration.

Oracle Streams

Oracle Streams enables the propagation and management of data, transactions, and events in a data stream either within a database, or from one database to another. The stream routes published information to subscribed destinations. As users' needs change, they can implement a new capability of Oracle Stream s, without sacrificing existing capabilities.

Oracle Streams provides a set of elements that allows users to control what info rmation is put into a stream, how the stream flows or is routed from node to node, what happens to events in the stream as they flow into each node, and how the stream terminates. By specifying the configuration of the elements acting on the stream, a user can addre ss specific requirements, such as message queuing or data replication.

Oracle Streams satisfies the information sharing requir ements for a variety of usage scenarios. Oracle Streams Advanced Queuing provides the database-integrated message queuing and event m anagement capabilities. In addition, Oracle includes tools to help users build event notification, replication and data warehouse loa ding solutions using Oracle Streams. Using the full power of Oracle Streams, you can create configurations that span multiple use cas es, enabling new classes of applications. Most deployments and their associated metadata are compatible. For example, a system config ured to load a data warehouse easily can be extended to enable bi-directional replication. A complete reconfiguration is not required .



Oracle Streams Architecture

Th e architecture of Oracle Streams is very flexible. As shown in Figure 23-1, Streams contains three basic elemen ts.

  • Capture

  • Staging

  • Consumption

Figure 23-1 Streams Information Flow

Description of cncpt
200.gif follows
Description of the illustration cncpt200. gif

Capture

Oracle Streams can capture ev ents implicitly and explicitly and place these events in a staging area. Database events, such as DML and DDL changes, can be implici tly captured by mining the redo log files. Rules determine what events are captured. Information representing a captured event is for matted as a logical change record (LCR) and placed in the staging area.

Oracle Streams supports mining the online redo log, as well as mining archived log files. In the case of online redo log mining, redo information is mined for change data at the same time it is written, reducing the latency of capture.

User applications explicitly can enqueue messages representing events into th e staging area. These messages can be formatted as LCRs, which will allow them to be consumed by the apply engine, or they can be for matted for consumption by another user application using an explicit dequeue.

Staging

Once captured, events are placed in a staging area. The staging area is a queue that stores and manage s captured events. LCRs and other types of events are stored in a staging area until subscribers consume them. LCR staging provides a holding area with security, as well as auditing and tracking of LCR data.

Subscribers examine the contents of the staging are a and determine whether they have an interest in the message representing that event. A subscriber can either be a user application, a propagation to another staging area, usually on another system, or an apply process. The subscriber optionally can evaluate a set o f rules to determine whether the message meets the criteria set forth in the subscription. If so, then the message will be consumed b y the subscriber.

If the subscriber is a user application, then the application will dequeue the message from the staging area in order to consume the message. If the subscriber is a propagation to another staging area, then the message will be propagated to that staging area. If the subscriber is an apply process, then it will be dequeued and consumed by the apply process.

Events i n the staging area optionally may be propagated to other staging areas in the same database, or to staging areas in remote databases. To simplify network routing and reduce network traffic, events need not be sent to all databases and applications. Rather, they can be directed through staging areas on one or more systems until they reach the subscribing system. Not all systems need subscribe to t he events, providing flexibility regarding what events are applied at a particular system. A single staging area can stage events fro m multiple databases, simplifying setup and configuration.

As events enter the staging area, are propagated, or exit the stagi ng area, they can be transformed. A transformation is a change in the form of an object participating in capture and apply or a chang e in the data it holds. Transformations can include changing the datatype representation of a particular column in a table at a parti cular database, adding a column to a table at one database only, or including a subset of the data in a table at a particular databas e.

Replication with Oracle Streams

Oracle Streams is an information sharing technology that automatically determines what information is relevant and shares th at information with those who need it. This active sharing of information includes capturing and managing events in the database incl uding DML and DDL changes and propagating those events to other databases and applications. Data changes can be applied directly to t he replica database or can call a user-defined procedure to perform alternative work at the destination database. For example, such a procedure can populate a staging table used to load a data warehouse.

The basic elements of the Oracle Streams technology use d in replication environments include the following:

Capturing DML and DDL Changes

Configuring Streams for replication begins with specifying an object or set o f objects to be replicated. Using the implicit capture mechanism of Oracle Streams, changes made to these objects can be captured eff iciently and replicated to one or more remote systems with little impact to the originating system. This capture mechanism can extrac t both data changes (DML) and structure changes (DDL) from the redo log. The captured changes are published to a staging area. Log-ba sed capture leverages the fact that changes made to tables are logged in the redo log to guarantee recoverability in the event of a m alfunction or media failure.

Capturing changes directly from the redo log minimizes the overhead on the system. Oracle can rea d, analyze, and interpret redo information, which contains information about the history of activity on a database. Oracle Streams ca n mine the information and deliver change data to the capture process.

Replicated databases utilizing Oracle Streams technolog y need not be identical. Participating databases can maintain different data structures using Streams to transform the data into the appropriate format. Streams provides the ability to transform the stream at multiple points: during change capture at the source data base, while propagating to another database, or during application at the destination site. These transformations are user-defined fu nctions registered within the Oracle Streams framework. For example, the transformation can be used to change the datatype representa tion of a particular column in a table or to change the name of a column in a table or change a table name.

The data at each s ite can be subsetted based on content as well. For example, the replica can use a rule which specifies that only the employees for a particular division based on the department identifier column be contained within the table. Oracle Streams automatically manages the changes to ensure that the data within the replica matches the subset rule criteria.

Propagating Changes Over a Directed Network

Events in a staging area can be sent to staging areas in other databases. The directed network capability of Streams allows changes to be directed throu gh intermediate databases as a pass-through. Changes at any database can be published and propagated to or through other databases an ywhere on the network. By using the rules-based publish and subscribe capabilities of the staging area queues, database administrator s can choose which changes are propagated to each destination database and can specify the route messages traverse on their way to a destination.

Thus, for example, a company could configure replication to capture all changes to a particular schema, propagate only changes to European customers to their European headquarters in London, apply only those changes relevant to the London office, and forward site-specific information to be applied at each field office.

This directed network approach is also friendly to Wide Area Networks (WAN), enabling changes to subsequent destinations to traverse the network once to a single site for later fan-out to other destinations, rather than sending to each destination directly.

Resolving Conflicts and Applying Changes

Messages in a staging area can be consumed by an apply process, where the changes they represent are applied to database objects, or they can be consumed by an applica tion. User-defined apply procedures enable total control over the events to be applied.

Using custom apply, separate procedure s can be defined for handling each type of DML operation (inserts, updates, or deletes) on a table. For example, using this custom ap ply capability, a user could write a procedure to skip the apply of all deletes for the employees table, for employees w ith a salary greater than $100,000, based on a value for the employee in the salary table. Inserts and updates to the employees table would continue to be applied using the default apply engine, as would deletes for employees with salaries less than $100,000.

Custom apply also could be used to perform custom transformations of data. For example, changes to one tab le at the originating site might need to be applied to three different tables at the remote location.

The remote databases in a replication environment are fully open for read/write, and need not be identical copies of the source database. Because the remote database can be updated by other means, an apply process detects conflicts before changes are applied. These conflicts also can be au tomatically resolved using built-in or custom resolution mechanisms.

< div class="sect3">

Oracle Streams Advanced Queuing

Beyond database integration, Oracle Streams Advanced Queu ing provides many features that make it the most robust and feature rich message queuing system. These features improve developer pro ductivity and reduce the operational burden on administrators, which reduces the cost of building and maintaining Oracle-based distri buted applications. These features are described in the following sections.

Asynchronous Applicatio n Integration

Oracle Streams Advanced Queuing provides asynchronous integration of distributed applications. It offers several ways to enqueue messages. A capture process can capture the messages from redo logs implicitly, or applications and users ca n capture messages explicitly.Messages can be enqueued with delay and expiration. Delay allows an enqueued message to be visible at a later date. Advanced Queuing also supports several ways to order messages before consumption. It supports first-in first-out orderin g and priority-based ordering of messages.Advanced Queuing also offers multiple ways to consume a message. Automatic apply lets users invoke a user-specified action for the message. Consuming applications can dequeue a message explicitly. Both blocking and nonblocki ng dequeue is supported. The consuming applications can choose to receive notifications either procedurally using PL/SQL, OCI, or Jav a callback functions. Alternatively, they can get notifications in an email or by HTTP post. Consuming applications can also choose t o perform automatic apply.

Extensible Integration Architecture< /h5>

Oracle Streams Advanced Queuing offers an extensible framework for developing and integrating distributed applications. Many applications are integrated with a distributed hub and spoke model with the Oracle database server as the hub.

The distributed applications on an Oracle database communicate with queues in the same Oracle database server hub. Oracle's extensible framework let s multiple applications share the same queue, eliminating the need to add additional queues to support additional applications.

< p>Also, Advanced Queuing supports multiconsumer queues, where a single message can be consumed by multiple applications. As additiona l applications are added, these applications can coordinate business transactions using the same queues and even the same messages in the Oracle database server hub. It offers the benefits of extensibility without losing guaranteed once and only once delivery of a m essage.

Advanced Queuing supports a content-based publish and subscribe model, where applications publish messages and consume rs subscribe to the messages without knowledge of the publishing application. With such a model, it is possible to add consuming appl ications to a hub with no change required to existing applications.

If the distributed applications are running on different O racle databases, then business communications can be automatically propagated to the appropriate Oracle database. The propagation is managed automatically by the Oracle Streams Advanced Queuing system and is transparent to the application.

Heterogeneous Application Integration

Traditionally, different applications had t o use a common data model for communication. This data model was further restricted by the limited datatype support of the message-or iented middleware. Oracle Streams Advanced Queuing supports AnyData queues that can store messages of multiple datatypes.

Adva nced Queuing provides applications with the full power of the Oracle type system. It includes support for scalar datatypes such as nu mber, date, varchar, and so on, Oracle object types with inheritance, XMLType with additional operators for XML data, and AnyData sup port. In particular, with XMLType type support, application developers can make use of the full power of XML for extensibility and fl exibility in business communications.

Oracle Streams Advanced Queuing also offers transformation capabilities. Applications wi th different data models can transform the messages while dequeuing or enqueuing the messages to or from their own data model. These transformation mappings are defined as SQL expressions, which can involve PL/SQL functions, Java functions, or external C callouts.

Legacy Application Integration

The Oracle Messaging Gate way integrates Oracle database applications with other message queuing systems, such as Websphere MQ (formerly called MQ Series) and Tibco. Because many legacy applications on mainframes communicate with Websphere MQ, there is a need for integrating these applicatio ns into an Oracle environment. The message gateway makes non-Oracle message queues appear as if they were Oracle Streams queues, and automatically propagates messages between Oracle Streams queues and Websphere MQ or Tibco queues.

Distributed applications spa nning multiple partners can coordinate using the Internet access features of Oracle Streams Advanced Queuing. Using these features, a business partner or application securely can place an order into an advanced queuing queue over the Internet. Only authorized and au thenticated business partners can perform these operations.

Advanced Queuing Internet operations utilize an XML-based protocol over Internet transports, such as HTTP(S), allowing messages to flow through firewalls without compromising security. Supporting the Internet for communications drastically reduces the cost of communications, and thus the cost of the entire solution.

Standard-Based API Support

Oracle Streams Advanced Queuing supports i ndustry-standard APIs: SQL, JMS,and SOAP. Database changes made using SQL are captured automatically as messages.

Similarly, t he distributed messages and database changes can be applied to database tables, which can be seen using SQL. The messages can be enqu eued and dequeued using industry-standard JMS. Advanced queuing also has a SOAP-based XML API and supports OCI and OCCI to enqueue an d dequeue messages.


See Also:

Oracle Streams Concepts and Administration

See Also:< /font>

Oracle Streams Advanced Queuing User's Guide and Refe rence

Change Data Capture

Change data capture, a feature built on the Oracle Streams infrastructure, efficiently identifies and captures data that has been added to, up dated, or removed from Oracle relational tables, and it makes the change data available for use by ETL tools and applications. Using the change data capture capabilities of Oracle Streams, it quickly identifies and processes only the data that has changed, not entir e tables.

Heterogeneous Environments

Oracle Streams is an open information sharing solution, supporting heterogeneous replication between Oracle and non-Oracle systems. Using a transparent gateway, DML changes initiated at Oracle databases can be applied to non-Oracle databases.

To implement capture and apply of DML changes from an Oracle source to a non-Oracle destination, an Oracle system functions as a proxy and runs the apply process that would normally be running at an Oracle destination site. The Oracle system then communicates with the non-Oracle system with a transp arent gateway.

The changes are dequeued in an Oracle database itself and the local apply process applies the changes to a non- Oracle system across a network connection through a gateway.

Users who want to propagate changes from a non-Oracle database to an Oracle database write an application to capture the changes made to the non-Oracle database. The application can capture the chan ges by reading from transaction logs or by using triggers. The application is then responsible for assembling and ordering these chan ges into transactions, converting them into the Oracle defined logical change record (LCR) format, and publishing them into the targe t Oracle database staging area. These changes can be applied with a Streams apply process.

Oracle Streams Use Cases

Use Oracle Streams to create configurations that enable new classes of applications. In a ddition, all deployments and their associated metadata are compatible. For example, a replication installation easily can be extended to load a data warehouse or enable bi-directional replication—a complete reconfiguration is not required.

Suppose that a company uses Oracle Streams to maintain multiple copies of a corporate Web site for improved availability, scalability, and perfor mance. Additional requirements could include a reporting database containing the most current information for analysts in a company h eadquarters office in New York to perform ad-hoc querying as well as a disaster recovery database separately maintained from their Ne w York office. Additionally, updatable materialized views can be used to support the field sales staff. A final requirement is to sha re data with existing applications that are hosted on a Sybase database.

Figure 23-2 illustrates this S treams configuration.

Figure 23-2 Streams Configuration

Description of cncpt199.gif follows
Description of t he illustration cncpt199.gif

Oracle Streams is used to replicate data in an N-way c onfiguration consisting of three regional sites: New York, London, and Tokyo. At each of these sites, Streams log-based capture colle cts any changes that occur for subscribed tables in each local region, and stages them locally in the queue. All changes captured in each region are then forwarded to each of the other region's databases with the goal that all changes made at each database be reflec ted at every other database, providing complete data for the subscribed objects throughout the world.

Because the updates are applied automatically when received at each regional database, an Oracle Streams apply process is used to apply the changes. As chang es are applied, Oracle Streams checks for and resolves any conflicts that are detected. Streams also can be used to exchange data for particular tables with non-Oracle databases. Utilizing the Oracle Transparent Gateway for Sybase, a Streams apply process applies th e changes to a Sybase database using the same mechanisms as it does for Oracle databases.

The databases for reporting and disa ster recovery are hosted from the New York database site. The reporting database is a fully functional Oracle database that has a rea d-only copy of the relevant application tables. The reporting site is not configured to capture changes on these application tables. Streams imposes no restrictions on the configuration or usage of this reporting database.

The London site also serves as the m aster site for several updatable materialized view sites. Each salesperson receives an updatable copy of just the portion of the data that he requires. These sites typically only connect once a day to upload their orders and download any changes since their last ref resh.

Materialized Views

Oracle Streams is fully inter-operational with materialized views, or snapshots, which can be used to maintain updatable or read-only , point-in-time copies of data. They can be defined to contain a full copy of a table or a defined subset of the rows in the master t able that satisfy a value-based selection criterion. There can be multitier materialized views as well, where one materialized view i s based on another materialized view. Materialized views are periodically updated, or refreshed, from their associated master tables through transactionally consistent batch updates.

Read-only materialized views can be used to periodically propagate the updat ed product catalog to the various sales offices, because the product catalog is only updated at the headquarters location.

Bec ause materialized views do not require a dedicated connection, they are ideal for disconnected computing. For example, a company migh t choose to use updatable materialized views for the members of their sales force. A salesperson could enter orders into his or her l aptop throughout the day, then simply dial up the regional sales office at the end of the day to upload these changes and download an y updates.

Integrating Non-Oracle Systems

Oracle provides two solutions for integrating the Oracle database server with non-Oracle databases--Generic Connectivity and Transparent Gateways. These solutions enable Oracle clients to access non-Oracle data stores. They translate third party SQL dia lects, data dictionaries, and datatypes into Oracle formats, thus making the non-Oracle data store appear as a remote Oracle database . These technologies enable companies to integrate seamlessly the different systems and provide a consolidated view of the company as a whole.

Generic Connectivity and Oracle Transparent Gateways can be used for synchronous access, using distributed SQL. In a ddition, Oracle Transparent Gateways can be used for asynchronous access, using Oracle Streams. Introducing a Transparent Gateway int o an Oracle Streams environment enables replication of data from an Oracle database to a non-Oracle database.

Both Generic Con nectivity and Oracle Transparent Gateways transparently access data in non-Oracle systems from an Oracle environment. As with an Orac le distributed database environment, location transparency can be extended to objects residing in non-Oracle systems as well. Therefo re, users can create synonyms for the objects in the non-Oracle database and refer to them without having to specify its physical loc ation. This transparency eliminates the need for application developers to customize their applications to access data from different non-Oracle systems, thus decreasing development efforts and increasing the mobility of the application. Instead of requiring applica tions to interoperate with non-Oracle systems using their native interfaces (which can result in intensive application-side processin g), applications can be built upon a consistent Oracle interface for both Oracle and non-Oracle systems.

Generic Connectivity

Generic Connectivity is a generic solution that uses an ODBC or OLEDB driver to access an y ODBC or OLEDB compliant non-Oracle system. It provides data access to many data stores for which Oracle does not have a gateway sol ution. This enables transparent connectivity using industry standards, such as ODBC and OLEDB. Generic connectivity makes it possible to access low-end data stores, such as Foxpro, Access, dBase, and non-relational targets like Excel.