| Oracle®
; Database Data Warehousing Guide 10g Release 1 (10.1) Part Number B10736-01 |
|
|
View PDF |
In large data warehouse environments, many different types of analysis can occur. In addition to SQL queries, you ma y also apply more advanced analytical operations to your data. Two major types of such analysis are OLAP (On-Line Analytic Processing ) and data mining. Rather than having a separate OLAP or data mining engine, Oracle has integrated OLAP and data mining capabilities directly into the database server. Oracle OLAP and Oracle Data Mining (ODM) are options to the Oracle Database. This chapter provides a brief introduction to these technologies, and more detail can be found in these products' respective documentation.
The fol lowing topics provide an introduction to Oracle's OLAP and data mining capabilities:
|
See Also: Oracle OLAP Application Developer's Guide for further information regarding OLAP and Oracle Data Mi ning documentation for further information regarding data mining |
Oracle Database OLAP adds the query performance and calculation capability previously found only in multidimensional databases to Oracle's relationa l platform. In addition, it provides a Java OLAP API that is appropriate for the development of internet-ready analytical application s. Unlike other combinations of OLAP and RDBMS technology, Oracle Database OLAP is not a multidimensional database using bridges to m ove data from the relational data store to a multidimensional data store. Instead, it is truly an OLAP-enabled relational database. A s a result, this release provides the benefits of a multidimensional database along with the scalability, accessibility, security, ma nageability, and high availability of the Oracle Database. The Java OLAP API, which is specifically designed for internet-based analy tical applications, offers productive data access. See Oracle OLAP Application Developer's Guide for further information regarding OLAP.
Basi ng an OLAP system directly on the Oracle server offers the following benefits:
Oracle Database OLAP is highly scalable. In today 's environment, there is tremendous growth along three dimensions of analytic applications: number of users, size of data, complexity of analyses. There are more users of analytical applications, and they need access to more data to perform more sophisticated analys is and target marketing. For example, a telephone company might want a customer dimension to include detail such as all telephone num bers as part of an application that is used to analyze customer turnover. This would require support for multi-million row dimension tables and very large volumes of fact data. Oracle Database can handle very large data sets using parallel execution and partitioning , as well as offering support for advanced hardware and clustering.
Oracle Database includes many features that support high availability. One of the most significant is parti tioning, which allows management of precise subsets of tables and indexes, so that management operations affect only small pieces of these data structures. By partitioning tables and indexes, data management processing time is reduced, thus minimizing the time data is unavailable. Another feature supporting high availability is transportable tablespaces. With transportable tablespaces, large data sets, including tables and indexes, can be added with almost no processing to other databases. This enables extremely rapid data loa ding and updates.
Oracle enables you to p recisely control resource utilization. The Database Resource Manager, for example, provides a mechanism for allocating the resources of a data warehouse among different sets of end-users. Consider an environment where the marketing department and the sales departmen t share an OLAP system. Using the Database Resource Manager, you could specify that the marketing department receive at least 60 perc ent of the CPU resources of the machines, while the sales department receive 40 percent of the CPU resources. You can also further sp ecify limits on the total number of active sessions, and the degree of parallelism of individual queries for each department.
Another resource management facility is the progress monitor, which gives end users and administrators the status of long-running ope rations. Oracle Database 10g maintains statistics describing the percent-complete of these operations. Oracle Enterprise Man ager enables you to view a bar-graph display of these operations showing what percent complete they are. Moreover, any other tool or any database administrator can also retrieve progress information directly from the Oracle data server, using system views.
Oracle provides a server-managed infrastruct ure for backup, restore, and recovery tasks that enables simpler, safer operations at terabyte scale. Some of the highlights are:
Details related to backup, restore, and recovery operations are maintained by the server in a recovery cata log and automatically used as part of these operations. This reduces administrative burden and minimizes the possibility of human err ors.
Backup and recovery operations are fully integrated with partitioning. Individual partitions, when placed in their own tablespaces, can be backed up and restored independently of the other partitions of a table.
Oracle includes support for incremental backup and recovery using Recovery Manager, enabling operations to be completed effi ciently within times proportional to the amount of changes, rather than the overall size of the database.
Just as the demands of real-world transaction processing required Oracle to develop robust features for scalability, manageabilit y and backup and recovery, they lead Oracle to create industry-leading security features. The security features in Oracle have reache d the highest levels of U.S. government certification for database trustworthiness. Oracle's fine grained access control feature, ena bles cell-level security for OLAP users. Fine grained access control works with minimal burden on query processing, and it enables ef ficient centralized security management.
Oracle Data Mining uses data mining algorithms to sift thro ugh the large volumes of data generated by businesses to produce, evaluate, and deploy predictive and descriptive models. It also enr iches mission critical applications in CRM, manufacturing control, inventory management, customer service and support, Web portals, w ireless devices and other fields with context-specific recommendations and predictive monitoring of critical processes. ODM delivers real-time answers to questions such as:
Which items is a person most likely to buy or like?
What is the likelihood that this product will be returned for repair?
What is the likel ihood that this person poses a credit risk?
Oracle Data Mining enables data mining inside the database for performan ce and scalability. Some of the capabilities are:
Java and PL/SQL interfaces that provide programmatic c ontrol and application integration
Several algorithms:
Classification: Naiv e Bayes, Adaptive Bayes Network, Support Vector Machine
Regression: Support Vector Machine
Clustering: k-Means, O-Cluster
Association: Apriori
Attribu te Importance: Predictor Variance
Feature Extraction: Non-Negative Matrix Factorization
Real-time and batch scoring modes
Oracle Data Mining also supports sequence similarity search and annotation (BLAST) in the database.
Oracle Da ta Mining provides a Java API and PL/SQL packages to exploit the data mining functionality that is embedded in the Oracle database. p>
By delivering complete programmatic control of data mining in the database, Oracle Data Mining (ODM) delivers powerful, scalabl e modeling and real-time scoring. This enables businesses to incorporate rules, predictions, and classifications in all processes and decision points throughout the business cycle.
ODM is designed to meet the challenges of vast amounts of data, delivering acc urate insights completely integrated into e-business applications. This integrated intelligence enables the automation and decision s peed that e-businesses require to compete today.
Oracle Data Mining performs all phases of data mining within the database. In each data mining phase, this arch itecture results in significant improvements including performance, automation, and integration.
Performing data mining in the database has the following benefits:
All phases of data mining take place in the database:
All data preparation occurs in the database
The data that is mined remains in the datab ase
The models produced by mining reside in the database
Scoring occurs in the database along with results immediately available as tables
Data mining automatically inhe rits important database features, including:
Scalability
Availability
li>Manageability
Backup and recovery
Security
Data preparation usually requires the creation of new t ables or views based on existing data. Both options perform faster than moving data to an external data mining utility and offer the programmer the option of snapshots or real-time updates.
Oracle Data Mining provides utilities for complex, data mining-specif ic tasks. For example, for some types of models, binning improves model build time and model performance; therefore, ODM provides a u tility for user-defined binning.
ODM accepts data in either non-transactional (single-record case) format or transactional (mu lti-record case) format. ODM provides a pivoting utility for converting multiple non-transactional tables into a single transactional table.
ODM data exploration and model evaluation features are extended by Oracle's statistical functions and OLAP capabilitie s. Because these also operate within the database, they can all be incorporated into a seamless application that shares database obje cts. This allows for more functional and faster applications.
Oracle Data Mining supports all the major data mining functions: classification, regression, association rules, clustering, attribute importance, and feature extraction.
These algorithms address a broad spectrum of business problems, ran ging from predicting the future likelihood of a customer purchasing a given product, to understand which products are likely to be pu rchased together in a single trip to the grocery store. Since all model building takes place inside the database, the data never need s to move outside the database, and therefore the entire data-mining process is accelerated.
Models are stored in the database and are directly accessible for evaluation, reporting, and further analysis by a wide variety of tools and application functions. ODM provides APIs for calculating confusion mat rix and lift charts. ODM stores the models, the underlying data, and the results of model evaluation together in the database to enab le further analysis, reporting, and application-specific model management.
Oracle Data Mining provides both batch and real-time scoring. In batch mode, ODM takes a ta ble as input. It scores every record, and returns a scored table as a result. In real-time mode, parameters for a single record are p assed in and the scores are returned in a Java object.
In both modes, ODM can deliver a variety of scores. It can return a rat ing or probability of a specific outcome. Alternatively it can return a predicted outcome and the probability of occurrence of the ou tcome.
ODM provides Java an d PL/SQL interfaces for data mining. These interfaces make is possible to embed data mining in applications.
The ODM Java API allows programmers to develop data mining applications or tools in the J2SE/J2EE environ ment. The API defines a set of classes that can be used to develop a complete data mining solution. The API has built-in data mining metadata management and provides an infrastructure to build data mining applications easily. The API supports importing and exporting of PMML models for Naive Bayes and Association Rules models. The API supports asynchronous execution of mining operations and provid es mechanisms to retrieve state transition information for running or completed operations. The API supports real-time scoring for al l the supervised models and clustering models. This API supports text mining.
The ODM Java API provides flexible data preparat ion options. Applications can use either automated data preparation, or they perform data processing using external transformations d efined as utility methods in the oracle.dmt.odm.transformation.Transformation class. In addition, applications can embed externals tr ansformation details in the logical data specification as an input for the build operation, then the system will persist the details with the model and perform the embedded transformations in the future apply and test operations.
The ODM Java API design refle cts concepts present in the emerging Java standard (JSR-73) for Data Mining, which is being developed through the Java Community Proc ess.
The following supplied packages support data mining in PL/SQL program s:
DBMS_DATA_MINING
DBMS_DATA_MINING_TRANSFORM
DBMS_DATA_MINING provides support for in-database data mining. This package can be used to build and test
models and to apply models to new data (scoring). The package provides the basic building blocks for data mining, along with utiliti
es and functions to inspect models and their results. The package also supports export and import of native models from a user's sche
ma or database instance.
DBMS_DATA_MINING_TRANSFORM, a complementary package, provides support for popular data t
ransformations such as numerical and categorical binning and linear and z-score normalization. The DBMS_DATA_MINING_TRANSFORM
code> package is open source in nature, in that the package code is distributed with the product, so that users can study the utility
routines and learn how to define their own data transformations using Oracle SQL and PL/SQL scripting.
ODM also supports specialized sequence matching and an notation algorithms. In life sciences, vast quantities of data including nucleotide and amino acid sequences are stored, typically, i n a database. This sequence data help biologists determine the chemical structure, biological function, and evolutionary history of o rganisms.
A version of BLAST, based on NCBI BLAST 2.0, has been implemented in the Oracle Database using table functions. This enables users to perform BLAST queries against data in an Oracle database.