Oracle 10g Data Mining FAQ

Author: unknown  Date: 2004-12-10

General

What is Oracle Data Mining?
Oracle Data Mining is an option to Oracle 10g Database Enterprise Edition (EE) that embeds data mining functionality for making classifications, predictions, and associations, as well as extracting new features from data, clustering data and ranking relative attribute importance. All model-building and scoring functions are accessible through either a Java interface or PL/SQL interface. A graphical interface, the Oracle Data Miner, supports "point and click" data mining with the benefit of generating Java code to expedite application development.

What is the value proposition for data mining, and Oracle Data Mining?
Data mining has been proven in numerous vertical markets to reduce costs and increase profits when applied to specific business problems such as response modeling, customer attrition, fraud detection, etc. By its nature, data mining provides accurate, actionable, and timely information upon which to base business decisions.

Oracle Data Mining enables users to get more value from their data warehouse investment by providing application development features such as Java and PL/SQL interfaces, as well as the Oracle Data Miner graphical interface. This enables users to quickly see return on investment (ROI) in production applications. Oracle Data Mining plays a key role in an overall business intelligence solution involving Oracle technology, consulting, and training.

What is the target market?
Oracle Data Mining addresses the needs of the data analyst as well as the application developer. ODM can be applied to a wide range of datasets (in terms of number of records and columns) and customer problems. It is especially suited for companies that have large volumes of data, are committed to the Oracle platform, and want to automate and operationalize their extraction of business intelligence. The initial end users are the data analyst and the Java or PL/SQL application developer, although the end user of the application enhanced by data mining could be a customer service rep, marketing manager, customer, business manager, or just about any other imaginable user.

How does Oracle Data Mining fit into the Oracle strategy?
The Oracle 10g Database is positioned as a computational engine, an analytical platform, not just a repository for static data. Oracle Data Mining fits into Oracle's strategy to derive additional value from your data and your investment in Oracle, as well as to simplify application development. Since data remains in the database, the process of data mining is simplified: all database features and product offerings are readily available.

Since the results of data mining are in the database, these results are available to any other user or application. Oracle Data Mining helps leverage your investment dollars by making the new business information available to everyone.

Why would a business benefit from using Oracle Data Mining?
Data mining can sift through massive amounts of data and find hidden information: valuable information that can help you better understand your customers and anticipate their behavior. Oracle Data Mining software helps you build applications to uncover this hidden information about your customers. Armed with this information, you can understand your customers and build a closer relationship with them, which helps you to:

  • Better retain customers and avoid churn
  • Profile customers and understand their behavior
  • Maintain and improve profit margins
  • Reduce customer acquisition costs
  • Target profitable customers with the right offer
Oracle Data Mining can also find patterns hidden in scientific, government, manufacturing, medical, and other types of data. Applications of data mining in these areas include:
  • Predicting the quality of a manufactured part
  • Finding associations between patients, drugs, and outcomes
  • Identifying possible network intrusions
Insights discovered by Oracle Data Mining can be revealing, significant, and valuable.

What are Oracle Data Mining's competitive advantages?
Oracle Data Mining provides several distinctive competitive advantages: 

  1. Data Mining Embedded in Oracle 10g Database
    By being embedded in the Oracle 10g Database, Oracle Data Mining facilitates extracting business intelligence from large volumes of data for production applications. It eliminates off-loading data to external special-purpose analytic servers for data mining and the subsequent scoring of even larger volumes of data. The data, data preparation, data mining, and scoring all exist within the database, which greatly simplifies the data mining process.  In addition, overall application security is increased since data need never leave the secure database environment.

    Oracle Data Mining is available on all platforms supported by Oracle. This provides the widest range of platform support of any competitive data mining vendor. 

    Oracle Data Mining can scale to the size of the problem by adding hardware or switching to more powerful platforms. Oracle Data Mining takes advantage of Oracle's parallelism for faster computing by leveraging Oracle's Real Application Clusters (RAC) technology.

  2. Ability to Enhance Applications with Predictions and Insights
    Oracle Data Mining enables companies to systematize the extraction and integration of new business intelligence within their operations. Application developers can use Oracle Data Mining's Java-based interface to add data mining insights and predictions to enhance business applications, such as Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), Web portals, and wireless applications. If customers wish to migrate their data mining application to another platform, their investment is preserved.

    Rather than having special departments of advanced data analysts who work on ad hoc data mining projects, the true value of data mining is realized when the new insights and predictions are integrated and "operationalized" into existing business applications.

    Telecommunications companies, for example, can use Oracle Data Mining to build churn applications that identify customers who are likely to churn before they leave for a competitor. Oracle Data Mining's predictions can be used to anticipate customer behavior and proactively manage customers in mutually beneficial 1:1 relationships.

  3. Programming Interfaces

    Application developers access Oracle Data Mining's functionality through a Java-based or PL/SQL interface. Programmatic control of all data mining functions enables automation of data preparation, model-building, and model-scoring operations in production applications.

    Java Interface
    Java enables the development of platform-independent applications and draws on Sun's Java Community Process (JCP) for evolving and extending the language. Java-based applications can leverage the J2EE and J2SE platforms supported by a wide variety of vendors. Java is supported in the Oracle database, which enables Java applications to run inside or outside the database. With the Oracle Application Server, developers can build web-based, distributed applications leveraging the J2EE platform within the Oracle product suite.

    The Java API allows application programmers to control all aspects of the data mining process, from data preparation and model building to model testing and data scoring. The interface can be used at two levels. For data mining experts, it exposes algorithm-specific settings. For data mining novices, default values are provided for nearly all settings to minimize the specification required.

    Programmatic control extends from data preparation and model building to on-demand scoring of single records and batch scoring of large data sets. Batch scores may be stored in relational tables for access by other business applications (e.g. call centers or marketing campaign systems) or called "on-demand" in interactive applications where new information is collected and provided to the predictive model, i.e. real-time scoring.

    Oracle Data Mining's interface provides an early look at concepts and approaches being proposed for the Java Data Mining (JDM) standard. Ultimately, Oracle Data Mining will comply with the standard after it is published.
    JDM is an emerging data mining standard, following Sun's Java Community Process as a Java Specification Request (JSR). JDM has participation from Oracle, Sun, IBM, SAS, SPSS, and many other companies that recognize the need for a Java-based standard for specifying and using data mining. JDM leverages several other data mining standards, including the Object Management Group's Common Warehouse Metamodel (CWM), the Data Mining Group's Predictive Model Markup Language (PMML), and the International Organization for Standardization's SQL/MM for Data Mining.

    PL/SQL Interface
    PL/SQL enables database developers to seamlessly integrate data mining functionality with their database applications. Oracle database developers and users are able to perform data mining operations using a familiar language and development methodology. Mining operations are presented as primitive capabilities, clearly separating transformations from model building and apply. PL/SQL introduces new capabilities not present in the Java API including native export/import of all supported models between ODM databases and advanced model evaluation using receiver operating characteristics (ROC).


  4. Well-integrated with other Oracle Products

    Because Oracle Data Mining resides in the Oracle 10g database, data preparation can be performed in Oracle 10g Warehouse Builder and deployed as part of an overall data mining application. Users can take advantage of key analytical and statistical capabilities in the Oracle 10g database to analyze data and results.

    Oracle 10g Data Mining is fully integrated with Oracle Universal Installer and RDBMS upgrade/downgrade process. This reduces the overhead associated with migrating applications between releases of non-Oracle data mining products.

    Oracle 10g Data Mining is integrated with applications in the Oracle 11i Applications Suite, such as Oracle Marketing.

What are some typical applications that could be enhanced by Oracle Data Mining?
Oracle Data Mining can automate the extraction and integration of new insight and predictions into a variety of business applications, including call centers, Web sites, campaign management systems, ATMs, enterprise resource management (ERM), and other operational and business planning applications.

Features

What features are new in Oracle Data Mining 10g?

What mining capabilities does Oracle Data Mining support?
Oracle Data Mining provides programmatic access to six data mining algorithms embedded in Oracle Database. Data mining algorithms are machine-learning techniques for analyzing data for specific categories of problems. Different algorithms excel at different types of analysis. 

Classification: Oracle Data Mining's Classification algorithms can predict binary or multi-class outcomes. In binary problems, each record either will or will not exhibit the modeled behavior. For example, a model could be built to predict whether a customer will churn or remain loyal. These algorithms can also make predictions for multi-class problems where there are several possible outcomes. For example, a model could be built to predict which class of service will be preferred by each prospect.

Binary model example:
Q: Is this customer likely to become a high-profit customer?
A: Yes, with 85% probability

Multi-class model example:
Q: Which one of five customer segments is this customer most likely to fit into — Grow, Stable, Defect, Decline, or Insignificant?
A: Stable, with 55% probability

Algorithm options: Support Vector Machines (SVM), Naive Bayes, and Adaptive Bayes Networks (ABN)
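
To make the "answer with a probability" idea concrete, here is a minimal sketch of a categorical Naive Bayes classifier in plain Python. It is not ODM's implementation or API; the training data, attribute bands, and function names are all hypothetical, and Laplace smoothing is an assumption added so that unseen values do not zero out a class.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    distinct = defaultdict(set)          # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct

def predict_proba(model, row):
    """Return {class: probability} for one record, normalized to sum to 1."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | class)
            score *= (value_counts[(i, label)][value] + 1) / (n + len(distinct[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical data: (income band, tenure band) -> high-profit customer?
rows = [("high", "long"), ("high", "short"), ("low", "long"),
        ("low", "short"), ("high", "long"), ("low", "short")]
labels = ["yes", "yes", "no", "no", "yes", "no"]

model = train_naive_bayes(rows, labels)
probs = predict_proba(model, ("high", "long"))
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))
```

A multi-class model works the same way: the dictionary returned by `predict_proba` simply has one entry per class, and the answer is the class with the highest probability.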

Regression: Regression creates predictive models. The difference between regression and classification is that regression deals with numerical/continuous target attributes, whereas classification deals with discrete/categorical target attributes. In other words, if the target attribute contains continuous (floating-point) values, a regression technique is required. If the target attribute contains categorical (string or discrete integer) values, a classification technique is called for.

The most common form of regression is linear regression, in which a line that best fits the data is calculated, that is, the line that minimizes the sum of the squared distances of the points from the line (the least-squares fit).

This line becomes a predictive model when the value of the dependent variable is not known; its value is predicted by the point on the line that corresponds to the values of the independent variables for that record. Oracle Data Mining provides both linear and non-linear regression models.

Algorithm options: Support Vector Machines (SVM)
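
As an illustration of the linear case described above, the following sketch fits a line to one independent variable using ordinary least squares and then uses the line to predict an unknown value. ODM's regression is based on SVM, so this is a stand-in for the concept only; the function names and sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Read the predicted dependent value off the fitted line."""
    return slope * x + intercept

# Hypothetical records that happen to lie on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(slope, intercept, 5.0))
```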

Association Rules: Association Rules detect "associated" or co-occurring events hidden in data. Association analysis, a form of unsupervised learning, is often used to find popular bundles of related products (e.g. market basket analysis), such as "milk" and "cereal" being associated with "bananas." Oracle Data Mining's Association Rules can be used to identify co-occurring items or events in a variety of business problems.
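
Association rules are usually judged by two standard measures, support and confidence. The sketch below computes both for the milk, cereal, and bananas example. The baskets are made up for illustration, and the helper names are hypothetical rather than part of any ODM interface.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

# Hypothetical market baskets
baskets = [
    {"milk", "cereal", "bananas"},
    {"milk", "cereal", "bananas"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"bread", "bananas"},
]

rule_support = support(baskets, {"milk", "cereal", "bananas"})
rule_confidence = confidence(baskets, {"milk", "cereal"}, {"bananas"})
print(rule_support, rule_confidence)
```

Here the rule "milk and cereal implies bananas" holds in two of the three baskets that contain both milk and cereal, which is what the confidence value expresses.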

Clustering: Oracle Data Mining provides two Clustering algorithms for the segmentation of individuals (cases) in a dataset. Both the enhanced version of K-Means and the proprietary O-Cluster algorithms produce clusters with descriptive rules for membership. Clusters are organized hierarchically with histograms provided for each attribute in a cluster.

Algorithm options: Enhanced K-Means and O-Cluster
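
For readers new to clustering, this is a minimal sketch of the textbook k-means procedure (Lloyd's algorithm). It is not Oracle's enhanced, hierarchical K-Means and not O-Cluster, and it produces neither rules nor histograms. The points and starting centroids are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on tuples of floats; returns (centroids, assignments)."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignments

# Hypothetical cases with two numeric attributes each
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(assignments, centroids)
```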

Attribute Importance: Attribute Importance measures the predictive power of each attribute in classifying the target values and produces a list of attributes ranked by relative importance. This information can be used to reduce the size of input data, increasing the speed of mining tasks.

Algorithm options: Minimum Description Length (MDL)
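
The ranking idea can be sketched with a simpler, swapped-in measure. The example below scores each attribute by its mutual information with the target instead of using MDL, so it shows the shape of the output (attributes ordered by predictive power) and not ODM's actual computation. The table, attribute names, and target values are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(values, targets):
    """Mutual information (in bits) between one attribute and the target."""
    n = len(values)
    joint = Counter(zip(values, targets))
    value_freq = Counter(values)
    target_freq = Counter(targets)
    mi = 0.0
    for (v, t), count in joint.items():
        p_vt = count / n
        mi += p_vt * log2(p_vt / ((value_freq[v] / n) * (target_freq[t] / n)))
    return mi

def rank_attributes(table, target):
    """Return (attribute name, score) pairs, most informative first."""
    scores = {name: mutual_information(col, target) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical customer attributes and churn outcomes
table = {
    "plan": ["basic", "basic", "premium", "premium", "basic", "premium"],
    "region": ["east", "west", "east", "west", "east", "west"],
}
churned = ["yes", "yes", "no", "no", "yes", "no"]

ranking = rank_attributes(table, churned)
print(ranking)
```

Attributes at the bottom of such a ranking are candidates for removal before model building, which is how the input data gets smaller.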

Feature Extraction:  ODM Feature Extraction creates a new set of features by decomposing the original data. Feature extraction lets you describe the data with a number of features far smaller than the number of original dimensions (attributes). A feature is a combination of attributes in the data that is of special interest and captures important characteristics of the data.

Some applications of feature extraction are latent semantic analysis, data compression, data decomposition and projection, and pattern recognition. Feature extraction can also be used to enhance the speed and effectiveness of supervised learning.

For example, feature extraction can be used to extract the themes of a document collection, where documents are represented by a set of key words and their frequencies. Each theme (feature) is represented by a combination of keywords. The documents in the collection can then be expressed in terms of the discovered themes.

Algorithm options: Non-negative Matrix Factorization (NMF)
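
The document-themes example can be sketched with the classic multiplicative-update form of NMF. This is a pure-Python illustration under stated assumptions, not ODM's implementation: the update rule is the widely used Lee and Seung scheme, the rank, iteration count, and seed are arbitrary, and the tiny document-by-keyword matrix is invented.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Sum of squared differences between V and its reconstruction W*H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical keyword counts: rows are documents, columns are keywords.
# Documents 1-2 use only the first two keywords, documents 3-4 the last two.
v = [[2, 1, 0, 0],
     [4, 2, 0, 0],
     [0, 0, 1, 3],
     [0, 0, 2, 6]]

w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))
```

Each row of `h` is a feature (a weighted combination of keywords, that is, a theme), and each row of `w` expresses one document in terms of those themes.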

Text Mining: Text mining is conventional data mining done using "text features." Text features are usually keywords, frequencies of words, or other document-derived features. Once you derive text features, you mine them just as you would any other data. Some of the applications for text mining include
Algorithm options: SVM and NMF in both Java and PL/SQL interfaces; also Association and k-Means in PL/SQL interface

Does Oracle Data Mining support RAC and the GRID?
Yes. ODM takes advantage of RAC in two ways. First, individual jobs (mining tasks) can be distributed in parallel across RAC nodes automatically. Second, several of the algorithms execute in parallel to the extent that the database processes queries in parallel using RAC. Several of the algorithms, e.g., Naive Bayes (NB) and Adaptive Bayes Networks (ABN), are written as SQL queries; when the database executes these queries, they automatically leverage RAC according to standard database parallelism. K-Means and O-Cluster do not leverage parallelism for model build, but do leverage parallelism for scoring.


The new 10g algorithms, SVM and NMF, are written as C table functions, and are not yet parallel.

Does Oracle Data Mining support PMML?
Oracle Data Mining supports the Data Mining Group's Predictive Model Markup Language (PMML) for Naive Bayes and Association Rules models. Users can import and export models from these algorithms between Oracle database instances. Oracle Data Mining can import PMML from other vendors for these algorithms provided no vendor-proprietary extensions have been used.

Can ODM import PMML models from other vendors?
If other vendors generate standard core PMML for Naive Bayes and Association Rules, ODM should be able to import these models. PMML defines numerous optional capabilities, which can make it difficult to ensure that a given vendor's PMML can be consumed. Some vendors use proprietary extensions to PMML, which make the models impossible to import.
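
Before attempting an import, it can help to check which model elements a PMML file contains and whether it uses `Extension` elements, which is where PMML lets vendors put proprietary content. The sketch below does this with Python's standard XML parser. It is only a rough pre-check, not how ODM decides what it can import, and the embedded fragment is hand-written, simplified, and not schema-valid PMML.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}Extension'."""
    return tag.rsplit("}", 1)[-1]

def summarize_pmml(xml_text):
    """List model elements and count Extension elements in a PMML document."""
    root = ET.fromstring(xml_text)
    tags = [local_name(el.tag) for el in root.iter()]
    return {
        "models": [t for t in tags if t.endswith("Model")],
        "extension_count": tags.count("Extension"),
    }

# Hand-written, simplified fragment for illustration only
sample = """
<PMML version="2.0">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="0"/>
  <NaiveBayesModel modelName="demo" functionName="classification">
    <Extension name="vendorTuning" value="42"/>
  </NaiveBayesModel>
</PMML>
"""

summary = summarize_pmml(sample)
print(summary)
```

A non-zero extension count does not prove that an import will fail, but it flags files that deserve a closer look.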

What can be done with code generated from Oracle Data Miner (ODMr)?
A JDeveloper extension can be downloaded from the ODMr page of OTN and added to the JDeveloper 10g environment for the purpose of accessing the Java code associated with the data mining operations of ODMr. For example, a model is built and applied in ODMr - the Java program that applies the model is generated by JDeveloper.

Where can this program execute, and what is the normal execution environment?

Type and description:

  1. Manual
    Manually run the scoring code in JDeveloper each time to overwrite previous result values.
  2. Java Stored Procedure
    Push the Java code into the database and create a SQL wrapper so the code can be called by other applications, SQL*Plus, etc.
  3. Include in OWB
    Create a wrapper as in #2 and then include it in an OWB flow (e.g. to load a warehouse dimension).
  4. Generalize Scoring
    Generalize the Java and/or the SQL wrapper to include parameter control for output table name, scoring options, etc.

Does Oracle Data Mining support neural networks?
Although ODM does not support neural networks explicitly, Support Vector Machines (SVM) in 10g can act as a superset of neural networks, and a much superior one at that. SVMs work with high-dimensional data, generalize better, and are easier to tune and train. They are designed to resist overfitting, and they do not require early stopping or voting. SVMs can be applied to the same class of problems as neural networks. Generally, SVM kernels map to activation functions and support vectors to nodes.
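
The mapping between kernels and activation functions can be seen by writing out an SVM decision function as if it were a network with one hidden layer. In this sketch each support vector acts as a hidden node and a sigmoid (tanh) kernel acts as its activation function. The support vectors, weights, and bias are hand-picked and hypothetical; a real SVM would learn them from data, and nothing here reflects ODM's internals.

```python
from math import tanh

def sigmoid_kernel(u, v, gamma=0.5, coef0=0.0):
    """tanh(gamma * <u, v> + coef0), the same form as a tanh neuron."""
    return tanh(gamma * sum(a * b for a, b in zip(u, v)) + coef0)

def svm_decision(x, support_vectors, weights, bias, kernel=sigmoid_kernel):
    """f(x) = sum_i weight_i * K(sv_i, x) + bias.

    Each weight stands for alpha_i * y_i in the usual SVM notation.
    """
    # One "hidden node" per support vector, activated by the kernel
    hidden = [kernel(sv, x) for sv in support_vectors]
    # Output node: weighted sum of hidden activations plus bias
    return sum(w * h for w, h in zip(weights, hidden)) + bias

# Hypothetical, hand-picked parameters
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
weights = [1.0, -1.0]
bias = 0.0

positive_score = svm_decision((2.0, 2.0), support_vectors, weights, bias)
negative_score = svm_decision((-2.0, -2.0), support_vectors, weights, bias)
print(positive_score, negative_score)
```

The sign of the score gives the predicted class, just as the output node of a small network would.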

Graphical Interface

What is the Oracle Data Miner?

Interfaces

What programming interfaces does Oracle Data Mining provide?
Oracle Data Mining provides two programming language interfaces: Java and PL/SQL. See APIs.

How does the PL/SQL interface differ from the Java interface?
See Appendix A of the ODM Concepts Guide.

When will Oracle Data Mining support the Java Data Mining (JSR-73) standard interface?
Oracle Data Mining will provide a JDM interface in the next available release, which is currently 10gR2.

What will happen to the 10g Java interface?
The ODM Java interface in 10gR1 will be replaced by the Java Data Mining (JSR-73) standard interface in 10gR2. As such the 10gR1 Java interface is being desupported in 10gR1 in preparation for the new interface.

Why is the 10g Java interface being desupported?
The goal is to avoid having two Java interfaces in the product, which would add code complexity, support, and maintenance issues and cause confusion in the marketplace and among customers and developers. We also want to avoid having one of the interfaces (ODM Java) not be interoperable with the other two (JDM and PL/SQL). JDM and PL/SQL will be fully interoperable.

Migration

Can users of ODM 9i migrate to the ODM 10g PL/SQL interface?

The PL/SQL interface is new in 10g and leverages a different repository than that for the 9i and 10g Java interface. As such, ODM 9i models can only be migrated to the ODM 10g Java repository. Models created via the Java interface are not interoperable with the PL/SQL interface in 10g.

Users of the PL/SQL interface need to redefine the Java objects as required for the PL/SQL interface and rebuild models, or recreate results.

ODM 9i customers can migrate models created with the Java interface to ODM 10g and continue to use the Java interface. However, this Java interface will be replaced with the JDM interface in 10gR2.

What is the migration strategy for users of the 10gR1 Java interface in 10gR2?

The 10gR1 ODM Java API will be desupported in 12 months. This is motivated by the Oracle-led Java Data Mining (JDM) standard (JSR-73), which will be available in 10gR2 replacing the current Java API. As such, the current Java API will not exist in any way in 10gR2.

The JDM API will be implemented as a layer on top of the 10g ODM PL/SQL API. This has the benefit of interoperability between the Java and PL/SQL interfaces. With 10gR1, the Java API and PL/SQL API are not interoperable, i.e., a model created in Java cannot be used in PL/SQL and vice versa. This resulted from our efforts to integrate data mining more tightly with the core RDBMS. At present, there is no utility to migrate Java-produced models to PL/SQL models. This is due to significant changes in the metadata and automated transformations which make this rather intractable.

In general, applications periodically need to refresh their models as models grow stale over time. In converting to 10gR2, applications would have to change code to use the JDM or PL/SQL APIs and rebuild their models.

To mitigate the migration problems, applications new to 10g can use the PL/SQL API, as this will be supported in 10gR2 along with model migration. Later, the user can determine if the application should be converted to the new JDM API, where PL/SQL-generated models will still be usable. For ODM 9i applications, users can continue to use the 10gR1 ODM Java API, which supports migration from ODM 9i. However, moving to 10gR2 will require code changes and rebuilding of models as noted above. If Java is the only option and a customer does not want to rework applications, the customer can wait until 10gR2 and JDM.

System

What platforms does Oracle Data Mining run on?
All platforms supported by Oracle, including Windows, Solaris, HP-UX, IBM AIX, Compaq Tru64, and Linux.

What are the system requirements to run Oracle Data Mining?
Oracle Data Mining runs in the Oracle 10g Database on all supported platforms. Oracle Partitioning is recommended for large data mining problems.