
Oracle Data Mining Features

Author: reposted from the Internet. Date: 2007-02-26

What features are new in Oracle Data Mining 10g?

What mining capabilities does Oracle Data Mining support?
Oracle Data Mining provides programmatic access to six data mining algorithms embedded in Oracle Database. Data mining algorithms are machine-learning techniques for analyzing data for specific categories of problems. Different algorithms excel at different types of analysis. 

Classification: Oracle Data Mining's Classification algorithms can predict binary or multi-class outcomes. In binary problems, each record either will or will not exhibit the modeled behavior. For example, a model could be built to predict whether a customer will churn or remain loyal. These algorithms can also make predictions for multi-class problems where there are several possible outcomes. For example, a model could be built to predict which class of service will be preferred by each prospect.

Binary model example:
Q: Is this customer likely to become a high-profit customer?
A: Yes, with 85% probability

Multi-class model example:
Q: Which one of five customer segments is this customer most likely to fit into — Grow, Stable, Defect, Decline, or Insignificant?
A: Stable, with 55% probability

Algorithm options:
Support Vector Machines (SVM), Naive Bayes, and Adaptive Bayes Networks (ABN)
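As an illustration of the binary case above, here is a minimal Naive Bayes sketch in Python. It is not ODM's implementation: it assumes purely categorical attributes and add-one (Laplace) smoothing, and the customer data and attribute names (income, tenure, churn) are hypothetical. The returned probability is the normalized posterior of the winning class, the same kind of "Yes, with 85% probability" answer shown above; the code handles multi-class targets unchanged.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
    domains = defaultdict(set)           # attribute -> distinct values seen
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                value_counts[(row[target], attr)][value] += 1
                domains[attr].add(value)
    return class_counts, value_counts, domains


def predict(model, record):
    """Return (most likely class, its posterior probability)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(class)
        for attr, value in record.items():
            k = len(domains.get(attr, ())) or 1
            # Add-one smoothing so an unseen value does not zero out the product
            score *= (value_counts[(cls, attr)][value] + 1) / (n + k)
        scores[cls] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


# Hypothetical training data: will the customer churn?
rows = [
    {"income": "high", "tenure": "long", "churn": "no"},
    {"income": "high", "tenure": "short", "churn": "no"},
    {"income": "low", "tenure": "short", "churn": "yes"},
    {"income": "low", "tenure": "long", "churn": "no"},
    {"income": "low", "tenure": "short", "churn": "yes"},
]
model = train_naive_bayes(rows, "churn")
label, probability = predict(model, {"income": "low", "tenure": "short"})
print(label, round(probability, 2))  # yes 0.7
```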

Regression: Regression creates predictive models. The difference between regression and classification is that regression deals with numerical/continuous target attributes, whereas classification deals with discrete/categorical target attributes. In other words, if the target attribute contains continuous (floating-point) values, a regression technique is required. If the target attribute contains categorical (string or discrete integer) values, a classification technique is called for.

The most common form of regression is linear regression, which calculates the line that best fits the data, that is, the line that minimizes the sum of the squared vertical distances (residuals) between the points and the line.

This line becomes a predictive model for records whose dependent-variable value is not known: the predicted value is the point on the line that corresponds to the record's independent-variable values. Oracle Data Mining provides both linear and non-linear regression models.

Algorithm options: Support Vector Machines (SVM)
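The linear-regression idea described above can be sketched in a few lines of Python using ordinary least squares. This shows only the textbook method; it is not how ODM's SVM-based regression works, and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Predict the dependent variable as the point on the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations: independent variable xs, dependent variable ys
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
model = fit_line(xs, ys)
print(model)              # approximately (1.9, 0.0)
print(predict(model, 5))  # approximately 9.5
```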

Association Rules: Association Rules detect "associated" or co-occurring events hidden in data. Association analysis, a form of unsupervised learning, is often used to find popular bundles of products that customers buy together (e.g. market basket analysis), such as "milk" and "cereal" being associated with "bananas." Oracle Data Mining's Association Rules can be used to identify co-occurring items or events in a variety of business problems.
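Two measures are usually attached to an association rule: support (how often the items occur together) and confidence (how often the consequent appears given the antecedent). The Python sketch below computes both and enumerates rules by brute force over small itemsets. It is an illustration only, not ODM's algorithm, and the baskets and thresholds are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions (sets) containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def find_rules(transactions, min_support, min_confidence, max_size=3):
    """Brute force: every itemset of 2..max_size items, one item as the consequent."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_size + 1):
        for itemset in combinations(items, size):
            s = support(transactions, itemset)
            if s == 0 or s < min_support:
                continue
            for consequent in itemset:
                antecedent = frozenset(itemset) - {consequent}
                c = s / support(transactions, antecedent)
                if c >= min_confidence:
                    rules.append((antecedent, consequent, s, c))
    return rules


# Hypothetical market baskets
baskets = [
    {"milk", "cereal", "bananas"},
    {"milk", "cereal", "bananas", "bread"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"milk", "bananas"},
]
for antecedent, consequent, s, c in find_rules(baskets, 0.4, 0.6):
    print(sorted(antecedent), "->", consequent, f"support={s:.2f} confidence={c:.2f}")
```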

Clustering: Oracle Data Mining provides two Clustering algorithms for the segmentation of individuals (cases) in a dataset. Both the enhanced version of K-Means and the proprietary O-Cluster algorithms produce clusters with descriptive rules for membership. Clusters are organized hierarchically with histograms provided for each attribute in a cluster.

Algorithm options:
Enhanced K-Means and O-Cluster
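For readers unfamiliar with K-Means, the basic (Lloyd's) algorithm alternates between assigning each case to its nearest centroid and moving each centroid to the mean of its assigned cases. The Python sketch below shows only that basic loop; it does not reproduce ODM's enhanced, hierarchical K-Means or O-Cluster, and the points are hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Group points by the index of their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=100, seed=0):
    """Basic Lloyd's K-Means; returns (centroids, clusters)."""
    centroids = random.Random(seed).sample(points, k)  # start from k of the cases
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical two-attribute cases forming two obvious groups
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```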

Attribute Importance: Attribute Importance measures the predictive power of each attribute in classifying the target values and produces a list of attributes ranked by relative importance. This information can be used to reduce the size of input data, increasing the speed of mining tasks.

Algorithm options: Minimum Description Length (MDL)
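ODM ranks attributes with MDL, which is not reproduced here. As a simpler stand-in that conveys the same idea (score each attribute by how much it tells you about the target, then sort), the Python sketch below ranks categorical attributes by information gain, which is a different technique from MDL. The data and attribute names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Reduction in target entropy after splitting rows on attr."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_attributes(rows, target):
    """Attributes sorted from most to least informative about the target."""
    attrs = [a for a in rows[0] if a != target]
    return sorted(attrs, key=lambda a: information_gain(rows, a, target), reverse=True)


# Hypothetical data: plan determines churn, region is noise
rows = [
    {"region": "east", "plan": "basic", "churn": "yes"},
    {"region": "west", "plan": "basic", "churn": "yes"},
    {"region": "east", "plan": "premium", "churn": "no"},
    {"region": "west", "plan": "premium", "churn": "no"},
]
print(rank_attributes(rows, "churn"))  # ['plan', 'region']
```

Dropping attributes at the bottom of such a ranking is how the list can shrink the input data before mining.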

Feature Extraction: ODM Feature Extraction creates a new set of features by decomposing the original data. Feature extraction lets you describe the data with a number of features far smaller than the number of original dimensions (attributes). A feature is a combination of attributes in the data that is of special interest and captures important characteristics of the data.

Some applications of feature extraction are latent semantic analysis, data compression, data decomposition and projection, and pattern recognition. Feature extraction can also be used to enhance the speed and effectiveness of supervised learning.

For example, feature extraction can be used to extract the themes of a document collection, where documents are represented by a set of key words and their frequencies. Each theme (feature) is represented by a combination of keywords. The documents in the collection can then be expressed in terms of the discovered themes.

Algorithm options: Non-negative Matrix Factorization (NMF)
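The theme-extraction example above can be sketched with the classic multiplicative-update rules for NMF (Lee and Seung), which factor a non-negative document-by-keyword matrix V into W (documents by themes) and H (themes by keywords) so that V is approximately WH. This pure-Python version is illustrative only and is not ODM's implementation; the keyword counts are hypothetical. Each row of H can be read as a theme, with its largest entries pointing at the keywords that define it.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def nmf(v, rank, iterations=300, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x rank) and h (rank x m) by multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W^T V) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


# Hypothetical keyword counts: 4 documents x 6 keywords
keywords = ["oracle", "database", "sql", "soccer", "goal", "league"]
v = [
    [2, 3, 1, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 1, 2, 2],
]
w, h = nmf(v, rank=2)
for theme in h:
    top = sorted(range(len(keywords)), key=lambda j: theme[j], reverse=True)[:3]
    print([keywords[j] for j in top])
```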

Text Mining: Text mining is conventional data mining performed on "text features." Text features are usually keywords, word frequencies, or other features derived from documents. Once you derive text features, you mine them just as you would any other data.
Algorithm options: SVM and NMF in both the Java and PL/SQL interfaces; also Association and K-Means in the PL/SQL interface
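The "derive text features" step can be as simple as tokenizing each document and counting its keywords. The Python sketch below does that and lines the counts up into a document-by-keyword matrix that could then be mined like any other data. The stop-word list and sample documents are hypothetical, and a real pipeline might also apply stemming or weighting such as TF-IDF.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}


def text_features(document):
    """Lower-case, split into alphanumeric tokens, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def to_matrix(documents):
    """Return (vocabulary, document-by-keyword frequency matrix)."""
    features = [text_features(d) for d in documents]
    vocabulary = sorted(set().union(*features))
    return vocabulary, [[f[t] for t in vocabulary] for f in features]


vocabulary, matrix = to_matrix(["Oracle database and SQL", "SQL in the database database"])
print(vocabulary)  # ['database', 'oracle', 'sql']
print(matrix)      # [[1, 1, 1], [2, 0, 1]]
```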

Does Oracle Data Mining support RAC and the GRID?
Yes. ODM takes advantage of RAC in two ways. First, individual jobs (mining tasks) can be automatically distributed in parallel across RAC nodes. Second, several of the algorithms execute in parallel to the extent that the database processes queries in parallel using RAC. Some algorithms, e.g., Naive Bayes (NB) and ABN, are written as SQL queries; when the database executes these queries, they automatically leverage RAC through standard database parallelism. K-Means and O-Cluster do not use parallelism for model build, but they do use it for scoring.


The new 10g algorithms, SVM and NMF, are written as C table functions, and are not yet parallel.
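Scoring parallelizes easily because each record is scored independently of every other record. The Python sketch below illustrates that idea outside the database: it splits records into partitions, scores them in a worker pool, and reassembles the results in the original order. It is an analogy only, not how RAC or Oracle parallel query work; the linear scoring function and its attribute names are hypothetical, and threads are used just to keep the sketch short (CPU-bound scoring in CPython would need a process pool for a real speed-up).

```python
from concurrent.futures import ThreadPoolExecutor


def score_record(record):
    """Hypothetical scoring function: a fixed linear model over two attributes."""
    return 0.3 * record["age"] + 0.00001 * record["income"]


def score_partition(partition):
    return [score_record(r) for r in partition]


def parallel_score(records, workers=4):
    """Split records into contiguous partitions, score them concurrently, keep input order."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    partitions = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(score_partition, partitions))  # map preserves partition order
    return [score for part in results for score in part]


records = [{"age": 20 + i, "income": 30000 + 1000 * i} for i in range(10)]
print(parallel_score(records, workers=3))
```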

Does Oracle Data Mining support PMML?

Oracle Data Mining supports the Data Mining Group's Predictive Model Markup Language (PMML) for Naive Bayes and Association Rules models. Users can import and export models from these algorithms between Oracle database instances. Oracle Data Mining can import PMML from other vendors for these algorithms provided no vendor-proprietary extensions have been used.

Can ODM import PMML models from other vendors?
If other vendors generate standard core PMML for Naive Bayes and Association Rules, ODM should be able to import these models. However, PMML defines numerous optional capabilities, which can make it difficult to ensure that a given vendor's PMML can be consumed. Some vendors use proprietary extensions to PMML, which make their models impossible to import.
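Because PMML is XML, a quick pre-import check is easy to script. The Python sketch below lists which of the two model types mentioned above (PMML's NaiveBayesModel and AssociationModel elements) a document contains and whether it uses PMML Extension elements, the standard's mechanism for carrying vendor-specific content. This is a hypothetical helper, not an Oracle tool; the sample document is heavily truncated rather than a complete, valid model, and an Extension element does not by itself prove that a file cannot be imported.

```python
import xml.etree.ElementTree as ET

MODEL_ELEMENTS = {"NaiveBayesModel", "AssociationModel"}


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.dmg.org/PMML-3_0}'."""
    return tag.rsplit("}", 1)[-1]


def inspect_pmml(text):
    """Return (model element names found, whether any Extension element is present)."""
    names = [local_name(el.tag) for el in ET.fromstring(text).iter()]
    models = [n for n in names if n in MODEL_ELEMENTS]
    return models, "Extension" in names


# Hypothetical, truncated sample for illustration only
sample = """<PMML xmlns="http://www.dmg.org/PMML-3_0" version="3.0">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="0"/>
  <NaiveBayesModel functionName="classification" threshold="0.001">
    <Extension name="vendor-specific" value="x"/>
  </NaiveBayesModel>
</PMML>"""
print(inspect_pmml(sample))  # (['NaiveBayesModel'], True)
```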

What can be done with code generated from Oracle Data Miner (ODMr)?
A JDeveloper extension can be downloaded from the ODMr page of OTN and added to the JDeveloper 10g environment to access the Java code associated with ODMr's data mining operations. For example, when a model is built and applied in ODMr, the Java program that applies the model is generated by JDeveloper.

Where can this program execute, and what is its normal execution environment?

Type: Description
Manual: Run the scoring code manually in JDeveloper each time; each run overwrites the previous result values.
Java Stored Procedure: Push the Java code into the database and create a SQL wrapper so that other applications, SQL*Plus, etc. can call the code.
Include in OWB: Create a wrapper as for Java Stored Procedure, then include it in an OWB flow (e.g. to load a warehouse dimension).
Generalize Scoring: Generalize the Java code and/or the SQL wrapper to add parameter control for the output table name, scoring options, etc.
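The "Generalize Scoring" option amounts to turning hard-coded choices into parameters. The Python sketch below shows that idea with SQLite standing in for the Oracle database and a plain function standing in for the generated Java code and SQL wrapper: the output table name and decision threshold are supplied by the caller, and the table is recreated on each run, overwriting previous results as in the Manual option. Every name here (score_to_table, the table and column names, the scoring function) is hypothetical.

```python
import re
import sqlite3


def score_to_table(conn, records, score, output_table, threshold=0.5):
    """Score each record and write (id, probability, prediction) rows to output_table."""
    # Table names cannot be bound as SQL parameters, so validate the identifier instead.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", output_table):
        raise ValueError(f"invalid table name: {output_table!r}")
    rows = []
    for record in records:
        probability = score(record)
        rows.append((record["id"], probability, "yes" if probability >= threshold else "no"))
    conn.execute(f"DROP TABLE IF EXISTS {output_table}")  # overwrite previous results
    conn.execute(f"CREATE TABLE {output_table} (id INTEGER, probability REAL, prediction TEXT)")
    conn.executemany(f"INSERT INTO {output_table} VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
customers = [{"id": 1, "p": 0.2}, {"id": 2, "p": 0.9}]
score_to_table(conn, customers, lambda r: r["p"], "churn_scores", threshold=0.5)
print(conn.execute("SELECT id, prediction FROM churn_scores ORDER BY id").fetchall())
```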

Does Oracle Data Mining support neural networks?
Although ODM does not support neural networks explicitly, Support Vector Machines (SVM) in 10g can act as a superset of neural networks, and a much superior one. SVMs work well with high-dimensional data, generalize better, and are easier to tune and train; they are less prone to overfitting and need neither early stopping nor voting. SVMs apply to the same class of problems as neural networks. Generally, SVM kernels correspond to activation functions, and support vectors correspond to nodes.
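The kernel-to-activation correspondence can be made concrete. With the sigmoid kernel K(x, z) = tanh(gamma * x·z + c), an SVM decision function f(x) = Σ αᵢyᵢK(svᵢ, x) + b computes exactly what a one-hidden-layer tanh network computes: one hidden node per support vector, with the support vector as that node's input weights and αᵢyᵢ as its output weight. The Python sketch below evaluates both views. The support vectors, coefficients, and gamma are hypothetical rather than trained, this is not ODM's SVM, and the sigmoid kernel is not positive semidefinite for all parameter values.

```python
import math


def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    """K(x, z) = tanh(gamma * <x, z> + coef0), the same form as a tanh activation."""
    return math.tanh(gamma * sum(a * b for a, b in zip(x, z)) + coef0)


def svm_decision(x, support_vectors, alpha_y, bias):
    """SVM view: f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b."""
    return sum(ay * sigmoid_kernel(sv, x) for sv, ay in zip(support_vectors, alpha_y)) + bias


def as_network(x, support_vectors, alpha_y, bias, gamma=0.5, coef0=0.0):
    """Network view: one tanh hidden node per support vector, output weights alpha_i * y_i."""
    hidden = [math.tanh(gamma * sum(w * xi for w, xi in zip(sv, x)) + coef0)
              for sv in support_vectors]
    return sum(v * h for v, h in zip(alpha_y, hidden)) + bias


# Hypothetical (untrained) support vectors and coefficients
support_vectors = [(1.0, 2.0), (-1.0, 0.5)]
alpha_y = [0.8, -0.6]
bias = 0.1
x = (0.5, -1.0)
print(svm_decision(x, support_vectors, alpha_y, bias))
print(as_network(x, support_vectors, alpha_y, bias))
```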

Graphical Interface

What is the Oracle Data Miner?

Interfaces

What programming interfaces does Oracle Data Mining provide?
Oracle Data Mining provides two programming language interfaces: Java and PL/SQL. See APIs.

How does the PL/SQL interface differ from the Java interface?
See Appendix A of the ODM Concepts Guide.

When will Oracle Data Mining support the Java Data Mining (JSR-73) standard interface?
Oracle Data Mining will provide a JDM interface in the next available release, which is currently 10gR2.

What will happen to the 10g Java interface?
The ODM Java interface in 10gR1 will be replaced by the Java Data Mining (JSR-73) standard interface in 10gR2. Accordingly, the 10gR1 Java interface is already being phased out in 10gR1 in preparation for the new interface.

Why is the 10g Java interface being desupported?
The goal is to avoid having two Java interfaces in the product, which would increase code complexity and support and maintenance effort, and would confuse the marketplace, customers, and developers. We also want to avoid having one interface (ODM Java) that is not interoperable with the other two (JDM and PL/SQL). JDM and PL/SQL will be fully interoperable.

Migration

Can users of ODM 9i migrate to the ODM 10g PL/SQL interface?

The PL/SQL interface is new in 10g and leverages a different repository than that for the 9i and 10g Java interface. As such, ODM 9i models can only be migrated to the ODM 10g Java repository. Models created via the Java interface are not interoperable with the PL/SQL interface in 10g.

Users moving to the PL/SQL interface need to redefine their Java objects in the form the PL/SQL interface requires, and then rebuild their models or recreate their results.

ODM 9i customers can migrate models created with the Java interface to ODM 10g and continue to use the Java interface. However, this Java interface will be replaced with the JDM interface in 10gR2.

What is the migration strategy for users of the 10gR1 Java interface in 10gR2?

The 10gR1 ODM Java API will be desupported in 12 months. This is motivated by the Oracle-led Java Data Mining (JDM) standard (JSR-73), which will be available in 10gR2, replacing the current Java API. Consequently, the current Java API will not exist in any form in 10gR2.

The JDM API will be implemented as a layer on top of the 10g ODM PL/SQL API. This has the benefit of making the Java and PL/SQL interfaces interoperable. In 10gR1, the Java API and the PL/SQL API are not interoperable; that is, a model created in Java cannot be used in PL/SQL and vice versa. This resulted from our efforts to integrate data mining more tightly with the core RDBMS. At present, there is no utility to migrate Java-produced models to PL/SQL models, because significant changes in the metadata and the automated transformations make such a conversion rather intractable.

In general, applications need to refresh their models periodically because models grow stale over time. When converting to 10gR2, applications will need code changes to use the JDM or PL/SQL APIs, and their models must be rebuilt.

To mitigate migration problems, applications new to 10g can use the PL/SQL API, because it will be supported in 10gR2 along with model migration. Later, users can decide whether to convert the application to the new JDM API, under which PL/SQL-generated models will still be usable. ODM 9i applications can continue to use the 10gR1 ODM Java API, which supports migration from ODM 9i; however, moving to 10gR2 will require code changes and rebuilding of models, as noted above. If Java is the only option and a customer does not want to rework applications, the customer can wait for 10gR2 and JDM.

System

What platforms does Oracle Data Mining run on?
All platforms supported by Oracle, including Windows, Solaris, HP-UX, IBM AIX, Compaq Tru64, and Linux.

What are the system requirements to run Oracle Data Mining?
Oracle Data Mining runs in the Oracle 10g Database on all supported platforms. Oracle Partitioning is recommended for large data mining problems.