Predictive Model Markup Language

The Predictive Model Markup Language (PMML) is an XML-based language developed by the Data Mining Group (DMG). It gives applications a way to define statistical and data mining models and to share those models with other PMML-compliant applications.

PMML gives applications a vendor-independent way to define models, so that proprietary formats and incompatibilities are no longer a barrier to exchanging models between applications. Users can develop a model in one vendor's application and use other vendors' applications to visualize, analyze, evaluate, or otherwise use it. Before PMML this was very difficult; between compliant applications, the exchange is now straightforward.

Because PMML is an XML-based standard, its specification is published as an XML Schema.
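
Because a PMML document is ordinary XML, any standard XML parser can read it. The following is a minimal sketch, using Python's standard `xml.etree.ElementTree` module, of reading the namespace and version from a document's root element. The embedded document is a hypothetical stub with invented values, not a complete model; the namespace URI is the one used by PMML 3.2.

```python
import xml.etree.ElementTree as ET

# Hypothetical stub for illustration only: a real PMML file also needs a
# data dictionary and at least one model element.
PMML_STUB = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header copyright="Example Corp" description="Stub document"/>
</PMML>"""


def describe_root(pmml_text):
    """Return (namespace, local tag name, version) of the root element."""
    root = ET.fromstring(pmml_text)
    # ElementTree stores namespaced tags as "{namespace}local".
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = "", root.tag
    return namespace, local, root.get("version")


namespace, local, version = describe_root(PMML_STUB)
print(namespace, local, version)
```

Checking the version before processing is useful in practice, since a consumer may support only some PMML versions.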

PMML Components

PMML uses a consistent structure to describe a data mining model, whether it is an artificial neural network or a logistic regression model. It can be described by the following components:

* Header: contains general information about the PMML document, such as copyright information for the model, its description, and the name and version of the application used to generate the model. It can also carry a timestamp (element Timestamp) recording the date of model creation.

* Data Dictionary: contains definitions for all the possible fields used by the model. Here each field is defined as continuous, categorical, or ordinal (attribute optype). Depending on this definition, the appropriate value ranges are then defined, as well as the data type (such as string or double).

* Model: contains the definition of the data mining model. A multi-layered feedforward neural network is the most common neural network representation in contemporary applications, given the popularity and effectiveness of its training algorithm, backpropagation. Such a network is represented in PMML by a "NeuralNetwork" element, which has attributes such as:
** Model Name (attribute modelName)
** Function Name (attribute functionName)
** Algorithm Name (attribute algorithmName)
** Activation Function (attribute activationFunction)
** Number of Layers (attribute numberOfLayers)

This information is followed by three kinds of neural layers, which specify the architecture of the neural network represented in the PMML document. These elements are NeuralInputs, NeuralLayer, and NeuralOutputs. Besides neural networks, PMML can represent many other data mining models, including support vector machines, association rules, naive Bayes classifiers, clustering models, text models, decision trees, and various regression models.

* Mining Schema: the mining schema lists all fields used in the model. This can be a subset of the fields as defined in the data dictionary. It contains specific information about each field, such as:
** Name (attribute name): must refer to a field in the data dictionary
** Usage type (attribute usageType): defines the way a field is to be used in the model. Typical values are: active, predicted, and supplementary. Predicted fields are those whose values are predicted by the model.
** Outlier Treatment (attribute outliers): defines the outlier treatment to be used. In PMML, outliers can be treated as missing values, as extreme values (based on the definition of high and low values for a particular field), or as is.
** Missing Value Replacement Policy (attribute missingValueReplacement): if this attribute is specified, a missing value is automatically replaced by the given value.
** Missing Value Treatment (attribute missingValueTreatment): indicates how the missing value replacement was derived (e.g. as a fixed value, the mean, or the median).
* Data Transformations: transformations allow for the mapping of user data into a more desirable form to be used by the mining model. PMML defines several kinds of simple data transformations.
** Normalization: map values to numbers. The input can be continuous or discrete.
** Discretization: map continuous values to discrete values.
** Value mapping: map discrete values to discrete values.
** Functions: derive a value by applying a function to one or more parameters.
** Aggregation: used to summarize or collect groups of values.
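
The components above can be seen together in a small document. The following is a hedged sketch, not a schema-validated model: the field names, weights, application name, and timestamp are all invented for illustration, and the network is reduced to a single neuron. The Python code parses the document with the standard `xml.etree.ElementTree` module, reads the data dictionary and mining schema, counts the three kinds of neural layer elements, and checks that every mining field refers to a field in the data dictionary.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-3_2"}  # namespace used by PMML 3.2

# Hypothetical toy model: all names and numbers below are invented.
PMML_DOC = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header copyright="Example Corp" description="Toy neural network">
    <Application name="ExampleTool" version="1.0"/>
    <Timestamp>2008-01-01T00:00:00</Timestamp>
  </Header>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="region" optype="categorical" dataType="string"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <NeuralNetwork modelName="toy" functionName="regression"
                 activationFunction="logistic" numberOfLayers="1">
    <MiningSchema>
      <MiningField name="age" missingValueReplacement="40"
                   missingValueTreatment="asMean"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <NeuralInputs numberOfInputs="1">
      <NeuralInput id="0">
        <DerivedField optype="continuous" dataType="double">
          <FieldRef field="age"/>
        </DerivedField>
      </NeuralInput>
    </NeuralInputs>
    <NeuralLayer numberOfNeurons="1">
      <Neuron id="1" bias="0.1"><Con from="0" weight="0.5"/></Neuron>
    </NeuralLayer>
    <NeuralOutputs numberOfOutputs="1">
      <NeuralOutput outputNeuron="1">
        <DerivedField optype="continuous" dataType="double">
          <FieldRef field="spend"/>
        </DerivedField>
      </NeuralOutput>
    </NeuralOutputs>
  </NeuralNetwork>
</PMML>"""


def summarize(pmml_text):
    """Return (optypes, usage types, layer counts, undefined mining fields)."""
    root = ET.fromstring(pmml_text)
    optypes = {f.get("name"): f.get("optype")
               for f in root.findall("p:DataDictionary/p:DataField", NS)}
    # A mining field without an explicit usageType is treated as "active".
    usage = {f.get("name"): f.get("usageType", "active")
             for f in root.findall(".//p:MiningSchema/p:MiningField", NS)}
    net = root.find("p:NeuralNetwork", NS)
    layers = {tag: len(net.findall(f"p:{tag}", NS))
              for tag in ("NeuralInputs", "NeuralLayer", "NeuralOutputs")}
    # Every mining field name must refer to a field in the data dictionary.
    undefined = sorted(set(usage) - set(optypes))
    return optypes, usage, layers, undefined


optypes, usage, layers, undefined = summarize(PMML_DOC)
print(optypes, usage, layers, undefined)
```

Note that the field `region` is defined in the data dictionary but not listed in the mining schema, which illustrates that the mining schema can be a subset of the dictionary.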

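The field treatments and simple transformations described above can also be illustrated in plain Python. This is a sketch of their meaning only, not of how any PMML engine implements them: the function names, the use of `None` for a missing value, the half-open bin convention, and the order in which the steps are applied are all assumptions made for the example.

```python
def treat_outlier(value, low, high, method="asIs"):
    """Apply one of the three outlier treatments to a numeric value."""
    if value is None or low <= value <= high or method == "asIs":
        return value
    if method == "asMissingValues":
        return None  # the outlier is handled like a missing value
    if method == "asExtremeValues":
        return low if value < low else high  # clamp to the nearest bound
    raise ValueError(f"unknown outlier treatment: {method}")


def replace_missing(value, replacement):
    """Substitute the configured replacement for a missing value."""
    return replacement if value is None else value


def normalize(value, points):
    """Normalization: piecewise-linear map through sorted (input, output) pairs."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("value outside the normalization range")


def discretize(value, bins):
    """Discretization: map a continuous value to the label of its bin.

    Each bin is (left, right, label) and covers left <= value < right.
    """
    for left, right, label in bins:
        if left <= value < right:
            return label
    return None


def map_value(value, table, default=None):
    """Value mapping: map one discrete value to another."""
    return table.get(value, default)


AGE_BINS = [(0, 18, "minor"), (18, 65, "adult"), (65, float("inf"), "senior")]


def prepare_age(raw_age):
    """Chain the steps: outlier treatment, missing replacement, discretization."""
    age = treat_outlier(raw_age, low=0, high=120, method="asExtremeValues")
    age = replace_missing(age, replacement=40)
    return discretize(age, AGE_BINS)


print(prepare_age(150), prepare_age(None), normalize(25, [(0, 0.0), (100, 1.0)]))
```

With these definitions, an out-of-range age is clamped to the upper bound before it is binned, and a missing age falls back to the replacement value.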
PMML Products

A range of products produce and consume PMML:
* [http://www-306.ibm.com/software/data/db2/warehouse/ IBM DB2 Data Warehouse Edition] : produces PMML 3.0 and 3.1 for sequences only models. Consumes (scores and visualizes) PMML 3.1 and earlier.
* [http://rattle.togaware.com/ Rattle] : uses the R programming language to build several predictive models. It offers a PMML package that exports models built in R to PMML 3.2, with export support for support vector machines, linear regression, and binary logistic regression models.
* [http://www.salford-systems.com/cart.php Salford-Systems CART] : a decision tree system that produces PMML 3.1.
* [http://www.sas.com/technologies/analytics/datamining/miner/ SAS Enterprise Miner] : produces PMML 2.1 for several mining models, including linear regression, logistic regression, decision trees, neural networks, K-means clustering, and associations.
* [http://www.spss.com/spss/family.cfm SPSS] : produces and scores PMML 3.1.
* [http://www.zementis.com/adapa.htm Zementis ADAPA] : batch and real-time scoring of PMML 3.2 and earlier for several mining models, including decision trees, support vector machines, neural networks, linear and logistic regression models.

External links

* [http://www.google.com/ig/adde?hl=en&moduleurl=hosting.gmodules.com/ig/gadgets/file/115640297026242314759/converterwidget.xml PMML Converter] - This is an iGoogle gadget that can be used to convert a variety of PMML elements from older versions (2.1, 3.0 and 3.1) to PMML 3.2. It also validates any PMML file against the PMML schema (for older versions as well as version 3.2).
* [http://www.google.com/ig/adde?hl=en&moduleurl=hosting.gmodules.com/ig/gadgets/file/115640297026242314759/adapawidget.xml ADAPA Predictive Analytics Engine] - Available as an iGoogle gadget or as a Service through the Amazon Elastic Compute Cloud, the ADAPA engine can import several PMML models (in version 3.2). After uploading, models are available for scoring or verification.
* [http://www.dmg.org/v3-2/GeneralStructure.html PMML 3.2 Specification]
* [http://www.dmg.org/index.html Data Mining Group Home]


Wikimedia Foundation. 2010.
