 Discretization of continuous features

In statistics and machine learning, discretization is the process of converting continuous attributes, features or variables into discretized or nominal ones by partitioning their range into intervals. This can be useful when creating probability mass functions, that is, in density estimation. It is a form of binning, as in making a histogram.
Typically, data is discretized into K intervals of equal width (equal-width binning) or into intervals that each contain approximately the same share of the data (equal-frequency binning).^{[1]}
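The two unsupervised schemes can be illustrated with a minimal sketch. The function names `equal_width_bins` and `equal_frequency_bins` are hypothetical and chosen only for this example; the code assumes a non-empty list of numbers and returns a bin index from 0 to K-1 for each value.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Assumption of this sketch: ties are broken by position, so equal
    values may end up in different bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(equal_width_bins(data, 3))      # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(equal_frequency_bins(data, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The example data contains one outlier, which shows the practical difference: equal-width binning leaves the middle interval empty, while equal-frequency binning spreads the values evenly across the bins.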
Some mechanisms for discretizing continuous data include:
 Fayyad & Irani's MDL method,^{[2]} which uses information gain to recursively choose the best bin boundaries and a minimum description length (MDL) criterion to decide when to stop splitting.
 Many other supervised and unsupervised methods.^{[3]}
Many machine learning algorithms are known to produce better models when continuous attributes are discretized.^{[4]}
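The recursive, information-gain-based idea behind supervised methods of this kind can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: the function names are hypothetical, and the `min_gain` threshold is a simple stand-in for the MDL stopping criterion, which is not implemented here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (gain, cut) for the boundary with the highest information gain.

    `pairs` is a list of (value, label) tuples sorted by value.
    Returns None when no boundary between distinct values exists.
    """
    n = len(pairs)
    labels = [y for _, y in pairs]
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two equal values
        left, right = labels[:i], labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def entropy_cuts(values, labels, min_gain=0.1):
    """Recursively collect cut points whose gain is at least `min_gain`.

    `min_gain` is a hypothetical stopping rule used only in this sketch.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []

    def split(part):
        found = best_cut(part)
        if found is None or found[0] < min_gain:
            return
        _, cut = found
        cuts.append(cut)
        split([p for p in part if p[0] < cut])
        split([p for p in part if p[0] > cut])

    split(pairs)
    return sorted(cuts)


values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(entropy_cuts(values, labels))  # [6.5]
```

Unlike equal-width or equal-frequency binning, this approach uses the class labels, so bin boundaries are placed where they best separate the classes.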
References
 1. "Entropy and MDL Discretization of Continuous Variables for Bayesian Belief Networks". http://sci2s.ugr.es/keel/pdf/specific/articulo/IJIS00.pdf. Retrieved 2008-07-10.
 2. "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning". hdl:2014/35171.
 3. "Supervised and Unsupervised Discretization of Continuous Features". http://www.ifir.edu.ar/~redes/curso/disc.ps. Retrieved 2008-07-10.
 4. S. Kotsiantis, D. Kanellopoulos. "Discretization Techniques: A recent survey". GESTS International Transactions on Computer Science and Engineering, Vol. 32 (1), 2006, pp. 47-58. http://www.math.upatras.gr/~esdlab/en/members/kotsiantis/discretization%20survey%20kotsiantis.pdf.