Factorial code
Most real-world data sets consist of data vectors whose individual components are not statistically independent; that is, they are redundant in the statistical sense. It is then desirable to create a factorial code of the data, i.e., a new vector-valued representation of each data vector such that it gets uniquely encoded by the resulting code vector (loss-free coding), but the code components are statistically independent.
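In symbols (the notation below is added here for clarity and does not appear in the original article), a code y = f(x) with components y_1, ..., y_n is factorial if the mapping f is invertible (loss-free coding) and the joint distribution of the code components factorizes into the product of their marginals:

```latex
P(y_1, \dots, y_n) = \prod_{i=1}^{n} P(y_i)
```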
Later supervised learning usually works much better when the raw input data is first translated into such a factorial code. For example, suppose the final goal is to classify images with highly redundant pixels. A naive Bayes classifier will assume the pixels are statistically independent random variables and therefore fail to produce good results. If the data are first encoded in a factorial way, however, then the naive Bayes classifier will achieve its optimal performance (compare Schmidhuber et al. 1996).
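The effect of redundancy on naive Bayes can be seen in a small sketch. The following Python snippet is only an illustration of the independence assumption, not code from the cited papers; the probabilities and the duplication factor are made-up values. It feeds the same binary pixel to the classifier several times as if it were several independent features, so the same evidence is counted repeatedly and the posterior becomes overconfident:

```python
import numpy as np

# P(pixel = 1 | class) for two classes, plus a uniform class prior.
# These numbers are arbitrary, chosen only for illustration.
p_pixel_given_class = np.array([0.9, 0.2])
prior = np.array([0.5, 0.5])

def naive_bayes_posterior(k):
    """Posterior over the two classes when one observed pixel (value 1) is
    presented k times as if it were k statistically independent features."""
    likelihood = p_pixel_given_class ** k   # naive independence assumption
    posterior = prior * likelihood
    return posterior / posterior.sum()

print(naive_bayes_posterior(1))   # one observation: moderate posterior
print(naive_bayes_posterior(10))  # same evidence counted 10 times: near-certainty
```

A factorial encoding removes exactly this kind of double counting of evidence, which is why the independence assumption then no longer hurts the classifier.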
To create factorial codes, Horace Barlow and co-workers (1989) suggested minimizing the sum of the bit entropies of the code components of binary codes. Jürgen Schmidhuber (1992) re-formulated the problem in terms of predictors and binary feature detectors, each receiving the raw data as input. For each detector there is a predictor that sees the other detectors and learns to predict the output of its own detector in response to the various input vectors or images. But each detector uses a machine learning algorithm to become as unpredictable as possible. The global optimum of this objective function corresponds to a factorial code represented in a distributed fashion across the outputs of the feature detectors.
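The following sketch shows one way the predictor/detector scheme can be set up with gradient-based learning. It is only an illustration of the idea: the network sizes, optimizers, learning rates and the random toy data are assumptions made here and do not come from Schmidhuber (1992), which describes the method in more general terms.

```python
import torch
import torch.nn as nn

n_in, n_code = 8, 4   # assumed toy dimensions

# Feature detectors: map a raw input vector to n_code sigmoid code units.
encoder = nn.Sequential(nn.Linear(n_in, 16), nn.Tanh(),
                        nn.Linear(16, n_code), nn.Sigmoid())

# One predictor per code unit; predictor i sees the other n_code - 1 units
# and tries to predict unit i.
predictors = nn.ModuleList([
    nn.Sequential(nn.Linear(n_code - 1, 16), nn.Tanh(),
                  nn.Linear(16, 1), nn.Sigmoid())
    for _ in range(n_code)])

opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt_pred = torch.optim.Adam(predictors.parameters(), lr=1e-3)

def total_prediction_error(y):
    """Sum of squared errors of each predictor on its own code unit."""
    errors = []
    for i, predictor in enumerate(predictors):
        others = torch.cat([y[:, :i], y[:, i + 1:]], dim=1)
        errors.append(((predictor(others).squeeze(1) - y[:, i]) ** 2).mean())
    return torch.stack(errors).sum()

x = torch.rand(256, n_in)   # placeholder for a batch of redundant raw data

for step in range(1000):
    # 1) Predictors minimize the prediction error, i.e. they learn whatever
    #    redundancy is still present among the code units.
    error = total_prediction_error(encoder(x).detach())
    opt_pred.zero_grad()
    error.backward()
    opt_pred.step()

    # 2) Detectors maximize the same error, i.e. each code unit tries to
    #    become as unpredictable as possible from the other units.
    error = total_prediction_error(encoder(x))
    opt_enc.zero_grad()
    (-error).backward()
    opt_enc.step()
```

At the global optimum of this adversarial objective, no predictor can extract information about its code unit from the remaining units, which is the factorial-code condition described above.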
See also
* Blind signal separation (BSS)
* Principal component analysis (PCA)
* Factor analysis
* Unsupervised learning
* Image processing
* Signal processing

References
* Horace Barlow, T. P. Kaushal, and G. J. Mitchison. Finding minimum entropy codes. Neural Computation, 1:412-423, 1989.
* Jürgen Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879, 1992.
* J. Schmidhuber, M. Eldracher, and B. Foltin. Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786, 1996.