Regularization (mathematics)

In mathematics and statistics, particularly in the fields of machine learning and inverse problems, regularization involves introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information usually takes the form of a penalty for complexity, such as restrictions on smoothness or bounds on the vector space norm.

A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters.

The same idea arose in many fields of science. For example, the least-squares method can be viewed as a very simple form of regularization. A simple form of regularization applied to integral equations, generally termed Tikhonov regularization after Andrey Nikolayevich Tikhonov, is essentially a trade-off between fitting the data and reducing the norm of the solution. More recently, non-linear regularization methods, including total variation regularization, have become popular.
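The Tikhonov trade-off can be sketched concretely: the problem min_x \|Ax-b\|_2^2 + \lambda\|x\|_2^2 has the closed-form solution x = (A^\top A + \lambda I)^{-1} A^\top b. The following NumPy snippet is an illustrative sketch (the matrix, noise level, and \lambda are assumptions chosen to make the effect visible, not values from the source): on an ill-conditioned system, the unregularized solution has wildly amplified coefficients, while a small \lambda restores stability.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# An ill-conditioned design: two nearly collinear columns.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones(50), t, t + 1e-6 * rng.standard_normal(50)])
b = A @ np.array([1.0, 2.0, 3.0]) + 0.01 * rng.standard_normal(50)

x_ls = tikhonov(A, b, 0.0)    # plain least squares: noise-amplified coefficients
x_reg = tikhonov(A, b, 1e-3)  # regularized: comparable fit, much smaller norm
print(np.linalg.norm(x_ls), np.linalg.norm(x_reg))
```

The regularized solution fits the data nearly as well while having a far smaller norm, which is exactly the trade-off described above.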

Regularization in statistics

In statistics and machine learning, regularization is used to prevent overfitting. Typical examples of regularization in statistical machine learning include ridge regression, the lasso, and the L2 penalty used in support vector machines.

Regularization methods are also used for model selection, where they work by implicitly or explicitly penalizing models based on the number of their parameters. For example, Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting not involving regularization include cross-validation.
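To make the parameter-count penalty concrete: for a Gaussian linear model, AIC and BIC can be written (up to model-independent constants) as n log(RSS/n) + 2k and n log(RSS/n) + k log n, where k is the number of parameters. The following sketch uses an illustrative simulated data set (the design, coefficients, and noise level are assumptions) to show BIC preferring a small true model over one padded with irrelevant predictors.

```python
import numpy as np

def aic_bic(y, X, beta):
    """Gaussian AIC and BIC for a linear model, up to model-independent constants."""
    n, k = X.shape
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(1)
n = 100
X_small = rng.standard_normal((n, 2))
y = X_small @ np.array([1.5, -2.0]) + 0.5 * rng.standard_normal(n)
X_big = np.hstack([X_small, rng.standard_normal((n, 8))])  # 8 irrelevant columns

beta_s, *_ = np.linalg.lstsq(X_small, y, rcond=None)
beta_b, *_ = np.linalg.lstsq(X_big, y, rcond=None)
aic_s, bic_s = aic_bic(y, X_small, beta_s)
aic_b, bic_b = aic_bic(y, X_big, beta_b)
print((aic_s, bic_s), (aic_b, bic_b))
```

The larger model always attains a slightly smaller RSS, but the k log n penalty outweighs that gain, so BIC selects the smaller model.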

Examples of different regularization methods applied to the linear model are:

Model                     Fit measure                       Entropy measure
AIC/BIC                   \|Y - X\beta\|_2                  \|\beta\|_0
Ridge regression          \|Y - X\beta\|_2                  \|\beta\|_2
Lasso[1]                  \|Y - X\beta\|_2                  \|\beta\|_1
Basis pursuit denoising   \|Y - X\beta\|_2                  \lambda\|\beta\|_1
RLAD[2]                   \|Y - X\beta\|_1                  \|\beta\|_1
Dantzig Selector[3]       \|X^\top (Y - X\beta)\|_\infty    \|\beta\|_1
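One way to see the difference between the L2 (ridge) and L1 (lasso) entropy measures is the orthonormal-design case, where the ridge solution shrinks every ordinary-least-squares coefficient by a factor 1/(1 + \lambda), while the lasso solution soft-thresholds it, setting small coefficients exactly to zero. A minimal sketch (the coefficient values and \lambda are illustrative assumptions):

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution under an orthonormal design: soft-threshold the OLS estimate."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

beta_ols = np.array([3.0, 0.4, -1.2, 0.05])
lam = 0.5
beta_lasso = soft_threshold(beta_ols, lam)  # small coefficients snap exactly to zero
beta_ridge = beta_ols / (1.0 + lam)         # every coefficient shrinks, none is zero
print(beta_lasso)   # [ 2.5  0.  -0.7  0. ]
print(beta_ridge)
```

This is why the lasso is used for variable selection: the L1 penalty produces sparse solutions, whereas the L2 penalty only shrinks.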

Notes

  1. Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society, Series B (Methodological) 58 (1): 267–288. MR1379242. http://www-stat.stanford.edu/~tibs/ftp/lasso.ps.
  2. Wang, Li; Gordon, Michael D.; Zhu, Ji (December 2006). "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning". Sixth International Conference on Data Mining. pp. 690–700. doi:10.1109/ICDM.2006.134.
  3. Candès, Emmanuel; Tao, Terence (2007). "The Dantzig selector: Statistical estimation when p is much larger than n". Annals of Statistics 35 (6): 2313–2351. arXiv:math/0506081. doi:10.1214/009053606000001523. MR2382644.

Wikimedia Foundation. 2010.
