Kernel smoother

A kernel smoother is a statistical technique for estimating a real-valued function $f(X)$, $X \in \mathbb{R}^p$, from its noisy observations when no parametric model for the function is known. The estimated function is smooth, and the level of smoothness is set by a single parameter.

Little or no training is required to operate a kernel smoother. The technique is most appropriate for low-dimensional ($p < 3$) data visualization purposes: the kernel smoother represents a set of irregular data points as a smooth line or surface.

Definitions

Let $K_{h_\lambda}(X_0, X)$ be a kernel defined by

:$K_{h_\lambda}(X_0, X) = D\left( \frac{\left| X - X_0 \right|}{h_\lambda(X_0)} \right)$

where:
* $X, X_0 \in \mathbb{R}^p$
* $\left| \cdot \right|$ is the Euclidean norm
* $h_\lambda(X_0)$ is a parameter (kernel radius)
* $D(t)$ is typically a positive real-valued function whose value is non-increasing as the distance between $X$ and $X_0$ increases.

Popular kernels used for smoothing include
* Epanechnikov
* Tri-cube
* Gaussian
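
As an illustration, the following is a minimal Python/NumPy sketch of these three kernels written as functions $D(t)$; the function names and normalizing constants are our own choices, and the constants do not affect the Nadaraya-Watson average defined below, since it renormalizes the weights.

    import numpy as np

    def epanechnikov(t):
        """Epanechnikov kernel: D(t) = (3/4)(1 - t^2) for |t| <= 1, else 0."""
        t = np.asarray(t, dtype=float)
        return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

    def tricube(t):
        """Tri-cube kernel: D(t) = (1 - |t|^3)^3 for |t| <= 1, else 0."""
        t = np.asarray(t, dtype=float)
        return np.where(np.abs(t) <= 1, (1.0 - np.abs(t)**3)**3, 0.0)

    def gaussian(t):
        """Gaussian kernel: D(t) = exp(-t^2/2)/sqrt(2*pi); support is unbounded."""
        t = np.asarray(t, dtype=float)
        return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)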

Let $\hat{Y}(X): \mathbb{R}^p \to \mathbb{R}$ be a continuous function of $X$. For each $X_0 \in \mathbb{R}^p$, the Nadaraya-Watson kernel-weighted average (smooth $Y(X)$ estimation) is defined by

:$\hat{Y}(X_0) = \frac{\sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i) Y(X_i)}{\sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i)}$

where:
* $N$ is the number of observed points
* $Y(X_i)$ are the observations at the points $X_i$.
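
As a sketch, this average can be computed directly; the helper name nadaraya_watson and its argument layout below are our own, and the code assumes that at least one observation receives a non-zero weight at the query point.

    import numpy as np

    def nadaraya_watson(x0, X, Y, kernel, h):
        """Kernel-weighted average of Y at a query point x0 (illustrative sketch).

        x0: query point; X: observed points, shape (N, p); Y: observations,
        shape (N,); kernel: a function D(t); h: the kernel radius h_lambda.
        """
        t = np.linalg.norm(np.atleast_2d(X) - x0, axis=1) / h  # scaled distances
        w = kernel(t)                       # weights K_{h_lambda}(x0, X_i)
        return np.sum(w * Y) / np.sum(w)    # assumes sum(w) > 0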

In the following sections, we describe some particular cases of kernel smoothers.

Nearest neighbor smoother

The idea of the nearest neighbor smoother is the following: for each point $X_0$, take the $m$ nearest neighbors and estimate the value of $Y(X_0)$ by averaging the values of these neighbors.

Formally, $h_m(X_0) = \left| X_0 - X_{[m]} \right|$, where $X_{[m]}$ is the $m$-th closest neighbor to $X_0$, and

:$D(t) = \begin{cases} 1 & \text{if } \left| t \right| \le 1 \\ 0 & \text{otherwise} \end{cases}$

Example:

In this example, $X$ is one-dimensional. For each $X_0$, $\hat{Y}(X_0)$ is the average value of the 16 points closest to $X_0$ (denoted by red). The result is not smooth enough.
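
A minimal sketch of this 16-nearest-neighbor average for one-dimensional data (the helper name nn_smoother is hypothetical):

    import numpy as np

    def nn_smoother(x0, X, Y, m=16):
        """Average of the m observations nearest to x0 (p = 1, illustrative sketch)."""
        nearest = np.argsort(np.abs(X - x0))[:m]  # indices of the m closest points
        return Y[nearest].mean()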

Kernel average smoother

The idea of the kernel average smoother is the following: for each data point $X_0$, choose a constant distance size $\lambda$ (the kernel radius, or window width for $p = 1$), and compute a weighted average over all data points that are closer than $\lambda$ to $X_0$ (points closer to $X_0$ receive higher weights).

Formally, $h_\lambda(X_0) = \lambda = \text{const}$, and $D(t)$ is one of the popular kernels.

Example:

For each $X_0$ the window width is constant, and the weight of each point in the window is schematically denoted by the yellow figure in the graph. It can be seen that the estimation is smooth, but the boundary points are biased. The reason is the unequal number of points (to the right and to the left of $X_0$) in the window when $X_0$ is close enough to the boundary.
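
Reusing the hypothetical helpers sketched above, a kernel average smooth with a constant window width might look as follows; the data here are synthetic and purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0.0, 1.0, 100))                # 1-D design points
    Y = np.sin(4.0 * X) + rng.normal(scale=0.3, size=100)  # noisy observations

    lam = 0.2                            # constant kernel radius lambda
    grid = np.linspace(0.0, 1.0, 200)    # query points X_0
    Y_hat = [nadaraya_watson(x0, X[:, None], Y, epanechnikov, lam) for x0 in grid]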

Local linear regression

In the two previous sections we assumed that the underlying $Y(X)$ function is locally constant, and therefore we were able to use the weighted average for the estimation. The idea of local linear regression is to fit locally a straight line (or a hyperplane for higher dimensions) instead of a constant (horizontal line). After fitting the line, the estimation $\hat{Y}(X_0)$ is provided by the value of this line at the point $X_0$. By repeating this procedure for each $X_0$, one obtains the estimated function $\hat{Y}(X)$. As in the previous section, the window width is constant: $h_\lambda(X_0) = \lambda = \text{const}$. Formally, the local linear regression is computed by solving a weighted least squares problem.

For one dimension ($p = 1$):

:$\min_{\alpha(X_0),\, \beta(X_0)} \sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - \alpha(X_0) - \beta(X_0) X_i \right)^2$

which yields

:$\hat{Y}(X_0) = \alpha(X_0) + \beta(X_0) X_0$

The closed form solution is given by:

:$\hat{Y}(X_0) = \left( 1, X_0 \right) \left( B^T W(X_0) B \right)^{-1} B^T W(X_0) y$

where:
* $y = \left( Y(X_1), \ldots, Y(X_N) \right)^T$
* $W(X_0) = \operatorname{diag}\left( K_{h_\lambda}(X_0, X_i) \right)_{N \times N}$
* $B^T = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ X_1 & X_2 & \ldots & X_N \end{pmatrix}$

Example:

The resulting function is smooth, and the problem with the biased boundary points is solved.
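
A direct transcription of this closed form for $p = 1$ might look as follows (an illustrative sketch under the naming assumptions above; np.linalg.solve is used instead of forming the matrix inverse explicitly, and the window must contain at least two distinct points for the system to be solvable):

    import numpy as np

    def local_linear(x0, X, Y, kernel, lam):
        """Closed-form local linear estimate at x0 for p = 1 (illustrative sketch)."""
        w = kernel(np.abs(X - x0) / lam)           # K_{h_lambda}(x0, X_i)
        B = np.column_stack([np.ones_like(X), X])  # rows (1, X_i)
        BtW = B.T * w                              # B^T W(x0)
        beta = np.linalg.solve(BtW @ B, BtW @ Y)   # (alpha(x0), beta(x0))
        return np.array([1.0, x0]) @ beta          # fitted line evaluated at x0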

Local polynomial regression

Instead of fitting locally linear functions, one can fit polynomial functions.

For $p = 1$, one should minimize:

:$\min_{\alpha(X_0),\, \beta_j(X_0),\, j = 1, \ldots, d} \sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - \alpha(X_0) - \sum\limits_{j=1}^d \beta_j(X_0) X_i^j \right)^2$

with

:$\hat{Y}(X_0) = \alpha(X_0) + \sum\limits_{j=1}^d \beta_j(X_0) X_0^j$

In the general case ($p > 1$), one should minimize:

:$\hat{\beta}(X_0) = \underset{\beta(X_0)}{\arg\min} \sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - b(X_i)^T \beta(X_0) \right)^2$

:$b(X) = \left( 1,\, X_1,\, X_2, \ldots,\, X_1^2,\, X_2^2, \ldots,\, X_1 X_2,\, \ldots \right)$

:$\hat{Y}(X_0) = b(X_0)^T \hat{\beta}(X_0)$
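
For $p = 1$, this weighted least squares problem can be sketched as follows (the name local_polynomial is hypothetical; the kernel weights are absorbed by scaling the rows with sqrt(w), which turns the problem into an ordinary least squares fit):

    import numpy as np

    def local_polynomial(x0, X, Y, kernel, lam, d=2):
        """Local degree-d polynomial estimate at x0 for p = 1 (illustrative sketch)."""
        w = kernel(np.abs(X - x0) / lam)            # kernel weights
        B = np.vander(X, N=d + 1, increasing=True)  # rows b(X_i) = (1, X_i, ..., X_i^d)
        sw = np.sqrt(w)                             # sqrt-weights for weighted LS
        beta = np.linalg.lstsq(sw[:, None] * B, sw * Y, rcond=None)[0]
        return np.vander(np.array([x0]), N=d + 1, increasing=True)[0] @ beta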

See also

*Kernel (statistics)
*Kernel methods
*Kernel density estimation
*Kernel regression
*Local regression

References

* T. Hastie, R. Tibshirani and J. Friedman, "The Elements of Statistical Learning", Chapter 6, Springer, 2001. ISBN 0387952845 ([http://www-stat.stanford.edu/~tibs/ElemStatLearn/ companion book site]).

