Kernel smoother

A kernel smoother is a statistical technique for estimating a real-valued function $f(X)$, $X \in \mathbb{R}^p$, from its noisy observations when no parametric model for the function is known. The estimated function is smooth, and the level of smoothness is set by a single parameter.

Little or no training is required to operate a kernel smoother. The technique is most appropriate for low-dimensional ($p < 3$) data visualization purposes: the kernel smoother represents a set of irregular data points as a smooth line or surface.

Definitions

Let $K_{h_\lambda}(X_0, X)$ be a kernel defined by

:$K_{h_\lambda}(X_0, X) = D\left( \frac{\left| X - X_0 \right|}{h_\lambda(X_0)} \right)$

where:
* $X, X_0 \in \mathbb{R}^p$
* $\left| \cdot \right|$ is the Euclidean norm
* $h_\lambda(X_0)$ is a parameter (kernel radius)
* $D(t)$ is typically a positive real-valued function whose value is non-increasing as the distance between $X$ and $X_0$ increases.

Popular kernels used for smoothing include
* Epanechnikov
* Tri-cube
* Gaussian
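
As an illustration, the following is a minimal Python/NumPy sketch of these three kernels written as functions $D(t)$; the function names and normalizing constants are our own choices, and the constants do not affect the Nadaraya-Watson average defined below, since it renormalizes the weights.

    import numpy as np

    def epanechnikov(t):
        """Epanechnikov kernel: D(t) = (3/4)(1 - t^2) for |t| <= 1, else 0."""
        t = np.asarray(t, dtype=float)
        return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

    def tricube(t):
        """Tri-cube kernel: D(t) = (1 - |t|^3)^3 for |t| <= 1, else 0."""
        t = np.asarray(t, dtype=float)
        return np.where(np.abs(t) <= 1, (1.0 - np.abs(t)**3)**3, 0.0)

    def gaussian(t):
        """Gaussian kernel: D(t) = exp(-t^2/2)/sqrt(2*pi); support is unbounded."""
        t = np.asarray(t, dtype=float)
        return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)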

Let $\hat{Y}(X): \mathbb{R}^p \to \mathbb{R}$ be a continuous function of $X$. For each $X_0 \in \mathbb{R}^p$, the Nadaraya-Watson kernel-weighted average (smooth $Y(X)$ estimation) is defined by

:$\hat{Y}(X_0) = \frac{\sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i) Y(X_i)}{\sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i)}$

where:
* $N$ is the number of observed points
* $Y(X_i)$ are the observations at the points $X_i$.
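
As a sketch, this average can be computed directly; the helper name nadaraya_watson and its argument layout below are our own, and the code assumes that at least one observation receives a non-zero weight at the query point.

    import numpy as np

    def nadaraya_watson(x0, X, Y, kernel, h):
        """Kernel-weighted average of Y at a query point x0 (illustrative sketch).

        x0: query point; X: observed points, shape (N, p); Y: observations,
        shape (N,); kernel: a function D(t); h: the kernel radius h_lambda.
        """
        t = np.linalg.norm(np.atleast_2d(X) - x0, axis=1) / h  # scaled distances
        w = kernel(t)                       # weights K_{h_lambda}(x0, X_i)
        return np.sum(w * Y) / np.sum(w)    # assumes sum(w) > 0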

In the following sections, we describe some particular cases of kernel smoothers.

Nearest neighbor smoother

The idea of the nearest neighbor smoother is the following: for each point $X_0$, take the $m$ nearest neighbors and estimate the value of $Y(X_0)$ by averaging the values of these neighbors.

Formally, $h_m(X_0) = \left| X_0 - X_{[m]} \right|$, where $X_{[m]}$ is the $m$-th closest neighbor to $X_0$, and

:$D(t) = \begin{cases} 1 & \text{if } \left| t \right| \le 1 \\ 0 & \text{otherwise} \end{cases}$

Example:

In this example, $X$ is one-dimensional. For each $X_0$, $\hat{Y}(X_0)$ is the average value of the 16 points closest to $X_0$ (denoted by red). The result is not smooth enough.
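
A minimal sketch of this 16-nearest-neighbor average for one-dimensional data (the helper name nn_smoother is hypothetical):

    import numpy as np

    def nn_smoother(x0, X, Y, m=16):
        """Average of the m observations nearest to x0 (p = 1, illustrative sketch)."""
        nearest = np.argsort(np.abs(X - x0))[:m]  # indices of the m closest points
        return Y[nearest].mean()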

Kernel average smoother

The idea of the kernel average smoother is the following: for each data point $X_0$, choose a constant distance size $\lambda$ (the kernel radius, or window width for $p = 1$), and compute a weighted average over all data points that are closer than $\lambda$ to $X_0$ (points closer to $X_0$ receive higher weights).

Formally, $h_\lambda(X_0) = \lambda = \text{const}$, and $D(t)$ is one of the popular kernels.

Example:

For each $X_0$ the window width is constant, and the weight of each point in the window is schematically denoted by the yellow figure in the graph. It can be seen that the estimation is smooth, but the boundary points are biased. The reason is the unequal number of points (to the right and to the left of $X_0$) in the window when $X_0$ is close enough to the boundary.
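
Reusing the hypothetical helpers sketched above, a kernel average smooth with a constant window width might look as follows; the data here are synthetic and purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0.0, 1.0, 100))                # 1-D design points
    Y = np.sin(4.0 * X) + rng.normal(scale=0.3, size=100)  # noisy observations

    lam = 0.2                            # constant kernel radius lambda
    grid = np.linspace(0.0, 1.0, 200)    # query points X_0
    Y_hat = [nadaraya_watson(x0, X[:, None], Y, epanechnikov, lam) for x0 in grid]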

Local linear regression

In the two previous sections we assumed that the underlying $Y(X)$ function is locally constant, and therefore we were able to use the weighted average for the estimation. The idea of local linear regression is to fit locally a straight line (or a hyperplane for higher dimensions) instead of a constant (horizontal line). After fitting the line, the estimation $\hat{Y}(X_0)$ is provided by the value of this line at the point $X_0$. By repeating this procedure for each $X_0$, one obtains the estimated function $\hat{Y}(X)$. As in the previous section, the window width is constant: $h_\lambda(X_0) = \lambda = \text{const}$. Formally, the local linear regression is computed by solving a weighted least squares problem.

For one dimension ($p = 1$):

:$\min_{\alpha(X_0),\, \beta(X_0)} \sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - \alpha(X_0) - \beta(X_0) X_i \right)^2$

which yields

:$\hat{Y}(X_0) = \alpha(X_0) + \beta(X_0) X_0$

The closed form solution is given by:

:$\hat{Y}(X_0) = \left( 1, X_0 \right) \left( B^T W(X_0) B \right)^{-1} B^T W(X_0) y$

where:
* $y = \left( Y(X_1), \ldots, Y(X_N) \right)^T$
* $W(X_0) = \operatorname{diag}\left( K_{h_\lambda}(X_0, X_i) \right)_{N \times N}$
* $B^T = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ X_1 & X_2 & \ldots & X_N \end{pmatrix}$

Example:

The resulting function is smooth, and the problem with the biased boundary points is solved.
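
A direct transcription of this closed form for $p = 1$ might look as follows (an illustrative sketch under the naming assumptions above; np.linalg.solve is used instead of forming the matrix inverse explicitly, and the window must contain at least two distinct points for the system to be solvable):

    import numpy as np

    def local_linear(x0, X, Y, kernel, lam):
        """Closed-form local linear estimate at x0 for p = 1 (illustrative sketch)."""
        w = kernel(np.abs(X - x0) / lam)           # K_{h_lambda}(x0, X_i)
        B = np.column_stack([np.ones_like(X), X])  # rows (1, X_i)
        BtW = B.T * w                              # B^T W(x0)
        beta = np.linalg.solve(BtW @ B, BtW @ Y)   # (alpha(x0), beta(x0))
        return np.array([1.0, x0]) @ beta          # fitted line evaluated at x0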

Local polynomial regression

Instead of fitting locally linear functions, one can fit polynomial functions.

For $p = 1$, one should minimize:

:$\min_{\alpha(X_0),\, \beta_j(X_0),\, j = 1, \ldots, d} \sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - \alpha(X_0) - \sum\limits_{j=1}^d \beta_j(X_0) X_i^j \right)^2$

with

:$\hat{Y}(X_0) = \alpha(X_0) + \sum\limits_{j=1}^d \beta_j(X_0) X_0^j$

In the general case ($p > 1$), one should minimize:

:$\hat{\beta}(X_0) = \underset{\beta(X_0)}{\arg\min} \sum\limits_{i=1}^N K_{h_\lambda}(X_0, X_i) \left( Y(X_i) - b(X_i)^T \beta(X_0) \right)^2$

:$b(X) = \left( 1,\, X_1,\, X_2, \ldots,\, X_1^2,\, X_2^2, \ldots,\, X_1 X_2,\, \ldots \right)$

:$\hat{Y}(X_0) = b(X_0)^T \hat{\beta}(X_0)$
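
For $p = 1$, this weighted least squares problem can be sketched as follows (the name local_polynomial is hypothetical; the kernel weights are absorbed by scaling the rows with sqrt(w), which turns the problem into an ordinary least squares fit):

    import numpy as np

    def local_polynomial(x0, X, Y, kernel, lam, d=2):
        """Local degree-d polynomial estimate at x0 for p = 1 (illustrative sketch)."""
        w = kernel(np.abs(X - x0) / lam)            # kernel weights
        B = np.vander(X, N=d + 1, increasing=True)  # rows b(X_i) = (1, X_i, ..., X_i^d)
        sw = np.sqrt(w)                             # sqrt-weights for weighted LS
        beta = np.linalg.lstsq(sw[:, None] * B, sw * Y, rcond=None)[0]
        return np.vander(np.array([x0]), N=d + 1, increasing=True)[0] @ beta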

See also

*Kernel (statistics)
*Kernel methods
*Kernel density estimation
*Kernel regression
*Local regression

References

* T. Hastie, R. Tibshirani and J. Friedman, "The Elements of Statistical Learning", Chapter 6, Springer, 2001. ISBN 0387952845 ([http://www-stat.stanford.edu/~tibs/ElemStatLearn/ companion book site]).

