 Numerical smoothing and differentiation

An experimental datum value can be conceptually described as the sum of a signal and some noise, but in practice the two contributions cannot be separated. The purpose of smoothing is to increase the Signaltonoise ratio without greatly distorting the signal (i.e. to get rid of the noise). One way to achieve this is by fitting successive sets of m data points to a polynomial of degree less than m by the method of linear least squares. Once the coefficients of the smoothing polynomial have been calculated they can be used to give estimates of the signal or its derivatives.
Contents
Convolution coefficients
When the data points are equally spaced a relatively simple analytical solution to the leastsquares equations can be found. This solution forms the basis of the convolution method of numerical smoothing and differentiation.
Suppose that the data consists of a set of n {x_{i}, y_{i}} points (i = 1...n), where x is an independent variable and y_{i} is an observed value. A polynomial will be fitted to a set of m (an odd number) adjacent data points, each separated by an interval h. Firstly, a change of variable is made
where is the value of the central point. z takes the values (1m)/2 .. 0 .. (m1)/2 (e.g. m = 5 → z = [2,1,0,1,2]). The polynomial, of degree k is defined as
The coefficients a_{0}, a_{1} etc. are obtained by solving the normal equations
where the ith row of the Jacobian matrix J has the values {1 z_{i} z_{i}^{2} … z_{i}^{k}}. For example, for a quadratic polynomial fitted to 5 points
In this example, a_{0} = ( − 3y_{1} + 12y_{2} + 17y_{3} + 12y_{4} − 3y_{5}) / 35. This is the smoothed value for the central point (z = 0) of the five data points used in the calculation. The coefficients (3 12 17 12 3)/35 are known as convolution coefficients as they are applied in succession to sets of m points at a time.
Tables of convolution coefficients were published by Savitzky and Golay in 1964, though the procedure for calculating them was known in the 19th century (See E. T. Whittaker and G. Robinson, The Calculus of Observations)
The numerical derivatives are obtained by differentiating Y. For a cubic polynomial
It is not necessary always to use the SavitzkyGolay tables as algebraic formulae can be derived for the convolution coefficients. For a cubic polynomial the expressions are
Signal distortion and noise reduction
It is inevitable that the signal will be distorted in the convolution process. Both the extent of the distortion and signaltonoise improvement:
 decrease as the degree of the polynomial increases
 increase as the width, m of the convolution function increases
For example, If the noise in all data points has a constant Standard deviation, σ, when smoothing by a mpoint linear polynomial the standard deviation on the noise will be decreased to , but with a quadratic polynomial it reduces to approximately . So, for a 9point quadratic smooth only half the noise is removed.
Frequency characteristics of convolution filters
Convolution maps to multiplication in the Fourier codomain (see pseudocode below). The (finite) Fourier transform of a convolution filter shows that it is most efficient for highfrequency noise and can therefore be described as a lowpass filter. The noise that is not removed is primarily lowfrequency noise.
### Smooth the vector x[1,...,nx] with an exponentially damped kernel. The ### result is a vector "smooth" with indeterminate values at the edges, and ### smoothed values in between cutoff = 0.05 ### weights to zero below this value alpha = 1.8 ### Decay of weight with distance from center logacutoff = log(cutoff)/log(alpha) ### log base alpha of cutoff span = floor(logacutoff ) ### width to left and right weights = alpha^(abs(sequence(left=span, right=span, step=1))) ### Overloaded "^" kernel = weights / sum(weights) ### Overloaded "/" nx = length(x) nk = 2*span+1 ### length(kernel) assert( nx>nk ) x1 = concatenate( sequence(0,length=ny1), x ) k1 = concatenate( kernel, sequence(0,length=nx1) ) s1 = inverse_fft( fft(x1) * fft(k1) ) ### Overloaded "*" smooth = sequence(NaN, length=nx) smooth[1+span:nxspan] = s1[ ny+nk1 : nx+nk1 ] ### using 1 offset notation
Applications
 Smoothing by convolution is performed primarily for aesthetic reasons. Fitting statistical models to smoothed data is generally a mistake, since the smoothing process alters the distribution of noise.
 Location of maxima and minima in experimental data curves. The first derivative of a function is zero at a maximum or minimum.
 Location of an endpoint in a titration curve. An endpoint is an inflection point where the second derivative of the function is zero.
 Resolution enhancement in spectroscopy. Bands in the second derivative of a spectroscopic curve are narrower than the bands in the spectrum: they have reduced Halfwidth. This allows partially overlapping bands to be "resolved" into separate peaks.
See also
 Smoothing
 Polynomial regression
 Savitzky–Golay smoothing filter
References
 P. Gans, Data fitting in the chemical sciences, Wiley, 1992.
External links
 SavitskyGolay filter (in Dutch, but with good illustrations)
 SavitskyGolay/Lownoise Lanczos differentiators Derivation, reference formulas, study on noise suppression properties in frequency domain.
 Smooth noiserobust differentiation Alternative approach to differentiate noisy data/function.
Wikimedia Foundation. 2010.