Least-squares spectral analysis

Least-squares spectral analysis

Least-squares spectral analysis (LSSA) is a method of estimating a frequency spectrum, based on a least squares fit of sinusoids to data samples, similar to Fourier analysis. [cite book | title = Variable Stars As Essential Astrophysical Tools | author = Cafer Ibanoglu | publisher = Springer | year = 2000 | isbn = 0792360842 | url = http://books.google.com/books?id=QzGbOiZ3OnkC&pg=PA269&dq=vanicek+spectral+sinusoids&as_brr=0&ei=HDgkR9GwBqT8pwK8hbzdAg&sig=uTdgGB-ACfLo_37_h8BVXgYrAGs ] Fourier analysis, the most used spectral method in science, generally boosts long-periodic noise in long gapped records; LSSA mitigates such problems. cite book | url = http://books.google.com/books?id=9GhDHTLzFDEC&pg=PA685&dq=%22spectral+analysis%22+%22vanicek%22+inauthor:press&as_brr=3&ei=10EKR6akEovqoQLOy9iqDQ&ie=ISO-8859-1&sig=Pt6HJ2hsLodcsrr2PUQDxnVSlPU | author = Press et al. | title = Numerical Recipes | edition = 3rd Edition | year = 2007 | publisher = Cambridge University Press | isbn = 0521880688]

LSSA is also known as the Vaníček methodcite journal | author = J. Taylor and S. Hamilton | title = Some tests of the Vaníček Method of spectral analysis | date = 1972-03-20 | journal = Astrophysics and Space Science | url = http://www.springerlink.com/content/mn3201n6j2j67t01/ ] after Petr Vaníček, and as the Lomb method (or the Lomb periodogram [cite book | title = Nonlinear Dynamics and Statistics | author = Alistair I. Mees | publisher = Springer | year = 2001 | isbn = 0817641637 | url = http://books.google.com/books?id=pH_OmkD4ZaQC&pg=PA227&dq=Lomb-periodogram&as_brr=3&ei=JPQaR7__J6XepQLLiuTZCg&sig=6a8O_Z2xxN2-WWim6l_JOgPthus ] ) and the Lomb–Scargle method [cite book | title = Climate Change: Critical Concepts in the Environment | author = Frank Chambers | publisher = Routledge | year = 2002 | isbn = 0415278589 | url = http://books.google.com/books?id=LnqFIpFJ9cwC&pg=PA178&dq=Lomb-Scargle-method&as_brr=3&ei=J_IaR8KlN5zIogKy8O3hCg&ie=ISO-8859-1&sig=zNxa7Lt_dp5L2NKxyk_ACIhB1iw ] (or Lomb–Scargle periodogram [cite journal | title = Searching for Biological Rhythms: Peak Detection in the Periodogram of Unequally Spaced Data | author = Hans P. A. Van Dongen et al. | journal = Journal of Biological Rhythms | volume = 14 | issue = 6 | date = 1999 | pages = pp.617–620 | url = http://jbr.sagepub.com/cgi/content/abstract/14/6/617 | doi = 10.1177/074873099129000984 ] cite book | title = Observational Astronomy | author = D. Scott Birney, David Oesper, and Guillermo Gonzalez | publisher = Cambridge University Press | year = 2006 | isbn = 0521853702 | url = http://books.google.com/books?id=cc9L8QWcZWsC&pg=RA3-PA263&dq=Lomb-Scargle-periodogram&as_brr=3&ei=JvMaR6q5GJ6KpwKr6vnoCQ&sig=XLyLtvjgBXovvLZGaG3q_taRdxI ] ), based on the contributions of Nicholas R. Lomb and, independently, Jeffrey D. Scargle. Closely related methods have been developed by Michael Korenberg and by Scott Chen and David Donoho.

Historical background

The close connections between Fourier analysis, the periodogram, and least-squares fitting of sinusoids have long been known. [cite book | title = The Combination of Observations | author = David Brunt | publisher = Cambridge University Press | year = 1931 | edition = 2nd ed. ] Most developments, however, are restricted to complete data sets of equally spaced samples. In 1963, J. F. M. Barning of Mathematisch Centrum, Amsterdam, handled unequally spaced data by similar techniques, [Barning, F.J.M. " [http://adsabs.harvard.edu/abs/1963BAN....17...22B The numerical analysis of the light-curve of 12 lacertae] ," "Bulletin of the Astronomical Institutes of the Netherlands," 17, pp.22–28.] including both a periodogram analysis equivalent to what is now referred to the Lomb method, and least-squares fitting of selected frequencies of sinusoids determined from such periodograms, connected by a procedure that is now known as matching pursuit with post-backfittingcite journal | title = Kernel Matching Pursuit | author = Pascal Vincent and Yoshua Bengio | journal = Machine Learning | volume=48 | pages=165–187 |year=2002 | url = http://www.iro.umontreal.ca/~vincentp/Publications/kmp_mlj.pdf | doi = 10.1023/A:1013955821559 ] or orthogonal matching pursuit. [Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in "Proc. 27th Asilomar Conference on Signals, Systems and Computers," A. Singh, ed., Los Alamitos, CA, USA, IEEE Computer Society Press, 1993.]

Petr Vaníček, a Canadian geodesist of the University of New Brunswick, also proposed the matching-pursuit approach, which he called "successive spectral analysis", but with equally spaced data, in 1969.Vanícek P. " [http://adsabs.harvard.edu/abs/1969Ap&SS...4..387V Approximate Spectral Analysis by Least-squares Fit] ", "Astrophysics and Space Science", pp.387–391, Volume 4 (1969).] He further developed this method, and analyzed the treatment of unequally spaced samples, in 1971.Vanícek P. " [http://adsabs.harvard.edu/abs/1971Ap%26SS..12...10V Further development and properties of the spectral analysis by least-squares fit] ," "Astrophysics and Space Science", pp.10–33, Volume 12 (1971).]

The Vaníček method was then simplified in 1976 by Nicholas R. Lomb of the University of Sydney, who pointed out its close connection to periodogram analysis.Lomb, N. R., " [http://adsabs.harvard.edu/abs/1976Ap&SS..39..447L Least-squares frequency analysis of unequally spaced data] ," "Astrophysics and Space Science" 39, p.447–462 (1976).] The definition of a periodogram of unequally spaced data was subsequently further modified and analyzed by Jeffrey D. Scargle of NASA Ames Research Center,Scargle, J. D., " [http://adsabs.harvard.edu/abs/1982ApJ...263..835S Studies in astronomical time series analysis II: Statistical aspects of spectral analysis of unevenly spaced data] ," "Astrophysics and Space Science" 302, p.757–763 (1982).] who showed that with minor changes it could be made identical to Lomb's least-squares formula for fitting individual sinusoid frequencies.

Scargle states that his paper "does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced," and further points out in reference to least-squares fitting of sinusoids compared to periodogram analysis, that his paper "establishes, apparently for the first time, that (with the proposed modifications) these two methods are exactly equivalent."

Press summarizes the development this way:

Michael Korenberg of Queen's University in 1989 developed the "fast orthogonal search" method of more quickly finding a near-optimal decomposition of spectra or other problems, similar to the technique that later became known as orthogonal matching pursuit. In 1994, Scott Chen and David Donoho of Stanford University have developed the "basis pursuit" method using miminization of the L1 norm of coefficients to cast the problem as a linear programming problem, for which efficient solutions are available.

The Vaníček method

In the Vaníček method, a discrete data set is approximated by a weighted sum of sinusoids of progressively determined frequencies, using a standard linear regression, or least-squares fit. The frequencies are chosen using a method similar to Barning's, but going further in optimizing the choice of each successive new frequency by picking the frequency that minimizes the residual after least-squares fitting (equivalent to the fitting technique now known as matching pursuit with pre-backfitting). The number of sinusoids must be less than or equal to the number of data samples (counting sines and cosines of the same frequency as separate sinusoids).

A data vector "Φ" is represented as a weighted sum of sinusoidal basis functions, tabulated in a matrix A by evaluating each function at the sample times, with weight vector "x":

:phi approx extbf{A}x

where the weight vector "x" is chosen to minimize the sum of squared errors in approximating "Φ". The solution for "x" is closed-form, using standard linear regression:

:x = ( extbf{A}^{mathrm{T extbf{A})^{-1} extbf{A}^{mathrm{Tphi.

Here the matrix A can be based on any set of functions that are mutually independent (not necessarily orthogonal) when evaluated at the sample times; for spectral analysis, the functions used are typically sines and cosines evenly distributed over the frequency range of interest. If too many frequencies are chosen in a too-narrow frequency range, the functions will not be sufficiently independent, the matrix will be badly conditioned, and the resulting spectrum will not be meaningful.

When the basis functions in A are orthogonal (that is, not correlated, meaning the columns have zero pair-wise dot products), the matrix ATA is a diagonal matrix; when the columns all have the same power (sum of squares of elements), then that matrix is an identity matrix times a constant, so the inversion is trivial. The latter is the case when the sample times are equally spaced and the sinusoids are chosen to be sines and cosines equally spaced in pairs on the frequency interval 0 to a half cycle per sample (spaced by 1/N cycle per sample, omitting the sine phases at 0 and maximum frequency where they are identically zero). This particular case is known as the discrete Fourier transform, slightly rewritten in terms of real data and coefficients.

:x = extbf{A}^{mathrm{Tphi (DFT case for "N" equally spaced samples and frequencies, within a scalar factor)

Lomb proposed using this simplification in general, except for pair-wise correlations between sine and cosine bases of the same frequency, since the correlations between pairs of sinusoids are often small, at least when they are not too closely spaced. This is essentially the traditional periodogram formulation, but now adopted for use with unevenly spaced samples. The vector "x" is a good estimate of an underlying spectrum, but since correlations are ignored, A"x" is no longer a good approximation to the signal, and the method is no longer a least-squares method – yet it has continued to be referred to as such.

The Lomb–Scargle periodogram

Rather than just taking dot products of the data with sine and cosine waveforms directly, Scargle modified the standard periodogram formula to first find a time delay τ such that this pair of sinusoids would be mutually orthogonal at sample times "tj", and also adjusted for the potentially unequal powers of these two basis functions, to obtain a better estimate of the power at a frequency, which made his modified periodogram method exactly equivalent to Lomb's least-squares method:

: an{2 pi au} = frac{sum_j sin 2 omega t_j}{sum_j cos 2 omega t_j}.

The periodogram at frequency ω is then estimated as:

:P_x(omega) = frac{1}{2} left( frac { left [ sum_j X_j cos omega ( t_j - au ) ight] ^ 2} { sum_j cos^2 omega ( t_j - au ) }+ frac {left [ sum_j X_j sin omega ( t_j - au ) ight] ^ 2} { sum_j sin^2 omega ( t_j - au ) } ight)

which Scargle reports then has the same statistical distribution as the periodogram in the evenly-sampled case.

At any individual frequency ω, this method gives the same power as does a least-squares fit to sinusoids of that frequency, of the form

:phi(t) approx A sin omega t + B cos omega t. [cite book | title = Data Analysis Methods in Physical Oceanography | author = William J. Emery, Richard E. Thomson | publisher = Elsevier | year = 2001 | isbn = 0444507566 | url = http://books.google.com/books?id=gYc4fp_ixmwC&pg=PA458&dq=vanicek+least-squares+spectral-analysis+lomb&as_brr=0&ei=aD0kR6HuA53qpwKBjdndAg&sig=b3HrCSnls8gzvZDvnxV7SuO-Sb8#PPA460,M1 ]

Korenberg's "fast orthogonal search" method

Michael Korenberg of Queens University in Kingston, Ontario, developed a method for choosing a sparse set of components from an over-complete set, such as sinusoidal components for spectral analysis, called fast orthogonal search (FOS). Mathematically, FOS uses a slightly modified Cholesky decomposition in a mean-square error reduction (MSER) process, implemented as a sparse matrix inversion.cite journal | author = Korenberg, M. J. | title = A robust orthogonal algorithm for system identification and time-series analysis | journal = Biological Cybernetics | volume = 60 | pages = 267–276 | date = 1989 | url = http://www.springerlink.com/content/m675341rn64k22tp/ | doi = 10.1007/BF00204124] [cite journal | title = Raman Spectral Estimation via Fast Orthogonal Search | url = http://www.rsc.org/Publishing/Journals/AN/article.asp?doi=a700902j | journal = The Analyst | year = 1997 | volume = 122 | pages = 879–882 | doi = 10.1039/a700902j | author = Korenberg, Michael J.] As with the other LSSA methods, FOS avoids the major shortcoming of discrete Fourier analysis, and can achieve highly accurate identifications of embedded periodicities and excels with unequally-spaced data; the fast orthogonal search method has also been applied to other problems such as nonlinear system identification.

Chen and Donoho's "basis pursuit" method

Chen and Donoho have developed a procedure called "basis pursuit" for fitting a sparse set of sinusoids or other functions from an over-complete set. The method defines an optimal solution as the one that minimizes the L1 norm of the coefficients, so that the problem can be cast as a linear programming problem, for which efficient solution methods are available.S. Chen and D.L. Donoho (1994), "On Basis Pursuit." Technical Report, Department of Statistics, Stanford University]


The most useful feature of the LSSA method is enabling incomplete records to be spectrally analyzed, without the need to manipulate the record or to invent otherwise non-existent data.

Magnitudes in the LSSA spectrum depict the contribution of a frequency or period to the variance of the time series. Generally, spectral magnitudes defined in the above manner enable the output's straightforward significance level regime.Beard, A.G., Williams, P.J.S., Mitchell, N.J. & Muller, H.G. A special climatology of planetary waves and tidal variability, J Atm. Solar-Ter. Phys. 63 (09), p.801–811 (2001).] Alternatively, magnitudes in the Vanícek spectrum can also be expressed in dB.Pagiatakis, S. Stochastic significance of peaks in the least-squares spectrum, J of Geodesy 73, p.67-78 (1999).] Note that magnitudes in the Vaníček spectrum follow β-distribution. [Steeves, R.R. A statistical test for significance of peaks in the least squares spectrum, Collected Papers of the Geodetic Survey, Department of Energy, Mines and Resources, Surveys and Mapping, Ottawa, Canada, p.149-166 (1981)]

Inverse transformation of Vaníček's LSSA is possible, as is most easily seen by writing the forward transform as a matrix; the matrix inverse (when the matrix is not singular) or pseudo-inverse will then be an inverse transformation; the inverse will exactly match the original data if the chosen sinusoids are mutually independent at the sample points and their number is equal to the number of data points.Craymer, M.R., [ftp://geod.nrcan.gc.ca/pub/GSD/craymer/pubs/thesis1998.zip "The Least Squares Spectrum, Its Inverse Transform and Autocorrelation Function: Theory and Some Applications in Geodesy"] , Ph.D. Dissertation, University of Toronto, Canada (1998).] No such inverse procedure is known for the periodogram method.


The LSSA can be implemented in less than a page of MATLAB code. [cite book | title = Ice Ages and Astronomical Causes: Data, Spectral Analysis, and Mechanisms | author = Richard A. Muller and Gordon J. MacDonald | publisher = Springer | year = 2000 | isbn = 3540437797 | url = http://books.google.com/books?id=P8ideTkMQisC&pg=PA289&dq=spectral+lomb+scargle&as_brr=0&ei=MRAPR5a8BZLcpwLpiaSzBg&sig=ELcyaGCiJLv5mQ_GE22CG_9i6xU#PPA289,M1 ] For each frequency in a desired set of frequencies, sine and cosine functions are evaluated at the times corresponding to the data samples, and dot products of the data vector with the sinusoid vectors are taken and appropriately normalized; following the method known as Lomb/Scargle periodogram, a time shift is calculated for each frequency to orthogonalize the sine and cosine components before the dot product, as described by Craymer; finally, a power is computed from those two amplitude components. This same process implements a discrete Fourier transform when the data are uniformly spaced in time and the frequencies chosen correspond to integer numbers of cycles over the finite data record.

As Craymer explains, this method treats each sinusoidal component independently, or out of context, even though they may not be orthogonal on the data points, whereas Vaníček's original method does a full simultaneous least-squares fit by solving a matrix equation, partitioning the total data variance between the specified sinusoid frequencies. Such a matrix least-squares solution is natively available in MATLAB as the backslash operator. [cite book | title = MATLAB Primer | author = Timothy A. Davis and Kermit Sigmon | publisher = CRC Press | year = 2005 | isbn = 1584885238 | url = http://books.google.com/books?id=MXWypqcHECkC&pg=PA12&dq=matlab+least-squares+backslash&as_brr=0&ei=4R0PR-rLHZuOpwLwjqCzBg&sig=edaQTBMdV8HRXNPh0CmZISKyBxg#PPA12,M1 ]

Craymer explains that the least-squares method, as opposed to the independent or periodogram version due to Lomb, can not fit more components (sines and cosines) than there are data samples, and further that:

Lomb's periodogram method, on the other hand, can use an arbitrarily high number of, or density of, frequency components, as in a standard periodogram; that is, the frequency domain can be over-sampled by an arbitrary factor.

In Fourier analysis, such as the Fourier transform or the discrete Fourier transform, the sinusoids being fitted to the data are all mutually orthogonal, so there is no distinction between the simple out-of-context dot-product-based projection onto basis functions versus a least-squares fit; that is, no matrix inversion is required to least-squares partition the variance between orthogonal sinusoids of different frequencies. [cite book | title = Discrete-Time Signal Processing: An Algebraic Approach | author = Darrell Williamson | publisher = Springer | year = 1999 | isbn = 1852331615 | url = http://books.google.com/books?id=JCKAirWQdqkC&pg=PA314&dq=fourier-transform+orthogonal+least-squares&as_brr=3&ei=iJQSR7qlA5uGpgKlztizBg&sig=l_UDUKhE2-oYnp67lWtGTAI7vCg ] This method is usually preferred for its efficient fast Fourier transform implementation, when complete data records with equally spaced samples are available.

ee also

* Orthogonal functions
* Spectral density
* Spectral density estimation


External links

* [ftp://ftp.geod.nrcan.gc.ca/pub/GSD/craymer/software/lssa/ LSSA software freeware download] (via ftp), FORTRAN, Vaníček's method, from the Natural Resources Canada.

Wikimedia Foundation. 2010.

Игры ⚽ Поможем решить контрольную работу

Look at other dictionaries:

  • Least squares — The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. Least squares means that the overall solution minimizes the sum of… …   Wikipedia

  • Spectral density estimation — In statistical signal processing, the goal of spectral density estimation is to estimate the spectral density (also known as the power spectrum) of a random signal from a sequence of time samples of the signal. Intuitively speaking, the spectral… …   Wikipedia

  • Fourier analysis — In mathematics, Fourier analysis is a subject area which grew out of the study of Fourier series. The subject began with trying to understand when it was possible to represent general functions by sums of simpler trigonometric functions. The… …   Wikipedia

  • analysis — /euh nal euh sis/, n., pl. analyses / seez /. 1. the separating of any material or abstract entity into its constituent elements (opposed to synthesis). 2. this process as a method of studying the nature of something or of determining its… …   Universalium

  • Analysis of variance — In statistics, analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of… …   Wikipedia

  • List of numerical analysis topics — This is a list of numerical analysis topics, by Wikipedia page. Contents 1 General 2 Error 3 Elementary and special functions 4 Numerical linear algebra …   Wikipedia

  • Principal component analysis — PCA of a multivariate Gaussian distribution centered at (1,3) with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction. The vectors shown are the eigenvectors of the covariance matrix scaled by… …   Wikipedia

  • Regression Analysis of Time Series — Infobox Software name = RATS caption = developer = Estima latest release version = 7.0 latest release date = 2007 operating system = Cross platform genre = econometrics software license = Proprietary website =… …   Wikipedia

  • Detrended fluctuation analysis — In stochastic processes, chaos theory and time series analysis, detrended fluctuation analysis (DFA) is a method for determining the statistical self affinity of a signal. It is useful for analysing time series that appear to be long memory… …   Wikipedia

  • Multivariate analysis of variance — (MANOVA) is a generalized form of univariate analysis of variance (ANOVA). It is used when there are two or more dependent variables. It helps to answer : 1. do changes in the independent variable(s) have significant effects on the dependent …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”