- Least-squares spectral analysis
**Least-squares spectral analysis (LSSA)**is a method of estimating afrequency spectrum , based on aleast squares fit ofsinusoid s to data samples, similar toFourier analysis . [*cite book | title = Variable Stars As Essential Astrophysical Tools | author = Cafer Ibanoglu | publisher = Springer | year = 2000 | isbn = 0792360842 | url = http://books.google.com/books?id=QzGbOiZ3OnkC&pg=PA269&dq=vanicek+spectral+sinusoids&as_brr=0&ei=HDgkR9GwBqT8pwK8hbzdAg&sig=uTdgGB-ACfLo_37_h8BVXgYrAGs*]Fourier analysis , the most used spectral method in science, generally boosts long-periodic noise in long gapped records; LSSA mitigates such problems. cite book | url = http://books.google.com/books?id=9GhDHTLzFDEC&pg=PA685&dq=%22spectral+analysis%22+%22vanicek%22+inauthor:press&as_brr=3&ei=10EKR6akEovqoQLOy9iqDQ&ie=ISO-8859-1&sig=Pt6HJ2hsLodcsrr2PUQDxnVSlPU | author = Press et al. | title = Numerical Recipes | edition = 3rd Edition | year = 2007 | publisher = Cambridge University Press | isbn = 0521880688]LSSA is also known as the

**Vaníček method**cite journal | author = J. Taylor and S. Hamilton | title = Some tests of the Vaníček Method of spectral analysis | date = 1972-03-20 | journal = Astrophysics and Space Science | url = http://www.springerlink.com/content/mn3201n6j2j67t01/ ] afterPetr Vaníček , and as the**Lomb method**(or the Lomb periodogram [*cite book | title = Nonlinear Dynamics and Statistics | author = Alistair I. Mees | publisher = Springer | year = 2001 | isbn = 0817641637 | url = http://books.google.com/books?id=pH_OmkD4ZaQC&pg=PA227&dq=Lomb-periodogram&as_brr=3&ei=JPQaR7__J6XepQLLiuTZCg&sig=6a8O_Z2xxN2-WWim6l_JOgPthus*] ) and the**Lomb–Scargle method**[*cite book | title = Climate Change: Critical Concepts in the Environment | author = Frank Chambers | publisher = Routledge | year = 2002 | isbn = 0415278589 | url = http://books.google.com/books?id=LnqFIpFJ9cwC&pg=PA178&dq=Lomb-Scargle-method&as_brr=3&ei=J_IaR8KlN5zIogKy8O3hCg&ie=ISO-8859-1&sig=zNxa7Lt_dp5L2NKxyk_ACIhB1iw*] (or Lomb–Scargle periodogram [*cite journal | title = Searching for Biological Rhythms: Peak Detection in the Periodogram of Unequally Spaced Data | author = Hans P. A. Van Dongen et al. | journal = Journal of Biological Rhythms | volume = 14 | issue = 6 | date = 1999 | pages = pp.617–620 | url = http://jbr.sagepub.com/cgi/content/abstract/14/6/617 | doi = 10.1177/074873099129000984*] cite book | title = Observational Astronomy | author = D. Scott Birney, David Oesper, and Guillermo Gonzalez | publisher = Cambridge University Press | year = 2006 | isbn = 0521853702 | url = http://books.google.com/books?id=cc9L8QWcZWsC&pg=RA3-PA263&dq=Lomb-Scargle-periodogram&as_brr=3&ei=JvMaR6q5GJ6KpwKr6vnoCQ&sig=XLyLtvjgBXovvLZGaG3q_taRdxI ] ), based on the contributions of Nicholas R. Lomb and, independently, Jeffrey D. Scargle. Closely related methods have been developed by Michael Korenberg and by Scott Chen and David Donoho.**Historical background**The close connections between

Fourier analysis , theperiodogram , andleast-squares fitting ofsinusoids have long been known. [*cite book | title = The Combination of Observations | author = David Brunt | publisher = Cambridge University Press | year = 1931 | edition = 2nd ed.*] Most developments, however, are restricted to complete data sets of equally spaced samples. In 1963, J. F. M. Barning ofMathematisch Centrum , Amsterdam, handled unequally spaced data by similar techniques, [*Barning, F.J.M. " [*] including both a periodogram analysis equivalent to what is now referred to the Lomb method, and least-squares fitting of selected frequencies of sinusoids determined from such periodograms, connected by a procedure that is now known as*http://adsabs.harvard.edu/abs/1963BAN....17...22B The numerical analysis of the light-curve of 12 lacertae*] ," "Bulletin of the Astronomical Institutes of the Netherlands,"**17**, pp.22–28.matching pursuit with post-backfittingcite journal | title = Kernel Matching Pursuit | author = Pascal Vincent and Yoshua Bengio | journal = Machine Learning | volume=48 | pages=165–187 |year=2002 | url = http://www.iro.umontreal.ca/~vincentp/Publications/kmp_mlj.pdf | doi = 10.1023/A:1013955821559 ] or orthogonal matching pursuit. [*Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in "Proc. 27th Asilomar Conference on Signals, Systems and Computers," A. Singh, ed., Los Alamitos, CA, USA, IEEE Computer Society Press, 1993.*]Petr Vaníček , a Canadian geodesist of theUniversity of New Brunswick , also proposed the matching-pursuit approach, which he called "successive spectral analysis", but with equally spaced data, in 1969.Vanícek P. " [*http://adsabs.harvard.edu/abs/1969Ap&SS...4..387V Approximate Spectral Analysis by Least-squares Fit*] ", "Astrophysics and Space Science", pp.387–391, Volume 4 (1969).] He further developed this method, and analyzed the treatment of unequally spaced samples, in 1971.Vanícek P. " [*http://adsabs.harvard.edu/abs/1971Ap%26SS..12...10V Further development and properties of the spectral analysis by least-squares fit*] ," "Astrophysics and Space Science", pp.10–33, Volume 12 (1971).]The Vaníček method was then simplified in 1976 by Nicholas R. Lomb of the

University of Sydney , who pointed out its close connection toperiodogram analysis.Lomb, N. R., " [*http://adsabs.harvard.edu/abs/1976Ap&SS..39..447L Least-squares frequency analysis of unequally spaced data*] ," "Astrophysics and Space Science" 39, p.447–462 (1976).] The definition of a periodogram of unequally spaced data was subsequently further modified and analyzed by Jeffrey D. Scargle ofNASA Ames Research Center ,Scargle, J. D., " [*http://adsabs.harvard.edu/abs/1982ApJ...263..835S Studies in astronomical time series analysis II: Statistical aspects of spectral analysis of unevenly spaced data*] ," "Astrophysics and Space Science" 302, p.757–763 (1982).] who showed that with minor changes it could be made identical to Lomb's least-squares formula for fitting individual sinusoid frequencies.Scargle states that his paper "does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced," and further points out in reference to least-squares fitting of sinusoids compared to periodogram analysis, that his paper "establishes, apparently for the first time, that (with the proposed modifications) these two methods are exactly equivalent."

Press summarizes the development this way:

Michael Korenberg of Queen's University in 1989 developed the "fast orthogonal search" method of more quickly finding a near-optimal decomposition of spectra or other problems, similar to the technique that later became known as orthogonal matching pursuit. In 1994, Scott Chen and David Donoho of Stanford University have developed the "basis pursuit" method using miminization of the

L1 norm of coefficients to cast the problem as alinear programming problem, for which efficient solutions are available.**The Vaníček method**In the Vaníček method, a discrete data set is approximated by a weighted sum of sinusoids of progressively determined frequencies, using a standard

linear regression , orleast-squares fit. The frequencies are chosen using a method similar to Barning's, but going further in optimizing the choice of each successive new frequency by picking the frequency that minimizes the residual after least-squares fitting (equivalent to the fitting technique now known asmatching pursuit with pre-backfitting). The number of sinusoids must be less than or equal to the number of data samples (counting sines and cosines of the same frequency as separate sinusoids).A data vector "Φ" is represented as a weighted sum of sinusoidal basis functions, tabulated in a matrix

**A**by evaluating each function at the sample times, with weight vector "x"::$phi\; approx\; extbf\{A\}x$

where the weight vector "x" is chosen to minimize the sum of squared errors in approximating "Φ". The solution for "x" is closed-form, using standard

linear regression ::$x\; =\; (\; extbf\{A\}^\{mathrm\{T\; extbf\{A\})^\{-1\}\; extbf\{A\}^\{mathrm\{Tphi$.

Here the matrix A can be based on any set of functions that are mutually independent (not necessarily orthogonal) when evaluated at the sample times; for spectral analysis, the functions used are typically sines and cosines evenly distributed over the frequency range of interest. If too many frequencies are chosen in a too-narrow frequency range, the functions will not be sufficiently independent, the matrix will be badly conditioned, and the resulting spectrum will not be meaningful.

When the basis functions in

**A**are orthogonal (that is, not correlated, meaning the columns have zero pair-wisedot product s), the matrix**A**is a diagonal matrix; when the columns all have the same power (sum of squares of elements), then that matrix is an^{T}Aidentity matrix times a constant, so the inversion is trivial. The latter is the case when the sample times are equally spaced and the sinusoids are chosen to be sines and cosines equally spaced in pairs on the frequency interval 0 to a half cycle per sample (spaced by 1/N cycle per sample, omitting the sine phases at 0 and maximum frequency where they are identically zero). This particular case is known as thediscrete Fourier transform , slightly rewritten in terms of real data and coefficients.:$x\; =\; extbf\{A\}^\{mathrm\{Tphi$ (DFT case for "N" equally spaced samples and frequencies, within a scalar factor)

Lomb proposed using this simplification in general, except for pair-wise correlations between sine and cosine bases of the same frequency, since the correlations between pairs of sinusoids are often small, at least when they are not too closely spaced. This is essentially the traditional

periodogram formulation, but now adopted for use with unevenly spaced samples. The vector "x" is a good estimate of an underlying spectrum, but since correlations are ignored,**A**"x" is no longer a good approximation to the signal, and the method is no longer a least-squares method – yet it has continued to be referred to as such.**The Lomb–Scargle periodogram**Rather than just taking dot products of the data with sine and cosine waveforms directly, Scargle modified the standard periodogram formula to first find a time delay τ such that this pair of sinusoids would be mutually orthogonal at sample times "t

_{j}", and also adjusted for the potentially unequal powers of these two basis functions, to obtain a better estimate of the power at a frequency, which made his modified periodogram method exactly equivalent to Lomb's least-squares method::$an\{2\; pi\; au\}\; =\; frac\{sum\_j\; sin\; 2\; omega\; t\_j\}\{sum\_j\; cos\; 2\; omega\; t\_j\}.$

The periodogram at frequency ω is then estimated as:

:$P\_x(omega)\; =\; frac\{1\}\{2\}\; left(\; frac\; \{\; left\; [\; sum\_j\; X\_j\; cos\; omega\; (\; t\_j\; -\; au\; )\; ight]\; ^\; 2\}\; \{\; sum\_j\; cos^2\; omega\; (\; t\_j\; -\; au\; )\; \}+\; frac\; \{left\; [\; sum\_j\; X\_j\; sin\; omega\; (\; t\_j\; -\; au\; )\; ight]\; ^\; 2\}\; \{\; sum\_j\; sin^2\; omega\; (\; t\_j\; -\; au\; )\; \}\; ight)$

which Scargle reports then has the same statistical distribution as the periodogram in the evenly-sampled case.

At any individual frequency ω, this method gives the same power as does a least-squares fit to sinusoids of that frequency, of the form

:$phi(t)\; approx\; A\; sin\; omega\; t\; +\; B\; cos\; omega\; t$. [

*cite book | title = Data Analysis Methods in Physical Oceanography | author = William J. Emery, Richard E. Thomson | publisher = Elsevier | year = 2001 | isbn = 0444507566 | url = http://books.google.com/books?id=gYc4fp_ixmwC&pg=PA458&dq=vanicek+least-squares+spectral-analysis+lomb&as_brr=0&ei=aD0kR6HuA53qpwKBjdndAg&sig=b3HrCSnls8gzvZDvnxV7SuO-Sb8#PPA460,M1*]**Korenberg's "fast orthogonal search" method**Michael Korenberg of

Queens University inKingston, Ontario , developed a method for choosing a sparse set of components from an over-complete set, such as sinusoidal components for spectral analysis, called fast orthogonal search (FOS). Mathematically, FOS uses a slightly modifiedCholesky decomposition in a mean-square error reduction (MSER) process, implemented as asparse matrix inversion.cite journal | author = Korenberg, M. J. | title = A robust orthogonal algorithm for system identification and time-series analysis | journal = Biological Cybernetics | volume = 60 | pages = 267–276 | date = 1989 | url = http://www.springerlink.com/content/m675341rn64k22tp/ | doi = 10.1007/BF00204124] [*cite journal | title = Raman Spectral Estimation via Fast Orthogonal Search | url = http://www.rsc.org/Publishing/Journals/AN/article.asp?doi=a700902j | journal = The Analyst | year = 1997 | volume = 122 | pages = 879–882 | doi = 10.1039/a700902j | author = Korenberg, Michael J.*] As with the other LSSA methods, FOS avoids the major shortcoming of discrete Fourier analysis, and can achieve highly accurate identifications of embedded periodicities and excels with unequally-spaced data; the fast orthogonal search method has also been applied to other problems such as nonlinear system identification.**Chen and Donoho's "basis pursuit" method**Chen and Donoho have developed a procedure called "basis pursuit" for fitting a sparse set of sinusoids or other functions from an over-complete set. The method defines an optimal solution as the one that minimizes the

L1 norm of the coefficients, so that the problem can be cast as alinear programming problem, for which efficient solution methods are available.S. Chen and D.L. Donoho (1994), "On Basis Pursuit." Technical Report, Department of Statistics, Stanford University]**Applications**The most useful feature of the LSSA method is enabling incomplete records to be spectrally analyzed, without the need to

manipulate the record or to invent otherwise non-existent data.Magnitudes in the LSSA spectrum depict the contribution of a frequency or period to the

variance of thetime series . Generally, spectral magnitudes defined in the above manner enable the output's straightforwardsignificance level regime.Beard, A.G., Williams, P.J.S., Mitchell, N.J. & Muller, H.G. A special climatology of planetary waves and tidal variability, J Atm. Solar-Ter. Phys. 63 (09), p.801–811 (2001).] Alternatively, magnitudes in the Vanícek spectrum can also be expressed in dB.Pagiatakis, S. Stochastic significance of peaks in the least-squares spectrum, J of Geodesy 73, p.67-78 (1999).] Note that magnitudes in the Vaníček spectrum follow β-distribution. [*Steeves, R.R. A statistical test for significance of peaks in the least squares spectrum, Collected Papers of the Geodetic Survey, Department of Energy, Mines and Resources, Surveys and Mapping, Ottawa, Canada, p.149-166 (1981)*]Inverse transformation of Vaníček's LSSA is possible, as is most easily seen by writing the forward transform as a matrix; the matrix inverse (when the matrix is not singular) or pseudo-inverse will then be an inverse transformation; the inverse will exactly match the original data if the chosen sinusoids are mutually independent at the sample points and their number is equal to the number of data points.Craymer, M.R., [ftp://geod.nrcan.gc.ca/pub/GSD/craymer/pubs/thesis1998.zip "The Least Squares Spectrum, Its Inverse Transform and Autocorrelation Function: Theory and Some Applications in Geodesy"] , Ph.D. Dissertation, University of Toronto, Canada (1998).] No such inverse procedure is known for the periodogram method.

**Implementation**The LSSA can be implemented in less than a page of

MATLAB code. [*cite book | title = Ice Ages and Astronomical Causes: Data, Spectral Analysis, and Mechanisms | author = Richard A. Muller and Gordon J. MacDonald | publisher = Springer | year = 2000 | isbn = 3540437797 | url = http://books.google.com/books?id=P8ideTkMQisC&pg=PA289&dq=spectral+lomb+scargle&as_brr=0&ei=MRAPR5a8BZLcpwLpiaSzBg&sig=ELcyaGCiJLv5mQ_GE22CG_9i6xU#PPA289,M1*] For each frequency in a desired set of frequencies,sine andcosine functions are evaluated at the times corresponding to the data samples, anddot product s of the data vector with the sinusoid vectors are taken and appropriately normalized; following the method known as Lomb/Scargle periodogram, a time shift is calculated for each frequency to orthogonalize the sine and cosine components before the dot product, as described by Craymer; finally, a power is computed from those twoamplitude components. This same process implements adiscrete Fourier transform when the data are uniformly spaced in time and the frequencies chosen correspond to integer numbers of cycles over the finite data record.As Craymer explains, this method treats each sinusoidal component independently, or out of context, even though they may not be orthogonal on the data points, whereas Vaníček's original method does a full simultaneous least-squares fit by solving a matrix equation, partitioning the total data variance between the specified sinusoid frequencies. Such a matrix least-squares solution is natively available in MATLAB as the

backslash operator. [*cite book | title = MATLAB Primer | author = Timothy A. Davis and Kermit Sigmon | publisher = CRC Press | year = 2005 | isbn = 1584885238 | url = http://books.google.com/books?id=MXWypqcHECkC&pg=PA12&dq=matlab+least-squares+backslash&as_brr=0&ei=4R0PR-rLHZuOpwLwjqCzBg&sig=edaQTBMdV8HRXNPh0CmZISKyBxg#PPA12,M1*]Craymer explains that the least-squares method, as opposed to the independent or periodogram version due to Lomb, can not fit more components (sines and cosines) than there are data samples, and further that:

Lomb's periodogram method, on the other hand, can use an arbitrarily high number of, or density of, frequency components, as in a standard

periodogram ; that is, the frequency domain can be over-sampled by an arbitrary factor.In Fourier analysis, such as the

Fourier transform or thediscrete Fourier transform , the sinusoids being fitted to the data are all mutually orthogonal, so there is no distinction between the simple out-of-context dot-product-based projection onto basis functions versus a least-squares fit; that is, no matrix inversion is required to least-squares partition the variance between orthogonal sinusoids of different frequencies. [*cite book | title = Discrete-Time Signal Processing: An Algebraic Approach | author = Darrell Williamson | publisher = Springer | year = 1999 | isbn = 1852331615 | url = http://books.google.com/books?id=JCKAirWQdqkC&pg=PA314&dq=fourier-transform+orthogonal+least-squares&as_brr=3&ei=iJQSR7qlA5uGpgKlztizBg&sig=l_UDUKhE2-oYnp67lWtGTAI7vCg*] This method is usually preferred for its efficientfast Fourier transform implementation, when complete data records with equally spaced samples are available.**ee also***

Orthogonal functions

*Spectral density

*Spectral density estimation **References****External links*** [ftp://ftp.geod.nrcan.gc.ca/pub/GSD/craymer/software/lssa/ LSSA software freeware download] (via ftp), FORTRAN, Vaníček's method, from the

Natural Resources Canada .

*Wikimedia Foundation.
2010.*