Partial correlation

In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed.

1 Formal definition
2 Computation
3 Interpretation
- 3.1 Geometrical
- 3.2 As conditional independence test
4 Semipartial correlation (part correlation)
5 Use in time series analysis
6 See also
7 External links
8 References
- 8.1 Other

Formal definition

Formally, the partial correlation between X and Y given a set of n controlling variables Z = {Z₁, Z₂, …, Z_n}, written ρ_XY·Z, is the correlation between the residuals R_X and R_Y resulting from the linear regression of X on Z and of Y on Z, respectively. The first-order partial correlation is simply the difference between a correlation and the product of the removable correlations, divided by the product of the coefficients of alienation of the removable correlations. The coefficient of alienation and its relation to joint variance through correlation are described in Guilford (1973, pp. 344–345).

Computation

Using linear regression

A simple way to compute the sample partial correlation for some data is to solve the two associated linear regression problems, obtain the residuals, and calculate the correlation between the residuals. If we write x_i, y_i and z_i to denote i.i.d. samples of some joint probability distribution over X, Y and Z, solving the linear regression problems amounts to finding the n-dimensional vectors

$\mathbf{w}_X^* = \arg\min_{\mathbf{w}} \left\{ \sum_{i=1}^N (x_i - \langle\mathbf{w}, \mathbf{z}_i \rangle)^2 \right\}$

$\mathbf{w}_Y^* = \arg\min_{\mathbf{w}} \left\{ \sum_{i=1}^N (y_i - \langle\mathbf{w}, \mathbf{z}_i \rangle)^2 \right\}$

with N being the number of samples and $\langle\mathbf{v},\mathbf{w} \rangle$ the scalar product between the vectors v and w. Note that in some implementations the regression includes a constant term, so the matrix $\mathbf{z}$ would have an additional column of ones.

The residuals are then

$r_{X,i} = x_i - \langle\mathbf{w}_X^*,\mathbf{z}_i \rangle$

$r_{Y,i} = y_i - \langle\mathbf{w}_Y^*,\mathbf{z}_i \rangle$

and the sample partial correlation is

$\hat{\rho}_{XY\cdot\mathbf{Z}}=\frac{N\sum_{i=1}^N r_{X,i}r_{Y,i}-\sum_{i=1}^N r_{X,i}\sum_{i=1}^N r_{Y,i}} {\sqrt{N\sum_{i=1}^N r_{X,i}^2-\left(\sum_{i=1}^N r_{X,i}\right)^2}~\sqrt{N\sum_{i=1}^N r_{Y,i}^2-\left(\sum_{i=1}^N r_{Y,i}\right)^2}}.$
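The regression procedure above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`solve_linear_system`, `residuals`, `pearson`, `partial_corr`) and the sample data are assumptions made for the example, the fit includes a constant term, and the least-squares problem is solved through the normal equations, which is adequate only for small, well-conditioned problems.

```python
import math

def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def residuals(target, controls):
    """Residuals of a least-squares fit of target on the controls plus a constant."""
    rows = [[1.0] + list(z) for z in controls]  # column of ones for the constant term
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    w = solve_linear_system(gram, rhs)
    return [t - sum(wi * ri for wi, ri in zip(w, r)) for r, t in zip(rows, target)]

def pearson(u, v):
    """Sample correlation, written as in the formula for the residuals above."""
    n = len(u)
    su, sv = sum(u), sum(v)
    num = n * sum(a * b for a, b in zip(u, v)) - su * sv
    den = (math.sqrt(n * sum(a * a for a in u) - su ** 2)
           * math.sqrt(n * sum(b * b for b in v) - sv ** 2))
    return num / den

def partial_corr(x, y, controls):
    """Correlation between the residuals of x and y after regressing each on the controls."""
    return pearson(residuals(x, controls), residuals(y, controls))

# Illustrative data: four samples, one control variable per sample.
x = [2.0, 4.0, 15.0, 20.0]
y = [1.0, 2.0, 3.0, 4.0]
z = [[0.0], [0.0], [1.0], [1.0]]
print(partial_corr(x, y, z))  # approximately 0.919
```

Each entry of `controls` plays the role of one vector z_i, so additional control variables are added by lengthening the inner lists.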

Using recursive formula

Solving the linear regression problems can be computationally expensive. Alternatively, the nth-order partial correlation (i.e., with |Z| = n) can be computed from three (n − 1)th-order partial correlations. The zeroth-order partial correlation ρ_XY·Ø is defined to be the regular correlation coefficient ρ_XY.

It holds, for any $Z_0 \in \mathbf{Z}$, that:

$\rho_{XY\cdot \mathbf{Z} } = \frac{\rho_{XY\cdot\mathbf{Z}\setminus\{Z_0\}} - \rho_{XZ_0\cdot\mathbf{Z}\setminus\{Z_0\}}\rho_{Z_0Y\cdot\mathbf{Z}\setminus\{Z_0\}}} {\sqrt{1-\rho_{XZ_0\cdot\mathbf{Z}\setminus\{Z_0\}}^2} \sqrt{1-\rho_{Z_0Y\cdot\mathbf{Z}\setminus\{Z_0\}}^2}}.$

Naïvely implementing this computation as a recursive algorithm yields an exponential time complexity. However, this computation has the overlapping subproblems property, such that using dynamic programming or simply caching the results of the recursive calls yields a complexity of $\mathcal{O}(n^3)$.

Note that in the case where Z is a single variable, this reduces to:

$\rho_{XY\cdot Z } = \frac{\rho_{XY} - \rho_{XZ}\rho_{ZY}} {\sqrt{1-\rho_{XZ}^2} \sqrt{1-\rho_{ZY}^2}}.$
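The recursion with cached results can be sketched as follows. The helper name `make_partial_corr`, the index-based interface, and the example correlation matrix are assumptions made for illustration; variables are referred to by their row index in the matrix, and the control set is passed as a `frozenset` so that results can be memoized.

```python
import math
from functools import lru_cache

def make_partial_corr(corr):
    """Return a memoized function pc(i, j, controls) over a correlation matrix."""

    @lru_cache(maxsize=None)
    def pc(i, j, controls):
        # Zeroth order: the ordinary correlation coefficient.
        if not controls:
            return corr[i][j]
        # Remove one control variable Z_0 and recurse on the remaining set.
        z0 = min(controls)
        rest = controls - {z0}
        r_ij = pc(i, j, rest)
        r_iz = pc(i, z0, rest)
        r_zj = pc(z0, j, rest)
        return (r_ij - r_iz * r_zj) / math.sqrt((1 - r_iz ** 2) * (1 - r_zj ** 2))

    return pc

# Illustrative correlation matrix for (X, Y, Z) = indices (0, 1, 2).
corr = [
    [1.0, 0.5, 0.6],
    [0.5, 1.0, 0.7],
    [0.6, 0.7, 1.0],
]
pc = make_partial_corr(corr)
print(pc(0, 1, frozenset()))     # zeroth order: 0.5
print(pc(0, 1, frozenset({2})))  # first order: approximately 0.140
```

Picking `min(controls)` as Z_0 is arbitrary; the formula holds for any element of the control set.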

Using matrix inversion

Another approach, which runs in $\mathcal{O}(n^3)$ time, computes all partial correlations between any two variables X_i and X_j of a set V of cardinality n, given all others, i.e., $\mathbf{V} \setminus \{X_i,X_j\}$, provided that the correlation matrix (or alternatively the covariance matrix) Ω = (ω_ij), where ω_ij = ρ_{X_iX_j}, is invertible. If we define P = Ω⁻¹, we have:

$\rho_{X_iX_j\cdot \mathbf{V} \setminus \{X_i,X_j\}} = -\frac{p_{ij}}{\sqrt{p_{ii}p_{jj}}}.$
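A small sketch of the matrix-inversion approach, using only the standard library. The functions `invert` and `partial_corr_matrix` and the example matrix are assumptions made for illustration; in practice a numerical library's inversion routine would normally replace the hand-written Gauss-Jordan step.

```python
import math

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_corr_matrix(omega):
    """Partial correlation of each pair of variables given all the others."""
    p = invert(omega)
    n = len(p)
    return [[1.0 if i == j else -p[i][j] / math.sqrt(p[i][i] * p[j][j])
             for j in range(n)]
            for i in range(n)]

# Illustrative correlation matrix for three variables.
omega = [
    [1.0, 0.5, 0.6],
    [0.5, 1.0, 0.7],
    [0.6, 0.7, 1.0],
]
pcm = partial_corr_matrix(omega)
print(pcm[0][1])  # approximately 0.140
```

With three variables, entry (0, 1) is the partial correlation of the first two given the third, so it agrees with the single-control formula applied to the same correlations.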

Interpretation

[Figure: Geometrical interpretation of partial correlation]

Geometrical

Let three variables X, Y, Z [where X is the independent variable (IV), Y is the dependent variable (DV), and Z is the "control" or "extra" variable] be chosen from a joint probability distribution over n variables V. Further let v_i, 1 ≤ i ≤ N, be N n-dimensional i.i.d. samples taken from the joint probability distribution over V. We then consider the N-dimensional vectors x (formed by the successive values of X over the samples), y (formed by the values of Y) and z (formed by the values of Z).

It can be shown that the residuals R_X coming from the linear regression of X using Z, if also considered as an N-dimensional vector r_X, have a zero scalar product with the vector z generated by Z. This means that the residuals vector lives on a hyperplane S_z that is perpendicular to z.

The same also applies to the residuals R_Y generating a vector r_Y. The desired partial correlation is then the cosine of the angle φ between the projections r_X and r_Y of x and y, respectively, onto the hyperplane perpendicular to z.^[1]
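The geometric picture can be checked numerically. The sketch below assumes the three sample vectors are first centered (so that the cosine of an angle coincides with a sample correlation), uses made-up data, and introduces the helper names `dot`, `center` and `reject` purely for illustration.

```python
import math

def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def center(u):
    """Subtract the mean so that cosines between vectors equal correlations."""
    m = sum(u) / len(u)
    return [a - m for a in u]

def reject(u, z):
    """Component of u perpendicular to z, i.e. the residual of regressing u on z."""
    scale = dot(u, z) / dot(z, z)
    return [a - scale * b for a, b in zip(u, z)]

# Illustrative samples of X, Y and Z.
x = center([2.0, 4.0, 15.0, 20.0])
y = center([1.0, 2.0, 3.0, 4.0])
z = center([0.0, 0.0, 1.0, 1.0])

r_x = reject(x, z)
r_y = reject(y, z)

print(dot(r_x, z), dot(r_y, z))  # both zero up to rounding: residuals are perpendicular to z
cos_phi = dot(r_x, r_y) / math.sqrt(dot(r_x, r_x) * dot(r_y, r_y))
print(cos_phi)  # approximately 0.919, the partial correlation for these data
```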

As conditional independence test

Semipartial correlation (part correlation)

The semipartial (or part) correlation statistic is similar to the partial correlation statistic. Both measure variance after certain factors are controlled for, but to calculate the semipartial correlation one holds the third variable constant for either X or Y, whereas for the partial correlation one holds the third variable constant for both. The semipartial correlation measures unique and joint variance, while the partial correlation measures unique variance. The semipartial (or part) correlation can be viewed as more practically relevant "because it is scaled to (i.e., relative to) the total variability in the dependent (response) variable." ^[5] Conversely, it is less theoretically useful because it is less precise about the unique contribution of the independent variable. Although it may seem paradoxical, the absolute value of the semipartial correlation of X with Y is never greater than that of the partial correlation of X with Y.
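The relation between the two coefficients can be made explicit in the first-order case. The notation $\rho_{Y(X\cdot Z)}$, for the semipartial correlation in which a single control Z is removed from X only, is introduced here for illustration. Correlating Y with the residual of X regressed on Z gives

$\rho_{Y(X\cdot Z)} = \frac{\rho_{XY} - \rho_{XZ}\rho_{ZY}}{\sqrt{1-\rho_{XZ}^2}} = \rho_{XY\cdot Z}\,\sqrt{1-\rho_{ZY}^2},$

so the two coefficients share a sign and, because the square-root factor is at most 1, $|\rho_{Y(X\cdot Z)}| \le |\rho_{XY\cdot Z}|$, with equality when $\rho_{ZY} = 0$.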

Use in time series analysis

In time series analysis, the partial autocorrelation function (sometimes "partial correlation function") of a time series is defined, for lag h, as

$\phi(h)= \rho_{X_0X_h\cdot \{X_1,\dots,X_{h-1} \}}.$
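As a worked example, assume the series is stationary, so that the correlation between two values depends only on the lag between them; write $\rho(k)$ for the autocorrelation at lag k (a symbol introduced here for illustration). Then $\phi(1) = \rho(1)$, and applying the single-control formula for $\rho_{XY\cdot Z}$ with $X_1$ as the control gives

$\phi(2) = \rho_{X_0X_2\cdot X_1} = \frac{\rho(2) - \rho(1)^2}{1-\rho(1)^2}.$

For a first-order autoregressive process, where $\rho(k) = \rho(1)^k$, the numerator vanishes and $\phi(2) = 0$.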

External links

Prokhorov, A.V. (2001), "Partial correlation coefficient", in Hazewinkel, Michiel, Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104, http://eom.springer.de/P/p071610.htm
What is a partial correlation?
Mathematical formulae in the "Description" section of the IMSL Numerical Library PCORR routine
A three-variable example

References

[1] Rummel, R. J. (1976). "Understanding Correlation". http://www.hawaii.edu/powerkills/UC.HTM.
[2] Baba, Kunihiro; Shibata, Ritei; Sibuya, Masaaki (2004). "Partial correlation and conditional correlation as measures of conditional independence". Australian and New Zealand Journal of Statistics 46 (4): 657–664. doi:10.1111/j.1467-842X.2004.00360.x.
[3] Kendall, M. G.; Stuart, A. (1973). The Advanced Theory of Statistics, Volume 2 (3rd Edition). ISBN 0-85264-215-6. Section 27.22.
[4] Fisher, R. A. (1924). "The distribution of the partial correlation coefficient". Metron 3 (3–4): 329–332. http://digital.library.adelaide.edu.au/dspace/handle/2440/15182.
[5] StatSoft, Inc. (2010). "Semi-Partial (or Part) Correlation". Electronic Statistics Textbook. Tulsa, OK: StatSoft. Accessed January 15, 2011.

Other

Guilford, J. P.; Fruchter, B. (1973). Fundamental Statistics in Psychology and Education. Tokyo: McGraw-Hill Kogakusha, Ltd.

Wikimedia Foundation. 2010.
