Confounding


In statistics, a confounding variable (also confounding factor, lurking variable, a confound, or confounder) is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodologies of scientific studies therefore need to account for these variables, either through experimental design (in which case the researcher is said to control for them) or through statistical means (in which case the researcher is said to account for them), to avoid a false positive (Type I) error: an erroneous conclusion that the dependent variable is in a causal relationship with the independent variable. Such a relation between two observed variables is termed a spurious relationship. Thus, confounding is a major threat to the validity of inferences made about cause and effect, i.e. internal validity, as the observed effects should be attributed to the independent variable rather than the confounder.

Contents

1 Example

2 Experimental controls

3 Types of confounding

4 See also

5 References

6 External links

Example

For example, consider the statistical relationship between ice cream sales and drowning deaths. These two variables have a positive, and potentially statistically significant, correlation with each other.

At first sight, an evaluator might be tempted to infer a causal relationship in one direction or the other (either that ice cream causes drowning or that drowning causes ice cream consumption):

On the one hand, the evaluator might attribute the entire correlation to the causal chain: "Since a) a nonzero fraction of people who eat ice cream go swimming shortly afterwards, b) swimming after eating causes cramps in a nonzero fraction of those swimmers, and c) those cramps leave a nonzero fraction of the affected swimmers unable to swim, so that they drown, an increase in ice cream sales will cause an increase in drowning deaths."

On the other hand, the evaluator might attribute the entire correlation to the reverse causal chain: "Since a) drowning deaths cause bereavement among almost all of the deceased's loved ones and b) a nonzero fraction of grieving persons console themselves with ice cream, an increase in drowning deaths will cause an increase in ice cream consumption, purchases, and sales."

If both of these patterns held, they would reinforce each other: some of the people who eat ice cream and then drown would leave behind grieving loved ones who console themselves with ice cream, some of those loved ones would go swimming after eating it, some of them would in turn drown, and so on. Even in a world where these two mechanisms were the only ones in play, however, the small fractions involved would shrink the contribution of each successive round toward zero, so the total amplification would level off at a bounded value.

In the world in which these observations are actually made, however, either or both of these causal relationships might hold in some minute fraction of cases, and an accordingly minute fraction of the correlation might be attributable to them. The evaluator will nevertheless vastly overstate the force of these relationships if they do not account for a confounding, and far more influential, variable: the season. A rise in average temperature causes both an increase in ice cream consumption, purchases, and sales (observed event 1) and an increase in the number of people swimming; if the fraction of swimmers who drown stays constant, more people swimming means more people drowning (observed event 2). This causal structure contributes by far the largest share of the observed correlation, and because summer is by far the greatest contributor to warm weather, summertime is the root cause of the overwhelming majority of each observed increase.

Because the "branches" of this causal "event tree" reintersect in vanishingly few cases, for all practical purposes each of the two observed increases merely coincides with the other, rather than causing it or being caused by it.
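The effect of the season can be illustrated with a small simulation. The sketch below is a toy model rather than real data: the coefficients, the noise levels, and the helper names `pearson`, `partial_corr`, and `simulate` are assumptions chosen for illustration, and all quantities are in arbitrary units. Temperature drives both series and neither series affects the other, yet their raw correlation is strong; the partial correlation controlling for temperature (one common statistical adjustment for a measured confounder) is close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def simulate(n=5000, seed=1):
    """Toy model (assumed): temperature drives both series; neither causes the other."""
    rng = random.Random(seed)
    temp = [rng.gauss(20, 8) for _ in range(n)]            # the confounder
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temp]  # sales rise with temperature
    drownings = [0.5 * t + rng.gauss(0, 2) for t in temp]  # more swimming, more drownings
    return pearson(ice_cream, drownings), partial_corr(ice_cream, drownings, temp)


crude, adjusted = simulate()
print(f"crude r = {crude:.2f}, r controlling for temperature = {adjusted:.2f}")
```

With these assumed parameters the crude correlation comes out at roughly 0.85, while the temperature-adjusted correlation is near zero, which is the signature of a spurious relationship produced by a common cause.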

Experimental controls

There are various ways to modify a study design to actively exclude or control confounding variables:^[1]

Case-control studies assign confounders to both groups, cases and controls, equally. For example, if a researcher wants to study the causes of myocardial infarction and considers age a probable confounding variable, each 67-year-old infarction patient is matched with a healthy 67-year-old "control" person. In case-control studies, the most commonly matched variables are age and sex.

Cohort studies: A degree of matching is also possible, often achieved by admitting only certain age groups or only one sex into the study population, so that all cohorts are comparable with regard to the possible confounding variable. For example, if age and sex are thought to be confounders, only men aged 40 to 50 would be included in a cohort study assessing the risk of myocardial infarction in physically active versus inactive cohorts.

Stratification: As in the example above, physical activity is thought to be a behaviour that protects against myocardial infarction, and age is assumed to be a possible confounder. The sampled data are then stratified by age group; that is, the association between activity and infarction is analyzed separately within each age group. If the risk ratios within the age groups (or age strata) differ substantially from the crude risk ratio computed over all ages combined, age must be viewed as a confounding variable; risk ratios that differ markedly from one stratum to another instead point to effect modification. There exist statistical tools, among them Mantel–Haenszel methods, that account for stratification of data sets by combining the stratum-specific estimates into a single adjusted estimate.

Multivariate analysis: Confounding can also be controlled by measuring the known confounders and including them as covariates in multivariate analyses. However, multivariate analyses reveal much less information about the strength of the confounding variable than stratification methods do.
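A stratified analysis with a Mantel–Haenszel adjustment can be sketched in a few lines of Python. The counts below are invented for illustration: exposure is physical inactivity, the outcome is myocardial infarction, and the two strata are a younger and an older age group. The class and function names are hypothetical. For strata i with a_i exposed cases, c_i unexposed cases, N1_i exposed and N0_i unexposed persons, and T_i = N1_i + N0_i, the Mantel–Haenszel risk ratio is Σ(a_i·N0_i/T_i) / Σ(c_i·N1_i/T_i).

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One 2x2 table: cases and non-cases among exposed and unexposed people."""
    exposed_cases: int
    exposed_noncases: int
    unexposed_cases: int
    unexposed_noncases: int

    @property
    def n_exposed(self) -> int:
        return self.exposed_cases + self.exposed_noncases

    @property
    def n_unexposed(self) -> int:
        return self.unexposed_cases + self.unexposed_noncases


def risk_ratio(s: Stratum) -> float:
    """Risk among the exposed divided by risk among the unexposed."""
    return (s.exposed_cases / s.n_exposed) / (s.unexposed_cases / s.n_unexposed)


def collapse(strata: list[Stratum]) -> Stratum:
    """Pool all strata into one crude table, ignoring the stratifying variable."""
    return Stratum(
        sum(s.exposed_cases for s in strata),
        sum(s.exposed_noncases for s in strata),
        sum(s.unexposed_cases for s in strata),
        sum(s.unexposed_noncases for s in strata),
    )


def mantel_haenszel_rr(strata: list[Stratum]) -> float:
    """Mantel-Haenszel pooled risk ratio: sum(a_i*N0_i/T_i) / sum(c_i*N1_i/T_i)."""
    num = sum(s.exposed_cases * s.n_unexposed / (s.n_exposed + s.n_unexposed) for s in strata)
    den = sum(s.unexposed_cases * s.n_exposed / (s.n_exposed + s.n_unexposed) for s in strata)
    return num / den


# Hypothetical counts: exposure = physically inactive, outcome = myocardial infarction.
young = Stratum(exposed_cases=2, exposed_noncases=98, unexposed_cases=8, unexposed_noncases=392)
old = Stratum(exposed_cases=40, exposed_noncases=360, unexposed_cases=10, unexposed_noncases=90)

print("crude RR:", round(risk_ratio(collapse([young, old])), 2))
print("age-specific RRs:", round(risk_ratio(young), 2), round(risk_ratio(old), 2))
print("Mantel-Haenszel RR:", round(mantel_haenszel_rr([young, old]), 2))
```

With these counts the crude risk ratio is about 2.33, whereas both age-specific risk ratios and the Mantel–Haenszel estimate equal 1.0. In this made-up data the whole crude association is produced by age, because older people are both more often inactive and more often affected.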

All these methods have their drawbacks:

Case-control studies are feasible only when controls are easy to find, i.e. persons whose status with respect to all known potential confounding factors is the same as that of the case. Suppose a case-control study attempts to find the cause of a given disease in a person who is 1) 45 years old, 2) African-American, 3) from Alaska, 4) an avid football player, 5) vegetarian, and 6) working in education. A theoretically perfect control would be a person who, in addition to not having the disease under investigation, matches all these characteristics and has no diseases that the patient does not also have; finding such a control would be an enormous task.

In cohort studies, excluding too much input data may lead researchers to define too narrowly the set of similarly situated persons for whom they claim the study is relevant, so that other persons to whom the causal relationship does in fact apply may miss out on the benefit of the study's recommendations. Similarly, "over-stratification" of input data within a study may reduce the sample size in a given stratum to the point where generalizations drawn from the members of that stratum alone are not statistically significant.

Both case-control studies and cohort studies are inevitably subject to the possibility of "residual confounding": if one or more unknown, improperly quantified, or unquantifiable confounding factors are present, the study's results will be biased without the researchers' knowledge.

The best available defense against this possibility is often to forgo stratification and instead conduct a randomized study of a sufficiently large sample taken as a whole, so that all confounding variables (known and unknown) are distributed by chance, and therefore balanced on average, across all study groups.
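How well randomization balances an unmeasured variable depends on sample size. The sketch below assumes complete randomization into two equal groups and a single hidden covariate drawn from a standard normal distribution; the function name `mean_abs_imbalance` and its parameters are illustrative assumptions. It reports the average absolute difference in the covariate's mean between the two groups over many repeated hypothetical trials.

```python
import random


def mean_abs_imbalance(n, replicates=200, seed=0):
    """Average |treated mean - control mean| of a hidden covariate under randomization.

    Each hypothetical trial draws n subjects with an unmeasured covariate ~ N(0, 1),
    shuffles them (the random assignment), and puts the first half in the treated group.
    The covariate is never consulted during assignment, like an unknown confounder.
    """
    rng = random.Random(seed)
    half = n // 2
    total = 0.0
    for _ in range(replicates):
        hidden = [rng.gauss(0, 1) for _ in range(n)]
        rng.shuffle(hidden)
        treated, control = hidden[:half], hidden[half:]
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += abs(diff)
    return total / replicates


for n in (20, 200, 2000):
    print(f"n = {n:5d}: mean imbalance = {mean_abs_imbalance(n):.3f}")
```

Under this model the imbalance shrinks roughly in proportion to 1/√n, which is why randomization can be relied on to neutralize unknown confounders only when the sample is sufficiently large.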

Types of confounding

Confounding by indication^[2]: Evaluating treatment effects from observational data is problematic because prognostic factors may influence treatment decisions, producing a type of bias referred to as "confounding by indication". Controlling for known prognostic factors may reduce this problem, but it is always possible that a forgotten or unknown factor was not included or that factors interact in complex ways. Confounding by indication has been described as the most important limitation of observational studies of treatment effects. Randomized trials are not affected by confounding by indication.
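Confounding by indication, and the residual confounding left after adjusting for a known prognostic factor, can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: a treatment with a true effect of +1.0 on an outcome score, a recorded prognostic factor (`severe`), an unrecorded one (`frail`), and, in the observational setting, treatment probabilities that rise with both factors. Stratifying on severity stands in for "controlling for known prognostic factors"; the function names are hypothetical.

```python
import random


def simulate_patients(n, randomized, seed):
    """Return (severe, treated, outcome) tuples from an assumed toy model.

    True treatment effect is +1.0; both prognostic factors lower the outcome.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5  # recorded prognostic factor
        frail = rng.random() < 0.5   # unrecorded: the analyst cannot adjust for it
        if randomized:
            p_treat = 0.5
        else:
            # confounding by indication: sicker patients are treated more often
            p_treat = 0.2 + 0.3 * severe + 0.3 * frail
        treated = rng.random() < p_treat
        outcome = 1.0 * treated - 2.0 * severe - 2.0 * frail + rng.gauss(0, 1)
        rows.append((severe, treated, outcome))
    return rows


def mean_diff(rows):
    """Mean outcome of treated minus mean outcome of untreated."""
    t = [y for _, g, y in rows if g]
    c = [y for _, g, y in rows if not g]
    return sum(t) / len(t) - sum(c) / len(c)


def severity_adjusted_diff(rows):
    """Within-stratum differences on 'severe', weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        total += mean_diff(stratum) * len(stratum) / len(rows)
    return total


obs = simulate_patients(20000, randomized=False, seed=7)
rct = simulate_patients(20000, randomized=True, seed=7)
print("observational, crude:", round(mean_diff(obs), 2))
print("observational, adjusted for severity:", round(severity_adjusted_diff(obs), 2))
print("randomized, crude:", round(mean_diff(rct), 2))
```

In this toy model the crude observational comparison makes the beneficial treatment look harmful (its expected value is -0.2). Adjusting for the recorded severity moves the estimate toward the truth but leaves residual confounding (about +0.34 in expectation) because frailty is unrecorded, while the randomized comparison recovers the assumed effect of +1.0 up to sampling noise.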

Confounding variables may also be categorised according to their source: the choice of measurement instrument (operational confound), situational characteristics (procedural confound), or inter-individual differences (person confound).

An operational confound is a type of confound that can occur in both experimental and nonexperimental research designs. This type of confound occurs when a measure designed to assess a particular construct inadvertently measures something else as well.^[3]

A procedural confound is a type of confound that can occur in a laboratory experiment or a quasi-experiment. This type of confound occurs when the researcher mistakenly allows another variable to change along with the manipulated independent variable.^[3]

See also

Anecdotal evidence

Joint effect

Simpson's paradox

Procedural confound

Operational confound

References

[1] Mayrent, Sherry L. (1987). Epidemiology in Medicine. Lippincott Williams & Wilkins. ISBN 0-316-35636-0.

[2] Johnston, S. C. (2001). "Identifying Confounding by Indication through Blinded Prospective Review". American Journal of Epidemiology 154: 276–84.

[3] Pelham, Brett (2006). Conducting Research in Psychology. Belmont: Wadsworth Publishing. ISBN 0534532942.

External links

These sites contain descriptions or examples of confounding variables:

Linear Regression (Yale University)

Scatterplots (Simon Fraser University)

Pearl, J. "Why there is no statistical test for confounding, why many think there is, and why they are almost right," UCLA Computer Science Department, Technical Report R-256, January 1998

Tutorial by University of New England

This textbook gives an overview of confounding factors and of how to account for them in the design of experiments:

Design and Analysis of Experiments, D. C. Montgomery, see Section 7-3 in 6th edition (2005, John Wiley & Sons)
