Statistical power

The power of a statistical test is the probability that the test will reject a false null hypothesis (that is, that it will not make a Type II error). As power increases, the chance of a Type II error decreases. The probability of a Type II error is referred to as the false negative rate (β). Therefore power is equal to 1 − β.
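As a rough illustration of the relation power = 1 − β, the sketch below computes the power of a two-sided one-sample z-test with a known standard deviation. The function name `z_test_power`, the standardized effect size of 0.5 and the sample size of 32 are assumptions chosen for the example and do not come from the text.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect_size is the true mean shift in standard-deviation units
    (an illustrative parameterization that assumes sigma is known).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection threshold for |Z|
    shift = effect_size * n ** 0.5               # mean of Z when the null is false
    upper = 1 - std_normal.cdf(z_crit - shift)   # P(Z > z_crit) under the alternative
    lower = std_normal.cdf(-z_crit - shift)      # P(Z < -z_crit) under the alternative
    return upper + lower

power = z_test_power(effect_size=0.5, n=32)
beta = 1 - power   # probability of a Type II error
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

The z-test is used here only because it needs nothing beyond the standard library; a t-test would require the noncentral t distribution.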

Power analysis can be done either before ("a priori") or after ("post hoc") data are collected. A priori power analysis is conducted before the research study, and is typically used to determine the sample size needed to achieve adequate power. Post hoc power analysis is conducted after a study has been completed, and uses the obtained sample size and effect size to determine what the power was in the study, assuming the effect size in the sample is equal to the effect size in the population.
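The two kinds of analysis can be sketched for a comparison of two group means. The example uses a normal approximation with equal group sizes and a known common standard deviation; the function names and the effect size of 0.5 are hypothetical choices for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def a_priori_n_per_group(effect_size, power=0.80, alpha=0.05):
    """Sample size per group needed to reach the target power (two-sided test)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

def post_hoc_power(observed_effect_size, n_per_group, alpha=0.05):
    """Power implied by the obtained sample size, treating the observed
    effect size as if it were the population effect size."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(observed_effect_size) * sqrt(n_per_group / 2)
    return (1 - STD_NORMAL.cdf(z_alpha - shift)) + STD_NORMAL.cdf(-z_alpha - shift)

n_needed = a_priori_n_per_group(effect_size=0.5)
achieved = post_hoc_power(observed_effect_size=0.5, n_per_group=n_needed)
print(f"a priori: n per group = {n_needed}")
print(f"post hoc: power at that n = {achieved:.3f}")
```

Because this is a normal approximation, a t-based calculation will typically ask for slightly more participants per group.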

Statistical tests use data from samples to determine if differences or similarities exist in a population. For example, to test the null hypothesis that the mean scores of men and women on a test do not differ, samples of men and women are drawn, the test is administered to them, and the mean score of one group is compared to that of the other group using a statistical test. The power of the test is the probability that the test will find a statistically significant difference between men and women, as a function of the size of the true difference between those two populations. Although random samples tend to mirror the population, owing to mathematical properties such as the central limit theorem, there is always a chance that the samples will appear to support or refute a tested hypothesis when the reality is the opposite. These risks are quantified by the significance level of the test (the probability of a Type I error) and by its power (one minus the probability of a Type II error).
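The two-group example can be explored by simulation: draw many pairs of samples, test each pair, and count how often the null hypothesis is rejected. The sketch below is a minimal Monte Carlo estimate under stated assumptions (normally distributed scores, a known standard deviation of 1 in both groups, a z-test rather than a t-test); the group size of 63 and the true difference of 0.5 are illustrative values.

```python
import random
from statistics import NormalDist, fmean

def simulated_power(true_difference, n_per_group, alpha=0.05,
                    n_simulations=2000, seed=0):
    """Fraction of simulated studies in which a two-sided z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (2 / n_per_group) ** 0.5   # both groups have sd = 1
    rejections = 0
    for _ in range(n_simulations):
        men = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        women = [rng.gauss(true_difference, 1.0) for _ in range(n_per_group)]
        z = (fmean(women) - fmean(men)) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_simulations

# Rejection rate when a true difference exists (an estimate of power)
estimated_power = simulated_power(true_difference=0.5, n_per_group=63)
# Rejection rate when the null hypothesis is true (an estimate of the Type I error rate)
false_positive_rate = simulated_power(true_difference=0.0, n_per_group=63)
print(f"estimated power: {estimated_power:.3f}")
print(f"estimated Type I error rate: {false_positive_rate:.3f}")
```

Setting the true difference to zero turns the same procedure into an estimate of the Type I error rate, which should land near the chosen significance level.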

Statistical power depends on:
* the statistical significance criterion used in the test
* the size of the difference or the strength of the similarity (that is, the effect size) in the population
* the sensitivity of the data.

A significance criterion is a statement of how unlikely a result must be, if the null hypothesis is true, to be considered significant. The most commonly used criteria are probabilities of 0.05 (5%, 1 in 20), 0.01 (1%, 1 in 100), and 0.001 (0.1%, 1 in 1000). If the criterion is 0.05, the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true, must be less than 0.05, and so on. One way to increase the power of a test is to increase (that is, weaken) the significance level. This increases the chance of obtaining a statistically significant result (rejecting the null hypothesis) when the null hypothesis is false; that is, it reduces the risk of a Type II error. But it also increases the risk of obtaining a statistically significant result when the null hypothesis is in fact true; that is, it increases the risk of a Type I error.
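The trade-off between the two error types can be made concrete by holding the effect size and sample size fixed and varying only the criterion. This is a sketch for a two-sided one-sample z-test; the function name, the effect size of 0.5 and the sample size of 32 are assumptions made for the example.

```python
from statistics import NormalDist

def power_two_sided_z(effect_size, n, alpha):
    """Power of a two-sided one-sample z-test with known standard deviation."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

criteria = (0.05, 0.01, 0.001)
powers = {alpha: power_two_sided_z(0.5, 32, alpha) for alpha in criteria}

for alpha, power in powers.items():
    # alpha is the Type I error risk; 1 - power is the Type II error risk
    print(f"alpha = {alpha:<6} power = {power:.3f}  Type II risk = {1 - power:.3f}")
```

A stricter criterion lowers the Type I risk but, with everything else fixed, raises the Type II risk.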

Calculating the power requires first specifying the effect size you want to detect. The greater the effect size, the greater the power.

Sensitivity can be increased by using statistical controls, by increasing the reliability of measures (as in psychometric reliability), and by increasing the size of the sample. Increasing sample size is the most commonly used method for increasing statistical power.
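The effect-size and sample-size relationships described above can be tabulated for a two-group comparison. The sketch uses a normal approximation with equal group sizes; the effect sizes 0.2, 0.5 and 0.8 follow Cohen's conventional small, medium and large benchmarks, and the sample sizes are arbitrary illustrative values.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_group_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two group means."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

effect_sizes = (0.2, 0.5, 0.8)
sample_sizes = (20, 50, 100, 200)

# One row per effect size, one column per sample size
power_table = {d: [two_group_power(d, n) for n in sample_sizes]
               for d in effect_sizes}

print("d    " + "".join(f"n={n:<6}" for n in sample_sizes))
for d, row in power_table.items():
    print(f"{d:<5}" + "".join(f"{p:<8.2f}" for p in row))
```

Reading across a row shows power rising with sample size; reading down a column shows it rising with effect size.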

Although there are no formal standards for power, most researchers who assess the power of their tests use 0.80 as a standard for adequacy.

A common misconception among those new to statistical power is that power is a property of a study or experiment. In reality, any statistical result that has a p-value has an associated power. For example, in a single multiple regression there will be a different level of statistical power associated with the overall R-squared and with each of the regression coefficients. When determining an appropriate sample size for a planned study, it is important to consider that power will vary across the different hypotheses.

There are times when the recommendations of power analysis regarding sample size will be inadequate. Power analysis is appropriate when the concern is with the correct acceptance or rejection of a null hypothesis. In many contexts, the issue is less about determining whether or not there is a difference and more about getting a refined estimate of the population effect size. For example, if we were expecting a population correlation between intelligence and job performance of around .50, a sample size of about 30 will give us approximately 80% power (alpha = .05, two-tailed). However, in doing this study we are probably more interested in knowing whether the correlation is .30 or .60 or .50. In this context we would need a much larger sample size in order to narrow the confidence interval of our estimate to a range that is acceptable for our purposes. These and other considerations often result in the recommendation that when it comes to sample size, "More is better!"
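The gap between detecting a correlation and estimating it precisely can be sketched with the Fisher z transformation. This is an approximation (exact methods give slightly different figures), and the function names and the list of sample sizes are hypothetical choices for illustration.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

STD_NORMAL = NormalDist()

def correlation_power(rho, n, alpha=0.05):
    """Approximate power of a two-sided test of rho = 0 (Fisher z)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = atanh(rho) * sqrt(n - 3)
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

def correlation_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a sample correlation r (Fisher z)."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + confidence / 2)
    half_width = z_crit / sqrt(n - 3)
    center = atanh(r)
    return tanh(center - half_width), tanh(center + half_width)

for n in (20, 30, 100, 400):
    low, high = correlation_ci(0.5, n)
    print(f"n = {n:<4} power = {correlation_power(0.5, n):.2f}  "
          f"95% CI for r = .50: [{low:.2f}, {high:.2f}]")
```

Under this approximation the interval around an observed correlation of .50 remains wide at sample sizes that already give respectable power, and narrows only slowly as the sample grows.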

Funding agencies, ethics boards and research review panels frequently request that a researcher perform a power analysis. The argument is that if a study is inadequately powered, there is no point in completing the research.

See also

* Effect size
* Sample size
* Neyman-Pearson lemma

External links

* [http://www.indiana.edu/~statmath/stat/all/power/power.pdf Hypothesis Testing and Statistical Power of a Test]
* [http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/ G*Power – A free program for Statistical Power Analysis]
* [http://cran.r-project.org/web/packages/pwr/index.html R/Splus package of power analysis functions along the lines of Cohen (1988)]
* [http://www.danielsoper.com/statcalc/calc01.aspx Free A-priori Sample Size Calculator for Multiple Regression] from Daniel Soper's "Free Statistics Calculators" website. Computes the minimum required sample size for a study, given the alpha level, the number of predictors, the anticipated effect size, and the desired statistical power level.

References

* Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. (2nd ed.) 1988. ISBN 0-8058-0283-5.

Wikimedia Foundation. 2010.
