Fisher information
In statistics and information theory, the Fisher information (denoted $\mathcal{I}(\theta)$) is the variance of the score. It is named in honor of its inventor, the statistician R. A. Fisher.
Definition
The Fisher information is a way of measuring the amount of information that an observable random variable $X$ carries about an unknown parameter $\theta$ upon which the likelihood function of $\theta$, $L(\theta) = f(X;\theta)$, depends. The likelihood function is the joint probability of the data, the $X$s, conditional on the value of $\theta$, viewed as a function of $\theta$. Since the expectation of the score is zero, the variance is simply the second moment of the score, the derivative of the logarithm of the likelihood function with respect to $\theta$. Hence the Fisher information can be written:
$$\mathcal{I}(\theta) = \mathrm{E}\left[\left.\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^2\right|\theta\right] = \int \left(\frac{\partial}{\partial\theta}\ln f(x;\theta)\right)^2 f(x;\theta)\,dx,$$
which implies $\mathcal{I}(\theta) \ge 0$. The Fisher information is thus the expectation of the squared score. High Fisher information means that the absolute value of the score is often large.
The Fisher information is not a function of a particular observation, as the random variable "X" has been averaged out. The concept of information is useful when comparing two methods of observing a given random process.
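As a minimal illustration (a hypothetical example, not taken from the cited sources), the first two moments of the score can be computed exactly for a single Bernoulli observation with success probability $\theta$, where $\ln f(x;\theta) = x\ln\theta + (1-x)\ln(1-\theta)$. The sketch below, whose function names are hypothetical, sums over the two outcomes $x \in \{0, 1\}$ and should reproduce a zero mean score and $\mathcal{I}(\theta) = 1/(\theta(1-\theta))$.
```python
def bernoulli_score(x: int, theta: float) -> float:
    """Derivative in theta of ln f(x; theta) = x ln(theta) + (1 - x) ln(1 - theta)."""
    return x / theta - (1 - x) / (1 - theta)


def score_moments(theta: float) -> tuple[float, float]:
    """Return (E[score], E[score^2]) by summing over the outcomes x in {0, 1}."""
    pmf = {1: theta, 0: 1 - theta}
    mean_score = sum(p * bernoulli_score(x, theta) for x, p in pmf.items())
    second_moment = sum(p * bernoulli_score(x, theta) ** 2 for x, p in pmf.items())
    return mean_score, second_moment


theta = 0.3
mean_score, info = score_moments(theta)
# mean_score should be (numerically) zero; info should match 1 / (theta * (1 - theta)).
print(mean_score, info, 1 / (theta * (1 - theta)))
```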
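If the following regularity condition is met:
$$\frac{\partial^2}{\partial\theta^2}\int f(x;\theta)\,dx = \int \frac{\partial^2}{\partial\theta^2} f(x;\theta)\,dx,$$
then the Fisher information may also be written as:
$$\mathcal{I}(\theta) = -\mathrm{E}\left[\left.\frac{\partial^2}{\partial\theta^2}\ln f(X;\theta)\right|\theta\right].$$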
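The agreement between the squared-score form and the second-derivative form can be checked numerically for a model that satisfies the regularity condition. The sketch below assumes a single Poisson observation with mean $\lambda$ (playing the role of $\theta$), for which $\ln f(k;\lambda) = k\ln\lambda - \lambda - \ln k!$ and both forms should equal $1/\lambda$. The truncation point `k_max` is an arbitrary choice, and all names are hypothetical.
```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    # Evaluated in log space so that large k does not overflow k!.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def two_forms_of_information(lam: float, k_max: int = 200) -> tuple[float, float]:
    """Compare E[score^2] with -E[second derivative] for one Poisson(lam) draw.

    score             = k / lam - 1
    second derivative = -k / lam**2
    The infinite sums are truncated at k_max (an assumption of this sketch).
    """
    via_score = 0.0
    via_curvature = 0.0
    for k in range(k_max + 1):
        p = poisson_pmf(k, lam)
        via_score += p * (k / lam - 1) ** 2
        via_curvature += p * (k / lam**2)  # minus the (negative) second derivative
    return via_score, via_curvature


lam = 3.0
via_score, via_curvature = two_forms_of_information(lam)
print(via_score, via_curvature, 1 / lam)
```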
Thus the Fisher information is the negative of the expectation of the second derivative of $\ln f$ with respect to $\theta$. Information may therefore be seen as a measure of the "sharpness" of the support curve near the maximum likelihood estimate of $\theta$. A "blunt" support curve (one with a shallow maximum) has an expected second derivative of small magnitude, and thus low information, while a sharp one has an expected second derivative of large magnitude and thus high information.
Information is additive, in that the information yielded by two independent experiments is the sum of the information from each experiment separately:
$$\mathcal{I}_{X,Y}(\theta) = \mathcal{I}_X(\theta) + \mathcal{I}_Y(\theta).$$
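This result follows from the elementary fact that if random variables are independent, the variance of their sum is the sum of their variances: the log-likelihood of independent experiments is a sum, so their joint score is the sum of their individual scores. Hence the information in a random sample of size $n$ is $n$ times that in a sample of size 1, provided the observations are independent and identically distributed.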
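Additivity can be illustrated by brute force. Assuming $n$ independent Bernoulli trials (a hypothetical example), the sketch below sums the squared joint score over all $2^n$ outcomes and compares the result with $n$ times the single-trial information. Enumeration is only practical for small $n$, and the function names are hypothetical.
```python
from itertools import product


def single_trial_information(theta: float) -> float:
    # E[score^2] for one Bernoulli trial, with score(x) = x/theta - (1-x)/(1-theta).
    return sum(
        p * (x / theta - (1 - x) / (1 - theta)) ** 2
        for x, p in ((1, theta), (0, 1 - theta))
    )


def joint_information(theta: float, n: int) -> float:
    """E[(joint score)^2] for n independent trials, summing over all 2**n outcomes."""
    total = 0.0
    for outcome in product((0, 1), repeat=n):
        successes = sum(outcome)
        prob = theta**successes * (1 - theta) ** (n - successes)
        # The joint log-likelihood is a sum, so the joint score is a sum of scores.
        joint_score = sum(x / theta - (1 - x) / (1 - theta) for x in outcome)
        total += prob * joint_score**2
    return total


theta, n = 0.3, 5
print(joint_information(theta, n), n * single_trial_information(theta))
```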
The information provided by a sufficient statistic is the same as that of the sample $X$. This may be seen by using Neyman's factorization criterion for a sufficient statistic. If $T(X)$ is sufficient for $\theta$, then
$$f(X;\theta) = g(T(X);\theta)\,h(X)$$
for some functions $g$ and $h$. See sufficient statistic for a more detailed explanation. The equality of information then follows from the following fact:
$$\frac{\partial}{\partial\theta}\ln f(X;\theta) = \frac{\partial}{\partial\theta}\ln g(T(X);\theta),$$
which follows from the definition of Fisher information and the independence of $h(X)$ from $\theta$. More generally, if $T = t(X)$ is a statistic, then
$$\mathcal{I}_T(\theta) \le \mathcal{I}_X(\theta),$$
with equality if and only if $T$ is a sufficient statistic.
The Cramér–Rao inequality states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of $\theta$.
Informal derivation
Van Trees (1968) and Frieden (2004) provide the following method of deriving the Fisher information informally:
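Consider an unbiased estimator $\hat\theta(X)$. Mathematically, we write:
$$\mathrm{E}\left[\hat\theta(X) - \theta\right] = \int \left[\hat\theta(x) - \theta\right] f(x;\theta)\,dx = 0.$$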
The likelihood function $f(X;\theta)$ describes the probability that we observe a given sample $X$ given a known value of $\theta$. If $f$ is sharply peaked with respect to changes in $\theta$, it is easy to intuit the "correct" value of $\theta$ given the data, and hence the data contain a lot of information about the parameter. If the likelihood is flat and spread out, then it would take many samples of $X$ to estimate the actual "true" value of $\theta$. Therefore, we would intuit that the data contain much less information about the parameter.
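Now, given the unbiasedness condition above, we differentiate it with respect to $\theta$ to get
$$\frac{\partial}{\partial\theta}\int \left[\hat\theta(x) - \theta\right] f(x;\theta)\,dx = \int \left[\hat\theta(x) - \theta\right]\frac{\partial f}{\partial\theta}\,dx - \int f\,dx = 0.$$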
We now make use of two facts. The first is that the likelihood is just the probability of the data given the parameter. Since it is a probability, it must be normalized, implying that
$$\int f\,dx = 1.$$
Second, we know from basic calculus that
$$\frac{\partial f}{\partial\theta} = f\,\frac{\partial \ln f}{\partial\theta}.$$
Using these two facts in the above lets us write
$$\int \left[\hat\theta - \theta\right] f\,\frac{\partial \ln f}{\partial\theta}\,dx = 1.$$
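In the discrete case the integrals above become sums, and this identity can be checked directly. As a hypothetical example, assume a single Poisson observation $X$ with mean $\lambda$ and the unbiased estimator $\hat\theta(X) = X$. The sketch below accumulates the normalization, the bias, and the left-hand side of the identity, which should come out as 1, 0 and 1 respectively, up to truncation of the infinite sum at the arbitrarily chosen `k_max`.
```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    # Evaluated in log space so that large k does not overflow k!.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def check_identity(lam: float, k_max: int = 200) -> tuple[float, float, float]:
    total_prob = 0.0  # sum of f: should be 1 (normalization)
    bias = 0.0        # E[theta_hat - theta]: should be 0 (unbiasedness)
    identity = 0.0    # sum of (theta_hat - theta) * f * d ln f / d theta: should be 1
    for k in range(k_max + 1):
        p = poisson_pmf(k, lam)
        estimate_error = k - lam   # theta_hat(k) = k is unbiased for lam
        score = k / lam - 1        # d ln f / d lam
        total_prob += p
        bias += p * estimate_error
        identity += p * estimate_error * score
    return total_prob, bias, identity


total_prob, bias, identity = check_identity(3.0)
print(total_prob, bias, identity)
```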
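Factoring the integrand gives
$$\int \left(\left[\hat\theta - \theta\right]\sqrt{f}\right)\left(\sqrt{f}\,\frac{\partial \ln f}{\partial\theta}\right)dx = 1.$$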
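If we square the equation, the Cauchy–Schwarz inequality lets us write
$$1 = \left(\int \left[\left(\hat\theta - \theta\right)\sqrt{f}\right]\left[\sqrt{f}\,\frac{\partial \ln f}{\partial\theta}\right]dx\right)^2 \le \left[\int \left(\hat\theta - \theta\right)^2 f\,dx\right]\cdot\left[\int \left(\frac{\partial \ln f}{\partial\theta}\right)^2 f\,dx\right].$$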
The right-most factor is defined to be the Fisher information
$$\mathcal{I}(\theta) = \int \left(\frac{\partial \ln f}{\partial\theta}\right)^2 f\,dx.$$
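The left-most factor is the mean-squared error of the estimator $\hat\theta$ (which, because $\hat\theta$ is unbiased, is also its variance), since
$$\mathrm{E}\left[\left(\hat\theta(X) - \theta\right)^2\right] = \int \left(\hat\theta(x) - \theta\right)^2 f(x;\theta)\,dx.$$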
Notice that the inequality tells us that, fundamentally,
$$\mathrm{E}\left[\left(\hat\theta(X) - \theta\right)^2\right] \ge \frac{1}{\mathcal{I}(\theta)}.$$
In other words, the precision to which we can estimate $\theta$ is fundamentally limited by the Fisher information of the likelihood function.
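The bound can be compared with exact mean-squared errors in a small discrete example. The sketch below assumes $n$ independent Bernoulli trials and two hypothetical unbiased estimators of $\theta$: the sample proportion $A/n$ and the first trial alone. Enumerating all outcomes, the first estimator should attain $1/\mathcal{I}(\theta)$, while the second, which discards data, should lie above it.
```python
from itertools import product


def crb_check(theta: float, n: int) -> tuple[float, float, float]:
    """Return (1 / I(theta), MSE of A/n, MSE of the first trial) by exact enumeration."""
    info = 0.0
    mse_mean = 0.0
    mse_first = 0.0
    for outcome in product((0, 1), repeat=n):
        a = sum(outcome)  # number of successes A
        prob = theta**a * (1 - theta) ** (n - a)
        score = a / theta - (n - a) / (1 - theta)  # joint score d ln f / d theta
        info += prob * score**2
        mse_mean += prob * (a / n - theta) ** 2        # estimator A/n
        mse_first += prob * (outcome[0] - theta) ** 2  # estimator X_1 (first trial only)
    return 1 / info, mse_mean, mse_first


bound, mse_mean, mse_first = crb_check(0.3, 4)
print(bound, mse_mean, mse_first)
```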
Single-parameter Bernoulli experiment
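A Bernoulli trial is a random variable with two possible outcomes, "success" and "failure", with "success" having a probability of $\theta$. The outcome can be thought of as determined by a coin toss, with the probability of obtaining a "head" being $\theta$ and the probability of obtaining a "tail" being $1 - \theta$.
The Fisher information contained in $n$ independent Bernoulli trials may be calculated as follows. In the following, $A$ represents the number of successes, $B$ the number of failures, and $n = A + B$ is the total number of trials.
$$
\begin{aligned}
\mathcal{I}(\theta)
&= -\mathrm{E}\left[\left.\frac{\partial^2}{\partial\theta^2}\ln f(A;\theta)\right|\theta\right] && (1)\\
&= -\mathrm{E}\left[\left.\frac{\partial^2}{\partial\theta^2}\ln\left(\theta^A(1-\theta)^B\frac{(A+B)!}{A!\,B!}\right)\right|\theta\right] && (2)\\
&= -\mathrm{E}\left[\left.\frac{\partial^2}{\partial\theta^2}\left(A\ln\theta + B\ln(1-\theta)\right)\right|\theta\right] && (3)\\
&= -\mathrm{E}\left[\left.\frac{\partial}{\partial\theta}\left(\frac{A}{\theta} - \frac{B}{1-\theta}\right)\right|\theta\right] && (4)\\
&= +\mathrm{E}\left[\left.\frac{A}{\theta^2} + \frac{B}{(1-\theta)^2}\right|\theta\right] && (5)\\
&= \frac{n\theta}{\theta^2} + \frac{n(1-\theta)}{(1-\theta)^2} && (6)\\
&= \frac{n}{\theta(1-\theta)} && (7)
\end{aligned}
$$
(1) defines Fisher information. (2) invokes the fact that the information in a sufficient statistic is the same as that of the sample itself. (3) expands the log term and drops a constant. (4) and (5) differentiate with respect to $\theta$, using $\frac{d}{dx}\ln x = 1/x$ (see logarithm). (6) replaces $A$ and $B$ with their expectations, $\mathrm{E}[A] = n\theta$ and $\mathrm{E}[B] = n(1-\theta)$. (7) is algebra.
The end result, namely,
$$\mathcal{I}(\theta) = \frac{n}{\theta(1-\theta)},$$
is the reciprocal of the variance of the proportion of successes $A/n$ in $n$ Bernoulli trials, $\operatorname{Var}(A/n) = \theta(1-\theta)/n$. This is as expected from the last sentence of the preceding section: $A/n$ is an unbiased estimator of $\theta$ whose variance attains the bound $1/\mathcal{I}(\theta)$.
Matrix form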
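When there are $N$ parameters, so that $\theta$ is an $N \times 1$ vector $\theta = \begin{bmatrix}\theta_1 & \theta_2 & \cdots & \theta_N\end{bmatrix}^{\mathrm T}$, the Fisher information takes the form of an $N \times N$ matrix, the Fisher information matrix (FIM), with typical element:
$$\mathcal{I}_{m,n} = \mathrm{E}\left[\left.\frac{\partial}{\partial\theta_m}\ln f(X;\theta)\,\frac{\partial}{\partial\theta_n}\ln f(X;\theta)\right|\theta\right].$$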
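As a worked illustration (assumed for this sketch, not taken from the sources), the FIM of a normal distribution with $\theta = (\mu, \sigma)$ can be approximated by integrating the outer product of the score vector against the density. The analytic result is $\operatorname{diag}(1/\sigma^2,\, 2/\sigma^2)$; its zero off-diagonal entry is an instance of the orthogonal parameters discussed below. The midpoint rule, integration range and step count are arbitrary choices, and the function name is hypothetical.
```python
import math


def normal_fim(mu: float, sigma: float, steps: int = 20000, width: float = 12.0):
    """Approximate E[s s^T] for N(mu, sigma^2) with theta = (mu, sigma).

    Uses a midpoint rule on [mu - width*sigma, mu + width*sigma].
    """
    lo = mu - width * sigma
    h = 2 * width * sigma / steps
    fim = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(steps):
        x = lo + (i + 0.5) * h
        density = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        score = [
            (x - mu) / sigma**2,                    # d ln f / d mu
            -1 / sigma + (x - mu) ** 2 / sigma**3,  # d ln f / d sigma
        ]
        for m in range(2):
            for k in range(2):
                fim[m][k] += score[m] * score[k] * density * h
    return fim


fim = normal_fim(1.0, 2.0)
print(fim)  # should be close to [[1/4, 0], [0, 1/2]] for sigma = 2
```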
The FIM is an $N \times N$ symmetric positive semidefinite matrix; when it is positive definite, it defines a metric on the $N$-dimensional parameter space. Exploring this topic requires differential geometry.
Orthogonal parameters
We say that two parameters $\theta_i$ and $\theta_j$ are orthogonal if the element in the $i$-th row and $j$-th column of the Fisher information matrix is zero. Orthogonal parameters are easy to deal with in the sense that their maximum likelihood estimates are asymptotically independent and can be calculated separately. When dealing with research problems, it is very common for the researcher to invest some time searching for an orthogonal parametrization of the densities involved in the problem.
Multivariate normal distribution
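The FIM for an $N$-variate multivariate normal distribution has a special form. Let $\mu(\theta) = \begin{bmatrix}\mu_1(\theta) & \mu_2(\theta) & \cdots & \mu_N(\theta)\end{bmatrix}^{\mathrm T}$ and let $\Sigma(\theta)$ be the covariance matrix. Then the typical element $\mathcal{I}_{m,n}$, $1 \le m, n \le N$, of the FIM for $X \sim N(\mu(\theta), \Sigma(\theta))$ is:
$$\mathcal{I}_{m,n} = \frac{\partial\mu^{\mathrm T}}{\partial\theta_m}\Sigma^{-1}\frac{\partial\mu}{\partial\theta_n} + \frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_m}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_n}\right),$$
where $(\cdot)^{\mathrm T}$ denotes the transpose of a vector, $\operatorname{tr}(\cdot)$ denotes the trace of a square matrix, and:
* $\frac{\partial\mu}{\partial\theta_m} = \begin{bmatrix}\frac{\partial\mu_1}{\partial\theta_m} & \frac{\partial\mu_2}{\partial\theta_m} & \cdots & \frac{\partial\mu_N}{\partial\theta_m}\end{bmatrix}^{\mathrm T}$;
* $\frac{\partial\Sigma}{\partial\theta_m}$ is the matrix whose $(i,j)$ entry is $\frac{\partial\Sigma_{i,j}}{\partial\theta_m}$.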
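The closed-form expression translates directly into matrix code. The sketch below uses NumPy and takes the derivatives $\partial\mu/\partial\theta_m$ and $\partial\Sigma/\partial\theta_m$ as inputs, so they must be supplied analytically; note that $\mu$ itself does not enter the formula. The function name and the example parametrization (two independent coordinates with common mean $\theta_1$ and common variance $\theta_2$) are hypothetical. For that example the expected FIM is $\operatorname{diag}(2/\theta_2,\, 1/\theta_2^2)$.
```python
import numpy as np


def gaussian_fim(dmu, dsigma, sigma):
    """FIM of X ~ N(mu(theta), Sigma(theta)) from the closed-form expression.

    dmu[m]    : 1-D array, d mu / d theta_m
    dsigma[m] : 2-D array, d Sigma / d theta_m
    sigma     : covariance matrix Sigma evaluated at theta
    """
    sigma_inv = np.linalg.inv(sigma)
    p = len(dmu)
    fim = np.empty((p, p))
    for m in range(p):
        for n in range(p):
            mean_term = dmu[m] @ sigma_inv @ dmu[n]
            cov_term = 0.5 * np.trace(sigma_inv @ dsigma[m] @ sigma_inv @ dsigma[n])
            fim[m, n] = mean_term + cov_term
    return fim


# Hypothetical example: mu = (theta_1, theta_1) and Sigma = theta_2 * I, with theta_2 = 3.
theta2 = 3.0
dmu = [np.array([1.0, 1.0]), np.zeros(2)]   # d mu / d theta_1, d mu / d theta_2
dsigma = [np.zeros((2, 2)), np.eye(2)]      # d Sigma / d theta_1, d Sigma / d theta_2
fim = gaussian_fim(dmu, dsigma, theta2 * np.eye(2))
print(fim)
```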
Properties
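The Fisher information depends on the parametrization of the problem. If $\theta$ and $\eta$ are two different parametrizations of a problem, such that $\theta = h(\eta)$ and $h$ is a differentiable function, then
$$\mathcal{I}_\eta(\eta) = \mathcal{I}_\theta(h(\eta))\,\left(h'(\eta)\right)^2,$$
where $\mathcal{I}_\eta$ and $\mathcal{I}_\theta$ are the Fisher information measures of $\eta$ and $\theta$, respectively. [Lehmann and Casella, eq. (5.2.11).]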
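As a check of the reparametrization rule, consider (as an assumed example) a single Bernoulli trial written in terms of the log-odds $\eta$, so that $\theta = h(\eta) = 1/(1+e^{-\eta})$ and $h'(\eta) = \theta(1-\theta)$. The sketch below, with hypothetical function names, computes $\mathcal{I}_\eta(\eta)$ once directly from the score $x - h(\eta)$ and once via the rule; both should equal $\theta(1-\theta)$.
```python
import math


def info_theta(theta: float) -> float:
    # Fisher information of one Bernoulli trial in the success probability theta.
    return 1 / (theta * (1 - theta))


def info_eta_direct(eta: float) -> float:
    """E[(d ln f / d eta)^2] with ln f(x; eta) = x * eta - ln(1 + exp(eta)).

    The score is x - h(eta); average its square over x = 1 (prob theta) and x = 0.
    """
    theta = 1 / (1 + math.exp(-eta))
    return theta * (1 - theta) ** 2 + (1 - theta) * (0 - theta) ** 2


def info_eta_via_rule(eta: float) -> float:
    theta = 1 / (1 + math.exp(-eta))
    h_prime = theta * (1 - theta)  # derivative of the logistic function h
    return info_theta(theta) * h_prime**2


eta = 0.8
print(info_eta_direct(eta), info_eta_via_rule(eta))
```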
See also
* Formation matrix
Other measures employed in information theory:
* Self-information
* Kullback–Leibler divergence
* Shannon entropy
References
* Schervish, Mark J. (1995). Theory of Statistics. New York: Springer. Section 2.3.1. ISBN 0387945466.
* Van Trees, H. L. (1968). Detection, Estimation, and Modulation Theory, Part I. New York: Wiley. ISBN 0471095176.
* Frieden, B. Roy (2004). Science from Fisher Information. New York: Cambridge University Press. pp. 29–30. ISBN 0521009111.
* Lehmann, E. L.; Casella, G. (1998). Theory of Point Estimation (2nd ed.). Springer. ISBN 0-387-98502-6.
Further weblinks
* James Case: [http://www.siam.org/pdf/news/659.pdf An Unexpected Union — Physics and Fisher Information] , SIAM News, Volume 33, Number 6 (a review of the book "Physics from Fisher Information: A Unification" by B. Roy Frieden)
* D. A. Lavis and R. F. Streater: [http://www.mth.kcl.ac.uk/~dlavis/papers/frieden.pdf Physics from Fisher Information] (a critical review of B. Roy Frieden's approach to deriving laws of physics from the Fisher information)
* [http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=20008&objectType=File Fisher4Cast: a Matlab, GUI-based Fisher information tool] for research and teaching, primarily aimed at cosmological forecasting applications.
Wikimedia Foundation. 2010.