Self-information
In information theory (elaborated by Claude E. Shannon, 1948), self-information is a measure of the information content associated with the outcome of a random variable. It is expressed in a unit of information, for example bits, nats, or hartleys (also known as digits, dits, or bans), depending on the base of the logarithm used in its definition.
By definition, the amount of self-information contained in a probabilistic event depends only on the probability of that event: the smaller its probability, the larger the self-information associated with receiving the information that the event indeed occurred. Further, by definition, the measure of self-information is additive over independent events: if an event "C" is composed of two mutually independent events "A" and "B", then the amount of information conveyed by the proclamation that "C" has happened equals the sum of the amounts of information conveyed by the proclamations of event "A" and of event "B".
Taking into account these properties, the self-information "I"(ω) (measured in bits) associated with an outcome ω of probability "P"(ω) is:
"I"(ω) = log2 (1/"P"(ω)) = −log2 ("P"(ω)).
This definition, using the binary logarithm function, complies with the above conditions: the less probable an outcome, the larger its self-information, and for mutually independent events "A" and "B" with "P"(A & B) = "P"(A)·"P"(B) one has "I"(A & B) = log2 (1/("P"(A)·"P"(B))) = log2 (1/"P"(A)) + log2 (1/"P"(B)) = "I"(A) + "I"(B).
In the above definition the logarithm of base 2 was used, and thus the unit of "I"(ω) is the bit. When using the logarithm of base e, the unit is the nat; for the logarithm of base 10, the unit is the hartley.
This measure has also been called surprisal, as it represents the "surprise" of seeing the outcome (a highly probable outcome is not surprising). The term was coined by Myron Tribus in his 1961 book "Thermostatics and Thermodynamics".
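As an illustration, the same quantity can be computed in any of the three units simply by changing the base of the logarithm. The following is a minimal Python sketch (the function name self_information is chosen here for illustration and is not part of any standard library):

    import math

    def self_information(p, base=2):
        """Return -log_base(p), the self-information of an event with probability p.
        base=2 gives bits, base=math.e gives nats, base=10 gives hartleys."""
        if not 0 < p <= 1:
            raise ValueError("p must be a probability in (0, 1]")
        return -math.log(p, base)

    print(self_information(0.5))            # 1.0 bit
    print(self_information(0.5, math.e))    # about 0.693 nats
    print(self_information(0.5, 10))        # about 0.301 hartleys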
The information entropy of a random event is the expected value of its self-information.
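For instance, the entropy of a discrete distribution can be computed as the probability-weighted average of the self-information of its outcomes (a Python sketch with made-up probabilities, purely for illustration):

    import math

    # Entropy (in bits) as the expected value of self-information,
    # for an illustrative three-outcome distribution.
    probabilities = [0.5, 0.25, 0.25]
    entropy = sum(p * -math.log2(p) for p in probabilities)
    print(entropy)   # 1.5 bits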
Self-information is an example of a proper scoring rule.
Examples
*On tossing a coin, the chance of 'tail' is 0.5. When it is proclaimed that 'tail' indeed occurred, this amounts to "I"('tail') = log2 (1/0.5) = log2 2 = 1 bit of information.
*When throwing a fair die, the probability of 'four' is 1/6. When it is proclaimed that 'four' has been thrown, the amount of self-information is "I"('four') = log2 (1/(1/6)) = log2 6 ≈ 2.585 bits.
*When, independently, two dice are thrown, the amount of information associated with {throw 1 = 'two' & throw 2 = 'four'} equals "I"('throw 1 is two & throw 2 is four') = log2 (1/P(throw 1 = 'two' & throw 2 = 'four')) = log2 (1/(1/36)) = log2 36 ≈ 5.170 bits. This equals the sum of the individual amounts of self-information associated with {throw 1 = 'two'} and {throw 2 = 'four'}, namely 2.585 + 2.585 = 5.170 bits (see the sketch after this list).
*Suppose that the average probability of finding survivors in a large evolving population is "P"; then, when a survivor has been found, the amount of self-information is −loge("P") nats (−log2("P") bits).
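The numbers in these examples can be checked with a short Python sketch (again purely illustrative; the helper self_information_bits is not a library function):

    import math

    def self_information_bits(p):
        """Self-information, in bits, of an event with probability p."""
        return -math.log2(p)

    print(self_information_bits(0.5))      # coin shows 'tail': 1.0 bit
    print(self_information_bits(1 / 6))    # die shows 'four': about 2.585 bits
    # Two independent dice: the self-information of the joint outcome
    # equals the sum of the individual self-informations.
    print(self_information_bits(1 / 36))                                 # about 5.170 bits
    print(self_information_bits(1 / 6) + self_information_bits(1 / 6))   # the same value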
References
*C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, vol. 27, pp. 379–423 (Part I), 1948.
External links
* [http://www.umsl.edu/~fraundor/egsurpri.html Examples of surprisal measures]
* [http://www.hum.uva.nl/mmm/abstracts/honing-2005f.html Towards a measure of surprise]
* [http://www.cmh.edu/stats/definitions/entropy.htm Entropy and surprisal]
* [http://www.lecb.ncifcrf.gov/~toms/glossary.html#surprisal "Surprisal" entry in a glossary of molecular information theory]
* [http://ilab.usc.edu/surprise/ Bayesian Theory of Surprise]