Multivariate mutual information

Multivariate mutual information

In information theory there have been various attempts over the years to extend the definition of mutual information to more than two random variables. These attempts have met with a great deal of confusion and a realization that interactions among many random variables are poorly understood.

Contents

Definition

The conditional mutual information can be used to inductively define a multivariate mutual information (MMI) in a set- or measure-theoretic sense in the context of information diagrams. In this sense we define the multivariate mutual information as follows:

I(X_1;\ldots;X_{n+1}) = I(X_1;\ldots;X_n) - I(X_1;\ldots;X_n|X_{n+1}),

where

I(X_1;\ldots;X_n|X_{n+1}) = \mathbb E_{X_{n+1}}\big(I(X_1;\ldots;X_n)|X_{n+1}\big).

This definition is identical to that of interaction information except for a change in sign in the case of an odd number of random variables.

Properties

Multi-variate information and conditional multi-variate information can be decomposed into a sum of entropies.

 I(X_1;\ldots;X_n) = -\sum_{ T \subseteq \{1,\ldots,n\}   }(-1)^{|T|}H(T)


 I(X_1;\ldots;X_n|Y) = -\sum_{T \subseteq \{1,\ldots,n\} } (-1)^{|T|} H(T|Y)

Example of Positive Multivariate mutual information

Positive MMI is typical of common-cause structures. For example, clouds cause rain and also block the sun; therefore, the correlation between rain and darkness is partly accounted for by the presence of clouds, I(rain;dark|cloud) \leq I(rain;dark). The result is positive MMI I(rain;dark;cloud).

Example of Negative Multivariate mutual information

Xor interaction.png

The case of negative MMI is infamously non-intuitive. A prototypical example of negative I(X;Y;Z) has X as the output of an XOR gate to which Y and Z are the independent random inputs. In this case I(Y;Z) will be zero, but I(Y;Z | X) will be positive (1 bit) since once output X is known, the value on input Y completely determines the value on input Z. Since I(Y;Z | X) > I(Y;Z), the result is negative MMI I(X;Y;Z). It may seem that this example relies on a peculiar ordering of X,Y,Z to obtain the positive interaction, but the symmetry of the definition for I(X;Y;Z) indicates that the same positive interaction information results regardless of which variable we consider as the interloper or conditioning variable. For example, input Y and output X are also independent until input Z is fixed, at which time they are totally dependent.

Common-effect.png

This situation is an instance where fixing the common effect X of causes Y and Z induces a dependency among the causes that did not formerly exist. This behavior is colloquially referred to as explaining away and is thoroughly discussed in the Bayesian Network literature (e.g., Pearl 1988}. Pearl's example is auto diagnostics: A car's engine can fail to start (X) due either to a dead battery (Y) or due to a blocked fuel pump (Z). Ordinarily, we assume that battery death and fuel pump blockage are independent events, because of the essential modularity of such automotive systems. Thus, in the absence of other information, knowing whether or not the battery is dead gives us no information about whether or not the fuel pump is blocked. However, if we happen to know that the car fails to start (i.e., we fix common effect X), this information induces a dependency between the two causes battery death and fuel blockage. Thus, knowing that the car fails to start, if an inspection shows the battery to be in good health, we conclude the fuel pump is blocked.

Battery death and fuel blockage are thus dependent, conditional on their common effect car starting. The obvious directionality in the common-effect graph belies a deep informational symmetry: If conditioning on a common effect increases the dependency between its two parent causes, then conditioning on one of the causes must create the same increase in dependency between the second cause and the common effect. In Pearl's automotive example, if conditioning on car starts induces I(X;Y;Z) bits of dependency between the two causes battery dead and fuel blocked, then conditioning on fuel blocked must induce I(X;Y;Z) bits of dependency between battery dead and car starts. This may seem odd because battery dead and car starts are governed by the implication battery dead \rightarrow car doesn't start. However, these variables are still not totally correlated because the converse is not true. Conditioning on fuel blocked removes the major alternate cause of failure to start, and strengthens the converse relation and therefore the association between battery dead and car starts.

Bounds

The bounds for the 3-variable case are


-min\ \{ I(X;Y|Z), I(Y;Z|X), I(X;Z|Y) \} \leq I(X;Y;Z) \leq min\ \{ I(X;Y), I(Y;Z), I(X;Z) \}


Difficulties

A complication is that this multivariate mutual information (as well as the interaction information) can be positive, negative, or zero, which makes this quantity difficult to interpret intuitively. In fact, for n random variables, there are 2n − 1 degrees of freedom for how they might be correlated in an information-theoretic sense, corresponding to each non-empty subset of these variables. These degrees of freedom are bounded by the various inequalities in information theory.


See also


Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

  • Mutual information — Individual (H(X),H(Y)), joint (H(X,Y)), and conditional entropies for a pair of correlated subsystems X,Y with mutual information I(X; Y). In probability theory and information theory, the mutual information (sometimes known by the archaic term… …   Wikipedia

  • Conditional mutual information — In probability theory, and in particular, information theory, the conditional mutual information is, in its most basic form, the expected value of the mutual information of two random variables given the value of a third. Contents 1 Definition 2… …   Wikipedia

  • Information theory and measure theory — Measures in information theory = Many of the formulas in information theory have separate versions for continuous and discrete cases, i.e. integrals for the continuous case and sums for the discrete case. These versions can often be generalized… …   Wikipedia

  • Information theory — Not to be confused with Information science. Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental… …   Wikipedia

  • Information bottleneck method — The information bottleneck method is a technique introduced by Tishby et al [1] for finding the best tradeoff between accuracy and complexity (compression) when summarizing (e.g. clustering) a random variable X, given a joint probability… …   Wikipedia

  • Interaction information — The interaction information (McGill 1954) or co information (Bell 2003) is one of several generalizations of the mutual information, and expresses the amount information (redundancy or synergy) bound up in a set of variables, beyond that which is …   Wikipedia

  • Total correlation — In probability theory and in particular in information theory, total correlation (Watanabe 1960) is one of several generalizations of the mutual information. It is also known as the multivariate constraint (Garner 1962) or multiinformation… …   Wikipedia

  • Cluster analysis — The result of a cluster analysis shown as the coloring of the squares into three clusters. Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more… …   Wikipedia

  • Granular computing — is an emerging computing paradigm of information processing. It concerns the processing of complex information entities called information granules, which arise in the process of data abstraction and derivation of knowledge from information.… …   Wikipedia

  • Chow-Liu tree — A first order dependency tree representing the product on the left. A Chow Liu tree is an efficient method for constructing a second order product approximation of a joint distribution, first described in a paper by Chow Liu (1968). The goals of… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”