Joint probability distribution

In the study of probability, given two random variables X and Y that are defined on the same probability space, the joint distribution for X and Y defines the probability of events defined in terms of both X and Y. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.

1 Example
2 Cumulative distribution
3 Discrete case
4 Continuous case
5 Mixed case
6 General multidimensional distributions
7 Joint distribution for independent variables
8 Joint distribution for conditionally independent variables
9 See also
10 External links

Example

Consider the roll of a fair die and let $A = 1$ if the number is even (i.e. 2, 4, or 6) and $A = 0$ otherwise. Furthermore, let $B = 1$ if the number is prime (i.e. 2, 3, or 5) and $B = 0$ otherwise. Then the joint distribution of $A$ and $B$ is

$\mathrm{P}(A=0,B=0)=P\{1\}=\frac{1}{6},\; \mathrm{P}(A=1,B=0)=P\{4,6\}=\frac{2}{6}$

$\mathrm{P}(A=0,B=1)=P\{3,5\}=\frac{2}{6},\; \mathrm{P}(A=1,B=1)=P\{2\}=\frac{1}{6}$
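The four probabilities above can be reproduced by enumerating the six equally likely faces. The following is a minimal Python sketch, assuming a fair die; the helper name `joint_pmf` is illustrative and not part of the original text.

```python
from collections import defaultdict
from fractions import Fraction


def joint_pmf():
    """Joint pmf of (A, B) for one roll of a six-sided die (assumed fair)."""
    pmf = defaultdict(Fraction)
    for face in range(1, 7):
        a = 1 if face % 2 == 0 else 0       # A = 1 when the face is even
        b = 1 if face in (2, 3, 5) else 0   # B = 1 when the face is prime
        pmf[(a, b)] += Fraction(1, 6)       # each face has probability 1/6
    return dict(pmf)


pmf = joint_pmf()
for (a, b), p in sorted(pmf.items()):
    print(f"P(A={a}, B={b}) = {p}")
```

Exact fractions are used so that the results can be compared with the values above without rounding.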

Cumulative distribution

The cumulative distribution function for a pair of random variables is defined in terms of their joint probability distribution:

$F(x,y)=P(X \le x, Y \le y) .$
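For a discrete pair, this definition amounts to summing the joint probabilities over all points with both coordinates at or below the given thresholds. The sketch below applies it to the variables $A$ and $B$ of the die example; the function name `joint_cdf` is an illustrative choice.

```python
from fractions import Fraction

# Joint pmf of (A, B) from the die example, keyed by (a, b).
pmf = {
    (0, 0): Fraction(1, 6),
    (1, 0): Fraction(2, 6),
    (0, 1): Fraction(2, 6),
    (1, 1): Fraction(1, 6),
}


def joint_cdf(x, y):
    """F(x, y) = P(A <= x, B <= y), summed over the pmf support."""
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)


print(joint_cdf(0, 0), joint_cdf(1, 0), joint_cdf(1, 1))
```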

Discrete case

The joint probability mass function of two discrete random variables is equal to

$\begin{align} \mathrm{P}(X=x\ \mathrm{and}\ Y=y) & {} = \mathrm{P}(Y=y \mid X=x) \cdot \mathrm{P}(X=x) \\ & {} = \mathrm{P}(X=x \mid Y=y) \cdot \mathrm{P}(Y=y). \end{align}$

In general, the joint probability distribution of $n$ discrete random variables $X_1, \dots, X_n$ is equal to

$\mathrm{P}(X_1=x_1,\dots,X_n=x_n)=\mathrm{P}(X_1=x_1)\cdot \mathrm{P}(X_2=x_2|X_1=x_1)\cdot \mathrm{P}(X_3=x_3|X_1=x_1,X_2=x_2) \cdot ... \cdot P(X_n=x_n|X_1=x_1,\dots,X_{n-1}=x_{n-1})$

This identity is known as the chain rule of probability.

Since these are probabilities, we have

$\sum_x \sum_y \mathrm{P}(X=x\ \mathrm{and}\ Y=y) = 1.\;$
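Both factorizations and the normalization condition can be checked numerically on a small table. The sketch below reuses the joint pmf of the die example; the helper names (`marginal_a`, `cond_b_given_a`, and so on) are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

# Joint pmf of (A, B) from the die example, keyed by (a, b).
pmf = {
    (0, 0): Fraction(1, 6),
    (1, 0): Fraction(2, 6),
    (0, 1): Fraction(2, 6),
    (1, 1): Fraction(1, 6),
}


def marginal_a(a):
    """P(A = a), obtained by summing the joint pmf over b."""
    return sum(p for (ai, _), p in pmf.items() if ai == a)


def marginal_b(b):
    """P(B = b), obtained by summing the joint pmf over a."""
    return sum(p for (_, bi), p in pmf.items() if bi == b)


def cond_b_given_a(b, a):
    """P(B = b | A = a) = P(A = a, B = b) / P(A = a)."""
    return pmf[(a, b)] / marginal_a(a)


def cond_a_given_b(a, b):
    """P(A = a | B = b) = P(A = a, B = b) / P(B = b)."""
    return pmf[(a, b)] / marginal_b(b)


# Each joint probability equals conditional times marginal, in either order.
factorizes = all(
    cond_b_given_a(b, a) * marginal_a(a) == p == cond_a_given_b(a, b) * marginal_b(b)
    for (a, b), p in pmf.items()
)
total = sum(pmf.values())  # normalization: should be exactly 1
print(factorizes, total)
```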

Continuous case

Similarly, for continuous random variables, the joint probability density function can be written as $f_{X,Y}(x, y)$, and it satisfies

$f_{X,Y}(x,y) = f_{Y|X}(y|x)f_X(x) = f_{X|Y}(x|y)f_Y(y)\;$

where $f_{Y|X}(y|x)$ and $f_{X|Y}(x|y)$ give the conditional distributions of $Y$ given $X = x$ and of $X$ given $Y = y$ respectively, and $f_X(x)$ and $f_Y(y)$ give the marginal distributions for $X$ and $Y$ respectively.

Again, since these are probability distributions, one has

$\int_x \int_y f_{X,Y}(x,y) \; dy \; dx= 1.$
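As a concrete illustration, take the density $f_{X,Y}(x,y) = x + y$ on the unit square. This density is an assumption chosen for the example and does not appear in the text above. Its marginal is $f_X(x) = x + \tfrac12$, so the factorization and the normalization can be checked with a simple midpoint rule:

```python
def f_xy(x, y):
    """Illustrative joint density f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def f_x(x):
    """Marginal density of X: the integral of (x + y) over y in [0, 1]."""
    return x + 0.5 if 0.0 <= x <= 1.0 else 0.0


def f_y_given_x(y, x):
    """Conditional density of Y given X = x, defined where f_x(x) > 0."""
    return f_xy(x, y) / f_x(x)


def integrate_2d(g, n=200):
    """Midpoint-rule approximation of the integral of g over [0, 1] x [0, 1]."""
    h = 1.0 / n
    return sum(
        g((i + 0.5) * h, (j + 0.5) * h) for i in range(n) for j in range(n)
    ) * h * h


total = integrate_2d(f_xy)                      # should be close to 1
product = f_y_given_x(0.3, 0.2) * f_x(0.2)      # should equal f_xy(0.2, 0.3)
print(total, product, f_xy(0.2, 0.3))
```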

Mixed case

In some situations X is continuous but Y is discrete. For example, in a logistic regression, one may wish to predict the probability of a binary outcome Y conditional on the value of a continuously-distributed X. In this case, (X, Y) has neither a probability density function nor a probability mass function in the sense of the terms given above. On the other hand, a "mixed joint density" can be defined in either of two ways:

$\begin{align} f_{X,Y}(x,y) &= f_{X|Y}(x|y)\mathrm{P}(Y=y)\\ &= \mathrm{P}(Y=y \mid X=x) f_X(x) \end{align}$

Formally, $f_{X,Y}(x, y)$ is the probability density function of $(X, Y)$ with respect to the product measure on the respective supports of $X$ and $Y$. Either of these two decompositions can then be used to recover the joint cumulative distribution function:

$\begin{align} F_{X,Y}(x,y)&=\sum\limits_{t\le y}\int_{s=-\infty}^x f_{X,Y}(s,t)\;ds \end{align}$

The definition generalizes to a mixture of arbitrary numbers of discrete and continuous random variables.
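The logistic-regression setting mentioned above can be sketched as follows. The specific choices here are assumptions made for illustration only: $X$ is standard normal, $\mathrm{P}(Y=1 \mid X=x)$ is the logistic function of $x$, and the integral over $s$ is truncated at $-10$ and approximated by a midpoint rule.

```python
import math


def f_x(x):
    """Standard normal density for X (an assumed marginal)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p_y_given_x(y, x):
    """P(Y = y | X = x) for binary Y under an assumed logistic link."""
    p1 = 1.0 / (1.0 + math.exp(-x))
    return p1 if y == 1 else 1.0 - p1


def mixed_density(x, y):
    """Mixed joint density f(x, y) = P(Y = y | X = x) * f_X(x)."""
    return p_y_given_x(y, x) * f_x(x)


def joint_cdf(x, y, lo=-10.0, n=4000):
    """F(x, y): sum over t <= y of the integral of f(s, t) for s from lo to x."""
    if x <= lo:
        return 0.0
    h = (x - lo) / n
    total = 0.0
    for t in (0, 1):                      # the support of Y
        if t <= y:
            total += sum(mixed_density(lo + (k + 0.5) * h, t) for k in range(n)) * h
    return total


print(joint_cdf(10.0, 1), joint_cdf(10.0, 0), joint_cdf(0.0, 1))
```

By the symmetry of this particular choice, the second value approximates $\mathrm{P}(Y=0) = \tfrac12$ and the third approximates $\mathrm{P}(X \le 0) = \tfrac12$.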

General multidimensional distributions

The cumulative distribution function for a vector of random variables is defined in terms of their joint probability distribution:

$F(x_1,\dots,x_n)=P(X_1 \le x_1,\dots, X_n \le x_n) .$

The joint distribution for two random variables can be extended to many random variables $X_1, \dots, X_n$ by adding them sequentially with the identity

$\begin{align} f_{X_1, \ldots X_n}(x_1, \ldots x_n) =& f_{X_n | X_1, \ldots X_{n-1}}( x_n | x_1, \ldots x_{n-1}) f_{X_1, \ldots X_{n-1}}( x_1, \ldots x_{n-1} )\\ =& f_{X_1} (x_1) \\ & \cdot f_{X_2|X_1} (x_2|x_1)\\ & \cdot \dots \\ & \cdot f_{X_{n-1}| X_1 \ldots X_{n-2}}(x_{n-1}| x_1, \ldots x_{n-2} ) \\ & \cdot f_{X_n | X_1, \ldots X_{n-1}}( x_n | x_1, \ldots x_{n-1}),\end{align}$

where

$\begin{align} f_{X_i| X_1, \ldots X_{i-1}}(x_i | x_1, \ldots x_{i-1})= &\frac{f_{X_1, \dots X_i}(x_1,\dots x_i)}{\int f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i) \mathrm{d} u_i}\\ = &\frac{\int \dots \int f_{X_1, \dots X_n}(x_1,\dots x_i,u_{i+1}, \dots u_n) \mathrm{d} u_{i+1}\dots \mathrm{d}u_n}{\int \dots \int \int f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i, \dots u_n) \mathrm{d} u_i \,\mathrm{d} u_{i+1}\dots \mathrm{d}u_n} \end{align}$

and

$f_{X_1,\dots X_i}(x_1,\dots x_i) = \int \dots \int f_{X_1,\dots X_n}(x_1,\dots x_i,x_{i+1},\dots x_n) \mathrm{d} x_{i+1} \dots \mathrm{d} x_n$

(Note that these latter identities can be useful for generating a random vector $(X_1, \dots X_n)$ with a given density function $f(x_1,\dots x_n)$.) The density of the marginal distribution of $X_i$ is

$f_{X_i}(x_i) = \int \dots \int \int \dots \int f_{X_1,\dots X_n}(x_1,\dots x_{i-1},x_i,x_{i+1},\dots x_n) \mathrm{d} x_1\dots \mathrm{d}x_{i-1} \, \mathrm{d}x_{i+1} \dots \mathrm{d}x_n.$
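The remark about generating a random vector can be made concrete for $n = 2$: draw $X_1$ from its marginal, then draw $X_2$ from its conditional distribution given $X_1$, each by inverting the corresponding distribution function. The sketch below assumes the illustrative density $f(x,y) = x + y$ on the unit square, which is not taken from the text; the closed-form inverses follow from $F_X(x) = (x^2 + x)/2$ and $F_{Y|X}(y|x) = (xy + y^2/2)/(x + \tfrac12)$.

```python
import math
import random


def sample_xy(rng):
    """Draw (X, Y) with density f(x, y) = x + y on the unit square (assumed)."""
    u1, u2 = rng.random(), rng.random()
    # Invert the marginal CDF F_X(x) = (x^2 + x) / 2 on [0, 1].
    x = (-1.0 + math.sqrt(1.0 + 8.0 * u1)) / 2.0
    # Invert the conditional CDF F_{Y|X}(y|x) = (x*y + y^2/2) / (x + 1/2).
    y = -x + math.sqrt(x * x + 2.0 * u2 * (x + 0.5))
    return x, y


rng = random.Random(0)                      # fixed seed for reproducibility
samples = [sample_xy(rng) for _ in range(20000)]
mean_x = sum(x for x, _ in samples) / len(samples)
mean_xy = sum(x * y for x, y in samples) / len(samples)
print(mean_x, mean_xy)
```

For this density the exact values are $\mathbb{E}[X] = 7/12$ and $\mathbb{E}[XY] = 1/3$, so the sample averages should lie close to them.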

The joint cumulative distribution function is

$F_{X_1,\dots X_n}\left( x_1, \dots x_n\right)= \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_n} f_{X_1,\dots X_n}\left(u_1,\dots u_n\right) \mathrm{d} u_1 \dots \mathrm{d}u_n,$

and the conditional distribution function is accordingly

$\begin{align} F_{X_i| X_1, \ldots X_{i-1}}(x_i| x_1, \ldots x_{i-1})= &\frac{\int_{-\infty}^{x_i}f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i)\mathrm{d}u_i}{\int_{-\infty}^\infty f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i) \mathrm{d} u_i}\\ = &\frac{\int_{-\infty}^\infty \dots \int_{-\infty}^\infty \int_{-\infty}^{x_i} f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i, \dots u_n) \mathrm{d} u_i\dots \mathrm{d}u_n}{\int_{-\infty}^\infty \dots \int_{-\infty}^\infty \int_{-\infty}^\infty f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i,\dots u_n) \mathrm{d} u_i \dots \mathrm{d} u_n}. \end{align}$

Expectation reads

$\mathbb{E}\left[h(X_1,\dots X_n) \right]=\int_{-\infty}^\infty \dots \int_{-\infty}^\infty h(x_1,\dots x_n) f_{X_1,\dots X_n}(x_1,\dots x_n) \mathrm{d} x_1 \dots \mathrm{d} x_n;$

suppose that h is smooth enough and $h(u_1,\dots u_n)=h(x_1,\dots x_n)$ for $u_1 \ge x_1, \dots u_n\ge x_n$ , then, by iterated integration by parts,

$\begin{align}\mathbb{E}\left[h(X_1,\dots X_n) \right]=& h(x_1,\dots x_n)+ \\ & (-1)^n \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_n} F_{X_1,\dots X_n}(u_1,\dots u_n) \frac{\partial^n}{\partial u_1 \dots \partial u_n} h(u_1,\dots u_n) \mathrm{d} u_1 \dots \mathrm{d} u_n.\end{align}$
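The identity can be checked numerically in the one-dimensional case $n = 1$. The sketch below makes several assumptions that are not in the text: $X$ is uniform on $[0, 1]$, the threshold is $x = 0.5$, and $h(u) = -(x - u)^2$ for $u < x$ with $h(u) = 0$ for $u \ge x$, which is continuously differentiable and constant above the threshold.

```python
X0 = 0.5  # threshold x in the identity (illustrative value)


def h(u):
    """Test function: -(X0 - u)^2 below X0, constant (zero) from X0 upward."""
    return -(X0 - u) ** 2 if u < X0 else 0.0


def dh(u):
    """Derivative of h with respect to u."""
    return 2.0 * (X0 - u) if u < X0 else 0.0


def pdf(u):
    """Density of the Uniform(0, 1) distribution (assumed for X)."""
    return 1.0 if 0.0 <= u <= 1.0 else 0.0


def cdf(u):
    """Distribution function of Uniform(0, 1)."""
    return min(max(u, 0.0), 1.0)


def midpoint(g, a, b, n=10000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    step = (b - a) / n
    return sum(g(a + (k + 0.5) * step) for k in range(n)) * step


# Left-hand side: E[h(X)] computed directly from the density.
lhs = midpoint(lambda u: h(u) * pdf(u), 0.0, 1.0)
# Right-hand side with n = 1; F(u) = 0 for u < 0, so the lower limit is 0.
rhs = h(X0) + (-1) ** 1 * midpoint(lambda u: cdf(u) * dh(u), 0.0, X0)
print(lhs, rhs)
```

Both quantities approximate the exact value $-x^3/3 = -1/24$ for this choice.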

Joint distribution for independent variables

If for discrete random variables $\ P(X = x \ \mbox{and} \ Y = y ) = P( X = x) \cdot P( Y = y)$ for all x and y, or for absolutely continuous random variables $\ f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ for all x and y, then X and Y are said to be independent.
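For a finite joint table, the discrete criterion can be tested directly by comparing every joint probability with the product of the marginals. The sketch below applies it to the die example, where $A$ and $B$ turn out not to be independent, and to a pair of fair coin flips, which is a hypothetical second example added here for contrast.

```python
from fractions import Fraction

# Joint pmf of (A, B) from the die example, keyed by (a, b).
die = {
    (0, 0): Fraction(1, 6),
    (1, 0): Fraction(2, 6),
    (0, 1): Fraction(2, 6),
    (1, 1): Fraction(1, 6),
}
# Two fair coin flips (hypothetical example): every pair has probability 1/4.
coins = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}


def marginals(pmf):
    """Return the marginal pmfs of the first and second coordinate."""
    p_first, p_second = {}, {}
    for (a, b), p in pmf.items():
        p_first[a] = p_first.get(a, 0) + p
        p_second[b] = p_second.get(b, 0) + p
    return p_first, p_second


def is_independent(pmf):
    """True if P(X=x, Y=y) = P(X=x) * P(Y=y) for all x and y."""
    p_first, p_second = marginals(pmf)
    return all(
        pmf.get((a, b), 0) == p_first[a] * p_second[b]
        for a in p_first
        for b in p_second
    )


print(is_independent(die), is_independent(coins))
```

In the die example $\mathrm{P}(A=1, B=1) = 1/6$ while $\mathrm{P}(A=1)\,\mathrm{P}(B=1) = 1/4$, so the check fails there.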

Joint distribution for conditionally independent variables

If a subset $A$ of the variables $X_1,\dots,X_n$ is conditionally independent given another subset $B$ of these variables, then the joint distribution $P(X_1,\dots,X_n)$ is equal to $P(B)\cdot P(A\mid B)$. Therefore, it can be efficiently represented by the lower-dimensional probability distributions $P(B)$ and $P(A\mid B)$. Such conditional independence relations can be represented with a Bayesian network.
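One way to read this statement is that the variables in $A$ are independent of one another once $B$ is known, so that $P(A \mid B)$ splits into one small factor per variable. The sketch below follows that reading for two binary variables $A_1, A_2$ and a binary $B$. All numerical values and names are hypothetical, chosen only to illustrate how the joint table is assembled from the lower-dimensional pieces.

```python
from fractions import Fraction
from itertools import product

# Hypothetical lower-dimensional distributions.
p_b = {0: Fraction(1, 2), 1: Fraction(1, 2)}               # P(B = b)
p_a1_given_b = {0: Fraction(1, 10), 1: Fraction(9, 10)}    # P(A1 = 1 | B = b)
p_a2_given_b = {0: Fraction(1, 5), 1: Fraction(4, 5)}      # P(A2 = 1 | B = b)


def bernoulli(p_one, value):
    """Probability of `value` (0 or 1) when P(value = 1) = p_one."""
    return p_one if value == 1 else 1 - p_one


# Joint table built as P(B) * P(A1 | B) * P(A2 | B).
joint = {
    (a1, a2, b): p_b[b]
    * bernoulli(p_a1_given_b[b], a1)
    * bernoulli(p_a2_given_b[b], a2)
    for a1, a2, b in product((0, 1), repeat=3)
}

total = sum(joint.values())
p_a1 = sum(p for (a1, _, _), p in joint.items() if a1 == 1)      # P(A1 = 1)
p_a2 = sum(p for (_, a2, _), p in joint.items() if a2 == 1)      # P(A2 = 1)
p_both = sum(p for (a1, a2, _), p in joint.items() if a1 == 1 and a2 == 1)
print(total, p_a1 * p_a2, p_both)
```

The eight-entry joint table is determined by five numbers here. The last two printed values differ, which shows that $A_1$ and $A_2$ are dependent marginally even though they are independent given $B$.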

External links

Joint continuous density function on PlanetMath

Wikimedia Foundation. 2010.
