Matrix calculus

In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices, where it defines the matrix derivative. This notation was developed to describe systems of differential equations and to take derivatives of matrix-valued functions with respect to matrix variables. It is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.

Note that this article uses a different definition of vector and matrix derivatives from the form often encountered in estimation theory and pattern recognition. The resulting equations therefore appear transposed when compared with the equations used in textbooks in these fields.

1 Notation
2 Vector calculus
3 Matrix calculus
4 Identities
5 Examples
6 Relation to other derivatives
7 Usages
8 Alternatives
9 See also
10 Notes
11 External links

Notation

Let M(n,m) denote the space of real n×m matrices with n rows and m columns. Such matrices will be denoted using bold capital letters: A, X, Y, etc. An element of M(n,1), that is, a column vector, is denoted with a boldface lowercase letter: a, x, y, etc. An element of M(1,1) is a scalar, denoted with lowercase italic typeface: a, t, x, etc. X^T denotes the matrix transpose, tr(X) the trace, and det(X) the determinant. All functions are assumed to be of differentiability class C¹ unless otherwise noted. Generally, letters from the first half of the alphabet (a, b, c, …) will be used to denote constants, and letters from the second half (t, x, y, …) to denote variables.

Vector calculus

Main article: Vector calculus

Because the space M(n,1) is identified with the Euclidean space Rⁿ and M(1,1) is identified with R, the notations developed here can accommodate the usual operations of vector calculus.

The tangent vector to a curve x : R → Rⁿ is

$\frac{\partial \mathbf{x}} {\partial t} = \begin{bmatrix} \frac{\partial x_1}{\partial t} \\ \vdots \\ \frac{\partial x_n}{\partial t} \\ \end{bmatrix}.$
The gradient of a scalar function f : Rⁿ → R is

$\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \\ \end{bmatrix}.$

The directional derivative of f in the direction of v is then

$\nabla_\mathbf{v} f = \frac{\partial f}{\partial \mathbf{x}}\mathbf{v}.$
The pushforward or differential of a function f : R^m → Rⁿ is described by the Jacobian matrix

$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_m}\\ \vdots & \ddots & \vdots\\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_m}\\ \end{bmatrix}.$

The pushforward along f of a vector v in R^m is

$d\,\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \mathbf{v}.$
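The Jacobian and pushforward above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the map `f`, the helper `numerical_jacobian`, and the sample point are hypothetical choices made only for illustration.

```python
import numpy as np

def f(x):
    # Hypothetical example map f : R^2 -> R^3 (not from the article).
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian: row i, column j approximates d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(func(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (func(x + step) - func(x - step)) / (2 * h)
    return J

x0 = np.array([0.5, -1.2])
v = np.array([1.0, 2.0])

J = numerical_jacobian(f, x0)      # shape (3, 2): n = 3 outputs, m = 2 inputs
pushforward = J @ v                # d f(v) = (Jacobian) v

# Direct finite-difference derivative of f along v, for comparison.
direct = (f(x0 + 1e-6 * v) - f(x0 - 1e-6 * v)) / 2e-6
```

Here `pushforward` and `direct` should agree up to finite-difference error, which is one way to see that the Jacobian has one row per output component and one column per input component.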

Matrix calculus

For the purposes of defining derivatives of simple functions, not much changes with matrix spaces; the space of n×m matrices is isomorphic to the vector space R^{nm}. The three derivatives familiar from vector calculus have close analogues here, though beware the complications that arise in the identities below.

The tangent vector of a curve F : R → M(n,m) is

$\frac{\partial \mathbf{F}}{\partial t} = \begin{bmatrix} \frac{\partial F_{1,1}}{\partial t} & \cdots & \frac{\partial F_{1,m}}{\partial t}\\ \vdots & \ddots & \vdots\\ \frac{\partial F_{n,1}}{\partial t} & \cdots & \frac{\partial F_{n,m}}{\partial t}\\ \end{bmatrix}.$
The gradient of a scalar function f : M(n,m) → R is

$\frac{\partial f}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial f}{\partial X_{1,1}} & \cdots & \frac{\partial f}{\partial X_{n,1}}\\ \vdots & \ddots & \vdots\\ \frac{\partial f}{\partial X_{1,m}} & \cdots & \frac{\partial f}{\partial X_{n,m}}\\ \end{bmatrix}.$

Notice that the indexing of the gradient with respect to X is transposed as compared with the indexing of X. The directional derivative of f in the direction of matrix Y is given by

$\nabla_\mathbf{Y} f = \operatorname{tr} \left(\frac{\partial f}{\partial \mathbf{X}} \mathbf{Y}\right).$
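As a rough numerical illustration of the transposed indexing and the trace formula, the sketch below uses the squared Frobenius norm as a hypothetical test function. The helper name `gradient_transposed` and all sample data are assumptions made for this example.

```python
import numpy as np

def f(X):
    # Hypothetical scalar function of a matrix: squared Frobenius norm, tr(X^T X).
    return np.sum(X * X)

def gradient_transposed(func, X, h=1e-6):
    """Finite-difference gradient stored with transposed indexing:
    entry (j, i) approximates d func / d X[i, j], so an n x m input
    gives an m x n result, as in the definition above."""
    n, m = X.shape
    G = np.zeros((m, n))
    for i in range(n):
        for j in range(m):
            E = np.zeros_like(X)
            E[i, j] = h
            G[j, i] = (func(X + E) - func(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
X0 = rng.standard_normal((3, 2))   # n = 3, m = 2
Y = rng.standard_normal((3, 2))    # direction, same shape as X0

G = gradient_transposed(f, X0)             # shape (2, 3)
directional = np.trace(G @ Y)              # tr( (df/dX) Y )
direct = (f(X0 + 1e-6 * Y) - f(X0 - 1e-6 * Y)) / 2e-6
```

For this particular `f` the gradient in the transposed indexing is `2 * X0.T`, and `directional` should match `direct` up to rounding.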
The differential or the matrix derivative of a function F : M(n,m) → M(p,q) is an element of M(p,q) ⊗ M(m,n), a fourth-rank tensor (the reversal of m and n here indicates the dual space of M(n,m)). In short it is an m×n matrix each of whose entries is a p×q matrix.

$\frac{\partial\mathbf{F}} {\partial\mathbf{X}}= \begin{bmatrix} \frac{\partial\mathbf{F}}{\partial X_{1,1}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,1}}\\ \vdots & \ddots & \vdots\\ \frac{\partial\mathbf{F}}{\partial X_{1,m}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,m}}\\ \end{bmatrix},$

where each $\partial\mathbf{F}/\partial X_{i,j}$ is a p×q matrix defined as above. Note also that this matrix has its indexing transposed: it has m rows and n columns. The pushforward along F of an n×m matrix Y in M(n,m) is then

$d\mathbf{F}(\mathbf{Y}) = \operatorname{tr}\left(\frac{\partial\mathbf{F}} {\partial\mathbf{X}}\mathbf{Y}\right),$ as formal block matrices.

Note that this definition encompasses all of the preceding definitions as special cases.

According to Jan R. Magnus and Heinz Neudecker, the following notations are both unsuitable, as the determinants of the resulting matrices would have "no interpretation" and "a useful chain rule does not exist" if these notations are being used:^[1]

$\frac{\partial\mathbf{F}} {\partial\mathbf{X}}= \begin{bmatrix} \frac{\partial\mathbf F_{1,1}}{\partial \mathbf X} & \cdots & \frac{\partial \mathbf F_{n,1}}{\partial \mathbf X}\\ \vdots & \ddots & \vdots\\ \frac{\partial\mathbf F_{1,m}}{\partial \mathbf X} & \cdots & \frac{\partial \mathbf F_{n,m}}{\partial \mathbf X}\\ \end{bmatrix}$
$\frac{\partial\mathbf{F}} {\partial\mathbf{X}}= \begin{bmatrix} \frac{\partial\mathbf F}{\partial \mathbf X_{1,1}} & \cdots & \frac{\partial \mathbf F}{\partial \mathbf X_{n,1}}\\ \vdots & \ddots & \vdots\\ \frac{\partial\mathbf F}{\partial \mathbf X_{1,m}} & \cdots & \frac{\partial \mathbf F}{\partial \mathbf X_{n,m}}\\ \end{bmatrix}$

The Jacobian matrix, according to Magnus and Neudecker,^[1] is

$\mathrm D\, \mathbf F\left(\mathbf X\right) = \frac{\partial\, \mathrm{vec}\ \mathbf F\left(\mathbf X\right)}{\partial\left(\mathrm{vec}\ \mathbf X\right)^{\mathrm T}}.$
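The vec-based Jacobian can be made concrete for a linear map. The sketch below assumes that vec stacks the columns of a matrix and uses the hypothetical example F(X) = AXB, for which the standard identity vec(AXB) = (Bᵀ ⊗ A) vec(X) predicts the Jacobian Bᵀ ⊗ A. The matrices and sizes are arbitrary illustrative choices.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
p, n, m, q = 2, 3, 4, 2
A = rng.standard_normal((p, n))
B = rng.standard_normal((m, q))

def F(X):
    # Hypothetical example map F : M(n, m) -> M(p, q).
    return A @ X @ B

X0 = rng.standard_normal((n, m))
h = 1e-6
J = np.zeros((p * q, n * m))
for k in range(n * m):
    dx = np.zeros(n * m)
    dx[k] = h
    dX = dx.reshape((n, m), order="F")       # inverse of vec
    J[:, k] = vec(F(X0 + dX) - F(X0 - dX)) / (2 * h)

expected = np.kron(B.T, A)                   # B^T (Kronecker product) A
```

The result is an ordinary pq × nm matrix, so determinants and the usual chain rule for Jacobians are available.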

Identities

Note that matrix multiplication is not commutative, so in these identities, the order must not be changed.

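Chain rule: If Z is a function of Y, which in turn is a function of X, and these are all column vectors, then, with Jacobians arranged as in the Vector calculus section above (one row per output component),

$\frac{\partial \mathbf{Z}} {\partial \mathbf{X}} = \frac{\partial \mathbf{Z}} {\partial \mathbf{Y}} \frac{\partial \mathbf{Y}} {\partial \mathbf{X}}.$

In the transposed convention used in ^[2], every factor is transposed and the product appears in the opposite order, $\frac{\partial \mathbf{Y}} {\partial \mathbf{X}} \frac{\partial \mathbf{Z}} {\partial \mathbf{Y}}$.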
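The order of the factors depends on the layout convention, and a numerical check makes this visible. The sketch below uses Jacobians with one row per output component (as in the Vector calculus section); in that layout the Jacobian of the outer function comes first, and in the transposed layout the same identity reads with the factors transposed and reversed. The functions `g` and `f` and the helper are hypothetical.

```python
import numpy as np

def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian: row i, column j approximates d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(func(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (func(x + step) - func(x - step)) / (2 * h)
    return J

def g(x):
    # Hypothetical inner map y = g(x), R^2 -> R^3.
    return np.array([x[0] + x[1], x[0] * x[1], x[1] ** 2])

def f(y):
    # Hypothetical outer map z = f(y), R^3 -> R^2.
    return np.array([y[0] * y[2], np.exp(y[1])])

x0 = np.array([0.3, -0.8])
J_g = numerical_jacobian(g, x0)                      # dy/dx, shape (3, 2)
J_f = numerical_jacobian(f, g(x0))                   # dz/dy, shape (2, 3)
J_comp = numerical_jacobian(lambda x: f(g(x)), x0)   # dz/dx, shape (2, 2)

chain = J_f @ J_g                 # Jacobian layout: (dz/dy)(dy/dx)
chain_transposed = J_g.T @ J_f.T  # transposed layout: reversed order
```

Both products describe the same derivative; `chain_transposed` is simply the transpose of `chain`.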
Product rule: In all cases where the derivatives do not involve tensor products (for example, Y has more than one row and X has more than one column),

$\frac{\partial (\mathbf{Y}\mathbf{Z})}{\partial \mathbf{X}} = \frac{\partial\mathbf{Y}}{\partial\mathbf{X}}{\mathbf{Z}} + \mathbf{Y}\frac{\partial\mathbf{Z}}{\partial \mathbf{X}}$
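The simplest case that avoids tensor products is differentiation with respect to a scalar. The sketch below checks the product rule in that special case only; the matrix-valued functions `Y` and `Z` of a scalar `t` are hypothetical examples, and the variable `swapped` illustrates why the order of the factors must be preserved.

```python
import numpy as np

def Y(t):
    # Hypothetical matrix-valued function of a scalar t.
    return np.array([[t, 1.0], [0.0, t ** 2]])

def Z(t):
    # Second hypothetical matrix-valued function of t.
    return np.array([[np.sin(t), t], [1.0, np.cos(t)]])

def ddt(M, t, h=1e-6):
    """Central-difference derivative of a matrix-valued function of t."""
    return (M(t + h) - M(t - h)) / (2 * h)

t0 = 0.7
lhs = ddt(lambda t: Y(t) @ Z(t), t0)               # d(YZ)/dt
rhs = ddt(Y, t0) @ Z(t0) + Y(t0) @ ddt(Z, t0)      # Y'Z + YZ'
swapped = Z(t0) @ ddt(Y, t0) + ddt(Z, t0) @ Y(t0)  # wrong factor order
```

`lhs` and `rhs` agree up to finite-difference error, while `swapped` differs because these matrices do not commute.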

Examples

Derivative of linear functions

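This section lists some commonly used vector derivative formulas for expressions that are linear in x. The results are written in the transposed convention, that is, each is the transpose of the Jacobian defined in the Vector calculus section above.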

$\frac{\partial \; \textbf{a}^T\textbf{x}}{\partial \; \textbf{x}} = \frac{\partial \; \textbf{x}^T\textbf{a}}{\partial \; \textbf{x}} = \textbf{a}$

$\frac{\partial \; \textbf{A}\textbf{x}}{\partial \; \textbf{x}} = \textbf{A}^T$

$\frac{\partial \; \textbf{x}^T\textbf{A}}{\partial \; \textbf{x}} = \textbf{A}$
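These linear formulas can be compared against a finite-difference Jacobian. In the sketch below (hypothetical helper and random data), the Jacobian of x ↦ Ax with one row per output is A itself, and its transpose is the Aᵀ listed above; likewise the Jacobian of x ↦ aᵀx is the row aᵀ, whose transpose is a.

```python
import numpy as np

def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian: row i, column j approximates d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(func(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (func(x + step) - func(x - step)) / (2 * h)
    return J

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
a = rng.standard_normal(4)
x0 = rng.standard_normal(4)

J_Ax = numerical_jacobian(lambda x: A @ x, x0)              # shape (3, 4)
J_ax = numerical_jacobian(lambda x: np.array([a @ x]), x0)  # shape (1, 4)

listed_Ax = J_Ax.T        # transposed layout, compare with A^T
listed_ax = J_ax.T[:, 0]  # transposed layout, compare with a
```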

Derivative of quadratic functions

This section lists some commonly used vector derivative formulas for quadratic expressions evaluating to a scalar. As in the linear formulas above, each result is written as a column vector, the transpose of the row-vector gradient defined in the Vector calculus section.

$\frac{\partial \; \textbf{x}^T \textbf{A}\textbf{x}}{\partial \; \textbf{x}} = \textbf{A}\textbf{x} + \textbf{A}^T\textbf{x}$

$\frac{\partial \; \textbf{a}^T\textbf{x}\textbf{x}^T\textbf{b}}{\partial \; \textbf{x}} = (\textbf{a}\textbf{b}^T + \textbf{b}\textbf{a}^T)\textbf{x} = \textbf{a}\textbf{b}^T\textbf{x} + \textbf{b}\textbf{a}^T\textbf{x}$

$\frac{\partial \; (\textbf{A}\textbf{x} + \textbf{b})^T \textbf{C} (\textbf{D}\textbf{x} + \textbf{e}) }{\partial \; \textbf{x}} = \textbf{A}^T \textbf{C} (\textbf{D}\textbf{x} + \textbf{e}) + \textbf{D}^T \textbf{C}^T (\textbf{A}\textbf{x} + \textbf{b})$

Related to this is the derivative of the Euclidean norm, valid for x ≠ a:

$\frac{\partial \; \|\mathbf{x}-\mathbf{a}\|}{\partial \; \textbf{x}} = \frac{\mathbf{x}-\mathbf{a}}{\|\mathbf{x}-\mathbf{a}\|}.$
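A quick finite-difference check of the quadratic-form and norm derivatives is sketched below. The helper `numerical_gradient` and the random data are hypothetical. NumPy's one-dimensional arrays do not distinguish rows from columns, so the comparison is independent of the layout convention.

```python
import numpy as np

def numerical_gradient(func, x, h=1e-6):
    """Central-difference gradient of a scalar function, as a 1-D array."""
    g = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        g[j] = (func(x + step) - func(x - step)) / (2 * h)
    return g

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))   # deliberately not symmetric
a = rng.standard_normal(4)
x0 = rng.standard_normal(4)

grad_quad = numerical_gradient(lambda x: x @ A @ x, x0)
grad_norm = numerical_gradient(lambda x: np.linalg.norm(x - a), x0)

closed_quad = A @ x0 + A.T @ x0
closed_norm = (x0 - a) / np.linalg.norm(x0 - a)
```

Using a non-symmetric `A` matters here: for symmetric `A` the result collapses to `2 * A @ x0`, which would hide the separate `A` and `A.T` terms.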

Derivative of matrix traces

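This section shows examples of matrix differentiation of common trace expressions. Each result has the same shape as X, that is, its (i,j) entry is the derivative with respect to X_{i,j}; this is the transpose of the gradient defined in the Matrix calculus section above.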

$\frac{\partial \; \operatorname{tr} ( \textbf{A} \textbf{X} )}{\partial \; \textbf{X}} = \frac{\partial \; \operatorname{tr} ( \textbf{X} \textbf{A} )}{\partial \; \textbf{X}} = \textbf{A}^T$ ^[3]

$\frac{\partial \; \operatorname{tr}( \textbf{A} \textbf{X} \textbf{B})}{\partial \; \textbf{X}} = \frac{\partial \; \operatorname{tr}( \textbf{B} \textbf{A} \textbf{X} )}{\partial \; \textbf{X}} = \left(\textbf{B} \textbf{A}\right)^T$

$\frac{\partial \; \operatorname{tr}( \textbf{A} \textbf{X} \textbf{B} \textbf{X}^T \textbf{C}) }{\partial \; \textbf{X}} = \left(\textbf{B} \textbf{X}^T \textbf{C} \textbf{A}\right)^T + \left(\textbf{B}^T \textbf{X}^T \textbf{A}^T \textbf{C}^T\right)^T$
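The trace formulas can be verified entry by entry. The sketch below assumes the result is stored with the same shape as X, so entry (i, j) is the derivative with respect to X[i, j]; the helper `gradient_like_X`, the matrix sizes, and the random data are hypothetical.

```python
import numpy as np

def gradient_like_X(func, X, h=1e-6):
    """Finite-difference gradient with the same shape as X:
    entry (i, j) approximates d func / d X[i, j]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (func(X + E) - func(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
p, n, m = 2, 3, 4
A = rng.standard_normal((p, n))
B = rng.standard_normal((m, p))
X0 = rng.standard_normal((n, m))

# tr(A X B): expected gradient (B A)^T
G_axb = gradient_like_X(lambda X: np.trace(A @ X @ B), X0)

# tr(A X B2 X^T C) needs a square B2 (m x m) and C of shape n x p.
B2 = rng.standard_normal((m, m))
C = rng.standard_normal((n, p))
G_quad = gradient_like_X(lambda X: np.trace(A @ X @ B2 @ X.T @ C), X0)

# (B2 X^T C A)^T + (B2^T X^T A^T C^T)^T, written out:
closed_quad = A.T @ C.T @ X0 @ B2.T + C @ A @ X0 @ B2
```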

Derivative of matrix determinant

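$\frac{\partial \det\mathbf{X}}{\partial \mathbf{X}}= \left(\operatorname{adj}\,\mathbf{X}\right)^T= \det\mathbf{X}\cdot \left ( \mathbf{X}^{-1} \right )^T,$ ^[4]

where the result is written with the same shape as X and the second equality requires X to be invertible.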
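A numerical check of the determinant derivative is sketched below for a hypothetical, well-conditioned 3×3 matrix. The array `G` stores the derivative with respect to X[i, j] in entry (i, j), and it is compared with det(X)·(X⁻¹)ᵀ.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical sample matrix, shifted by 3*I so that it is safely invertible.
X0 = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)

h = 1e-6
G = np.zeros_like(X0)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X0)
        E[i, j] = h
        # Entry (i, j): derivative of det with respect to X[i, j].
        G[i, j] = (np.linalg.det(X0 + E) - np.linalg.det(X0 - E)) / (2 * h)

closed_form = np.linalg.det(X0) * np.linalg.inv(X0).T
```

Each entry of `G` is the cofactor of the corresponding entry of `X0`, which is why the transpose appears on the inverse.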

Relation to other derivatives

The matrix derivative is a convenient notation for keeping track of partial derivatives when doing calculations. The Fréchet derivative is the standard way, in the setting of functional analysis, to take derivatives with respect to vectors. If a matrix function of a matrix is Fréchet differentiable, the two derivatives agree up to a change of notation. As is the case in general for partial derivatives, some formulae may extend under weaker analytic conditions than the existence of the derivative as an approximating linear mapping.

Usages

Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of Lagrange multipliers. This includes the derivation of:

Alternatives

The tensor index notation with its Einstein summation convention is very similar to the matrix calculus, except one writes only a single component at a time. It has the advantage that one can easily manipulate arbitrarily high rank tensors, whereas tensors of rank higher than two are quite unwieldy with matrix notation. Note that a matrix can be considered simply a tensor of rank two.
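Index notation maps directly onto `numpy.einsum`, which sums over repeated indices. The sketch below (hypothetical matrices and sizes) writes tr(AXB) as A_ij X_jk B_ki, reads off its derivative with respect to X_jk, and builds the rank-four derivative of F = AXB, an object that is awkward to write in pure matrix notation.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# tr(A X B) = A_ij X_jk B_ki, summed over all repeated indices.
f = np.einsum("ij,jk,ki->", A, X, B)

# d f / d X_jk = A_ij B_ki (summed over i): drop the X factor, keep j and k free.
grad = np.einsum("ij,ki->jk", A, B)

# Rank-4 derivative of F = A X B: dF_ab / dX_jk = A_aj B_kb.
T = np.einsum("aj,kb->abjk", A, B)

# Contracting T with a direction Y gives the pushforward A Y B.
Y = rng.standard_normal(X.shape)
dF = np.einsum("abjk,jk->ab", T, Y)
```

Here `grad` has the same shape as `X`, and `T` has shape `(2, 2, 3, 4)`, one pair of indices for the output and one for the input.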

Notes

1. ^ a b Magnus, Jan R.; Neudecker, Heinz (1999) [1988]. Matrix Differential Calculus. Wiley Series in Probability and Statistics (revised ed.). Wiley. pp. 171–173.
2. ^ Introduction to Finite Element Methods. http://www.colorado.edu/engineering/CAS/courses.d/IFEM.d/ p. D 5.
3. ^ Duchi, John C. "Properties of the Trace and Matrix Derivatives". University of California at Berkeley. http://www.cs.berkeley.edu/~jduchi/projects/matrix_prop.pdf. Retrieved 19 July 2011.
4. ^ "Derivation of Derivative of Determinant". http://en.wikipedia.org/wiki/Determinant#Derivative.

External links

Matrix Calculus appendix from Introduction to Finite Element Methods book on University of Colorado at Boulder. Uses the Hessian (transpose to Jacobian) definition of vector and matrix derivatives.
Matrix calculus, Matrix Reference Manual, Imperial College London.
The Matrix Cookbook, with a derivatives chapter. Uses the Hessian definition.
Linear Algebra and its applications, Chapter 9, by Peter Lax
Matrix Differentiation

Wikimedia Foundation. 2010.
