Automatic differentiation

In mathematics and computer algebra, automatic differentiation, or AD, sometimes alternatively called algorithmic differentiation, is a method to numerically evaluate the derivative of a function specified by a computer program. Two classical ways of doing differentiation are:
* symbolically differentiate the function as an expression, and evaluate the result at the point; or
* use numerical differentiation (finite differences).

The main drawbacks of symbolic differentiation are low speed and the difficulty of converting a computer program into a single expression. Moreover, many functions grow quite complex as higher derivatives are calculated. Two important drawbacks of finite differences are round-off errors in the discretization process and cancellation. Both classical methods have problems with calculating higher derivatives, where the complexity and errors increase. Automatic differentiation solves all of the mentioned problems.

AD exploits the fact that any computer program that implements a vector function y = F(x) can, in general, be decomposed into a sequence of elementary assignments, any one of which may be trivially differentiated by a simple table lookup. These "elemental partial derivatives", evaluated at a particular argument, are combined in accordance with the chain rule of calculus to form derivative information for F (such as gradients, tangents, the Jacobian matrix, etc.). This process yields exact (to numerical accuracy) derivatives. Because the symbolic transformation occurs only at the most basic level, AD avoids the computational problems inherent in complex symbolic computation.

The chain rule, forward and reverse accumulation

Fundamental to AD is the decomposition of differentials provided by the chain rule. For the simple composition f(x) = g(h(x)) the chain rule gives

:\frac{df}{dx} = \frac{dg}{dh} \frac{dh}{dx}

Usually, two distinct modes of AD are presented: "forward accumulation" (or forward mode) and "reverse accumulation" (or reverse mode; the term "backward" is discouraged). Forward accumulation traverses the chain rule from right to left, that is, one first computes dh/dx and then dg/dh; reverse accumulation traverses it from left to right.

Forward accumulation

Forward accumulation automatic differentiation is the easiest to understand and to implement. The function f(x_1,x_2) = x_1 x_2 + \sin(x_1) is interpreted by a computer (or the human programmer) as a sequence of elementary operations on work variables w_i, and an AD tool for forward accumulation adds the corresponding operations on the second component of the augmented arithmetic.

The derivative computation for f(x_1,x_2) = x_1 x_2 + \sin(x_1) must be seeded in order to distinguish between the derivative with respect to x_1 or x_2. Seeding the computation with w'_1 = 1 and w'_2 = 0 yields x_2 + \cos(x_1), the derivative with respect to x_1. Note that although such a trace can display the symbolic derivative, in the computer it is always the evaluated (numeric) value that is stored. These statements can equivalently be represented as a computational graph.
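The article's original table of elementary operations is not reproduced here; as a minimal Python sketch of one such sweep (the helper name forward_sweep and its signature are this sketch's own choices):

    import math

    def forward_sweep(x1, x2, dx1, dx2):
        # One forward-accumulation sweep for f(x1, x2) = x1*x2 + sin(x1).
        # Each work variable w_i carries its derivative dw_i, seeded by dx1, dx2.
        w1, dw1 = x1, dx1                           # w1 = x1
        w2, dw2 = x2, dx2                           # w2 = x2
        w3, dw3 = w1 * w2, dw1 * w2 + w1 * dw2      # product rule
        w4, dw4 = math.sin(w1), math.cos(w1) * dw1  # chain rule for sin
        w5, dw5 = w3 + w4, dw3 + dw4                # sum rule
        return w5, dw5   # value of f and the seeded directional derivative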

In order to compute the gradient of this example function, that is, \partial f/\partial x_1 and \partial f/\partial x_2, two sweeps over the computational graph are needed, first with the seeds w'_1 = 1 and w'_2 = 0, then with w'_1 = 0 and w'_2 = 1.
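Continuing the forward_sweep sketch above, the two sweeps would be seeded as follows:

    x1, x2 = 2.0, 3.0                            # an arbitrary evaluation point
    _, df_dx1 = forward_sweep(x1, x2, 1.0, 0.0)  # seeds w'_1 = 1, w'_2 = 0
    _, df_dx2 = forward_sweep(x1, x2, 0.0, 1.0)  # seeds w'_1 = 0, w'_2 = 1
    # df_dx1 == x2 + cos(x1) and df_dx2 == x1, the two gradient components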

The computational complexity of one sweep of forward accumulation is proportional to the complexity of the original code.

Forward accumulation is superior to reverse accumulation for functions f:\mathbb{R} \rightarrow \mathbb{R}^m with m \gg 1, as only one sweep is necessary, compared to m sweeps for reverse accumulation.

Reverse accumulation

Reverse accumulation traverses the chain rule from left to right, or, in terms of the computational graph, from the output toward the inputs. The example function is real-valued, and thus there is only one seed for the derivative computation, and only one sweep of the computational graph is needed to calculate the (two-component) gradient. This is only half the work compared to forward accumulation, but reverse accumulation requires the storage of some of the work variables w_i, which may represent a significant memory cost.

The data flow graph of a computation can be manipulated to calculate the gradient of its original calculation. This is done by adding an adjoint node for each primal node, connected by adjoint edges which parallel the primal edges but flow in the opposite direction. The nodes in the adjoint graph represent multiplication by the derivatives of the functions calculated by the nodes in the primal. For instance, addition in the primal causes fanout in the adjoint; fanout in the primal causes addition in the adjoint; a unary function y=f(x) in the primal causes x'=f'(x) y' in the adjoint; etc.
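A minimal Python sketch of one reverse sweep over the same example function f(x_1,x_2) = x_1 x_2 + \sin(x_1) (the name reverse_sweep and the adjoint variables wb_i are this sketch's own choices):

    import math

    def reverse_sweep(x1, x2):
        # Forward pass: record the primal work variables.
        w1 = x1
        w2 = x2
        w3 = w1 * w2
        w4 = math.sin(w1)
        w5 = w3 + w4

        # Backward pass: seed the output adjoint, then apply the rules above.
        wb5 = 1.0
        wb3 = wb5                  # addition in the primal fans out here
        wb4 = wb5
        wb1 = math.cos(w1) * wb4   # adjoint of w4 = sin(w1)
        wb2 = w1 * wb3             # adjoint of w3 = w1 * w2
        wb1 += w2 * wb3            # fan-out of w1 in the primal becomes addition
        return w5, (wb1, wb2)      # value and the full gradient in one sweep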

Reverse accumulation is superior to forward accumulation for functions f:\mathbb{R}^n \rightarrow \mathbb{R} with n \gg 1.

Backpropagation of errors in multilayer perceptrons, a technique used in machine learning, is a special case of reverse mode AD.

Jacobian computation

The Jacobian J of f:\mathbb{R}^n \rightarrow \mathbb{R}^m is an m \times n matrix. The Jacobian can be computed using n sweeps of forward accumulation, each sweep yielding a column vector of the Jacobian, or with m sweeps of reverse accumulation, each sweep yielding a row vector of the Jacobian.
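Equivalently, in standard notation (a summary not present in the original text), a forward sweep seeded with the Cartesian basis vector e_j and a reverse sweep seeded with e_i compute
:y' = J x' \quad (x' = e_j \text{ yields column } j)
:\bar{x} = J^{\mathsf{T}} \bar{y} \quad (\bar{y} = e_i \text{ yields row } i)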

Beyond forward and reverse accumulation

Forward and reverse accumulation are just two (extreme) ways of traversing the chain rule. The problem of computing a full Jacobian of F:\mathbb{R}^n \rightarrow \mathbb{R}^m with a minimum number of arithmetic operations is known as the "optimal Jacobian accumulation" (OJA) problem, and it is NP-complete (Naumann 2008). Central to this proof is the idea that algebraic dependences may exist between the local partials that label the edges of the graph. In particular, two or more edge labels may be recognized as equal. The complexity of the problem remains open if it is assumed that all edge labels are unique and algebraically independent.

Derivation of augmented arithmetic for forward accumulation AD using dual numbers

Forward mode automatic differentiation is accomplished by augmenting the algebra of real numbers and obtaining a new arithmetic. An additional component is added to every number to represent the derivative of a function at the number, and all arithmetic operators are extended for the augmented algebra. The augmented algebra is the algebra of dual numbers.

Replace every number x with the number x + x'\varepsilon, where x' is a real number, but \varepsilon is nothing but a symbol with the property \varepsilon^2 = 0. Using only this, we get for the regular arithmetic
:(x + x'\varepsilon) + (y + y'\varepsilon) = x + y + (x' + y')\varepsilon
:(x + x'\varepsilon) \cdot (y + y'\varepsilon) = xy + xy'\varepsilon + yx'\varepsilon + x'y'\varepsilon^2 = xy + (xy' + yx')\varepsilon
and likewise for subtraction and division.

Now, we may calculate polynomials in this augmented arithmetic. If P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n, then
:P(x + x'\varepsilon) = P(x) + P^{(1)}(x)\,x'\varepsilon
where P^{(1)} denotes the derivative of P with respect to its first argument, and x', called a "seed", can be chosen arbitrarily.

The new arithmetic consists of ordered pairs, elements written \langle x, x' \rangle, with ordinary arithmetic on the first component and first-order differentiation arithmetic on the second component, as described above. Extending the above results on polynomials to analytic functions, we obtain a list of the basic arithmetic and some standard functions for the new arithmetic:
:\langle u,u' \rangle + \langle v,v' \rangle = \langle u+v,\; u'+v' \rangle
:\langle u,u' \rangle - \langle v,v' \rangle = \langle u-v,\; u'-v' \rangle
:\langle u,u' \rangle * \langle v,v' \rangle = \langle uv,\; u'v+uv' \rangle
:\langle u,u' \rangle / \langle v,v' \rangle = \left\langle \frac{u}{v},\; \frac{u'v-uv'}{v^2} \right\rangle \quad (v \ne 0)
:\sin\langle u,u' \rangle = \langle \sin(u),\; u' \cos(u) \rangle
:\cos\langle u,u' \rangle = \langle \cos(u),\; -u' \sin(u) \rangle
:\exp\langle u,u' \rangle = \langle \exp u,\; u' \exp u \rangle
:\log\langle u,u' \rangle = \langle \log(u),\; u'/u \rangle \quad (u > 0)
:\langle u,u' \rangle^k = \langle u^k,\; k u^{k-1} u' \rangle \quad (u \ne 0)
:\left| \langle u,u' \rangle \right| = \langle |u|,\; u'\,\mbox{sign}(u) \rangle \quad (u \ne 0)
and in general for the primitive function g,
:g(\langle u,u' \rangle, \langle v,v' \rangle) = \langle g(u,v),\; g^{(1)}(u,v) u' + g^{(2)}(u,v) v' \rangle
where g^{(1)} and g^{(2)} are the derivatives of g with respect to its first and second arguments, respectively.
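A minimal operator-overloading sketch of this pair arithmetic in Python (the class name Dual and the lifting helper are this sketch's own choices, and only a few of the rules above are covered):

    import math

    class Dual:
        # Ordered pair <u, u'>: value in .u, derivative in .du.
        def __init__(self, u, du=0.0):
            self.u, self.du = u, du

        def _lift(v):
            # Lift a plain number c to <c, 0>, as described below.
            return v if isinstance(v, Dual) else Dual(v)

        def __add__(self, v):
            v = Dual._lift(v)
            return Dual(self.u + v.u, self.du + v.du)

        def __mul__(self, v):
            v = Dual._lift(v)
            return Dual(self.u * v.u, self.du * v.u + self.u * v.du)

        __radd__ = __add__
        __rmul__ = __mul__

    def sin(x):
        # sin<u, u'> = <sin(u), u' cos(u)>
        return Dual(math.sin(x.u), x.du * math.cos(x.u))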

When a binary basic arithmetic operation is applied to mixed arguments (the pair \langle u, u' \rangle and the real number c), the real number is first lifted to \langle c, 0 \rangle. The derivative of a function f : \mathbb{R} \rightarrow \mathbb{R} at the point x_0 is now found by calculating f(\langle x_0, 1 \rangle) using the above arithmetic, which gives \langle f(x_0), f'(x_0) \rangle as the result.
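Using the Dual sketch above, for example:

    def f(x):
        return x * x + sin(x)   # an example function (our own choice)

    y = f(Dual(3.0, 1.0))       # evaluate at x_0 = 3 with seed 1
    # y.u == f(3.0) and y.du == f'(3.0) == 2*3.0 + cos(3.0)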

Vector arguments and functions

Multivariate functions can be handled with the same efficiency and mechanisms as univariate functions by adopting a directional derivative operator, which finds the directional derivative y' \in \mathbb{R}^m of f:\mathbb{R}^n \rightarrow \mathbb{R}^m at x \in \mathbb{R}^n in the direction x' \in \mathbb{R}^n by calculating (\langle y_1,y'_1 \rangle, \ldots, \langle y_m,y'_m \rangle) = f(\langle x_1,x'_1 \rangle, \ldots, \langle x_n,x'_n \rangle) using the same arithmetic as above.
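With the same Dual sketch, a directional derivative is obtained by seeding every input with the corresponding component of the direction:

    # Direction (d1, d2) at the point (2, 3); the values are arbitrary choices.
    d1, d2 = 0.5, -1.0
    x1, x2 = Dual(2.0, d1), Dual(3.0, d2)
    y = x1 * x2 + sin(x1)
    # y.du == (x2 + cos(x1))*d1 + x1*d2, the gradient dotted with the direction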

Higher order differentials

The above arithmetic can be generalized, in the natural way, to second-order and higher derivatives. However, the arithmetic rules quickly grow complicated: complexity is quadratic in the highest derivative degree. Instead, truncated Taylor series arithmetic is used. This is possible because the Taylor summands in a Taylor series of a function are products of known coefficients and derivatives of the function. Computation of Hessians using AD has proven useful in some optimization contexts.
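As a hedged sketch of how the rules grow (the helper names mul2 and sin2 are our own), second-order forward mode can carry triples (u, u', u'') with, for instance:

    import math

    def mul2(a, b):
        # Product rule on triples: (uv)'' = u''v + 2 u'v' + u v''
        u, du, ddu = a
        v, dv, ddv = b
        return (u * v, du * v + u * dv, ddu * v + 2 * du * dv + u * ddv)

    def sin2(a):
        # Chain rule on triples: (sin u)'' = cos(u) u'' - sin(u) (u')^2
        u, du, ddu = a
        return (math.sin(u), math.cos(u) * du,
                math.cos(u) * ddu - math.sin(u) * du * du)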

Implementation

Forward-mode AD is implemented by a nonstandard interpretation of the program in which real numbers are replaced by dual numbers, constants are lifted to dual numbers with a zero epsilon coefficient, and the numeric primitives are lifted to operate on dual numbers. This nonstandard interpretation is generally implemented using one of two strategies: "source code transformation" or "operator overloading".

Source code transformation

The source code for a function is replaced by an automatically generated source code that includes statements for calculating the derivatives interleaved with the original instructions.

Source code transformation can be implemented for all programming languages, and it also makes it easier for the compiler to perform compile-time optimizations. However, the implementation of the AD tool itself is more difficult.
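As a hand-written illustration only (not the output of any of the tools below), a transformed source for the running example might interleave derivative statements like this:

    import math

    def f(x1, x2):                # original source
        return x1 * x2 + math.sin(x1)

    def f_d(x1, x2, dx1, dx2):    # hypothetical generated source
        w3 = x1 * x2
        dw3 = dx1 * x2 + x1 * dx2  # interleaved derivative statement
        w4 = math.sin(x1)
        dw4 = math.cos(x1) * dx1   # interleaved derivative statement
        return w3 + w4, dw3 + dw4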

Examples:
* [http://www-fp.mcs.anl.gov/adic/ ADIC] (C/C++, forward mode)
* [http://www-unix.mcs.anl.gov/autodiff/ADIFOR/ ADIFOR] (Fortran77)
* [http://www-unix.mcs.anl.gov/~utke/OpenAD/ OpenAD] (Fortran77, Fortran95, C/C++)
* [http://tapenade.inria.fr:8080/tapenade/index.jsp TAPENADE] (Fortran77, Fortran95)

Operator overloading

Operator overloading is a possibility for source code written in a language supporting it. Objects for real numbers and elementary mathematical operations must be overloaded to cater for the augmented arithmetic depicted above. This requires no change in the original source code for the function to be differentiated.

Operator overloading for forward accumulation is easy to implement, and also possible for reverse accumulation. However, current compilers lag behind in optimizing such code when compared to forward accumulation.

Examples:
* [http://www.vivlabs.com/subpage_adc_v4.php ADC Version 4.0] (C/C++)
* [http://www.vivlabs.com/subpage_adf_v4.php ADF Version 4.0] (Fortran 77, Fortran 95)
* [http://www.math.tu-dresden.de/~adol-c/ ADOL-C] (C/C++)
* [http://www.fadbad.com/ FADBAD++] (C/C++)
* [http://www.coin-or.org/CppAD CppAD] (C/C++)
* [http://matlabad.com/ MAD] (Matlab)
* Sacado (Part of the [http://trilinos.sandia.gov/ Trilinos] collection) (C++, forward/reverse)

References

* Naumann, Uwe (April 2008). "Optimal Jacobian accumulation is NP-complete". Mathematical Programming 112 (2): 427-441.

Literature

* Rall, Louis B. (1981). Automatic Differentiation: Techniques and Applications. Lecture Notes in Computer Science 120. Springer. ISBN 0-540-10861-0.
* Griewank, Andreas (2000). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Frontiers in Applied Mathematics 19. SIAM. ISBN 0-89871-451-6.

External links

* [http://www.autodiff.org/ www.autodiff.org], an "entry site to everything you want to know about automatic differentiation"
* [http://karminghenry.sinaman.com/adindex.html Automatic Differentiation of Parallel OpenMP Programs]
* [http://homepage.mac.com/sigfpe/paper.pdf Automatic Differentiation, C++ Templates and Photogrammetry]
* [http://www.vivlabs.com/subpage_ad.php Automatic Differentiation, Operator Overloading Approach]
* [http://apmonitor.ath.cx Compute analytic derivatives through a web-based interface] Automatic differentiation of nonlinear models
* [http://tapenade.inria.fr:8080/tapenade/index.jsp Compute analytic derivatives of any Fortran77 or Fortran95 through a web-based interface] Automatic Differentiation of Fortran programs

