- Cayley–Hamilton theorem
In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over the real or complex field satisfies its own characteristic equation.

More precisely, if $A$ is the given square $n \times n$ matrix and $I_n$ is the $n \times n$ identity matrix, then the characteristic polynomial of $A$ is defined as

$$p(t) = \det(t I_n - A),$$

where $\det$ is the determinant function. The Cayley–Hamilton theorem states that substituting the matrix $A$ for $t$ in the characteristic polynomial (which involves multiplying its constant term by $I_n$, since that is the zeroth power of $A$) results in the zero matrix:

$$p(A) = 0.$$

The Cayley–Hamilton theorem also holds for square matrices over commutative rings. The Cayley–Hamilton theorem is equivalent to the statement that the minimal polynomial of a square matrix divides its characteristic polynomial.

Examples
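For a 1×1 matrix $A = (a)$ the characteristic polynomial is given by $p(t) = t - a$, and so $p(A) = (a) - a(1) = 0$ is obvious.

For a 2×2 matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

the characteristic polynomial is given by $p(t) = t^2 - (a + d)t + (ad - bc)$, so the Cayley–Hamilton theorem states that

$$p(A) = A^2 - (a + d)A + (ad - bc)I_2 = 0;$$

by working out the entries of $A^2$ this can indeed be seen to be always valid.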
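For larger matrices the expressions for the coefficients of the characteristic polynomial in terms of the matrix entries become increasingly complicated, but they can also be expressed in terms of traces of powers of the matrix $A$ using Newton's identities, resulting in more compact expressions (which, however, involve divisions by certain integers). For instance, the coefficient $-(a + d)$ of $t$ above is just minus the trace of $A$, while the constant coefficient $ad - bc$ can be written as $\tfrac12\bigl((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\bigr)$ (it is also the determinant of $A$, of course). In fact the expression $\tfrac12\bigl((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\bigr)$ gives the coefficient of $t^{n-2}$ in the characteristic polynomial of any $n \times n$ matrix, so for a 3×3 matrix $A$ the statement of the Cayley–Hamilton theorem can be written as

$$A^3 - (\operatorname{tr}A)A^2 + \tfrac12\bigl((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\bigr)A - \det(A)I_3 = 0,$$

where the right hand side designates a 3×3 matrix with all entries zero. Similarly one can write for a 4×4 matrix $A$:

$$A^4 - (\operatorname{tr}A)A^3 + \tfrac12\bigl((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\bigr)A^2 - \tfrac16\bigl((\operatorname{tr}A)^3 - 3\operatorname{tr}(A^2)(\operatorname{tr}A) + 2\operatorname{tr}(A^3)\bigr)A + \det(A)I_4 = 0,$$

and so on for larger matrices, with the increasingly complicated expressions for the coefficients being deduced from Newton's identities.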
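The trace expressions above can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes matrices are given as lists of rows, and the helper names `matmul`, `identity`, `charpoly_from_traces` and `evaluate_at_matrix`, as well as the 3×3 test matrix, are illustrative choices that do not come from the text. It computes the power sums $\operatorname{tr}(A^k)$, turns them into the coefficients of $\det(tI_n - A)$ with Newton's identities (exact arithmetic via `Fraction`, because of the divisions by integers), and then substitutes $A$ into the polynomial.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def charpoly_from_traces(A):
    """Coefficients [c_0, ..., c_{n-1}, 1] of det(t*I - A) via Newton's identities."""
    n = len(A)
    # Power sums p_k = tr(A^k) for k = 1..n.
    power_sums = []
    P = identity(n)
    for _ in range(n):
        P = matmul(P, A)
        power_sums.append(Fraction(sum(P[i][i] for i in range(n))))
    # Newton: k * e_k = sum_{i=1..k} (-1)^(i-1) * e_{k-i} * p_i, with e_0 = 1.
    e = [Fraction(1)]
    for k in range(1, n + 1):
        s = sum((-1) ** (i - 1) * e[k - i] * power_sums[i - 1]
                for i in range(1, k + 1))
        e.append(s / k)
    # The coefficient of t^i in det(t*I - A) is (-1)^(n-i) * e_{n-i}.
    return [(-1) ** (n - i) * e[n - i] for i in range(n + 1)]


def evaluate_at_matrix(coeffs, A):
    """Return sum_i coeffs[i] * A^i, with A^0 taken to be the identity."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    P = identity(n)
    for c in coeffs:
        result = [[result[i][j] + c * P[i][j] for j in range(n)]
                  for i in range(n)]
        P = matmul(P, A)
    return result


A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]   # illustrative matrix, not from the text
coeffs = charpoly_from_traces(A)        # equals [-25, 26, -9, 1]
zero = evaluate_at_matrix(coeffs, A)    # every entry is 0
```

For this matrix the trace is 9, $\tfrac12\bigl((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\bigr) = 26$ and the determinant is 25, so the computed polynomial is $t^3 - 9t^2 + 26t - 25$, in agreement with the 3×3 formula.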
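As a concrete example, the characteristic polynomial of

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

is given by

$$p(t) = \det(tI_2 - A) = \begin{vmatrix} t - 1 & -2 \\ -3 & t - 4 \end{vmatrix} = (t - 1)(t - 4) - (-2)(-3) = t^2 - 5t - 2.$$

The Cayley–Hamilton theorem then claims that

$$A^2 - 5A - 2I_2 = 0,$$

which one can easily verify in this case.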
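The verification is a one-line matrix computation. As a hedged sketch, the snippet below takes the matrix with rows (1, 2) and (3, 4) as its illustrative input (an assumption of this example) and evaluates $A^2 - (\operatorname{tr}A)A + \det(A)I_2$ entry by entry; the function names `matmul` and `residual_2x2` are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def residual_2x2(A):
    """Return A^2 - tr(A)*A + det(A)*I for a 2x2 matrix A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = matmul(A, A)
    return [[A2[i][j] - tr * A[i][j] + (det if i == j else 0)
             for j in range(2)]
            for i in range(2)]


A = [[1, 2], [3, 4]]        # illustrative input: tr(A) = 5, det(A) = -2
print(matmul(A, A))         # [[7, 10], [15, 22]]
print(residual_2x2(A))      # [[0, 0], [0, 0]]
```

Since the function works from the trace and determinant rather than from hard-coded coefficients, it returns the zero matrix for any 2×2 integer input.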
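The Cayley–Hamilton theorem always provides a relation between the powers of $A$ (though not always the simplest one), which allows one to simplify expressions involving such powers, and to evaluate them without having to compute the power $A^n$ or any higher powers of $A$. For instance, for a 2×2 matrix with characteristic polynomial $t^2 - 5t - 2$ the theorem can be written as

$$A^2 = 5A + 2I_2.$$

Then, for example, to calculate $A^4$, observe:

$$A^3 = (5A + 2I_2)A = 5A^2 + 2A = 5(5A + 2I_2) + 2A = 27A + 10I_2,$$

$$A^4 = A^3 A = (27A + 10I_2)A = 27A^2 + 10A = 27(5A + 2I_2) + 10A = 145A + 54I_2.$$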
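The same reduction can be mechanised. The sketch below is an illustration under stated assumptions rather than a general-purpose routine: it handles only 2×2 matrices, uses the relation $A^2 = (\operatorname{tr}A)A - \det(A)I_2$, and the function name `power_as_linear_combination` is hypothetical. If $A^k = \alpha A + \beta I_2$, then $A^{k+1} = \alpha A^2 + \beta A = (\alpha\operatorname{tr}A + \beta)A - \alpha\det(A)I_2$, so only the two scalars need updating.

```python
def power_as_linear_combination(A, k):
    """Return (alpha, beta) such that A^k = alpha*A + beta*I, for 2x2 A and k >= 1."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    alpha, beta = 1, 0                      # A^1 = 1*A + 0*I
    for _ in range(k - 1):
        # Multiply by A and replace A^2 with tr*A - det*I.
        alpha, beta = alpha * tr + beta, -alpha * det
    return alpha, beta


A = [[1, 2], [3, 4]]                        # illustrative matrix with A^2 = 5A + 2I
print(power_as_linear_combination(A, 3))    # (27, 10)
print(power_as_linear_combination(A, 4))    # (145, 54)
```

For this matrix, $145A + 54I_2$ has rows (199, 290) and (435, 634), which is what direct multiplication gives for $A^4$.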
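For a general invertible $n \times n$ matrix $A$, i.e., one with nonzero determinant, the inverse $A^{-1}$ can be written as a polynomial expression in $A$ using the Cayley–Hamilton theorem: it gives an identity

$$p(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + (-1)^n \det(A) I_n = 0,$$

which can be written as

$$-(-1)^n \det(A) I_n = A\,(A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I_n),$$

and multiplying both sides by $A^{-1}$ one deduces the expression

$$A^{-1} = \frac{(-1)^{n-1}}{\det(A)}\,(A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I_n).$$

Proving the Cayley–Hamilton theorem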
As the examples above show, obtaining the statement of the Cayley–Hamilton theorem for an $n \times n$ matrix $A = (a_{i,j})$ requires two steps: first the coefficients $c_i$ of the characteristic polynomial are determined by development as a polynomial in $t$ of the determinant

$$p(t) = \det(tI_n - A) = \begin{vmatrix} t - a_{1,1} & -a_{1,2} & \cdots & -a_{1,n} \\ -a_{2,1} & t - a_{2,2} & \cdots & -a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n,1} & -a_{n,2} & \cdots & t - a_{n,n} \end{vmatrix} = t^n + c_{n-1}t^{n-1} + \cdots + c_1 t + c_0,$$

and then these coefficients are used in a linear combination of powers of $A$ that is equated to the $n \times n$ null matrix:

$$A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n = 0.$$

The left hand side can be worked out to an $n \times n$ matrix whose entries are (enormous) polynomial expressions in the set of entries $a_{i,j}$ of $A$, so the Cayley–Hamilton theorem states that each of these expressions is identically 0. For any fixed value of $n$ these identities can be obtained by tedious but completely straightforward algebraic manipulations. None of these computations, however, can show why the Cayley–Hamilton theorem should be valid for matrices of all possible sizes $n$, so a uniform proof for all $n$ is needed.
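The two steps can be carried out literally for small $n$. The following sketch makes several assumptions that are not in the text: polynomials in $t$ are stored as coefficient lists (constant term first), the determinant is expanded with the Leibniz permutation formula (practical only for small matrices), and all function names are hypothetical. It confirms the identity for particular matrices and, as noted above, says nothing about why it holds for every $n$.

```python
from itertools import permutations


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def poly_add(p, q):
    """Add two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def sign(perm):
    """Sign of a permutation, computed from its number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def char_poly(A):
    """Step 1: expand det(t*I - A) and return [c_0, ..., c_{n-1}, 1]."""
    n = len(A)
    # Entry (i, j) of t*I - A as a polynomial: -a_ij + (1 if i == j else 0) * t.
    M = [[[-A[i][j], 1 if i == j else 0] for j in range(n)] for i in range(n)]
    total = [0]
    for perm in permutations(range(n)):
        term = [sign(perm)]
        for i in range(n):
            term = poly_mul(term, M[i][perm[i]])
        total = poly_add(total, term)
    return total


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def substitute(coeffs, A):
    """Step 2: return sum_i c_i * A^i, with A^0 taken to be the identity."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for c in coeffs:
        result = [[result[i][j] + c * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = matmul(power, A)
    return result


A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]   # illustrative matrix, not from the text
c = char_poly(A)                         # [-25, 26, -9, 1]
print(substitute(c, A))                  # 3x3 matrix of zeros
```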
Preliminaries
If a vector $v$ of size $n$ happens to be an eigenvector of $A$ with eigenvalue $\lambda$, in other words if $A \cdot v = \lambda v$, then

$$p(A) \cdot v = A^n v + c_{n-1}A^{n-1}v + \cdots + c_1 A v + c_0 I_n v = \lambda^n v + c_{n-1}\lambda^{n-1}v + \cdots + c_1 \lambda v + c_0 v = p(\lambda)\,v,$$

which is the null vector since $p(\lambda) = 0$ (the eigenvalues of $A$ are precisely the roots of $p(t)$). This holds for all possible eigenvalues $\lambda$, so the two matrices equated by the theorem certainly give the same (null) result when applied to any eigenvector. Now if $A$ admits a basis of eigenvectors, in other words if $A$ is diagonalizable, then the Cayley–Hamilton theorem must hold for $A$, since two matrices that give the same values when applied to each element of a basis must be equal.

Not all matrices are diagonalizable, but for matrices with complex coefficients many of them are: the set of diagonalizable complex square matrices of a given size is dense in the set of all such square matrices (for a matrix to be diagonalizable it suffices, for instance, that its characteristic polynomial not have multiple roots). Now if any of the expressions that the theorem equates to 0 did not reduce to a null expression, in other words if it were a nonzero polynomial in the coefficients of the matrix, then the set of complex matrices for which this expression happens to give 0 would be a proper closed subset of the set of all matrices and hence not dense in it, which would contradict the fact that the theorem holds for all diagonalizable matrices. Thus one can see that the Cayley–Hamilton theorem must be true.

While this provides a valid proof, the argument is not very satisfactory, since the identities represented by the theorem do not in any way depend on the nature of the matrix (diagonalizable or not), nor on the kind of entries allowed (for matrices with real entries the diagonalizable ones do not form a dense set, and it seems strange that one would have to consider complex matrices to see that the Cayley–Hamilton theorem holds for them). We shall therefore now consider only arguments that prove the theorem directly for any matrix using algebraic manipulations only; these also have the benefit of working for matrices with entries in any commutative ring.

There is a great variety of such proofs of the Cayley–Hamilton theorem, of which several will be given here. They vary in the amount of abstract algebraic notions required to understand the proof. The simplest proofs use just those notions needed to formulate the theorem (matrices, polynomials with numeric entries, determinants), but involve technical computations that render somewhat mysterious the fact that they lead precisely to the correct conclusion. It is possible to avoid such details, but at the price of involving more subtle algebraic notions: polynomials with coefficients in a non-commutative ring, or matrices with unusual kinds of entries.

A non-proof
One elementary and non-technical argument, which is however incorrect, is simply to take the definition

$$p(t) = \det(tI_n - A)$$

and substitute $A$ for $t$, obtaining $p(A) = \det(A I_n - A) = \det(A - A) = 0$. Comparing with the expanded form of $p(t)$ above one can see why this is wrong: in $tI_n - A$ the variable $t$ occurs inside the diagonal entries, and substituting the entire matrix $A$ for $t$ in those positions cannot be performed, and in any case does not result in the matrix $A I_n - A$ (note that before the substitution the multiplication by $I_n$ serves to transform the scalar $t$ into a matrix, while after the substitution this operation has been interpreted as a matrix multiplication that has no effect at all, which is not the same thing). Note also that the result of this computation is the scalar 0, while the Cayley–Hamilton theorem says it should be an $n \times n$ matrix with all entries zero. One of the proofs below will have some similarity to this argument, by introducing a matrix with non-numeric coefficients in which $A$ can in fact live inside an entry, but then $A I_n$ is not equal to $A$, and the conclusion is arrived at differently.
Adjugate matrices
All proofs below use the notion of the adjugate matrix $\operatorname{adj}(M)$ of an $n \times n$ matrix $M$. This is a matrix whose coefficients are given by polynomial expressions in the coefficients of $M$ (in fact by certain $(n-1) \times (n-1)$ determinants), in such a way that one has the following fundamental relations:

$$\operatorname{adj}(M) \cdot M = \det(M) I_n = M \cdot \operatorname{adj}(M).$$

These relations are a direct consequence of the basic properties of determinants: evaluation of the $(i,j)$ entry of the matrix product on the left gives the expansion by column $j$ of the determinant of the matrix obtained from $M$ by replacing column $i$ by a copy of column $j$, which is $\det(M)$ if $i = j$ and zero otherwise; the matrix product on the right is similar, but for expansions by rows. Being a consequence of just algebraic expression manipulation, these relations are valid for matrices with entries in any commutative ring (commutativity must be assumed for determinants to be defined in the first place). This is important to note here, because these relations will be applied to matrices with non-numeric entries such as polynomials.

A direct algebraic proof
This proof just uses the kind of objects needed to formulate the Cayley–Hamilton theorem: matrices with polynomials as entries. The matrix $tI_n - A$ whose determinant is the characteristic polynomial is such a matrix, and since polynomials form a commutative ring, it has an adjugate

$$B = \operatorname{adj}(tI_n - A).$$

Then according to the right hand fundamental relation of the adjugate one has

$$(tI_n - A) \cdot B = \det(tI_n - A) I_n = p(t) I_n.$$

Since $B$ is also a matrix with polynomials in $t$ as entries, one can for each $i$ collect the coefficients of $t^i$ in each entry to form a matrix $B_i$ of numbers, such that one has

$$B = \sum_{i=0}^{n-1} t^i B_i$$

(the way the entries of $B$ are defined makes clear that no powers higher than $t^{n-1}$ occur). While this looks like a polynomial with matrices as coefficients, we shall not consider such a notion; it is just a way to write a matrix with polynomial entries as a linear combination of constant matrices, and the coefficient $t^i$ has been written to the left of the matrix to stress this point of view. Now one can expand the matrix product in our equation by bilinearity:

$$p(t) I_n = (tI_n - A) \cdot B = \sum_{i=0}^{n-1} tI_n \cdot t^i B_i - \sum_{i=0}^{n-1} A \cdot t^i B_i = \sum_{i=0}^{n-1} t^{i+1} B_i - \sum_{i=0}^{n-1} t^i A B_i = t^n B_{n-1} + \sum_{i=1}^{n-1} t^i (B_{i-1} - A B_i) - A B_0.$$

Writing $p(t) I_n = t^n I_n + t^{n-1} c_{n-1} I_n + \cdots + t c_1 I_n + c_0 I_n$, one obtains an equality of two matrices with polynomial entries, written as linear combinations of constant matrices with powers of $t$ as coefficients. Such an equality can only hold if in any matrix position the entry that is multiplied by a given power $t^i$ is the same on both sides; it follows that the constant matrices with coefficient $t^i$ in both expressions must be equal. Writing these equations for $i$ from $n$ down to 0 one finds:

$$B_{n-1} = I_n, \qquad B_{i-1} - A B_i = c_i I_n \quad (1 \le i \le n-1), \qquad -A B_0 = c_0 I_n.$$

We multiply the equation of the coefficients of $t^i$ from the left by $A^i$, and sum up; the left-hand sides form a telescoping sum and cancel completely, which results in the equation

$$0 = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n = p(A).$$

This completes the proof.

A proof using polynomials with matrix coefficients
This proof is similar to the first one, but tries to give meaning to the notion of polynomial with matrix coefficients that was suggested by the expressions occurring in that proof. This requires considerable care, since it is somewhat unusual to consider polynomials with coefficients in a non-commutative ring, and not all reasoning that is valid for commutative polynomials can be applied in this setting. Notably, while arithmetic of polynomials over a commutative ring models the arithmetic of polynomial functions, this is not the case over a non-commutative ring (in fact there is no obvious notion of polynomial function in this case that is closed under multiplication). So when considering polynomials in $t$ with matrix coefficients, the variable $t$ must not be thought of as an "unknown", but as a formal symbol that is to be manipulated according to given rules; in particular one cannot just set $t$ to a specific value.

Let $M = M_n(R)$ be the ring of $n \times n$ matrices with entries in some ring $R$ (such as the real or complex numbers) that has $A$ as an element. Matrices whose coefficients are polynomials in $t$, such as $tI_n - A$ or its adjugate $B$ in the first proof, are elements of $M_n(R[t])$. By collecting like powers of $t$, such matrices can be written as "polynomials" in $t$ with constant matrices as coefficients; write $M[t]$ for the set of such polynomials. Since this set is in bijection with $M_n(R[t])$, one defines arithmetic operations on it correspondingly; in particular multiplication is given by

$$\Bigl(\sum_i M_i t^i\Bigr) \cdot \Bigl(\sum_j N_j t^j\Bigr) = \sum_{i,j} (M_i N_j)\, t^{i+j},$$

respecting the order of the coefficient matrices from the two operands; obviously this gives a non-commutative multiplication. Thus the identity

$$(tI_n - A) \cdot B = p(t) I_n$$

from the first proof can be viewed as one involving a multiplication of elements in $M[t]$.
At this point, it is tempting to set $t$ equal to the matrix $A$, which makes the first factor on the left equal to the null matrix, and the right hand side equal to $p(A)$; however, this is not an allowed operation when coefficients do not commute. It is possible to define a "right-evaluation map" $\operatorname{ev}_A : M[t] \to M$, which replaces each $t^i$ by the matrix power $A^i$ of $A$, where one stipulates that the power is always to be multiplied on the right to the corresponding coefficient. However, this map is not a ring homomorphism: the right-evaluation of a product differs in general from the product of the right-evaluations. This is so because multiplication of polynomials with matrix coefficients does not model multiplication of expressions containing unknowns: a product $M t^i \cdot N t^j = (M \cdot N)\, t^{i+j}$ is defined assuming that $t$ commutes with $N$, but this may fail if $t$ is replaced by the matrix $A$.
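A small computation makes the failure concrete. In the sketch below, which is an illustration under assumptions not found in the text, a polynomial with matrix coefficients is stored as a list of coefficient matrices (constant term first), and the names `poly_mul` and `right_eval` as well as the particular 2×2 matrices are hypothetical choices. Taking the polynomials $I_2 t$ and the constant $N$, the right-evaluation of the product is $NA$ while the product of the right-evaluations is $AN$; these differ unless $N$ commutes with $A$.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def zeros(n):
    return [[0] * n for _ in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def poly_mul(P, Q):
    """(sum_i M_i t^i)(sum_j N_j t^j) = sum_{i,j} (M_i N_j) t^(i+j)."""
    n = len(P[0])
    R = [zeros(n) for _ in range(len(P) + len(Q) - 1)]
    for i, Mi in enumerate(P):
        for j, Nj in enumerate(Q):
            R[i + j] = mat_add(R[i + j], mat_mul(Mi, Nj))
    return R


def right_eval(P, A):
    """Right-evaluation: sum_i M_i * A^i, with the power of A on the right."""
    n = len(A)
    result, power = zeros(n), identity(n)
    for Mi in P:
        result = mat_add(result, mat_mul(Mi, power))
        power = mat_mul(power, A)
    return result


A = [[0, 1], [0, 0]]
N = [[0, 0], [1, 0]]                      # does not commute with A
P = [zeros(2), identity(2)]               # the polynomial I*t
Q = [N]                                   # the constant polynomial N

lhs = right_eval(poly_mul(P, Q), A)       # N*A = [[0, 0], [0, 1]]
rhs = mat_mul(right_eval(P, A), right_eval(Q, A))   # A*N = [[1, 0], [0, 0]]
print(lhs == rhs)                         # False

C = [[1, 1], [0, 1]]                      # C = I + A commutes with A
print(right_eval(poly_mul(P, [C]), A) ==
      mat_mul(right_eval(P, A), right_eval([C], A)))   # True
```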
One can work around this difficulty in the particular situation at hand, since the above right-evaluation map does become a ring homomorphism if the matrix $A$ is in the center of the ring of coefficients, so that it commutes with all the coefficients of the polynomials (the argument proving this is straightforward, exactly because commuting $t$ with coefficients is now justified after evaluation). Now $A$ is not always in the center of $M$, but we may replace $M$ with a smaller ring provided it contains all the coefficients of the polynomials in question: $I_n$, $A$, and the coefficients $B_i$ of the polynomial $B$. The obvious choice for such a subring is the centralizer $Z$ of $A$, the subring of all matrices that commute with $A$; by definition $A$ is in the center of $Z$. This centralizer obviously contains $I_n$ and $A$, but one has to show that it contains the matrices $B_i$. To do this one combines the two fundamental relations for adjugates, writing out the adjugate $B$ as a polynomial:

$$\Bigl(\sum_{i=0}^{n-1} B_i t^i\Bigr)(tI_n - A) = (tI_n - A)\Bigl(\sum_{i=0}^{n-1} B_i t^i\Bigr),$$

$$\sum_{i=0}^{n-1} B_i t^{i+1} - \sum_{i=0}^{n-1} B_i A\, t^i = \sum_{i=0}^{n-1} B_i t^{i+1} - \sum_{i=0}^{n-1} A B_i\, t^i,$$

$$\sum_{i=0}^{n-1} B_i A\, t^i = \sum_{i=0}^{n-1} A B_i\, t^i.$$

Equating the coefficients shows that for each $i$, we have $A B_i = B_i A$ as desired. Having found the proper setting in which $\operatorname{ev}_A$ is indeed a homomorphism of rings, one can complete the proof as suggested above:

$$\operatorname{ev}_A\bigl(p(t) I_n\bigr) = \operatorname{ev}_A\bigl((tI_n - A) \cdot B\bigr),$$

$$p(A) = \operatorname{ev}_A(tI_n - A) \cdot \operatorname{ev}_A(B) = (A I_n - A) \cdot \operatorname{ev}_A(B) = 0 \cdot \operatorname{ev}_A(B) = 0.$$

This completes the proof.

A synthesis of the first two proofs
In the first proof, one was able to determine the coefficients $B_i$ of $B$ based on the right hand fundamental relation for the adjugate only. In fact the first $n$ equations derived can be interpreted as determining the quotient $B$ of the Euclidean division of the polynomial $p(t) I_n$ on the left by the monic polynomial $I_n t - A$, while the final equation expresses the fact that the remainder is zero. This division is performed in the ring of polynomials with matrix coefficients. Indeed, even over a non-commutative ring, Euclidean division by a monic polynomial $P$ is defined, and always produces a unique quotient and remainder with the same degree condition as in the commutative case, provided it is specified at which side one wishes $P$ to be a factor (here that is to the left). To see that quotient and remainder are unique (which is the important part of the statement here), it suffices to write $PQ + r = PQ' + r'$ as $P(Q - Q') = r' - r$ and observe that since $P$ is monic, $P(Q - Q')$ cannot have a degree less than that of $P$, unless $Q = Q'$.

But the dividend $p(t) I_n$ and divisor $I_n t - A$ used here both lie in the subring $(R[A])[t]$, where $R[A]$ is the subring of the matrix ring $M$ generated by $A$: the $R$-linear span of all powers of $A$. Therefore the Euclidean division can in fact be performed within that commutative polynomial ring, and of course it then gives the same quotient $B$ and remainder 0 as in the larger ring; in particular this shows that $B$ in fact lies in $(R[A])[t]$. But in this commutative setting it is valid to set $t$ to $A$ in the equation $p(t) I_n = (tI_n - A) B$, in other words to apply the evaluation map

$$\operatorname{ev}_A : (R[A])[t] \to R[A],$$

which is a ring homomorphism, giving

$$p(A) = 0 \cdot \operatorname{ev}_A(B) = 0,$$

just like in the second proof, as desired.

In addition to proving the theorem, the above argument tells us that the coefficients $B_i$ of $B$ are polynomials in $A$, while from the second proof we only knew that they lie in the centralizer $Z$ of $A$; in general $Z$ is a larger subring than $R[A]$, and not necessarily commutative. In particular the constant term $B_0 = \operatorname{adj}(-A)$ lies in $R[A]$. Since $A$ is an arbitrary square matrix, this proves that $\operatorname{adj}(A)$ can always be expressed as a polynomial in $A$ (with coefficients that depend on $A$), something that is not obvious from the definition of the adjugate matrix. In fact the equations found in the first proof allow successively expressing $B_{n-1}, \ldots, B_1, B_0$ as polynomials in $A$, which leads to the identity

$$\operatorname{adj}(-A) = \sum_{i=1}^{n} c_i A^{i-1},$$

valid for all $n \times n$ matrices, where $p(t) = t^n + c_{n-1}t^{n-1} + \cdots + c_1 t + c_0$ is the characteristic polynomial of $A$ and $c_n = 1$. Note that this identity implies the statement of the Cayley–Hamilton theorem: one may move $\operatorname{adj}(-A)$ to the right hand side, multiply the resulting equation (on the left or on the right) by $A$, and use the fact that

$$-A \cdot \operatorname{adj}(-A) = \operatorname{adj}(-A) \cdot (-A) = \det(-A) I_n = c_0 I_n.$$
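The successive determination of the coefficient matrices can be sketched in a few lines. The example below rests on assumptions that are not part of the text: the recurrence is taken in the form $B_{n-1} = I_n$, $B_{i-1} = c_i I_n + A B_i$, the function name `adjugate_coefficients` is hypothetical, and the 3×3 matrix together with its characteristic polynomial $t^3 - 9t^2 + 26t - 25$ (which can be checked by hand) is an illustrative choice. It then confirms that the resulting $B_0$ satisfies $-A B_0 = c_0 I_n$ and commutes with $A$.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scaled_identity(c, n):
    """The matrix c * I_n."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]


def adjugate_coefficients(A, c):
    """Given c = [c_0, ..., c_{n-1}, 1], return [B_0, ..., B_{n-1}].

    Uses B_{n-1} = I and B_{i-1} = c_i * I + A * B_i for i = n-1, ..., 1.
    """
    n = len(A)
    B = [None] * n
    B[n - 1] = scaled_identity(1, n)
    for i in range(n - 1, 0, -1):
        AB = mat_mul(A, B[i])
        B[i - 1] = [[AB[r][s] + (c[i] if r == s else 0) for s in range(n)]
                    for r in range(n)]
    return B


A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]   # illustrative matrix
c = [-25, 26, -9, 1]                     # its characteristic polynomial, constant term first
B = adjugate_coefficients(A, c)

print(B[0])                   # [[12, 1, -3], [-4, 8, 1], [1, -2, 6]]
print(mat_mul(A, B[0]))       # 25 * I_3, that is, -c_0 * I_3
print(mat_mul(A, B[0]) == mat_mul(B[0], A))   # True
```

Here $n = 3$ is odd, so $\operatorname{adj}(-A) = \operatorname{adj}(A)$, and the printed $B_0$ is indeed the transposed cofactor matrix of $A$.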
A proof using matrices of endomorphisms
As was mentioned above, the matrix $p(A)$ in the statement of the theorem is obtained by first evaluating the determinant and then substituting the matrix $A$ for $t$; doing that substitution into the matrix $tI_n - A$ before evaluating the determinant is not meaningful. Nevertheless, it is possible to give an interpretation where $p(A)$ is obtained directly as the value of a certain determinant, but this requires a more complicated setting, one of matrices over a ring in which one can interpret both the entries $A_{i,j}$ of $A$, and all of $A$ itself. One could take for this the ring $M$ of $n \times n$ matrices over $R$, where the entry $A_{i,j}$ is realised as $A_{i,j} I_n$, and $A$ as itself. But considering matrices with matrices as entries might cause confusion with block matrices, which is not intended, as that gives the wrong notion of determinant.

It is clearer to distinguish $A$ from the endomorphism $\varphi$ of an $n$-dimensional vector space $V$ (or free $R$-module if $R$ is not a field) defined by it in a basis $e_1, \ldots, e_n$, and to take matrices over the ring $\operatorname{End}(V)$ of all such endomorphisms. Then $\varphi \in \operatorname{End}(V)$ is a possible matrix entry, while $A$ designates the element of $M_n(\operatorname{End}(V))$ whose $i,j$ entry is the endomorphism of scalar multiplication by $A_{i,j}$; similarly $I_n$ will be interpreted as an element of $M_n(\operatorname{End}(V))$. However, since $\operatorname{End}(V)$ is not a commutative ring, no determinant is defined on $M_n(\operatorname{End}(V))$; this can only be done for matrices over a commutative subring of $\operatorname{End}(V)$. Now the entries of the matrix $\varphi I_n - A$ all lie in the subring $R[\varphi]$ generated by the identity and $\varphi$, which is commutative. Then a determinant map $M_n(R[\varphi]) \to R[\varphi]$ is defined, and $\det(\varphi I_n - A)$ evaluates to the value $p(\varphi)$ of the characteristic polynomial of $A$ at $\varphi$ (this holds independently of the relation between $A$ and $\varphi$); the Cayley–Hamilton theorem states that $p(\varphi)$ is the null endomorphism.
In this form, the following proof can be obtained from that of Atiyah & MacDonald (1969, Prop. 2.4) (which in fact is the more general statement related to the Nakayama lemma; one takes for the ideal in that proposition the whole ring $R$). The fact that $A$ is the matrix of $\varphi$ in the basis $e_1, \ldots, e_n$ means that

$$\varphi(e_i) = \sum_{j=1}^{n} A_{j,i}\, e_j \quad \text{for } i = 1, \ldots, n.$$

One can interpret these as $n$ components of one equation in $V^n$, whose members can be written using the matrix-vector product $M_n(\operatorname{End}(V)) \times V^n \to V^n$ that is defined as usual, but with individual entries $\psi \in \operatorname{End}(V)$ and $v \in V$ being "multiplied" by forming $\psi(v)$; this gives

$$\varphi I_n \cdot E = A^{\mathrm{tr}} \cdot E,$$

where $E \in V^n$ is the element whose component $i$ is $e_i$ (in other words it is the basis $e_1, \ldots, e_n$ of $V$ written as a column of vectors). Writing this equation as

$$(\varphi I_n - A^{\mathrm{tr}}) \cdot E = 0 \in V^n,$$

one recognizes the transpose of the matrix $\varphi I_n - A$ considered above, and its determinant (taken as that of an element of $M_n(R[\varphi])$) is also $p(\varphi)$. To derive from this equation that $p(\varphi) = 0 \in \operatorname{End}(V)$, one left-multiplies by the adjugate matrix of $\varphi I_n - A^{\mathrm{tr}}$, which is defined in the matrix ring $M_n(R[\varphi])$, giving

$$0 = \operatorname{adj}(\varphi I_n - A^{\mathrm{tr}}) \cdot \bigl((\varphi I_n - A^{\mathrm{tr}}) \cdot E\bigr) = \bigl(\operatorname{adj}(\varphi I_n - A^{\mathrm{tr}}) \cdot (\varphi I_n - A^{\mathrm{tr}})\bigr) \cdot E = \bigl(\det(\varphi I_n - A^{\mathrm{tr}}) I_n\bigr) \cdot E = (p(\varphi) I_n) \cdot E;$$

the associativity of matrix-matrix and matrix-vector multiplication used in the first step is a purely formal property of those operations, independent of the nature of the entries. Now component $i$ of this equation says that $p(\varphi)(e_i) = 0 \in V$; thus $p(\varphi)$ vanishes on all $e_i$, and since these elements generate $V$ it follows that $p(\varphi) = 0 \in \operatorname{End}(V)$, completing the proof.

One additional fact that follows from this proof is that the matrix $A$ whose characteristic polynomial is taken need not be identical to the value $\varphi$ substituted into that polynomial; it suffices that $\varphi$ be an endomorphism of $V$ satisfying the initial equations $\varphi(e_i) = \sum_j A_{j,i}\, e_j$ for some sequence of elements $e_1, \ldots, e_n$ that generate $V$ (which space might have smaller dimension than $n$, or in case the ring $R$ is not a field it might not be a free module at all).

Abstraction and generalizations
The above proofs show that the Cayley–Hamilton theorem holds for matrices with entries in any commutative ring $R$, and that $p(\varphi) = 0$ will hold whenever $\varphi$ is an endomorphism of an $R$-module generated by elements $e_1, \ldots, e_n$ that satisfies $\varphi(e_j) = \sum_{i} A_{i,j}\, e_i$ for $j = 1, \ldots, n$, where $p$ is the characteristic polynomial of the matrix $A = (A_{i,j})$. This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.

See also
* Bartel Leendert van der Waerden

References
* Atiyah, M. F.; MacDonald, I. G. (1969), Introduction to Commutative Algebra, Westview Press, ISBN 0-201-40751-5

External links
* [http://planetmath.org/?op=getobj&from=objects&id=7308 A proof from PlanetMath.]
* [http://www.mathpages.com/home/kmath640/kmath640.htm The Cayley-Hamilton Theorem] at MathPages
Wikimedia Foundation. 2010.