Cayley–Hamilton theorem
From Wikipedia, the free encyclopedia
If A is a given n×n matrix and In is the n×n identity matrix, then the characteristic polynomial of A is defined as[4]
The theorem was first proved in 1853[5] in terms of inverses of linear functions of quaternions, a non-commutative ring, by Hamilton.[6][7][8] This corresponds to the special case of certain real 4 × 4 real or 2 × 2 complex matrices. The theorem holds for general quaternionic matrices.[9][nb 1] Cayley in 1858 stated it for 3 × 3 and smaller matrices, but only published a proof for the 2 × 2 case.[2] The general case was first proved by Frobenius in 1878.[10]
Contents
Example
As a concrete example, let
- .
Illustration for specific dimensions and practical applications
For a 2×2 matrix,
For a general n×n invertible matrix A, i.e., one with nonzero determinant, A−1 can thus be written as an (n − 1)-th order polynomial expression in A: As indicated, the Cayley–Hamilton theorem amounts to the identity
This can then be written as
In fact, this expression, ½((trA)2 − tr(A2)), always gives the coefficient cn−2 of λn−2 in the characteristic polynomial of any n×n matrix; so, for a 3×3 matrix A, the statement of the Cayley–Hamilton theorem can also be written as
Similarly, one can write for a 4×4 matrix A,
A practical method for obtaining these coefficients ck for a general n×n matrix, yielding the above ones virtually by inspection, provided no root be zero, relies on the following alternative expression for the determinant,
Differentiation of this expression with respect to λ allows determination of the generic coefficients of the characteristic polynomial for general n, as determinants of m×m matrices,[nb 2]
For instance, the concrete 2×2 example above can be written as
See also: Determinant § Relation to eigenvalues and trace and Characteristic polynomial § PropertiesPower series
It is possible to define matrix valued functions of a real variable by giving its power series. An example of such usage is the exponential map from the Lie algebra of a matrix Lie group into the group. It is given by
Proving the theorem in general
As the examples above show, obtaining the statement of the Cayley–Hamilton theorem for an n×n matrix
Preliminaries
While this provides a valid proof (for matrices over the complex numbers), the argument is not very satisfactory, since the identities represented by the theorem do not in any way depend on the nature of the matrix (diagonalizable or not), nor on the kind of entries allowed (for matrices with real entries the diagonizable ones do not form a dense set, and it seems strange one would have to consider complex matrices to see that the Cayley–Hamilton theorem holds for them). We shall therefore now consider only arguments that prove the theorem directly for any matrix using algebraic manipulations only; these also have the benefit of working for matrices with entries in any commutative ring.
There is a great variety of such proofs of the Cayley–Hamilton theorem, of which several will be given here. They vary in the amount of abstract algebraic notions required to understand the proof. The simplest proofs use just those notions needed to formulate the theorem (matrices, polynomials with numeric entries, determinants), but involve technical computations that render somewhat mysterious the fact that they lead precisely to the correct conclusion. It is possible to avoid such details, but at the price of involving more subtle algebraic notions: polynomials with coefficients in a non-commutative ring, or matrices with unusual kinds of entries.
Adjugate matrices
All proofs below use the notion of the adjugate matrix adj(M) of an n×n matrix M, the transpose of its cofactor matrix.
This is a matrix whose coefficients are given by polynomial expressions in the coefficients of M (in fact, by certain (n − 1)×(n − 1) determinants), in such a way that the following fundamental relations hold,
Being a consequence of just algebraic expression manipulation, these relations are valid for matrices with entries in any commutative ring (commutativity must be assumed for determinants to be defined in the first place). This is important to note here, because these relations will be applied below for matrices with non-numeric entries such as polynomials.
A direct algebraic proof
This proof uses just the kind of objects needed to formulate the Cayley–Hamilton theorem: matrices with polynomials as entries. The matrix t In −A whose determinant is the characteristic polynomial of A is such a matrix, and since polynomials form a commutative ring, it has an adjugate
Now one can expand the matrix product in our equation by bilinearity
Such an equality can hold only if in any matrix position the entry that is multiplied by a given power ti is the same on both sides; it follows that the constant matrices with coefficient ti in both expressions must be equal. Writing these equations then for i from n down to 0, one finds
A proof using polynomials with matrix coefficients
This proof is similar to the first one, but tries to give meaning to the notion of polynomial with matrix coefficients that was suggested by the expressions occurring in that proof. This requires considerable care, since it is somewhat unusual to consider polynomials with coefficients in a non-commutative ring, and not all reasoning that is valid for commutative polynomials can be applied in this setting. Notably, while arithmetic of polynomials over a commutative ring models the arithmetic of polynomial functions, this is not the case over a non-commutative ring (in fact there is no obvious notion of polynomial function in this case that is closed under multiplication). So when considering polynomials in t with matrix coefficients, the variable t must not be thought of as an "unknown", but as a formal symbol that is to be manipulated according to given rules; in particular one cannot just set t to a specific value.
- .
At this point, it is tempting to simply set t equal to the matrix A, which makes the first factor on the left equal to the null matrix, and the right hand side equal to p(A); however, this is not an allowed operation when coefficients do not commute. It is possible to define a "right-evaluation map" evA : M[t] → M, which replaces each ti by the matrix power Ai of A, where one stipulates that the power is always to be multiplied on the right to the corresponding coefficient. But this map is not a ring homomorphism: the right-evaluation of a product differs in general from the product of the right-evaluations. This is so because multiplication of polynomials with matrix coefficients does not model multiplication of expressions containing unknowns: a product is defined assuming that t commutes with N, but this may fail if t is replaced by the matrix A.
One can work around this difficulty in the particular situation at hand, since the above right-evaluation map does become a ring homomorphism if the matrix A is in the center of the ring of coefficients, so that it commutes with all the coefficients of the polynomials (the argument proving this is straightforward, exactly because commuting t with coefficients is now justified after evaluation). Now A is not always in the center of M, but we may replace M with a smaller ring provided it contains all the coefficients of the polynomials in question: , A, and the coefficients of the polynomial B. The obvious choice for such a subring is the centralizer Z of A, the subring of all matrices that commute with A; by definition A is in the center of Z. This centralizer obviously contains , and A, but one has to show that it contains the matrices . To do this one combines the two fundamental relations for adjugates, writing out the adjugate B as a polynomial:
A synthesis of the first two proofs
In the first proof, one was able to determine the coefficients Bi of B based on the right hand fundamental relation for the adjugate only. In fact the first n equations derived can be interpreted as determining the quotient B of the Euclidean division of the polynomial on the left by the monic polynomial , while the final equation expresses the fact that the remainder is zero. This division is performed in the ring of polynomials with matrix coefficients. Indeed, even over a non-commutative ring, Euclidean division by a monic polynomial P is defined, and always produces a unique quotient and remainder with the same degree condition as in the commutative case, provided it is specified at which side one wishes P to be a factor (here that is to the left). To see that quotient and remainder are unique (which is the important part of the statement here), it suffices to write as and observe that since P is monic, cannot have a degree less than that of P, unless .
But the dividend and divisor used here both lie in the subring (R[A])[t], where R[A] is the subring of the matrix ring M(n, R) generated by A: the R-linear span of all powers of A. Therefore the Euclidean division can in fact be performed within that commutative polynomial ring, and of course it then gives the same quotient B and remainder 0 as in the larger ring; in particular this shows that B in fact lies in (R[A])[t]. But in this commutative setting it is valid to set t to A in the equation , in other words apply the evaluation map
In addition to proving the theorem, the above argument tells us that the coefficients Bi of B are polynomials in A, while from the second proof we only knew that they lie in the centralizer Z of A; in general Z is a larger subring than R[A], and not necessarily commutative. In particular the constant term lies in R[A]. Since A is an arbitrary square matrix, this proves that adj(A) can always be expressed as a polynomial in A (with coefficients that depend on A), something that is not obvious from the definition of the adjugate matrix. In fact the equations found in the first proof allow successively expressing as polynomials in A, which leads to the identity
A proof using matrices of endomorphisms
As was mentioned above, the matrix p(A) in statement of the theorem is obtained by first evaluating the determinant and then substituting the matrix A for t; doing that substitution into the matrix before evaluating the determinant is not meaningful. Nevertheless, it is possible to give an interpretation where p(A) is obtained directly as the value of a certain determinant, but this requires a more complicated setting, one of matrices over a ring in which one can interpret both the entries of A, and all of A itself. One could take for this the ring M(n, R) of n×n matrices over R, where the entry is realised as , and A as itself. But considering matrices with matrices as entries might cause confusion with block matrices, which is not intended, as that gives the wrong notion of determinant (recall that the determinant of a matrix is defined as a sum of products of its entries, and in the case of a block matrix this is generally not the same as the corresponding sum of products of its blocks!). It is clearer to distinguish A from the endomorphism φ of an n-dimensional vector space V (or free R-module if R is not a field) defined by it in a basis e1, ..., en, and to take matrices over the ring End(V) of all such endomorphisms. Then φ ∈ End(V) is a possible matrix entry, while A designates the element of M(n, End(V)) whose i,j entry is endomorphism of scalar multiplication by ; similarly In will be interpreted as element of M(n, End(V)). However, since End(V) is not a commutative ring, no determinant is defined on M(n, End(V)); this can only be done for matrices over a commutative subring of End(V). Now the entries of the matrix all lie in the subring R[φ] generated by the identity and φ, which is commutative. Then a determinant map M(n, R[φ]) → R[φ] is defined, and evaluates to the value p(φ) of the characteristic polynomial of A at φ (this holds independently of the relation between A and φ); the Cayley–Hamilton theorem states that p(φ) is the null endomorphism.
In this form, the following proof can be obtained from that of (Atiyah & MacDonald 1969, Prop. 2.4) (which in fact is the more general statement related to the Nakayama lemma; one takes for the ideal in that proposition the whole ring R). The fact that A is the matrix of φ in the basis e1, ..., en means that
One additional fact that follows from this proof is that the matrix A whose characteristic polynomial is taken need not be identical to the value φ substituted into that polynomial; it suffices that φ be an endomorphism of V satisfying the initial equations
A bogus "proof": p(A) = det(AIn − A) = det(A − A) = 0
One elementary but incorrect argument for the theorem is to "simply" take the definition
Actually, if such an argument holds, it should also hold when other multilinear forms instead of determinant is used. For instance, if we consider the permanent function and define , then by the same argument, we should be able to "prove" that q(A) = 0. But this statement is demonstrably wrong. In the 2-dimensional case, for instance, the permanent of a matrix is given by
Abstraction and generalizations
The above proofs show that the Cayley–Hamilton theorem holds for matrices with entries in any commutative ring R, and that p(φ) = 0 will hold whenever φ is an endomorphism of an R module generated by elements e1,...,en that satisfies
See also
Remarks
-
- Due
to the non-commutative nature of the multiplication operation for
quaternions and related constructions, care needs to be taken with
definitions, most notably in this context, for the determinant. The
theorem holds as well for the slightly less well-behaved split-quaternions, see Alagös, Oral & Yüce (2012). The rings of quaternions and split-quaternions can both be represented by certain 2 × 2 complex matrices. (When restricted to unit norm, these are the groups SU(2) and SU(1, 1) respectively.) Therefore it is not surprising that the theorem holds.
There is no such matrix representation for the octonions, since the multiplication operation is not associative in this case. However, a modified Cayley-Hamilton theorem still holds for the octonions, see Tian (2000).
Sunday, 1 November 2015
Cayley–Hamilton theorem
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment