Camera matrix

In computer vision a camera matrix or (camera) projection matrix is a $3 \times 4$ matrix which describes the mapping of a pinhole camera from 3D points in the world to 2D points in an image.

Let $\mathbf{x}$ be a representation of a 3D point in homogeneous coordinates (a 4-dimensional vector), and let $\mathbf{y}$ be a representation of the image of this point in the pinhole camera (a 3-dimensional vector). Then the following relation holds

: $\mathbf{y} \sim \mathbf{C} \, \mathbf{x}$

where $\mathbf{C}$ is the camera matrix and the $\sim$ sign implies that the left and right hand sides are equal up to a non-zero scalar multiplication.

Since the camera matrix $\mathbf{C}$ is involved in the mapping between elements of two projective spaces, it too can be regarded as a projective element. This means that it has only 11 degrees of freedom since any multiplication by a non-zero scalar results in an equivalent camera matrix.
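The scale invariance described above can be illustrated with a minimal sketch in plain Python. The helper names `project` and `dehomogenize` and all numeric values are assumptions made for illustration; they do not come from the source.

```python
def project(C, x):
    """Apply a 3x4 camera matrix C to a homogeneous 4-vector x."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in C]

def dehomogenize(y):
    """Divide by the last component to obtain ordinary 2D coordinates."""
    return (y[0] / y[2], y[1] / y[2])

# Illustrative camera matrix (arbitrary values chosen for this sketch).
C = [[2.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
x = [1.0, 2.0, 4.0, 1.0]  # the 3D point (1, 2, 4) in homogeneous form

# Multiplying C by a non-zero scalar gives an equivalent camera matrix.
C_scaled = [[5.0 * c for c in row] for row in C]

image_point = dehomogenize(project(C, x))
image_point_scaled = dehomogenize(project(C_scaled, x))
```

Both camera matrices yield the same 2D point, which is why only 11 of the 12 matrix entries are independent.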

Derivation

The mapping from the coordinates of a 3D point P to the 2D image coordinates of the point's projection onto the image plane, according to the pinhole camera model is given by

: $\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{f}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$

where $(x_1, x_2, x_3)$ are the 3D coordinates of P relative to a camera centered coordinate system, $(y_1, y_2)$ are the resulting image coordinates, and $f$ is the camera's focal length.

To derive the camera matrix this expression is rewritten in terms of homogeneous coordinates. Instead of the 2D vector $(y_1,y_2)$ we consider the projective element (a 3D vector) $(y_1,y_2,1)$ and instead of equality we consider equality up to scaling by a non-zero number, denoted $\sim$. First, we write the homogeneous image coordinates as expressions in the usual 3D coordinates.

: $\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \frac{f}{x_3} \begin{pmatrix} x_1 \\ x_2 \\ \frac{x_3}{f} \end{pmatrix} \sim \begin{pmatrix} x_1 \\ x_2 \\ \frac{x_3}{f} \end{pmatrix}$

Finally, the 3D coordinates are also expressed in a homogeneous representation, and this is how the camera matrix appears:

: $\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \, \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix}$ or $\mathbf{y} \sim \mathbf{C} \, \mathbf{x}$

where $\mathbf{C}$ is the camera matrix, which here is given by

: $\mathbf{C} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \sim \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$

The last step is a consequence of $mathbf{C}$ itself being a projective element.
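As a quick numerical check of the derivation, the sketch below compares the direct pinhole formula $(y_1, y_2) = \frac{f}{x_3}(x_1, x_2)$ with the result of applying the final camera matrix to the homogeneous point. The function names, the focal length and the point are hypothetical values chosen for illustration.

```python
def project(C, x):
    """Apply a 3x4 camera matrix C to a homogeneous 4-vector x."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in C]

def dehomogenize(y):
    """Divide by the last component to obtain ordinary 2D coordinates."""
    return (y[0] / y[2], y[1] / y[2])

f = 0.5                  # assumed focal length
P = (2.0, -1.0, 4.0)     # assumed 3D point in camera centered coordinates

# Direct pinhole mapping: (y1, y2) = (f / x3) * (x1, x2).
direct = (f / P[2] * P[0], f / P[2] * P[1])

# Camera matrix in the form with f on the diagonal.
C = [[f,   0.0, 0.0, 0.0],
     [0.0, f,   0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

via_matrix = dehomogenize(project(C, [P[0], P[1], P[2], 1.0]))
```

The two results agree, which is all the homogeneous rewriting is meant to guarantee.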

The camera matrix derived here may appear trivial in the sense that it contains very few non-zero elements. This depends to a large extent on the particular coordinate systems which have been chosen for the 3D and 2D points. In practice, however, other forms of camera matrices are common, as will be shown below.

The camera focal point

The camera matrix $\mathbf{C}$ derived in the previous section has a null space which is spanned by the vector

: $\mathbf{n} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$

This is also the homogeneous representation of the 3D point which has coordinates (0,0,0), that is, the camera focal point O. This means that the focal point (and only this point) cannot be mapped to a particular point in the image plane by the camera. This is consistent with the fact that the projection line becomes ambiguous in this case.
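The following sketch, with an assumed focal length and assumed sample points, shows that the focal point is mapped to the zero vector (which is not a valid projective element), while two distinct points on the same line through the focal point share one image point.

```python
def project(C, x):
    """Apply a 3x4 camera matrix C to a homogeneous 4-vector x."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in C]

f = 2.0  # assumed focal length
C = [[f,   0.0, 0.0, 0.0],
     [0.0, f,   0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# Homogeneous representation of the focal point O = (0, 0, 0).
n = [0.0, 0.0, 0.0, 1.0]
image_of_focal_point = project(C, n)

# Two points on the same projection line through O.
y_near = project(C, [1.0, 2.0, 4.0, 1.0])
y_far = project(C, [2.0, 4.0, 8.0, 1.0])
same_image = (y_near[0] / y_near[2], y_near[1] / y_near[2]) == \
             (y_far[0] / y_far[2], y_far[1] / y_far[2])
```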

Normalized camera matrix and normalized image coordinates

The camera matrix derived above can be simplified even further if we assume that $f = 1$:

: $\mathbf{C}_{0} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \left ( \begin{array}{c|c} \mathbf{I} & \mathbf{0} \end{array} \right )$

where $\mathbf{I}$ here denotes a $3 \times 3$ identity matrix. Note that the $3 \times 4$ matrix $\mathbf{C}_{0}$ here is divided into a concatenation of a $3 \times 3$ matrix and a 3-dimensional vector. The camera matrix $\mathbf{C}_{0}$ is sometimes referred to as a "canonical form".

So far all points in the 3D world have been represented in a "camera centered" coordinate system, that is, a coordinate system which has its origin at the camera focal point. In practice however, the 3D points may be represented in terms of coordinates relative to an arbitrary coordinate system (X1',X2',X3'). Assuming that the camera coordinate axes (X1,X2,X3) and the axes (X1',X2',X3') are of Euclidean type (orthogonal and isotropic), there is a unique Euclidean 3D transformation (rotation and translation) between the two coordinate systems.

The two operations of rotation and translation of 3D coordinates can be represented as the two $4 \times 4$ matrices

: $\left ( \begin{array}{c|c} \mathbf{R} & \mathbf{0} \\ \hline \mathbf{0} & 1 \end{array} \right )$ and $\left ( \begin{array}{c|c} \mathbf{I} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right )$

where $\mathbf{R}$ is a $3 \times 3$ rotation matrix and $\mathbf{t}$ is a 3-dimensional translation vector. When the first matrix is multiplied onto the homogeneous representation of a 3D point, the result is the homogeneous representation of the rotated point, and the second matrix performs instead a translation. Performing the two operations in sequence gives a combined rotation and translation matrix

: $\left ( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right )$
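The order of the two operations matters: the combined matrix above corresponds to rotating first and translating second. The sketch below checks this with a hypothetical rotation (a quarter turn about the third axis, written out exactly) and a hypothetical translation; `matmul` is a helper introduced only for this example.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Assumed rotation: 90 degrees about the third axis.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [1.0, 2.0, 5.0]  # assumed translation

bottom = [0.0, 0.0, 0.0, 1.0]
rotation = [row + [0.0] for row in R] + [bottom]
translation = [[1.0, 0.0, 0.0, t[0]],
               [0.0, 1.0, 0.0, t[1]],
               [0.0, 0.0, 1.0, t[2]],
               bottom]

# Rotation applied first, translation second: translation * rotation.
combined = matmul(translation, rotation)
expected = [row + [ti] for row, ti in zip(R, t)] + [bottom]

# The opposite order places R t, not t, in the last column.
reversed_order = matmul(rotation, translation)
```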

Assuming that $\mathbf{R}$ and $\mathbf{t}$ are precisely the rotation and translation which relate the two coordinate systems (X1,X2,X3) and (X1',X2',X3') above, this implies that

: $\mathbf{x} = \left ( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right ) \mathbf{x}'$

where $\mathbf{x}'$ is the homogeneous representation of the point P in the coordinate system (X1',X2',X3').

Assuming also that the camera matrix is given by $\mathbf{C}_{0}$, the mapping from the coordinates in the (X1',X2',X3') system to homogeneous image coordinates becomes

: $\mathbf{y} \sim \mathbf{C}_{0} \, \mathbf{x} = \left ( \begin{array}{c|c} \mathbf{I} & \mathbf{0} \end{array} \right ) \, \left ( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right ) \mathbf{x}' = \left ( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right ) \, \mathbf{x}'$

Consequently, the camera matrix which relates points in the coordinate system (X1',X2',X3') to image coordinates is

: $\mathbf{C}_{N} = \left ( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right )$

a concatenation of a 3D rotation matrix and a 3-dimensional translation vector.

This type of camera matrix is referred to as a "normalized camera matrix". It assumes a focal length of 1 and that image coordinates are measured in a coordinate system whose origin is located at the intersection between the X3 axis and the image plane, with the same units as the 3D coordinate system. The resulting image coordinates are referred to as "normalized image coordinates".
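A minimal sketch of a normalized camera matrix in use is given below. The rotation, translation and world point are assumptions chosen so that the arithmetic is exact; the helper names are likewise illustrative.

```python
def project(C, x):
    """Apply a 3x4 camera matrix C to a homogeneous 4-vector x."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in C]

# Assumed rotation (90 degrees about the third axis) and translation.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [0.0, 0.0, 5.0]

# Normalized camera matrix: the 3x3 rotation with t appended as a column.
C_N = [row + [ti] for row, ti in zip(R, t)]

# Assumed point, given in the (X1', X2', X3') coordinate system.
x_world = [1.0, 2.0, 3.0, 1.0]

y = project(C_N, x_world)                 # homogeneous image coordinates
normalized = (y[0] / y[2], y[1] / y[2])   # normalized image coordinates
```

The intermediate vector `y` equals the camera centered coordinates $\mathbf{R} \, \mathbf{x}' + \mathbf{t}$ of the point, and dividing by its third component performs the projection with focal length 1.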

The camera focal point

The null space of the normalized camera matrix $\mathbf{C}_{N}$ described above is spanned by the 4-dimensional vector

: $\mathbf{n} = \begin{pmatrix} -\mathbf{R}^{-1} \, \mathbf{t} \\ 1 \end{pmatrix} = \begin{pmatrix} \tilde{\mathbf{n}} \\ 1 \end{pmatrix}$

This is also, again, the coordinates of the focal point but now relative to the (X1',X2',X3') system. This can be seen by applying first the rotation and then the translation to the 3-dimensional vector $\tilde{\mathbf{n}}$ and the result is the homogeneous representation of 3D coordinates (0,0,0).

This implies that the focal point (in its homogeneous representation) lies in the null space of the camera matrix, provided that it is represented in terms of 3D coordinates relative to the same coordinate system as the camera matrix refers to.

The normalized camera matrix $\mathbf{C}_{N}$ can now be written as

: $\mathbf{C}_{N} = \mathbf{R} \, \left ( \begin{array}{c|c} \mathbf{I} & \mathbf{R}^{-1} \, \mathbf{t} \end{array} \right ) = \mathbf{R} \, \left ( \begin{array}{c|c} \mathbf{I} & -\tilde{\mathbf{n}} \end{array} \right )$

where $\tilde{\mathbf{n}}$ is the 3D coordinates of the focal point relative to the (X1',X2',X3') system.
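The focal point and the factorization can be verified numerically. The sketch below uses the fact that the inverse of a rotation matrix is its transpose; the rotation, the translation and the helper names are assumptions made for illustration.

```python
def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

# Assumed rotation (90 degrees about the third axis) and translation.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [1.0, 2.0, 5.0]

# Focal point in the primed system: n_tilde = -R^{-1} t = -R^T t.
n_tilde = [-v for v in matvec(transpose(R), t)]

# The homogeneous focal point lies in the null space of C_N = (R | t).
C_N = [row + [ti] for row, ti in zip(R, t)]
residual = matvec(C_N, n_tilde + [1.0])

# Factorization check: the last column of R (I | -n_tilde) is R (-n_tilde).
last_column = matvec(R, [-v for v in n_tilde])
```

Since `last_column` reproduces $\mathbf{t}$, the product $\mathbf{R} \, ( \mathbf{I} \,|\, -\tilde{\mathbf{n}} )$ is the same matrix as $( \mathbf{R} \,|\, \mathbf{t} )$.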

General camera matrix

Given the mapping produced by a normalized camera matrix, the resulting normalized image coordinates can be transformed by means of an arbitrary 2D homography. This includes 2D translations and rotations as well as scaling (isotropic and anisotropic) but also general 2D perspective transformations. Such a transformation can be represented as a $3 \times 3$ matrix $\mathbf{H}$ which maps the homogeneous normalized image coordinates $\mathbf{y}$ to the homogeneous transformed image coordinates $\mathbf{y}'$:

: $\mathbf{y}' = \mathbf{H} \, \mathbf{y}$

Inserting the above expression for the normalized image coordinates in terms of the 3D coordinates gives

: $\mathbf{y}' = \mathbf{H} \, \mathbf{C}_{N} \, \mathbf{x}'$

This produces the most general form of camera matrix

: $\mathbf{C} = \mathbf{H} \, \mathbf{C}_{N} = \mathbf{H} \, \left ( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right )$
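As an illustration of the composition, the sketch below takes for $\mathbf{H}$ a simple homography that scales the normalized coordinates isotropically and then shifts them. The scale `s`, the offset `(u, v)`, the rotation, the translation and the point are all hypothetical values; a general $\mathbf{H}$ need not have this form.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def project(C, x):
    """Apply a 3x4 camera matrix C to a homogeneous 4-vector x."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in C]

# Assumed normalized camera: rotation about the third axis, then translation.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [0.0, 0.0, 5.0]
C_N = [row + [ti] for row, ti in zip(R, t)]

# Assumed homography: isotropic scaling by s followed by a shift (u, v).
s, u, v = 100.0, 320.0, 240.0
H = [[s,   0.0, u],
     [0.0, s,   v],
     [0.0, 0.0, 1.0]]

C = matmul(H, C_N)  # general camera matrix C = H C_N

y_prime = project(C, [1.0, 2.0, 3.0, 1.0])
transformed = (y_prime[0] / y_prime[2], y_prime[1] / y_prime[2])
```

The same result is obtained by first computing the normalized image coordinates $(-0.25, 0.125)$ of this point and then applying the scaling and shift to them.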


Wikimedia Foundation. 2010.
