Essential matrix

In computer vision, the essential matrix is a $3 \times 3$ matrix $\mathbf{E}$, with some additional properties, which relates corresponding points in stereo images, assuming that the cameras satisfy the pinhole camera model.

Function

More specifically, if $\mathbf{y}$ and $\mathbf{y}'$ are homogeneous "normalized" image coordinates in image 1 and image 2, respectively, then

: $(\mathbf{y}')^{T} \, \mathbf{E} \, \mathbf{y} = 0$

if $\mathbf{y}$ and $\mathbf{y}'$ correspond to the same 3D point in the scene.

The above relation, which defines the essential matrix, was published in 1981 by Longuet-Higgins, introducing the concept to the computer vision community. Hartley & Zisserman's book reports that an analogous matrix appeared in photogrammetry long before that. Longuet-Higgins' paper includes an algorithm for estimating $\mathbf{E}$ from a set of corresponding normalized image coordinates, as well as an algorithm for determining the relative position and orientation of the two cameras given that $\mathbf{E}$ is known. Finally, it shows how the 3D coordinates of the image points can be determined with the aid of the essential matrix.

Use

The essential matrix can be seen as a precursor to the fundamental matrix, which has proven to be of more practical use. Both matrices can be used for establishing constraints between matching image points, but the essential matrix can only be used with calibrated cameras, since the internal (intrinsic) camera parameters must be known in order to perform the normalization. If, however, the cameras are calibrated, the essential matrix can be useful for determining both the relative position and orientation between the cameras and the 3D position of corresponding image points.

Derivation and definition

This derivation follows the paper by Longuet-Higgins.

Two normalized cameras project the 3D world onto their respective image planes. Let the 3D coordinates of a point P be $(x_1, x_2, x_3)$ and $(x'_1, x'_2, x'_3)$ relative to each camera's coordinate system. Since the cameras are normalized, the corresponding image coordinates are

: $\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{1}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $\begin{pmatrix} y'_1 \\ y'_2 \end{pmatrix} = \frac{1}{x'_3} \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix}$

A homogeneous representation of the two image coordinates is then given by

: $\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \frac{1}{x_3} \begin{pmatrix} x_1 \\ x_2 \\ x_{3} \end{pmatrix}$ and $\begin{pmatrix} y'_1 \\ y'_2 \\ 1 \end{pmatrix} = \frac{1}{x'_3} \begin{pmatrix} x'_1 \\ x'_2 \\ x'_{3} \end{pmatrix}$

which can also be written more compactly as

: $\mathbf{y} = \frac{1}{x_{3}} \, \tilde{\mathbf{x}}$ and $\mathbf{y}' = \frac{1}{x'_{3}} \, \tilde{\mathbf{x}}'$

where $\mathbf{y}$ and $\mathbf{y}'$ are homogeneous representations of the 2D image coordinates, and $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{x}}'$ are proper 3D coordinates but in two different coordinate systems.

Another consequence of the normalized cameras is that their respective coordinate systems are related by means of a translation and a rotation. This implies that the two sets of 3D coordinates are related as

: $\tilde{\mathbf{x}}' = \mathbf{R} \, (\tilde{\mathbf{x}} - \mathbf{t})$

where $\mathbf{R}$ is a $3 \times 3$ rotation matrix and $\mathbf{t}$ is a 3-dimensional translation vector.
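The projection and the change of coordinate system described so far can be sketched numerically. The following is a minimal illustration only: the rotation angle, the translation, the scene point and the helper name `project_normalized` are assumptions chosen for the example and do not come from Longuet-Higgins' paper.

```python
import numpy as np

def project_normalized(x):
    """Homogeneous normalized image coordinates (x_1/x_3, x_2/x_3, 1) of a 3D point."""
    return x / x[2]

# Hypothetical geometry: rotation by 0.2 rad about the second axis, plus a translation.
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.1])

x = np.array([0.3, -0.2, 4.0])   # point P in the first camera's coordinate system
x_prime = R @ (x - t)            # the same point in the second camera's coordinate system

y = project_normalized(x)              # (y_1, y_2, 1)
y_prime = project_normalized(x_prime)  # (y'_1, y'_2, 1)
```

Both third coordinates are positive in this example, so the point lies in front of both cameras.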

Define the essential matrix as

: $\mathbf{E} = \mathbf{R} \, [\mathbf{t}]_{\times}$

where $[\mathbf{t}]_{\times}$ is the matrix representation of the cross product with $\mathbf{t}$.
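As a concrete illustration of this definition, the cross product matrix and the resulting essential matrix can be built as follows. This is a small sketch; the function names `cross_matrix` and `essential_from_pose` are illustrative and not taken from any particular library.

```python
import numpy as np

def cross_matrix(t):
    """Return [t]_x, the skew-symmetric matrix with cross_matrix(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_from_pose(R, t):
    """E = R [t]_x for a rotation matrix R and a translation vector t."""
    return R @ cross_matrix(t)

# Example values chosen only for illustration.
t = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
product = cross_matrix(t) @ v          # same as np.cross(t, v)
E = essential_from_pose(np.eye(3), t)  # with R = I, E reduces to [t]_x
```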

To see that this definition of the essential matrix describes a constraint on corresponding image coordinates, multiply $\mathbf{E}$ from the left and right with the 3D coordinates of point P in the two different coordinate systems:

: $(\tilde{\mathbf{x}}')^{T} \, \mathbf{E} \, \tilde{\mathbf{x}} \, \stackrel{(1)}{=} \, (\tilde{\mathbf{x}} - \mathbf{t})^{T} \, \mathbf{R}^{T} \, \mathbf{R} \, [\mathbf{t}]_{\times} \, \tilde{\mathbf{x}} \, \stackrel{(2)}{=} \, (\tilde{\mathbf{x}} - \mathbf{t})^{T} \, [\mathbf{t}]_{\times} \, \tilde{\mathbf{x}} \, \stackrel{(3)}{=} \, 0$

# Insert the above relations between $\tilde{\mathbf{x}}'$ and $\tilde{\mathbf{x}}$ and the definition of $\mathbf{E}$ in terms of $\mathbf{R}$ and $\mathbf{t}$.
# $\mathbf{R}^{T} \, \mathbf{R} = \mathbf{I}$ since $\mathbf{R}$ is a rotation matrix.
# Properties of the matrix representation of the cross product.
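Step (3) can be spelled out by writing the matrix product as a cross product. The vector $\mathbf{t} \times \tilde{\mathbf{x}}$ is orthogonal to both $\mathbf{t}$ and $\tilde{\mathbf{x}}$, so both scalar products vanish:

```latex
(\tilde{\mathbf{x}} - \mathbf{t})^{T} \, [\mathbf{t}]_{\times} \, \tilde{\mathbf{x}}
  = \tilde{\mathbf{x}}^{T} (\mathbf{t} \times \tilde{\mathbf{x}})
  - \mathbf{t}^{T} (\mathbf{t} \times \tilde{\mathbf{x}})
  = 0 - 0 = 0
```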

Finally, it can be assumed that both $x_{3}$ and $x'_{3}$ are positive; otherwise the point is not visible in both cameras. This gives

: $0 = (\tilde{\mathbf{x}}')^{T} \, \mathbf{E} \, \tilde{\mathbf{x}} = \frac{1}{x'_{3}} (\tilde{\mathbf{x}}')^{T} \, \mathbf{E} \, \frac{1}{x_{3}} \tilde{\mathbf{x}} = (\mathbf{y}')^{T} \, \mathbf{E} \, \mathbf{y}$

which is the constraint that the essential matrix defines between corresponding image points.
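The constraint can be checked numerically on synthetic data. The sketch below assumes a hypothetical camera pair (rotation about the second axis, arbitrary translation) and a few hand-picked scene points; none of these values come from the source.

```python
import numpy as np

def cross_matrix(t):
    """Matrix representation [t]_x of the cross product with t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose of the two cameras.
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.1])
E = R @ cross_matrix(t)

# Hypothetical scene points in the first camera's coordinates, all with x_3 > 0.
points = np.array([[0.3, -0.2, 4.0],
                   [-1.0, 0.5, 6.0],
                   [0.8, 0.9, 5.0]])

residuals = []
for x in points:
    x_prime = R @ (x - t)            # coordinates in the second camera
    y = x / x[2]                     # normalized homogeneous image point, image 1
    y_prime = x_prime / x_prime[2]   # normalized homogeneous image point, image 2
    residuals.append(y_prime @ E @ y)
residuals = np.array(residuals)      # zero up to floating-point rounding
```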

Properties of the essential matrix

Not every $3 \times 3$ matrix can be an essential matrix for some stereo cameras. To see this, notice that the essential matrix is defined as the product of one rotation matrix and one skew-symmetric matrix, both $3 \times 3$. The skew-symmetric matrix must have two singular values which are equal and another which is zero. Multiplication by the rotation matrix does not change the singular values, which means that the essential matrix also has two singular values which are equal and one which is zero. The properties described here are sometimes referred to as "internal constraints" of the essential matrix.
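A quick numerical check of the internal constraints is sketched below, using an arbitrary example pose (the values are assumptions for illustration). For $\mathbf{E} = \mathbf{R} \, [\mathbf{t}]_{\times}$ the two non-zero singular values both equal the length of $\mathbf{t}$.

```python
import numpy as np

def cross_matrix(t):
    """Matrix representation [t]_x of the cross product with t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical pose: rotation by 0.3 rad about the first axis and an arbitrary translation.
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
t = np.array([1.0, -2.0, 0.5])

E = R @ cross_matrix(t)
singular_values = np.linalg.svd(E, compute_uv=False)  # sorted in descending order
t_norm = np.linalg.norm(t)
```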

If the essential matrix $\mathbf{E}$ is multiplied by a non-zero scalar, the result is again an essential matrix which defines exactly the same constraint as $\mathbf{E}$ does. This means that $\mathbf{E}$ can be seen as an element of a projective space, that is, two such matrices are equivalent if one is a non-zero scalar multiple of the other. This is a relevant position, for example, if $\mathbf{E}$ is estimated from image data. However, it is also possible to take the position that $\mathbf{E}$ is defined as

: $\mathbf{E} = \mathbf{R} \, [\mathbf{t}]_{\times}$

and then $\mathbf{E}$ has a well-defined "scaling". It depends on the application which position is the more relevant.

The essential matrix has five or six degrees of freedom, depending on whether or not it is seen as a projective element. The rotation matrix $\mathbf{R}$ and the translation vector $\mathbf{t}$ have three degrees of freedom each, six in total. If the essential matrix is considered as a projective element, however, one degree of freedom related to scalar multiplication must be subtracted, leaving five degrees of freedom in total.

Estimation of the essential matrix

Given a set of corresponding image points it is possible to estimate an essential matrix which satisfies the defining epipolar constraint for all the points in the set. However, if the image points are subject to noise, which is the common case in any practical situation, it is not possible to find an essential matrix which satisfies all constraints exactly.

Depending on how the error related to each constraint is measured, it is possible to determine or estimate an essential matrix which optimally satisfies the constraints for a given set of corresponding image points. The most straightforward approach is to set up a total least squares problem, commonly known as the eight-point algorithm.
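A minimal sketch of such a total least squares estimate is given below. It assumes at least eight correspondences given as homogeneous normalized coordinates. The final projection onto the internal constraints is a commonly used extra step added here as an assumption, and the function name `estimate_essential` and the synthetic test data are illustrative only.

```python
import numpy as np

def estimate_essential(y, y_prime):
    """Eight-point style estimate of E from N >= 8 correspondences.

    y, y_prime: arrays of shape (N, 3) with homogeneous normalized coordinates.
    """
    # Each correspondence gives one linear equation kron(y', y) . vec(E) = 0,
    # where vec(E) lists the entries of E row by row.
    A = np.stack([np.kron(yp, yy) for yy, yp in zip(y, y_prime)])
    # Total least squares: the unit vector minimizing |A e| is the right
    # singular vector of A belonging to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Extra step (assumption): enforce two equal singular values and one zero.
    U, S, Vt = np.linalg.svd(E)
    s = 0.5 * (S[0] + S[1])
    return U @ np.diag([s, s, 0.0]) @ Vt

# Synthetic, noise-free test data with a hypothetical camera pose.
rng = np.random.default_rng(0)
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.1])
t_cross = np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])
E_true = R @ t_cross

X = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 10.0], size=(12, 3))
X_prime = (X - t) @ R.T              # row-wise x' = R (x - t)
y = X / X[:, 2:3]
y_prime = X_prime / X_prime[:, 2:3]

E_est = estimate_essential(y, y_prime)   # equals E_true up to a non-zero scale factor
```

Since the estimate is only defined up to scale and sign, a comparison with a reference matrix should normalize both matrices first.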

Determining $\mathbf{R}$ and $\mathbf{t}$ from $\mathbf{E}$

Given that the essential matrix has been determined for a stereo camera pair, for example using the estimation method above, this information can also be used for determining the rotation and translation (up to a scaling) between the two cameras' coordinate systems. In these derivations $\mathbf{E}$ is seen as a projective element rather than having a well-determined scaling.

The following method for determining $\mathbf{R}$ and $\mathbf{t}$ is based on performing an SVD of $\mathbf{E}$; see Hartley & Zisserman's book. It is also possible to determine $\mathbf{R}$ and $\mathbf{t}$ without an SVD, for example, following Longuet-Higgins' paper.

Finding one solution

An SVD of $\mathbf{E}$ gives

: $\mathbf{E} = \mathbf{U} \, \mathbf{\Sigma} \, \mathbf{V}^{T}$

where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal $3 \times 3$ matrices and $\mathbf{\Sigma}$ is a $3 \times 3$ diagonal matrix with

: $\mathbf{\Sigma} = \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 0 \end{pmatrix}$

The diagonal entries of $\mathbf{\Sigma}$ are the singular values of $\mathbf{E}$ which, according to the internal constraints of the essential matrix, must consist of two identical values and one zero value. Define

: $\mathbf{W} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ with $\mathbf{W}^{-1} = \mathbf{W}^{T} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

and make the following ansatz

: $[\mathbf{t}]_{\times} = \mathbf{V} \, \mathbf{W} \, \mathbf{\Sigma} \, \mathbf{V}^{T}$

: $\mathbf{R} = \mathbf{U} \, \mathbf{W}^{-1} \, \mathbf{V}^{T}$

Since $\mathbf{\Sigma}$ may not completely fulfill the constraints when dealing with real-world data (e.g., camera images), the alternative

: $[\mathbf{t}]_{\times} = \mathbf{V} \, \mathbf{Z} \, \mathbf{V}^{T}$ with $\mathbf{Z} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

may help.
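The ansatz can be tried out numerically. The sketch below uses the alternative with $\mathbf{Z}$, so the recovered $[\mathbf{t}]_{\times}$ has unit scale, and it flips the sign of $\mathbf{R}$ when its determinant is negative, which is permissible only because $\mathbf{E}$ is treated as defined up to scale and sign. The function name `decompose_once` and the test pose are assumptions made for illustration.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])

def decompose_once(E):
    """Return one (R, [t]_x) pair from the SVD ansatz, with E taken up to scale and sign."""
    U, _, Vt = np.linalg.svd(E)
    V = Vt.T
    R = U @ W.T @ Vt          # W^{-1} = W^T
    t_cross = V @ Z @ Vt      # skew-symmetric by construction
    if np.linalg.det(R) < 0:  # corresponds to replacing E by -E
        R = -R
    return R, t_cross

# Hypothetical essential matrix built from a known pose, for testing only.
c, s = np.cos(0.2), np.sin(0.2)
R0 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t0 = np.array([1.0, 0.2, 0.1])
E = R0 @ np.array([[0.0, -t0[2], t0[1]], [t0[2], 0.0, -t0[0]], [-t0[1], t0[0], 0.0]])

R, t_cross = decompose_once(E)
product = R @ t_cross         # proportional to E, possibly with opposite sign
```

Note that the rotation returned here is only one of the rotations compatible with $\mathbf{E}$ and need not coincide with the pose used to build the example.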

Showing that it is valid

First, these expressions for $\mathbf{R}$ and $[\mathbf{t}]_{\times}$ do satisfy the defining equation for the essential matrix

: $\mathbf{R} \, [\mathbf{t}]_{\times} = \mathbf{U} \, \mathbf{W}^{-1} \, \mathbf{V}^{T} \, \mathbf{V} \, \mathbf{W} \, \mathbf{\Sigma} \, \mathbf{V}^{T} = \mathbf{U} \, \mathbf{\Sigma} \, \mathbf{V}^{T} = \mathbf{E}$

Second, it must be shown that this $[\mathbf{t}]_{\times}$ is a matrix representation of the cross product for some $\mathbf{t}$. Since

: $\mathbf{W} \, \mathbf{\Sigma} = \begin{pmatrix} 0 & -s & 0 \\ s & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

it is the case that $\mathbf{W} \, \mathbf{\Sigma}$ is skew-symmetric, i.e., $(\mathbf{W} \, \mathbf{\Sigma})^{T} = - \mathbf{W} \, \mathbf{\Sigma}$. This is also the case for our $[\mathbf{t}]_{\times}$, since

: $([\mathbf{t}]_{\times})^{T} = \mathbf{V} \, (\mathbf{W} \, \mathbf{\Sigma})^{T} \, \mathbf{V}^{T} = - \mathbf{V} \, \mathbf{W} \, \mathbf{\Sigma} \, \mathbf{V}^{T} = - [\mathbf{t}]_{\times}$

According to the general properties of the matrix representation of the cross product, it then follows that $[\mathbf{t}]_{\times}$ must be the cross product operator of exactly one vector $\mathbf{t}$.

Third, it must also be shown that the above expression for $\mathbf{R}$ is a rotation matrix. It is the product of three matrices which are all orthogonal, which means that $\mathbf{R}$, too, is orthogonal, so $\det(\mathbf{R}) = \pm 1$. To be a proper rotation matrix it must also satisfy $\det(\mathbf{R}) = 1$. Since, in this case, $\mathbf{E}$ is seen as a projective element, this can be accomplished by reversing the sign of $\mathbf{E}$ if necessary.
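The effect of the sign reversal can be made explicit. Because $\det(\mathbf{W}) = 1$, the determinant of $\mathbf{R}$ depends only on $\mathbf{U}$ and $\mathbf{V}$, and one way to realize the replacement of $\mathbf{E}$ by $-\mathbf{E}$ is to replace $\mathbf{U}$ by $-\mathbf{U}$, which flips the sign of the determinant of a $3 \times 3$ matrix:

```latex
\det(\mathbf{R}) = \det(\mathbf{U}) \, \det(\mathbf{W}^{-1}) \, \det(\mathbf{V}^{T})
                 = \det(\mathbf{U}) \, \det(\mathbf{V})

-\mathbf{E} = (-\mathbf{U}) \, \mathbf{\Sigma} \, \mathbf{V}^{T},
\qquad
\det(-\mathbf{U}) = (-1)^{3} \det(\mathbf{U}) = -\det(\mathbf{U})
```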

Finding all solutions

So far one possible solution for $\mathbf{R}$ and $\mathbf{t}$ has been established given $\mathbf{E}$. It is, however, not the only possible solution and it may not even be a valid solution from a practical point of view. To begin with, since the scaling of $\mathbf{E}$ is undefined, the scaling of $\mathbf{t}$ is also undefined. It must lie in the null space of $\mathbf{E}$ since

: $\mathbf{E} \, \mathbf{t} = \mathbf{R} \, [\mathbf{t}]_{\times} \, \mathbf{t} = \mathbf{0}$

For the subsequent analysis of the solutions, however, the exact scaling of $\mathbf{t}$ is not so important as its "sign", i.e., in which direction it points. Let $\hat{\mathbf{t}}$ be a normalized vector in the null space of $\mathbf{E}$. It is then the case that both $\hat{\mathbf{t}}$ and $-\hat{\mathbf{t}}$ are valid translation vectors relative to $\mathbf{E}$. It is also possible to change $\mathbf{W}$ into $\mathbf{W}^{-1}$ in the derivations of $\mathbf{R}$ and $\mathbf{t}$ above. For the translation vector this only causes a change of sign, which has already been described as a possibility. For the rotation, on the other hand, this will produce a different transformation, at least in the general case.

To summarize, given $\mathbf{E}$ there are two opposite directions which are possible for $\mathbf{t}$ and two different rotations which are compatible with this essential matrix. In total this gives four classes of solutions for the rotation and translation between the two camera coordinate systems. On top of that, there is also an unknown scaling $s > 0$ for the chosen translation direction.

It turns out, however, that only one of the four classes of solutions can be realized in practice. Given a pair of corresponding image coordinates, three of the solutions will always produce a 3D point which lies "behind" at least one of the two cameras and therefore cannot be seen. Only one of the four classes will consistently produce 3D points which are in front of both cameras. This must then be the correct solution. Still, however, it has an undetermined positive scaling related to the translation component.

It should be noted that the above determination of $\mathbf{R}$ and $\mathbf{t}$ assumes that $\mathbf{E}$ satisfies the internal constraints of the essential matrix. If this is not the case, which is typical when $\mathbf{E}$ has been estimated from real (and noisy) image data, it has to be assumed that it approximately satisfies the internal constraints. The vector $\hat{\mathbf{t}}$ is then chosen as the right singular vector of $\mathbf{E}$ corresponding to the smallest singular value.
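The enumeration of the four solution classes and the selection of the one that places a point in front of both cameras can be sketched as follows. This is an illustrative sketch under the convention $\tilde{\mathbf{x}}' = \mathbf{R} \, (\tilde{\mathbf{x}} - \mathbf{t})$ used in this article; the function names, the use of a single correspondence for the test, and the example pose are assumptions. With noisy data it would be more robust to test several correspondences.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

def candidate_poses(E):
    """Return the four (R, t_hat) candidates compatible with E up to scale and sign."""
    U, _, Vt = np.linalg.svd(E)
    t_hat = Vt[-1]                   # unit right singular vector, smallest singular value
    rotations = []
    for Wk in (W.T, W):              # W^{-1} and W give the two possible rotations
        R = U @ Wk @ Vt
        if np.linalg.det(R) < 0:     # allowed because the sign of E is arbitrary
            R = -R
        rotations.append(R)
    return [(R, sign * t_hat) for R in rotations for sign in (1.0, -1.0)]

def depths(R, t, y, y_prime):
    """Depths (x_3, x'_3) of the point seen at y and y_prime, assuming x' = R (x - t)."""
    a = R[0] - y_prime[0] * R[2]
    x3 = (a @ t) / (a @ y)
    x_prime = R @ (x3 * y - t)
    return x3, x_prime[2]

def select_pose(E, y, y_prime):
    """Return the candidate that puts the point in front of both cameras, or None."""
    for R, t in candidate_poses(E):
        x3, x3_prime = depths(R, t, y, y_prime)
        if x3 > 0 and x3_prime > 0:
            return R, t
    return None

# Hypothetical ground-truth pose and scene point, used only to exercise the code.
c, s = np.cos(0.2), np.sin(0.2)
R0 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t0 = np.array([1.0, 0.2, 0.1])
E = R0 @ np.array([[0.0, -t0[2], t0[1]], [t0[2], 0.0, -t0[0]], [-t0[1], t0[0], 0.0]])
x = np.array([0.3, -0.2, 4.0])
x_prime = R0 @ (x - t0)
y, y_prime = x / x[2], x_prime / x_prime[2]

R_sel, t_sel = select_pose(E, y, y_prime)
```

The selected translation has unit length, reflecting the undetermined positive scaling of the translation component.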

3D points from corresponding image points

The problem to be solved here is how to compute $(x_{1}, x_{2}, x_{3})$ given corresponding normalized image coordinates $(y_{1}, y_{2})$ and $(y'_{1}, y'_{2})$. If the essential matrix is known and the corresponding rotation and translation transformations have been determined, this algorithm (described in Longuet-Higgins' paper) provides a solution.

Let $\mathbf{r}_{k}$ denote row $k$ of the rotation matrix $\mathbf{R}$:

: $\mathbf{R} = \begin{pmatrix} - \mathbf{r}_{1} - \\ - \mathbf{r}_{2} - \\ - \mathbf{r}_{3} - \end{pmatrix}$

Combining the above relations between 3D coordinates in the two coordinate systems and the mapping between 3D and 2D points described earlier gives

: $y'_{1} = \frac{x'_{1}}{x'_{3}} = \frac{\mathbf{r}_{1} \, (\tilde{\mathbf{x}} - \mathbf{t})}{\mathbf{r}_{3} \, (\tilde{\mathbf{x}} - \mathbf{t})} = \frac{\mathbf{r}_{1} \, (\mathbf{y} - \mathbf{t}/x_{3})}{\mathbf{r}_{3} \, (\mathbf{y} - \mathbf{t}/x_{3})}$

or

: $x_{3} = \frac{ (\mathbf{r}_{1} - y'_{1} \, \mathbf{r}_{3}) \, \mathbf{t} }{ (\mathbf{r}_{1} - y'_{1} \, \mathbf{r}_{3}) \, \mathbf{y} }$

Once $x_{3}$ is determined, the other two coordinates can be computed as

: $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_3 \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$

The above derivation is not unique. It is also possible to start with an expression for $y'_{2}$ and derive an expression for $x_{3}$ according to

: $x_{3} = \frac{ (\mathbf{r}_{2} - y'_{2} \, \mathbf{r}_{3}) \, \mathbf{t} }{ (\mathbf{r}_{2} - y'_{2} \, \mathbf{r}_{3}) \, \mathbf{y} }$

In the ideal case, when the camera maps the 3D points according to a perfect pinhole camera and the resulting 2D points can be detected without any noise, the two expressions for $x_{3}$ are equal. In practice, however, they are not, and it may be advantageous to combine the two estimates of $x_{3}$, for example, in terms of some sort of average.
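The two depth expressions and their combination can be sketched as below. The plain arithmetic mean is just one possible choice of average, and the function names and example pose are assumptions for illustration.

```python
import numpy as np

def depth_from_row(R, t, y, y_prime, k):
    """Depth x_3 computed from y'_{k+1}, using rows k and 2 of R (k = 0 or 1, zero-based)."""
    a = R[k] - y_prime[k] * R[2]
    return (a @ t) / (a @ y)

def triangulate(R, t, y, y_prime):
    """Average the two depth estimates and return (x_1, x_2, x_3) in the unprimed system."""
    x3 = 0.5 * (depth_from_row(R, t, y, y_prime, 0) +
                depth_from_row(R, t, y, y_prime, 1))
    return x3 * y                    # y = (y_1, y_2, 1), so this is (x_1, x_2, x_3)

# Hypothetical pose and scene point; in this noise-free case both estimates agree.
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.1])
x_true = np.array([0.3, -0.2, 4.0])
x_prime = R @ (x_true - t)
y, y_prime = x_true / x_true[2], x_prime / x_prime[2]

depth_a = depth_from_row(R, t, y, y_prime, 0)
depth_b = depth_from_row(R, t, y, y_prime, 1)
x_rec = triangulate(R, t, y, y_prime)
```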

There are also other types of extensions of the above computations which are possible. They started with an expression of the primed image coordinates and derived 3D coordinates in the unprimed system. It is also possible to start with unprimed image coordinates and obtain primed 3D coordinates, which finally can be transformed into unprimed 3D coordinates. Again, in the ideal case the result should be equal to the above expressions, but in practice they may deviate.

A final remark relates to the fact that if the essential matrix is determined from corresponding image coordinates, which is often the case when 3D points are determined in this way, the translation vector $\mathbf{t}$ is known only up to an unknown positive scaling. As a consequence, the reconstructed 3D points, too, are undetermined with respect to a positive scaling.
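This can be read off directly from the depth expression: replacing $\mathbf{t}$ by $s \, \mathbf{t}$ with $s > 0$ scales the numerator while leaving the denominator unchanged, so the depth and hence the whole reconstructed point scale by the same factor:

```latex
x_{3}(s \, \mathbf{t})
  = \frac{ (\mathbf{r}_{1} - y'_{1} \, \mathbf{r}_{3}) \, (s \, \mathbf{t}) }
         { (\mathbf{r}_{1} - y'_{1} \, \mathbf{r}_{3}) \, \mathbf{y} }
  = s \, x_{3}(\mathbf{t}),
\qquad
\tilde{\mathbf{x}}(s \, \mathbf{t}) = x_{3}(s \, \mathbf{t}) \, \mathbf{y} = s \, \tilde{\mathbf{x}}(\mathbf{t})
```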

References


* H. Christopher Longuet-Higgins (September 1981). "A computer algorithm for reconstructing a scene from two projections". Nature 293: 133–135. doi:10.1038/293133a0.

* Richard Hartley and Andrew Zisserman (2003). Multiple View Geometry in computer vision. Cambridge University Press. ISBN 978-0-521-54051-3.

* Yi Ma, Stefano Soatto, Jana Košecká and S. Shankar Sastry (2004). An Invitation to 3-D Vision. Springer.

* Gang Xu and Zhengyou Zhang (1996). Epipolar geometry in Stereo, Motion and Object Recognition. Kluwer Academic Publishers. ISBN 978-0-7923-4199-4.

Wikimedia Foundation. 2010.
