Shape context

Shape context is the term given by Serge Belongie and Jitendra Malik to the feature descriptor they first proposed in their paper "Matching with Shape Contexts" in 2000 [S. Belongie and J. Malik, "Matching with Shape Contexts", IEEE Workshop on Content-based Access of Image and Video Libraries (CBAIVL-2000), 2000]. Shape context can be used in object recognition.


Theory

The shape context is intended to be a way of describing shapes that allows for measuring shape similarity and recovering point correspondences. The basic idea is to pick n points on the contours of a shape. For each point p_i on the shape, consider the n - 1 vectors obtained by connecting p_i to all other points. The set of all these vectors is a rich description of the shape localized at that point, but it is far too detailed. The key idea is that the distribution over relative positions is a robust, compact, and highly discriminative descriptor. So, for the point p_i, the coarse histogram of the relative coordinates of the remaining n - 1 points,

h_i(k) = \#\{q \neq p_i : (q - p_i) \in \mbox{bin}(k)\}

is defined to be the shape context of p_i. The bins are normally taken to be uniform in log-polar space. The fact that the shape context is a rich and discriminative descriptor can be seen in the figure below, in which the shape contexts of two different versions of the letter "A" are shown.

(a) and (b) are the sampled edge points of the two shapes. (c) is the diagram of the log-polar bins used to compute the shape context. (d) is the shape context for the point marked with a circle, (e) is that for the point marked with a diamond, and (f) is that for the point marked with a triangle. Since (d) and (e) are the shape contexts for two closely related points, they are quite similar, while the shape context in (f) is very different.

For a feature descriptor to be useful, it needs to have certain invariances. In particular, it needs to be invariant to translation, scale, small perturbations, and, depending on the application, rotation. Translational invariance comes naturally to shape context, since all measurements are taken relative to points on the shape. Scale invariance is obtained by normalizing all radial distances by the mean distance \alpha between all the point pairs in the shape [S. Belongie, J. Malik, and J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, April 2002, pp. 509–521, doi:10.1109/34.993558] [S. Belongie, J. Malik, and J. Puzicha, "Matching Shapes", Eighth IEEE International Conference on Computer Vision, July 2001], although the median distance can also be used [S. Belongie, J. Malik, and J. Puzicha, "Shape Context: A new descriptor for shape matching and object recognition", NIPS 2000, 2000]. Shape contexts are empirically demonstrated to be robust to deformations, noise, and outliers using synthetic point set matching experiments [H. Chui and A. Rangarajan, "A new algorithm for non-rigid point matching", CVPR, vol. 2, June 2000, pp. 44–51].

Shape contexts can also be made completely rotation invariant. One way is to measure angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges). This results in a completely rotationally invariant descriptor. This is not always desirable, however, since some local features lose their discriminative power if not measured relative to the same frame. Many applications in fact forbid rotation invariance, e.g. distinguishing a "6" from a "9".
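The tangent-relative angle measurement can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical, and the tangent angle at each point is assumed to be already known (for example from the edge detector's gradient direction).

```python
import math

def relative_angle(p, q, tangent_angle):
    """Angle of the vector p->q measured from the tangent direction at p, in [0, 2*pi)."""
    absolute = math.atan2(q[1] - p[1], q[0] - p[0])
    return (absolute - tangent_angle) % (2.0 * math.pi)

def rotate(point, phi):
    """Rotate a 2D point about the origin by phi radians."""
    c, s = math.cos(phi), math.sin(phi)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])

# Illustrative values only: rotating the whole shape also rotates the tangent,
# so the tangent-relative angle stays the same.
p, q, tangent = (1.0, 0.0), (2.0, 1.0), 0.3
phi = 1.1
before = relative_angle(p, q, tangent)
after = relative_angle(rotate(p, phi), rotate(q, phi), tangent + phi)
```

Binning `relative_angle` instead of the absolute angle is what makes the resulting histogram independent of the shape's orientation.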

Use in Shape Matching

A complete system that uses shape contexts for shape matching consists of the following steps (which are covered in more detail in the Details of Implementation section):

1. Find a list of points on the shape edges.
2. Compute the shape context of each point found in step 1.
3. Calculate the cost of matching each point in the first shape to each point of the second shape.
4. Find the one-to-one matching that minimizes the total cost of matching. This is an instance of the assignment problem.
5. Find a transformation (e.g. affine, thin plate spline, etc.) that maps one shape to the other (essentially aligning the two shapes).
6. Calculate the "shape distance" between the two shapes as a weighted sum of the shape context distance, image appearance distance, and bending energy (a measure of how much transformation is required to bring the two shapes into alignment).

Now that the shape distance has been calculated, the distance can be used in a nearest-neighbor classifier for a number of different object recognition problems.

Details of Implementation

Step 1: Finding a list of points on shape edges

The approach assumes that the shape of an object is essentially captured by a finite subset of the points on the internal or external contours of the object. These can be obtained simply by running the Canny edge detector and picking a random set of points from the edges. Note that these points need not, and in general do not, correspond to key-points such as maxima of curvature or inflection points. It is preferable to sample the shape with roughly uniform spacing, though this is not critical.
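A minimal sketch of the random sampling step, assuming the edge pixels have already been extracted (for instance by a Canny detector) into a list of (x, y) tuples. The function name, the seed parameter, and the toy edge map are hypothetical.

```python
import random

def sample_edge_points(edge_pixels, n, seed=0):
    """Pick n distinct edge pixels uniformly at random (all of them if fewer exist)."""
    rng = random.Random(seed)
    if n >= len(edge_pixels):
        return list(edge_pixels)
    return rng.sample(edge_pixels, n)

# Toy "edge map": the boundary pixels of a 10x10 square.
edge_pixels = [(x, y) for x in range(10) for y in range(10)
               if x in (0, 9) or y in (0, 9)]
points = sample_edge_points(edge_pixels, 20)
```

Plain random sampling does not enforce uniform spacing; a spacing-aware variant would be a refinement rather than a requirement.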

Step 2: Computing the shape context

This step is described in detail in the Theory section.
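As an illustration, the log-polar histogram could be computed as in the following sketch. The bin counts (5 radial, 12 angular), the radial range (0.125 to 2 in units of the mean pairwise distance), and the choice to ignore points beyond the outer radius are assumptions made for this example, not values stated in this article.

```python
import math

def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Return one flattened log-polar histogram (n_r * n_theta bins) per point."""
    n = len(points)
    dist = [[math.hypot(q[0] - p[0], q[1] - p[1]) for q in points] for p in points]
    # Mean distance over all ordered pairs (alpha), used for scale normalization.
    alpha = sum(map(sum, dist)) / (n * (n - 1))
    # Upper edges of the radial bins, uniformly spaced in log(r).
    step = (math.log(r_outer) - math.log(r_inner)) / (n_r - 1)
    upper = [r_inner * math.exp(step * k) for k in range(n_r)]
    histograms = []
    for i, p in enumerate(points):
        h = [0] * (n_r * n_theta)
        for j, q in enumerate(points):
            if i == j:
                continue
            r = dist[i][j] / alpha
            r_bin = next((k for k, edge in enumerate(upper) if r <= edge), None)
            if r_bin is None:
                continue  # assumption: points beyond r_outer are not counted
            theta = math.atan2(q[1] - p[1], q[0] - p[0]) % (2.0 * math.pi)
            t_bin = min(int(theta / (2.0 * math.pi) * n_theta), n_theta - 1)
            h[r_bin * n_theta + t_bin] += 1
        histograms.append(h)
    return histograms

points = [(0, 0), (4, 1), (5, 5), (1, 6), (-2, 3), (3, -2), (7, 2)]
histograms = shape_contexts(points)
```

Because only coordinate differences are used and radii are divided by the mean pairwise distance, translating or uniformly scaling `points` leaves the histograms unchanged.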

Step 3: Computing the cost matrix

Consider two points p and q that have normalized K-bin histograms (i.e. shape contexts) g(k) and h(k). As shape contexts are distributions represented as histograms, it is natural to use the \chi^2 test statistic as the "shape context cost" of matching the two points:

C_S = \frac{1}{2}\sum_{k=1}^K \frac{[g(k) - h(k)]^2}{g(k) + h(k)}

The values of this cost range from 0 to 1. In addition to the shape context cost, an extra cost based on appearance can be added. For instance, it could be a measure of tangent angle dissimilarity (particularly useful in digit recognition):

C_A = \frac{1}{2}\begin{Vmatrix} \dbinom{\cos(\theta_1)}{\sin(\theta_1)} - \dbinom{\cos(\theta_2)}{\sin(\theta_2)}\end{Vmatrix}

This is half the length of the chord in the unit circle between the unit vectors with angles \theta_1 and \theta_2. Its values also range from 0 to 1. The total cost of matching the two points can then be a weighted sum of the two costs:

C = (1 - \beta)C_S + \beta C_A

Now for each point p_i on the first shape and a point q_j on the second shape, calculate the cost as described and call it C_{i,j}. This is the cost matrix.
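The cost matrix described above can be sketched as follows. The function names, the default weight of 0.1, and the convention of skipping bins where both histograms are empty (to avoid dividing by zero) are assumptions made for this example.

```python
import math

def chi2_cost(g, h):
    """Chi-squared cost between two normalized histograms; lies in [0, 1]."""
    total = 0.0
    for gk, hk in zip(g, h):
        if gk + hk > 0:  # assumption: bins empty in both histograms contribute 0
            total += (gk - hk) ** 2 / (gk + hk)
    return 0.5 * total

def tangent_cost(theta1, theta2):
    """Half the chord length between two unit vectors; lies in [0, 1]."""
    return 0.5 * math.hypot(math.cos(theta1) - math.cos(theta2),
                            math.sin(theta1) - math.sin(theta2))

def cost_matrix(hists_p, hists_q, angles_p, angles_q, weight=0.1):
    """C[i][j] = (1 - weight) * C_S + weight * C_A for p_i and q_j."""
    return [[(1.0 - weight) * chi2_cost(g, h) + weight * tangent_cost(a, b)
             for h, b in zip(hists_q, angles_q)]
            for g, a in zip(hists_p, angles_p)]

# Two toy normalized histograms per shape, with illustrative tangent angles.
hists_p = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
hists_q = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]]
C = cost_matrix(hists_p, hists_q, [0.0, 1.0], [0.0, 2.0])
```

Entry `C[0][0]` is zero here because both the histograms and the tangent angles of the first points coincide.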

Step 4: Finding the matching that minimizes total cost

Now, a one-to-one matching \pi(i) is needed that matches each point p_i on shape 1 to a point q_{\pi(i)} on shape 2 and minimizes the total cost of matching,

H(\pi) = \sum_i C\left(p_i, q_{\pi(i)}\right)

This can be done in O(N^3) time using the Hungarian method, although there are more efficient algorithms [R. Jonker and A. Volgenant, "A Shortest Augmenting Path Algorithm for Dense and Sparse Linear Assignment Problems", Computing, vol. 38, 1987, pp. 325–340, doi:10.1007/BF02278710]. To handle outliers robustly, one can add "dummy" nodes that have a constant but reasonably large cost of matching to the cost matrix. This causes the matching algorithm to match outliers to a "dummy" if there is no real match.
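The following sketch shows one way to combine an O(N^3) Hungarian-style solver with dummy nodes. It is an illustration rather than the cited authors' implementation: the padding scheme (extending an n-by-m cost matrix to size n + m with a constant cost) and the function names are assumptions.

```python
def hungarian(cost):
    """Minimum-cost assignment for a square matrix; returns the column for each row."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match

def match_with_dummies(cost, dummy_cost):
    """Match rows to columns; rows assigned to a dummy column map to None."""
    n, m = len(cost), len(cost[0])
    size = n + m
    padded = [[dummy_cost] * size for _ in range(size)]
    for i in range(n):
        padded[i][:m] = cost[i]
    assignment = hungarian(padded)
    return [assignment[i] if assignment[i] < m else None for i in range(n)]

costs = [[0.1, 0.9], [0.8, 0.95]]
matches = match_with_dummies(costs, dummy_cost=0.5)
```

With this padding, a real pair is kept only when its cost is below the dummy cost, so the second point in the toy example is treated as an outlier.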

Step 5: Modeling Transformation

Given the set of correspondences between a finite set of points on the two shapes, a transformation T : \mathbb{R}^2 \to \mathbb{R}^2 can be estimated to map any point from one shape to the other. There are several choices for this transformation, described below.


Affine

The affine model is a standard choice: T(p) = Ap + o. The least squares solution for the matrix A and the translational offset vector o is obtained by:

o = \frac{1}{n}\sum_{i=1}^n \left(p_i - q_{\pi(i)}\right), \quad A = (Q^+ P)^t

where P = \begin{pmatrix} 1 & p_{11} & p_{12} \\ \vdots & \vdots & \vdots \\ 1 & p_{n1} & p_{n2}\end{pmatrix}, with a similar expression for Q. Q^+ is the pseudoinverse of Q.
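For illustration, an affine map can also be fitted with the textbook centered least-squares solution shown below. This is a swapped-in, equivalent-in-spirit formulation chosen because it needs no matrix library; it is not a literal transcription of the pseudoinverse formula above, and the names `fit_affine`, `src`, and `dst` are hypothetical.

```python
def fit_affine(src, dst):
    """Least-squares A (2x2) and o (2-vector) such that dst_i ~ A @ src_i + o."""
    n = len(src)
    mx = sum(x for x, _ in src) / n
    my = sum(y for _, y in src) / n
    mu = sum(u for u, _ in dst) / n
    mv = sum(v for _, v in dst) / n
    sxx = sxy = syy = 0.0          # centered second moments of src
    cux = cuy = cvx = cvy = 0.0    # centered cross moments dst x src
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - mx, y - my, u - mu, v - mv
        sxx += x * x
        sxy += x * y
        syy += y * y
        cux += u * x
        cuy += u * y
        cvx += v * x
        cvy += v * y
    det = sxx * syy - sxy * sxy    # zero if all src points are collinear
    a11 = (cux * syy - cuy * sxy) / det
    a12 = (cuy * sxx - cux * sxy) / det
    a21 = (cvx * syy - cvy * sxy) / det
    a22 = (cvy * sxx - cvx * sxy) / det
    offset = (mu - a11 * mx - a12 * my, mv - a21 * mx - a22 * my)
    return ((a11, a12), (a21, a22)), offset

# Synthetic correspondences generated from a known affine map.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (-1.0, 2.0)]
dst = [(1.2 * x - 0.3 * y + 2.0, 0.4 * x + 0.9 * y - 1.0) for x, y in src]
A, o = fit_affine(src, dst)
```

The fit requires at least three non-collinear source points; otherwise `det` is zero and the division fails.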

Thin Plate Spline

The thin plate spline (TPS) model is the most widely used model for transformations when working with shape contexts. A 2D transformation can be separated into two TPS functions to model a coordinate transform:

T(x,y) = \left(f_x(x,y), f_y(x,y)\right)

where each of f_x and f_y has the form:

f(x,y) = a_1 + a_x x + a_y y + \sum_{i=1}^n \omega_i U\left(\begin{Vmatrix}(x_i,y_i) - (x,y)\end{Vmatrix}\right),

and the kernel function U(r) is defined by U(r) = r^2 \log r^2. The exact details of how to solve for the parameters can be found elsewhere [M.J.D. Powell, "A Thin Plate Spline Method for Mapping Curves into Curves in Two Dimensions", Computational Techniques and Applications (CTAC '95), 1995] [J. Duchon, "Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev Spaces", Constructive Theory of Functions of Several Variables, pp. 85–100], but it essentially involves solving a linear system of equations. The bending energy (a measure of how much transformation is needed to align the points) is also easily obtained.
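Evaluating one TPS function for given parameters is straightforward, as the sketch below shows. It only evaluates f(x, y); solving for the parameters is not covered. The function names and the numeric parameters are illustrative assumptions, and U(0) is taken to be 0, its limiting value.

```python
import math

def tps_kernel(r):
    """U(r) = r^2 log r^2, with U(0) defined as 0 (the limit as r -> 0)."""
    if r == 0.0:
        return 0.0
    return r * r * math.log(r * r)

def tps_eval(x, y, affine, weights, control_points):
    """Evaluate f(x, y) = a_1 + a_x x + a_y y + sum_i w_i U(|(x_i, y_i) - (x, y)|)."""
    a1, ax, ay = affine
    value = a1 + ax * x + ay * y
    for w, (xi, yi) in zip(weights, control_points):
        value += w * tps_kernel(math.hypot(xi - x, yi - y))
    return value

# Illustrative parameters only; a full warp uses one such function per output coordinate.
control_points = [(0.0, 0.0), (1.0, 1.0)]
fx = tps_eval(2.0, 3.0, (1.0, 0.5, -1.0), [0.2, -0.2], control_points)
```

When all weights are zero the function reduces to its affine part, which is why the bending energy of a purely affine map is zero.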

Regularized TPS

The TPS formulation above requires an exact match for the pairs of points on the two shapes. For noisy data, it is best to relax this exact requirement. Let v_i denote the target function values at the corresponding locations p_i = (x_i,y_i) (note that for f_x, v_i would be x', the x-coordinate of the point corresponding to p_i, and for f_y it would be the y-coordinate, y'). Relaxing the requirement then amounts to minimizing

H[f] = \sum_{i=1}^n (v_i - f(x_i,y_i))^2 + \lambda I_f

where I_f is the bending energy and \lambda is called the regularization parameter. The f that minimizes H[f] can be found in a fairly straightforward way [G. Wahba, "Spline Models for Observational Data", Soc. Industrial and Applied Math, 1990]. If one uses normalized coordinates for (x_i,y_i) and (x'_i,y'_i), then scale invariance is kept. However, if one uses the original non-normalized coordinates, then the regularization parameter needs to be normalized.
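For orientation, the minimizer is commonly written as the solution of a single linear system. The block below is a sketch of that standard form, not a derivation; here K is the n-by-n kernel matrix, P has rows (1, x_i, y_i), \omega collects the kernel weights, a collects the affine coefficients (a_1, a_x, a_y), and I is the identity matrix.

```latex
\begin{pmatrix} K + \lambda I & P \\ P^{T} & 0 \end{pmatrix}
\begin{pmatrix} \omega \\ a \end{pmatrix}
=
\begin{pmatrix} v \\ 0 \end{pmatrix},
\qquad
K_{ij} = U\left(\begin{Vmatrix}(x_i,y_i) - (x_j,y_j)\end{Vmatrix}\right)
```

Setting \lambda = 0 recovers the exact-interpolation case, and the bending energy I_f is proportional to \omega^{T} K \omega.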

Note that in many cases, regardless of the transformation used, the initial estimate of the correspondences contains some errors which could reduce the quality of the transformation. If we iterate the steps of finding correspondences and estimating transformations (i.e. repeating steps 2-5 with the newly transformed shape) we can overcome this problem. Typically, three iterations are all that is needed to obtain reasonable results.

Step 6: Computing the shape distance

Now, a shape distance between two shapes P and Q can be defined. This distance is a weighted sum of three potential terms:

Shape context distance: this is the symmetric sum of shape context matching costs over best matching points:

D_{sc}(P,Q) = \frac{1}{n}\sum_{p \in P} \arg \underset{q \in Q}{\min} C(p,T(q)) + \frac{1}{m}\sum_{q \in Q} \arg \underset{p \in P}{\min} C(p,T(q))

where T(\cdot) is the estimated TPS transform that maps the points in Q to those in P.
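A minimal sketch of this term, assuming the cost matrix between the points of P and the warped points of Q has already been computed. It reads the "arg min" in the formula as the minimum cost value, which is an interpretation made for this example; the function name is hypothetical.

```python
def shape_context_distance(cost):
    """cost[i][j] = C(p_i, T(q_j)) for n points in P (rows) and m points in Q (columns)."""
    n, m = len(cost), len(cost[0])
    # Average, over points of P, of the cost to the best-matching point of Q.
    row_term = sum(min(row) for row in cost) / n
    # Average, over points of Q, of the cost to the best-matching point of P.
    col_term = sum(min(cost[i][j] for i in range(n)) for j in range(m)) / m
    return row_term + col_term

cost = [[0.1, 0.5, 0.9],
        [0.4, 0.2, 0.6]]
d_sc = shape_context_distance(cost)
```

Note that the two sums do not require a one-to-one matching: each point simply takes its cheapest partner on the other shape.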

Appearance cost: After establishing image correspondences and properly warping one image to match the other, one can define an appearance cost as the sum of squared brightness differences in Gaussian windows around corresponding image points:

D_{ac}(P,Q) = \frac{1}{n}\sum_{i=1}^n \sum_{\Delta \in Z^2} G(\Delta)\left[I_P(p_i + \Delta) - I_Q(T(q_{\pi(i)}) + \Delta)\right]^2

where I_P and I_Q are the gray-level images (I_Q is the image after warping) and G is a Gaussian windowing function.

Transformation cost: The final cost D_{be}(P,Q) measures how much transformation is necessary to bring the two images into alignment. In the case of TPS, it is assigned to be the bending energy.

Now that we have a way of calculating the distance between two shapes, we can use a nearest neighbor classifier (k-NN) with distance defined as the shape distance calculated here. The results of applying this to different situations are given in the following section.
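A k-NN classifier over an arbitrary distance function can be sketched in a few lines. The function name is hypothetical, and the absolute difference between numbers is used below purely as a stand-in for the shape distance so that the example is self-contained.

```python
from collections import Counter

def knn_classify(query, training_set, distance, k=3):
    """Return the majority label among the k training items closest to the query.

    training_set is a list of (item, label) pairs; distance(a, b) is any
    dissimilarity function, such as a shape distance.
    """
    neighbours = sorted(training_set, key=lambda pair: distance(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Stand-in data: numbers instead of shapes, absolute difference instead of shape distance.
training_set = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (5.0, "b"), (5.5, "b")]
label = knn_classify(1.1, training_set, distance=lambda a, b: abs(a - b), k=3)
```

Because the classifier only calls `distance`, any shape distance with the same two-argument form can be plugged in without other changes.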


Digit Recognition

The authors Serge Belongie and Jitendra Malik tested their approach on the MNIST dataset of handwritten digits. Currently, more than 50 algorithms have been tested on the database. The database has a training set of 60,000 examples and a test set of 10,000 examples. The error rate for this approach was 0.63% using 20,000 training examples and 3-NN. At the time of publication, this error rate was the lowest. Currently, the lowest error rate is 0.39%.

Silhouette Similarity-based Retrieval

The authors experimented with the MPEG-7 shape silhouette database, performing Core Experiment CE-Shape-1 part B, which measures performance of similarity-based retrieval [S. Jeannin and M. Bober, "Description of core experiments for MPEG-7 motion/shape", Technical Report ISO/IEC JTC 1/SC 29/WG 11 MPEG99/N2690, MPEG-7, Seoul, March 1999]. The database has 70 shape categories and 20 images per shape category. Performance of a retrieval scheme is tested by using each image as a query and counting the number of correct images in the top 40 matches. For this experiment, the authors increased the number of points sampled from each shape. Also, since the shapes in the database were sometimes rotated or flipped, the authors defined the distance between a reference shape and a query shape to be the minimum shape distance between the query shape and either the unchanged reference, the vertically flipped reference, or the horizontally flipped reference. With these changes, they obtained a retrieval rate of 76.45%, which by 2002 was the best.
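The flip-tolerant distance can be sketched as a thin wrapper around any shape distance. The helper names are hypothetical, "vertically flipped" is assumed here to mean negating the y-coordinate, and the toy point-wise distance is only a stand-in for the full shape distance.

```python
import math

def flip_vertical(points):
    """Mirror a point set top to bottom (assumption: negate y)."""
    return [(x, -y) for x, y in points]

def flip_horizontal(points):
    """Mirror a point set left to right (assumption: negate x)."""
    return [(-x, y) for x, y in points]

def flip_invariant_distance(query, reference, shape_distance):
    """Minimum shape distance over the reference and its two mirrored versions."""
    candidates = (reference, flip_vertical(reference), flip_horizontal(reference))
    return min(shape_distance(query, candidate) for candidate in candidates)

def toy_distance(a, b):
    """Stand-in for the shape distance: summed distance between corresponding points."""
    return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b))

reference = [(1.0, 0.0), (2.0, 1.0), (0.5, 3.0)]
query = flip_horizontal(reference)
d = flip_invariant_distance(query, reference, toy_distance)
```

In the toy example the query is a mirrored copy of the reference, so the wrapped distance is zero even though the plain distance is not.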

3D Object Recognition

The next experiment performed on shape contexts involved the 20 common household objects in the Columbia Object Image Library (COIL-20). Each object has 72 views in the database. In the experiment, the method was trained on a number of equally spaced views for each object and the remaining views were used for testing. A 1-NN classifier was used. The results are shown to the right. The authors also developed an "editing" algorithm based on shape context similarity and k-medoid clustering that improved on their performance.

Trademark Retrieval

Shape contexts were used to retrieve the closest matching trademarks from a database to a query trademark (useful in detecting trademark infringement). The figure to the left depicts nearest neighbor retrieval results from a database of 300 trademarks. No visually similar trademark was missed by the algorithm (verified manually by the authors).

External links

* Matching with Shape Contexts
* MNIST database of handwritten digits
* Columbia Object Image Library (COIL-20)
* Caltech101 Database


Wikimedia Foundation. 2010.
