Kadir Brady saliency detector


The Kadir Brady saliency detector is designed to detect representative regions in images. It performs well in the context of object class recognition. It was invented by [http://www.robots.ox.ac.uk/~timork/ Timor Kadir] and [http://www.robots.ox.ac.uk/~jmb/home.html Michael Brady] in 2001 [Scale, Saliency and Image Description. Timor Kadir and Michael Brady. International Journal of Computer Vision. 45 (2):83-105, November 2001]. Later, in 2004, an affine invariant version was introduced by T. Kadir, [http://www.robots.ox.ac.uk/~az/ A. Zisserman] and M. Brady [Kadir, T., Zisserman, A. and Brady, M. An affine invariant salient region detector. Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic (2004)] (A. Baumberg 2000).

Introduction

Many computer vision and image processing applications work directly with features extracted from an image rather than with the raw image, for example to compute image correspondences [2, 17, 19, 20, 22] or to learn object categories [1, 3, 4, 23]. Depending on the application, different characteristics are preferred. However, there are three broad classes of image change under which good performance may be required:

#Global transformation: Features should be repeatable across the expected class of global image transformations. These include both geometric and photometric transformations that arise due to changes in the imaging conditions. For example, region detection should be covariant with viewpoint as illustrated in Figure 1. In short, we require the segmentation to commute with viewpoint change. This property will be evaluated on the repeatability and accuracy of localization and region estimation.
#Local perturbations: Features should be insensitive to classes of semi-local image disturbances. For example, a feature responding to the eye of a human face should be unaffected by any motion of the mouth. A second class of disturbance is where a region neighbours a foreground/background boundary. The detector can be required to detect the foreground region despite changes in the background.
#Intra-class variations: Features should capture corresponding object parts under intra-class variations in objects. For example, a feature should respond to the headlight of a car across different brands of car (imaged from the same viewpoint).

All feature detection algorithms try to detect regions that are stable under the three types of image change described above. Instead of looking for corners [7, 21], blobs [17, 22] or any other specific region shape, the Kadir Brady saliency detector looks for regions that are locally complex and globally discriminative. Such regions usually correspond to regions that are more stable under these types of image change.

Information theoretic saliency

In the field of information theory, Shannon entropy is defined to quantify the complexity of a distribution $p$ as $-\sum_i p_i \log p_i$. Therefore, higher entropy means $p$ is more complex and hence more unpredictable.

To measure the complexity of an image region $\{x,R\}$ around point $x$ with shape $R$, a descriptor $D$ that takes on values $\{d_1, \dots, d_r\}$ (e.g. in an 8-bit grey-level image, $D$ would range from 0 to 255 for each pixel) is defined so that $P_D(d_i,x,R)$, the probability that descriptor value $d_i$ occurs in region $\{x,R\}$, can be computed. The entropy of the image region $R_x$ can then be computed as

$H_D(x,R) = -\sum_{i \in (1 \dots r)} P_D(d_i,x,R) \log P_D(d_i,x,R)$

Using this equation, $H_D(x,R)$ can be calculated for every point $x$ and region shape $R$. As shown in Figure 2 (a), a more complex region, such as the eye region, has a more complex distribution and hence higher entropy.
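The entropy of a region can be illustrated with a minimal Python sketch. It assumes a circular region, grey levels as the descriptor, and a base-2 logarithm (the text does not fix the base); the function name `region_entropy` and both test patches are hypothetical.

```python
import math
from collections import Counter


def region_entropy(image, cx, cy, radius):
    """Shannon entropy H_D(x, R), in bits, for a circular region R around x = (cx, cy)."""
    # Collect the descriptor values (grey levels) of all pixels inside the circle.
    values = [
        image[y][x]
        for y in range(len(image))
        for x in range(len(image[0]))
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    ]
    total = len(values)
    # P_D(d_i, x, R) is the relative frequency of grey level d_i in the region.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(values).values())


# Hypothetical 5x5 patches: one uniform, one with four interleaved grey levels.
flat = [[7] * 5 for _ in range(5)]
textured = [[(3 * x + 5 * y) % 4 for x in range(5)] for y in range(5)]

h_flat = region_entropy(flat, 2, 2, 2)
h_textured = region_entropy(textured, 2, 2, 2)
print(h_flat, h_textured)
```

The uniform patch has a single occupied histogram bin and therefore zero entropy, while the textured patch spreads its pixels over several bins and scores higher.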

$H_D(x,R)$ is a good measure of local complexity. However, entropy only measures the statistics of the local attribute; it does not measure the spatial arrangement of that attribute. For example, Figure 2 (b) shows three permutations of the pixels of the eye region, all of which have the same entropy as the eye region. However, these four regions are not equally discriminative under scale change, as shown in Figure 2 (b). This observation is used to define a measure of discriminability in the following subsections.
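The permutation argument can be checked with a short Python sketch: shuffling the pixels of a patch changes their spatial arrangement but not the histogram, so the entropy stays the same. The patch values are hypothetical and the logarithm base is an assumption.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    total = len(values)
    counts = sorted(Counter(values).values())  # sorted so the summation order is fixed
    return sum(-(n / total) * math.log2(n / total) for n in counts)


patch = [0, 0, 1, 1, 2, 3, 3, 3, 3]  # hypothetical grey levels, listed row by row
shuffled = patch[:]
random.Random(0).shuffle(shuffled)    # same pixels, different spatial arrangement

same_entropy = math.isclose(entropy(patch), entropy(shuffled))
print(same_entropy)
```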

The following subsections discuss different methods for selecting regions that have high local complexity and are more discriminative across different regions.

Similarity invariant saliency

The first version of the Kadir Brady saliency detector [10] only finds salient regions that are invariant under similarity transformations. The algorithm finds circular regions at different scales. In other words, given $H_D(x,s)$, where $s$ is the scale parameter of a circular region $R$, the algorithm selects a set of circular regions $\{x_i, s_i; i = 1 \dots N\}$.

The method consists of three steps:
*Calculate the Shannon entropy of local image attributes for each $x$ over a range of scales: $H_D(x,s) = -\sum_{i \in (1 \dots r)} P_D(d_i,x,s) \log P_D(d_i,x,s)$;
*Select the scales $s_p$ at which the entropy-over-scale function exhibits a peak;
*Calculate the magnitude of the change of the PDF as a function of scale at each peak: $W_D(x,s) = \sum_{i \in (1 \dots r)} \left| \frac{\partial}{\partial s} P_D(d_i,x,s) \right| (s)$.

The final saliency $Y_D(x,s_p)$ is the product of $H_D(x,s_p)$ and $W_D(x,s_p)$.

For each $x$, the method picks a scale $s_p$ and calculates the saliency score $Y_D(x,s_p)$. By comparing $Y_D(x,s_p)$ across different points $x$, the detector can rank the saliency of points and pick the most representative ones. For example, in Figure 3, the blue circular region has higher saliency than the red one.
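The three steps above can be sketched for a single point as follows. This is an illustrative Python sketch, not the authors' implementation: it assumes consecutive integer scales, replaces the derivative with respect to scale by a finite difference between neighbouring scales, leaves out any scale-dependent weighting, and uses base-2 logarithms. All function names and the test image are hypothetical.

```python
import math
from collections import Counter


def histogram(image, cx, cy, radius, levels):
    """Normalised grey-level histogram P_D(d, x, s) inside a circle of radius s."""
    counts = Counter(
        image[y][x]
        for y in range(max(0, cy - radius), min(len(image), cy + radius + 1))
        for x in range(max(0, cx - radius), min(len(image[0]), cx + radius + 1))
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    )
    total = sum(counts.values())
    return [counts.get(d, 0) / total for d in range(levels)]


def entropy(p):
    """Shannon entropy, in bits, of a normalised histogram."""
    return sum(-q * math.log2(q) for q in p if q > 0)


def scale_saliency(image, cx, cy, scales, levels):
    """Return [(s_p, Y_D(x, s_p))] for every entropy peak over `scales`."""
    pdfs = [histogram(image, cx, cy, s, levels) for s in scales]
    h = [entropy(p) for p in pdfs]  # step 1: entropy at every scale
    results = []
    for k in range(1, len(scales) - 1):
        if h[k - 1] < h[k] > h[k + 1]:  # step 2: peak of entropy over scale
            # Step 3: finite-difference stand-in for the summed |dP/ds|.
            w = sum(abs(a - b) for a, b in zip(pdfs[k], pdfs[k - 1]))
            results.append((scales[k], h[k] * w))
    return results


# Hypothetical 13x13 binary image: a bright disc of radius 3 on a dark background.
img = [[1 if (x - 6) ** 2 + (y - 6) ** 2 <= 9 else 0 for x in range(13)] for y in range(13)]
peaks = scale_saliency(img, 6, 6, list(range(1, 7)), levels=2)
print(peaks)
```

In this example the entropy is zero while the circle lies inside the disc and rises once it starts to include background, so the only peak sits just beyond the disc boundary.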

Affine invariant saliency

The previous method is invariant to the similarity group of geometric transformations and to photometric shifts. However, as mentioned in the opening remarks, the ideal detector should detect regions invariant up to viewpoint change. Several detectors [] can detect affine invariant regions, which are a better approximation of viewpoint change than similarity transformations.

To detect affine invariant regions, the detector needs to detect ellipses, as in Figure 4. $R$ is now parameterized by three parameters $(s, \rho, \theta)$, where $\rho$ is the axis ratio and $\theta$ the orientation of the ellipse.

This modification enlarges the search space of the previous algorithm from a single scale to a set of parameters, and therefore the complexity of the affine invariant saliency detector increases. In practice, the affine invariant saliency detector starts from the set of points and scales generated by the similarity invariant saliency detector and then iteratively approximates the suboptimal parameters.
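To make the $(s, \rho, \theta)$ parameterization concrete, the following Python sketch tests whether a pixel offset falls inside such an ellipse. The mapping from $s$ and $\rho$ to the two semi-axes ($s/\sqrt{\rho}$ and $s\sqrt{\rho}$, which keeps the area equal to that of a circle of radius $s$) is an assumption made for illustration, as is the function name `in_ellipse`.

```python
import math


def in_ellipse(dx, dy, s, rho, theta):
    """True if the offset (dx, dy) from the centre lies inside the ellipse (s, rho, theta)."""
    a = s / math.sqrt(rho)  # assumed semi-axis along the orientation theta
    b = s * math.sqrt(rho)  # assumed semi-axis perpendicular to it
    # Rotate the offset into the ellipse's own coordinate frame.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


# With rho = 1 the region reduces to the circle of the similarity invariant detector.
print(in_ellipse(3.0, 0.0, s=3.0, rho=1.0, theta=0.0))
# With rho = 0.25 and s = 2 the assumed semi-axes are 4 and 1.
print(in_ellipse(3.5, 0.0, s=2.0, rho=0.25, theta=0.0))
print(in_ellipse(0.0, 1.5, s=2.0, rho=0.25, theta=0.0))
```

Searching over $\rho$ and $\theta$ in addition to $s$ multiplies the number of candidate regions per point, which is why seeding the search from the similarity invariant result is attractive.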

Comparison

Although the similarity invariant saliency detector is faster than the affine invariant saliency detector, it has the drawback of favoring isotropic structure, since the discriminative measure $W_D$ is measured over isotropic scale. To summarize, the affine invariant saliency detector is invariant to affine transformations and is able to detect more general salient regions.

Figures 5 and 6 compare the output of the similarity invariant saliency detector and the affine invariant saliency detector.

Salient volume

It is intuitive to pick points directly in order of decreasing saliency score and to stop when a threshold on the number of points or on the saliency score is reached. However, natural images contain noise and motion blur; both act as randomisers and generally increase entropy, affecting previously low entropy values more than high entropy values.

A more robust method would be to pick regions rather than points in entropy space. Although the individual pixels within a salient region may be affected at any given instant by the noise, it is unlikely to affect all of them in such a way that the region as a whole becomes non-salient.

It is also necessary to analyze the whole saliency space such that each salient feature is represented. A global threshold approach would result in highly salient features in one part of the image dominating the rest. A local threshold approach would require the setting of another scale parameter.

A simple clustering algorithm that meets these two requirements is used at the end of the algorithm. It works by selecting highly salient points that have local support, that is, nearby points with similar saliency and scale. Each region must be sufficiently distant from all others (in $R^3$) to qualify as a separate entity. For robustness, a representation that includes all of the points in a selected region is used. The method works as follows:
#Apply a global threshold.
#Choose the highest salient point in saliency-space (Y).
#Find the K nearest neighbours (K is a pre-set constant).
#Test the support of these using variance of the centre points.
#Find the distance, $D$, in $R^3$ from salient regions already clustered.
#Accept if $D$ is greater than the mean scale of the region and if sufficiently clustered (variance is less than the pre-set threshold $V_{th}$).
#Store as the mean scale and spatial location of the K points.
#Repeat from step 2 with the next highest salient point.

The algorithm is implemented as GreedyCluster1.m in Matlab by Dr. Timor Kadir and can be downloaded [http://www.robots.ox.ac.uk/~timork/Saliency/AffineScaleSaliency_Public_linux_V1.0.tgz#ScaleSaliency_Public_linux here].
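The clustering steps listed above can be sketched in Python as follows. This is not a port of GreedyCluster1.m; it is a minimal illustration that assumes Euclidean distance in $(x, y, s)$ space, takes the "variance of the centre points" to be the mean squared distance of the K centres from their mean, and counts the seed among its own K neighbours. The function name, parameter names and sample points are hypothetical.

```python
import math


def greedy_cluster(points, y_min, k, v_th):
    """Cluster salient points given as (x, y, s, saliency) tuples.

    Returns a list of (mean_x, mean_y, mean_s) tuples, one per accepted region.
    """
    # Step 1: global threshold; then visit points from most to least salient.
    candidates = sorted((p for p in points if p[3] >= y_min), key=lambda p: -p[3])
    regions = []
    for seed in candidates:  # steps 2 and 8
        # Step 3: K nearest neighbours of the seed in (x, y, s) space.
        group = sorted(candidates, key=lambda p: math.dist(p[:3], seed[:3]))[:k]
        mx = sum(p[0] for p in group) / len(group)
        my = sum(p[1] for p in group) / len(group)
        ms = sum(p[2] for p in group) / len(group)
        # Step 4: support test, here the mean squared distance of the centres.
        variance = sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in group) / len(group)
        # Steps 5 and 6: far from every stored region and tightly clustered.
        far_enough = all(math.dist((mx, my, ms), r) > ms for r in regions)
        if far_enough and variance < v_th:
            regions.append((mx, my, ms))  # step 7
    return regions


# Hypothetical data: two tight groups plus one isolated, highly salient point.
points = [
    (10, 10, 3, 0.90), (11, 10, 3, 0.80), (10, 11, 3, 0.85),
    (40, 40, 5, 0.70), (41, 40, 5, 0.65), (40, 41, 5, 0.60),
    (25, 25, 4, 0.95),
]
regions = greedy_cluster(points, y_min=0.5, k=3, v_th=2.0)
print(regions)
```

The isolated point has the highest saliency but no nearby support, so its neighbourhood fails the variance test and only the two tight groups are stored.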

Performance evaluation

In the field of computer vision, feature detectors have been evaluated by several tests. The most in-depth evaluation was published in the International Journal of Computer Vision in 2006 [A comparison of affine region detectors. K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool. International Journal of Computer Vision]. The following subsections discuss the performance of the Kadir Brady saliency detector on a subset of the tests in that paper.

Performance under global transformation

In order to measure the consistency of regions detected on the same object or scene across images under global transformation, the repeatability score, first proposed by [http://personal.ee.surrey.ac.uk/Personal/K.Mikolajczyk/ Krystian Mikolajczyk] and [http://lear.inrialpes.fr/people/schmid/ Cordelia Schmid] in [18, 19], is calculated as follows.

First, the overlap error $\epsilon$ of a pair of corresponding ellipses $\mu_a$ and $\mu_b$, each in a different image, is defined as

$\epsilon = 1 - \frac{\mu_a \cap (A^T \mu_b A)}{\mu_a \cup (A^T \mu_b A)}$

where $A$ is the locally linearized affine transformation of the homography between the two images, and $\mu_a \cap (A^T \mu_b A)$ and $\mu_a \cup (A^T \mu_b A)$ represent the areas of intersection and union of the ellipses respectively. Note that $\mu_a$ is scaled to a fixed scale to take account of the size variation of different detected regions. The pair of ellipses is deemed to correspond only if $\epsilon$ is smaller than a certain $\epsilon_0$.

Then the repeatability score for a given pair of images is computed as the ratio between the number of region-to-region correspondences and the smaller of the number of regions in the pair of images, where only the regions located in the part of the scene present in both images are counted. In general we would like a detector to have a high repeatability score and a large number of correspondences.
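The overlap error and the repeatability score can be sketched in Python under several stated assumptions: each ellipse is given as a centre plus a symmetric matrix, the regions of the second image have already been mapped into the first image's frame (that is, $A^T \mu_b A$ is taken as given), areas are estimated by sampling a regular grid, both lists contain only regions from the shared part of the scene, and correspondences are counted with a greedy one-to-one matching. The value 0.4 used for $\epsilon_0$ is only an illustrative choice, and all names are hypothetical.

```python
def inside(ellipse, x, y):
    """True if (x, y) satisfies (p - c)^T M (p - c) <= 1 for (cx, cy, (m11, m12, m22))."""
    cx, cy, (m11, m12, m22) = ellipse
    dx, dy = x - cx, y - cy
    return m11 * dx * dx + 2 * m12 * dx * dy + m22 * dy * dy <= 1.0


def overlap_error(ea, eb, bounds, step=0.05):
    """Estimate 1 - area(intersection) / area(union) by sampling a regular grid."""
    x0, y0, x1, y1 = bounds
    inter = union = 0
    for j in range(int(round((y1 - y0) / step)) + 1):
        for i in range(int(round((x1 - x0) / step)) + 1):
            x, y = x0 + i * step, y0 + j * step
            in_a, in_b = inside(ea, x, y), inside(eb, x, y)
            inter += in_a and in_b
            union += in_a or in_b
    return 1.0 - inter / union if union else 1.0


def repeatability(regions_a, regions_b, bounds, eps0, step=0.1):
    """Correspondences divided by the smaller region count (greedy one-to-one matching)."""
    unused = list(regions_b)
    matches = 0
    for ea in regions_a:
        for eb in unused:
            if overlap_error(ea, eb, bounds, step) < eps0:
                unused.remove(eb)
                matches += 1
                break
    return matches / min(len(regions_a), len(regions_b))


unit = (1.0, 0.0, 1.0)  # matrix entries of a circle of radius 1
regions_a = [(0.0, 0.0, unit), (5.0, 5.0, unit)]
regions_b = [(0.1, 0.0, unit), (20.0, 20.0, unit), (30.0, 30.0, unit)]
score = repeatability(regions_a, regions_b, (-2.0, -2.0, 7.0, 7.0), eps0=0.4)
print(score)
```

Only the first region of each list overlaps closely enough to count as a correspondence, so one match is divided by the smaller region count of two.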

The specific global transformations tested in the [http://www.robots.ox.ac.uk/~vgg/research/affine/index.html test dataset] are:
*Viewpoint change
*Zoom+rotation
*Image blur
*JPEG compression
*Light change

The performance of the Kadir Brady saliency detector is inferior to that of most other detectors, mainly because the number of points it detects is usually lower than for other detectors.

The precise procedure is given in the Matlab code linked as Detector evaluation in the Software implementation section.

Performance under intra-class variation and image perturbations

In the task of object class categorization, the ability to detect similar regions given intra-class variation and image perturbations across object instances is critical. In [cite], repeatability measures over intra-class variation and image perturbations are proposed. The following subsections introduce the definitions and discuss the performance.

Intra-class variation test

Suppose there is a set of images of the same object class, e.g. motorbikes. A region detection operator which is unaffected by intra-class variation will reliably select regions on corresponding parts of all the objects, say the wheels, engine or seat for motorbikes.

Repeatability over intra-class variation measures the (average) number of correct correspondences over the set of images, where the correct correspondences are established by manual selection.

A region is matched if it fulfils three requirements:
*Its position matches within 10 pixels.
*Its scale is within 20%.
*Normalised mutual information between the appearances is > 0.4.
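The three requirements above can be written as a single predicate, sketched here in Python. The sketch assumes that regions are (x, y, scale) tuples, that "within 20%" is measured relative to the reference region's scale, and that the normalised mutual information has already been computed elsewhere and is passed in as a number; the function name `is_match` is hypothetical.

```python
import math


def is_match(region, reference, nmi):
    """Apply the three matching requirements to a detected region and a reference region."""
    position_ok = math.dist(region[:2], reference[:2]) <= 10.0        # within 10 pixels
    scale_ok = abs(region[2] - reference[2]) <= 0.2 * reference[2]   # within 20% (assumed relative to the reference)
    appearance_ok = nmi > 0.4                                          # precomputed normalised mutual information
    return position_ok and scale_ok and appearance_ok


print(is_match((100, 100, 10), (105, 103, 11), nmi=0.5))  # all three requirements hold
print(is_match((100, 100, 10), (120, 100, 10), nmi=0.9))  # centre is 20 pixels away
```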

In detail the average correspondence score S is measured as follows.

$N$ regions are detected on each of the $M$ images in the dataset. Then, for a particular reference image $i$, the correspondence score $S_i$ is given by the proportion of corresponding to detected regions for all the other images in the dataset, i.e.:

$S_i = \frac{\text{Total number of matches}}{\text{Total number of detected regions}} = \frac{N_M^i}{N(M-1)}$

The score $S_i$ is computed for $M/2$ different selections of the reference image and averaged to give $S$. The score is evaluated as a function of the number of detected regions $N$.
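As a worked illustration of the correspondence score, the following Python sketch computes $S_i$ from per-image match counts and averages it over the chosen reference images. The match counts are made-up numbers for a hypothetical dataset with $M = 4$ images and $N = 10$ regions per image; the function names are also hypothetical.

```python
def reference_score(matches_in_other_images, n_regions):
    """S_i = N_M^i / (N (M - 1)) for one reference image.

    `matches_in_other_images` holds one match count per other image, so its
    length is M - 1 and its sum is the total number of matches N_M^i.
    """
    m_minus_1 = len(matches_in_other_images)
    return sum(matches_in_other_images) / (n_regions * m_minus_1)


def average_score(per_reference_matches, n_regions):
    """Average S_i over the chosen reference images to obtain S."""
    scores = [reference_score(m, n_regions) for m in per_reference_matches]
    return sum(scores) / len(scores)


# Hypothetical counts: M = 4 images, N = 10 regions, M / 2 = 2 reference images.
per_reference = [[5, 4, 6], [3, 3, 3]]
s = average_score(per_reference, n_regions=10)
print(s)
```

Here the first reference image has 15 matches out of 30 detected regions and the second has 9 out of 30, so the two scores of 0.5 and 0.3 average to about 0.4.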

The Kadir Brady saliency detector gives the highest score across the three test classes, which are motorbikes, cars and faces. As illustrated in figure [], most detections of the saliency detector are near the object. In contrast, the maps of other detectors show a much more diffuse pattern over the entire area, caused by poor localisation and false responses to background clutter.

Image perturbations test

In order to test insensitivity to image perturbation, the data set is split into two parts: the first contains images with a uniform background and the second, images with varying degrees of background clutter. If the detector is robust to background clutter then the average correspondence score $S$ should be similar for both subsets of images.

In this test the saliency detector also outperforms other detectors, due to three reasons:
*Several detection methods blur the image, hence causing a greater degree of similarity between objects and background.
*In most images the objects of interest tend to be in focus while backgrounds are out of focus and hence blurred. Blurred regions tend to exhibit slowly varying statistics which result in a relatively low entropy and inter-scale saliency in the saliency detector.
*Other detectors define saliency with respect to specific properties of the local surface geometry. In contrast, the saliency detector uses a much broader definition.

Conclusion

The saliency detector is most useful in the task of object recognition, whereas several other detectors are more useful in the task of computing image correspondences. However, in the task of 3D object recognition, in which all three types of image change are combined, the saliency detector might still be powerful, as mentioned in [xx].

Software implementation

* [http://www.robots.ox.ac.uk/~timork/salscale.html Scale Saliency and Scale Descriptors and download Scale Saliency binaries]
* [http://www.robots.ox.ac.uk/~timork/Saliency/AffineInvariantSaliency.html Affine Invariant Scale Saliency and download Affine Invariant Scale Saliency binaries]
* [http://www.robots.ox.ac.uk/~vgg/research/affine/evaluation.html Detector evaluation]

References


* A. Baumberg (2000). "Reliable feature matching across widely separated views". Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. pp. I:1774–1781. http://citeseer.ist.psu.edu/baumberg00reliable.html. First reference on the multi-scale Harris operator.
* J. Matas, O. Chum, M. Urban, and T. Pajdla (2002). "Robust wide baseline stereo from maximally stable extremal regions". Proceedings of British Machine Vision Conference. pp. 384–393.
* K. Mikolajczyk and C. Schmid (2002). "An affine invariant interest point detector". Proceedings of European Conference on Computer Vision.
* F. Schaffalitzky and A. Zisserman (2002). "Multi-view matching for unordered image sets, or “How do I organize my holiday snaps?”". Proceedings of European Conference on Computer Vision. pp. 414–431.
* T. Tuytelaars and L. Van Gool (2000). "Wide baseline stereo based on local, affinely invariant regions". Proceedings of British Machine Vision Conference. pp. 412–422.
* S. Agarwal and D. Roth (2002). "Learning a sparse representation for object detection". Proceedings of European Conference on Computer Vision. pp. 113–130.
* E. Borenstein and S. Ullman (2002). "Class-specific, top-down segmentation". Proceedings of European Conference on Computer Vision. pp. 109–124.
* R. Fergus, P. Perona, and A. Zisserman (2003). "Object class recognition by unsupervised scale-invariant learning". Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. pp. II:264–271.
* M. Weber, M. Welling, and P. Perona (June 2000). "Unsupervised learning of models for recognition". Proceedings of European Conference on Computer Vision.


Wikimedia Foundation. 2010.
