Histogram of oriented gradients
Histogram of Oriented Gradient descriptors, or HOG descriptors, are feature descriptors used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved performance.
Navneet Dalal and Bill Triggs, researchers for the French National Institute for Research in Computer Science and Control (INRIA), first described Histogram of Oriented Gradient descriptors in their June 2005 paper to the International Conference on Computer Vision and Pattern Recognition. In this work they focused their algorithm on the problem of pedestrian detection in static images, although they have since expanded their tests to include human detection in film and video, as well as a variety of common animals and vehicles in static imagery.
Theory
The essential idea behind Histogram of Oriented Gradient descriptors is that local object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions. The descriptors are implemented by dividing the image into small connected regions, called cells, and compiling for each cell a histogram of gradient directions or edge orientations for the pixels within the cell. The combination of these histograms then represents the descriptor. For improved performance, the local histograms can be contrast-normalized by calculating a measure of the intensity across a larger region of the image, called a block, and then using this value to normalize all cells within the block. This normalization results in better invariance to changes in illumination or shadowing.
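As a rough illustration of the steps just described, the following Python sketch builds per-cell orientation histograms and then normalizes them over blocks of cells. The cell size, the number of orientation bins, the use of unsigned orientations, the single-bin voting, and the L2 normalization are illustrative assumptions rather than parameters stated in this article, and the function names are hypothetical.

```python
import math

def cell_histograms(image, cell_size=4, n_bins=9):
    """Return a grid of gradient-orientation histograms, one per cell.

    `image` is a list of rows of grayscale intensities. Orientations are
    treated as unsigned (0 to 180 degrees), and each pixel votes into a
    single bin, weighted by its gradient magnitude (assumed choices).
    """
    height, width = len(image), len(image[0])
    rows, cols = height // cell_size, width // cell_size
    hists = [[[0.0] * n_bins for _ in range(cols)] for _ in range(rows)]
    for y in range(rows * cell_size):
        for x in range(cols * cell_size):
            # Differences between neighbouring pixels, clamped at the border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_index = min(int(angle / (180.0 / n_bins)), n_bins - 1)
            hists[y // cell_size][x // cell_size][bin_index] += magnitude
    return hists

def block_descriptor(hists, block_size=2, eps=1e-6):
    """Concatenate L2-normalized blocks of cells.

    Blocks slide one cell at a time, so neighbouring blocks overlap.
    """
    rows, cols = len(hists), len(hists[0])
    descriptor = []
    for by in range(rows - block_size + 1):
        for bx in range(cols - block_size + 1):
            block = [v
                     for cy in range(by, by + block_size)
                     for cx in range(bx, bx + block_size)
                     for v in hists[cy][cx]]
            norm = math.sqrt(sum(v * v for v in block) + eps * eps)
            descriptor.extend(v / norm for v in block)
    return descriptor

# An 8x8 synthetic image with a vertical edge: dark left half, bright right half.
image = [[0.0] * 4 + [100.0] * 4 for _ in range(8)]
descriptor = block_descriptor(cell_histograms(image))
print(len(descriptor))  # 36: one 2x2 block of cells, 9 bins per cell
```

On this synthetic image, multiplying every pixel by a constant leaves the descriptor essentially unchanged, which is the kind of illumination invariance that block normalization is meant to provide.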
The Histogram of Oriented Gradients descriptor maintains a few key advantages over other descriptor methods. Since it operates on localized cells, the method upholds invariance to geometric and photometric transformations; such changes would only appear in larger spatial regions. Moreover, as Dalal and Triggs observe, coarse spatial sampling, fine orientation sampling, and strong local photometric normalization permit the individual body movement of pedestrians to be ignored so long as they maintain a roughly upright position. The HOG descriptor is thus particularly suited for human detection in images. [] in conjunction with their HOG descriptors to find human figures in test images.
Testing
In their original human detection experiment, Dalal and Triggs compared their R-HOG (rectangular) and C-HOG (circular) descriptor blocks against Generalized Haar Wavelets, PCA-SIFT descriptors, and Shape Contexts. Generalized Haar Wavelets are oriented Haar wavelets, and were used in 2001 by Mohan, Papageorgiou, and Poggio in their own object detection experiments. PCA-SIFT descriptors are similar to SIFT descriptors, but differ in that Principal Component Analysis is applied to the normalized gradient patches. PCA-SIFT descriptors were first used in 2004 by Ke and Sukthankar and were claimed to outperform regular SIFT descriptors. Finally, Shape Contexts use circular bins, similar to those used in C-HOG blocks, but only tabulate votes on the basis of edge presence, making no distinction with regard to orientation. Shape Contexts were originally used in 2001 by Belongie, Malik, and Puzicha.
The testing was carried out on two different data sets.
The Massachusetts Institute of Technology pedestrian database contains 509 training images and 200 test images of pedestrians on city streets. The set only contains images featuring the front or back of human figures and contains little variety in human pose. The set is well known and has been used in a variety of human detection experiments, such as those conducted by Papageorgiou and Poggio in 2000. The MIT database is currently available for research at http://cbcl.mit.edu/cbcl/software-datasets/PedestrianData.html.
The second set was developed by Dalal and Triggs exclusively for their human detection experiment, because the HOG descriptors performed near-perfectly on the MIT set. Their set, known as INRIA, contains 1805 images of humans taken from personal photographs. The set contains images of humans in a wide variety of poses and includes difficult backgrounds, such as crowd scenes, which makes it more complex than the MIT set. The INRIA database is currently available for research at http://lear.inrialpes.fr/data. That site has an image showing examples from the INRIA human detection database.
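Detection experiments of this kind are usually scored by the miss rate (the fraction of true humans the detector fails to find) at a given false positive rate. The sketch below shows one way to compute that figure from classifier scores. The function name, the scores, and the thresholding convention are hypothetical and chosen only for illustration; they are not taken from the Dalal and Triggs experiment.

```python
def miss_rate_at_fpr(pos_scores, neg_scores, target_fpr):
    """Miss rate at the lowest threshold whose false positive rate is <= target_fpr.

    A sample counts as a detection when its score is strictly greater than
    the threshold. Both score lists are assumed to be non-empty.
    """
    # Try every observed score as a threshold, from lowest to highest.
    # The false positive rate can only fall as the threshold rises.
    for threshold in sorted(set(pos_scores) | set(neg_scores)):
        false_positives = sum(s > threshold for s in neg_scores)
        if false_positives / len(neg_scores) <= target_fpr:
            misses = sum(s <= threshold for s in pos_scores)
            return misses / len(pos_scores)
    return 1.0

# Hypothetical classifier scores for windows with and without a human.
pos_scores = [0.9, 0.8, 0.5, 0.2]
neg_scores = [0.6, 0.3, 0.1, 0.05, 0.0]

print(miss_rate_at_fpr(pos_scores, neg_scores, 0.2))  # 0.25
print(miss_rate_at_fpr(pos_scores, neg_scores, 0.0))  # 0.5
```

Tightening the allowed false positive rate raises the threshold and therefore the miss rate. Sweeping the target rate and plotting the resulting pairs gives the tradeoff curve for a detector.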
As for the results, the C-HOG and R-HOG block descriptors perform comparably, with the C-HOG descriptors maintaining a slight advantage in the detection miss rate at fixed false positive rates across both data sets. On the MIT set, the C-HOG and R-HOG descriptors produced a detection miss rate of essentially zero at a fixed false positive rate. On the INRIA set, the C-HOG and R-HOG descriptors produced a detection miss rate of roughly 0.1 at a fixed false positive rate. The Generalized Haar Wavelets represent the next highest performing approach: the wavelets produced roughly a 0.01 miss rate at a fixed false positive rate on the MIT set, and roughly a 0.3 miss rate on the INRIA set. The PCA-SIFT descriptors and Shape Contexts both performed fairly poorly on both data sets. Both methods produced a miss rate of 0.1 at a fixed false positive rate on the MIT set and a miss rate of nearly 0.5 at a fixed false positive rate on the INRIA set. The image in the reference below contains the result data from the original Dalal and Triggs experiment. The curves represent the Detection Error Tradeoff on a log-log scale, that is, the miss rate versus the false positive rate. [Histograms of Oriented Gradients for Human Detection, pg. 4. http://www.acemedia.org/aceMedia/files/document/wp7/2005/cvpr05-inria.pdf]
Further Development
As part of the Pascal Visual Object Classes 2006 Workshop, Dalal and Triggs presented results on applying Histogram of Oriented Gradient descriptors to image objects other than human beings, such as cars, buses, and bicycles, as well as common animals such as dogs, cats, and cows. They included with their results the optimal parameters for block formulation and normalization in each case. The image in the reference below shows some of their detection examples for motorbikes. [Object Detection using Histograms of Oriented Gradients. http://www.pascal-network.org/challenges/VOC/voc2006/slides/dalal.pdf]
Then, as part of the 2006 European Conference on Computer Vision, Dalal and Triggs teamed up with Cordelia Schmid to apply Histogram of Oriented Gradient detectors to the problem of human detection in films and videos. Essentially, their technique combines regular HOG descriptors on individual video frames with new Internal Motion Histograms (IMH) on pairs of subsequent video frames. These Internal Motion Histograms use the gradient magnitudes from optical flow fields obtained from two consecutive frames. These gradient magnitudes are then used in the same manner as those produced from static image data within the HOG descriptor approach. When tested on two large datasets taken from several movie DVDs, the combined HOG-IMH method yielded a miss rate of approximately 0.1 at a fixed false positive rate. [Human Detection Using Oriented Histograms of Flow and Appearance. http://www.acemedia.org/aceMedia/files/document/wp7/2006/eccv06-inria.pdf]
At the Intelligent Vehicles Symposium in 2006, F. Suard, A. Rakotomamonjy, and A. Bensrhair introduced a complete system for pedestrian detection based on HOG descriptors. Their system operates using two infrared cameras. Since human beings appear brighter than their surroundings on infrared images, the system first locates positions of interest within the larger view field where humans could possibly be located. Support Vector Machine classifiers then operate on the HOG descriptors taken from these smaller positions of interest to reach a decision regarding the presence of a pedestrian. Once pedestrians are located within the view field, the actual position of each pedestrian is estimated using stereovision. [Pedestrian Detection using Infrared images and Histograms of Oriented Gradients. http://www.ce.unipr.it/people/broggi/publications/iv2006-pd-aziz.pdf]
At the IEEE Conference on Computer Vision and Pattern Recognition in 2006, Qiang Zhu, Shai Avidan, Mei-Chen Yeh, and Kwang-Ting Cheng presented an algorithm to significantly speed up human detection using HOG descriptor methods. Their method uses HOG descriptors in combination with the cascade of rejecters algorithm normally applied with great success to the problem of face detection. Also, rather than relying on blocks of uniform size, they introduce blocks that vary in size, location, and aspect ratio. To isolate the blocks best suited for human detection, they applied the AdaBoost algorithm to select the blocks to be included in the rejecter cascade. In their experiments, their algorithm achieved performance comparable to the original Dalal and Triggs algorithm, but operated at speeds up to 70 times faster. In April 2006, the Mitsubishi Electric Research Laboratories applied for a U.S. patent on this algorithm under application number 20070237387. [Fast Human Detection Using a Cascade of Histograms of Oriented Gradients. http://shaiavidan.org/papers/IntegralHoG.pdf]
External links
* http://www.cs.cmu.edu/~yke/pcasift/ - Code for PCA-SIFT Object Detection
* http://ralyx.inria.fr/2006/Raweb/lear/uid30.html - Software Toolkit for HOG Object Detection
* http://pascal.inrialpes.fr/data/human/ - INRIA Human Image Dataset
* http://cbcl.mit.edu/software-datasets/PedestrianData.html - MIT Pedestrian Image Dataset
See also
* Feature (computer vision)
* Feature detection (computer vision)
* Feature extraction
* Interest point detection
* Corner detection
* Scale-Invariant Feature Transform
Wikimedia Foundation. 2010.