There are a number of fundamental differences between texture descriptors and keypoint descriptors, and those differences make them fit for very different tasks - usually, recognition tasks that one method performs well fail with the other, and vice versa.

Here are some highlights from the two paradigms:

Texture descriptors

Examples: HOG, LBP, Haar.

How they work: Given a texture area, they uniformly process all of it to extract a very high number of parameters that describe it. Those parameters are usually very similar, if not identical, for each patch of the input area. They are usually low-quality by themselves, and only their high number makes them useful for a classification task. However, no human would be able to correlate all that data to extract anything useful, or to build a distance function, or a threshold on what is "similar" or "not similar" - one would have to manually select and set thousands or tens of thousands of thresholds. This is why texture descriptors are usually fed to learning algorithms like SVM, AdaBoost, etc. Their role is, given a number of positive samples and a number of negative samples, to find the boundary between the two classes that best separates them. Some algorithms (boosting) even try to select the features that seem most useful for classification and reject those that do not add useful information.
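For concreteness, here is a minimal sketch of that pipeline in Python/OpenCV (it is an illustration, not a ready-made detector): compute HOG descriptors for pre-cropped patches and let an SVM find the separating boundary. The `positives` and `negatives` lists are assumptions - grayscale patches you would have to collect yourself.

```python
import cv2
import numpy as np

# Default HOG: a 64x128 window produces a 3780-dimensional descriptor per patch.
hog = cv2.HOGDescriptor()

def describe(patches):
    # One long, uniform feature vector per patch - far too many numbers
    # to threshold by hand.
    return np.array([hog.compute(cv2.resize(p, (64, 128))).ravel()
                     for p in patches], dtype=np.float32)

# 'positives' and 'negatives' are hypothetical lists of grayscale patches.
X = np.vstack([describe(positives), describe(negatives)])
y = np.hstack([np.ones(len(positives)), -np.ones(len(negatives))]).astype(np.int32)

# The learning algorithm, not a human, finds the boundary that best separates the classes.
svm = cv2.ml.SVM_create()
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(X, cv2.ml.ROW_SAMPLE, y)
```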

Texture descriptors work well with objects that are predefined and can easily be described by a fixed area in a picture: a car, a face, a letter, a sign.

Keypoint descriptors

Examples: SIFT, SURF, FREAK, ORB, ASIFT, AGAST, and a thousand others...

How they work:

Given a random image, they compute for each pixel a measure of "interestingness" - that is, how likely it is that the point can be uniquely recognized in another image. Is it really a unique point? For this, they analyse the area around the pixel and calculate all kinds of crazy statistics.

After all this is done, the algorithm selects a number of the most interesting points and calculates for them the same (or another) set of parameters to be used as a description - like a name, or a hash, describing each point.
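As a small sketch of those two steps (detection and description), assuming OpenCV's ORB and an arbitrary image of your choosing:

```python
import cv2

# Any picture will do - the algorithm decides by itself which points are interesting.
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)

# keypoints:   the automatically selected "interesting" points (position, scale, angle)
# descriptors: one compact binary "name" per keypoint, used later for matching
```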

The final step of the algorithm is to match two or more images to see whether they contain the same information (the same keypoints) - is the San Francisco Bridge in both pics? Do the two views overlap? Note that, unlike the texture algorithms, keypoints make no assumption about the area of an object - they neither need nor output contour or location information about an object, only about automatically selected points.

This makes them very useful for processing random pics and measuring similarity between them. They are of little use if you want to detect cats, but they are great for catching someone who posted pics of your filthy room on the internet.

Unlike with texture descriptors, here you can easily set and understand the output parameters - you can accept a match between two pics if enough keypoints are matched and a homography can be determined between them. That is two thresholds to set by hand, instead of thousands if you use HOG.
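A minimal sketch of those two thresholds, assuming kp1/des1 and kp2/des2 come from running the ORB snippet above on two different pictures; MIN_MATCHES and the RANSAC reprojection error are values you pick yourself:

```python
import cv2
import numpy as np

# Hamming distance, since ORB descriptors are binary strings.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

MIN_MATCHES = 20                                          # threshold 1: enough matched keypoints
if len(matches) >= MIN_MATCHES:
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is not None and int(mask.sum()) >= MIN_MATCHES:  # threshold 2: geometrically consistent match
        print("Same scene in both pictures")
```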
