Revision history

Revision 1
initial version

Relation between BOWImageDescriptorExtractor descriptors and trained vocab?

I'm working on an application for searching images in a digital archive that follows the methodology outlined by http://www.cs.ucsb.edu/~mturk/pubs/JangTurkWACV2011.pdf, in which they describe a system of extracting SURF features, clustering these to a Bag of Words vocabulary using a BOWKMeansTrainer, and then generating a visual words document for each image which is indexed by Lucene for subsequent searching. (This is, obviously, the simplified explanation.)

In trying to apply this approach, I've run into a hiccup with the results I get back from “BOWImageDescriptorExtractor.compute” when I use it to compute descriptors for each image from a trained vocabulary.

According to the documentation (http://docs.opencv.org/modules/features2d/doc/object_categorization.html) this call should “Compute the bag-of-words image descriptor as is a normalized histogram of vocabulary words encountered in the image. The i-th bin of the histogram is a frequency of i-th word of the vocabulary in the given image.”

When I generate descriptors for an image using BOWImageDescriptorExtractor.compute, the returned histogram appears to be correct, but every frequency returned is either “0” (to be expected) or a float with a negative exponent, such as 0.638006e-10 (not so expected). Assuming this example came from bin 3, that would mean, if I'm understanding the documentation correctly, that there are 0.000765 occurrences of word/bin 3 in the image, which isn't really possible.

I've checked this multiple ways (writing the histogram out to YAML, and breaking the float into significand and exponent using “significand = frexp(histogram[i], &exp);”), and the exponent is, in fact, always negative for every bin of the returned descriptor Mat that has a nonzero value.

Obviously, I'm missing something in understanding exactly how the number in each bin of the descriptor histogram relates to the BOW vocabulary. I've been experimenting for a couple of days with various ways of interpreting it, but just can't make any sense of it. I'm hoping someone here can tell me exactly how the number in “bin 3” of the descriptor histogram relates to “word 3” of the clustered BOW vocabulary.

I've searched all the OpenCV forums and don't believe this is a duplicate question.

Thank you,

Carl

Revision 2
fixed math error in my conversion of float scientific notation

Relation between BOWImageDescriptorExtractor descriptors and trained vocab?

I'm working on an application for searching images in a digital archive that follows the methodology outlined by http://www.cs.ucsb.edu/~mturk/pubs/JangTurkWACV2011.pdf, in which they describe a system of extracting SURF features, clustering these to a Bag of Words vocabulary using a BOWKMeansTrainer, and then generating a visual words document for each image which is indexed by Lucene for subsequent searching. (This is, obviously, the simplified explanation.)

In trying to apply this approach, I've run into a hiccup with the results I get back from “BOWImageDescriptorExtractor.compute” when I use it to compute descriptors for each image from a trained vocabulary.

According to the documentation (http://docs.opencv.org/modules/features2d/doc/object_categorization.html) this call should “Compute the bag-of-words image descriptor as is a normalized histogram of vocabulary words encountered in the image. The i-th bin of the histogram is a frequency of i-th word of the vocabulary in the given image.”

When I generate descriptors for an image using BOWImageDescriptorExtractor.compute, the returned histogram appears to be correct, but every frequency returned is either “0” (to be expected) or a float with a negative exponent, such as 0.638006e-10 (not so expected). Assuming this example came from bin 3, that would mean, if I'm understanding the documentation correctly, that there are 0.0000000000638 occurrences of word/bin 3 in the image, which isn't really possible.
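For reference, the "normalized histogram" the documentation describes can be sketched in plain C++ without OpenCV. This is only an illustration under stated assumptions: `bowHistogram` and `squaredDistance` are hypothetical helper names (not OpenCV functions), each descriptor is assigned to its nearest vocabulary word by squared Euclidean distance, and normalization means dividing each count by the total number of descriptors.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
static float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const float d = a[k] - b[k];
        sum += d * d;
    }
    return sum;
}

// Hypothetical helper (not part of OpenCV): builds a normalized
// bag-of-words histogram. Bin i holds the fraction of descriptors
// whose nearest vocabulary word is word i.
std::vector<float> bowHistogram(const std::vector<Descriptor>& descriptors,
                                const std::vector<Descriptor>& vocabulary) {
    std::vector<float> histogram(vocabulary.size(), 0.0f);
    if (descriptors.empty() || vocabulary.empty()) return histogram;

    for (const Descriptor& d : descriptors) {
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t i = 0; i < vocabulary.size(); ++i) {
            const float dist = squaredDistance(d, vocabulary[i]);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        histogram[best] += 1.0f;  // raw occurrence count at this stage
    }

    // Normalize: counts become fractions that sum to 1.
    for (float& bin : histogram) {
        bin /= static_cast<float>(descriptors.size());
    }
    return histogram;
}
```

Under this reading, a bin holds a fraction rather than an occurrence count, and a nonzero bin cannot be smaller than 1/N for an image with N descriptors.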

I've checked this multiple ways (writing the histogram out to YAML, and breaking the float into significand and exponent using “significand = frexp(histogram[i], &exp);”), and the exponent is, in fact, always negative for every bin of the returned descriptor Mat that has a nonzero value.
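One caveat about the frexp check: `std::frexp` reports a power-of-two exponent with the significand scaled into [0.5, 1), so any value below 0.5 yields a negative exponent, including plausible frequencies such as 0.25. A negative frexp exponent alone therefore does not show that a value is implausibly small. A minimal sketch, where `binaryExponent` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: returns the base-2 exponent reported by std::frexp.
int binaryExponent(double value) {
    int exponent = 0;
    // value == significand * 2^exponent, with 0.5 <= |significand| < 1
    // for any finite nonzero value.
    const double significand = std::frexp(value, &exponent);
    (void)significand;  // only the exponent is of interest here
    return exponent;
}
```

With this helper, 0.25 gives -1, 0.5 and 0.75 give 0, and 1.0 gives 1.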

Obviously, I'm missing something in understanding exactly how the number in each bin of the descriptor histogram relates to the BOW vocabulary. I've been experimenting for a couple of days with various ways of interpreting it, but just can't make any sense of it. I'm hoping someone here can tell me exactly how the number in “bin 3” of the descriptor histogram relates to “word 3” of the clustered BOW vocabulary.

I've searched all the OpenCV forums and don't believe this is a duplicate question.

Thank you,

Carl