Relation between BOWImgDescriptorExtractor descriptors and trained vocab?

asked 2012-08-29 11:28:07 -0600

cstahmer

updated 2012-08-29 13:02:32 -0600

I'm working on an application for searching images in a digital archive that follows the methodology outlined in http://www.cs.ucsb.edu/~mturk/pubs/JangTurkWACV2011.pdf. The authors describe a system that extracts SURF features, clusters them into a Bag of Words vocabulary using a BOWKMeansTrainer, and then generates a visual words document for each image, which is indexed by Lucene for subsequent searching. (This is, obviously, the simplified explanation.)

In trying to apply this approach, I'm running into a bit of a hiccup with regard to the results that I'm getting back from “BOWImgDescriptorExtractor.compute” when I use it to get descriptors for each image based upon a trained vocabulary.

According to the documentation (http://docs.opencv.org/modules/features2d/doc/object_categorization.html) this call should “Compute the bag-of-words image descriptor as is a normalized histogram of vocabulary words encountered in the image. The i-th bin of the histogram is a frequency of i-th word of the vocabulary in the given image.”

When I generate descriptors for an image using BOWImgDescriptorExtractor.compute, the returned histogram appears to be correct, but every frequency that is returned is either a “0” (to be expected) or a float with a large negative exponent, such as 0.638006e-10 (not so expected). Assuming that value came from bin 3, for example, and if I'm understanding the documentation correctly, it would mean that there are 0.0000000000638 occurrences of word/bin 3 in the image, which isn't really possible.

I've checked multiple ways (writing the histogram out to YAML, as well as breaking the float into significand and exponent using “significand = frexp(histogram[i], &exp);”), and the exponent is, in fact, always negative for every bin of the returned descriptor Mat that has a nonzero value.
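
For reference, the frexp check described above can be sketched as follows. This is an illustration only, not the code actually used in the application, and the helper name binaryExponent is hypothetical. One caveat worth keeping in mind: frexp reports a binary exponent with a significand in [0.5, 1), so it returns a negative exponent for any value below 0.5, which would include most bins of a normalized histogram.

```cpp
#include <cmath>

// Hypothetical helper (illustration only): returns the binary exponent that
// std::frexp reports for a value, i.e. value == significand * 2^exponent
// with the significand in the range [0.5, 1).
int binaryExponent(double value) {
    int exponent = 0;
    std::frexp(value, &exponent);  // the significand is discarded here
    return exponent;
}
```

For example, binaryExponent(0.25) returns -1 even though 0.25 is a perfectly plausible normalized frequency, so the decimal exponent seen in the YAML output is probably the more telling of the two checks.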

Obviously, I'm missing something in understanding exactly how the number in each bin of the descriptor histogram relates to the BOW vocabulary. I've been experimenting for a couple of days with various ways of interpreting it, but just can't make any sense of it. I'm hoping someone here might be able to tell me exactly how the number in “bin 3” of the descriptor histogram relates to “word 3” of the clustered BOW vocabulary.

I've searched all the OpenCV forums and don't believe this is a duplicate question.

Thank you,

Carl


1 answer

answered 2012-12-29 12:43:14 -0600

wahoo_wa

BOWImgDescriptorExtractor

List item #3 from the above link reads:

“Compute the bag-of-words image descriptor as is a normalized histogram of vocabulary words encountered in the image. The i-th bin of the histogram is a frequency of i-th word of the vocabulary in the given image.”

The BOWImgDescriptorExtractor.compute() method returns a normalized histogram of the occurrences of each vocabulary word, so each bin holds a fraction of the image's total word count rather than a raw count.

This means that if 10 words are found in the image, with 5 occurrences of word 0 and 5 occurrences of word 2, then the element at index 0 of the histogram holds 5/10, or 0.5, and so does the element at index 2. All other elements hold a value of zero.
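
The arithmetic above can be sketched in plain C++. This is a minimal illustration of the normalization described in the documentation, not OpenCV's actual implementation; the function name normalizedWordHistogram is hypothetical, and it assumes each keypoint descriptor has already been assigned the index of its nearest vocabulary word.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV): builds a normalized histogram
// from vocabulary-word indices, one index per keypoint descriptor.
std::vector<float> normalizedWordHistogram(const std::vector<int>& wordIds,
                                           std::size_t vocabularySize) {
    std::vector<float> histogram(vocabularySize, 0.0f);
    std::size_t counted = 0;

    // Count how many times each vocabulary word occurs, ignoring any
    // index that falls outside the vocabulary.
    for (int id : wordIds) {
        if (id >= 0 && static_cast<std::size_t>(id) < vocabularySize) {
            histogram[static_cast<std::size_t>(id)] += 1.0f;
            ++counted;
        }
    }

    // Divide by the number of counted words so the bins sum to 1.
    if (counted > 0) {
        for (float& bin : histogram) {
            bin /= static_cast<float>(counted);
        }
    }
    return histogram;
}
```

Called with the word indices {0, 0, 0, 0, 0, 2, 2, 2, 2, 2} and a vocabulary of 4 words, this yields 0.5 in bins 0 and 2 and zero in bins 1 and 3, matching the example above.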

