Revision 1: initial version

Question about Bag of Words, detectors, and such

Hello all,

I am looking to do a bag-of-words type setup where I compare a picture I have taken with items in a database.

An example of what I am doing is taking a picture of a bookshelf and identifying the books on it, so one picture could contain 50 "vocabulary" items.

I am curious about which keypoint detectors, feature descriptors, and matchers I will need.

It seems there are so many choices and I don't know which would be better for what.

I would like to use something other than SURF or SIFT because I hear you need a license, and getting one costs a good bit of money.

I have heard good things about FREAK, BRISK, and ORB, but those are only descriptors, right? Would I still need a keypoint detector and a matcher? (I thought I also heard that some descriptors are also detectors, or...?)
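
To make the "matcher" part concrete: FREAK, BRISK, and ORB all produce binary descriptors, which are usually compared by Hamming distance (the number of differing bits). Below is a minimal pure-Python sketch of brute-force matching with a ratio test. The function names, the toy 2-byte descriptors, and the 0.8 ratio are assumptions made for illustration, not part of any library.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_descriptors(query, train, ratio=0.8):
    """Brute-force matching with a ratio test (hypothetical helper).

    Returns (query_index, train_index, distance) for every query descriptor
    whose best match is clearly better than its second-best match.
    """
    matches = []
    for qi, q in enumerate(query):
        # Rank every train descriptor by its distance to this query descriptor.
        ranked = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (best_d, best_i), (second_d, _) = ranked[0], ranked[1]
        if best_d < ratio * second_d:
            matches.append((qi, best_i, best_d))
    return matches


# Toy 2-byte descriptors; real ORB descriptors are 32 bytes long.
book_cover = [bytes([0b11110000, 0b00001111]), bytes([0b10101010, 0b01010101])]
shelf_photo = [
    bytes([0b10101010, 0b01010111]),  # 1 bit away from book_cover[1]
    bytes([0b11110000, 0b00001111]),  # identical to book_cover[0]
    bytes([0b00000000, 0b11111111]),  # unrelated descriptor
]
matches = match_descriptors(book_cover, shelf_photo)
print(matches)
```

In a real pipeline the descriptors would come from a detector and descriptor extractor run on each image. This sketch only shows the matching step.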

I think one of the more important things would be scale invariance, as the picture I have of a book might not be the same size as that book appears in the bookshelf picture I'm taking.
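
One common way detectors get scale invariance is to look for keypoints on an image pyramid, that is, on repeatedly downscaled copies of the image. The sketch below only computes the sizes of the pyramid levels. The function name and the scale factor and level count used here are illustrative assumptions.

```python
def pyramid_sizes(width, height, scale_factor=1.2, levels=8):
    """Image size at each pyramid level (hypothetical helper).

    Level 0 is the original image; each further level is smaller by
    `scale_factor`, so a large object at level 0 looks like a small
    object at a higher level.
    """
    sizes = []
    for level in range(levels):
        scale = scale_factor ** level
        sizes.append((round(width / scale), round(height / scale)))
    return sizes


# A 640x480 photo halved three times.
sizes = pyramid_sizes(640, 480, scale_factor=2.0, levels=4)
print(sizes)
```

A keypoint found at a coarse level corresponds to a larger structure in the original photo, which is what lets a small database image of a book match a larger or smaller view of the same book on the shelf.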

I don't think that rotation is that big a deal.

I'm not sure what else I should ask about these but if anyone has any input to help me on my path I would greatly appreciate it...

As for BoW itself, I hear you basically have your vocab of keypoints, then you compare them to the keypoints in the image, and then do a histogram compare?
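
As a rough sketch of that idea: each descriptor in an image is assigned to its nearest vocabulary "word", the assignments are counted into a normalized histogram, and two images are compared through their histograms. The code below assumes the vocabulary already exists (it is typically built by clustering descriptors from training images). The function names, the 2-D toy descriptors, and the use of histogram intersection as the similarity are assumptions for illustration.

```python
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[i])),
    )


def bow_histogram(descriptors, vocabulary):
    """Normalized count of how often each vocabulary word occurs in an image."""
    if not descriptors:
        return [0.0] * len(vocabulary)
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(vocabulary))]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy vocabulary of three "words" and toy 2-D descriptors (assumed data).
vocabulary = [(0, 0), (10, 0), (0, 10)]
database_image = [(1, 0), (0, 1), (9, 1), (1, 9)]
query_image = [(0, 2), (1, 1), (11, 0), (10, 1)]

db_hist = bow_histogram(database_image, vocabulary)
query_hist = bow_histogram(query_image, vocabulary)
similarity = histogram_intersection(db_hist, query_hist)
print(db_hist, query_hist, similarity)
```

Note that the histogram throws away where each keypoint was found, so it summarizes a whole image rather than locating individual items inside it.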

I also believe I heard something about training classifiers. Why exactly would we need to do that? To identify the items within the whole picture, like a bottle compared to a box?

I think that's all, thanks again to anyone who can help,

~KZ

Revision 2: retagged by berak, updated 2014-03-01 02:39:21 -0600
