
On what factors does "Bag of Words" training efficiency depend?

asked 2014-12-01 23:36:21 -0600

noobproof

Hello all,

Is it better to train a classifier on low-resolution images than on high-resolution ones? I am asking this because I want to classify objects efficiently based on their shape, so that the search runs only against the vocabulary for that particular shape instead of matching against the whole vocabulary (~60MB).

I am working on a project to identify products in a supermarket. My idea is to classify products based on shape, like:

- Bottle (here I will have a vocabulary of bottles): beer, wine, sauce
- Carton (here I will have a vocabulary of cartons): washing powder, chocolates
- Packet (here I will have a vocabulary of packets): soup sachets, noodles

... and so on.
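The two-stage lookup described above could be sketched like this. It's only an illustration of the idea; `shape_classifier` and the per-shape matchers are hypothetical placeholders, not working classifiers:

```python
# Hypothetical sketch of the proposed two-stage lookup: predict the shape
# category first, then match only against that category's small vocabulary
# instead of one monolithic ~60 MB vocabulary. All names are illustrative.
SHAPE_VOCABULARIES = {
    "bottle": ["beer", "wine", "sauce"],
    "carton": ["washing_powder", "chocolates"],
    "packet": ["soup_sachets", "noodles"],
}

def classify(image, shape_classifier, product_matchers):
    """Two-stage classification: shape first, then product within that shape."""
    shape = shape_classifier(image)           # e.g. "bottle"
    product = product_matchers[shape](image)  # search only that shape's vocabulary
    return shape, product
```

The win is that each product matcher only ever sees the small vocabulary for its own shape, never the full one.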

Please guide me if I am going in the wrong direction. Thanks for your help.

Aditi K


1 answer


answered 2014-12-02 08:04:17 -0600

bad_keypoints

updated 2015-01-08 03:25:48 -0600

Disclaimer: I have not implemented a really good working BoW system yet, but I'm currently working on one.

So far, my findings are these:

1) Good keypoints (obviously). This means keypoints and descriptors that are invariant to scale, rotation, and (to a degree) affine transforms. SIFT/SURF/ORB are good choices. ORB is probably the fastest, but it might not give you the best feature matching, quality-wise. (Take the hint)

2) Number of clusters. This is the most important factor. I haven't figured out yet what a good number is for a given number of training images. With N images, I've so far tested and found that N*100 clusters works well for my purpose (which is not classification, mind you).

I'll update this answer tomorrow with some links that discuss the second point further. Too many clusters will give you a very hard time detecting/classifying anything; too few will give you overlapping, bad results.

See this, I found it very useful in understanding these problems.

I'll update the answer soon enough with more details. Right now I'm going somewhere.



I have to say that ORB features are not found near the border of the image, which, in my case, was really big: 1/2 or 1/3 of the image area. Is this OK? How can I change this?

thdrksdfthmn ( 2014-12-03 03:05:43 -0600 )

Linked to the second point: if I have about 400 images per class, shall I use 40,000 clusters, or 40,000 * number_of_classes? Isn't this too large? How long will it take?

thdrksdfthmn ( 2014-12-03 03:07:44 -0600 )

For your second concern: the goal here is to maximize the chance that a query image is classified well. Classification works well when you hit a sweet spot in the number of clusters, one that lets the SVM classifier (or whatever else you're using) distinctly separate images belonging to different classes. Thus, it's good to have a good number of training images (I'd say go for 1000+ per class) and set clusters = number of classes * 100.

This could be wrong, or it could work well. I've not done classification, so I really can't tell, but if you think about it logically, 1000 clusters holding the common descriptors from images of 10 classes will be much better training data than 100 clusters (too much overlapping) for the same, or than 1,000,000 (1000 images for 10 classes, continued...)

bad_keypoints ( 2014-12-05 03:36:11 -0600 )

And if from each image we save only 200 descriptors, we will have 200 * 1000 * 10 = 2,000,000 descriptors clustered into 1 million clusters, which is far too spread out a cluster space for an SVM to draw distinct class boundaries in.)

bad_keypoints ( 2014-12-05 03:39:19 -0600 )
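A quick back-of-envelope check of the numbers in this thread (all figures are the hypothetical ones from the comments above):

```python
descriptors_per_image = 200
images_per_class = 1000
n_classes = 10

total_descriptors = descriptors_per_image * images_per_class * n_classes
print(total_descriptors)  # 2,000,000 descriptors in total

# Heuristic from the thread: clusters = number of classes * 100.
clusters = n_classes * 100
print(total_descriptors // clusters)  # on average 2000 descriptors per cluster
```

With the 1,000,000-cluster extreme instead, each cluster would average only two descriptors, which is why that case is argued against above.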
