Create features index (image database) and search (Python)

asked 2015-10-01 16:03:20 -0500

user21

updated 2015-10-02 04:31:50 -0500

I have almost 3 million images of different objects. Now I'd like to build an index (database), storing the features of each image (SIFT or SURF). Using this index and a single image, I'd like to find the k most similar images (showing the exact same object). I have been able to extract features and compare single images to each other. But comparing image by image for such a huge database takes forever.

How would I build an index of all images and search this index?

This may be a trivial question for all the pros here, but I am very new to OpenCV (Python bindings) and very grateful for any help :) Thank you!

Comments

"This may be a trivial question" - no, it definitely isn't ;)

berak ( 2015-10-02 01:50:49 -0500 )

Ok :) Any advice on how to build the index (FLANN for example)? I've read much about it but couldn't actually find any code example.

user21 ( 2015-10-02 02:47:12 -0500 )

unfortunately, opencv's python bindings only have a FLANN matcher, not the FLANN index ;(

berak ( 2015-10-02 02:49:23 -0500 )

From what I've read, OpenCV's Python bindings certainly do have the FLANN index (flann_Index). But it is undocumented, like many other OpenCV features.

user21 ( 2015-10-02 03:01:57 -0500 )

ohh, sorry, you're right. for opencv3 this would be in cv2.flann. you could try:

>>> help(cv2.flann)
>>> help(cv2.flann.Index())

berak ( 2015-10-02 03:12:12 -0500 )

That doesn't help :( In OpenCV 2.4 it's cv2.flann_Index:

help(cv2.flann_Index)

Returns

flann_Index([features, params[, distType]]) -> <flann_Index object>

But how do I add multiple images to this index? Do I simply have to concatenate the features of each image or is there an "add" function?

user21 ( 2015-10-02 03:35:43 -0500 )

"Do I simply have to concatenate the features of each image " - yes, i think so. i've only tried the c++ version, but you will have to flatten/reshape each image to a single row, and build your feature matrix as a stack of those.

berak ( 2015-10-02 03:52:05 -0500 )
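
A minimal sketch of the stacking step discussed above, assuming local descriptors such as SIFT, where each image yields several 128-dimensional rows rather than a single one. The helper name `stack_descriptors`, the `image_ids` lookup array, and the random stand-in data are illustrative assumptions and not part of OpenCV; the resulting `features` matrix is the kind of array that the `flann_Index(features, params)` constructor quoted earlier takes.

```python
import numpy as np

def stack_descriptors(per_image_descriptors):
    """Stack per-image descriptor arrays into one matrix plus a row -> image lookup."""
    # One big (total_descriptors, dim) float32 matrix, rows kept in image order.
    features = np.vstack(per_image_descriptors).astype(np.float32)
    # image_ids[r] is the position (in the input list) of the image that row r came from.
    image_ids = np.concatenate([
        np.full(len(d), i, dtype=np.int32)
        for i, d in enumerate(per_image_descriptors)
    ])
    return features, image_ids

rng = np.random.default_rng(0)
# Synthetic stand-ins for SIFT output: one (n_keypoints, 128) array per image.
descriptors = [rng.random((n, 128), dtype=np.float32) for n in (5, 3, 7)]

features, image_ids = stack_descriptors(descriptors)
print(features.shape)  # (15, 128)
print(image_ids)       # row -> image lookup, one entry per descriptor row
```

Keeping `image_ids` next to the matrix is what later allows a row index returned by a search to be traced back to the image it came from.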

Can you post a snippet of your C++ code? Maybe that helps.

user21 ( 2015-10-02 03:54:28 -0500 )

I've read both threads. But: When I add descriptors of each image

flann.add(des2)

train the matcher

flann.train()

and search

matches = flann.knnMatch(des1,k=2)

how do I know which of the images matches the query image best? How do I get some kind of ranking and the image names or IDs? Or does this only compare one image to a group of images and tell you how well the query image matches the whole group?

user21 ( 2015-10-02 08:48:20 -0500 )

ah, sorry, those links were about flann matcher not the index.

in 3.0, index->knnSearch() returns indices and distances

berak ( 2015-10-02 08:59:52 -0500 )
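
One possible way to turn per-descriptor indices and distances into a ranking of images is to let each query descriptor vote for the image that owns its nearest neighbour. The sketch below is an assumption-laden illustration, not the thread's solution: `knn_brute_force` is a hypothetical exact NumPy search standing in for the index's `knnSearch()` so the example runs without OpenCV, and `rank_images`, the 0.7 ratio threshold, and the synthetic data are likewise made up for the example.

```python
import numpy as np

def knn_brute_force(features, queries, k):
    """Exact k-NN stand-in: returns (indices, squared L2 distances), each (n_queries, k)."""
    d2 = ((queries[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    return idx, np.take_along_axis(d2, idx, axis=1)

def rank_images(indices, dists, image_ids, n_images, ratio=0.7, top=3):
    """Rank images by how many query descriptors pass the ratio test against them."""
    # Distances here are squared, so the ratio is squared too; adjust this
    # if the search you actually use returns plain (unsquared) distances.
    good = dists[:, 0] < (ratio ** 2) * dists[:, 1]
    # Each surviving query descriptor votes for the image owning its nearest row.
    votes = np.bincount(image_ids[indices[good, 0]], minlength=n_images)
    order = np.argsort(-votes, kind="stable")[:top]
    return [(int(i), int(votes[i])) for i in order]

rng = np.random.default_rng(0)
sizes = (20, 30, 25)                          # descriptors per synthetic image
features = rng.random((sum(sizes), 128), dtype=np.float32)
image_ids = np.repeat(np.arange(len(sizes)), sizes)

# Query: a slightly noisy copy of image 1's descriptors.
noise = rng.normal(0.0, 0.01, (sizes[1], 128)).astype(np.float32)
query = features[image_ids == 1] + noise

indices, dists = knn_brute_force(features, query, k=2)
ranking = rank_images(indices, dists, image_ids, n_images=len(sizes))
print(ranking)  # list of (image id, vote count), best match first
```

The image IDs in the ranking are positions in whatever list of file names was used when stacking the descriptors, so mapping them back to names is a plain list lookup.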