Create features index (image database) and search (Python)

asked 2015-10-01 16:03:20 -0600

user21

updated 2015-10-02 04:31:50 -0600

I have almost 3 million images of different objects. Now I'd like to build an index (database), storing the features of each image (SIFT or SURF). Using this index and a single image, I'd like to find the k most similar images (showing the exact same object). I have been able to extract features and compare single images to each other. But comparing image by image for such a huge database takes forever.

How would I build an index of all images and search this index?

This may be a trivial question for all the pros here, but I am very new to OpenCV (Python bindings) and very grateful for any help :) Thank you!

Comments

"This may be a trivial question" - no, it definitely isn't ;)

berak ( 2015-10-02 01:50:49 -0600 )

Ok :) Any advice on how to build the index (FLANN for example)? I've read much about it but couldn't actually find any code example.

user21 ( 2015-10-02 02:47:12 -0600 )

unfortunately, opencv's python bindings only have a FLANN matcher, not the FLANN index ;(

berak ( 2015-10-02 02:49:23 -0600 )

From what I've read, OpenCV's Python bindings certainly do have the FLANN index (flann_Index). But, like many other OpenCV features, it is not documented.

user21 ( 2015-10-02 03:01:57 -0600 )

ohh, sorry, you're right. for opencv3 this would be in cv2.flann. you could try:

>>> help(cv2.flann)
>>> help(cv2.flann.Index())

berak ( 2015-10-02 03:12:12 -0600 )

Doesn't help :( In Opencv2.4 it's cv2.flann_Index:

help(cv2.flann_Index)

Returns

flann_Index([features, params[, distType]]) -> <flann_Index object>

But how do I add multiple images to this index? Do I simply have to concatenate the features of each image or is there an "add" function?

user21 ( 2015-10-02 03:35:43 -0600 )

"Do I simply have to concatenate the features of each image " - yes, i think so. i've only tried the c++ version, but you will have to flatten/reshape each image to a single row, and build your feature matrix as a stack of those.

berak ( 2015-10-02 03:52:05 -0600 )
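
A minimal sketch of the concatenation idea discussed above, assuming each image yields an (n, 128) float32 descriptor array as SIFT would. The helper name stack_descriptors and the row_to_image lookup array are hypothetical, not part of OpenCV, and random vectors stand in for real descriptors. The lookup array records which image each stacked row came from, so that row numbers returned by a search can later be turned back into image IDs.

```python
import numpy as np

def stack_descriptors(per_image_descriptors):
    """Stack per-image descriptor arrays into one feature matrix.

    Returns (features, row_to_image), where row_to_image[i] is the position
    (in the input list) of the image that contributed row i of features.
    """
    features = np.vstack(per_image_descriptors).astype(np.float32)
    row_to_image = np.concatenate([
        np.full(len(des), image_idx, dtype=np.int32)
        for image_idx, des in enumerate(per_image_descriptors)
    ])
    return features, row_to_image

# Stand-in data: random 128-dimensional vectors instead of real SIFT output.
rng = np.random.default_rng(0)
per_image = [rng.random((n, 128), dtype=np.float32) for n in (5, 3, 4)]

features, row_to_image = stack_descriptors(per_image)
print(features.shape)         # (12, 128)
print(row_to_image.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]

# features is the kind of matrix that would be passed as the first argument
# of cv2.flann_Index(features, params); the params dict is not shown in this
# thread, so it is left out here.
```

For 3 million images, the stacked matrix may not fit in memory at once, so this only illustrates the bookkeeping on a small group.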

Can you post a snippet of your C++ code? Maybe that helps.

user21 ( 2015-10-02 03:54:28 -0600 )

I've read both threads. But: When I add descriptors of each image

flann.add(des2)

train the matcher

flann.train()

and search

matches = flann.knnMatch(des1,k=2)

how do I know which of the images matches the query image best? How do I get some kind of ranking and the image names or IDs? Or is this just to compare one image to a group of images, so that it tells you how well the query image matches the whole group?

user21 ( 2015-10-02 08:48:20 -0600 )

ah, sorry, those links were about flann matcher not the index.

in 3.0, index->knnSearch() returns indices and distances

berak ( 2015-10-02 08:59:52 -0600 )

Ok, thank you very much for your help. I was finally able to concatenate the descriptors of a smaller image group (using numpy) and build the Flann index. For the search I use

indices, dists = flann.knnSearch(query_des, 2, params = {})

and then I use

for m,n in dists:
    if m < 0.7*n:
        ...

to search for good matches and check which image they belong to, using the returned indices. I then count the good matches for each image to rank them. If the query image has been indexed, it of course gets all the points (perfect match), and all other images get no points. What do I need to change so that other images also get points when they show the same object? Do I have to increase knn in knnSearch?

user21 ( 2015-10-02 13:51:12 -0600 )
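
The ranking procedure described in the last comment can be sketched as follows. This is only an illustration under stated assumptions: knn_search is a brute-force NumPy stand-in for flann.knnSearch (it returns indices and distances in the same order, but it is not the FLANN implementation), rank_images and row_to_image are hypothetical names, and random vectors replace real descriptors.

```python
from collections import Counter

import numpy as np

def knn_search(features, queries, k):
    """Brute-force stand-in for flann.knnSearch (NOT the FLANN index).

    Returns (indices, dists), each of shape (len(queries), k), with the
    nearest neighbour first in every row.
    """
    diff = queries[:, None, :] - features[None, :, :]
    all_dists = np.linalg.norm(diff, axis=2)
    indices = np.argsort(all_dists, axis=1)[:, :k]
    dists = np.take_along_axis(all_dists, indices, axis=1)
    return indices, dists

def rank_images(indices, dists, row_to_image, ratio=0.7):
    """Count ratio-test survivors per image and sort by vote count."""
    votes = Counter()
    for (best_row, _), (m, n) in zip(indices, dists):
        if m < ratio * n:  # same test as in the thread
            votes[int(row_to_image[best_row])] += 1
    return votes.most_common()

# Stand-in index: three "images" with 5, 3 and 4 random descriptors each.
rng = np.random.default_rng(0)
per_image = [rng.random((n, 128), dtype=np.float32) for n in (5, 3, 4)]
features = np.vstack(per_image)
row_to_image = np.repeat(np.arange(3), [5, 3, 4])

# Query: slightly perturbed copies of the descriptors of image 1.
noise = rng.normal(0.0, 0.01, per_image[1].shape)
query_des = (per_image[1] + noise).astype(np.float32)

indices, dists = knn_search(features, query_des, 2)
ranking = rank_images(indices, dists, row_to_image)
print(ranking)  # [(1, 3)]
```

One possible reason why only the indexed query image scores: in a pooled index the second neighbour n can come from another image of the same object, so m and n end up close together and the ratio test rejects the match. That is an assumption about this setup, not something confirmed in the thread.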