Create features index (image database) and search (Python)
I have almost 3 million images of different objects. Now I'd like to build an index (database), storing the features of each image (SIFT or SURF). Using this index and a single image, I'd like to find the k most similar images (showing the exact same object). I have been able to extract features and compare single images to each other. But comparing image by image for such a huge database takes forever.
How would I build an index of all images and search this index?
This may be a trivial question for all the pros here, but I am very new to OpenCV (Python bindings) and very grateful for any help :) Thank you!
"This may be a trivial question" - no, it definitely isn't ;)
Ok :) Any advice on how to build the index (FLANN for example)? I've read much about it but couldn't actually find any code example.
unfortunately, opencv's python bindings only have a FLANN matcher, not the FLANN index ;(
From what I've read, OpenCV's Python bindings certainly do have the FLANN index (flann_Index). But, like many other OpenCV features, it is not documented.
ohh, sorry, you're right. for opencv3 this would be in cv2.flann. you could try:
Doesn't help :( In Opencv2.4 it's cv2.flann_Index:
Returns
But how do I add multiple images to this index? Do I simply have to concatenate the features of each image or is there an "add" function?
"Do I simply have to concatenate the features of each image " - yes, i think so. i've only tried the c++ version, but you will have to flatten/reshape each image to a single row, and build your feature matrix as a stack of those.
Can you post a snippet of your C++ code? Maybe that helps.
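For the Python side, here is a minimal sketch of the "stack of descriptors" idea using only NumPy. The function name `stack_descriptors`, the `row_to_image` lookup array, and the random stand-in data are assumptions made for illustration; real descriptors would come from SIFT or SURF. The lookup array is the part that matters later: it records which image each row of the big feature matrix came from.

```python
import numpy as np

def stack_descriptors(descriptors_per_image):
    """Stack per-image descriptor arrays into one matrix plus a row -> image lookup.

    descriptors_per_image: list of (n_i, dim) arrays, one per image
    (hypothetical input; images without descriptors should be filtered out first).
    """
    # One big (sum(n_i), dim) float32 matrix, the layout a FLANN index is built from
    features = np.vstack(descriptors_per_image).astype(np.float32)
    # row_to_image[r] is the list position of the image that row r belongs to
    row_to_image = np.concatenate(
        [np.full(len(d), i, dtype=np.int32) for i, d in enumerate(descriptors_per_image)]
    )
    return features, row_to_image

# Stand-in data: 3 images with 4, 2 and 5 descriptors of 128 dimensions (SIFT-sized)
rng = np.random.default_rng(0)
descs = [rng.random((n, 128), dtype=np.float32) for n in (4, 2, 5)]
features, row_to_image = stack_descriptors(descs)
print(features.shape, row_to_image)
```

`features` is what you would hand to the index; keep `row_to_image` (and a parallel list of file names) around so that row indices returned by a search can be translated back into images.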
I've read both threads. But: When I add descriptors of each image
train the matcher
and search
how do I know which of the images matches the query image best? How do I get some kind of ranking and the image names or IDs? Or is this just for comparing one image to a group of images, so that it tells you how well the query image matches the whole group?
ah, sorry, those links were about flann matcher not the index.
in 3.0, index->knnSearch() returns indices and distances
Ok, thank you very much for your help. I was finally able to concatenate the descriptors of a smaller image group (using numpy) and build the Flann index. For the search I use
and then I use
to search for good matches and check which image they belong to using the returned indices. I then count the good matches for each image to rank them. If the query image has been indexed, it of course gets all the points (perfect match), and all other images get no points. What do I need to change so that other images also get points when they show the same object? Do I have to increase knn in knnSearch?
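One possible explanation, offered as a guess: when the query image is itself in the index, every query descriptor finds its own copy at distance 0, so a ratio test on the two nearest neighbours always favours the query image. A sketch of one workaround is below: request more neighbours (say knn=3 or more), drop the neighbours that belong to the query's own image, and run the ratio test on what is left. The function `rank_images`, its parameters, and the toy arrays are hypothetical; `indices` and `dists` only imitate the shape of what `knnSearch` returns. If the distances are squared, as I believe FLANN's L2 distances are, the ratio would need to be squared too.

```python
import numpy as np
from collections import Counter

def rank_images(indices, dists, row_to_image, exclude_image=None, ratio=0.7):
    """Rank images by how many query descriptors pass a ratio test.

    indices, dists: (n_query, k) arrays, each row sorted by increasing distance
    row_to_image:   maps an index row number to its image number
    exclude_image:  image number to ignore, e.g. the query itself if it was indexed
    """
    votes = Counter()
    for idx_row, dist_row in zip(indices, dists):
        # Keep only neighbours that do not belong to the excluded image
        kept = [(d, row_to_image[i]) for i, d in zip(idx_row, dist_row)
                if exclude_image is None or row_to_image[i] != exclude_image]
        if len(kept) < 2:
            continue  # not enough neighbours left for a ratio test
        (d1, img1), (d2, _) = kept[0], kept[1]
        if d1 < ratio * d2:
            votes[int(img1)] += 1  # one vote for the image owning the best match
    return votes.most_common()

# Toy index of 3 images with 2 descriptors each; the query is image 0
row_to_image = np.array([0, 0, 1, 1, 2, 2])
indices = np.array([[0, 2, 4], [1, 3, 5]])
dists = np.array([[0.0, 0.1, 0.9], [0.0, 0.5, 0.6]])

with_self = rank_images(indices, dists, row_to_image)
without_self = rank_images(indices, dists, row_to_image, exclude_image=0)
print(with_self, without_self)
```

In the toy data, the query image takes every vote until it is excluded, after which image 1 picks up a vote. Note that if several indexed images show the same object, the best and second-best neighbours may both be correct matches, so a strict ratio test can still reject them; counting raw nearest-neighbour votes or loosening the ratio might be worth trying.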