Find image from a database of images

asked 2017-09-28 17:31:01 -0500

Hello everyone, I'm taking my first steps with OpenCV in Python.
What I'd like to do is, given an image, find its "original" in a collection of reference images. To be clear, the query image is a simple photo of the whole reference image (a card), so this is not a "find an object inside a photo" scenario but "just" a similarity test.
My final database will be pretty large (about 25,000 images), but I started by testing on a smaller scale (only 270 images).
Recognition works perfectly, but it's pretty slow: it takes 8 seconds to iterate over all 270 images. I was able to speed things up by saving the descriptors to disk and loading them instead of recomputing them, but it's still slow.
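The question doesn't show how the descriptors were cached. A minimal sketch, assuming NumPy's .npz format is acceptable: stack every image's descriptors into one array and record where each image's rows start. `save_descriptor_cache` and `load_descriptor_cache` are hypothetical helper names, and the random arrays only stand in for real SURF output (64 floats per keypoint by default).

```python
import os
import tempfile

import numpy as np


def save_descriptor_cache(path, names, descriptor_list):
    """Save all images' descriptors to one .npz file.

    descriptor_list[i] is an (n_i, d) array for image names[i]. Images with
    no keypoints (detectAndCompute returns None) must be filtered out or
    replaced by an empty (0, d) array first.
    """
    counts = np.array([len(d) for d in descriptor_list], dtype=np.int64)
    # offsets[i]:offsets[i + 1] is the row range belonging to image i
    offsets = np.concatenate(([0], np.cumsum(counts)))
    stacked = np.vstack(descriptor_list).astype(np.float32)
    np.savez(path, names=np.array(names), descriptors=stacked, offsets=offsets)


def load_descriptor_cache(path):
    """Return (names, stacked descriptors, offsets) written by save_descriptor_cache."""
    with np.load(path) as data:
        names = [str(n) for n in data["names"]]
        return names, data["descriptors"], data["offsets"]


# Usage with random stand-ins for three images' SURF descriptors.
rng = np.random.default_rng(0)
names = ["a.png", "b.png", "c.png"]
descs = [rng.random((n, 64), dtype=np.float32) for n in (5, 3, 7)]
path = os.path.join(tempfile.mkdtemp(), "descriptors.npz")
save_descriptor_cache(path, names, descs)

loaded_names, stacked, offsets = load_descriptor_cache(path)
print(loaded_names, stacked.shape, offsets.tolist())
```

Because `offsets` keeps the image boundaries, the stacked array can be handed to a FLANN index as a whole while every row can still be traced back to an entry in `names`.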

So I started working with FLANN: I get some results, but my main problem is finding the matching image. I get a whole array of descriptor indices, but I don't know how to get from them back to the right image.
This is my code:

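import os

import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1  # FLANN's id for the randomized KD-tree index

scanned = 'tests/temp_bw.png'
surf = cv2.xfeatures2d.SURF_create(400)  # SURF lives in the opencv-contrib xfeatures2d module

img1 = cv2.imread(scanned, 0)  # 0 = load as grayscale
kp1, des1 = surf.detectAndCompute(img1, None)

index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)

# Stack the descriptors of every reference image into one training matrix
des_all = None
for filename in os.listdir('images'):
    img2 = cv2.imread(os.path.join('images', filename), 0)
    kp2, des2 = surf.detectAndCompute(img2, None)
    if des_all is None:
        des_all = des2
    else:
        des_all = np.concatenate((des_all, des2))

print("Training...")
flann = cv2.flann.Index()
flann.build(des_all, index_params)
print("Matching...")
# indices: rows of des_all nearest to each query descriptor; dists: their distances
indices, dists = flann.knnSearch(des1, 10)
# and now???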

Any suggestions on how I can map these results back to the most similar image?

berak (2017-09-28 21:52:10 -0500)

I think I understand what you mean. Just to be sure, though: I want to know whether the query image is very similar to a specific training image X. In other words, I want to know that the query image is the Lord of the Rings book, not just that it is a book rather than a hammer (all the images will be books). Should I create a category for each entry in my training set?

tampe125 (2017-09-29 01:15:31 -0500)

your flann index does "unsupervised" nearest-neighbour search (no labels involved), so it does not know about categories or class labels

(if you need that, use KNearest or an SVM instead)

berak (2017-09-29 01:41:59 -0500)

also, your current code matches descriptors, not images: the way you do it now, you lose the information about which image each train descriptor originally belonged to

berak (2017-09-29 04:12:28 -0500)
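One way to keep that information, sketched under the assumption that the training descriptors are stacked into one array in the same order as the list of filenames: build a parallel `owner` array holding the image index of every row. The first array returned by knnSearch then maps back to images through `owner`, and counting votes per image picks the most likely reference. `vote_for_images` is a hypothetical helper, and the neighbour indices below are made up for illustration.

```python
import numpy as np


def vote_for_images(knn_indices, owner, n_images):
    """Count, per reference image, how many neighbours of the query fell in it.

    knn_indices: (n_query, k) row indices into the stacked training descriptors,
                 as returned by a k-nearest-neighbour search.
    owner:       owner[j] is the index of the image training row j came from.
    Returns one vote count per reference image.
    """
    matched_images = owner[np.asarray(knn_indices).ravel()]
    return np.bincount(matched_images, minlength=n_images)


# Build `owner` while stacking: image i contributes counts[i] descriptor rows.
counts = [5, 3, 7]  # descriptors per reference image (illustrative)
owner = np.repeat(np.arange(len(counts)), counts)

# Stand-in for the indices from knnSearch: 4 query descriptors, k = 2.
knn_indices = np.array([[9, 10], [11, 2], [12, 13], [5, 0]])
votes = vote_for_images(knn_indices, owner, len(counts))
best = int(np.argmax(votes))
print(votes.tolist(), best)  # [2, 1, 5] 2
```

Counting all k neighbours is the simplest choice; keeping only the first column (the single nearest neighbour of each query descriptor) is a stricter variant.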

given the size of the database, what would be the best approach? Try BOW with KNearest, or keep track of the descriptor-to-image relationship? I'm afraid the latter will require a huge amount of disk space

tampe125 (2017-09-29 05:14:43 -0500)
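A back-of-the-envelope estimate helps with the disk-space worry. The keypoint count per image is an assumption (it depends on the images and on the hessianThreshold of 400), as is the 1000-word vocabulary used for the BOW comparison; SURF's default descriptor length is 64 floats.

```python
# Rough storage estimate; the per-image keypoint count and vocabulary size are assumptions.
n_images = 25_000
keypoints_per_image = 500   # assumed average; varies with image content and hessianThreshold
surf_dims = 64              # SURF descriptor length with extended=False (the default)
bytes_per_value = 4         # float32 descriptors, int32 image ids, float32 histograms
vocab_size = 1000           # assumed BOW vocabulary size

n_descriptors = n_images * keypoints_per_image
descriptor_bytes = n_descriptors * surf_dims * bytes_per_value
owner_bytes = n_descriptors * bytes_per_value         # one image id per descriptor
bow_bytes = n_images * vocab_size * bytes_per_value   # one histogram per image

print(f"descriptors: {descriptor_bytes / 1e9:.1f} GB")  # descriptors: 3.2 GB
print(f"owner array: {owner_bytes / 1e6:.1f} MB")      # owner array: 50.0 MB
print(f"bow vectors: {bow_bytes / 1e6:.1f} MB")        # bow vectors: 100.0 MB
```

Under these assumptions the raw descriptors dominate (and a descriptor-level FLANN index needs them anyway), while the descriptor-to-image mapping itself adds comparatively little. BOW vectors have a fixed size per image, independent of the keypoint count.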

the bow idea would at least solve one problem: you compute one fixed-size bow feature vector per image (which is also what any kind of ml needs)

if you feed those vectors to a flann index, you get image indices (not feature indices)

you will still have to make up your mind: if you need categories, then you need something that does classification

berak (2017-09-29 08:34:26 -0500)
berak (2017-09-30 01:31:56 -0500)
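berak's bow suggestion can be sketched in plain NumPy to show the data flow: cluster all training descriptors into a small vocabulary, turn each image into one normalised histogram of visual words, then find the nearest histogram. This is a self-contained illustration, not OpenCV's implementation. In OpenCV, cv2.BOWKMeansTrainer and cv2.BOWImgDescriptorExtractor cover the first two steps, and the brute-force search at the end is what a FLANN index over the histogram rows would approximate. The synthetic 8-dimensional descriptors, the 6-word vocabulary and the evenly spaced k-means initialisation are assumptions chosen to keep the example small and deterministic.

```python
import numpy as np


def kmeans(data, k, iters=20):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centres (the vocabulary)."""
    # Deterministic start: k evenly spaced rows of data (an assumption for reproducibility).
    centres = data[np.linspace(0, len(data) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign every descriptor to its nearest centre (squared Euclidean distance).
        d2 = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = data[labels == c]
            if len(members):  # keep the old centre if a cluster becomes empty
                centres[c] = members.mean(axis=0)
    return centres


def bow_vector(descriptors, vocabulary):
    """Map one image's descriptors to a fixed-size, L1-normalised word histogram."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float32)
    return hist / hist.sum()


# Synthetic descriptor sets for three reference images (stand-ins for SURF output).
rng = np.random.default_rng(1)
train_descs = [rng.normal(loc=i * 20.0, size=(40, 8)) for i in range(3)]

vocabulary = kmeans(np.vstack(train_descs), k=6)
bow_db = np.array([bow_vector(d, vocabulary) for d in train_descs])  # one row per image

# Query: a slightly noisy copy of reference image 1's descriptors.
query = train_descs[1] + rng.normal(scale=0.05, size=train_descs[1].shape)
q = bow_vector(query, vocabulary)

# The row index of the nearest histogram is directly an image index.
best = int(((bow_db - q) ** 2).sum(axis=1).argmin())
print(best)  # 1
```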