# LSH matching very slow

Hey guys and girls,

I'm testing a recognition pipeline using ORB descriptors. I recently changed the matcher from bruteforce to flann, using the new LSH index. Unfortunately the LSH matching is about 5-6 times slower than bruteforce using about 1000 descriptors as a database. Has anybody experienced similar problems? Is the LSH code still under development?

Current git commit hash: 72a4f19.

Part of the code:

  boost::shared_ptr<cv::FlannBasedMatcher> m_matcherPtr;
cv::Ptr<cv::flann::IndexParams> indexParams = new cv::flann::LshIndexParams(12, 20, 2);
m_matcherPtr = boost::shared_ptr<cv::FlannBasedMatcher>(new cv::FlannBasedMatcher(indexParams));
m_matcherPtr->train();

// This step takes very long
m_matcherPtr->knnMatch(*query, knnmatches, 1);

edit retag close merge delete

Sort by » oldest newest most voted

1000 descriptors are too few.
I have tried both BF and LSH with about 1'000'000 descriptors and time were something like:

BruteForce: 50sec (knnMatch)
LSH: 2sec         (knnMatch)


I have tuned a bit the LSH param to get that speed, anyway with default param and 1mil descriptors the knnMatch took 4 sec

more

Are you saying that LSH with 1000 descriptors is slower than with 1.000.000? How long does your code take for LSH with 1000 descriptors?

( 2012-08-21 05:54:26 -0500 )edit

@rui: i haven't tried it with 1000 descriptors. it's not that usefull to use LSH with so few descriptors

( 2012-08-21 09:01:57 -0500 )edit

Maybe using LSH is not that much better than BF in this case, but i think it should be at least near the performance of the BF matching for small datasets. With the 1000 descriptors i used, LSH needs about 30-40 seconds. Using 1000 times the number of descriptors wont speed up the matching magically. So i guess there has to be some problem with my code or OpenCV. Does your code look similar to the one i use? Could you post it?

( 2012-08-27 10:01:59 -0500 )edit
2

Hi, I implemented the LSH in FLANN/OpenCV and I would concur with yes123: LSH will not give you a speedup for 1000 descriptors and probably for 10000 too. You will first need to change the parameters a lot but even then, there will be so many cache misses (as you have to go in a bunch of random buckets around) that LSH will be slower in that case.

Think about the extreme case of 1 element. With LSH, you need to compute several hash values, get the different bucks located somewhere in RAM, go over them to find the nearest neighbors and compute their Hamming distance. With your scale, most of those would be empty but the algorithm is still way more complex than a few clock ticks.

( 2012-08-27 14:35:30 -0500 )edit

I fixed it. I played a bit with the parameters of LSH and changed them to LshIndexParams(8, 30, 2). Now i'm down to 0.4 seconds for 2000 descriptors, which is fast enough for my problem.

Thanks to everyone!

more

In my case, (30, 8, 2) works fine

( 2013-04-07 17:07:52 -0500 )edit

I had the same issue with LSH descriptor matching. I have played around quite a bit with the parameters. I found that the main factor slowing down the process is the use of the multi-probe (third parameter). The multi-probe LSH is a special case of LSH which is meant to improve matching accuracy at the cost of a much higher matching time. The simple LSH (without multi-probe) is much faster (about 10x) and for a low number of descriptors (<10000) the drop in accuracy is not that significant. For 1000x1000 descriptor matching, I get a processing time of about 2 ms on a powerful laptop (about 4.5 ms for brute-force matching). To disable multi-probe, just set the multi-probe parameter (thrid parameter of the LSHIndexParams constructor) to 1.

Lastly, when you call the macthing methods, they always call the train method first. A simple workaround is to derive the FlannBasedMatcher class to make the xxxMatchImpl methods public. But then, it is up to you to make sure that the LSH search index has been trained before you invoke a matching method.

more

Official site

GitHub

Wiki

Documentation