SURF performance

asked 2014-07-28 15:22:45 -0500 by AEA

Hello, I'm using SurfFeatureDetector along with SurfDescriptorExtractor and FlannBasedMatcher to align images. I tried to replace SurfFeatureDetector with gpu::SURF_GPU.

I was expecting a performance improvement, but the processing time stays almost the same (within 5%). My GPU is a Quadro K2000 2GB and my CPU is a Xeon E5-1650 v2 @ 3.5 GHz.

I see the same situation (almost no speed improvement) with remap vs. gpu::remap. I'm not counting the first run of a GPU function in the timing.

Is there a way to make GPU processing faster?

I'm using OpenCV 2.4.9 compiled with VS2012 and CUDA 6.0.


Comments

How large is your input data?

StevenPuttemans ( 2014-07-28 15:50:02 -0500 )

My input data is ~6k x 7k pixels (~40 MPx). There is no difference in performance even for a 512 x 512 image.

AEA ( 2014-07-29 08:50:03 -0500 )

Then you are running into pipeline trouble. The main downside of a GPU is the cost of transferring memory between the CPU and the GPU; this is a known bottleneck in GPU architectures. If a 512 x 512 image takes the same time, the GPU's processing itself is likely fast enough, and the bottleneck is transferring the data back to CPU memory. I only know this problem conceptually and haven't run into it myself; you may need someone with a better understanding of GPU architecture to solve it.

StevenPuttemans ( 2014-07-30 03:36:13 -0500 )

Thank you very much. I thought OpenCV optimized the pipelining itself; now I will manage it myself. Thanks.

AEA ( 2014-07-30 09:57:04 -0500 )

OpenCV should optimize the pipelining itself, but I have experienced similar results, where a latent SVM classifier runs as fast on the GPU as on the CPU for a 500 x 500 pixel image. However, I find it strange that such a large image doesn't run faster on the GPU. What could happen is that your GPU memory is limited and your image can't be held in GPU memory all at once; then you pass through that transfer bottleneck multiple times.

StevenPuttemans ( 2014-07-31 02:04:24 -0500 )