SVM Predict Slow
Hello all,
I am using an SVM in (what should be) a 30 Hz application on Ubuntu 10.04 with OpenCV 2.4. I need to do approximately 500 classifications per frame. Even with a linear SVM, predicting all 500 samples is very slow for me. Profiling shows that cv::SVM takes 75% of all computation time in my program, which runs at only 17 FPS. However, only about 1.5 of my 4 CPU cores are being utilised (top reports 160%). By comparison, programs like GNU Parallel reach ~370% CPU utilisation.
My problem is that multithreading the SVM prediction does not give a performance boost. I have tried both the SVM predict API that uses cv::parallel_for and the API that does not. Using 1 thread gives around 14 FPS, using 2 threads gives around 17 FPS, and using more still gives ~17 FPS.
My question: Why is the SVM prediction slow while still using less than half of my CPU cores? Why does manually splitting the predictions between 2 threads give only a small speedup? Is the OpenCV SVM just not very fast, and should I use another implementation?
Many thanks
I should add that this situation hasn't changed in version 2.4.3, despite the promise of better multicore performance. Compiling with TBB on actually gave worse performance than with it off.
Post your multithreading code. I think you should copy the SVM data structure to each thread.
I have a similar issue here. I'm running a sliding window across an image, and while my feature extraction takes only around 900 ms, passing the features to the SVM slows it down to almost 40 s. Using TBB's parallel_for to break up my loop made it worse: the whole thing slowed down to 50 s (with either manual or automatic chunking).
I have a separate SVM data structure copy for each thread. Instantiation code:
Classification code:
How did it improve performance?
I'm afraid that doesn't make sense. Performance hasn't improved, but I haven't made any changes.
Dumb question, but if you're using the SVM as a linear predictor, can't you just extract the weight vector and bias from it and do the calculation yourself? That would be much faster than going through all your support vectors and computing a dot product with each of them for each of your 500 samples.