SVM Predict Slow

asked 2012-09-04 10:59:35 -0500

Arrakis

Hello all,

I am using an SVM in (what should be) a 30 Hz application on Ubuntu 10.4 with OpenCV 2.4. I need to do approximately 500 classifications per frame. Even with a linear SVM, predicting all 500 samples is very slow for me. Profiling shows that cv::SVM takes 75% of all computation time in my program, which only runs at 17 FPS. However, only about 1.5 cores of my 4-core CPU are being utilised (running top gives 160%). By comparison, programs like GNU Parallel manage to use ~370% of my CPU.

My problem is that multithreading the SVM prediction does not give a performance boost. I have tried both the SVM predict API that uses cv::parallel_for and the API which does not. Using 1 thread gives around 14 FPS, using 2 threads gives around 17 FPS, and using more still gives ~17 FPS.

My question: Why is the SVM prediction slow while using less than half of my CPU cores? Why does manually multithreading, with the predictions split in half between 2 threads, give only a small speedup? Is the OpenCV SVM just not very fast, and should I use another implementation?

Many thanks



I should add that this situation hasn't changed in version 2.4.3, despite the promise of better multicore performance. Compiling with TBB on actually gave worse performance than with it off.

Arrakis ( 2013-01-23 08:35:17 -0500 )

Post your multithreading code. I think you should copy the SVM data structure to each thread.

mrgloom ( 2013-04-16 05:03:36 -0500 )

I have similar issues here. I'm running a sliding window across an image, and while my feature extraction takes only around 900 ms, passing the features to the SVM slows the whole thing down to almost 40 seconds. Using TBB's parallel_for to break up my loop made it worse: the whole thing slowed down to 50 seconds (with either manual or automatic chunking).

sub_o ( 2013-04-18 02:10:39 -0500 )

I have a separate SVM data structure copy for each thread. Instantiation code:

static std::vector<cv::SVM> shared_svm;

Classification code:

static void processThread(const std::size_t i)
{
    // Each thread works on its own copy of the SVM.
    const cv::SVM &svm = shared_svm[i];
    const CvMat predictor_old = ...
    // Output matrix: one response per row.
    CvMat *responses_old_ptr = cvCreateMat(rows, 1, CV_32FC1);
    svm.predict(&predictor_old, responses_old_ptr);
}

Arrakis ( 2013-04-27 12:34:43 -0500 )

How did it improve performance?

mrgloom ( 2013-04-30 04:55:51 -0500 )

I'm afraid that doesn't make sense. Performance hasn't improved, but I haven't made any changes.

Arrakis ( 2013-05-04 10:27:02 -0500 )

Dumb question, but if you're using the SVM as a linear predictor, can't you just get the weight vector and bias out of it and do your own calculation with them? That would be much faster than going through all your support vectors and computing a dot product with each of them for each of your 500 samples.
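
A minimal sketch of that idea in plain C++, assuming a linear kernel and assuming you have already pulled the support vectors, their signed coefficients and the bias out of the trained model. That extraction step is not shown here, and `LinearModel`, `collapseLinearSvm` and `decisionValue` are made-up names for illustration, not OpenCV API. Which side of zero maps to which class label depends on the library's sign convention, so check it against a few known predictions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a collapsed linear model: f(x) = w . x - rho.
struct LinearModel {
    std::vector<float> w;  // weight vector
    float rho;             // bias (offset) term
};

// Collapse a linear-kernel SVM into a single weight vector:
//   w = sum_i alpha[i] * sv[i]
// `alpha` is assumed to hold the signed coefficients (alpha_i * y_i).
LinearModel collapseLinearSvm(const std::vector<std::vector<float> > &sv,
                              const std::vector<float> &alpha,
                              float rho)
{
    assert(!sv.empty() && sv.size() == alpha.size());
    LinearModel m;
    m.w.assign(sv[0].size(), 0.0f);
    m.rho = rho;
    for (std::size_t i = 0; i < sv.size(); ++i) {
        assert(sv[i].size() == m.w.size());
        for (std::size_t j = 0; j < m.w.size(); ++j)
            m.w[j] += alpha[i] * sv[i][j];
    }
    return m;
}

// Decision value for one sample: a single dot product, so the cost no
// longer depends on the number of support vectors.
float decisionValue(const LinearModel &m, const std::vector<float> &x)
{
    assert(x.size() == m.w.size());
    float sum = -m.rho;
    for (std::size_t j = 0; j < x.size(); ++j)
        sum += m.w[j] * x[j];
    return sum;
}
```

You would call `collapseLinearSvm` once after training and then `decisionValue` for each of the 500 samples per frame. Since each call only reads the model, the per-sample loop should also be easy to split across threads.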

B4silio ( 2016-08-16 04:03:39 -0500 )