Software running significantly slower with OpenCL (OpenCV 3.0.0)

I've recently moved to OpenCV 3.0.0 and I'm looking into OpenCL's potential to optimize my software. To that end, I wrote a simple face detection program that detects and matches faces between frames. It uses an XML cascade classifier I found online with the detectMultiScale method, and a Hungarian algorithm to match the rectangles between frames.

Code:

    _timer.start();
    while (cv::waitKey(1) != 27)
    {
        // Get frame from webcam
        _cam.getFrame(_frame);

        // Resize from 640x480 to 320x240
        if (_width != _frame.cols)
        {
            cv::resize(_frame, _frame, cv::Size(_width, _height));
        }

        // Calls detectMultiScale and the matching algorithm
        _faceDetector.update(_frame);
        _faceDetector.getFaceVector(_faceVector);

        // Display image and face rectangles
        for (unsigned int n = 0; n < _faceVector.size(); n++)
        {
            if (_faceVector[n]->getConfidenceFactor() > 5)
            {
                std::stringstream ss, ss2;
                ss << _faceVector[n]->getID();
                ss2 << _faceVector[n]->getConfidenceFactor();
                cv::putText(_frame, ss.str(), cv::Point(_faceVector[n]->getBoundingBox().x, _faceVector[n]->getBoundingBox().y), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(255, 0, 255), 2, 8, false);
                cv::putText(_frame, ss2.str(), cv::Point(_faceVector[n]->getBoundingBox().x, _faceVector[n]->getBoundingBox().y + _faceVector[n]->getBoundingBox().height), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(255, 255, 0), 2, 8, false);
                cv::rectangle(_frame, _faceVector[n]->getBoundingBox(), CV_RGB(0, 255, 0), 1, 8, 0);
            }
        }

        // Calculate frames per second
        _counterFps++;
        if (_timer.getElapsedTime() >= 1)
        {
            _timer.reset();
            std::cout << "FPS: " << _counterFps << std::endl;
            _counterFps = 0;
        }
    }

I turn OpenCL on and off with the following call at system initialization:

    cv::ocl::setUseOpenCL(bool)
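
In case it matters, this is roughly how the detection could be wired up to the OpenCV 3.0 transparent API (a minimal sketch only; the faceCascade name and the cascade file path are placeholders, not my actual code):

    #include <opencv2/core/ocl.hpp>
    #include <opencv2/objdetect.hpp>
    #include <opencv2/imgproc.hpp>

    // Enable OpenCL dispatch only if a device is actually available
    if (cv::ocl::haveOpenCL())
        cv::ocl::setUseOpenCL(true);   // pass false to force the CPU path

    cv::CascadeClassifier faceCascade;
    faceCascade.load("haarcascade_frontalface_alt.xml");  // placeholder path

    // The transparent API only dispatches to OpenCL when cv::UMat is passed;
    // with a plain cv::Mat the CPU implementation is used.
    cv::UMat uframe, ugray;
    _frame.copyTo(uframe);                                 // upload once per frame
    cv::cvtColor(uframe, ugray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(ugray, ugray);

    std::vector<cv::Rect> faces;
    faceCascade.detectMultiScale(ugray, faces, 1.1, 3, 0, cv::Size(30, 30));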

System characteristics:

Processor: Intel Core 2 Duo

GPU: NVIDIA GeForce GT 220

Operating System: Linux (Gentoo)

Results:

If I set useOpenCL to false, this code runs at 30 fps and uses about 98% of the CPU (according to top). If I set useOpenCL to true, it runs at 24 fps and uses about 198% of the CPU.

EDIT1: Let me clarify that this difference isn't noticeable until detectMultiScale is called. Capturing the frame from the webcam and resizing it cost the same in both cases, and the visualization block also costs the same with or without OpenCL. I used a timer to verify that detectMultiScale itself takes longer with the OpenCL flag enabled than with it disabled.
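
To illustrate what I mean by timing the call, a minimal sketch using cv::getTickCount() (my actual code uses its own timer class):

    // Time a single update (which internally calls detectMultiScale)
    double t0 = (double)cv::getTickCount();
    _faceDetector.update(_frame);
    double ms = ((double)cv::getTickCount() - t0) * 1000.0 / cv::getTickFrequency();
    std::cout << "detectMultiScale: " << ms << " ms" << std::endl;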

I also monitor the GPU temperature, and it rises significantly when OpenCL is in use, which confirms that the GPU is actually doing work. If that is the case, how can the CPU be working twice as hard? It doesn't make sense to me.
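
For completeness, a minimal sketch of how the selected OpenCL device could be printed from inside the program (assuming the standard cv::ocl module):

    #include <opencv2/core/ocl.hpp>

    if (cv::ocl::haveOpenCL() && cv::ocl::useOpenCL())
    {
        cv::ocl::Device dev = cv::ocl::Device::getDefault();
        std::cout << "OpenCL device: " << dev.name()
                  << " (" << dev.vendorName() << ")" << std::endl;
    }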

EDIT2: I tried to test this with other algorithms, such as Sobel (which is documented to be up to 32x faster with OpenCL here), but this time there is no change in CPU usage at all: it behaves the same with or without OpenCL. I also used a timer to verify that computing the edges takes the same time either way.
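
Roughly, the kind of comparison I mean (a minimal sketch; the input image and Sobel parameters are placeholders, not exactly what I ran):

    // CPU path: plain cv::Mat keeps everything on the CPU
    cv::Mat gray, edgesCpu;
    cv::cvtColor(_frame, gray, cv::COLOR_BGR2GRAY);
    cv::Sobel(gray, edgesCpu, CV_16S, 1, 0);

    // OpenCL path: cv::UMat lets the transparent API dispatch to the GPU
    cv::UMat ugray, edgesGpu;
    gray.copyTo(ugray);
    cv::Sobel(ugray, edgesGpu, CV_16S, 1, 0);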

I need help interpreting these results, since I was really hoping for a good performance boost.

Best regards.
