OpenCL in OpenCV 3.0.0

asked 2014-05-26 23:07:05 -0500

chaanakya

updated 2014-06-14 17:37:06 -0500

I'm attempting to use OpenCV 3.0.0 (yes, I know it's a development version) with OpenCL. I'm using the UMat structure instead of the ocl::oclMat structure found in earlier versions. As expected, those matrices are created on the GPU side. However, when I run a function such as GaussianBlur on those matrices, things slow to a crawl. Earlier, this would have been solved by using ocl::GaussianBlur, but that no longer exists. How is one supposed to achieve this in OpenCV 3.0?


Now that I have ostensibly enabled using OpenCL for dealing with UMats, things are still slowing down to a crawl. Here is the code that I am currently using to test this out:

#include "opencv2/opencv.hpp"
#include "opencv2/core/ocl.hpp"
#include <iostream>

using namespace cv;
using namespace std;

int main(int argc, char** argv)
{
  Mat gpuFrame;
  UMat gpuBW;
  UMat gpuBlur;
  UMat gpuEdges;
  VideoCapture cap(0); // open the default camera
  if(!cap.isOpened())  // check if we succeeded
    return -1;
  for(;;)              // process frames until a key is pressed
  {
      cap >> gpuFrame; // get a new frame from camera
      cvtColor(gpuFrame, gpuBW, COLOR_BGR2GRAY);
      GaussianBlur(gpuBW, gpuBlur, Size(1,1), 1.5, 1.5);
      Canny(gpuBlur, gpuEdges, 0, 30, 3);
      imshow("edges", gpuEdges);
      if(waitKey(30) >= 0) break;
  }
  // the camera will be deinitialized automatically in VideoCapture destructor
  return 0;
}


Changing the gpuFrame to be a regular matrix seems to have solved the issue. Thank you! :)

Edit 3

I seem to have spoken too soon --- changing gpuFrame to a regular Mat object fixed everything because everything then became CPU computations! Why is it that I cannot do multiple computations using OpenCL and not have the GPU freeze up? In my dmesg, it says the following whenever I run my program:

[670017.262677] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[670025.273070] [drm] stuck on render ring
[670025.273999] [drm] GPU HANG: ecode 0:0x8fd8ffff, in VideoCapture [26945], reason: Ring hung, action: reset
[670027.274692] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[670031.265908] [drm] stuck on render ring
[670031.266856] [drm] GPU HANG: ecode 0:0x8fd8ffff, in VideoCapture [26945], reason: Ring hung, action: reset
[670031.266984] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
[670033.267655] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

This seems to suggest that my loop is going too quickly. But the GPU should be quicker at computations (especially matrix computations) than a CPU, right? So what's going on?

Thank you very much!




3 answers


answered 2014-05-27 06:30:36 -0500

In OpenCV 3.0-dev, the user can control whether UMat operations use OpenCL by calling cv::ocl::setUseOpenCL().

cv::ocl::setUseOpenCL(true); // enable OpenCL in the processing of UMat
cv::ocl::setUseOpenCL(false); // disable OpenCL in the processing of UMat

You also need to include the following header to use this function.

#include <opencv2/core/ocl.hpp>


Things are still slowing down to a crawl - please see my revised question with code.

chaanakya (2014-06-14 17:13:11 -0500)

answered 2014-06-16 08:28:07 -0500

updated 2014-06-16 08:33:51 -0500

I checked this in my environment. The details are as follows.

  • Windows 8.1
  • Visual Studio 2012 Update4
  • NVIDIA GeForce GTX 680
  • CUDA 6.0

I don't have a web camera, so I changed the source from camera input to a video file. In my environment, UMat with OpenCL enabled is faster than UMat with OpenCL disabled.

[670017.262677] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[670025.273070] [drm] stuck on render ring

I think that your environment is Linux, and this looks like an issue particular to Linux. If the performance improves when you use a video file as input, it may be an issue with capture on Linux.
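A comparison like the OpenCL-enabled versus OpenCL-disabled one above is easier to reproduce with a small timing helper. The following is only a sketch: `meanMillis` is a hypothetical name, not an OpenCV function, and it is kept free of OpenCV so it stands alone. It assumes the callable wraps one pass of the pipeline under test (for example cvtColor, GaussianBlur and Canny on a frame read from a file). The warm-up calls are untimed because the first OpenCL call can include one-time costs such as kernel compilation.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV): mean wall-clock milliseconds
// per call of `work`. `warmup` untimed calls run first so that one-time
// setup costs are excluded from the measurement.
double meanMillis(const std::function<void()>& work, int warmup, int iters)
{
    assert(warmup >= 0 && iters > 0);

    for (int i = 0; i < warmup; ++i)
        work();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();

    // Convert the elapsed interval to fractional milliseconds.
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iters;
}
```

One way to use it would be to call it once after cv::ocl::setUseOpenCL(true) and once after cv::ocl::setUseOpenCL(false) with the same callable. Because GPU work may be queued asynchronously, the callable should probably also bring the result back to the host (for example by copying the final UMat into a Mat) so that the queued work is included in the timing.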



Have you tried using only OpenCL (no CUDA)? I'm having the same issue with your example...

chaanakya (2014-06-16 15:21:31 -0500)

I tried using only OpenCL (with CUDA). If possible, could you please tell me the result of this program?

dandelion1124 (2014-06-16 16:55:15 -0500)

I don't have CUDA, so that program will not work. I have an Intel GPU and I'm using Beignet, which is Intel's implementation of OpenCL. Can you please try your (first) program without CUDA and see how it works? NVIDIA should also have an implementation of OpenCL, so you should be able to use it.

chaanakya (2014-06-16 18:18:36 -0500)

In my environment, the GPU is the only OpenCL platform I can select, so I cannot run my program without CUDA. Beignet's web page has the following note, which might be helpful for you.

  • Note about OpenCV support

I think you could also use the Intel OpenCL SDK as another option. By the way, you can get OpenCV's build information by calling getBuildInformation(). This information is very useful when reporting your environment.

dandelion1124 (2014-06-17 06:03:05 -0500)

Which platform do you use: IVB or HSW? If it is HSW, you need to apply a kernel patch to enable SLM and barrier support; you can find details in the README. If it is IVB, you can first try disabling the hang check. As dandelion1124 pointed out, you can find detailed instructions on Beignet's web page or in the README file.

gongzg (2014-07-16 10:35:50 -0500)

I don't see OCL anywhere under modules in the master branch. I used the master branch to build 3.0.0, but OCL is missing. How did you build it and get OCL working?

hesh (2014-07-23 17:12:12 -0500)

answered 2015-07-26 03:46:55 -0500

Anna Lucia

Have you tested other filter algorithms? Are they also slower? In my project, GaussianBlur is slower on the GPU than on the CPU, but the other algorithms are faster on the GPU. Maybe GaussianBlur is better suited to the CPU because of how it works in theory.
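One possible reading of that last point, offered as an assumption rather than a measured result: a Gaussian blur is separable, so a k x k blur can be applied as two 1-D passes of k taps each. That is relatively little arithmetic per pixel, so for camera-sized frames the fixed costs of using the GPU (kernel launches, host-device transfers) could plausibly outweigh the savings. The sketch below computes the normalized 1-D weights such a pass would use; `gaussianWeights` is a hypothetical helper, not OpenCV's API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of OpenCV): normalized 1-D Gaussian
// weights for an odd number of taps and a positive sigma.
std::vector<double> gaussianWeights(int taps, double sigma)
{
    assert(taps > 0 && taps % 2 == 1 && sigma > 0.0);

    std::vector<double> w(taps);
    const int center = taps / 2;
    double sum = 0.0;
    for (int i = 0; i < taps; ++i) {
        const double x = i - center;   // signed distance from the center tap
        w[i] = std::exp(-(x * x) / (2.0 * sigma * sigma));
        sum += w[i];
    }
    for (double& v : w)
        v /= sum;                      // weights sum to 1, preserving brightness
    return w;
}
```

Note that with a single tap, as in the Size(1,1) passed to GaussianBlur in the question, the only weight is 1.0, so the filter is the identity and that call would not be expected to smooth anything regardless of sigma.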

