OpenCL in OpenCV 3.0.0

asked May 27 '14

chaanakya

updated Jun 14 '14

I'm attempting to use OpenCV 3.0.0 (yes, I know it's a development version) to work with OpenCL. I'm using the UMat structure instead of the ocl::oclMat structure found in earlier versions. As expected, those matrices are getting created on the GPU side. However, when I attempt to run GaussianBlur for example on those matrices, things slow down to a crawl. Earlier, this would have been solved by using ocl::GaussianBlur, but that does not exist anymore. How is one supposed to achieve this in OpenCV 3.0?

EDIT

Now that I have ostensibly enabled using OpenCL for dealing with UMats, things are still slowing down to a crawl. Here is the code that I am currently using to test this out:

#include "opencv2/opencv.hpp"
#include "opencv2/core/ocl.hpp"
#include <iostream>

using namespace cv;
using namespace std;

int main(int argc, char** argv)
{
  ocl::setUseOpenCL(true);
  Mat gpuFrame;
  UMat gpuBW;
  UMat gpuBlur;
  UMat gpuEdges;
  VideoCapture cap(0); // open the default camera
  if(!cap.isOpened())  // check if we succeeded
    return -1;
  namedWindow("edges",1);
  for(;;)
    {
      cap >> gpuFrame; // get a new frame from camera
      cvtColor(gpuFrame, gpuBW, COLOR_BGR2GRAY);
      GaussianBlur(gpuBW, gpuBlur, Size(1,1), 1.5, 1.5);
      Canny(gpuBlur, gpuEdges, 0, 30, 3);
      imshow("edges", gpuEdges);
      if(waitKey(30) >= 0) break;
    }
  // the camera will be deinitialized automatically in VideoCapture destructor
  return 0;
}

EDIT 2

Changing the gpuFrame to be a regular matrix seems to have solved the issue. Thank you! :)

Edit 3

I seem to have spoken too soon --- changing gpuFrame to a regular Mat object fixed everything because everything then became CPU computations! Why is it that I cannot do multiple computations using OpenCL and not have the GPU freeze up? In my dmesg, it says the following whenever I run my program:

[670017.262677] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[670025.273070] [drm] stuck on render ring
[670025.273999] [drm] GPU HANG: ecode 0:0x8fd8ffff, in VideoCapture [26945], reason: Ring hung, action: reset
[670027.274692] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[670031.265908] [drm] stuck on render ring
[670031.266856] [drm] GPU HANG: ecode 0:0x8fd8ffff, in VideoCapture [26945], reason: Ring hung, action: reset
[670031.266984] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
[670033.267655] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

This seems to suggest that my loop is going too quickly. But the GPU should be quicker at computations (especially matrix computations) than a CPU, right? So what's going on?

Thank you very much!

Sincerely,

Chaanakya

3 answers

answered May 27 '14

In OpenCV 3.0-dev, you can control whether UMat operations use OpenCL by calling cv::ocl::setUseOpenCL().

cv::ocl::setUseOpenCL(true); // enable OpenCL in the processing of UMat
cv::ocl::setUseOpenCL(false); // disable OpenCL in the processing of UMat

You need to include the following header to use this function.

#include <opencv2/core/ocl.hpp>

Comments

Things are still slowing down to a crawl - please see my revised question with code.

chaanakya (Jun 14 '14)

answered Jun 16 '14

updated Jun 16 '14

I checked this in my environment. The details are as follows.

  • Windows 8.1
  • Visual Studio 2012 Update4
  • NVIDIA GeForce GTX 680
  • CUDA 6.0

However, I don't have a web camera, so I changed the input from the camera to a video file. In my environment, UMat (OpenCL enabled) is faster than UMat (OpenCL disabled). https://gist.github.com/atinfinity/fd82f794ebf736a2e2e3

[670017.262677] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[670025.273070] [drm] stuck on render ring

I think your environment is Linux, and this looks like an issue particular to Linux. If performance improves when you use a video file as input, it may be an issue with capture on Linux.


Comments

Have you tried using only OpenCL (no CUDA)? I'm having the same issue with your example...

chaanakya (Jun 16 '14)

I tried using only OpenCL (with CUDA). If possible, could you please tell me the result of this program?

https://gist.github.com/atinfinity/276c02f2c24cde30ae5b

dandelion1124 (Jun 16 '14)

I don't have CUDA, so that program will not work. I have an Intel GPU and I'm using Beignet, which is Intel's implementation of OpenCL. Can you please try your (first) program without CUDA and see how it works? NVIDIA should also have an implementation of OpenCL, so you should be able to use it.

chaanakya (Jun 17 '14)

In my environment, I can select only the GPU as the OpenCL platform, so I cannot run my program without CUDA. Also, Beignet's web page has the following explanation, which might be helpful for you.

  • Note about OpenCV support

I think you could also use the Intel OpenCL SDK as another option. By the way, you can get OpenCV's build information by calling getBuildInformation(). This information is very useful when reporting your environment.

dandelion1124 (Jun 17 '14)

Which platform do you use: IVB or HSW? If it is HSW, you need to apply a kernel patch to enable SLM and barrier support; you can find details in the README. If it is IVB, you can first try disabling the hang check. As dandelion1124 pointed out, you can find detailed instructions on Beignet's web page or in the README file.

gongzg (Jul 16 '14)

I don't see OCL anywhere under modules in the master branch. I used the master branch to build 3.0.0, but OCL is missing. How did you build it and get OCL working?

hesh (Jul 23 '14)

answered Jul 26 '15

Anna Lucia

Have you tested other filter algorithms? Are they also slower? In my project, GaussianBlur is slower on the GPU than on the CPU, but the other algorithms are faster on the GPU. Maybe GaussianBlur is better suited to the CPU because of how the algorithm works.


Stats

Asked: May 27 '14

Seen: 13,285 times

Last updated: Jul 26 '15