Problem getting CUDA to work on OpenCV3 latest clone

asked 2015-07-31 02:41:07 -0600


updated 2015-07-31 03:14:29 -0600

I am trying to get the CUDA interface of OpenCV to work, but I get the following error as soon as I call a CUDA-related function:

 OpenCV Error: Gpu API call (NCV Assertion Failed: cudaError_t=2, file=/home/spu/Documents/github/opencv_CUDA/modules/cudalegacy/src/NCV.cpp, line=363) in NCVDebugOutputHandler, file /home/spu/Documents/github/opencv_CUDA/modules/cudaobjdetect/src/cascadeclassifier.cpp, line 156
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/spu/Documents/github/opencv_CUDA/modules/cudaobjdetect/src/cascadeclassifier.cpp:156: error: (-217) NCV Assertion Failed: cudaError_t=2, file=/home/spu/Documents/github/opencv_CUDA/modules/cudalegacy/src/NCV.cpp, line=363 in function NCVDebugOutputHandler

Some digging brought me to an old topic in the OpenCV dev forum that suggested upgrading my NVIDIA driver, so that is what I did. I pulled the latest NVIDIA graphics driver for Ubuntu 14.04 64-bit, downloaded CUDA 7.0 from the official website, configured my system and successfully built the OpenCV library.

However, when running the code below, the error persists.

// Perform the GPU detector
Ptr<cuda::CascadeClassifier> cascade_gpu = cuda::CascadeClassifier::create("/home/spu/Documents/github/opencv_CUDA/data/haarcascades_cuda/haarcascade_fullbody.xml");
for(int scale = 1; scale<6; scale++){
    Mat current;
    // Size takes (width, height), i.e. (cols, rows)
    resize(hist, current, Size(image.cols/scale, image.rows/scale));
    // Start timing here
    int64 t0 = getTickCount();
    // We need to include the time for pushing and retrieving the data to and from the GPU
    cuda::GpuMat image_gpu(current);
    cuda::GpuMat objbuf;
    cascade_gpu->detectMultiScale(image_gpu, objbuf);
    std::vector<Rect> detections;
    cascade_gpu->convert(objbuf, detections);
    // End timing here and output
    int64 t1 = getTickCount();
    double secs = (t1-t0)/getTickFrequency();
    cerr << "Measurement - division by " << scale << ": time = " << secs << " seconds"<< endl;
}

Does anyone have a clue how I can solve this?

System configuration:

CUDA 7.0 with graphics driver 352.30
Ubuntu 14.04 64 bit
Graphics cards: 2x NVIDIA Quadro K2000
OpenCV was built with support for all known compute capabilities, to avoid building it for the wrong one

EDIT: partially found the reason for the crash

Processing an 8000x4000 pixel image on the CPU hits no memory limit, but once you push the data to the GPU, memory restrictions can apply. Since I am iteratively downscaling the image and the error was triggered on the first iteration (the full-resolution image), I looked deeper into the error code.

cudaError_t=2 means cudaErrorMemoryAllocation (the API call failed because it was unable to allocate enough memory to perform the requested operation). So basically my GPU does not have enough memory to process the complete image. I am now looking into how to solve this.
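To put the out-of-memory error in perspective, here is a rough back-of-the-envelope sketch of how many bytes the raw pixel data takes. The helper `rawImageBytes` is a hypothetical name for illustration, not an OpenCV or CUDA function. It assumes a densely packed image with no row padding; GPU allocations are typically pitched, so real numbers will be somewhat higher.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of OpenCV or CUDA): raw size in bytes of a
// densely packed image, ignoring any row padding the GPU allocator may add.
std::uint64_t rawImageBytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t channels, std::uint64_t bytesPerChannel) {
    assert(channels > 0 && bytesPerChannel > 0);
    return width * height * channels * bytesPerChannel;
}
```

Assuming a single-channel 8-bit input, `rawImageBytes(8000, 4000, 1, 1)` is 32,000,000 bytes (about 32 MB), and a 3-channel version would be 96,000,000 bytes. Either is small compared to the memory on the card, so the allocation that fails is likely a working buffer rather than the image upload itself, though the error message alone does not confirm that.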

**EDIT2: 32 bit versus 64 bit** I downloaded CUDA 7.0 [here](https://developer.nvidia.com/cuda-downloads), where the download page makes no distinction between 32-bit and 64-bit systems, unlike the page for CUDA 6.5, which can be found [here](https://developer.nvidia.com/cuda-toolkit-65). I am going to see if the 6.5 64-bit download does the trick.

EDIT3: forget about the above remark; the 7.0 installer is simply the same for 32-bit AND 64-bit systems. Currently running CMake for ...


answered 2015-07-31 03:50:56 -0600

StevenPuttemans


updated 2015-07-31 04:02:46 -0600

Simply put, detectMultiScale needs more than 1 GB of GPU memory for an 8000x4000 image. I am now going to look into how to determine the maximum resolution possible.

Some tests showed that a 4000x4000 image is about the maximum size I can pass to the GPU model in a single run.
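Taking that empirical limit as a pixel budget, the smallest integer downscale factor that fits could be picked up front rather than by trial and error. This is only a sketch: `smallestFittingScale` is a hypothetical helper, and treating the pixel count as the quantity that matters is an assumption, since the detector's actual memory use was not measured here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest integer divisor `scale` in [1, maxScale] such
// that (width/scale) x (height/scale) has at most maxPixels pixels.
// Returns 0 if no scale in that range fits.
int smallestFittingScale(int width, int height, std::int64_t maxPixels, int maxScale) {
    assert(width > 0 && height > 0 && maxPixels > 0 && maxScale >= 1);
    for (int scale = 1; scale <= maxScale; ++scale) {
        // Integer division, matching the resize loop in the question.
        const std::int64_t w = width / scale;
        const std::int64_t h = height / scale;
        if (w == 0 || h == 0) break;  // image would vanish; stop searching
        if (w * h <= maxPixels) return scale;
    }
    return 0;
}
```

For the 8000x4000 image and a budget of 4000*4000 pixels, `smallestFittingScale(8000, 4000, 4000LL * 4000LL, 5)` returns 2, i.e. a 4000x2000 image.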


Comments

@berak @Guanta @mshabunin might any of you have a clue on how to calculate or estimate how much memory a cascade classifier in its GPU form would need?

StevenPuttemans (2015-07-31 04:04:53 -0600)

No idea, sorry.

berak (2015-07-31 04:46:35 -0600)

No idea either; I guess it depends on the number and size of the HOG windows (if you use HOG at all).

Guanta (2015-07-31 07:34:53 -0600)

Nope, it is a Viola-Jones (VJ) model using Haar wavelets :)

StevenPuttemans (2015-08-01 07:56:33 -0600)

@StevenPuttemans can you please provide a link that I can follow to install OpenCV 3.1.0 with CUDA 7.5 enabled on Ubuntu 16.04? I had many issues installing OpenCV with CUDA enabled on 14.04. I have now upgraded to 16.04 and am trying the installation again, so a useful link for reference would help.

lm35 (2016-10-18 04:00:05 -0600)

I do not have a link on how to install it, but on Ubuntu 16.04 I basically got it to work by:

Downloading the most recent CUDA driver
Installing CUDA and the accompanying video driver (to make them compatible)
Downloading the latest master branch
Running CMake and making sure it finds your installation

Good luck!

StevenPuttemans (2016-10-18 04:23:34 -0600)

@StevenPuttemans Thank you for your guidance. Installation was successful.

lm35 (2016-10-19 04:47:12 -0600)
