Problem getting CUDA to work on OpenCV3 latest clone
I am trying to get the CUDA interface of OpenCV to work, but I get the following error the moment I call a CUDA-related function:
OpenCV Error: Gpu API call (NCV Assertion Failed: cudaError_t=2, file=/home/spu/Documents/github/opencv_CUDA/modules/cudalegacy/src/NCV.cpp, line=363) in NCVDebugOutputHandler, file /home/spu/Documents/github/opencv_CUDA/modules/cudaobjdetect/src/cascadeclassifier.cpp, line 156
terminate called after throwing an instance of 'cv::Exception'
what(): /home/spu/Documents/github/opencv_CUDA/modules/cudaobjdetect/src/cascadeclassifier.cpp:156: error: (-217) NCV Assertion Failed: cudaError_t=2, file=/home/spu/Documents/github/opencv_CUDA/modules/cudalegacy/src/NCV.cpp, line=363 in function NCVDebugOutputHandler
Some digging brought me to an old topic on the OpenCV dev forum that suggested upgrading my NVIDIA kernel driver. So that is what I did: I pulled the latest NVIDIA graphics driver for Ubuntu 14.04 64-bit, downloaded CUDA 7.0 from the official website, configured my system and successfully built the OpenCV library.
However, when running the code below, the error persists.
// Perform the GPU detector
Ptr<cuda::CascadeClassifier> cascade_gpu = cuda::CascadeClassifier::create("/home/spu/Documents/github/opencv_CUDA/data/haarcascades_cuda/haarcascade_fullbody.xml");
for(int scale = 1; scale < 6; scale++){
    Mat current;
    // Size takes (width, height), i.e. (cols, rows)
    resize(hist, current, Size(image.cols/scale, image.rows/scale));
    // Start timing here
    int64 t0 = getTickCount();
    // We need to include the time for pushing and retrieving the data to and from the GPU
    cuda::GpuMat image_gpu(current);
    cuda::GpuMat objbuf;
    cascade_gpu->detectMultiScale(image_gpu, objbuf);
    std::vector<Rect> detections;
    cascade_gpu->convert(objbuf, detections);
    // End timing here and output
    int64 t1 = getTickCount();
    double secs = (t1-t0)/getTickFrequency();
    cerr << "Measurement - division by " << scale << ": time = " << secs << " seconds" << endl;
}
Does anyone have a clue how I can solve this?
System configuration:
- CUDA 7.0 with graphics driver 352.30
- Ubuntu 14.04 64 bit
- Graphics cards: 2x NVIDIA Quadro K2000
- OpenCV was built for all known compute capabilities, to avoid building it for the wrong one
EDIT: partially found the reason for the crash
Processing an 8000x4000 pixel image on the CPU hits no memory limit, but once you push the data to the GPU, memory restrictions can apply. Since I am iteratively downscaling the image and the error was already triggered on the first iteration (the full-size image), I looked deeper into the error code.
cudaError_t=2 means cudaErrorMemoryAllocation (The API call failed because it was unable to allocate enough memory to perform the requested operation). So apparently my GPU cannot allocate enough memory to process the complete image. I am now looking into how to solve this.
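To get a feel for the numbers, here is a minimal sketch that estimates the raw buffer size of the image at each downscale step. The helper names scaledImageBytes and firstScaleThatFits are hypothetical (not part of OpenCV or CUDA), and the estimate assumes tightly packed rows. The detector allocates additional working buffers on top of the input image, so treat the result as a lower bound only, not as what the GPU will actually need.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: raw size in bytes of a width x height image after
// dividing both dimensions by `scale` (integer division, as in the loop
// in the question). Assumes tightly packed rows; a GpuMat may pad each
// row, so the real allocation can be somewhat larger.
std::uint64_t scaledImageBytes(int width, int height, int scale,
                               int channels, int bytesPerChannel)
{
    assert(width > 0 && height > 0 && scale > 0);
    assert(channels > 0 && bytesPerChannel > 0);
    const std::uint64_t w = static_cast<std::uint64_t>(width / scale);
    const std::uint64_t h = static_cast<std::uint64_t>(height / scale);
    return w * h * static_cast<std::uint64_t>(channels)
                 * static_cast<std::uint64_t>(bytesPerChannel);
}

// Hypothetical helper: smallest divisor in [1, maxScale] whose raw image
// fits into budgetBytes, or 0 if none of them fits.
int firstScaleThatFits(int width, int height, int channels,
                       int bytesPerChannel, std::uint64_t budgetBytes,
                       int maxScale)
{
    for (int scale = 1; scale <= maxScale; ++scale) {
        if (scaledImageBytes(width, height, scale, channels,
                             bytesPerChannel) <= budgetBytes) {
            return scale;
        }
    }
    return 0;
}
```

For a single-channel 8-bit image of 8000x4000 pixels, scaledImageBytes(8000, 4000, 1, 1, 1) gives 32,000,000 bytes for the input alone. The budget passed to firstScaleThatFits could be taken from the free device memory reported by the CUDA runtime (for example via cudaMemGetInfo), minus a generous margin for the detector's own buffers.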
EDIT3: forget about the above remark --> the 7.0 installer is just the standard one for both 32-bit AND 64-bit systems. Currently running CMake for ...