Gabriele D's profile - activity

2017-03-14 01:47:53 -0600 received badge  Popular Question (source)
2017-03-14 01:47:53 -0600 received badge  Notable Question (source)
2017-03-14 01:47:53 -0600 received badge  Famous Question (source)
2015-11-26 10:50:22 -0600 received badge  Scholar (source)
2015-11-26 10:49:59 -0600 answered a question copy-pasted opencv code slower than precompiled code

Solved!

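The compilation flags -g -G have to be removed from the nvcc command. The -G flag generates device debug information and turns off optimization of device code, so removing it restores the default device-code optimization level (-O3).

Now the two execution times are roughly equal.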

2015-11-25 03:59:08 -0600 received badge  Student (source)
2015-11-24 07:42:34 -0600 received badge  Editor (source)
2015-11-24 07:41:26 -0600 asked a question copy-pasted opencv code slower than precompiled code

Hello,

for my project I need to modify the OpenCV class cv::cuda::HOG (I need to add support for CV_8UC3 and cellSize = (16,16)).

At this stage I still don't want to modify the OpenCV source code, so I have created my own CUDA HOG descriptor (named HOGtest), starting from the OpenCV code.

However, even though I use the same OpenCV cv::cuda::HOG source code (just copied and pasted), my code is noticeably slower. For example, performing HOG feature extraction on a 256x256 pixel image gives the following time measurements:

time with cv::cuda::HOG: about 1 ms
time with my HOGtest class: about 6 ms
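For reference, a comparison like the one above could be taken with a small harness along these lines. This is only a sketch under stated assumptions, not the code used for the numbers in this question: `meanMillis` is a hypothetical helper name, and the callable passed to it is assumed to block until the GPU work has finished (otherwise only the launch overhead would be measured).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV): mean wall-clock time per call, in ms.
// `work` is assumed to return only after its GPU work has completed.
double meanMillis(const std::function<void()>& work, int warmup, int runs)
{
    assert(runs > 0);

    // Warm-up calls are not timed, so one-time costs such as CUDA context
    // creation and buffer allocation do not distort the average.
    for (int i = 0; i < warmup; ++i)
        work();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

The same harness would be called once with a lambda wrapping the cv::cuda::HOG call and once with a lambda wrapping the HOGtest call, using identical warm-up and run counts.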

As I said, the code of HOGtest and cv::cuda::HOG is exactly the same. Profiling with nvvp shows that the difference comes from the execution time of the CUDA kernels.

Does anyone know the reason for this? Below is the part of my .pro file (I am using the Qt framework) where I compile the CUDA code with nvcc:

CUDA_SOURCES += cuda_test.cu
CUDA_SDK = "/usr/local/cuda-7.5/samples/"    
CUDA_DIR = "/usr/local/cuda-7.5/"    
CUDA_ARCH = sm_52
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v 

INCLUDEPATH += $$CUDA_DIR/include
INCLUDEPATH += $$CUDA_SDK/common/inc/
INCLUDEPATH += $$CUDA_SDK/../shared/inc/

QMAKE_LIBDIR += $$CUDA_DIR/lib64 
QMAKE_LIBDIR += $$CUDA_SDK/lib
QMAKE_LIBDIR += $$CUDA_SDK/common/lib

LIBS += -L/usr/local/cuda-7.5/lib64/ \
    -lcuda \
    -lcudart

CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')

cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -g -G -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS  ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -g -G -M $$CUDA_INC $$NVCCFLAGS   ${QMAKE_FILE_NAME}
QMAKE_EXTRA_COMPILERS += cuda

Thanks for the help

2015-11-17 10:39:59 -0600 commented question CUDA HaarCascade stream assertion

I have the same problem in cv::cuda::HOG::compute. I suspect that the multi-stream functionality is not yet supported by the OpenCV libraries.

2015-10-27 02:28:25 -0600 received badge  Enthusiast
2015-10-26 05:00:42 -0600 asked a question Segfault in multithreaded cuda::HOG::compute calls

Hello, I'm experiencing a strange issue in the OpenCV cuda module when using the function cv::cuda::HOG::compute in a multi-threaded application.

THE ARCHITECTURE: I have a main() function which launches 4 independent threads. In each thread a set of images is analysed with HOG feature extraction + SVM prediction in order to detect some specific features. In more detail, some ROIs are extracted from each image and then analysed with the cuda::HOG::compute method.

I'm using OpenCV 3.0.0 with CUDA 7.5 on an NVIDIA GeForce GTX 970 GPU.

THE CODE: Below is the relevant GPU code of the HOG + SVM analysis, which is used to analyse each ROI:

  // Extracting (256x256 pixel) roi square
  Mat ROI = sampleImage( Rect( Point(X1, Y1), Point(X2, Y2) ) );

  // HOG feature extraction
  cuda::GpuMat cudaROI(ROI);
  cuda::cvtColor(cudaROI, cudaROI, CV_BGR2GRAY, 1);
  cuda::GpuMat descriptorsValuesGpu;
  Ptr<cuda::HOG> hog = cuda::HOG::create(Size(256,256), Size(16,16), Size(8,8), Size(8,8), 9);
  hog->compute( cudaROI, descriptorsValuesGpu );  // ***  crash here in multi-thread   ***

  // svm prediction
  vector<float> descriptorsValues;
  descriptorsValues.resize(descriptorsValuesGpu.cols);
  descriptorsValuesGpu.row(0).download(Mat(descriptorsValues).reshape(1,1));
  previsionScore = svm_loc->predict(descriptorsValues, noArray(), 1);

As said, this code is repeated several times, independently, in each thread.

The following link points to a small runnable Qt project that encapsulates the core of my software (in this case it analyses the same image several times). In the first lines of the main() function you can set the number of threads to launch, the number of images analysed in each thread, and the boolean switch between GPU and CPU. Qt sample

THE ERROR: If I launch the code with a single thread there are no errors and the analysis is performed correctly. But when I launch the code with 2 or more threads I occasionally (about 2 times out of 3) get the following error:

OpenCV Error: Gpu API call (an illegal memory access was encountered) in call, file /home/figaro/opencv-3.0.0/modules/cudev/include/opencv2/cudev/grid/detail/transform.hpp, line 273

OpenCV Error: Gpu API call (an illegal memory access was encountered) in compute_hists, file /home/figaro/opencv-3.0.0/modules/cudaobjdetect/src/cuda/hog.cu, line 222

OpenCV Error: Gpu API call (an illegal memory access was encountered) in set_up_constants, file /home/figaro/opencv-3.0.0/modules/cudaobjdetect/src/cuda/hog.cu, line 93

OpenCV Error: Gpu API call (an illegal memory access was encountered) in upload, file /home/figaro/opencv-3.0.0/modules/core/src/cuda/gpu_mat.cu, line 179

terminate called after throwing an instance of 'QtConcurrent::UnhandledException'
what():  std::exception

The number of errors is equal to the number of threads launched. The functions reporting the error change every time, except for compute_hists, so I suppose that the problem is there. When the code doesn't return an error the analysis is performed correctly, independently of the size of the image, the number of threads, or the size of the image sample that ...
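One way to test whether concurrent calls are the trigger would be to serialize the GPU section behind a mutex shared by all threads. The sketch below is a diagnostic idea under assumptions, not a confirmed fix: `gpuMutex`, `runSerialized` and `maxConcurrentCalls` are hypothetical names, and the lambda `fakeCompute` only stands in for the real `hog->compute(...)` call so that the example runs without a GPU.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical guard shared by all worker threads.
std::mutex gpuMutex;

// Runs gpuWork while holding gpuMutex, so at most one thread is inside at a time.
void runSerialized(const std::function<void()>& gpuWork)
{
    std::lock_guard<std::mutex> lock(gpuMutex);
    gpuWork();
}

// Starts threadCount workers, each making roisPerThread serialized calls, and
// returns the highest number of calls that were ever in progress at once.
int maxConcurrentCalls(int threadCount, int roisPerThread)
{
    int active = 0;     // only touched while gpuMutex is held
    int maxActive = 0;  // only touched while gpuMutex is held

    // Placeholder for the real GPU section (assumed to be hog->compute(...)).
    auto fakeCompute = [&active, &maxActive] {
        ++active;
        if (active > maxActive)
            maxActive = active;
        --active;
    };

    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&fakeCompute, roisPerThread] {
            for (int i = 0; i < roisPerThread; ++i)
                runSerialized(fakeCompute);
        });
    }
    for (std::thread& w : workers)
        w.join();

    return maxActive;
}
```

If the illegal memory access no longer appears once the real call is wrapped this way, that would suggest state shared between concurrent calls. The cost is that the GPU section no longer overlaps across threads.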

2015-10-22 04:03:30 -0600 commented question segfault with multithreaded gpu calls

@stfn: Hello. Did you find a solution to your problem? I am facing the same occasional error with OpenCV 3.0.0 and cuda::HOG (in the compute() function) in a multi-threaded application. Thanks.