
copy-pasted opencv code slower than precompiled code

asked 2015-11-24 07:41:26 -0500

Gabriele D

updated 2015-11-24 07:42:34 -0500


For my project I need to modify the OpenCV class cv::cuda::HOG (I need to add support for CV_8UC3 and cellSize = (16,16)).

At this stage I don't want to modify the OpenCV source code yet, so I have created my own CUDA HOG descriptor (named HOGtest), starting from the OpenCV code.

However, even with the cv::cuda::HOG source code simply copied and pasted, my code is noticeably slower. For example, HOG feature extraction on a 256x256 pixel image gives the following time measurements:

time with cv::cuda::HOG: about 1 ms
time with my HOGtest class: about 6 ms

As I said, the code of HOGtest and cv::cuda::HOG is exactly the same. Profiling with nvvp shows that the difference comes from the execution time of the CUDA kernels.

Does anyone know the reason for this? Below is the part of my .pro file (I am using the Qt framework) where I compile the CUDA code with nvcc:

CUDA_SDK = "/usr/local/cuda-7.5/samples/"    
CUDA_DIR = "/usr/local/cuda-7.5/"    
CUDA_ARCH = sm_52
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v 

INCLUDEPATH += $$CUDA_SDK/common/inc/
INCLUDEPATH += $$CUDA_SDK/../shared/inc/

QMAKE_LIBDIR += $$CUDA_SDK/common/lib

LIBS += -L/usr/local/cuda-7.5/lib64/ \
    -lcuda \

CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')

cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -g -G -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS  ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -g -G -M $$CUDA_INC $$NVCCFLAGS   ${QMAKE_FILE_NAME}

Thanks for the help.


1 answer


answered 2015-11-26 10:49:59 -0500

Gabriele D


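The compilation flags -g -G have to be removed from the nvcc command line. -G builds the device code for debugging and turns off its optimizations; without it, nvcc compiles the device code at its default optimization level, -O3.

With these flags removed, the timings of the two implementations are roughly equal.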
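
For reference, a minimal sketch of how the nvcc rule from the question might look with the debug flags kept only for debug builds. It assumes qmake's standard CONFIG(debug, debug|release) scope, and NVCC_DEBUG_FLAGS is a hypothetical variable name that is not in the original .pro file; the other variables are the ones defined in the question.

```qmake
# NVCC_DEBUG_FLAGS is a hypothetical helper variable (not in the original .pro).
# -G turns off device-code optimization, so it is only passed in debug builds.
CONFIG(debug, debug|release) {
    NVCC_DEBUG_FLAGS = -g -G
} else {
    NVCC_DEBUG_FLAGS =
}

# Same commands as in the question, with the hard-coded -g -G replaced by the variable.
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 $$NVCC_DEBUG_FLAGS -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.depend_command = $$CUDA_DIR/bin/nvcc $$NVCC_DEBUG_FLAGS -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
```

With this arrangement a release build compiles the kernels with nvcc's default optimization, while a debug build can still be stepped through in a CUDA debugger.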





Last updated: Nov 26 '15