
Copy-pasted OpenCV code slower than precompiled code

asked Nov 24 '15

Gabriele D

updated Nov 24 '15

Hello,

For my project I need to modify the OpenCV class cv::cuda::HOG (I need to add support for CV_8UC3 images and cellSize = (16,16)).

At this stage I don't want to modify the OpenCV source code yet, so I have created my own CUDA HOG descriptor (named HOGtest), starting from the OpenCV code.

However, even with the unchanged cv::cuda::HOG source code (just copied and pasted), my version is noticeably slower. For example, HOG feature extraction on a 256x256 pixel image takes:

time with cv::cuda::HOG: about 1 ms
time with my HOGtest class: about 6 ms

As I said, the code of HOGtest and cv::cuda::HOG is exactly the same. Profiling with nvvp shows that the difference comes from the execution time of the CUDA kernels.

Does anyone know the reason for this? Below is the part of my .pro file (I am using the Qt framework) where I compile the CUDA code with nvcc:

CUDA_SOURCES += cuda_test.cu
CUDA_SDK = "/usr/local/cuda-7.5/samples/"    
CUDA_DIR = "/usr/local/cuda-7.5/"    
CUDA_ARCH = sm_52
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v 

INCLUDEPATH += $$CUDA_DIR/include
INCLUDEPATH += $$CUDA_SDK/common/inc/
INCLUDEPATH += $$CUDA_SDK/../shared/inc/

QMAKE_LIBDIR += $$CUDA_DIR/lib64 
QMAKE_LIBDIR += $$CUDA_SDK/lib
QMAKE_LIBDIR += $$CUDA_SDK/common/lib

LIBS += -L/usr/local/cuda-7.5/lib64/ \
    -lcuda \
    -lcudart

CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')

cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -g -G -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS  ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -g -G -M $$CUDA_INC $$NVCCFLAGS   ${QMAKE_FILE_NAME}
QMAKE_EXTRA_COMPILERS += cuda

Thanks for the help


1 answer


answered Nov 26 '15

Gabriele D

Solved!

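The -g -G compilation flags have to be removed. -G builds the device code for debugging and turns off its optimizations, so the kernels were running unoptimized; without it, nvcc falls back to the default -O3 optimization.

Now the execution times are roughly equal.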
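A minimal sketch of how the .pro file from the question could keep device debugging available for debug builds only, so release builds are not slowed down. It assumes the CUDA_DIR, CUDA_ARCH, NVCCFLAGS, CUDA_INC, LIBS and CUDA_SOURCES variables defined in the question; NVCC_BUILD_FLAGS is a hypothetical variable name introduced here. CONFIG(debug, debug|release) is qmake's standard test for a debug build. Note that for nvcc, -O3 sets the host-code optimization level; device code is already optimized by default as long as -G is not given.

```qmake
# Hypothetical helper variable: debug flags only in debug builds.
CONFIG(debug, debug|release) {
    # -g: host debug info, -G: device debug info (disables device optimizations)
    NVCC_BUILD_FLAGS = -g -G
} else {
    # Release: no -G, so device code keeps nvcc's default optimizations
    NVCC_BUILD_FLAGS = -O3
}

cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 $$NVCC_BUILD_FLAGS -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
# Dependency scanning (-M) does not need the debug flags at all
cuda.depend_command = $$CUDA_DIR/bin/nvcc -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
QMAKE_EXTRA_COMPILERS += cuda
```

With this layout, timings should be compared on release builds only, since a debug build will again run the kernels without device-code optimizations.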


Stats

Asked: Nov 24 '15

Seen: 295 times

Last updated: Nov 26 '15