Copy-pasted OpenCV code slower than precompiled code

Hello,

For my project I need to modify the OpenCV class cv::cuda::HOG (I need to add support for CV_8UC3 and cellSize = (16,16)).

At this stage I don't want to modify the OpenCV source code yet, so I have created my own CUDA HOG descriptor (named HOGtest), starting from the OpenCV code.

However, using the same OpenCV cv::cuda::HOG source code (just copied and pasted), I noticed that my code is noticeably slower. For example, performing HOG feature extraction on a 256x256 pixel image gives the following time measurements:

time with cv::cuda::HOG: about 1 ms
time with my HOGtest class: about 6 ms

As I said, the code of HOGtest and cv::cuda::HOG is exactly the same. Profiling with nvvp shows that the difference comes from the execution time of the CUDA kernels.

Does anyone know the reason for this? Below is the part of my .pro file (I am using the Qt framework) where I compile the CUDA code with nvcc:

CUDA_SOURCES += cuda_test.cu
CUDA_SDK = "/usr/local/cuda-7.5/samples/"    
CUDA_DIR = "/usr/local/cuda-7.5/"    
CUDA_ARCH = sm_52
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v 

INCLUDEPATH += $$CUDA_DIR/include
INCLUDEPATH += $$CUDA_SDK/common/inc/
INCLUDEPATH += $$CUDA_SDK/../shared/inc/

QMAKE_LIBDIR += $$CUDA_DIR/lib64 
QMAKE_LIBDIR += $$CUDA_SDK/lib
QMAKE_LIBDIR += $$CUDA_SDK/common/lib

LIBS += -L/usr/local/cuda-7.5/lib64/ \
    -lcuda \
    -lcudart

CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')

cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -g -G -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS  ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -g -G -M $$CUDA_INC $$NVCCFLAGS   ${QMAKE_FILE_NAME}
QMAKE_EXTRA_COMPILERS += cuda
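
One thing that may be worth ruling out: the nvcc command above always passes -g -G. According to the nvcc documentation, -G generates debug information for device code and turns off device code optimizations, which could plausibly account for slower kernels compared with a release build of OpenCV. The following is only a sketch of a variant that applies the debug flags to debug builds alone. NVCC_BUILD_FLAGS is a hypothetical variable name introduced here, and dropping $$LIBS from the compile-only (-c) step is an assumption that linking is done later by qmake.

```
# Hypothetical variable (not in the original .pro file):
# device debug flags only in debug builds, line info for profiling otherwise.
CONFIG(debug, debug|release) {
    # -G disables device code optimizations; keep it out of timed builds
    NVCC_BUILD_FLAGS = -g -G
} else {
    # -lineinfo keeps source line info for nvvp without disabling optimizations
    NVCC_BUILD_FLAGS = -O2 -lineinfo
}

cuda.commands = $$CUDA_DIR/bin/nvcc -m64 $$NVCC_BUILD_FLAGS -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.depend_command = $$CUDA_DIR/bin/nvcc -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
```

If the timings of HOGtest and cv::cuda::HOG converge with this change, the debug flags were the cause; if not, the difference lies elsewhere.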

Thanks for the help
