copy-pasted opencv code slower than precompiled code
Hello,
for my project I need to modify the openCV class cv::cuda::HOG (I need to introduce support for CV_8UC3 and cellSize = (16,16)).
At this stage I still don't wont to modify the openCV source code, so I have created my cuda HOG descriptor (namely HOGtest), starting from the openCV code.
Anyway, using the same openCV cv::cuda::HOG source code (just copied and pasted) I noticed that my code is noticeably slower. E.g.: for perform hog feature extraction on a 256x256 pixel image I have the following time measurements:
time with cv::cuda::HOG: about 1 ms
time with my HOGtest class: about 6 ms
Like I said, the code of HOGtest and cv::cuda::HOG is exactly the same. Performing code profiling with nvvp it turns out that the origin of this difference is the cuda kernels time execution.
Does anyone know the reason of that? I attach in the following the part of my .pro file (I am using the Qt framework) where I compile the cuda code with nvcc
CUDA_SOURCES += cuda_test.cu
CUDA_SDK = "/usr/local/cuda-7.5/samples/"
CUDA_DIR = "/usr/local/cuda-7.5/"
CUDA_ARCH = sm_52
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v
INCLUDEPATH += $$CUDA_DIR/include
INCLUDEPATH += $$CUDA_SDK/common/inc/
INCLUDEPATH += $$CUDA_SDK/../shared/inc/
QMAKE_LIBDIR += $$CUDA_DIR/lib64
QMAKE_LIBDIR += $$CUDA_SDK/lib
QMAKE_LIBDIR += $$CUDA_SDK/common/lib
LIBS += -L/usr/local/cuda-7.5/lib64/ \
-lcuda \
-lcudart
CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')
cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -g -G -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -g -G -M $$CUDA_INC $$NVCCFLAGS ${QMAKE_FILE_NAME}
QMAKE_EXTRA_COMPILERS += cuda
Thanks for the help