OpenCV 3.1 CUDA samples, and compiling with CUDA

Hello, I have just started using OpenCV 3.1 (without the opencv_contrib modules) and haven't really kept up with OpenCV for a year or more. I have now built and installed it with CUDA support; deviceQuery reports:

 Device 0: "GeForce GTX 950M"
 CUDA Driver Version / Runtime Version          7.50 / 7.50
 CUDA Capability Major/Minor version number:    5.0
 Total amount of global memory:                 4096 MBytes (4294836224 bytes)
 GPU Clock Speed:                               1.12 GHz
 Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
 Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
 Total amount of constant memory:               65536 bytes
 Total amount of shared memory per block:       49152 bytes
 Total number of registers available per block: 65536
 Warp size:                                     32
 Maximum number of threads per block:           1024
 Maximum sizes of each dimension of a block:    1024 x 1024 x 64
 Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
 Maximum memory pitch:                          2147483647 bytes
 Texture alignment:                             512 bytes
 Concurrent copy and execution:                 Yes with 1 copy engine(s)
 Run time limit on kernels:                     Yes
 Integrated GPU sharing Host Memory:            No
 Support host page-locked memory mapping:       Yes
 Concurrent kernel execution:                   Yes
 Alignment requirement for Surfaces:            Yes
 Device has ECC support enabled:                No
 Device is using TCC driver mode:               No
 Device supports Unified Addressing (UVA):      Yes
 Device PCI Bus ID / PCI location ID:           1 / 0
 Compute Mode:
     Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 7.50, CUDA Runtime Version = 7.50, NumDevs = 1

Unfortunately, I get many errors related to using the GPU:

Do the samples in /bin have data dependencies (e.g. the Caltech images)?

opencv_test_cudaarithm :

[----------] 8 tests from CUDA_ImgProc/Demosaicing
[ RUN      ] CUDA_ImgProc/Demosaicing.BayerBG2BGR/0
.../opencv/modules/cudaimgproc/test/test_color.cpp:2360: Failure
Value of: img.empty()
  Actual: true
Expected: false
Can't load input image
:
:

Similarly:

./opencv_test_cudaobjdetect
CTEST_FULL_OUTPUT
OpenCV version: 3.1.0
OpenCV VCS version: 3.1.0
Build type: release
Parallel framework: pthreads
CPU features: 
OpenCL is disabled
[==========] Running 11 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from detect/CalTech
[ RUN      ] detect/CalTech.HOG/0
.../opencv/modules/cudaobjdetect/test/test_objdetect.cpp:236: Failure
Value of: img.empty()
  Actual: true
Expected: false
:
:
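The img.empty() and "Can't load input image" failures above look like missing test data rather than a CUDA problem. As far as I know, the OpenCV test binaries read their input images from the directory named by the OPENCV_TEST_DATA_PATH environment variable, which is expected to point at the testdata folder of the separate opencv_extra repository. The C++17 sketch below (the helper name findTestData is hypothetical, not an OpenCV function) uses only the standard library to check whether that variable is set and whether a given file or folder exists under it.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the full path of `relative` under
// $OPENCV_TEST_DATA_PATH, or an empty string if the variable is unset
// or nothing exists at that location.
std::string findTestData(const std::string& relative) {
    const char* root = std::getenv("OPENCV_TEST_DATA_PATH");
    if (root == nullptr || *root == '\0') {
        std::cerr << "OPENCV_TEST_DATA_PATH is not set\n";
        return "";
    }
    fs::path full = fs::path(root) / relative;
    if (!fs::exists(full)) {
        std::cerr << "missing test data: " << full << '\n';
        return "";
    }
    return full.string();
}
```

For example, after export OPENCV_TEST_DATA_PATH=<path to opencv_extra>/testdata, a call such as findTestData("gpu") should return a non-empty path; the gpu subfolder name is an assumption about the opencv_extra layout, not something confirmed in this post.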

I had to run sudo cp libippicv.a /usr/local/lib/ in order to build a simple cuda_test.cpp with this Makefile:

CFLAGS = `pkg-config --cflags opencv`
LIBS = `pkg-config --libs opencv`
INC = -I/usr/local/cuda-7.5/targets/x86_64-linux/include

% : %.cpp
	g++ $(INC) $(CFLAGS) $(LIBS) -o $@ $<

But this still produced a large number of linker errors:

:
:
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm]+0x258): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm]+0x300): undefined reference to `memSegCopyHelper(void*, NCVMemoryType, void const*, NCVMemoryType, unsigned long, CUstream_st*)'
/tmp/ccJkx6c5.o: In function `NCVVector<HaarFeature64>::copySolid(NCVVector<HaarFeature64>&, CUstream_st*, unsigned long) const':
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x3d): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x82): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x8e): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x109): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x164): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x170): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x1ec): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x24c): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x258): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x300): undefined reference to `memSegCopyHelper(void*, NCVMemoryType, void const*, NCVMemoryType, unsigned long, CUstream_st*)'
collect2: error: ld returned 1 exit status

That is why I tried testing the CUDA samples, but as explained, these don't appear to work either. So my question is: is all this documented somewhere?

Where can I find more information? Do you have any specific solutions?

Oh, and is there a difference between using CUDA from Python and from C++? My Python module does seem to work (but I still have to check whether it actually uses CUDA).