Hello, I have just started using OpenCV 3.1 (not the contrib branch) and haven't really kept up with OpenCV for a year or more. I have now built and installed it with CUDA; deviceQuery reports:
Device 0: "GeForce GTX 950M"
CUDA Driver Version / Runtime Version 7.50 / 7.50
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 4096 MBytes (4294836224 bytes)
GPU Clock Speed: 1.12 GHz
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.50, CUDA Runtime Version = 7.50, NumDevs = 1
Unfortunately, I get many errors related to using the GPU.
Do the test programs in /bin have dependencies (e.g. the Caltech images)?
./opencv_test_cudaimgproc:
[----------] 8 tests from CUDA_ImgProc/Demosaicing
[ RUN ] CUDA_ImgProc/Demosaicing.BayerBG2BGR/0
.../opencv/modules/cudaimgproc/test/test_color.cpp:2360: Failure
Value of: img.empty()
Actual: true
Expected: false
Can't load input image
:
:
The object-detection tests fail the same way:
./opencv_test_cudaobjdetect
CTEST_FULL_OUTPUT
OpenCV version: 3.1.0
OpenCV VCS version: 3.1.0
Build type: release
Parallel framework: pthreads
CPU features:
OpenCL is disabled
[==========] Running 11 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from detect/CalTech
[ RUN ] detect/CalTech.HOG/0
.../opencv/modules/cudaobjdetect/test/test_objdetect.cpp:236: Failure
Value of: img.empty()
Actual: true
Expected: false
:
:
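Both failures come from the same check (img.empty() is true): the test could not read its input image. OpenCV's test programs read their input files from the testdata directory of the separate opencv_extra repository, located through the OPENCV_TEST_DATA_PATH environment variable. The sketch below uses only the C++17 standard library to mimic that lookup, so a missing file can be spotted before running a test. testDataRoot and findTestFile are hypothetical helpers, not OpenCV functions, and falling back to the working directory when the variable is unset is an assumption of this sketch.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of OpenCV): the directory test inputs are
// resolved against. Assumption of this sketch: when OPENCV_TEST_DATA_PATH
// is unset or empty, fall back to the current working directory.
fs::path testDataRoot()
{
    const char* root = std::getenv("OPENCV_TEST_DATA_PATH");
    return (root && *root) ? fs::path(root) : fs::current_path();
}

// Hypothetical helper: full path of a test input if it exists as a regular
// file, std::nullopt otherwise (the situation in which the tests above
// report img.empty() == true).
std::optional<fs::path> findTestFile(const std::string& relative)
{
    const fs::path candidate = testDataRoot() / relative;
    std::error_code ec;  // treat missing or inaccessible paths as "not found"
    if (fs::is_regular_file(candidate, ec))
        return candidate;
    return std::nullopt;
}
```

Pointing OPENCV_TEST_DATA_PATH at the testdata folder of an opencv_extra checkout, ideally the tag matching the installed version (3.1.0 here), is the usual way to provide these images before running ./opencv_test_cudaobjdetect and the other opencv_test_* programs.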
I had to run sudo cp libippicv.a /usr/local/lib/
in order to build a simple cuda_test.cpp with this Makefile:
CFLAGS = `pkg-config --cflags opencv`
LIBS = `pkg-config --libs opencv`
INC = -I/usr/local/cuda-7.5/targets/x86_64-linux/include
% : %.cpp
	g++ $(INC) $(CFLAGS) $(LIBS) -o $@ $<
But this still resulted in a large number of linker errors:
:
:
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm]+0x258): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm]+0x300): undefined reference to `memSegCopyHelper(void*, NCVMemoryType, void const*, NCVMemoryType, unsigned long, CUstream_st*)'
/tmp/ccJkx6c5.o: In function `NCVVector<HaarFeature64>::copySolid(NCVVector<HaarFeature64>&, CUstream_st*, unsigned long) const':
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x3d): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x82): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x8e): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x109): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x164): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x170): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x1ec): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x24c): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x258): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x300): undefined reference to `memSegCopyHelper(void*, NCVMemoryType, void const*, NCVMemoryType, unsigned long, CUstream_st*)'
collect2: error: ld returned 1 exit status
That is why I tried running the CUDA test programs, but, as shown above, they don't seem to work either. So my question is: is all of this documented somewhere?
Where can I find more information? Do you have any specific solutions?
Also, is there a difference between using CUDA from Python and from C++? My Python module does seem to work, but I still have to check whether it actually uses CUDA.
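One way to check whether a given OpenCV build (the one a C++ program links against, or the one the Python module loads) was compiled with CUDA is to look at the CUDA entries in its build report: cv::getBuildInformation() in C++, cv2.getBuildInformation() in Python. This only shows how the library was built; it does not tell whether a particular call actually runs on the GPU. The sketch below simply filters that text for lines mentioning CUDA; cudaBuildLines is a hypothetical helper, and the exact wording of the report may differ between OpenCV versions.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return every line of an OpenCV build-information
// report that contains "CUDA". Intended input: cv::getBuildInformation().
std::vector<std::string> cudaBuildLines(const std::string& buildInfo)
{
    std::vector<std::string> lines;
    std::istringstream in(buildInfo);
    std::string line;
    while (std::getline(in, line))
        if (line.find("CUDA") != std::string::npos)
            lines.push_back(line);
    return lines;
}
```

Comparing the output for the C++ build with print(cv2.getBuildInformation()) in Python shows whether both actually use the same CUDA-enabled build.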