1 | initial version |
I just ran a test, although with OpenCV3.0, and all CUDA module tests seem to work. I will do a new setup with the latest 2.4 branch and get back to you.
2 | No.2 Revision |
I just did a build test for both OpenCV2.4 latest branch and OpenCV3 latest branch. I built the system with CUDA7.0 support on Linux Ubuntu 14.04.
The errors are all the same:
C++ exception with description "/home/spu/Documents/github/opencv_CUDA_2.4/modules/gpu/src/filtering.cpp:368: error: (-217) NPP_CUDA_KERNEL_EXECUTION_ERROR [Code = -1000] in function operator()
" thrown in the test body.
[ FAILED ] GPU_Filter/Blur.Accuracy/1, where GetParam() = (Quadro K2000, 128x128, 8UC1, KSize(3x3), Anchor([-1, -1]), sub matrix) (165 ms)
I have no idea yet what the reason is, but I will take a look at it. NVIDIA Quadro K2000 cards are used here.
3 | No.3 Revision |
I just did a build test for both OpenCV2.4 latest branch and OpenCV3 latest branch. I built the system with CUDA7.0 support on Linux Ubuntu 14.04.
The errors are all the same:
C++ exception with description "/home/spu/Documents/github/opencv_CUDA_2.4/modules/gpu/src/filtering.cpp:368: error: (-217) NPP_CUDA_KERNEL_EXECUTION_ERROR [Code = -1000] in function operator()
" thrown in the test body.
[ FAILED ] GPU_Filter/Blur.Accuracy/1, where GetParam() = (Quadro K2000, 128x128, 8UC1, KSize(3x3), Anchor([-1, -1]), sub matrix) (165 ms)
I have no idea yet what the reason is, but I will take a look at it. NVIDIA Quadro K2000 cards are used here.
More detailed card info:
spu@TOBCAT:~/Documents/github/opencv_CUDA_2.4/build/bin$ ./opencv_test_gpu
[----------]
[ GPU INFO ] Run on OS Linux x64.
[----------]
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***
Device count: 2
Device 0: "Quadro K2000"
CUDA Driver Version / Runtime Version 7.50 / 7.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2047 MBytes (2145927168 bytes)
( 2) Multiprocessors x (192) CUDA Cores/MP: 384 CUDA Cores
GPU Clock Speed: 0.95 GHz
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 3 / 0
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.50, CUDA Runtime Version = 7.0, NumDevs = 2
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***
Device count: 2
Device 1: "Quadro K2000"
CUDA Driver Version / Runtime Version 7.50 / 7.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 2) Multiprocessors x (192) CUDA Cores/MP: 384 CUDA Cores
GPU Clock Speed: 0.95 GHz
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 129 / 0
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.50, CUDA Runtime Version = 7.0, NumDevs = 2
4 | No.4 Revision |
I just did a build test for both OpenCV2.4 latest branch and OpenCV3 latest branch. I built the system with CUDA7.0 support on Linux Ubuntu 14.04.
The errors are all the same:
C++ exception with description "/home/spu/Documents/github/opencv_CUDA_2.4/modules/gpu/src/filtering.cpp:368: error: (-217) NPP_CUDA_KERNEL_EXECUTION_ERROR [Code = -1000] in function operator()
" thrown in the test body.
[ FAILED ] GPU_Filter/Blur.Accuracy/1, where GetParam() = (Quadro K2000, 128x128, 8UC1, KSize(3x3), Anchor([-1, -1]), sub matrix) (165 ms)
I have no idea yet what the reason is, but I will take a look at it. NVIDIA Quadro K2000 cards are used here.
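To narrow this down, it may help to rerun only the failing blur tests and keep their full output. The sketch below assumes it is run from the build/bin directory that holds opencv_test_gpu; the helper name run_filtered and the log file name blur_accuracy.log are made up for illustration. --gtest_filter is the standard Google Test option for selecting tests by name.

```shell
# run_filtered BIN FILTER LOG: run one group of tests and save all output to LOG
run_filtered() {
    "$1" --gtest_filter="$2" > "$3" 2>&1
}

# Assumption: the test binary sits in the current directory (build/bin).
TEST_BIN="${TEST_BIN:-./opencv_test_gpu}"

if [ -x "$TEST_BIN" ]; then
    # The filter matches every parameter combination of the failing blur test.
    run_filtered "$TEST_BIN" 'GPU_Filter/Blur.Accuracy/*' blur_accuracy.log \
        || echo "some tests failed, see blur_accuracy.log"
    # Show the first failure that was logged, if any.
    grep -n -m 1 'FAILED' blur_accuracy.log || true
else
    echo "$TEST_BIN not found or not executable" >&2
fi
```

Running a single test group this way keeps the log short enough to see whether every Blur.Accuracy case fails or only the sub matrix variants.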
UPDATE
It seems that the first error that occurs, when piping the output to a file, is
[ RUN ] GPU_ImgProc/Canny.Accuracy/0
/home/spu/Documents/github/opencv_CUDA_2.4/modules/gpu/test/test_imgproc.cpp:331: Failure
Value of: img.empty()
Actual: true
Expected: false
This is not that bad, because the test just states that it cannot find the testing data. That is fine, but it is strange that it does not say which testing data it expects. A ton of similar errors keep popping up.
Of course, this also makes the accuracy tests on the same function fail. The same goes for HarrisCorners and other functions. After that, errors occur when loading the classifiers.
So my best guess is that you need to specify the location of the test data, but I have no idea how to do so.
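As far as I know, the OpenCV test binaries look up their input files through the OPENCV_TEST_DATA_PATH environment variable, which should point at the testdata folder of the opencv_extra repository (on the branch matching the OpenCV build). A minimal sketch follows, assuming opencv_extra has already been cloned; the EXTRA_DIR location and the check_test_data helper are hypothetical and only serve as an illustration.

```shell
# Assumption: opencv_extra is checked out here; adjust EXTRA_DIR to the real location.
EXTRA_DIR="${EXTRA_DIR:-$HOME/Documents/github/opencv_extra}"

# check_test_data DIR: succeed only if DIR exists and is not empty
check_test_data() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

if check_test_data "$EXTRA_DIR/testdata"; then
    # Export the variable so that test binaries started from this shell can see it.
    export OPENCV_TEST_DATA_PATH="$EXTRA_DIR/testdata"
    echo "OPENCV_TEST_DATA_PATH=$OPENCV_TEST_DATA_PATH"
else
    echo "test data not found in $EXTRA_DIR/testdata" >&2
fi
```

With the variable exported, starting ./opencv_test_gpu from the same shell should let tests such as Canny.Accuracy find their images. Whether this also resolves the NPP kernel execution errors is a separate question.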