I just ran a test, though with OpenCV 3.0, and all CUDA module tests seem to pass. I will do a new setup with the latest 2.4 branch and get back to you.

I just did a build test for both the latest OpenCV 2.4 branch and the latest OpenCV 3 branch. I built the system with CUDA 7.0 support on Linux Ubuntu 14.04.

  • For 3.0, all tests run fine.
  • For 2.4.11, the tests start out fine, but then the run crashes partway through with failed tests.

The errors are all the same:

C++ exception with description "/home/spu/Documents/github/opencv_CUDA_2.4/modules/gpu/src/filtering.cpp:368: error: (-217) NPP_CUDA_KERNEL_EXECUTION_ERROR [Code = -1000] in function operator()
" thrown in the test body.
[  FAILED  ] GPU_Filter/Blur.Accuracy/1, where GetParam() = (Quadro K2000, 128x128, 8UC1, KSize(3x3), Anchor([-1, -1]), sub matrix) (165 ms)

No idea yet what the reason is; I will take a look at it. NVIDIA Quadro K2000 cards are used here.
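To check whether the errors really are all the same, the test output can be saved to a file and summarized. The following is only a rough sketch in Python, not part of OpenCV: the helper names `failed_tests` and `error_kinds` are made up for illustration, and the patterns assume the Google Test and OpenCV message formats shown in the log above.

```python
import re
from collections import Counter

# Matches "[  FAILED  ] Suite.Test/N, where ..." but not the summary line
# "[  FAILED  ] 12 tests, listed below:", because a test name contains a dot.
FAILED_RE = re.compile(r"^\[\s*FAILED\s*\]\s+([\w/]+\.[\w/]+)")

# Matches the OpenCV part of the exception text, e.g.
# "error: (-217) NPP_CUDA_KERNEL_EXECUTION_ERROR".
ERROR_RE = re.compile(r"error: \((-?\d+)\)\s+(\w+)")


def failed_tests(log_text):
    """Return failed test names in order of first appearance, without repeats."""
    names = []
    for line in log_text.splitlines():
        match = FAILED_RE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def error_kinds(log_text):
    """Count how often each (error code, error name) pair occurs in the log."""
    return Counter(ERROR_RE.findall(log_text))
```

If `error_kinds` returns a single entry, the failures likely share one cause. A single failing case can then be rerun on its own with the standard Google Test filter option, for example `./opencv_test_gpu --gtest_filter=GPU_Filter/Blur.Accuracy/1`.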

More detailed card info

spu@TOBCAT:~/Documents/github/opencv_CUDA_2.4/build/bin$ ./opencv_test_gpu 
[----------]
[ GPU INFO ]    Run on OS Linux x64.
[----------]
*** CUDA Device Query (Runtime API) version (CUDART static linking) *** 

Device count: 2

Device 0: "Quadro K2000"
  CUDA Driver Version / Runtime Version          7.50 / 7.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2047 MBytes (2145927168 bytes)
  ( 2) Multiprocessors x (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Clock Speed:                               0.95 GHz
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           3 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) 

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 7.50, CUDA Runtime Version = 7.0, NumDevs = 2

*** CUDA Device Query (Runtime API) version (CUDART static linking) *** 

Device count: 2

Device 1: "Quadro K2000"
  CUDA Driver Version / Runtime Version          7.50 / 7.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2048 MBytes (2147287040 bytes)
  ( 2) Multiprocessors x (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Clock Speed:                               0.95 GHz
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           129 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) 

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 7.50, CUDA Runtime Version = 7.0, NumDevs = 2

UPDATE

When piping the output to a file, the first error that occurs seems to be:

[ RUN      ] GPU_ImgProc/Canny.Accuracy/0
/home/spu/Documents/github/opencv_CUDA_2.4/modules/gpu/test/test_imgproc.cpp:331: Failure
Value of: img.empty()
  Actual: true
Expected: false

This is not that bad, because the test just states that it cannot find the test data (the loaded image is empty). That is fine, but it is strange that it does not say which test data it expects. A ton of similar errors keep popping up.

Of course, this makes the accuracy tests on the same function fail as well. The same goes for HarrisCorners and other functions. After that, errors occur when loading classifiers.

So my best guess is that you need to specify the location of the test data, but I have no idea how to do so.
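As far as I know, the OpenCV test binaries look up their data through the `OPENCV_TEST_DATA_PATH` environment variable, and the data itself lives in the `testdata` folder of the separate opencv_extra repository. A minimal sketch of preparing that environment from Python follows. The helper name `env_with_test_data` and the example path are hypothetical, and I have not verified this against the 2.4 build above.

```python
import os


def env_with_test_data(testdata_dir, base_env=None):
    """Return a copy of the environment with OPENCV_TEST_DATA_PATH set.

    testdata_dir is assumed to be the 'testdata' folder of an opencv_extra
    checkout that matches the OpenCV branch under test.
    """
    if not os.path.isdir(testdata_dir):
        raise FileNotFoundError("test data folder not found: " + testdata_dir)
    env = dict(os.environ if base_env is None else base_env)
    env["OPENCV_TEST_DATA_PATH"] = os.path.abspath(testdata_dir)
    return env


# Example use (paths are placeholders, adjust them to your setup):
#   import subprocess
#   env = env_with_test_data("/path/to/opencv_extra/testdata")
#   subprocess.run(["./opencv_test_gpu", "--gtest_filter=GPU_ImgProc/Canny*"], env=env)
```

Exporting the same variable in the shell before running `./opencv_test_gpu` should have the same effect. If the Canny test then finds its image, the remaining failures are worth a separate look.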