OpenCV_Perf_GPU.exe crashes the GPU (Resolved)

As the OpenCV 2.4.9 documentation suggests, I recompiled OpenCV with CUDA enabled.

The build succeeded, and all related unit tests passed when I ran opencv_test_gpu.exe. However, running opencv_perf_gpu.exe crashed my GPU. The test below is the one that triggers the crash, and every test after it also fails. Is this something I can fix on my end, or is it a bug in OpenCV? My system information is listed below.

The first failed test:

[ RUN      ] Sz_Depth_Cn_WinSz_BlockSz_Denoising_NonLocalMeans.Denoising_NonLocalMeans/3
..\..\..\sources\modules\ts\src\ts_perf.cpp(1367): error: Failed
Expected: PerfTestBody() doesn't throw an exception.
  Actual: it throws cv::Exception:
  D:/mywork/dev/opencv-2.4.9/sources/modules/gpu/src/cuda/nlm.cu:154: error: (-217) unknown error

params    = (1280x720, CV_8U, BGR, 21, 5)
termination reason:  unhandled exception
bytesIn   =    2764800
bytesOut  =          0
samples   =          1 of 100
outliers  =          0
frequency =    3312880
min       =    6789452 = 2049.41ms
median    =    6789452 = 2049.41ms
gmean     =    6789452 = 2049.41ms
gstddev   = 0.00000000 = 0.00ms for 97% dispersion interval
mean      =    6789452 = 2049.41ms
stddev    =          0 = 0.00ms
[  FAILED  ] Sz_Depth_Cn_WinSz_BlockSz_Denoising_NonLocalMeans.Denoising_NonLocalMeans/3, where GetParam() = (1280x720, CV_8U, BGR, 21, 5) (3043 ms)
[----------] 4 tests from Sz_Depth_Cn_WinSz_BlockSz_Denoising_NonLocalMeans (156200 ms total)
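The timing lines in this log are raw timer ticks, so dividing them by `frequency` gives wall-clock time. The minimal Python sketch below redoes that arithmetic using only the numbers printed above. It shows that the single sample ran for just over two seconds before the exception, and that `bytesIn` is exactly one 1280x720 three-channel 8-bit frame. Comparing the result with a 2-second Windows GPU watchdog (TDR) timeout is my assumption, based on the TdrDelay default when the value is not set. The log itself does not state it.

```python
# Numbers copied from the perf log of the failing test.
FREQUENCY = 3312880      # "frequency": timer ticks per second
SAMPLE_TICKS = 6789452   # "min"/"median"/"mean": the only sample collected
WIDTH, HEIGHT, CHANNELS = 1280, 720, 3  # params: 1280x720, CV_8U, BGR

# Assumption: Windows' default TDR timeout (seconds) when TdrDelay is not set.
DEFAULT_TDR_DELAY_S = 2.0


def ticks_to_ms(ticks: int, frequency: int) -> float:
    """Convert raw perf-timer ticks to milliseconds."""
    return ticks * 1000.0 / frequency


sample_ms = ticks_to_ms(SAMPLE_TICKS, FREQUENCY)
bytes_in = WIDTH * HEIGHT * CHANNELS  # CV_8U: one byte per channel per pixel

print(f"sample time: {sample_ms:.2f} ms")  # matches the log's 2049.41ms
print(f"bytesIn: {bytes_in}")              # matches the log's 2764800
print("longer than default TDR timeout:", sample_ms / 1000.0 > DEFAULT_TDR_DELAY_S)
```

The measured time includes more than the kernel itself, so this is only a rough consistency check. It fits with "Run time limit on kernels: Yes" in the device query below.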

System Info

[----------]
[   INFO   ]    Implementation variant: cuda.
[----------]
[----------]
[ GPU INFO ]    Run test suite on GeForce GTX 660 GPU.
[----------]
Time compensation is 0
OpenCV version: 2.4.9
OpenCV VCS version: unknown
Build type: release
Parallel framework: tbb
CPU features: sse sse2 sse3 ssse3 sse4.1 sse4.2 avx
[----------]
[ GPU INFO ]    Run on OS Windows x32.
[----------]
*** CUDA Device Query (Runtime API) version (CUDART static linking) *** 

Device count: 1

Device 0: "GeForce GTX 660"
  CUDA Driver Version / Runtime Version          5.50 / 5.50
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2048 MBytes (2147483648 bytes)
  ( 5) Multiprocessors x (192) CUDA Cores/MP:     960 CUDA Cores
  GPU Clock Speed:                               1.11 GHz
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) 

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 5.50, CUDA Runtime Version = 5.50, NumDevs = 1

UPDATE

I found my answer here. For some reason, the 32-bit Windows steps worked for me, even though my OS is the 64-bit version. Here is what worked:

1. Exit all Windows-based programs.
2. Click Start, type regedit in the Search box, and then double-click regedit.exe in the results. If you are prompted for an administrator password or confirmation, type the password or provide confirmation.
3. Browse to and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
4. On the Edit menu, click New, and then select the following registry value from the drop-down menu for your version of Windows (32-bit or 64-bit):

For 32-bit Windows

a. Select DWORD (32-bit) Value.
b. Type TdrDelay as the Name and press Enter.
c. Double-click TdrDelay, enter 8 as the Value data, and click OK.

For 64-bit Windows

a. Select QWORD (64-bit) Value.
b. Type TdrDelay as the Name and press Enter.
c. Double-click TdrDelay, enter 8 as the Value data, and click OK.

5. Close the registry editor and then restart your computer for the changes to take effect.
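Instead of clicking through regedit, you can import the same setting from a .reg file. The Python sketch below only builds the file text for the DWORD variant, because that is the one that worked here. The helper name `tdr_delay_reg` is hypothetical, and writing a .reg file is a swapped-in technique rather than part of the steps above. Importing the file still requires administrator rights and a reboot.

```python
"""Build .reg file text that sets TdrDelay, mirroring the manual regedit steps.

Assumption: TdrDelay is written as a REG_DWORD (the variant that worked for
the author), and its value is interpreted by Windows as seconds.
"""

TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_reg(seconds: int) -> str:
    """Return .reg file text that writes TdrDelay as a REG_DWORD (hypothetical helper)."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("TdrDelay must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        # .reg syntax: DWORD values are written as 8 lowercase hex digits
        f'"TdrDelay"=dword:{seconds:08x}',
        "",
    ]
    # .reg files conventionally use CRLF line endings
    return "\r\n".join(lines)
```

Usage: write `tdr_delay_reg(8)` to a file such as `tdr_delay.reg` (for example with `open(path, "w", encoding="utf-16", newline="")`), double-click it to merge it into the registry, and then restart the computer.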