Hi all,
when I use GPU code I sometimes get strange behaviour that is reproducible (at least on my machine).
When I run this code
cudaStream_t stream;
cudaSafeCall( cudaStreamCreate( &stream ) );
in a function, nothing bad happens; everybody is happy and cudaStreamCreate returns cudaSuccess.
As soon as I run this code
cudaStream_t stream;
cudaSafeCall( cudaStreamCreate( &stream ) );
gpu::Stream streamddd;
the second line, cudaStreamCreate(), produces a cudaErrorUnknown. Note that execution has not even reached the newly added line (gpu::Stream streamddd;) when the error occurs.
I made a debug build against OpenCV 2.4.9 using CUDA 4.2 on Visual Studio 2008 (32-bit build). I also compiled OpenCV myself, as both debug and release builds. Both worked out of the box using CMake.
When I run opencv_test_gpu from the CMake-generated OpenCV solution, my graphics card is recognized correctly:
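As a side note on the version numbers: the device query output further down shows "6.0 / 4.20" for driver and runtime. The CUDA runtime reports versions as a single integer encoded as 1000 * major + 10 * minor, so CUDA 4.2 is 4020 and the "4.20" would be consistent with the minor part being printed as value % 100. Here is a minimal sketch of the decoding, in plain C++ with no CUDA headers needed. The names CudaVersion, decodeCudaVersion and formatCudaVersion are hypothetical helpers of mine, not part of CUDA or OpenCV, and the inputs 6000 and 4020 are my assumption about the values behind that output line.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type (not part of CUDA or OpenCV).
struct CudaVersion {
    int major;
    int minor;
};

// Splits a version integer in the CUDA encoding
// (1000 * major + 10 * minor) into its two components.
// Example: 4020 -> major 4, minor 2.
CudaVersion decodeCudaVersion(int encoded) {
    CudaVersion v;
    v.major = encoded / 1000;
    v.minor = (encoded % 1000) / 10;
    return v;
}

// Formats the decoded version as "major.minor".
std::string formatCudaVersion(int encoded) {
    CudaVersion v = decodeCudaVersion(encoded);
    std::ostringstream out;
    out << v.major << "." << v.minor;
    return out.str();
}
```

With this, formatCudaVersion(4020) gives "4.2" and formatCudaVersion(6000) gives "6.0", so I read the output as driver 6.0 with runtime 4.2.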
[----------]
[ GPU INFO ] Run on OS Windows x32.
[----------]
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***
Device count: 1
Device 0: "Quadro K1100M"
CUDA Driver Version / Runtime Version 6.0 / 4.20
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147483648 bytes)
( 2) Multiprocessors x (192) CUDA Cores/MP: 384 CUDA Cores
GPU Clock Speed: 0.71 GHz
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 4.20, NumDevs = 1
Run tests on all supported devices
but many of the tests fail, for example:
[ RUN ] GPU_ImgProc/CvtColor.GRAY2BGR/8
unknown file: error: C++ exception with description "D:\compiled\OpenCV\sources\modules\dynamicuda\include\opencv2/dynamicuda/dynamicuda.hpp:1134: error: (-217) unknown error in function CudaFuncTable::mallocPitch
" thrown in the test body.
[ FAILED ] GPU_ImgProc/CvtColor.GRAY2BGR/8, where GetParam() = (Quadro K1100M, 113x113, CV_16U, whole matrix) (3235 ms)
but not all of them fail:
[ RUN ] GPU_ImgProc/CvtColor.BGR5652BGR/2
[ OK ] GPU_ImgProc/CvtColor.BGR5652BGR/2 (1 ms)
What am I missing? What is wrong in my thinking? What does OpenCV do to my GPU?
If you need any further information to decide what the problem may be, do not hesitate to ask.
Thanks in advance.
Cheers,
Willi