
DNN with CUDA on OpenCV 4.2.0

asked 2019-12-27 23:04:48 -0500

kehl

My GPU is a 1080 Ti. When I test the TensorFlow model in Python, inference on one image takes about 150 ms. With OpenCV 4.0 on the CPU, inference takes about 1300 ms. I have compiled version 4.2 on Windows, but DNN inference is no faster than with version 4.0.

My CMake configuration is below:

General configuration for OpenCV 4.2.0 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            D:/sdk/opencv4.2/opencv_contrib-master/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2019-12-27T03:20:13Z
    Host:                        Windows 6.1.7601 AMD64
    CMake:                       3.16.2
    CMake generator:             Visual Studio 14 2015
    CMake build tool:            C:/Program Files (x86)/MSBuild/14.0/bin/MSBuild.exe
    MSVC:                        1900

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files):         + SSSE3 SSE4_1
      SSE4_2 (2 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (5 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (29 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

  C/C++:
    Built as dynamic libs?:      YES
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe (ver 19.0.24215.1)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP4 /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP4 /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP4 /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP4 /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64 /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64 /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          cudart_static.lib nppc.lib nppial.lib nppicc.lib nppicom.lib nppidei.lib nppif.lib nppig.lib nppim.lib nppist.lib nppisu.lib nppitc.lib npps.lib cublas.lib cufft.lib -LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/lib/x64
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc line_descriptor ml objdetect optflow phase_unwrapping photo plot python3 quality reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab world xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    -
    Disabled by dependency:      -
    Unavailable:                 cnn_3dobj cvv freetype java js matlab ovis python2 python2 sfm viz
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI: Win32 ... (more)



Build and usage instructions can be found here. There is also example code that uses OpenCV DNN CUDA for YOLOv3. Note that you have to set both the backend and the target to CUDA (see the YOLOv3 example); otherwise, the CPU is used by default.

Yashas (2019-12-29 04:17:58 -0500)
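For readers following along in Python, selecting the CUDA backend and target might look like the sketch below. The helper name `use_cuda` and its `dnn` parameter are hypothetical conveniences, not from the thread; the `setPreferableBackend` and `setPreferableTarget` calls and the `DNN_BACKEND_CUDA` and `DNN_TARGET_CUDA*` constants are the ones the OpenCV 4.2 DNN module exposes. It assumes an OpenCV build with the CUDA DNN backend enabled.

```python
def use_cuda(net, fp16=False, dnn=None):
    """Ask a cv2.dnn.Net to run on the CUDA backend (hypothetical helper).

    `dnn` defaults to the cv2.dnn module; it is a parameter only so the
    helper can be exercised without OpenCV installed.
    """
    if dnn is None:
        import cv2  # assumes a build with the CUDA DNN backend
        dnn = cv2.dnn
    # Both calls are needed: the backend alone does not select the GPU target.
    net.setPreferableBackend(dnn.DNN_BACKEND_CUDA)
    target = dnn.DNN_TARGET_CUDA_FP16 if fp16 else dnn.DNN_TARGET_CUDA
    net.setPreferableTarget(target)
    return net


# Usage (file names are placeholders):
#   import cv2
#   net = use_cuda(cv2.dnn.readNet("model.pb", "model.pbtxt"))
```

The FP32 target is the default here; pass `fp16=True` only on cards with fast half-precision support.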

1 answer


answered 2019-12-28 02:23:32 -0500

You have not built with the DNN CUDA backend.
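The posted summary is cut off before the CUDA section, so one way to check an installed build is to look for the CUDA and cuDNN entries in the text returned by `cv2.getBuildInformation()`. The sketch below is only an illustration: the function name `cuda_build_flags` is hypothetical, and it assumes the summary contains lines labelled `NVIDIA CUDA:` and `cuDNN:` followed by `YES` or `NO`, which may differ between versions.

```python
import re


def cuda_build_flags(build_info):
    """Report whether a build summary says CUDA and cuDNN are enabled.

    Assumes lines such as "NVIDIA CUDA: YES (...)" and "cuDNN: YES (...)";
    a missing line is treated as "not enabled".
    """
    flags = {}
    for key in ("NVIDIA CUDA", "cuDNN"):
        pattern = r"^\s*" + re.escape(key) + r":\s*(\S+)"
        match = re.search(pattern, build_info, re.MULTILINE)
        flags[key] = bool(match) and match.group(1).upper() == "YES"
    return flags


# Usage:
#   import cv2
#   print(cuda_build_flags(cv2.getBuildInformation()))
```

If either entry comes back false, the CUDA DNN backend is probably not available in that build.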


Once you have built with cuDNN try setting the target to



The 1080 Ti doesn't have good half-precision throughput. DNN_TARGET_CUDA (the FP32 target) will be faster than DNN_TARGET_CUDA_FP16.

Yashas (2019-12-29 04:17:04 -0500)
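To compare the two targets on a particular card, a small timing helper along these lines may be useful. It is a generic sketch, not something from the thread: `median_latency_ms` is a hypothetical name, and the warm-up count is an assumption based on the expectation that the first few passes include one-time setup cost.

```python
import statistics
import time


def median_latency_ms(run_once, warmup=3, repeats=10):
    """Median wall-clock time of `run_once()` in milliseconds."""
    # Untimed warm-up calls, so one-time initialisation is not measured.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Usage, after net.setInput(blob) on a configured cv2.dnn.Net:
#   print(median_latency_ms(lambda: net.forward()))
```

Running it once per target gives numbers that can be compared directly with the per-image timings quoted in the question.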

Last updated: Dec 28 '19