DNN with CUDA on 4.2.0

My GPU is a 1080 Ti. When I test the TensorFlow model in Python, inference on one image takes about 150 ms. Detection with OpenCV 4.0 on the CPU takes about 1300 ms per image. I have compiled version 4.2 on Windows, but dnn inference is no faster than with version 4.0.0.
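To make the timings above comparable between builds, this is roughly how a per-image latency can be measured. It is only a sketch: `mean_latency_ms` and its parameters are names made up for illustration, and the warm-up count is an assumption (the first few GPU calls usually include one-time initialization, so they are excluded here).

```python
import time
from statistics import mean


def mean_latency_ms(run_once, warmup=3, runs=10, clock=time.perf_counter):
    """Return the mean wall-clock time of run_once() in milliseconds.

    run_once: zero-argument callable that performs one inference.
    warmup:   untimed calls made first (hypothetical default of 3).
    runs:     number of timed calls that are averaged.
    clock:    time source in seconds; replaceable for testing.
    """
    # Untimed warm-up calls, so one-time setup cost is not measured.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = clock()
        run_once()
        samples.append((clock() - start) * 1000.0)  # seconds -> milliseconds
    return mean(samples)


if __name__ == "__main__":
    # Stand-in workload; in a real test this would be one forward pass.
    print(mean_latency_ms(lambda: sum(range(100000))))
```

With the dnn module, `run_once` would wrap `net.setInput(blob)` followed by `net.forward()`, after calling `net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)` and `net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)` on the loaded network. Whether those two calls take effect depends on how the library was built.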

My CMake configuration is below:

General configuration for OpenCV 4.2.0 =====================================

Version control: unknown

Extra modules:
  Location (extra): D:/sdk/opencv4.2/opencv_contrib-master/modules
  Version control (extra): unknown

Platform:
  Timestamp: 2019-12-27T03:20:13Z
  Host: Windows 6.1.7601 AMD64
  CMake: 3.16.2
  CMake generator: Visual Studio 14 2015
  CMake build tool: C:/Program Files (x86)/MSBuild/14.0/bin/MSBuild.exe
  MSVC: 1900

CPU/HW features:
  Baseline: SSE SSE2 SSE3
    requested: SSE3
  Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2
    requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
    SSE4_1 (16 files): + SSSE3 SSE4_1
    SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
    FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
    AVX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
    AVX2 (29 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

C/C++:
  Built as dynamic libs?: YES
  C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe (ver 19.0.24215.1)
  C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP4 /MD /O2 /Ob2 /DNDEBUG
  C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP4 /MDd /Zi /Ob0 /Od /RTC1
  C Compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
  C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP4 /MD /O2 /Ob2 /DNDEBUG
  C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /MP4 /MDd /Zi /Ob0 /Od /RTC1
  Linker flags (Release): /machine:x64 /INCREMENTAL:NO
  Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
  ccache: NO
  Precompiled headers: NO
  Extra dependencies: cudart_static.lib nppc.lib nppial.lib nppicc.lib nppicom.lib nppidei.lib nppif.lib nppig.lib nppim.lib nppist.lib nppisu.lib nppitc.lib npps.lib cublas.lib cufft.lib -LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/lib/x64
  3rdparty dependencies:

OpenCV modules:
  To be built: aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc line_descriptor ml objdetect optflow phase_unwrapping photo plot python3 quality reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab world xfeatures2d ximgproc xobjdetect xphoto
  Disabled: -
  Disabled by dependency: -
  Unavailable: cnn_3dobj cvv freetype java js matlab ovis python2 python2 sfm viz
  Applications: tests perf_tests apps
  Documentation: NO
  Non-free algorithms: NO

Windows RT support: NO

GUI:
  Win32 UI: YES
  VTK support: NO

Media I/O:
  ZLib: build (ver 1.2.11)
  JPEG: build-libjpeg-turbo (ver 2.0.2-62)
  WEBP: build (ver encoder: 0x020e)
  PNG: build (ver 1.6.37)
  TIFF: build (ver 42 - 4.0.10)
  JPEG 2000: build (ver 1.900.1)
  OpenEXR: build (ver 2.3.0)
  HDR: YES
  SUNRASTER: YES
  PXM: YES
  PFM: YES

Video I/O:
  DC1394: NO
  FFMPEG: YES (prebuilt binaries)
    avcodec: YES (58.54.100)
    avformat: YES (58.29.100)
    avutil: YES (56.31.100)
    swscale: YES (5.5.100)
    avresample: YES (4.0.0)
  GStreamer: NO
  DirectShow: YES
  Media Foundation: YES
    DXVA: NO

Parallel framework: Concurrency

Trace: YES (with Intel ITT)

Other third-party libraries:
  Intel IPP: 2019.0.0 Gold [2019.0.0]
    at: D:/sdk/opencv4.2/opencv/build64/3rdparty/ippicv/ippicv_win/icv
  Intel IPP IW: sources (2019.0.0)
    at: D:/sdk/opencv4.2/opencv/build64/3rdparty/ippicv/ippicv_win/iw
  Lapack: NO
  Eigen: NO
  Custom HAL: NO
  Protobuf: build (3.5.1)

NVIDIA CUDA: YES (ver 10.2, CUFFT CUBLAS)
  NVIDIA GPU arch: 30 35 37 50 52 60 61 70 75
  NVIDIA PTX archs:

cuDNN: NO

OpenCL: YES (NVD3D11)
  Include path: D:/sdk/opencv4.2/opencv/sources/3rdparty/include/opencl/1.2
  Link libraries: Dynamic load

Python 3:
  Interpreter: C:/Program Files/Anaconda3/python.exe (ver 3.5.2)
  Libraries: C:/Program Files/Anaconda3/libs/python35.lib (ver 3.5.2)
  numpy: C:/Program Files/Anaconda3/lib/site-packages/numpy/core/include (ver 1.11.1)
  install path: C:/Program Files/Anaconda3/Lib/site-packages/cv2/python-3.5

Python (for build): C:/Program Files/Anaconda3/python.exe

Java:
  ant: NO
  JNI: NO
  Java wrappers: NO
  Java tests: NO

Install to: D:/sdk/opencv4.2/opencv/build64/install

Configuring done
Generating done
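The `NVIDIA CUDA` and `cuDNN` entries in the summary above look like the ones that matter here. As far as I understand, the CUDA backend of the dnn module in 4.2 is only built when cuDNN is found, so `cuDNN: NO` would mean the module still runs on the CPU. Below is a small sketch for pulling those two entries out of a build summary; `cuda_dnn_flags` is a hypothetical helper name, and the sample string is copied from the summary above.

```python
import re


def cuda_dnn_flags(build_info):
    """Extract the CUDA and cuDNN status words from an OpenCV build summary.

    Returns a dict with keys 'cuda' and 'cudnn'; a value is the first word
    after the label (e.g. 'YES' or 'NO'), or None if the label is missing.
    """
    patterns = (
        ("cuda", r"NVIDIA CUDA:\s*(\w+)"),
        ("cudnn", r"cuDNN:\s*(\w+)"),
    )
    flags = {}
    for key, pattern in patterns:
        match = re.search(pattern, build_info)
        flags[key] = match.group(1) if match else None
    return flags


# Sample text taken from the configuration summary in this post.
summary = """
NVIDIA CUDA: YES (ver 10.2, CUFFT CUBLAS)
cuDNN: NO
"""
flags = cuda_dnn_flags(summary)
print(flags)  # {'cuda': 'YES', 'cudnn': 'NO'}
```

On an installed build the same text can be obtained from `cv2.getBuildInformation()` and passed to the helper instead of the sample string.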