dnn efficiency of mobilenet-ssd [closed]

asked 2018-09-18 14:38:42 -0500

yancey

updated 2018-09-25 13:11:28 -0500

I am using OpenCV dnn to run a MobileNet-SSD 300x300, 20-class Caffe model on Windows 7 with Visual Studio 2015. Does anyone know what efficiency should be expected on Windows 7? According to this page, a single forward pass takes approximately 23 ms on Linux. On my computer, a single forward pass takes about 180 ms, which seems too slow. My CPU is an Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz.

After I added TBB, the time dropped to ~110 ms, which is still far from the published ~30 ms. Results of opencv_perf_dnn.exe for GoogLeNet and MobileNet SSD:

[ RUN      ] DNNTestNetwork.GoogLeNet/0, where GetParam() = OCV/CPU
Memory consumption:
    Weights(parameters): 27 Mb
    Blobs: 44 Mb
Calculation complexity: 3.19044 GFlops
[ PERFSTAT ]    (samples=13   mean=83.71   median=83.19   min=82.46   stddev=1.63 (1.9%))
[       OK ] DNNTestNetwork.GoogLeNet/0 (1312 ms)
[ RUN      ] DNNTestNetwork.MobileNet_SSD_Caffe/0, where GetParam() = OCV/CPU
Memory consumption:
    Weights(parameters): 23 Mb
    Blobs: 73 Mb
Calculation complexity: 18.1839 GFlops
[ PERFSTAT ]    (samples=13   mean=110.29   median=109.63   min=109.19   stddev=1.38 (1.3%))

Is my CPU simply too slow, or am I doing something wrong here?
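
For reference, a minimal sketch of how a single forward pass could be timed outside of opencv_perf_dnn.exe, reporting the same mean, median and min figures as the PERFSTAT lines. The helper names `time_forward` and `summarize`, the warm-up count, and the model file names in the usage comment are illustrative assumptions, not taken from the post.

```python
import statistics
import time


def time_forward(forward, warmup=3, runs=13):
    """Return per-call timings in milliseconds for a zero-argument callable."""
    # Warm-up calls are discarded: the first passes often include one-time
    # costs such as memory allocation and lazy initialization.
    for _ in range(warmup):
        forward()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        forward()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Summarize timings with the same fields as a PERFSTAT line."""
    return {
        "samples": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
    }


# Hypothetical usage with OpenCV (file names are placeholders):
#   import cv2
#   net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "weights.caffemodel")
#   net.setInput(blob)  # blob prepared with cv2.dnn.blobFromImage
#   print(summarize(time_forward(net.forward)))
```

Timing several runs after a warm-up and comparing medians tends to give more stable numbers than timing one isolated call.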


General configuration for OpenCV 4.0.0-pre =====================================
  Version control:               3.4.3-322-g808ba552c

  Extra modules:
    Location (extra):            C:/OpenCV/opencv_contrib/modules
    Version control (extra):     4.0.0-alpha-9-gf9eaef9f-dirty

    Timestamp:                   2018-09-18T14:26:00Z
    Host:                        Windows 6.1.7601 AMD64
    CMake:                       3.12.1
    CMake generator:             Visual Studio 14 2015 Win64
    CMake build tool:            C:/Program Files (x86)/MSBuild/14.0/bin/MSBuild.exe
    MSVC:                        1900

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (4 files):          + SSSE3 SSE4_1
      SSE4_2 (2 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (6 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (10 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

    Built as dynamic libs?:      YES
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe  (ver 19.0.24215.1)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi      /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP8   /MD /O2 /Ob2 /DNDEBUG 
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi      /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP8   /MDd /Zi /Ob0 /Od /RTC1 
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi        /MP8    /MD /O2 /Ob2 /DNDEBUG 
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi        /MP8  /MDd /Zi /Ob0 /Od /RTC1 
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO 
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL 
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be ...

Closed for the following reason: the question is answered, right answer was accepted by yancey
close date 2018-09-28 10:13:22.099715


Please specify the OpenCV version and check the build configuration (debug or release).

dkurt ( 2018-09-25 00:54:15 -0500 )

@dkurt, this is a recent clone of the master branch from GitHub, so version 4.0.0. The perf test results are from a release build.

yancey ( 2018-09-25 09:07:32 -0500 )

@yancey, please provide the CMake flags or the output of getBuildInformation.

dkurt ( 2018-09-25 11:14:54 -0500 )

@dkurt, the build information has been added.

yancey ( 2018-09-25 13:14:34 -0500 )

@yancey, could you also measure the efficiency of OpenCV with Intel's Inference Engine backend for this model?

dkurt ( 2018-09-26 00:32:29 -0500 )

@yancey, you may also try DNN_TARGET_OPENCL or DNN_TARGET_OPENCL_FP16 with both the OpenCV backend and the Inference Engine (IE) backend.

dkurt ( 2018-09-26 04:21:56 -0500 )
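
The backend and target combinations suggested here could be enumerated roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, `dnn` is expected to be the `cv2.dnn` module, DNN_TARGET_CPU is added only as a baseline, and the Inference Engine backend only works if OpenCV was built with it.

```python
from itertools import product

# Names of cv2.dnn constants; DNN_TARGET_CPU is included as a baseline.
BACKENDS = ("DNN_BACKEND_OPENCV", "DNN_BACKEND_INFERENCE_ENGINE")
TARGETS = ("DNN_TARGET_CPU", "DNN_TARGET_OPENCL", "DNN_TARGET_OPENCL_FP16")


def combinations_to_try():
    """Return every (backend, target) name pair worth benchmarking."""
    return list(product(BACKENDS, TARGETS))


def select(net, dnn, backend_name, target_name):
    """Apply one backend/target pair to a network object.

    `net` is expected to behave like cv2.dnn_Net and `dnn` like the
    cv2.dnn module, so the constants are looked up by name.
    """
    net.setPreferableBackend(getattr(dnn, backend_name))
    net.setPreferableTarget(getattr(dnn, target_name))


# Hypothetical usage:
#   import cv2
#   for backend_name, target_name in combinations_to_try():
#       select(net, cv2.dnn, backend_name, target_name)
#       net.forward()  # time this call for each combination
```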

@dkurt, I didn't try OpenVINO because I don't use Windows 10 or Visual Studio 2017. The OpenCL call fell back to the CPU because I have an Nvidia card. However, I ran the perf test on a better CPU, and the result matched the speed in the published benchmark. So there is nothing wrong with my Windows build of OpenCV. I didn't realize CPU frequency had such a big impact on dnn speed. Thanks for the help.

yancey ( 2018-09-28 10:12:41 -0500 )
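
Besides clock frequency, the instruction sets a CPU supports may also matter. The build information in the question lists what each dispatched code path requires, and a path can only run if the CPU provides all of those features. The sketch below encodes those requirements as printed in the build output; the feature set given for the Core i7 920 (a Nehalem-generation CPU, assumed here to support up to SSE4.2 with POPCNT but not AVX) is an assumption, not something reported in the thread.

```python
# Requirements per dispatched path, copied from the "CPU/HW features"
# section of the build information.
DISPATCH_REQUIREMENTS = {
    "SSE4_1": {"SSSE3", "SSE4_1"},
    "SSE4_2": {"SSSE3", "SSE4_1", "POPCNT", "SSE4_2"},
    "FP16": {"SSSE3", "SSE4_1", "POPCNT", "SSE4_2", "FP16", "AVX"},
    "AVX": {"SSSE3", "SSE4_1", "POPCNT", "SSE4_2", "AVX"},
    "AVX2": {"SSSE3", "SSE4_1", "POPCNT", "SSE4_2", "FP16", "FMA3", "AVX", "AVX2"},
}

# Assumed feature set for a Nehalem-generation CPU such as the i7 920.
NEHALEM_FEATURES = {"SSE", "SSE2", "SSE3", "SSSE3", "SSE4_1", "SSE4_2", "POPCNT"}


def usable_paths(cpu_features):
    """Return the dispatched paths whose requirements the CPU fully meets."""
    return [
        name
        for name, required in DISPATCH_REQUIREMENTS.items()
        if required <= set(cpu_features)
    ]


print(usable_paths(NEHALEM_FEATURES))
```

Under that assumption only the SSE4_1 and SSE4_2 paths would be usable on this CPU, while a newer CPU could also use the AVX and AVX2 paths. On a real machine, `cv2.checkHardwareSupport` can be used to query individual features instead of hard-coding them.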

@yancey, the results on that page are for Linux (see the configurations), so we are comparing different numbers. You can select Intel graphics with the environment variable OPENCV_OPENCL_DEVICE=Intel:GPU (see https://github.com/opencv/opencv/wiki...).

dkurt ( 2018-09-29 09:06:20 -0500 )
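
One way to apply the OPENCV_OPENCL_DEVICE setting is to pass it to a child process, so that it is in place before OpenCV initializes OpenCL. A minimal sketch, where the helper name `opencl_device_env` is hypothetical and the variable name and value come from the comment above:

```python
import os


def opencl_device_env(device="Intel:GPU", base=None):
    """Return a copy of the environment with OPENCV_OPENCL_DEVICE set."""
    # Copy so that neither os.environ nor the caller's mapping is modified.
    env = dict(os.environ if base is None else base)
    env["OPENCV_OPENCL_DEVICE"] = device
    return env


# Hypothetical usage: run the perf test with the variable set.
#   import subprocess
#   subprocess.run(["opencv_perf_dnn.exe"], env=opencl_device_env())
```

Whether an Intel GPU is actually picked up still depends on the machine having one and on its OpenCL driver being installed.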