dnn efficiency of mobilenet-ssd

I am using the OpenCV dnn module to run a MobileNet-SSD Caffe model (300x300 input, 20 classes) on Windows 7 with Visual Studio 2015. Does anyone know what performance should be expected on Windows 7? According to this page, a single forward pass takes approximately 23 ms on Linux. On my computer, however, a single forward pass takes about 180 ms, which seems too slow. My CPU is an Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz.
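A timing figure such as "180 ms per forward pass" depends heavily on how it is measured. The sketch below shows one possible way to take the measurement: a few untimed warm-up calls, then the median of several timed calls. The helper name `medianMillis` and its parameters are hypothetical and are not part of OpenCV; the callable stands in for whatever runs the forward pass.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV API): calls `fn` `warmup` times untimed,
// then `runs` times timed, and returns the median wall-clock time in milliseconds.
template <typename Fn>
double medianMillis(Fn&& fn, std::size_t warmup, std::size_t runs)
{
    assert(runs > 0);

    // The first calls may include one-time setup cost, so they are discarded.
    for (std::size_t i = 0; i < warmup; ++i)
        fn();

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    // Median via partial sort (upper median when the sample count is even).
    const std::size_t mid = samples.size() / 2;
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    return samples[mid];
}
```

With OpenCV, assuming `net` is a `cv::dnn::Net` whose input has already been set with `net.setInput(blob)`, this could be called as `medianMillis([&] { net.forward(); }, 3, 13)`. The numbers are only comparable to published ones when taken from a Release build.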

When I built OpenCV, OpenCL was disabled and MKL was provided:

  Other third-party libraries:
    Intel IPP:                   2017.0.3 [2017.0.3]
           at:                   C:/OpenCV/opencv_build_4.0.0/3rdparty/ippicv/ippicv_win
    Intel IPP IW:                sources (2017.0.3)
              at:                C:/OpenCV/opencv_build_4.0.0/3rdparty/ippicv/ippiw_win
    Lapack:                      YES (C:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64/mkl_intel_lp64.lib C:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64/mkl_sequential.lib C:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64/mkl_core.lib)

However, MKL did not seem to accelerate dnn at all in my test.

According to this link, people tried to use MKL to replace cv::gemm in dnn, and this was fixed in opencv_contrib. Since dnn was promoted to the main repository, the source code seems to have changed a lot. So does the current OpenCV support using MKL in dnn? If not, does anyone know how to change the current source code to use MKL to accelerate dnn, as with the tricks in the link? Thanks.

After I added TBB, the time dropped to ~110 ms, which is still far from the published ~30 ms. Results of opencv_perf_dnn.exe for GoogLeNet and MobileNet SSD:

[ RUN      ] DNNTestNetwork.GoogLeNet/0, where GetParam() = OCV/CPU
Memory consumption:
    Weights(parameters): 27 Mb
    Blobs: 44 Mb
Calculation complexity: 3.19044 GFlops
[ PERFSTAT ]    (samples=13   mean=83.71   median=83.19   min=82.46   stddev=1.63 (1.9%))
[       OK ] DNNTestNetwork.GoogLeNet/0 (1312 ms)
[ RUN      ] DNNTestNetwork.MobileNet_SSD_Caffe/0, where GetParam() = OCV/CPU
Memory consumption:
    Weights(parameters): 23 Mb
    Blobs: 73 Mb
Calculation complexity: 18.1839 GFlops
[ PERFSTAT ]    (samples=13   mean=110.29   median=109.63   min=109.19   stddev=1.38 (1.3%))

Is my CPU just too slow, or am I doing something wrong here?
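One rough way to compare these results across machines is to convert them into an effective throughput: the reported calculation complexity divided by the time per forward pass. The small sketch below does that arithmetic. It assumes the PERFSTAT values are in milliseconds and that "GFlops" in the log is the operation count per forward pass; the function name `effectiveGflopsPerSec` is made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: effective throughput in GFLOP/s.
// gflopsPerPass: reported "Calculation complexity" (operations per forward pass, in GFLOPs).
// millisPerPass: measured time per forward pass, assumed to be in milliseconds.
double effectiveGflopsPerSec(double gflopsPerPass, double millisPerPass)
{
    assert(millisPerPass > 0.0);
    // Convert milliseconds to seconds before dividing.
    return gflopsPerPass / (millisPerPass / 1000.0);
}
```

Using the median values from the log above, `effectiveGflopsPerSec(3.19044, 83.19)` gives roughly 38 GFLOP/s for GoogLeNet and `effectiveGflopsPerSec(18.1839, 109.63)` gives roughly 166 GFLOP/s for MobileNet SSD, under the stated assumptions about units.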

Build information:

General configuration for OpenCV 4.0.0-pre =====================================
  Version control:               3.4.3-322-g808ba552c

  Extra modules:
    Location (extra):            C:/OpenCV/opencv_contrib/modules
    Version control (extra):     4.0.0-alpha-9-gf9eaef9f-dirty

  Platform:
    Timestamp:                   2018-09-18T14:26:00Z
    Host:                        Windows 6.1.7601 AMD64
    CMake:                       3.12.1
    CMake generator:             Visual Studio 14 2015 Win64
    CMake build tool:            C:/Program Files (x86)/MSBuild/14.0/bin/MSBuild.exe
    MSVC:                        1900

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (4 files):          + SSSE3 SSE4_1
      SSE4_2 (2 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (6 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (10 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

  C/C++:
    Built as dynamic libs?:      YES
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe  (ver 19.0.24215.1)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi      /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP8   /MD /O2 /Ob2 /DNDEBUG 
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi      /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP8   /MDd /Zi /Ob0 /Od /RTC1 
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi        /MP8    /MD /O2 /Ob2 /DNDEBUG 
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi        /MP8  /MDd /Zi /Ob0 /Od /RTC1 
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO 
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL 
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 aruco bgsegm bioinspired calib3d ccalib core datasets dnn dnn_objdetect dpm face features2d flann fuzzy hfs highgui img_hash imgcodecs imgproc line_descriptor ml objdetect phase_unwrapping photo plot reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab
    Disabled:                    java_bindings_generator js python3 python_bindings_generator world xfeatures2d ximgproc xobjdetect xphoto
    Disabled by dependency:      optflow
    Unavailable:                 cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv freetype hdf java matlab ovis python2 sfm viz
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI: 
    Win32 UI:                    YES
    VTK support:                 NO

  Media I/O: 
    ZLib:                        build (ver 1.2.11)
    JPEG:                        build-libjpeg-turbo (ver 1.5.3-62)
    WEBP:                        build (ver encoder: 0x020e)
    PNG:                         build (ver 1.6.34)
    TIFF:                        build (ver 42 - 4.0.9)
    JPEG 2000:                   build (ver 1.900.1)
    OpenEXR:                     build (ver 1.7.1)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    Video for Windows:           YES
    DC1394:                      NO
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (ver 57.107.100)
      avformat:                  YES (ver 57.83.100)
      avutil:                    YES (ver 55.78.100)
      swscale:                   YES (ver 4.8.100)
      avresample:                YES (ver 3.7.0)
    GStreamer:                   NO
    DirectShow:                  YES
    Media Foundation:            YES

  Parallel framework:            TBB (ver 2018.0 interface 10004)

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2017.0.3 [2017.0.3]
           at:                   C:/OpenCV/opencv_build_4.0.0/3rdparty/ippicv/ippicv_win
    Intel IPP IW:                sources (2017.0.3)
              at:                C:/OpenCV/opencv_build_4.0.0/3rdparty/ippicv/ippiw_win
    Lapack:                      YES (C:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64/mkl_intel_lp64.lib C:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64/mkl_sequential.lib C:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64/mkl_core.lib)
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.5.1)

  Python (for build):            C:/Python27/python.exe

  Install to:                    C:/OpenCV/4.0.0
-----------------------------------------------------------------