Why is self-compiled OpenCV slower than the apt-get package?

asked 2017-10-13 20:47:08 -0500

ossyaritoori

updated 2017-10-16 15:50:14 -0500

Hi community.

I have two OpenCV installations: one is a self-compiled OpenCV 3.1 with CUDA, and the other was installed with sudo apt-get install ros-kinetic-opencv3.

When I run the same program against each installation, the self-compiled build is much slower than the ROS one.

Example: template matching with ORB

My OpenCV 3.1        : 150-200 ms per frame
ROS OpenCV 3.2.0-dev : 40-50 ms per frame

This was tested in Python, but my C++ code shows a similar result.
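A per-frame figure like the ones above can be collected with a small timing helper. This is only a sketch, not the code used for the measurements in the question; the helper name `time_per_call_ms` and its `repeats` and `warmup` parameters are assumptions made for illustration.

```python
import statistics
import time


def time_per_call_ms(fn, repeats=20, warmup=3):
    """Return (min, median, max) wall-clock time of fn() in milliseconds."""
    # Warm-up calls are not timed, so one-off setup cost does not skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return min(samples), statistics.median(samples), max(samples)
```

With two installations on one machine it is also worth printing `cv2.__version__` and `cv2.__file__` before timing, to confirm which build Python actually imported. A call might then look like `time_per_call_ms(lambda: orb.detectAndCompute(frame, None))`, where `orb` and `frame` are hypothetical names for an ORB detector and an input image.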

Here is the getBuildInformation() output for each version.

1. Self-compiled OpenCV

General configuration for OpenCV 3.1.0 =====================================
  Version control:               3.1.0-3-g50b7dfd-dirty

    Host:                        Linux 3.10.96-tegra aarch64
    CMake:                       3.5.1
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               RelWithDebugInfo

    Built as dynamic libs?:      YES
    C++ Compiler:                /usr/bin/c++  (ver 5.4.0)
    C++ flags (Release):         -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wno-narrowing -Wno-delete-non-virtual-dtor -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wno-narrowing -Wno-delete-non-virtual-dtor -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wno-narrowing -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wno-narrowing -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):
    Linker flags (Debug):
    Precompiled headers:         NO
    Extra dependencies:          Qt5::Test Qt5::Concurrent Qt5::OpenGL /usr/lib/aarch64-linux-gnu/libwebp.so /usr/lib/aarch64-linux-gnu/libpng.so /usr/lib/aarch64-linux-gnu/libtiff.so /usr/lib/aarch64-linux-gnu/libjasper.so /usr/lib/aarch64-linux-gnu/libjpeg.so v4l1 v4l2 avcodec-ffmpeg avformat-ffmpeg avutil-ffmpeg swscale-ffmpeg /usr/lib/aarch64-linux-gnu/libbz2.so Qt5::Core Qt5::Gui Qt5::Widgets /usr/lib/aarch64-linux-gnu/hdf5/serial/lib/libhdf5.so /usr/lib/aarch64-linux-gnu/libpthread.so /usr/lib/aarch64-linux-gnu/libsz.so /usr/lib/aarch64-linux-gnu/libz.so /usr/lib/aarch64-linux-gnu/libdl.so /usr/lib/aarch64-linux-gnu/libm.so correspondence multiview numeric glog gflags dl m pthread rt /usr/lib/aarch64-linux-gnu/libGLU.so /usr/lib/aarch64-linux-gnu/libGL.so tbb atomic cudart nppc nppi npps cufft -L/usr/local/cuda-8.0/lib64
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 cudev core cudaarithm flann hdf imgproc ml reg surface_matching video cudabgsegm cudafilters cudaimgproc cudawarping dnn fuzzy imgcodecs photo shape videoio cudacodec highgui objdetect plot ts xobjdetect xphoto bgsegm bioinspired dpm face features2d line_descriptor saliency text calib3d ccalib cudafeatures2d cudalegacy cudaobjdetect cudaoptflow cudastereo cvv datasets rgbd stereo structured_light superres tracking videostab xfeatures2d ximgproc aruco optflow sfm stitching python2
    Disabled:                    world contrib_world
    Disabled by dependency:      -
    Unavailable:                 java python3 viz matlab

    QT 5.x:                      YES (ver 5.5.1)
    QT OpenGL ...


What flags did you compile it with? Did you make sure to include all the optimizations your processor can use?

Tetragramm ( 2017-10-14 16:58:37 -0500 )

The answer should be found by calling

std::cout << cv::getBuildInformation();

(or the Python equivalent) for both builds and comparing the output.

tomasth ( 2017-10-14 17:00:51 -0500 )
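The comparison suggested above can be done by eye, but the two dumps are long. A minimal sketch of automating it, assuming each build's output has been saved as text (for example from `cv2.getBuildInformation()` in Python); the function names `parse_build_info` and `diff_build_info` are hypothetical helpers, not part of OpenCV.

```python
def parse_build_info(text):
    """Turn the 'Key:   value' lines of a build-information dump into a dict."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, so values such as 'Qt5::Test' stay intact.
        key, sep, value = line.strip().partition(":")
        if sep and key:
            # If a key appears in several sections, the last occurrence wins.
            info[key.strip()] = value.strip()
    return info


def diff_build_info(text_a, text_b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    a, b = parse_build_info(text_a), parse_build_info(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

A key that is present in only one build shows up with `None` on the other side, which makes missing features easy to spot.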


ossyaritoori ( 2017-10-14 19:06:37 -0500 )

Actually, tomasth is correct. Can you edit your question with the getBuildInformation() from both versions? There should be some obvious differences.

Also, just to check, is your processor an ARM processor? Or is it Intel or AMD?

Tetragramm ( 2017-10-15 14:44:23 -0500 )

Hi. I added my getBuildInformation() result. My machine is 64-bit ARM.

ossyaritoori ( 2017-10-16 15:52:55 -0500 )

Two things.

I notice you have OpenCL disabled. That could do it.

Secondly, the ROS build uses the NVIDIA HAL called carotene. You should see an option named WITH_CAROTENE in the WITH section of your CMake configuration. Give that a try.

Tetragramm ( 2017-10-16 17:56:10 -0500 )
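To check the two points above in a given build, the relevant lines can be pulled out of the build-information text. This is a sketch only: the helper name `find_feature_lines` and the default keyword list are assumptions, and the exact wording of the OpenCL and HAL lines may differ between OpenCV versions.

```python
def find_feature_lines(build_info, keywords=("opencl", "carotene", "custom hal")):
    """Return the stripped lines of build_info that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        line.strip()
        for line in build_info.splitlines()
        # Case-insensitive substring match against each keyword.
        if any(k in line.lower() for k in lowered)
    ]
```

Running it on the output of `cv2.getBuildInformation()` from each installation should show at a glance whether OpenCL and a custom HAL are reported for that build.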