Does OpenCV 3.4.x perform better(execution time) in release mode?
So I've built OpenCV 3.4.1 in Release mode with default flags. Does this make the build any different than if I built in Debug mode? I have a large codebase that utilizes OpenCV core libraries, and changing the build to release mode has not made much difference. I'm wondering If I've done something wrong or forgotten something during my build - perhaps some flag that should be set for enhanced performance.
Here is a log of my Release build config.
General configuration for OpenCV 3.4.1 =====================================
Version control: unknown
Platform:
Host: Linux 4.4.0-87-generic x86_64
CMake: 3.11.1
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: Release
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (3 files): + SSSE3 SSE4_1
SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (9 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
C/C++:
Built as dynamic libs?: YES
C++ Compiler: /usr/bin/c++ (ver 4.8.4)
C++ flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release):
Linker flags (Debug):
ccache: NO
Precompiled headers: YES
Extra dependencies: dl m pthread rt
3rdparty dependencies:
OpenCV modules:
To be built: calib3d core dnn features2d flann highgui imgcodecs imgproc java_bindings_generator ml objdetect photo python2 python3 python_bindings_generator shape stitching superres ts video videoio videostab viz
Disabled: js world
Disabled by dependency: -
Unavailable: cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev java
Applications: tests perf_tests apps
Documentation: NO
Non-free algorithms: NO
GUI:
GTK+: YES (ver 2.24.23)
GThread : YES (ver 2.40.2)
GtkGlExt: NO
VTK support: YES (ver 5.8.0)
Media I/O:
ZLib: /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.8)
JPEG: /usr/lib/x86_64-linux-gnu/libjpeg.so (ver )
WEBP: build (ver encoder: 0x020e)
PNG: /usr/lib ...
Actually it should have influences, because you literally remove all debug code from the build. So no more asserts, no more debug checks, ... However it highly depends on the code to say if it actually improves speed alot. Can you quantify your increase in numbers? Also install thread building blocks, TBB, which is better than pthread for optimizations.
My experience using the DNN module is that there is a very significant performance difference between debug and release on builds targeting Windows. For instance, executing TinyYolo on CPU, release mode builds seem to be 3x to 4x faster in processing a single frame. However, I see very little difference executing the same code targeting Android: debug and release performance is practically the same.
Hmm that could be due to the fact that Android is a mobile platform. I am unsure if mobile platforms handle debug and release concepts in the same way.
Besides that, if you are going to build an app for sales, the compressed binary in release will also be smaller than the debug one, and that still matters :)