OpenCV 4 cmake switches for best runtime performance (Linux, C++, Python)
I've been looking through the long list of OpenCV 4 compile switches generated by cmake-gui, trying to figure out the best options for fast runtime performance. That would include multithreading/multicore, math and matrix libs, GPU utilization, etc. I haven't found much in the way of guidance.
Switches include:
Atlas_,
BUILD_IPP_IW,
WITH_IPP,
BUILD_WITH_DYNAMIC_IPP,
WITH_ITT,
BUILD_TBB,
WITH_TBB,
various *BLAS switches,
various LAPACK switches,
various OPENCL switches,
MKL_,
OPENMP,
*PTHREADS,
WITH_CUDA,
whatever other Intel libraries,
[probably more that I'm overlooking]
Can anyone provide leads toward some kind of strategy for setting the compile switches? I'm using OpenCV with C++ and Python 3 under Linux, with Qt. I'll be using the Contrib libs, and CUDA (for DNN/neural net apps). I'll need to debug code that I'm writing but probably don't need debugging info for OpenCV itself.
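For reference, the setup described above could be written down as a scripted cmake invocation, so the build can be repeated later without clicking through cmake-gui again. This is only a rough sketch: the two paths are hypothetical placeholders, the helper name `cmake_command` is made up for illustration, and the chosen switches are a starting point under the stated requirements (Contrib, Qt, CUDA, Python 3, Release build of OpenCV itself), not a tuned recommendation.

```python
import shlex

# Hypothetical locations of the source checkouts; adjust to your layout.
OPENCV_SRC = "../opencv"
CONTRIB_MODULES = "../opencv_contrib/modules"

# Starting set of switches for the setup described in the question.
FLAGS = {
    "CMAKE_BUILD_TYPE": "Release",                 # optimized OpenCV, no debug info for the library
    "OPENCV_EXTRA_MODULES_PATH": CONTRIB_MODULES,  # pull in the Contrib modules
    "WITH_QT": "ON",
    "WITH_CUDA": "ON",
    "WITH_TBB": "ON",
    "WITH_IPP": "ON",
    "WITH_OPENCL": "ON",
    "BUILD_opencv_python3": "ON",
}


def cmake_command(flags, source_dir):
    """Return the cmake argument list for the given -D switches."""
    args = ["cmake"]
    for name, value in flags.items():
        args.append(f"-D{name}={value}")
    args.append(source_dir)
    return args


if __name__ == "__main__":
    # Print a copy-pasteable command line, to be run from an empty build directory.
    print(" ".join(shlex.quote(a) for a in cmake_command(FLAGS, OPENCV_SRC)))
```

Keeping the switches in one dictionary makes it easy to toggle a single option and rebuild, then compare runtimes.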
No comments yet. OK, maybe that was too specific. How about any comments on optimizations or libraries in general? I'm just trying to get a start on which of the flags could be important.
OpenCV DNN does not use CUDA for the moment. There is a PR for that.
Use OpenBLAS or Intel MKL to accelerate the basic matrix algebra routines. Use Intel OpenVINO to accelerate DNN on Intel hardware. IPP is automatically integrated in OpenCV.
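One way to see which of these libraries actually ended up in a given build is to look at the text returned by `cv2.getBuildInformation()`. The sketch below filters that text for a few lines of interest. The helper name `backend_report` is made up, and the label strings ("Parallel framework", "Lapack", and so on) are assumptions based on typical build summaries; they may differ between OpenCV versions, so adjust them to what your build prints.

```python
def backend_report(build_info,
                   keys=("Parallel framework", "Lapack", "Intel IPP",
                         "NVIDIA CUDA", "OpenCL")):
    """Pick the 'label: value' lines named in `keys` out of a build summary."""
    report = {}
    for line in build_info.splitlines():
        # Split at the first colon only, so values containing ':' stay intact.
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name in keys:
            report[name] = value.strip()
    return report


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable in this environment")
    else:
        for name, value in backend_report(cv2.getBuildInformation()).items():
            print(f"{name}: {value}")
        # Runtime settings that also affect performance.
        print("threads:", cv2.getNumThreads())
        print("optimized code paths enabled:", cv2.useOptimized())
```

Running this after each rebuild gives a quick check that a switch set in cmake really took effect, for example that the parallel framework line shows TBB rather than a fallback.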
Your experience writing optimized code will matter most.
Thanks, Eduardo. Are OpenBLAS and MKL mutually exclusive? I was tempted to just compile with all the obvious options turned on. But I'd like to make sure this wouldn't create more problems than it would solve.
I appreciate your comment about writing optimized code. But I'll have time to worry about that as code is developed. I want to make sure OpenCV 4 is built correctly. In the past, the build process (for v3.x) was so tedious that I was reluctant to revisit later to attempt fixes to OpenCV itself (like adding in the Contrib libs).
Re OpenVINO: My impression was that it installed its own version of OpenCV...? That could solve problems, but I'll need to make sure it includes the Contrib libs, etc.
This article may be useful: "Accelerating OpenCV 4 – build with CUDA 10.0, Intel MKL + TBB"