You need to compile OpenCV from source with flags such as TBB enabled to get the best multithreading performance. There are many other flags you can set; I suggest opening the project in the CMake GUI and browsing them. If execution time matters, compile in RELEASE mode rather than DEBUG mode: the resulting binary is more compact and runs much faster than a DEBUG build. Enabling CUDA is also a good choice if you have an NVIDIA GPU. If you don't need debugging, I suggest this:
cmake -D WITH_TBB=ON -D WITH_OPENMP=ON -D WITH_IPP=ON -D CMAKE_BUILD_TYPE=RELEASE -D BUILD_EXAMPLES=OFF -D WITH_NVCUVID=ON -D WITH_CUDA=ON -D BUILD_DOCS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_TESTS=OFF -D WITH_CSTRIPES=ON -D WITH_OPENCL=ON -D CMAKE_INSTALL_PREFIX=/usr/local/ ..
If you need a debug build, or just want an idea of the available options, go here
P.S. A similar question, whose answer points to the same link I gave here, is this one
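After installing a custom build, it can be worth checking which parallel backend OpenCV actually picked up, since enabling a flag does not guarantee the library was found at configure time. The following is only a minimal sketch in Python: `parallel_framework` is a hypothetical helper name, and it assumes the build report contains a line starting with "Parallel framework:", which may differ between OpenCV versions.

```python
import re


def parallel_framework(build_info):
    """Return the text after 'Parallel framework:' in an OpenCV build report.

    Returns None when no such line is present (assumption: the report
    uses this exact label, which may vary between OpenCV versions).
    """
    match = re.search(
        r"^[ \t]*Parallel framework:[ \t]*(.+?)[ \t]*$",
        build_info,
        re.MULTILINE,
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import cv2  # only needed for the live check below
    except ImportError:
        print("cv2 is not importable; nothing to inspect")
    else:
        # getBuildInformation() returns the configure-time report as a string.
        print("Parallel framework:", parallel_framework(cv2.getBuildInformation()))
        # Number of threads OpenCV will use for its parallel regions.
        print("Threads:", cv2.getNumThreads())
```

If the reported framework is not the one you enabled, rerun CMake and look at its configure output to see whether the library was detected.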
A lot of things in OpenCV are parallelized by default if you use, for example, the forEach member function. Many of the function implementations are parallelized as well, so it would help if you described what exactly you're trying to do.