For ARM NEON acceleration, please refer to the answer given in "open-source NEON optimizations".
To summarize, only a few functions have been accelerated for ARM NEON so far. Volunteer contributions and GitHub pull requests are welcome.
The source files for the functions in question are located at:
- cvCanny - modules\imgproc\src\canny.cpp
- cvDilate - modules\imgproc\src\morph.cpp
- cvResize - modules\imgproc\src\imgwarp.cpp
- cvtColor - modules\imgproc\src\color.cpp
Currently, these accelerations are available:
- Intel Performance Primitives (IPP)
- NVIDIA CUDA
- NVIDIA Tegra
- x86 SIMD (SSE2 and up)
Source code for NVIDIA CUDA accelerated algorithms is in a separate directory:
Source code for OpenCL accelerated algorithms is in a separate directory:
To find where each type of acceleration is used in the source, look for these preprocessor directives:
- IPP : check for HAVE_IPP and IPP_VERSION_MAJOR
- CUDA : check for HAVE_CUDA. Also check for use of the cv::GpuMat matrix type, which is capable of copying data between CPU and GPU memory.
- Tegra : check for HAVE_TEGRA_OPTIMIZATION
- SSE2 : check for CV_SSE2, and then call checkHardwareSupport(CV_CPU_SSE2). If an instruction set higher than SSE2 is used (such as SSE3, SSSE3, SSE4.1, etc.), a check must be performed for each of them, because the presence of a higher instruction set may not imply the presence of all lower instruction sets.
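The SSE2 item above describes a two-step pattern: a compile-time guard followed by a runtime check, with a plain fallback when either fails. Here is a minimal sketch of that pattern, written so that it compiles without OpenCV. The names SKETCH_SSE2, sketchHasSSE2, addFloats and demo are made up for illustration; in actual OpenCV code the guard would be CV_SSE2 and the runtime test would be checkHardwareSupport(CV_CPU_SSE2). The sketch also assumes GCC or Clang for the runtime CPU query, and takes the scalar path on anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compile-time guard. SKETCH_SSE2 is a hypothetical stand-in for OpenCV's CV_SSE2.
#if defined(__SSE2__) && defined(__GNUC__)
#include <emmintrin.h>
#define SKETCH_SSE2 1
#else
#define SKETCH_SSE2 0
#endif

// Runtime check. Hypothetical stand-in for checkHardwareSupport(CV_CPU_SSE2).
bool sketchHasSSE2() {
#if SKETCH_SSE2
    return __builtin_cpu_supports("sse2") != 0;  // GCC/Clang builtin on x86
#else
    return false;
#endif
}

// Element-wise sum of two float arrays: SIMD path when available, scalar otherwise.
void addFloats(const float* a, const float* b, float* dst, std::size_t n) {
    std::size_t i = 0;
#if SKETCH_SSE2
    if (sketchHasSSE2()) {
        // Process four floats per iteration using unaligned loads and stores.
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
        }
    }
#endif
    // Scalar fallback; also handles the tail left over by the SIMD loop.
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}

// Small driver: six elements, so the SIMD path (if taken) leaves a two-element tail.
std::vector<float> demo() {
    const std::vector<float> a = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f};
    const std::vector<float> b = {10.0f, 20.0f, 30.0f, 40.0f, 50.0f, 60.0f};
    assert(a.size() == b.size());
    std::vector<float> dst(a.size());
    addFloats(a.data(), b.data(), dst.data(), a.size());
    return dst;
}
```

The result is the same on either path; only the speed differs. If a function used SSE4.1 instructions as well, the same structure would need one more guard and one more runtime check for that instruction set, as noted above.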
For the NEON discussion, please see http://answers.opencv.org/question/17845/open-source-neon-optimizations/