As for ARM NEON acceleration, please refer to the answer given in open-source NEON optimizations.

To summarize, only a few functions have been accelerated for ARM NEON so far. Volunteer efforts and GitHub pull requests are welcome.

The source files are located at:

  • cvCanny - modules\imgproc\src\canny.cpp
  • cvDilate - modules\imgproc\src\morph.cpp
  • cvResize - modules\imgproc\src\imgwarp.cpp
  • cvtColor - modules\imgproc\src\color.cpp

Currently, these accelerations are available:

  • Intel Performance Primitives (IPP)
  • NVIDIA CUDA
  • NVIDIA Tegra
  • x86 SIMD (SSE2 and up)

Source code for NVIDIA CUDA accelerated algorithms is in a separate directory:

  • modules\gpu\src

Source code for OpenCL accelerated algorithms is in a separate directory:

  • modules\ocl\src

To find where each type of acceleration is used in the source, look for these preprocessor directives:

  • IPP : check for HAVE_IPP and IPP_VERSION_MAJOR
  • CUDA : check for HAVE_CUDA. Also check for use of the cv::GpuMat matrix type, which is capable of copying data between CPU and GPU memory.
  • Tegra : check for HAVE_TEGRA_OPTIMIZATION
  • SSE2 : check for CV_SSE2, and then call checkHardwareSupport(CV_CPU_SSE2). If an instruction set higher than SSE2 is used (such as SSE3, SSSE3, SSE4.1, etc.), each one must be checked separately, because the presence of a higher instruction set may not imply the presence of all lower ones.