Are these functions accelerated by ARM NEON?

asked 2014-05-23 04:16:13 -0600

Leon

I'm currently using OpenCV 2.4.9 on an Android-based device that supports ARM NEON. I use functions such as cvCanny, cvDilate, cvResize, and cvtColor, and I want to speed them up. Are these functions already written in a form that NEON can accelerate? Where can I find their source code, and which other functions are accelerated by NEON? Are there any other suggestions for optimization? Thank you!

1 answer

answered 2014-07-29 13:18:31 -0600

rwong

For ARM NEON acceleration, please refer to the answer given in the question "open-source NEON optimizations".

To summarize, currently only a few functions have been accelerated for ARM NEON. Volunteer efforts and GitHub pull requests are welcome.

The source files for the functions you mention are located at:

  • cvCanny - modules\imgproc\src\canny.cpp
  • cvDilate - modules\imgproc\src\morph.cpp
  • cvResize - modules\imgproc\src\imgwarp.cpp
  • cvtColor - modules\imgproc\src\color.cpp

Currently, these types of acceleration are available:

  • Intel Performance Primitives (IPP)
  • NVIDIA CUDA
  • NVIDIA Tegra
  • x86 SIMD (SSE2 and up)

Source code for NVIDIA CUDA accelerated algorithms is in a separate directory:

  • modules\gpu\src

Source code for OpenCL accelerated algorithms is in a separate directory:

  • modules\ocl\src

To find where each type of acceleration is used in the source, look for these preprocessor directives:

  • IPP : check for HAVE_IPP and IPP_VERSION_MAJOR
  • CUDA : check for HAVE_CUDA. Also check for use of the cv::GpuMat matrix type, which is capable of copying data between CPU and GPU memory.
  • Tegra : check for HAVE_TEGRA_OPTIMIZATION
  • SSE2 : check for CV_SSE2, and then call checkHardwareSupport(CV_CPU_SSE2). If an instruction set higher than SSE2 is used (such as SSE3, SSSE3, or SSE4.1), each one must be checked separately, because the presence of a higher instruction set may not imply the presence of all lower ones.
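
The guard pattern in the list above (a preprocessor macro around a SIMD branch, with plain code as the fallback) can be sketched outside of OpenCV as follows. This is only an illustration, not OpenCV's actual code: SKETCH_NEON, SKETCH_SSE2, simdPath and addFloats are hypothetical names made up for this example, and the guards rely on compiler-predefined macros (__ARM_NEON, __SSE2__ and so on) rather than OpenCV's own. It only shows the compile-time half; a runtime check such as checkHardwareSupport is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Compile-time guards based on compiler-predefined macros (assumption:
// GCC/Clang/MSVC conventions). SKETCH_* are hypothetical names for this
// example only; they are not OpenCV macros.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>
  #define SKETCH_NEON 1
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>
  #define SKETCH_SSE2 1
#endif

// Reports which branch was compiled into this binary.
const char* simdPath() {
#if defined(SKETCH_NEON)
    return "NEON";
#elif defined(SKETCH_SSE2)
    return "SSE2";
#else
    return "scalar";
#endif
}

// Computes dst[i] = a[i] + b[i] for n floats.
void addFloats(const float* a, const float* b, float* dst, std::size_t n) {
    assert(n == 0 || (a && b && dst));
    std::size_t i = 0;
#if defined(SKETCH_NEON)
    // NEON branch: four floats per iteration.
    for (; i + 4 <= n; i += 4)
        vst1q_f32(dst + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
#elif defined(SKETCH_SSE2)
    // SSE branch: four floats per iteration, unaligned loads and stores.
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
#endif
    // Scalar tail; this is also the whole loop when no SIMD branch exists.
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Printing simdPath() from a build for the Android device would show whether the NEON branch was compiled in at all. If it reports "scalar", the compiler was probably not told to enable NEON for that target.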
