Does OpenCV support the use of vector reciprocal on ARM NEON?
Some ARM NEON architectures do not have a native floating-point division instruction for vector data. Instead, the operation must be composed from a sequence of native instructions that together implement an iterative reciprocal-estimate algorithm (most probably the Newton-Raphson method).
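To make the sequence concrete, here is a minimal sketch of what I have in mind, not code taken from OpenCV. The function name divide_f32 and the steps parameter are my own placeholders. It uses the arm_neon.h intrinsics vrecpeq_f32 (a rough initial estimate of 1/d) and vrecpsq_f32 (which computes 2 - d*x, the correction factor of one Newton-Raphson step), and falls back to plain scalar division when NEON is not available or for the leftover elements.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not an OpenCV function): out[i] = a[i] / b[i].
// On NEON targets the quotient is formed as a[i] * (1 / b[i]), where the
// reciprocal comes from an estimate refined by `steps` Newton-Raphson steps.
void divide_f32(const float* a, const float* b, float* out,
                std::size_t n, int steps = 2)
{
    assert(steps >= 0);
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4_t d = vld1q_f32(b + i);
        float32x4_t x = vrecpeq_f32(d);            // low-precision estimate of 1/d
        for (int s = 0; s < steps; ++s) {
            // vrecpsq_f32(d, x) yields (2 - d*x); multiplying by x gives the
            // Newton-Raphson update x <- x * (2 - d*x).
            x = vmulq_f32(x, vrecpsq_f32(d, x));
        }
        vst1q_f32(out + i, vmulq_f32(vld1q_f32(a + i), x));
    }
#else
    (void)steps;                                   // unused without NEON
#endif
    // Scalar tail, and the whole array on non-NEON builds.
    for (; i < n; ++i) {
        out[i] = a[i] / b[i];
    }
}
```

As far as I understand, the initial estimate is only good to roughly 8 bits and each refinement step about doubles that, so two steps should get close to single precision, but the result is not guaranteed to match a true IEEE division bit for bit.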
C++ compilers targeting ARM NEON should automatically generate such instructions for scalar floating-point source code, or defer to a standard math library function call. However, if the library code explicitly loops over each element and performs its own non-trivial approximation, it is doubtful that a C++ compiler (even with auto-vectorization enabled) would dare to replace the hard-coded logic.
It appears that unless the library contains NEON-specific code for matrix floating-point division, it will fall back to scalar C++ code, resulting in one math library function call per matrix element.
I see that OpenCV contains a very nifty vector-of-four element-wise division algorithm, but I doubt that it could beat the natively implemented instructions.
Has anyone performed a benchmark on mobile ARM NEON processors to evaluate the performance of the native NEON vector reciprocal estimate operations?
Hey Rwong! I am very interested in this topic... would you like to exchange some e-mails? I have some questions I'd love to ask you :) Please email me @ [email protected]
@PedroBatista If your question is related to OpenCV you can post a question on this site. If your question is about source code sharing, unfortunately all of my work is done for my employer (due to the "work for hire" contract), therefore I cannot share any source code unless that sharing is explicitly permitted by and deemed beneficial to my employer.
Nope, the question was about NEON programming and how it works. Thanks anyway :)