ARM core Matrix Mult Ops vs DSP core?

asked 2018-01-21 20:59:51 -0600

miner_tom

Hello to the community,

The longer question is below, but the short version is: I looked through the OpenCV documentation and could not find which SoCs or microcontrollers used in embedded systems are best suited to the mathematical operations (matrix multiplication and FFT) common in image processing and general DSP. Does such information exist?

Now the back story.

I am an older guy who started his EE career in the image processing industry, where image manipulation was done with multiple 3x3 convolutions (each took 9 home-made MACs, multiplier-accumulators). Then, as hardware became denser and faster, the FFT became usable, and integer multiplication went out of fashion while floating-point multiplication came into vogue.

I left that part of the industry over 30 years ago. I am coming back into it now, working on a development board for running and optimizing image processing tasks.

I have just started looking at OpenCV. I realize that OpenCV is an attempt to create an abstraction layer between the operating system and the user application, but when I look at ARM cores for embedded systems, I assume (I could be wrong) that they are not really equipped for complex multiply operations and primarily do integer math. In my mind, this limitation would mean long execution times for even simple matrix multiplications. Just thinking of comparing multiple frames of video in real time for motion detection, ARM cores seem underpowered.
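To make the motion-detection case concrete, here is a minimal sketch of frame differencing in plain Python. The function name `motion_fraction`, the threshold of 25, and the two tiny frames are all hypothetical, chosen only for illustration. The per-pixel work is an integer subtract, an absolute value, and a compare, so this particular task does not depend on multiply throughput. In OpenCV the same idea is usually expressed with `absdiff` followed by `threshold`, which run as optimized native code.

```python
def motion_fraction(prev, curr, threshold=25):
    """Return the fraction of pixels whose absolute difference exceeds threshold.

    prev and curr are equally sized 2-D lists of 8-bit grayscale values.
    """
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            # Integer subtract, absolute value, compare: no multiplication.
            if abs(c - p) > threshold:
                changed += 1
    return changed / total if total else 0.0


# Two tiny grayscale frames (made-up data): two pixels brighten sharply.
frame_a = [[10, 10, 10, 10],
           [10, 10, 10, 10]]
frame_b = [[10, 10, 200, 200],
           [10, 10, 10, 10]]

print(motion_fraction(frame_a, frame_b))  # 2 of 8 pixels changed -> 0.25
```

Whether a given ARM core keeps up in real time still depends on resolution, frame rate, and what else the pipeline does, so this is a sketch of the arithmetic rather than a performance claim.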

I have seen SoCs, such as TI's AM57x family, which contain a DSP core as well as an ARM core. I would imagine these SoCs have a superior ability to do DSP and image processing functions. The difference must be stunning.

Is it the case these days that most DSP microcontrollers also include an ARM core for running an operating system such as a Linux derivative? That was not so in "my day", when the venerable Motorola 56000 series and TI TMS320 series processors did not contain ARM cores because they had not been developed yet (yes, I actually wrote programs in 56002 assembly... but that was ages ago).

Thank you for reading this. Tom



Disclaimer: embedded systems are not my area.

Depending on the TDP (thermal design power), ARM CPUs range from the cheap Raspberry Pi Zero to the expensive CPUs found in high-end smartphones.

A specialized processor such as a DSP can help improve the performance of a vision task, but in my opinion it requires some extra knowledge.

To get an idea of what can be done on the ARM architecture, have a look at the ARM Compute Library. To get an idea of floating-point performance, you can check dgemm (double-precision general matrix multiply) benchmarks, for instance.
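As a rough illustration of how a dgemm-style number is derived, the sketch below times a naive matrix product and converts it to GFLOP/s using the usual count of 2·n·k·m floating-point operations (one multiply and one add per term). The function names are hypothetical. Because this is pure Python, the result mostly measures the interpreter, not the CPU; a meaningful benchmark would call an optimized BLAS dgemm instead.

```python
import time


def matmul(a, b):
    """Naive product of two matrices stored as lists of rows."""
    b_cols = list(zip(*b))  # transpose b once so columns are easy to walk
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def gemm_flops(n, k, m):
    """Operation count for an (n x k) by (k x m) product: 2 per term."""
    return 2 * n * k * m


def time_matmul(n):
    """Time one n x n double-precision product and return GFLOP/s."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return gemm_flops(n, n, n) / elapsed / 1e9


print(f"{time_matmul(64):.4f} GFLOP/s (pure Python, n=64)")
```

Running the same measurement against an optimized library on the target board, at several matrix sizes, gives a figure that can be compared with published dgemm results.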

Eduardo ( 2018-01-22 12:13:16 -0600 )

While you can build OpenCV directly for the ARM architecture, I think that offloading some operations to the DSP would require additional, DSP-specific programming, but I may be wrong.

I would try the Compute Library, as it is made specifically for the ARM architecture, but I don't know how easy it is to use.

You can also check on YouTube what can be done with OpenCV and a Raspberry Pi.

Eduardo ( 2018-01-22 12:19:58 -0600 )

OpenCV is primarily written to be compiled and run on general-purpose (GP) processors (GPPs).

DSPs can offload computation from the GPP, but they are rarely faster at running GP code.

DSPs have limited GP ability, their own program-load, memory, and synchronization models, and their own toolchains, and they are tied to specific, usually embedded-industry, parts. DSPs are employed to solve specific application needs on unique and proprietary DSP/GPP combinations. As a result, DSPs don't have much of an open source ecosystem, and stable code is generally not available.

The exceptions that prove the rule come from large open communities that have added support to OpenCV: a subset of algorithms for NVIDIA CUDA, Intel technologies such as TBB and IPP, and general SIMD support (NEON, AltiVec, MMX, SSE).
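To see which of these SIMD extensions a particular Linux board advertises, one option is to read the CPU feature list the kernel exposes. The sketch below is an assumption-laden helper, not part of OpenCV: the names `simd_features` and `read_cpuinfo` are hypothetical, it only understands the `Features` line used on ARM (where AArch64 reports NEON as `asimd`) and the `flags` line used on x86, and it does not cover AltiVec. Separately, OpenCV's own `getBuildInformation()` reports which CPU optimizations a given build was compiled with.

```python
# SIMD names to look for; ARMv7 reports "neon", AArch64 reports "asimd".
SIMD_FLAGS = {"neon", "asimd", "mmx", "sse", "sse2", "avx", "avx2"}


def simd_features(cpuinfo_text):
    """Return the sorted SIMD flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # ARM kernels label the list "Features"; x86 kernels use "flags".
        if sep and key.strip().lower() in ("features", "flags"):
            found.update(SIMD_FLAGS.intersection(value.lower().split()))
    return sorted(found)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description, or return '' if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


print(simd_features(read_cpuinfo()))
```

An empty list only means this simple parser found nothing it recognizes, for example on a non-Linux system; it does not prove the hardware lacks SIMD.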

opalmirror ( 2018-01-31 14:04:16 -0600 )