
CARMA DevKit Tegra 3 NEON optimizations

asked 2013-01-20 21:48:59 -0600

this post is marked as community wiki

This post is a wiki. Anyone with karma >50 is welcome to improve it.


We are using the CARMA (CUDA on ARM) Development Kit (onboard Tegra T30 Q7 module with an ARM Cortex-A9 at 1.3 GHz, Ubuntu Linux 3.1, gcc 4.6.3) for some experiments on the performance benefits of NEON instructions. We've modified some OpenCV 2.4.2 routines with NEON intrinsics but observe very little performance gain. We get much better gains on a Samsung Exynos 4412 (also ARM Cortex-A9, 1.4 GHz, Android NDK gcc 4.6.x) with the same routines.

Is there a way for us to use the Tegra 3 optimized OpenCV routines (binary pack) on Ubuntu Linux? Or are there any specific flags or methods for the Tegra 3 that we are missing?

Any help is much appreciated!

Thanks, Gaurav


1 answer


answered 2013-01-21 05:11:34 -0600


Currently, Android is the only platform on which OpenCV for Tegra supports NEON optimizations. So, if you want to see what OpenCV for Tegra can give you, you should create an Android application that uses OpenCV Manager. I understand that this is overkill, but it is the only way. Alternatively, you can give us a list of your bottlenecks, and we'll try to tell you what speedups to expect.

There are plans to support OpenCV for Tegra on the CARMA platform. In the near future it should get the full opencv_gpu module, with a lot of CUDA optimizations. NEON code is also going to be enabled, but that may take several months. There are no guarantees, but the next release, OpenCV 2.4.4, will possibly use both CUDA and NEON on CARMA.

Regarding your comparison with Samsung: am I right that you are comparing the default open-source OpenCV code against your manually NEON-optimized version, and that you see a speedup on Samsung+Android but almost none on T30+Ubuntu? If so, you're right, it looks like an issue with compilation options. You can try the same code on any T30+Android device (Nexus 7, Asus Transformer, HTC One X, etc.) and check whether Tegra gives the same relative speedup as Samsung.
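One quick way to test the compilation-options theory is to check whether the Ubuntu build actually compiles the NEON path at all. The sketch below is only an illustration and is not taken from OpenCV or from your code: `neonBuildStatus`, `addSatScalar`, `addSatSimd` and the `HAVE_NEON_BUILD` macro are hypothetical names. It relies on GCC predefining `__ARM_NEON__` when NEON is enabled (on 32-bit ARM this usually means passing `-mfpu=neon`), and falls back to scalar code otherwise, so a build without the right flags silently runs the slow path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// GCC predefines __ARM_NEON__ (newer toolchains also __ARM_NEON) when NEON
// code generation is enabled. HAVE_NEON_BUILD is a hypothetical helper macro.
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_BUILD 1
#else
#define HAVE_NEON_BUILD 0
#endif

// Reports whether this translation unit was compiled with NEON enabled.
std::string neonBuildStatus() {
    return HAVE_NEON_BUILD ? "NEON enabled" : "NEON not enabled";
}

// Scalar reference: saturating add of two byte buffers of equal length.
std::vector<std::uint8_t> addSatScalar(const std::vector<std::uint8_t>& a,
                                       const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        unsigned sum = static_cast<unsigned>(a[i]) + static_cast<unsigned>(b[i]);
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
    return out;
}

// Same operation, 16 bytes at a time with NEON intrinsics when available.
// Without NEON the vector loop is compiled out and only the scalar tail runs.
std::vector<std::uint8_t> addSatSimd(const std::vector<std::uint8_t>& a,
                                     const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint8_t> out(a.size());
    std::size_t i = 0;
#if HAVE_NEON_BUILD
    for (; i + 16 <= a.size(); i += 16) {
        uint8x16_t va = vld1q_u8(&a[i]);
        uint8x16_t vb = vld1q_u8(&b[i]);
        vst1q_u8(&out[i], vqaddq_u8(va, vb));  // saturating 8-bit add
    }
#endif
    // Scalar tail (or the whole buffer when NEON is not enabled).
    for (; i < a.size(); ++i) {
        unsigned sum = static_cast<unsigned>(a[i]) + static_cast<unsigned>(b[i]);
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
    return out;
}
```

Printing `neonBuildStatus()` from the same build configuration on both boards, and timing `addSatSimd` against `addSatScalar` on a large buffer, should show whether the T30+Ubuntu build is really running NEON code. Keep in mind that at higher optimization levels GCC may auto-vectorize the scalar loop as well, which would shrink the measured gap.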



Last updated: Jan 21 '13