
trie's profile - activity

2012-10-30 18:36:54 -0600 commented answer test NEON-optimized cv::threshold() on mobile device

That is, I happen to have a program running on an ARM board that uses threshold (at least for preprocessing), and I am currently trying to optimize it with NEON. I came to this thread while searching the internet for "arm neon opencv".

Regarding gaining "only x2": at http://hilbert-space.de/?p=22 I read that using assembler instead of intrinsics might bring another performance boost, since the compiler didn't optimize the register usage very well. I haven't looked at the assembler output yet (I will probably do that in the next few days), but maybe it's a similar case here.

However, I have very little knowledge of assembler (neither ARM/NEON nor the PC world), so that might not give much insight ;-)

2012-10-30 18:07:42 -0600 commented answer test NEON-optimized cv::threshold() on mobile device

Normally the program processes images from a webcam. For this test (and other tests of my own) I fed a video file with a resolution of 800x600 into it. (The file was written with the OpenCV video writer as MJPEG.) To limit the actual processing to "interesting" regions, the first step is a square detector, loosely based on squares.cpp from the samples, but with adaptiveThreshold instead of Canny (to work with differing light conditions). That is, the steps are: pyrDown, pyrUp, adaptiveThreshold(gray0, gray, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, kernel, athresh); dilate(gray, gray, Mat(), Point(-1, -1)); findContours(gray, contours...
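For readers who have not used the adaptiveThreshold step mentioned above: with ADAPTIVE_THRESH_MEAN_C and THRESH_BINARY, a pixel is set to the maximum value when it is greater than the mean of its block-sized neighbourhood minus a constant, and to 0 otherwise. The following is only a rough stand-alone sketch of that rule, not OpenCV's implementation. The function name adaptive_mean_threshold, the flat std::vector image layout and the replicate-style border handling are assumptions made for illustration, and the rounding will not match OpenCV bit for bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): mean-based adaptive threshold on
// a row-major 8-bit grayscale image. block_size plays the role of "kernel"
// and c the role of "athresh" in the call quoted above.
std::vector<std::uint8_t> adaptive_mean_threshold(
    const std::vector<std::uint8_t>& src, int width, int height,
    int block_size, double c, std::uint8_t max_value)
{
    assert(width > 0 && height > 0);
    assert(block_size > 1 && block_size % 2 == 1);  // odd neighbourhood size
    assert(src.size() == static_cast<std::size_t>(width) * height);

    const int r = block_size / 2;
    std::vector<std::uint8_t> dst(src.size(), 0);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            long sum = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    // Assumption: out-of-image coordinates are clamped to the edge.
                    const int yy = std::min(std::max(y + dy, 0), height - 1);
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    sum += src[static_cast<std::size_t>(yy) * width + xx];
                }
            }
            // Local threshold: neighbourhood mean minus the constant c.
            const double t = static_cast<double>(sum) / (block_size * block_size) - c;
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            dst[idx] = (src[idx] > t) ? max_value : 0;
        }
    }
    return dst;
}
```

This brute-force version costs block_size * block_size reads per pixel. It is meant to show the rule, not to be fast.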

2012-10-30 03:47:05 -0600 received badge  Teacher (source)
2012-10-30 01:15:57 -0600 received badge  Necromancer (source)
2012-10-29 16:24:23 -0600 answered a question test NEON-optimized cv::threshold() on mobile device

I tried the patch on a BeagleBoard running Debian testing hardfloat (armhf), building from OpenCV git commit 5777598.

At first I got some compile errors caused by mixing signed and unsigned vector types:

/root/src/opencv/modules/imgproc/src/thresh.cpp: In function ‘void cv::thresh_8u(const cv::Mat&, cv::Mat&, uchar, uchar, int)’:
/root/src/opencv/modules/imgproc/src/thresh.cpp:269:62: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
/root/src/opencv/modules/imgproc/src/thresh.cpp:269:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:270:61: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:294:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:295:61: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:317:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:339:69: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:339:108: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:361:69: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:361:108: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
make[2]: *** [modules/imgproc/CMakeFiles/opencv_imgproc.dir/src/thresh.cpp.o] Error 1
make[2]: Leaving directory `/root/src/opencv/build'
make[1]: *** [modules/imgproc/CMakeFiles/opencv_imgproc.dir/all] Error 2
make[1]: Leaving directory `/root/src/opencv/build'
make: *** [all] Error 2

I then replaced "vget_low_s8" with "vget_low_u8" in those lines, after which it compiled.
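To illustrate why that replacement is needed, here is a minimal sketch. It is not the code from the patch: the helper name copy_low_half_u8 is made up for this example, and the scalar fallback is only there so the snippet also builds on non-ARM machines. The point is that a uint8x16_t has to go through vget_low_u8; passing it to vget_low_s8 is rejected by GCC unless -flax-vector-conversions is given or the vector is reinterpreted explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from the patch): loads 16 unsigned bytes and
// stores the low 8 lanes to dst8.
void copy_low_half_u8(const std::uint8_t* src16, std::uint8_t* dst8)
{
    assert(src16 != nullptr && dst8 != nullptr);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t v = vld1q_u8(src16);   // 16 unsigned 8-bit lanes
    uint8x8_t lo = vget_low_u8(v);    // unsigned in, unsigned out: type-correct
    // int8x8_t bad = vget_low_s8(v); // the error shown above: uint8x16_t is not int8x16_t
    // int8x8_t ok  = vget_low_s8(vreinterpretq_s8_u8(v)); // explicit reinterpret, if signed lanes are really wanted
    vst1_u8(dst8, lo);
#else
    // Portable fallback so the sketch also compiles on non-ARM hosts.
    for (std::size_t i = 0; i < 8; ++i)
        dst8[i] = src16[i];
#endif
}
```

Whether the unsigned variant or an explicit reinterpret is the right fix depends on what the surrounding code does with the result. For 8-bit image data the unsigned variant seems the natural choice.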

I then tested with a program that uses threshold for some of its work (the main work is done in other functions) and profiled it with oprofile. Output of "opreport -l -g -D smart ../build/src/imgproc|grep -i thresh" without the patch:

1054      3.5127  thresh.cpp:794              imgproc                  cv::adaptiveThreshold(cv::_InputArray const&, cv::_OutputArray const&, double, int, int, int, double)
456       1.5197  thresh.cpp:677              imgproc                  cv::ThresholdRunner::operator()(cv::Range const&) const
3         0.0100  thresh.cpp:712              imgproc                  cv::threshold(cv::_InputArray const&, cv::_OutputArray const&, double ...