Ask Your Question

Why huge performance gap between opencv c++ native and java code on Android?

asked 2018-06-07 02:59:12 -0500

o5jeff

updated 2018-06-07 05:45:39 -0500

Hi guys, just wondering if anybody has hit a similar case?

The strange thing is: the Android Java code, e.g. GaussianBlur(), is consistently 4~5x faster than calling the same function from native C++ code on Android.

For a 1920x1080 image, Java needs around 40~50 ms on an HTC 820mu (Snapdragon 615) and an HTC One M9 (Snapdragon 810). The native C++ code needs 200+ ms.

For the same image, an iMac with a 3 GHz i5 processor and OpenCV built with SSE enabled runs the same op (C++ code) in around ~15 ms.

I thought the Android Java performance number was reasonable (1/2~1/3 of the iMac's) considering the floating-point performance gap between a low-frequency ARM and a high-frequency x86-64. However, I don't understand why the native C++ API call is so slow, since the Java API is implemented via a native C/C++ method (GaussianBlur_0(...), which I failed to locate in the OpenCV source).

BTW: I have tried both the prebuilt OpenCV4Android 3.4.1 and the same version built by myself. Both have NEON enabled, VFP3 disabled, and use the softfp ABI.

Any ideas? Thanks much!



The code snippets are as follows: 1. Java code

    try {
        Mat rgba = Imgcodecs.imread(_path);
        Mat rgba_clone = rgba.clone();
        long start = System.currentTimeMillis();
        Imgproc.GaussianBlur(rgba_clone, rgba_clone, new Size(3, 3), 9);
        long elapsedTimeMillis = System.currentTimeMillis() - start;
        Log.d(TAG, "java GaussianBlur costs " + elapsedTimeMillis + " ms.");

        long rgba_addr = rgba.getNativeObjAddr();
    } catch (Exception e) {
        Log.e(TAG, "GaussianBlur failed", e);
    }
o5jeff ( 2018-06-07 03:38:45 -0500 )

2. native code:

cv::Mat& rgba_ = *(cv::Mat*)rgba;  // rgba is the jlong native Mat address from Java
cv::Mat  rgba_clone = rgba_.clone();
LOGD("image size -> w: %d, h: %d", rgba_.cols, rgba_.rows);
int n_iter = 3;
timekeeper tm;  // poster's own CPU-clock-based timer (definition not shown)
std::chrono::milliseconds duration;
for (int i = 0; i < n_iter; i++)
    cv::GaussianBlur(rgba_clone, rgba_clone, cv::Size(9, 9), 3);
duration = tm.elapsed();  // hypothetical accessor on the custom timer
LOGD("GaussianBlur cost: %ld ms.", (long)duration.count() / n_iter);

o5jeff ( 2018-06-07 03:40:21 -0500 )

your java code measures "wall time", while the c++ code uses something based on cpu clock, so you compare apples to pears.

use cv::getTickCount() in both java/c++ code, and report back, please.

berak ( 2018-06-07 07:01:43 -0500 )

Thanks. I will try it.

o5jeff ( 2018-06-07 19:27:32 -0500 )

1 answer


answered 2018-06-07 15:00:12 -0500

rwong

updated 2018-06-07 15:02:48 -0500

Your Java code:

Imgproc.GaussianBlur(rgba_clone, rgba_clone, new Size(3, 3), 9);

Your native C++ code:

cv::GaussianBlur(rgba_clone, rgba_clone, cv::Size(9, 9), 3);

The execution time of a Gaussian blur is mostly determined by the kernel size (the filter is separable, so the cost grows roughly in proportion to the kernel's linear size), not by sigma. Increasing the kernel size from 3x3 to 9x9 will at least triple the execution time.

Using a kernel size that is too small for the given sigma results in truncation of the smoothing function (i.e. numerical inaccuracy and blocky artifacts), but for some applications that may be acceptable.

Unless you need numerically accurate large-radius blurring, it would be faster to pyramid-downsample, blur with a smaller radius, and then pyramid-upsample.



That really makes a lot of sense. A real shame not to have noticed it. I will try it asap and update then.

o5jeff ( 2018-06-07 19:30:00 -0500 )

Thanks much

o5jeff ( 2018-06-07 19:30:11 -0500 )

aahhh, did not even see it ;)

berak ( 2018-06-07 20:03:17 -0500 )



Asked: 2018-06-07 02:59:12 -0500

Seen: 1,629 times

Last updated: Jun 07 '18