Execution time of OpenCV's net.forward() is far longer (minutes) on ARM Debian than on Windows

asked 2020-01-26 04:48:27 -0500

Huma

updated 2020-01-26 06:52:09 -0500

Hi All,

I ran a simple program on an ARM-based Debian platform and on Windows. On Windows it runs quickly, but on Debian it takes minutes. The OpenCV versions and the relevant code are below.

Windows: OpenCV 3.4.6. I did not build it; I installed the prebuilt binaries.

Debian: OpenCV 4.2. Because it has to run on ARM, I cross-compiled it on my Linux system with the following cmake parameters:

cmake -D WITH_OPENMP=ON -D CMAKE_BUILD_TYPE=RELEASE -D BUILD_PERF_TESTS=OFF -D BUILD_TESTS=OFF -D CMAKE_TOOLCHAIN_FILE=../opencv/platforms/linux/arm-gnueabi.toolchain.cmake ../opencv/

I'm using the arm-linux-gnueabihf-g++ (version 5.3.1) cross compiler.

The code responsible is as follows:

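#include <cstdio>
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // caffeConfigFile and caffeWeightFile hold the model paths (defined elsewhere)
    cv::dnn::Net net = cv::dnn::readNetFromCaffe(caffeConfigFile, caffeWeightFile);
    cv::Mat frame = cv::imread("untitled.png", cv::IMREAD_COLOR);
    cv::Mat inputBlob = cv::dnn::blobFromImage(frame, 1.0, cv::Size(300, 300), cv::Scalar(104.0, 177.0, 123.0), false /* rest of this call was cut off in the post */);
    net.setInput(inputBlob, "data");
    double t = (double)cv::getTickCount();
    cv::Mat detection = net.forward("detection_out");  // the call being timed
    double tt_opencvDNN = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    printf("forward() took %f s\n", tt_opencvDNN);
    return 0;
}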

I get 0.9 s on Windows and 214.5 s on Debian. I'm not sure what is wrong. Is it something to do with the libraries I compiled? Am I missing some compiler options?


I recompiled the binaries with the cmake flag -DENABLE_NEON=ON, and the time dropped from 214 seconds to 22 seconds. Is there anything more I can add to make it faster?



Things to check: How much time is spent in each function? Is one of them the bottleneck? Are the results the same? Are the inputs really the same? Could the Debian version be running into some edge case?

mvuori (2020-01-26 05:04:02 -0500)
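One possible way to collect the per-function timings suggested above is a small wrapper that times any call. This is a minimal sketch using only the C++ standard library; secondsTaken is a hypothetical name, not an OpenCV function.

```cpp
#include <chrono>
#include <utility>

// Run any callable once and return the elapsed wall-clock time in seconds.
template <typename F>
double secondsTaken(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();  // execute the code being measured
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In the program from the question, each step (readNetFromCaffe, blobFromImage, setInput, forward) could be wrapped in its own lambda and passed to this helper, so the same measurement code runs unchanged on both Windows and the ARM board.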

I logged the timing for each function. In general, every OpenCV function takes longer on the ARM board than on Windows (readNetFromCaffe takes 1.5 s on ARM-based Debian and 0.1 s on Windows; cv::dnn::blobFromImage takes 0.3 s on ARM and 0.01 s on Windows). And yes, the inputs are the same: same image file, same models.

Huma (2020-01-26 05:48:18 -0500)

It seems that you are comparing the results of a desktop processor with those of an embedded one. They are very different beasts. Even if the ARM platform has excellent numerical specs (for example, 8 cores at 2 GHz), it's still much, much slower than an x86 processor (desktop or laptop).

About speeding up: as DNNs use a lot of simple parallel operations, the best way to speed them up (on any platform) is hardware acceleration. For embedded applications you can either use an NVIDIA computer (like the Jetson Nano) or add a DNN processor (Movidius USB stick). On desktops, use NVIDIA GPUs. Note that the latest OpenCV should have a CUDA backend for DNN (https://github.com/opencv/opencv/issu...); for Movidius you'll need OpenVINO.

kbarni (2020-01-28 06:48:03 -0500)