Why is general OpenCV so fast compared to a specialised implementation?

I was looking through some old OpenCV university work, and cleaning it up so I could use it in the future.

For our work we had been asked not to use some OpenCV functions and to write our own instead, to gain a better understanding of how they work. In one such case I had written my own function to take an image and find its "Polar Co-ordinates" (according to the comments I made for myself). It computed the derivatives of the image and the direction of change at each pixel, giving me a "gradient magnitude" image and a "gradient angle" image. I also wanted the mean value of the "gradient magnitude" image, for processing that came later.

I wrote a single function to do all of the above in one full pass over the original image, basically doing everything at the same time.
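For readers who want something concrete, here is a minimal sketch of what such a fused single-pass function might look like. It is not the original university code. The names `GradientResult` and `fused_gradient` are hypothetical, the image is assumed to be a row-major, single-channel float buffer, and border pixels are simply left at zero (OpenCV extrapolates the border instead, so results near the edges and the mean would differ slightly).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the three outputs; names are illustrative only.
struct GradientResult {
    std::vector<float> magnitude;  // rows * cols values, border pixels left at 0
    std::vector<float> angle;      // radians in [0, 2*pi), border pixels left at 0
    double mean_magnitude;         // mean over all rows * cols entries
};

// One pass over the image: 3x3 Sobel in x and y, then magnitude, angle,
// and a running sum for the mean, all computed per pixel.
GradientResult fused_gradient(const std::vector<float>& img,
                              std::size_t rows, std::size_t cols) {
    assert(img.size() == rows * cols);
    GradientResult out{std::vector<float>(rows * cols, 0.0f),
                       std::vector<float>(rows * cols, 0.0f), 0.0};
    if (rows < 3 || cols < 3) return out;  // no interior pixels to process

    const float two_pi = 2.0f * std::acos(-1.0f);
    double sum = 0.0;
    for (std::size_t r = 1; r + 1 < rows; ++r) {
        const float* up = &img[(r - 1) * cols];
        const float* mid = &img[r * cols];
        const float* down = &img[(r + 1) * cols];
        for (std::size_t c = 1; c + 1 < cols; ++c) {
            // Sobel x: right column minus left column, weights 1-2-1.
            const float gx = (up[c + 1] + 2.0f * mid[c + 1] + down[c + 1]) -
                             (up[c - 1] + 2.0f * mid[c - 1] + down[c - 1]);
            // Sobel y: bottom row minus top row, weights 1-2-1.
            const float gy = (down[c - 1] + 2.0f * down[c] + down[c + 1]) -
                             (up[c - 1] + 2.0f * up[c] + up[c + 1]);
            const float mag = std::sqrt(gx * gx + gy * gy);
            float ang = std::atan2(gy, gx);
            if (ang < 0.0f) ang += two_pi;  // map (-pi, pi] onto [0, 2*pi)
            out.magnitude[r * cols + c] = mag;
            out.angle[r * cols + c] = ang;
            sum += mag;
        }
    }
    out.mean_magnitude = sum / static_cast<double>(rows * cols);
    return out;
}
```

Note that a loop of this shape calls `std::sqrt` and `std::atan2` once per pixel, so the cost per pixel is not just memory traffic.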

I found that obtaining the same result using only OpenCV functions requires 4 separate calls:

cv::Sobel(original_image, grad_mag_x, -1, 1, 0, 3);
cv::Sobel(original_image, grad_mag_y, -1, 0, 1, 3);
cv::cartToPolar(grad_mag_x, grad_mag_y, grad_mag_total, grad_angle);
auto mean = cv::mean(grad_mag_total)[0];

That is four passes in total: two over the original image (one per cv::Sobel call), one over both derivative images simultaneously (cv::cartToPolar), and one over the magnitude image (cv::mean).

So I went ahead and timed my implementation against OpenCV's. With no compiler optimisation my code ran an order of magnitude slower; compiled for speed, it was still about twice as slow.
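A comparison like this can be sketched with a small timing helper. The helper name `best_time_ms` is hypothetical and this is not the harness that produced the numbers above; it assumes that repeating each run and keeping the fastest time is an acceptable way to reduce noise from caches warming up and other processes.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `fn` `repeats` times and returns the fastest
// single run in milliseconds, measured with a monotonic clock.
template <typename Fn>
double best_time_ms(Fn&& fn, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best) best = ms;
    }
    return best;
}
```

Each implementation would be wrapped in a lambda and passed to the helper with the same input image and the same compiler flags, so that the two timings are directly comparable.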

Why is this the case? Surely something specialised that runs through n by n data once would be faster than something running through an n by n set of data 4 times?

What puzzles me most is that I would have expected looping over an n x n image 4 times to thrash the hardware cache and cause more frequent reads from main memory, making the whole process orders of magnitude slower.

Of course, if you read all of this and think that my implementation really should be faster then I shall take a good hard look at what I wrote all those years ago and make it better.