Basic operations on a single image are slow

I noticed that some basic operations on a single image are much slower than their counterparts that work on two images. Here are some results from tests I ran on a 1-megapixel 8-bit image:

Operation "image1 += image2" took 0.7 ms. Operation "image += 10", which should be at least as fast (I would expect it to be faster because of the simpler memory access), took 3.6 ms: more than 5 times slower.

Operation "image *= 0.7" took 6.6 ms, while "addWeighted(image1, 0.5, image2, 0.5, 10, image1)" took 1.8 ms. The single-image operation is more than 3.5 times slower instead of being faster. It seems ridiculous that multiplying an image by a scalar can be done with "addWeighted(image, 0.7, image, 0, 0, image)" instead of "image *= 0.7" and run faster that way.
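The equivalence follows from the formula documented for addWeighted, dst = saturate(src1*alpha + src2*beta + gamma): with beta = 0 and gamma = 0, only the src1*alpha term remains. Below is a minimal standard-library sketch of that per-pixel arithmetic. It is not OpenCV's implementation; the helper names `saturate8` and `weightedSum` are made up for illustration, and the rounding uses std::lround, which may differ from OpenCV's rounding on exact .5 ties.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: round to nearest and clamp to the 8-bit range [0, 255].
inline std::uint8_t saturate8(double v) {
    long r = std::lround(v);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
}

// Hypothetical helper mirroring the documented addWeighted formula per pixel:
//   dst[i] = saturate(a[i]*alpha + b[i]*beta + gamma)
// Assumes a and b hold the same number of pixels.
std::vector<std::uint8_t> weightedSum(const std::vector<std::uint8_t>& a, double alpha,
                                      const std::vector<std::uint8_t>& b, double beta,
                                      double gamma) {
    std::vector<std::uint8_t> dst(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        dst[i] = saturate8(a[i] * alpha + b[i] * beta + gamma);
    return dst;
}
```

Calling `weightedSum(img, 0.7, img, 0.0, 0.0)` scales every pixel by 0.7, which is the same arithmetic as "image *= 0.7" in this sketch.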

More time measurements for single-image operations: convertTo() - 6.6 ms, abs() - 4 ms, convertScaleAbs() - 8.2 ms, normalize() - 8.6 ms.

More time measurements for two-image operations: scaleAdd() - 1.8 ms, bitwise_and() - 0.8 ms, compare() - 0.7 ms.

This behavior is very weird. Does anyone know why this is happening? Are there any flags that might fix it? The tests were performed on Windows XP (SP2 and SP3) with VS2005 and VS2010, using OpenCV 2.4.2.

As a temporary workaround I implemented all single-image operations through the LUT operator, so every single-image operation now takes 1.2 ms. But this only works for 8-bit images, and I think a "proper" solution could achieve much better performance.
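For readers unfamiliar with the lookup-table trick, here is a minimal standard-library sketch of the idea, not the code actually used in the tests above. The helper names `makeLut` and `applyLut` are hypothetical. The table is built by evaluating the operation once for each of the 256 possible 8-bit values, and each pixel is then replaced by a single table lookup.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: build a 256-entry table by evaluating f once per
// possible 8-bit value. Results are rounded and clamped to [0, 255].
std::array<std::uint8_t, 256> makeLut(const std::function<double(double)>& f) {
    std::array<std::uint8_t, 256> lut{};
    for (int v = 0; v < 256; ++v) {
        long r = std::lround(f(static_cast<double>(v)));
        lut[v] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
    }
    return lut;
}

// Hypothetical helper: apply the table in place, one lookup per pixel,
// regardless of how expensive f was to evaluate.
void applyLut(std::vector<std::uint8_t>& pixels,
              const std::array<std::uint8_t, 256>& lut) {
    for (std::uint8_t& p : pixels) p = lut[p];
}
```

For example, `applyLut(pixels, makeLut([](double v) { return v * 0.7; }))` stands in for "image *= 0.7". With OpenCV, the same 256-entry table would be handed to cv::LUT instead of the loop above. Because the per-pixel cost is one lookup whatever the operation, this matches the observation that every operation takes the same time, and the table size is also why the approach does not carry over to deeper pixel types.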