Efficient Alpha Blending

I want to write the fastest alpha blending possible with OpenCV.
Alpha blending means taking 2 (usually color) images, say A and B, plus an additional single-channel image W, the alpha channel, and computing a per-pixel convex combination of the two images, with the alpha values scaled to the range [0,1]. In my case W is CV_8UC1, but a general solution should probably support CV_32FC1 too. This allows creating transparency effects in the blending.

My current solution converts the single-channel image into a 3-channel image, alpha3C, and then does the following:

cv::Mat blended = A.mul(alpha3C, 1.0/255) + B.mul(cv::Scalar::all(255)-alpha3C, 1.0/255);

This seems quite inefficient due to the following issues:

  1. The alpha channel must be converted to 3 channels, which costs both run time and extra (x4) memory. This is because mul() cannot handle images with different numbers of channels.
  2. The scaling is done twice, and subtraction requires a temporary image for storage.
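As a side note on point 2, the blend can be written per pixel so that the scaling appears only once. The symbols below are just shorthand for the images named above, with alpha = W/255 for an 8-bit alpha channel; this is an algebraic rearrangement, not something OpenCV does for you:

```latex
\begin{align*}
\alpha &= \frac{W}{255}, \\
\text{blended} &= \alpha A + (1 - \alpha) B \\
               &= B + \alpha\,(A - B).
\end{align*}
```

The second form needs one multiplication per channel instead of two, at the cost of a signed intermediate for A - B.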

Of course, I can write my own loop over all the pixels, but I was wondering if it is possible to extend MatExpr to support such usage, which isn't so uncommon.
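For reference, the hand-written loop mentioned above might look roughly like the sketch below. It is a minimal, unoptimized version that works on plain interleaved 8-bit buffers rather than on cv::Mat, so that it stays self-contained. The function name blendU8 and its parameters are made up for illustration and are not part of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenCV API): blends two interleaved 8-bit images
// a and b using a single-channel 8-bit weight image w.
// Per channel: out = round((a * w + b * (255 - w)) / 255).
std::vector<std::uint8_t> blendU8(const std::vector<std::uint8_t>& a,
                                  const std::vector<std::uint8_t>& b,
                                  const std::vector<std::uint8_t>& w,
                                  std::size_t channels)
{
    assert(channels > 0);
    assert(a.size() == b.size());
    assert(a.size() == w.size() * channels);

    std::vector<std::uint8_t> out(a.size());
    for (std::size_t p = 0; p < w.size(); ++p) {
        // Read the weight once per pixel and reuse it for every channel,
        // so no 3-channel copy of the alpha image is needed.
        const unsigned wa = w[p];
        const unsigned wb = 255u - wa;
        for (std::size_t c = 0; c < channels; ++c) {
            const std::size_t i = p * channels + c;
            // Integer arithmetic; +127 rounds to nearest before the division.
            out[i] = static_cast<std::uint8_t>((a[i] * wa + b[i] * wb + 127u) / 255u);
        }
    }
    return out;
}
```

With a cv::Mat that reports isContinuous(), the same loop could presumably run directly over the pointers returned by ptr<uchar>() instead of copying into vectors. Whether it beats the MatExpr version on a given mobile platform would have to be measured.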

If you have opencv_gpu, you could try cv::gpu::alphaComp(), which is likely quite efficient!

Unfortunately, I need it on multiple mobile platforms with no (NVIDIA) GPU.

@Adi ok. Do you have to render in OpenCV, or could you render in OpenGL? I have been using alpha channels in my application, and when I was rendering in OpenCV, I did it with a loop over all pixels. Now I'm rendering in OpenGL (openFrameworks), but applying the alpha channel is not efficient there either. OpenCV does need some better alpha handling...

@B.Bogart: Can you share your OpenGL code?

@Adi would my code be useful to you, given that it is written for openFrameworks?

I don't know.

