Applying operations to video feed. Determining when to use CPU vs GPU

I'm working on an OpenCV project that monitors a video feed and applies animations to it. In the following example operation, I resize a video frame and apply an overlay to the resized frame. My process looks like this:

[image: diagram of the process]

Here is the implementation of the process (currently in C# with OpenCvSharp, though I can switch to any language at this point):

private void updateFrame(Mat currentFrame, Mat background, Mat mask, Mat invertedMask)
{
    // Total amount by which the frame is shrunk; the border added below restores it.
    int w = 400, h = 224;

    using (var resizedFrame = new Mat(
        new OpenCvSharp.Size(currentFrame.Size().Width - w, currentFrame.Size().Height - h), 
        currentFrame.Type()))
    using (var resizedBorderFrame = new Mat(currentFrame.Size(), currentFrame.Type()))
    using (var maskedFrame = new Mat(currentFrame.Size(), currentFrame.Type()))
    using (var maskedBackground = new Mat(currentFrame.Size(), currentFrame.Type()))
    using (var output = new Mat(currentFrame.Size(), currentFrame.Type()))
    {
        // Shrink the frame, then pad it with black back to the original size.
        Cv2.Resize(currentFrame, resizedFrame, resizedFrame.Size());
        Cv2.CopyMakeBorder(resizedFrame, resizedBorderFrame, h/4, h*3/4, w/2, w/2, BorderTypes.Constant, new Scalar(0));
        // Keep the frame where the mask is set and the background everywhere else.
        Cv2.BitwiseAnd(resizedBorderFrame, mask, maskedFrame);
        Cv2.BitwiseAnd(background, invertedMask, maskedBackground);
        Cv2.BitwiseOr(maskedBackground, maskedFrame, output);
        // Dispose the previously displayed bitmap so one is not leaked per frame.
        var previous = pictureBox.Image;
        pictureBox.Image = OpenCvSharp.Extensions.BitmapConverter.ToBitmap(output);
        previous?.Dispose();
    }
}

This process (along with a few other operations) is beginning to take longer per frame than the video's frame interval, creating a noticeable lag. Everything currently runs on the CPU; however, I have read that moving operations to the GPU could reduce the runtime considerably. I have also read that writing a custom kernel to combine operations (or implementing the whole series as one compound kernel) could speed this up even more. That said, some operations might not be CPU-bound at all, in which case moving them to the GPU would be overkill.

If you were to evaluate this problem from the start, how would you decide which operations to run on the CPU, on the GPU, or in a custom kernel? What optimizations could I make to speed up the application, and are there other approaches that would make this job easier?
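One possible starting point for the CPU vs GPU decision is to measure how long each step actually takes per frame before moving anything. The sketch below is a minimal, hypothetical helper (`StageTimer`, `Measure`, `MeanMs`, `Calls`, and `Report` are illustrative names, not part of OpenCV or OpenCvSharp) built only on `System.Diagnostics.Stopwatch`. It assumes each stage runs once per frame, so the per-stage means add up to the mean cost of a frame, and the frame budget passed to `Report` is an assumed value chosen by the caller (for example, about 33.3 ms at 30 fps).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: accumulates wall-clock time per named pipeline stage.
public sealed class StageTimer
{
    private readonly Dictionary<string, (long Ticks, int Calls)> _totals =
        new Dictionary<string, (long Ticks, int Calls)>();

    // Runs the action and adds its elapsed time to the named stage.
    public void Measure(string stage, Action action)
    {
        long start = Stopwatch.GetTimestamp();
        action();
        long elapsed = Stopwatch.GetTimestamp() - start;

        // A stage seen for the first time starts from (0, 0).
        _totals.TryGetValue(stage, out var t);
        _totals[stage] = (t.Ticks + elapsed, t.Calls + 1);
    }

    // Number of times the stage has been measured.
    public int Calls(string stage) => _totals[stage].Calls;

    // Mean milliseconds per call for one stage.
    public double MeanMs(string stage)
    {
        var t = _totals[stage];
        return t.Ticks * 1000.0 / Stopwatch.Frequency / t.Calls;
    }

    // Prints stages from slowest to fastest and compares their sum
    // with the frame budget supplied by the caller (an assumption).
    public void Report(double frameBudgetMs)
    {
        double total = 0;
        foreach (var stage in _totals.Keys.OrderByDescending(s => MeanMs(s)))
        {
            double ms = MeanMs(stage);
            total += ms;
            Console.WriteLine($"{stage,-16} {ms,8:F3} ms");
        }
        Console.WriteLine($"total {total:F3} ms of a {frameBudgetMs:F1} ms budget");
    }
}
```

Inside `updateFrame`, each call could then be wrapped, for example `timer.Measure("resize", () => Cv2.Resize(currentFrame, resizedFrame, resizedFrame.Size()));`, with the `ToBitmap` conversion and the `pictureBox.Image` assignment timed as stages of their own, and `timer.Report(...)` called every few hundred frames. Stages that dominate the budget are the natural candidates for a GPU port. For any stage that does move, the host-to-device upload and the download back would need to be timed as well, since those transfers count against the same frame budget.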
