Applying operations to video feed. Determining when to use CPU vs GPU [closed]

asked 2018-12-31 09:09:26 -0500

flakes

updated 2019-01-05 06:47:22 -0500

I'm working on an OpenCV project to monitor a video feed and also apply animations to this feed. In the following example operation I am resizing a video frame and applying an overlay to the resized frame. My process is illustrated in the linked image.

Here is the implementation of the process (currently written in C# with OpenCvSharp; however, I can switch to any language at this point):

private void updateFrame(Mat currentFrame, Mat background, Mat mask, Mat invertedMask)
{
    // Total number of pixels removed from the frame width and height.
    int w = 400, h = 224;

    using (var resizedFrame = new Mat(
        new OpenCvSharp.Size(currentFrame.Size().Width - w, currentFrame.Size().Height - h),
        currentFrame.Type()))
    using (var resizedBorderFrame = new Mat(currentFrame.Size(), currentFrame.Type()))
    using (var maskedFrame = new Mat(currentFrame.Size(), currentFrame.Type()))
    using (var maskedBackground = new Mat(currentFrame.Size(), currentFrame.Type()))
    using (var output = new Mat(currentFrame.Size(), currentFrame.Type()))
    {
        // Shrink the frame, then pad it back to the original size with a black border.
        Cv2.Resize(currentFrame, resizedFrame, resizedFrame.Size());
        Cv2.CopyMakeBorder(resizedFrame, resizedBorderFrame, h/4, h*3/4, w/2, w/2, BorderTypes.Constant, new Scalar(0));
        // Keep the frame where the mask is set and the background everywhere else.
        Cv2.BitwiseAnd(resizedBorderFrame, mask, maskedFrame);
        Cv2.BitwiseAnd(background, invertedMask, maskedBackground);
        Cv2.BitwiseOr(maskedBackground, maskedFrame, output);
        pictureBox.Image = OpenCvSharp.Extensions.BitmapConverter.ToBitmap(output);
    }
}

This process (along with a few other operations) is beginning to take longer than the time between video frames, creating a noticeable lag. Currently everything runs as CPU-based operations; however, I read that GPU operations could speed up the runtime considerably. Further, I read that writing a custom kernel to combine operations (or implementing the whole series as a single compound kernel) could speed this up even more. That said, certain operations might not be CPU-bound, in which case moving them to the GPU would be overkill?
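To make the idea of combining operations concrete, here is a minimal, library-free C++ sketch. It is not OpenCV code and not a GPU kernel: the function names are hypothetical, images are assumed to be flat byte buffers of equal length, and each mask byte is assumed to be exactly 0x00 or 0xFF. The first function mirrors the three bitwise passes from the snippet above; the second fuses them into a single pass with no temporary buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Three passes with two temporaries, mirroring BitwiseAnd / BitwiseAnd / BitwiseOr.
// Assumes all buffers have the same length and mask bytes are 0x00 or 0xFF.
std::vector<std::uint8_t> compositeThreePass(const std::vector<std::uint8_t>& frame,
                                             const std::vector<std::uint8_t>& background,
                                             const std::vector<std::uint8_t>& mask) {
    const std::size_t n = frame.size();
    std::vector<std::uint8_t> maskedFrame(n), maskedBackground(n), output(n);
    for (std::size_t i = 0; i < n; ++i)
        maskedFrame[i] = static_cast<std::uint8_t>(frame[i] & mask[i]);
    for (std::size_t i = 0; i < n; ++i)
        maskedBackground[i] = static_cast<std::uint8_t>(background[i] & ~mask[i]);
    for (std::size_t i = 0; i < n; ++i)
        output[i] = static_cast<std::uint8_t>(maskedFrame[i] | maskedBackground[i]);
    return output;
}

// Fused version: one pass over the data, no temporaries.
// For 0x00/0xFF masks, (f & m) | (b & ~m) is the same as "m ? f : b".
std::vector<std::uint8_t> compositeFused(const std::vector<std::uint8_t>& frame,
                                         const std::vector<std::uint8_t>& background,
                                         const std::vector<std::uint8_t>& mask) {
    const std::size_t n = frame.size();
    std::vector<std::uint8_t> output(n);
    for (std::size_t i = 0; i < n; ++i)
        output[i] = mask[i] ? frame[i] : background[i];
    return output;
}
```

The fused loop reads each input once and writes the output once, which is the main saving a combined kernel offers for memory-bound steps like these. Within OpenCV itself, a masked copy (copyTo with a mask argument) gives a similar single-step composite.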

If you were to evaluate this problem from the start, how would you go about determining which operations to put on the CPU vs. the GPU vs. a custom kernel? What optimizations could I be making to speed up the application, or are there other approaches I can employ to make this job easier?
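One way to start on the CPU vs. GPU question is to measure each stage before moving anything. The following is a minimal, library-free C++ sketch of such a timing harness; the names StageTiming, profileStages, frameBudgetMs and slowestStage are hypothetical helpers, not part of OpenCV. Each stage is a callable that would wrap one operation (resize, border, masking, display conversion).

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct StageTiming {
    std::string name;
    double milliseconds;
};

// Runs every stage once, in order, and records its wall-clock time.
std::vector<StageTiming> profileStages(
    const std::vector<std::pair<std::string, std::function<void()>>>& stages) {
    std::vector<StageTiming> timings;
    for (const auto& stage : stages) {
        const auto start = std::chrono::steady_clock::now();
        stage.second();
        const auto stop = std::chrono::steady_clock::now();
        timings.push_back(
            {stage.first, std::chrono::duration<double, std::milli>(stop - start).count()});
    }
    return timings;
}

// Time available per frame, in milliseconds, at the given frame rate.
double frameBudgetMs(double fps) {
    return 1000.0 / fps;
}

// Index of the slowest stage, or -1 if there are no timings.
int slowestStage(const std::vector<StageTiming>& timings) {
    int slowest = -1;
    for (std::size_t i = 0; i < timings.size(); ++i) {
        if (slowest < 0 || timings[i].milliseconds > timings[slowest].milliseconds)
            slowest = static_cast<int>(i);
    }
    return slowest;
}
```

Averaging over many frames gives steadier numbers than a single run. If the stage times sum to more than frameBudgetMs(fps), the slowest stage is the first candidate for optimization. When a stage is tried on the GPU, its measured time should include the upload and download of the image data, since those transfers can outweigh the gain for cheap operations.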


Closed for the following reason: question is off-topic or not relevant (closed by berak)
close date 2019-01-01 03:15:16.997460


sorry, but we cannot help you with problems from unsupported 3rdparty c# wrappers

berak ( 2019-01-01 03:15:58 -0500 )

@berak I'm not asking a question specific to this wrapper. The snippet was just to present the problem in code as well.

flakes ( 2019-01-01 14:08:07 -0500 )

1 answer


answered 2019-01-01 07:14:27 -0500

updated 2019-01-02 16:02:12 -0500

  • prepare a background image
    • define a ROI inside the background image
    • use the cv::resize() function to resize each frame directly into that ROI

You can try the sample C++ or Java code below to get a result like the one in the attached image.

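#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/videoio.hpp>
#include <iostream>

using namespace cv;
using namespace std;

int main(int argc, char** argv)
{
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " <video file>\n";
        return -1;
    }

    Mat frame, back;
    VideoCapture cap(argv[1]);
    if (!cap.isOpened()) {
        cerr << "ERROR! Unable to open video file\n";
        return -1;
    }

    // I used the blurred first frame as the background
    cap >> back;
    if (back.empty()) {
        cerr << "ERROR! Unable to read the first frame\n";
        return -1;
    }
    blur(back, back, Size(20, 20));

    // set a black area in the background
    back(Rect(0, back.rows / 8, back.cols, back.rows / 8 * 6)).setTo(Scalar(0, 0, 0));

    // define a ROI: a Mat header that shares its pixels with back
    Mat roi(back, Rect(back.cols / 8, back.rows / 8, back.cols / 8 * 6, back.rows / 8 * 6));

    TickMeter tm;
    int counter = 0;

    for (;;)
    {
        cap >> frame;
        if (frame.empty()) {
            break;
        }

        // time only the resize, which writes straight into the background
        tm.start();
        resize(frame, roi, roi.size());
        tm.stop();

        if (!(counter % 20)) // don't show all frames
        {
            imshow("video", back);
            waitKey(1);
        }
        counter++;
    }

    cout << tm;
    return 0;
}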


package test;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Rect;
import org.opencv.core.Size;
import org.opencv.core.Scalar;
import org.opencv.imgproc.Imgproc;
import org.opencv.highgui.HighGui;
import org.opencv.videoio.VideoCapture;

public class test_cv
{
    public static void main( String[] args )
    {
        System.loadLibrary( Core.NATIVE_LIBRARY_NAME );
        VideoCapture capture = new VideoCapture("C:\\build\\opencv4.0.1\\bin\\Release\\vtest.avi");
        if (!capture.isOpened())
        {
            System.err.println("Unable to open video");
            return;
        }

        Mat frame = new Mat();
        Mat back = new Mat();

        // use the blurred first frame as the background
        if (!capture.read(back) || back.empty())
        {
            System.err.println("Unable to read the first frame");
            return;
        }
        Imgproc.blur(back, back, new Size(20,20));

        // set a black area in the background
        Mat roi = back.submat(new Rect(0,back.rows()/8,back.cols(),back.rows()/8*6));
        roi.setTo(new Scalar(0,0,0));

        // define the ROI that each frame is resized into
        roi = back.submat(new Rect(back.cols()/8,back.rows()/8,back.cols()/8*6,back.rows()/8*6));

        while (true)
        {
            capture.read(frame);
            if (frame.empty())
                break;

            // roi shares its pixels with back, so this writes into the background
            Imgproc.resize(frame, roi, roi.size());

            HighGui.imshow("Frame", back);
            HighGui.waitKey(1);
        }
        System.exit(0);
    }
}




Great answer! I also got a lot of speed up from applying this trick and also about a 3x speedup by converting my code to use UMat. I'm going to see if applying cuda methods will help even further!

flakes ( 2019-01-02 14:48:43 -0500 )

edited c++ code and added a time measurement and a counter for imshow

sturkmen ( 2019-01-02 16:08:29 -0500 )

Asked: 2018-12-31 09:09:26 -0500

Seen: 696 times

Last updated: Jan 05 '19