OpenCV - object detection on video, processing frames in real time

asked 2020-04-10 05:22:09 -0500

Hi there, I'm implementing an object detection algorithm, using OpenCV in Python to read a live stream from my webcam. The overall structure of the code is something like:

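    import cv2
    cap = cv2.VideoCapture(0)
    while True:
        # Load frame from the camera
        ret, frame = cap.read()
        if not ret:
            break
        class_IDs, scores, bounding_boxes = neural_network(frame)
        cv2.imshow('window', frame)
        cv2.waitKey(1)  # lets imshow refresh the window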

So, basically the code is continuously performing this loop:

  • reading one frame from the webcam;
  • passing this frame through the neural network and showing it with the results of the object detection;
  • once this is completed, going on to the next frame.

When I use the webcam, once a frame has been processed the program reads the next frame, which is the one the webcam is currently capturing (there's a buffer of 5 frames, but it's not really that significant, and I can set it to 1 anyway).
On the other hand, when reading a video file, all the frames are read one by one and none are skipped, so there's an increasing delay between the output of the program and the "natural" flow of the video.

I was wondering if/how I can get the same result with video as I have with the webcam. In other words:

  • take frame 0 at time t0;
  • analyze frame 0, which takes a certain amount of time delta_t;
  • after this, do not analyze frame 1, but the frame that would be showing after delta_t if the video were played normally.

I'm asking because I'll probably have to run the object detection on a virtual machine, reading the video stream from a remote webcam, so I'm afraid that the program might behave like it usually does for videos, reading all the accumulated frames instead of the "live" ones.

I assume I might have to use two parallel processes, one that keeps reading the video stream and another that takes frames to be analyzed as often as the object detector allows. Any suggestions?
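A minimal sketch of that two-worker idea, assuming a background thread is enough rather than a separate process: the reader keeps overwriting a single slot with the newest frame, and the detector takes whatever is in the slot when it is ready. `LatestFrameReader` is a hypothetical name, and `cap` is assumed to be any object whose `read()` returns `(ok, frame)`, as `cv2.VideoCapture` does. This fits live sources; on a video file `read()` returns as fast as frames decode, so the reader would race through the file.

```python
import threading


class LatestFrameReader:
    """Hypothetical helper: a thread that keeps only the newest frame."""

    def __init__(self, cap):
        self.cap = cap                  # assumed: read() -> (ok, frame)
        self.lock = threading.Lock()
        self.frame = None               # stays None until a frame arrives
        self.running = True
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while self.running:
            ok, frame = self.cap.read()
            if not ok:                  # stream ended or read failed
                break
            with self.lock:
                self.frame = frame      # overwrite: older frames are dropped
        self.running = False

    def latest(self):
        """Return the most recent frame read so far (or None)."""
        with self.lock:
            return self.frame

    def stop(self):
        self.running = False
        self.thread.join()
```

The detection loop would then call `reader.latest()` before each pass through the detector and skip the iteration while the result is still `None`.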


The simple way would be: keep track of the elapsed time after processing a frame, and increment a frame count after each frame you read. Since you know the file's frame rate, you can calculate the position of the next frame to process. Read frames without processing them until you reach that frame. Just reading frames is (should be...) fast.

mvuori ( 2020-04-10 08:37:41 -0500 )
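The approach in the comment above might be sketched as follows, under some assumptions: `frames_to_skip`, `process_in_real_time`, `detect` and `clock` are hypothetical names, and `cap` is assumed to offer `grab()` and `read()` the way `cv2.VideoCapture` does (`grab()` advances one frame without returning the image). The frame rate would typically come from `cap.get(cv2.CAP_PROP_FPS)`, which may not be reliable for every file.

```python
import time


def frames_to_skip(frames_read, elapsed_s, fps):
    """Frames to discard so the next read matches real-time playback."""
    target = int(elapsed_s * fps)   # index of the frame "on screen" now
    return max(0, target - frames_read)


def process_in_real_time(cap, fps, detect, clock=time.monotonic):
    """Run detect() only on frames that keep pace with playback.

    Returns the indices of the frames that were actually processed.
    """
    start = clock()
    frames_read = 0                 # frames consumed so far (read or skipped)
    processed = []
    while True:
        # Discard the frames that "played" while the detector was busy.
        for _ in range(frames_to_skip(frames_read, clock() - start, fps)):
            if not cap.grab():
                return processed
            frames_read += 1
        ok, frame = cap.read()
        if not ok:                  # end of the video
            return processed
        index = frames_read
        frames_read += 1
        detect(frame)               # the slow step, e.g. the neural network
        processed.append(index)
```

Measuring elapsed time from the start of playback, rather than per frame, keeps rounding errors from accumulating over a long video.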