Why does the index in video.set(..FRAME_POS, <index>) not align with the frame number?

asked 2018-09-27 11:53:36 -0600 by Thomas Lux, updated 2018-09-27 13:06:41 -0600

I have a video that is 2:12 (2 minutes 12 seconds) long according to QuickTime on macOS 10.14 (Mojave).

I have the following code:

    import cv2
    import numpy as np

    vid = cv2.VideoCapture("video.mov")
    length = int(vid.get(cv2.CAP_PROP_FRAME_COUNT))  # = 3953
    fps    = int(vid.get(cv2.CAP_PROP_FPS))          # = 29 (int() drops any fractional part, e.g. of 29.97)

    def frame_set(index):
        # Seek directly to frame `index`, then decode it.
        vid.set(cv2.CAP_PROP_POS_FRAMES, index)
        success, img = vid.read()
        return img if success else None

    def frame_walk(index):
        # Rewind to frame 0 and decode `index` frames sequentially before the target.
        vid.set(cv2.CAP_PROP_POS_FRAMES, 0)
        for _ in range(index):
            vid.read()
        success, img = vid.read()
        return img if success else None

    def frame_diff(a, b):
        # Total absolute pixel difference; cast first so uint8 subtraction cannot wrap around.
        return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

    frame_diff(frame_set(0),  frame_walk(0))   # = 0
    frame_diff(frame_set(29), frame_walk(29))  # = 0
    frame_diff(frame_set(30), frame_walk(30))  # = <big number>  <---- PROBLEM, mismatch

    frame_set(3953 - 128)  # = <image>
    frame_set(3953 - 127)  # = None    <---- PROBLEM, should be valid image
    frame_set(3952)        # = None    <---- PROBLEM, should be valid image
    frame_walk(3953 - 127) # = <image> <---- correct answer
    frame_walk(3952)       # = <image> <---- correct answer

Clearly, the "set" and "walk" methods start to disagree as soon as one second of video (frame 30) has elapsed. The OpenCV ".set" method is not actually seeking to the requested frame, whereas the more cumbersome "walk" method works just fine.

Am I doing something wrong here?

This appears to be a bug in the OpenCV codebase: the frame count divided by the fps gives a duration of 2 minutes 16 seconds (3953 / 29 ≈ 136 s), whereas QuickTime correctly reports 2 minutes 12 seconds. That difference accounts for the last 127 frames being unreachable with the ".set" method.
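One arithmetic check worth ruling out before calling it a bug: the code above truncates the reported frame rate with int(). The sketch below reproduces both durations from the numbers in the question. The 29.97 fps (30000/1001) value is an ASSUMPTION about this particular .mov file; the question does not confirm it.

```python
frame_count = 3953          # CAP_PROP_FRAME_COUNT as reported in the question
truncated_fps = 29          # int(vid.get(cv2.CAP_PROP_FPS)) as printed in the question
assumed_fps = 30000 / 1001  # ASSUMPTION: NTSC-style 29.97 fps, not confirmed by the question

def mm_ss(seconds):
    """Format a duration in seconds as M:SS, rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"

print(mm_ss(frame_count / truncated_fps))  # 3953 / 29    ≈ 136.3 s -> 2:16
print(mm_ss(frame_count / assumed_fps))    # 3953 / 29.97 ≈ 131.9 s -> 2:12
```

If the stream really is 29.97 fps, the 2:16 figure comes from the int() truncation rather than from the frame count, so reading the rate as a float (without int()) is the first thing to check.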


Comments

Am I doing something wrong here?

Mildly, yes: it's an "expectation mismatch". cv2.VideoCapture is a utility class for acquiring images for computer vision, while it seems you want to build video-editing software on top of it (the wrong library for this, sorry to say so).

Some codecs (I've no idea about Apple or .mov) only store absolute position information for keyframes, so any position relative to a keyframe is a plain guess.

berak (2018-09-27 12:08:00 -0600)
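To illustrate the keyframe point: with inter-frame codecs, an exact seek has to start at the nearest keyframe at or before the target and decode forward from there. The helper below is a hypothetical, pure-Python sketch of that planning step only. OpenCV does not expose a keyframe list, so keyframes is assumed to come from elsewhere.

```python
from bisect import bisect_right

def plan_seek(keyframes, target):
    """Return (start_keyframe, frames_to_skip) needed to land exactly on `target`.

    `keyframes` is a sorted list of keyframe indices. HYPOTHETICAL input:
    cv2.VideoCapture does not provide it.
    """
    if not keyframes or target < keyframes[0]:
        raise ValueError("target precedes the first keyframe")
    # Rightmost keyframe whose index is <= target.
    start = keyframes[bisect_right(keyframes, target) - 1]
    # Decode and discard this many frames after jumping to `start`; the next decode is `target`.
    return start, target - start

print(plan_seek([0, 30, 60], 45))  # (30, 15)
```

A seek that jumps to a keyframe but then only estimates the remaining offset (instead of decoding it) is exactly where "a plain guess" can creep in.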

Besides that, your second attempt:

    def frame_walk(index):
        success = vid.set(cv2.CAP_PROP_POS_FRAMES, 0)
        for i in range(index):
            vid.read()
        success, img = vid.read()
        return img

is broken: even IF you get the correct number of frames, once index reaches the end of the video the final success, img = vid.read() returns no frame (in Python, success is False and img is None; the movie's over already).

And since, like all other Python noobs, YOU NEVER CHECK whether it's valid or not, you'll just crash there.

berak (2018-09-27 12:14:39 -0600)
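A checked version of the walk might look like the sketch below. It assumes cap behaves like cv2.VideoCapture (read() returns a (success, image) pair) and was just opened, so it is positioned at frame 0; the function name is hypothetical.

```python
def frame_walk_checked(cap, index):
    """Decode frames sequentially and return frame `index`, or raise IndexError.

    Assumes `cap` is a freshly opened cv2.VideoCapture-like object (positioned at frame 0).
    """
    if index < 0:
        raise IndexError("negative frame index")
    img = None
    for i in range(index + 1):
        success, img = cap.read()
        if not success or img is None:
            # i frames were decoded successfully before the stream ran out.
            raise IndexError(f"stream ended after {i} frames; frame {index} is unavailable")
    return img
```

Raising instead of returning None means an out-of-range index fails loudly at the read, not later inside the pixel arithmetic.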

I do want to do computer vision; specifically, I am working on denoising images using distance-based function-approximation techniques. I am using the VideoCapture object as a less RAM-intensive way to store images (rather than one large array).

I want to be able to index into the video just as you would index into an array. The number of frames reported by "length" is correct! If you look carefully, you can see that the only difference between "frame_walk" and "frame_set" is how the i-th frame is reached. I omitted the error-checking code for brevity; my actual code is longer and more careful.

Notice that with "frame_set" I am unable to access the last 127 frames of the video, even though they clearly exist. This is the behavior I am asking about.

Thomas Lux (2018-09-27 12:55:15 -0600)
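If array-style access is the goal, one workaround that never relies on CAP_PROP_POS_FRAMES seeking is to decode forward and reopen the file for backward jumps. The class below is a hypothetical sketch: open_capture is assumed to be a zero-argument callable returning a fresh cv2.VideoCapture-like object, e.g. lambda: cv2.VideoCapture("video.mov"). Random access costs O(index) decodes, but only one frame is held in memory.

```python
class SequentialFrames:
    """Array-style frame access built only on read(); no seeking by frame index.

    HYPOTHETICAL helper. `open_capture` must return a new capture positioned at frame 0.
    """

    def __init__(self, open_capture):
        self._open = open_capture
        self._cap = open_capture()
        self._next = 0      # index of the frame the next read() will return
        self._last = None   # (index, image) of the most recently decoded frame

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indices are not supported")
        if self._last is not None and self._last[0] == index:
            return self._last[1]
        if index < self._next:
            # Going backwards: start over from the beginning of the file.
            self._cap.release()
            self._cap = self._open()
            self._next = 0
        while self._next <= index:
            success, img = self._cap.read()
            if not success or img is None:
                raise IndexError(f"frame {index} is past the end of the stream")
            self._last = (self._next, img)
            self._next += 1
        return self._last[1]
```

Because out-of-range indices raise IndexError, Python's legacy sequence protocol also lets you write for img in SequentialFrames(...) to visit every frame once in order, at one decode per frame.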