
Convert a Video.mp4 into a 2D Matrix where each row represents a frame

asked 2020-12-03 10:09:53 -0500

particle99


As explained in the title, I want to load a video in Python (import cv2) and then represent it as a 2D array video_matrix. The number of rows of video_matrix equals the total number of frames, and the number of columns equals the total number of features that describe a single frame.

For example, a 3 s video at 30 fps has 90 frames; each frame has height 100 and width 100, and each pixel is described by 3 values (RGB; OpenCV itself stores them in BGR order).
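A worked check of the sizes in this example, using a dummy all-zero frame rather than real video data:

```python
import numpy as np

# Numbers from the example: 3 s at 30 fps, 100x100 frames, 3 colour channels
fps, seconds = 30, 3
height, width, channels = 100, 100, 3

n_frames = fps * seconds                # rows of video_matrix
n_features = height * width * channels  # columns of video_matrix

# A dummy frame, flattened into what would be one row of video_matrix
frame = np.zeros((height, width, channels), dtype=np.uint8)
row = frame.reshape(-1)

print(n_frames, n_features, row.shape)
```

So video_matrix for this clip would have shape (90, 30000).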

In my current method I convert each frame, which is a 3D array with dimensions (100, 100, 3), into a 1D vector of size 100 x 100 x 3 = 30,000:

import numpy as np

def image_to_vector(image):
    """
    image: numpy array of shape (height, width, depth)
    returns: a column vector of shape (height * width * depth, 1)
    """
    height, width, depth = image.shape
    return image.reshape((height * width * depth, 1))

Then I append the resulting vector to video_matrix:

video_matrix = np.column_stack((video_matrix, frame_vector))

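I repeat this procedure for all frames of the video, so in the end I get a numerical representation of the video as a 2D array in which each frame is one vector. (Strictly speaking, np.column_stack adds each frame as a column, so the result has to be transposed with video_matrix.T to get one frame per row.) The video_matrix must have this form because I want to apply machine learning algorithms to it.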
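np.column_stack places each new vector side by side as a column. A quick check with three tiny hypothetical 2x2 RGB "frames" (dummy data, each filled with its own index) shows the orientation and the transpose that turns it into one row per frame:

```python
import numpy as np

# Three dummy 2x2 RGB frames; frame k is filled with the value k
frames = [np.full((2, 2, 3), k, dtype=np.uint8) for k in range(3)]

# Flatten each frame into a (12, 1) column vector, like image_to_vector does
columns = [f.reshape(-1, 1) for f in frames]

stacked = np.column_stack(columns)  # shape (12, 3): one COLUMN per frame
per_row = stacked.T                 # shape (3, 12): one ROW per frame
```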

My problem is that the second step (appending frame_vector to video_matrix) takes too much time. For example, if I want to represent a three-minute video, it takes almost 2 hours to get the corresponding video_matrix. Is there a built-in tool in OpenCV for Python that lets me get the video_matrix faster, even for longer videos?

My Code:

import os
import numpy as np
import cv2  # extract frames from the videos
from PIL import Image  # to manipulate images

# Create frames of a video and store them
video = cv2.VideoCapture('path/video.mp4')
if not os.path.exists('data'):
    os.makedirs('data')

# frame size, needed to initialize video_matrix below
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))

counter = 0
while True:
    # reading from frame
    ret, frame = video.read()

    if ret:
        # if video is still left continue creating images
        name = './data/frame' + str(counter) + '.jpg'
        # print('Creating...' + name)

        # writing the extracted images
        cv2.imwrite(name, frame)

        # increasing counter so that it will
        # show how many frames are created
        counter += 1
    else:
        break

# Release the capture once done
video.release()

video_matrix = np.zeros(width * height * 3)  # initialize 1D array which will become the 2D array; first column will be deleted at the end

for i in range(counter):  # loops over the total amount of frames
    current_frame = np.asarray(Image.open('./data/frame' + str(i) + '.jpg'))  # 3D array = current frame
    frame_vector = image_to_vector(current_frame)  # convert frame into a column vector
    video_matrix = np.column_stack((video_matrix, frame_vector))  # append frame i as a new column (copies the whole matrix)

video_matrix = np.delete(video_matrix, 0, 1)  # delete the placeholder first column

1 answer


answered 2020-12-03 11:11:26 -0500

crackwitz

numpy stacking creates a copy because it's impossible to enlarge arrays in-place.

thus, you are copying the whole array for EVERY frame. that's O(n^2) complexity in the number of frames.
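To put a rough number on that: if each frame has d values, appending frame k copies the k - 1 frames already in the matrix plus the new one, so building the matrix for n frames costs about

$$
C(n) = \sum_{k=1}^{n} k\,d = d\,\frac{n(n+1)}{2} \in O(n^2)
$$

element copies. Assuming the 100x100x3 frames from the question's example (d = 30000) and a 3 min video at 30 fps (n = 3 x 60 x 30 = 5400), that is 30000 x 14,582,700, roughly 4.4 x 10^11 element copies, versus n x d = 1.62 x 10^8 when the matrix is assembled once at the end.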

in your loop, you should append each frame to a simple python list (that_list.append(frame.flatten())). when you are done, convert the list of arrays to one big array (video_matrix = np.array(that_list)).
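A minimal sketch of that approach, which also reads frames straight from cv2.VideoCapture instead of round-tripping through JPEG files. The helper names frames_to_matrix and read_frames are hypothetical, and the sketch assumes all frames have the same size:

```python
import numpy as np


def frames_to_matrix(frames):
    """Stack an iterable of equally sized (H, W, C) frames into an
    (n_frames, H*W*C) matrix: one row per frame."""
    rows = []
    for frame in frames:
        # flatten() returns a copy, so reuse of the frame buffer is harmless
        rows.append(frame.flatten())
    # one allocation and one copy at the end instead of one per frame
    return np.array(rows)


def read_frames(path):
    """Yield frames (uint8, BGR order) from a video file using OpenCV."""
    # imported here so frames_to_matrix also works without OpenCV installed
    import cv2

    video = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = video.read()
            if not ok:  # end of video (or the file could not be opened)
                break
            yield frame
    finally:
        video.release()
```

Usage would be video_matrix = frames_to_matrix(read_frames('path/video.mp4')), with the path being the placeholder from the question. Note that the whole matrix still lives in memory (n_frames x H x W x 3 bytes for uint8 frames), so long or high-resolution videos may need downscaling first.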



crackwitz thank you very much! It worked and reduced the time from 2 hours to 10 s!

particle99 (2020-12-04 07:34:42 -0500)



Asked: 2020-12-03 10:05:12 -0500

Seen: 42 times

Last updated: Dec 03 '20