# Convert a Video.mp4 into a 2D matrix where each row represents a frame

Hello,

As explained in the title, I want to load a video in Python (import cv2) and then find a way to represent it as a 2D array, video_matrix. The number of rows of video_matrix equals the total number of frames, and the number of columns equals the total number of features that describe a single frame.

For example, a 3 s video (30 fps) has 90 frames; each frame has height 100 and width 100, and each pixel is described by 3 values (RGB).
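The dimensions in this example can be checked with a few lines of NumPy. This is only a sketch with synthetic data: the `video` array below is a stand-in filled with zeros, not frames read from a real file.

```python
import numpy as np

# Synthetic stand-in for the example video:
# 90 frames of 100 x 100 pixels with 3 color values each.
n_frames, height, width, channels = 90, 100, 100, 3
video = np.zeros((n_frames, height, width, channels), dtype=np.uint8)

# Flatten every frame into one row: one row per frame, one column per feature.
video_matrix = video.reshape(n_frames, -1)

print(video_matrix.shape)  # (90, 30000)
```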

In my current method I convert each frame, which is a 3D array with dimensions (100, 100, 3), into a 1D vector of size 100 x 100 x 3 = 30000:

    def image_to_vector(image):
        """
        Args:
            image: numpy array of shape (height, width, depth)

        Returns:
            v: a vector of shape (height * width * depth, 1)
        """
        height, width, depth = image.shape
        return image.reshape((height * width * depth, 1))


Then I append the resulting vector to video_matrix:

    video_matrix = np.column_stack((video_matrix, frame_vector))


I repeat this procedure for all frames of the video. So in the end I get a numerical representation of the video as a 2D array where the rows are representations of the frames. video_matrix must have this form because I want to apply machine learning algorithms to it.
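One detail worth checking is the orientation. `np.column_stack` places each appended vector in a new column, so the procedure described here yields a matrix of shape (features, frames), and a transpose is needed to get one row per frame. A small sketch with made-up 2 x 2 x 3 frames (the sizes and the `frames` list are chosen only for illustration):

```python
import numpy as np

# Four tiny synthetic frames; frame i is filled with the value i.
frames = [np.full((2, 2, 3), i, dtype=np.uint8) for i in range(4)]
n_features = 2 * 2 * 3

stacked = np.zeros(n_features)  # placeholder column, removed afterwards
for frame in frames:
    # Each frame becomes a column vector and is appended as a new column.
    stacked = np.column_stack((stacked, frame.reshape(-1, 1)))
stacked = np.delete(stacked, 0, 1)  # drop the placeholder column

print(stacked.shape)    # (12, 4): one column per frame
print(stacked.T.shape)  # (4, 12): one row per frame
```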

My problem is that the second step (appending frame_vector to video_matrix) takes too much time. For example, representing a three-minute video takes almost 2 hours. Is there a built-in tool in OpenCV for Python that would give me video_matrix faster, even for longer videos?

My Code:

    import os
    import numpy as np
    import cv2  # extract frames from the video
    from PIL import Image  # to manipulate images

    # Create frames of a video and store them
    video = cv2.VideoCapture('path/video.mp4')
    # frame size, needed below to initialize video_matrix
    width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    if not os.path.exists('data'):
        os.makedirs('data')

    counter = 0
    while True:
        ret, frame = video.read()  # ret is False once no frames are left

        if ret:
            # if video is still left continue creating images
            name = './data/frame' + str(counter) + '.jpg'
            # print('Creating...' + name)

            # writing the extracted images
            cv2.imwrite(name, frame)

            # increasing counter so that it will
            # show how many frames are created
            counter += 1
        else:
            break

    # Release all space and windows once done
    video.release()
    cv2.destroyAllWindows()

    video_matrix = np.zeros(width * height * 3)  # initialize 1D array which will become the 2D array; first column will be deleted at the end

    for i in range(counter):  # loops over the total amount of frames

        current_frame = np.asarray(Image.open('./data/frame' + str(i) + '.jpg'))  # 3D array = current frame
        frame_vector = image_to_vector(current_frame)  # convert frame into a column vector
        video_matrix = np.column_stack((video_matrix, frame_vector))  # append frame to the matrix that will represent the video

    video_matrix = np.delete(video_matrix, 0, 1)  # delete the placeholder first column


NumPy stacking creates a copy because arrays cannot be enlarged in place.

Thus, you are copying the whole array for EVERY frame. That's O(n^2) complexity in the number of frames.
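To make the quadratic growth concrete, here is a rough cost model. It is a simplification that only counts array elements written, assuming that the k-th stacking step builds a new array with k columns; the function names are made up for this illustration.

```python
def elements_written_by_stacking(n_frames, n_features):
    """Step k writes a fresh array with k columns, i.e. k * n_features elements."""
    return n_features * n_frames * (n_frames + 1) // 2


def elements_written_by_list(n_frames, n_features):
    """Each frame is flattened once, then one final copy builds the big array."""
    return 2 * n_frames * n_features


# The 3 s example: 90 frames of 100 x 100 x 3 values.
n_frames, n_features = 90, 100 * 100 * 3
print(elements_written_by_stacking(n_frames, n_features))  # 122850000
print(elements_written_by_list(n_frames, n_features))      # 5400000
```

In this model the ratio between the two is (n + 1) / 4, so it keeps growing with the number of frames: for a three-minute video, assuming 30 fps (5400 frames), stacking writes roughly 1350 times as many elements. The model ignores everything else the loop does, such as decoding the JPEG files.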

In your loop, you should append each frame to a plain Python list (that_list.append(frame.flatten())). When you are done, convert the list of arrays to one big array (video_matrix = np.array(that_list)).
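A minimal sketch of this suggestion follows. `read_frames` and `frames_to_matrix` are hypothetical helper names, and reading frames straight from the capture instead of going through JPEG files on disk is an extra assumption, not something the answer prescribes. The demo at the end uses synthetic frames so it runs without a video file.

```python
import numpy as np


def read_frames(path):
    """Yield the frames of a video file one by one (requires OpenCV)."""
    import cv2  # imported here so frames_to_matrix also works without OpenCV

    video = cv2.VideoCapture(path)
    try:
        while True:
            ret, frame = video.read()
            if not ret:  # no frames left, or the file could not be read
                break
            yield frame
    finally:
        video.release()


def frames_to_matrix(frames):
    """Collect flattened frames in a list and convert to a 2D array once."""
    rows = [frame.flatten() for frame in frames]
    return np.array(rows)  # shape: (number of frames, features per frame)


# Demo with synthetic frames: 90 frames of 100 x 100 x 3, frame i filled with i.
fake_frames = (np.full((100, 100, 3), i, dtype=np.uint8) for i in range(90))
video_matrix = frames_to_matrix(fake_frames)
print(video_matrix.shape)  # (90, 30000)
```

With a real file, the call would be `video_matrix = frames_to_matrix(read_frames('path/video.mp4'))`. Because each flattened frame becomes one row, the result already has one row per frame and needs no transpose.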


crackwitz, thank you very much! It worked and reduced the time from 2 hours to 10 s!

( 2020-12-04 07:34:42 -0600 )
