Memory issues when loading videos into frames
I have a folder with 160 FLV videos, each containing 120 frames of size 152 x 360 with RGB colors (3 channels), which I would like to load into the numpy array frames. I do this with the following code:
import numpy as np
import cv2
import os

directory = "data/"

# Counters (initialised here; they are used in the loop below)
nb_files_in_dir = len([f for f in os.listdir(directory) if f.endswith(".flv")])
nr_file = 0
nr_frame = 0

# frames = []
frames = np.empty(shape=(160 * 120, 152, 360, 3), dtype=np.float32)

for file in os.listdir(directory):
    if file.endswith(".flv"):
        file_path = os.path.join(directory, file)
        nr_file = nr_file + 1
        print('File ' + str(nr_file) + ' of ' + str(nb_files_in_dir) + ' files: ' + file_path)

        # Create a VideoCapture object and read from input file
        # If the input is the camera, pass 0 instead of the video file name
        cap = cv2.VideoCapture(file_path)

        # Check if the file opened successfully
        if not cap.isOpened():
            print("Error opening video stream or file")

        # Read until the video is completed
        nb_frames_in_file = 0
        while cap.isOpened():
            # Capture frame-by-frame
            ret, frame = cap.read()
            if ret:
                # frames.append(frame.astype('float32') / 255.)
                frames[nr_frame, :, :, :] = frame.astype('float32') / 255.
                nr_frame = nr_frame + 1
                nb_frames_in_file = nb_frames_in_file + 1
            else:
                break

        # When everything is done, release the video capture object
        cap.release()

# frames = np.array(frames)
Originally I tried to use a list frames (see the commented lines) instead of the preallocated numpy array, but it seemed this took too much memory - no idea why though.
However, it seems this did not help much: the code is still very memory-hungry (many GB), even though my videos are only a few KB in size. I think this is because the resources of the cap objects (the cv2.VideoCapture objects) might not be freed despite me calling cap.release() - is that correct? What can I do to make my code memory-efficient?
No, it's not the VideoCapture; the decompressed frames just need a huge amount of memory. Just do the maths, it is: 160 videos * 120 frames * 152 * 360 * 3 bytes ≈ 3.15 GB of raw uint8 data. You'll have to restrict it somehow ... oh, you also convert to float, so the whole thing * 4 (why do you think that's necessary?)
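For reference, a quick sanity check of that arithmetic, using the array shape from the question:

# Shape from the question: 160 videos * 120 frames, each 152 x 360 px, 3 channels
n_values = 160 * 120 * 152 * 360 * 3

print("uint8:   %.2f GB" % (n_values / 1e9))      # ~3.15 GB, 1 byte per value
print("float32: %.2f GB" % (n_values * 4 / 1e9))  # ~12.61 GB, 4 bytes per value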
@berak: I corrected my mistake: there are only 160 videos and they are smaller after all. If the size of the data itself were the issue, then the allocation of the frames numpy array would already eat up all the memory, but it does not. The float is because I am feeding the data into a neural network afterwards. The numpy array is actually not that large, even when allocated as float32, so this should not be the issue (I think).

Still, you're trying to allocate ~25 GB of memory for this. You'll have to feed it into the NN in batches later, so only load one batch at a time.
Thanks, that is what I am doing now. Since I need a DataGenerator, I implemented a keras.utils.Sequence subclass and use it for batch training of my neural network.
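A minimal sketch of such a Sequence, assuming one batch per video file; the class name VideoFrameSequence and the (batch, batch) target are placeholders to adapt to your own labels and setup:

import os
import cv2
import numpy as np
from tensorflow import keras

class VideoFrameSequence(keras.utils.Sequence):
    # Loads the frames of ONE video per batch instead of all videos at once
    def __init__(self, directory):
        self.paths = sorted(os.path.join(directory, f)
                            for f in os.listdir(directory) if f.endswith(".flv"))

    def __len__(self):
        # Number of batches per epoch: one batch per video file
        return len(self.paths)

    def __getitem__(self, idx):
        # Decode the idx-th video on demand; only ~120 frames live in memory
        frames = []
        cap = cv2.VideoCapture(self.paths[idx])
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            frames.append(frame.astype('float32') / 255.)
        cap.release()
        batch = np.stack(frames)  # shape: (120, 152, 360, 3)
        return batch, batch       # placeholder target, e.g. for an autoencoder

# Usage: model.fit(VideoFrameSequence("data/"), epochs=10)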