Why is the VideoCapture.set() method much slower when run simultaneously with Tensorflow on a single GPU?
I've created a CUDA-based Docker image with FFMPEG, OpenCV and Tensorflow.
Tensorflow version: 1.14.0
OpenCV version: 4.1.2
FFMPEG version: 4.2.2
FFMPEG was built with CUDA support:
$ ffmpeg
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
configuration:
--enable-cuda --enable-cuvid --enable-libnpp --extra-cflags=-I/usr/local/cuda/include/ --extra-ldflags=-L/usr/local/cuda/lib64/ --enable-gpl --enable-libx264 --extra-libs=-lpthread --enable-nvenc --enable-nonfree --enable-shared --disable-static
OpenCV was built with CUDA and FFMPEG support (print(cv2.getBuildInformation())):
Video I/O:
DC1394: NO
FFMPEG: YES
avcodec: YES (58.54.100)
avformat: YES (58.29.100)
avutil: YES (56.31.100)
swscale: YES (5.5.100)
avresample: NO
GStreamer: NO
v4l/v4l2: YES (linux/videodev2.h)
NVIDIA CUDA: YES (ver 10.0, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 30 35 37 50 52 60 61 70 75
NVIDIA PTX archs:
cuDNN: YES (ver 7.6.5)
My goal is to speed up video decoding using the FFMPEG CUDA decoder (h264_cuvid). The decoded video is later used to make predictions with Tensorflow. I encountered very strange behavior when running the VideoCapture set(cv2.CAP_PROP_POS_FRAMES, frame_position) operation with and without Tensorflow in the same script.
When the script performs only the set() operation, it takes on average nearly 0.5 seconds. But when the same operation is executed in a script that additionally loads and runs Tensorflow models, it takes on average more than 8 seconds. In both cases, nearly 25% of the GPU memory is in use during the set() operation.
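For reference, this is roughly how I measure the two cases (the video source, frame position and the Tensorflow part below are simplified placeholders, not my real pipeline):

```python
import time
import cv2

VIDEO_SOURCE = "http://example.com/stream.mp4"  # placeholder, not my real stream
FRAME_POSITION = 1000                           # placeholder position

def time_seek():
    cap = cv2.VideoCapture(VIDEO_SOURCE, cv2.CAP_FFMPEG)
    start = time.time()
    cap.set(cv2.CAP_PROP_POS_FRAMES, FRAME_POSITION)
    elapsed = time.time() - start
    cap.release()
    return elapsed

# Case 1: seek only -> ~0.5 s per call
print("seek only:", time_seek())

# Case 2: the same seek after a Tensorflow session has been created and used
# (placeholder graph; my real script loads and runs actual models) -> ~8 s per call
import tensorflow as tf
with tf.Session() as sess:
    sess.run(tf.constant(0))
    print("seek with TF loaded:", time_seek())
```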
The command I use to run the docker container:
docker run --gpus all -it -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video $DOCKER_IMAGE_NAME
Additionally, I set an environment variable to enable the CUDA backend in the cv2.CAP_FFMPEG VideoCapture API:
export OPENCV_FFMPEG_CAPTURE_OPTIONS="video_codec;h264_cuvid"
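As far as I understand, this variable is read when the FFMPEG backend opens the stream, so it can equivalently be set from Python before the VideoCapture is created (the URL below is a placeholder):

```python
import os
import cv2

# Equivalent to the shell export; must be set before the VideoCapture is created.
os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "video_codec;h264_cuvid"

cap = cv2.VideoCapture("http://example.com/stream.mp4", cv2.CAP_FFMPEG)  # placeholder URL
print("opened:", cap.isOpened())
cap.release()
```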
What could be the reason for this behavior, or is it a known bug in OpenCV/Tensorflow compatibility?
what do you set() there? and when?
cv2.CAP_PROP_POS_FRAMES
So the general workflow is as follows:
I open a VideoCapture on an HTTP video stream, set the capture position to some point, read N consecutive frames and perform some Machine Learning on the sequence of images.
The bottleneck is the capture.set(cv2.CAP_PROP_POS_FRAMES, position) operation.
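In code, the workflow looks roughly like this (the stream URL, position, frame count and the inference step are placeholders):

```python
import cv2
import numpy as np

def grab_sequence(stream_url, position, n_frames):
    """Seek to `position` in the stream and read `n_frames` consecutive frames."""
    cap = cv2.VideoCapture(stream_url, cv2.CAP_FFMPEG)
    cap.set(cv2.CAP_PROP_POS_FRAMES, position)   # <-- the slow call
    frames = []
    for _ in range(n_frames):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return np.stack(frames) if frames else None

batch = grab_sequence("http://example.com/stream.mp4", 1000, 16)  # placeholder values
if batch is not None:
    # predictions = model.predict(batch)  # placeholder for the Tensorflow inference
    print(batch.shape)
```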