Why is the VideoCapture.set() method much slower when run simultaneously with TensorFlow on a single GPU?

asked 2020-03-16 09:44:52 -0500

andriydinamic

updated 2020-03-16 11:15:17 -0500

I've created a CUDA-based Docker image with FFmpeg, OpenCV and TensorFlow. Versions: TensorFlow 1.14.0, OpenCV 4.1.2, FFmpeg 4.2.2.

FFmpeg was built with CUDA support:

```
$ ffmpeg
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
--enable-cuda --enable-cuvid --enable-libnpp --extra-cflags=-I/usr/local/cuda/include/ --extra-ldflags=-L/usr/local/cuda/lib64/ --enable-gpl --enable-libx264 --extra-libs=-lpthread --enable-nvenc --enable-nonfree --enable-shared --disable-static
```
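Since everything below relies on the `h264_cuvid` decoder, it may help to confirm that this FFmpeg build actually lists it. A minimal sketch, assuming `ffmpeg` is on `PATH` inside the container and that OpenCV links against the same shared libav* libraries as the CLI (likely with `--enable-shared`, but not guaranteed); `has_decoder` and `ffmpeg_decoders` are hypothetical helper names:

```python
import shutil
import subprocess


def has_decoder(decoders_output: str, name: str) -> bool:
    """Return True if `name` is listed in `ffmpeg -decoders` output."""
    for line in decoders_output.splitlines():
        parts = line.split()
        # Decoder rows look like: "V..... h264_cuvid   <description>"
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


def ffmpeg_decoders() -> str:
    """Run `ffmpeg -hide_banner -decoders` and return its standard output."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg was not found on PATH")
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-decoders"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For a build configured with `--enable-cuvid`, `has_decoder(ffmpeg_decoders(), "h264_cuvid")` would be expected to return `True`.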

OpenCV was built with CUDA and FFmpeg support (excerpt from `print(cv2.getBuildInformation())`):

```
  Video I/O:
    DC1394:                      NO
    avcodec:                     YES (58.54.100)
    avformat:                    YES (58.29.100)
    avutil:                      YES (56.31.100)
    swscale:                     YES (5.5.100)
    avresample:                  NO
    GStreamer:                   NO
    v4l/v4l2:                    YES (linux/videodev2.h)

    NVIDIA GPU arch:             30 35 37 50 52 60 61 70 75
    cuDNN:                       YES (ver 7.6.5)
```
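To check these entries from a script instead of scanning the full dump, a small helper can pull individual values out of the `cv2.getBuildInformation()` text. This is a sketch; `build_info_value` is a hypothetical name, and it assumes the `key: value` layout shown above:

```python
from typing import Optional


def build_info_value(build_info: str, key: str) -> Optional[str]:
    """Return the text after 'key:' in cv2.getBuildInformation() output, or None."""
    prefix = key + ":"
    for line in build_info.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            return stripped[len(prefix):].strip()
    return None
```

For the build above, `build_info_value(cv2.getBuildInformation(), "avcodec")` would return `"YES (58.54.100)"`.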

My goal is to speed up video decoding with FFmpeg's CUDA (CUVID) decoder; the decoded video is then used to make predictions with TensorFlow. I see very strange behavior from the VideoCapture `set(cv2.CAP_PROP_POS_FRAMES, frame_position)` call, depending on whether TensorFlow also runs in the same script.

When the script performs only the set() call, it takes nearly 0.5 s on average. When the same call runs in a script that also loads and runs TensorFlow models, it takes more than 8 s on average. In both cases roughly 25% of GPU memory is in use during the set() call.
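To compare the two cases under identical conditions, the seek can be timed in isolation, running the same script once as-is and once after the TensorFlow models have been loaded. A minimal sketch; `time_seeks` and `summarize` are hypothetical helpers that work with any object exposing `set(prop_id, value)`:

```python
import statistics
import time
from typing import Iterable, List


def time_seeks(cap, prop_id: int, positions: Iterable[int]) -> List[float]:
    """Time each cap.set(prop_id, pos) call and return the durations in seconds.

    `cap` can be any object with a set(prop_id, value) method, such as an
    opened cv2.VideoCapture with prop_id=cv2.CAP_PROP_POS_FRAMES.
    """
    durations = []
    for pos in positions:
        start = time.perf_counter()
        cap.set(prop_id, pos)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """Format the mean and max seek time for quick comparison between runs."""
    return f"n={len(durations)} mean={statistics.mean(durations):.3f}s max={max(durations):.3f}s"
```

With OpenCV this would be called as `print(summarize(time_seeks(cap, cv2.CAP_PROP_POS_FRAMES, [100, 500, 1000])))`, where `cap` is the opened stream and the positions are arbitrary examples.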

The command I use to run the Docker container:

```
docker run --gpus all -it -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video $DOCKER_IMAGE_NAME
```
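Inside the container, the capabilities that were passed can be sanity-checked. As far as I know, NVIDIA's container documentation lists `video` (or `all`) as the capability needed for the Video Codec SDK (NVDEC/NVENC). A sketch; `driver_capabilities` and `video_capability_requested` are hypothetical helpers:

```python
import os
from typing import Mapping, Optional, Set


def driver_capabilities(env: Optional[Mapping[str, str]] = None) -> Set[str]:
    """Split NVIDIA_DRIVER_CAPABILITIES (comma-separated) into a set of names."""
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    return {c.strip() for c in raw.split(",") if c.strip()}


def video_capability_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if 'video' (or 'all') was requested for this container."""
    caps = driver_capabilities(env)
    return "video" in caps or "all" in caps
```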

Additionally, I set an environment variable so that VideoCapture's FFmpeg backend (cv2.CAP_FFMPEG) uses the CUDA H.264 decoder:

```
export OPENCV_FFMPEG_CAPTURE_OPTIONS="video_codec;h264_cuvid"
```
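The same variable can also be set from Python, which sidesteps shell quoting (unquoted, `;` ends the `export` command and the shell then tries to run `h264_cuvid` as a separate command). As far as I can tell, the FFmpeg backend reads the variable when a capture is opened, so it must be set before the `cv2.VideoCapture` is created. A sketch, assuming this OpenCV build honours the `video_codec` option; `enable_cuvid_decoding` is a hypothetical helper:

```python
import os
from typing import MutableMapping, Optional

# Key and value are separated by ';'; multiple pairs would be separated by '|'.
CUVID_CAPTURE_OPTIONS = "video_codec;h264_cuvid"


def enable_cuvid_decoding(env: Optional[MutableMapping[str, str]] = None) -> str:
    """Set OPENCV_FFMPEG_CAPTURE_OPTIONS so later FFmpeg captures request h264_cuvid.

    Call this before creating the cv2.VideoCapture; a capture that is
    already open is not affected.
    """
    env = os.environ if env is None else env
    env["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = CUVID_CAPTURE_OPTIONS
    return env["OPENCV_FFMPEG_CAPTURE_OPTIONS"]
```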

What could be causing this behavior? Or is it a known bug in OpenCV/TensorFlow compatibility?



> 25% of GPU memory is used during SET operation.

What do you set() there, and when?

berak (2020-03-16 10:02:56 -0500)


andriydinamic (2020-03-16 11:15:34 -0500)

The general workflow is as follows:

1. Open a VideoCapture on an HTTP video stream.
2. Set the capture position to some frame.
3. Read N consecutive frames.
4. Run machine-learning inference on that sequence of images.
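The workflow above can be sketched as follows. `read_clip` is a hypothetical helper that accepts any object with `cv2.VideoCapture`-style `set()` and `read()` methods; the inference step itself is left out:

```python
from typing import Any, List


def read_clip(cap, prop_pos_frames: int, position: int, n_frames: int) -> List[Any]:
    """Seek to `position`, then read up to `n_frames` consecutive frames.

    `cap` is assumed to behave like cv2.VideoCapture: set(prop, value) and
    read() -> (ok, frame). Pass cv2.CAP_PROP_POS_FRAMES as `prop_pos_frames`.
    """
    cap.set(prop_pos_frames, position)  # the seek reported as the bottleneck
    frames = []
    for _ in range(n_frames):
        ok, frame = cap.read()
        if not ok:  # end of stream or a decoding/network error
            break
        frames.append(frame)
    return frames
```

For example, `clip = read_clip(cap, cv2.CAP_PROP_POS_FRAMES, 1000, 16)` followed by `numpy.stack(clip)` gives an `(N, H, W, 3)` BGR batch for the model; the position and clip length here are arbitrary.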

The bottleneck is the `capture.set(cv2.CAP_PROP_POS_FRAMES, position)` call.
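As far as I understand OpenCV's FFmpeg backend, a `CAP_PROP_POS_FRAMES` seek jumps back to an earlier keyframe and then decodes forward until it reaches the requested frame, so a single `set()` can involve decoding many frames. If the requested positions only ever increase, one alternative worth measuring is to advance with `grab()` instead of re-seeking. A sketch; `skip_to` is a hypothetical helper, and whether it beats `set()` on this setup is an open question:

```python
def skip_to(cap, current: int, target: int) -> int:
    """Advance a capture from frame `current` to frame `target` with grab() only.

    Returns the position actually reached (smaller than `target` if the stream
    ends first). Assumes target >= current and a cv2.VideoCapture-style
    grab() -> bool method. Each grab() still decodes a frame, so this mainly
    pays off when the gap between clips is short.
    """
    if target < current:
        raise ValueError("grab() only moves forward; use set() to go back")
    while current < target:
        if not cap.grab():
            break  # stream ended before reaching target
        current += 1
    return current
```

In the workflow above this means keeping a running frame counter: call `pos = skip_to(cap, pos, next_start)`, then `cap.read()` N times, incrementing `pos` after each successful read.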

andriydinamic (2020-03-16 12:16:27 -0500)