
Why is the VideoCapture.set() method much slower when run simultaneously with Tensorflow on a single GPU?

I've created a CUDA-based Docker image with FFMPEG, OpenCV and Tensorflow:

Tensorflow version: 1.14.0
OpenCV version: 4.1.2
FFMPEG version: 4.2.2

FFMPEG was built with CUDA support:

>>> ffmpeg
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
configuration:
--enable-cuda --enable-cuvid --enable-libnpp --extra-cflags=-I/usr/local/cuda/include/ --extra-ldflags=-L/usr/local/cuda/lib64/ --enable-gpl --enable-libx264 --extra-libs=-lpthread --enable-nvenc --enable-nonfree --enable-shared --disable-static

OpenCV was built with CUDA and FFMPEG support (print(cv2.getBuildInformation())):

Video I/O:
DC1394: NO
FFMPEG: YES
avcodec: YES (58.54.100)
avformat: YES (58.29.100)
avutil: YES (56.31.100)
swscale: YES (5.5.100)
avresample: NO
GStreamer: NO
v4l/v4l2: YES (linux/videodev2.h)

NVIDIA CUDA: YES (ver 10.0, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 30 35 37 50 52 60 61 70 75
NVIDIA PTX archs:
cuDNN: YES (ver 7.6.5)
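
For completeness, the CUDA build can be sanity-checked from Python; a small sketch, not part of my actual pipeline:

import cv2

# Confirm this OpenCV build was compiled with CUDA and can see the GPU.
print(cv2.getBuildInformation())             # prints the build summary quoted above
print(cv2.cuda.getCudaEnabledDeviceCount())  # should report at least 1 device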

My goal is to speed up video encoding using the ffmpeg CUDA encoder. The video is later used to make predictions with Tensorflow. I encountered very strange behavior when running the VideoCapture set(cv2.CAP_PROP_POS_FRAMES, frame_position) operation with and without Tensorflow in the same script.

When the script performs only the set() operation, it takes on average nearly 0.5 seconds. But when the same operation is executed in a script that additionally loads and runs Tensorflow models, it takes on average more than 8 seconds. In both cases nearly 25% of GPU memory is used during the set() operation.
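
A minimal sketch of the kind of measurement described above (the file name, frame position and MobileNetV2 model are placeholders, not my actual pipeline; it assumes the OPENCV_FFMPEG_CAPTURE_OPTIONS variable shown further down has already been exported):

import time
import cv2

USE_TENSORFLOW = True  # flip to compare the fast and slow cases

if USE_TENSORFLOW:
    # Slow case: load and run a Tensorflow model on the same GPU before seeking.
    import numpy as np
    import tensorflow as tf
    model = tf.keras.applications.MobileNetV2(weights=None)  # placeholder model
    model.predict(np.zeros((1, 224, 224, 3), dtype=np.float32))

cap = cv2.VideoCapture("input.mp4", cv2.CAP_FFMPEG)

start = time.time()
cap.set(cv2.CAP_PROP_POS_FRAMES, 1000)  # the seek operation being timed
print("set() took %.2f s" % (time.time() - start))

cap.release()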

The command I use to run the Docker container:

docker run --gpus all -it -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video $DOCKER_IMAGE_NAME

Additionally, I use an environment variable to enable the CUDA backend in the cv2.CAP_FFMPEG VideoCapture API:

export OPENCV_FFMPEG_CAPTURE_OPTIONS="video_codec;h264_cuvid"
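
The same option can also be set from inside Python, as long as it happens before the capture is opened (a sketch, equivalent to the export above):

import os

# Must be set before cv2.VideoCapture opens the file so the FFMPEG backend
# picks up the h264_cuvid decoder option.
os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "video_codec;h264_cuvid"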

What could be the reason for this behavior, or is it a known bug in OpenCV/Tensorflow compatibility?
