UMat multithreading issues when using producer-consumer pattern

asked 2018-04-12 08:17:34 -0500

Stefan Koenig

updated 2018-04-13 10:59:56 -0500

I've implemented a processing pipeline using a producer-consumer pattern. A consumer runs as an independent thread, waits for data to process and handles it as soon as it arrives, while a producer pushes data into the consumer's processing queue. Well, that's the simplified version...

As long as I use cv::Mat, everything is fine. But when I use cv::UMat to enable GPU support, the program behaves strangely (timing issues). It looks as if the data sometimes has not finished processing, or has not yet been copied back from the GPU to the CPU...

Library versions and OS

  • OpenCV 3.4.1 (current); the issue also occurs with OpenCV 3.2.0
  • Statically compiled libraries, built from source with Visual Studio 2017
  • CMake options: BUILD_SHARED_LIBS = false, BUILD_TIFF = false, WITH_TIFF = false, WITH_CUDA = false; /MD and /MDd compiler options replaced with /MT and /MTd respectively; mode: Release, x64
  • OS: Windows 10, 64-bit
  • CPU: Intel Core i7 7th Gen, 4 cores (8 hyper-threads)
  • GPU: NVIDIA GeForce GTX 1060

Other machines:

  • NO PROBLEM WITH: Windows 7 machine, 64-bit, old Intel integrated GPU without OpenCL support
  • OCCURRING WITH: Windows 10 machine, 64-bit, AMD Radeon HD 6570, OpenCL supported

So either it only works when cv::UMat falls back to the CPU implementation, or it might be an OS-related issue. Since it occurs with both AMD and NVIDIA GPUs, the graphics driver or a specific driver version should not be the cause. I'll try it as soon as possible on a Linux (Ubuntu 16.04) machine with two GTX 1080 Ti GPUs to rule out an OS-related issue.

Code to reproduce the issue

Here is a simplified example:

(1) A producer loads an image and forwards copies of it to a consumer ("Processor").

(2) OPTIONAL: the data processor converts the image to grayscale, applies Canny edge detection and forwards the result to the final consumer.

(3) The final consumer just displays the received images in a window.

With cv::Mat everything works. With cv::UMat, when the processor (2) is omitted, the window stays empty most of the time; sometimes random data appears and sometimes the delivered image is shown. When I add an intermediate consumer that, e.g., applies a grayscale conversion followed by Canny edge detection, the displayed image is sometimes the grayscale image and sometimes the Canny result. It seems that OpenCV schedules OpenCL filters as background jobs, and when the data is handed over to another thread, the last operation applied to the cv::UMat is not guaranteed to have finished before we continue working with it (e.g. display it in a window).

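   #include <condition_variable>
   #include <mutex>
   #include <queue>
   #include <thread>
   #include <opencv2/imgproc.hpp>
   #include <opencv2/highgui.hpp>

   using namespace cv;

   template <typename T>
   class thread_queue {
          std::queue<T> queue;
          std::mutex mutex;
          std::condition_variable newdata;

   public:
          thread_queue() {}
          ~thread_queue() {}

          void push(T t) {
                 {
                        std::unique_lock<std::mutex> lock(mutex);
                        queue.push(t);
                 }  // release the lock before notifying
                 newdata.notify_one();  // wake one waiting consumer
          }

          T pop() {
                 std::unique_lock<std::mutex> lock(mutex);
                 // loop (not if) guards against spurious wake-ups
                 while (queue.empty()) newdata.wait(lock);
                 T elem = queue.front();
                 queue.pop();  // remove the element we just took
                 return elem;
          }
   };

   // ... (the rest of the example is cut off in the original post)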


Multithreading with OpenCV is usually a "problematic" idea, at least. waitKey() contains the window's internal message loop, which MUST stay in the main thread.

also, you should NEVER use pointers to cv::Mat; you're wrecking the internal refcounting like that (even worse with UMat)

do you absolutely HAVE TO use multithreading ?

berak (2018-04-12 08:20:19 -0500)

Hm... interesting: I changed the code to not use pointers, and it works with UMat. So that's actually the problem! Could you post this as an answer, and maybe explain in more detail why the refcounting doesn't work then? I know the difference between assigning a pointer to an object and assigning an object directly, and what refcounting is. However, I don't see where and why this causes this kind of behaviour in this particular case.

Stefan Koenig (2018-04-12 08:40:45 -0500)

cv::Mat is a smart pointer already.

and while this seems to have solved your immediate problem, there are surely more hidden in your current approach (e.g. the GUI code).

why the multithreading at all? (a lot of OpenCV functions are NOT thread-safe by design, e.g. to enable internal parallelism)

berak (2018-04-12 08:45:54 -0500)

AND: I currently see no way around multithreading, as I need to make sure all CPU cores are used. Also, the whole architecture of the program depends on it: it's already about 10,000 lines of code, working fine except that it can't use UMat, and therefore no GPU, so far.

Stefan Koenig (2018-04-12 08:55:24 -0500)

The GUI is done via a web interface and works fine. But currently I have these generalized queues, which forward data as a "void*" to allow different types, with manual type checking. I'll need to clean this up by using a wrapper class structure instead.

Stefan Koenig (2018-04-12 09:01:50 -0500)

You never delete the mat. You never unlock your mutex.

sjhalayka (2018-04-12 09:18:27 -0500)

I know, I forgot that in the example. The final consumer needs to do that... (will fix the code above)

Stefan Koenig (2018-04-12 09:32:59 -0500)

PS: A "std::unique_lock<std::mutex>" is unlocked as soon as its scope ends and its destructor is called. That's all fine there...
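This RAII behaviour can be checked directly. In the sketch below (standard library only; probe_lock_release is a hypothetical name), another thread's try_lock() fails while a std::unique_lock is alive, and another thread can lock the mutex once the unique_lock has gone out of scope.

```cpp
#include <mutex>
#include <thread>
#include <utility>

// Returns {could another thread try_lock while held, could it lock afterwards}.
std::pair<bool, bool> probe_lock_release() {
    std::mutex m;
    bool while_held = false;
    {
        std::unique_lock<std::mutex> lock(m);   // locks m
        std::thread t([&] {
            while_held = m.try_lock();          // fails: m is owned here
            if (while_held) m.unlock();
        });
        t.join();
    }                                           // ~unique_lock() unlocks m
    bool after_scope = false;
    std::thread t2([&] {
        std::lock_guard<std::mutex> guard(m);   // would block forever if m were still locked
        after_scope = true;
    });
    t2.join();
    return {while_held, after_scope};
}
```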

Stefan Koenig (2018-04-12 09:42:36 -0500)

OK thanks :D

sjhalayka (2018-04-12 09:45:01 -0500)

Well... thank YOU for the inspection of my code! :P

Stefan Koenig (2018-04-12 09:47:04 -0500)