Revision history

UMat multithreading issues when using producer-consumer pattern

I've implemented a processing pipeline using a producer-consumer pattern. Each consumer runs in its own thread, waits for data, and handles it as soon as it arrives, while a producer pushes data into the consumer's processing queue. Well, that's the simplified version...

As long as I use cv::Mat all is fine. But when I use cv::UMat to enable GPU support, the program behaves strangely (timing issues). It looks as if the data has sometimes not finished being processed, or has not yet been copied back from the GPU to the CPU...

Code to reproduce issue

Here is a simplified example:

(1) A producer loads an image and forwards copies of it to a consumer.

(2) The consumer just displays the received images in a window.

Using cv::Mat works... cv::UMat produces an empty window most of the time; sometimes random data appears, and sometimes the delivered image is shown. When I add an intermediate consumer that, e.g., applies a grayscale conversion followed by Canny edge detection, the displayed image is sometimes the grayscale image and sometimes the Canny result.

   #include <atomic>
   #include <chrono>
   #include <condition_variable>
   #include <cstdio>
   #include <mutex>
   #include <queue>
   #include <thread>
   #include <opencv2/imgproc.hpp>
   #include <opencv2/highgui.hpp>

   using namespace cv;


   // Minimal thread-safe FIFO: push() never blocks, pop() blocks until data arrives.
   template <typename T>
   class thread_queue {
   private:
       std::queue<T> queue;
       std::mutex mutex;
       std::condition_variable newdata;

   public:
       void push(T t) {
           std::unique_lock<std::mutex> lock(mutex);
           queue.push(t);
           newdata.notify_one();
       }

       T pop() {
           std::unique_lock<std::mutex> lock(mutex);
           // wait with a predicate so a spurious wakeup cannot pop from an empty queue
           newdata.wait(lock, [this] { return !queue.empty(); });
           T elem = queue.front();
           queue.pop();
           return elem;
       }

       size_t getSize() {
           std::unique_lock<std::mutex> lock(mutex);
           return queue.size();
       }
   };


   /* use either UMat or Mat ... */
   // #define myMAT UMat
   #define myMAT Mat

   thread_queue<myMAT*> queue;

   // written by main(), read by the workers, hence atomic
   std::atomic<bool> shutdown(false);

   void Producer() {
       // Load Image...
       myMAT image;
       cv::imread("/PATH/TO/ANY/IMAGE.JPG", CV_LOAD_IMAGE_COLOR).copyTo(image);

       while (!shutdown) {
           // Push copy to output queue...
           myMAT* copy = new myMAT();
           image.copyTo(*copy);
           queue.push(copy);

           // Wait some time...
           std::this_thread::sleep_for(std::chrono::milliseconds(100)); // 100ms => 10 fps
       }
   }


   void Consumer() {
       while (!shutdown) {
           myMAT *image = queue.pop();
           if (image == nullptr) break; // shutdown signal

           // Display image
           // (note: the copy is not released here; see EDIT #2)
           imshow("OUTPUT", *image);

           // call waitKey (else window content isn't updated)
           waitKey(1);
       }
   }


   int main()
   {
       // Start workers...
       std::thread producer(Producer);
       std::thread consumer(Consumer);

       getchar(); // Wait for key pressed....

       // Shutdown...
       shutdown = true;
       producer.join();
       queue.push(nullptr); // shutdown signal...
       consumer.join();
       return 0;
   }

Questions

(1) What could be causing this issue?

(2) How can I fix it?

* EDIT #1 *

As suggested by berak (in the comments), I changed everything to AVOID pointers to cv::UMat. This resolves the issues.

* EDIT #2 *

Forgot to "delete image;" within the consumer in the example code above. Fixed this; the consumer now releases each copy after displaying it (the rest of the example is unchanged):

   void Consumer() {
       while (!shutdown) {
           myMAT *image = queue.pop();
           if (image == NULL) break;

           // Display image
           imshow("OUTPUT", *image);

           // release image copy
           delete image;

           // call waitKey (else window content isn't updated)
           waitKey(1);
       }
   }
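
One way to make a forgotten `delete` impossible is to pass ownership through the queue with `std::unique_ptr`. The following is only a minimal sketch using the standard library; `Frame`, `owning_queue` and `run_pipeline` are hypothetical names (stand-ins for the image type, for `thread_queue`, and for the producer/consumer pair), not part of the original code, and it says nothing about the cv::UMat behaviour itself.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the image type (cv::Mat / cv::UMat in the question).
struct Frame {
    int id;
};

// Hypothetical move-aware variant of thread_queue: elements are moved in and out.
template <typename T>
class owning_queue {
    std::queue<T> items;
    std::mutex mutex;
    std::condition_variable newdata;

public:
    void push(T t) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            items.push(std::move(t));
        }
        newdata.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex);
        newdata.wait(lock, [this] { return !items.empty(); });
        T elem = std::move(items.front());
        items.pop();
        return elem;
    }
};

// Pushes `count` frames through the queue and returns the ids the consumer saw.
std::vector<int> run_pipeline(int count) {
    owning_queue<std::unique_ptr<Frame>> queue;
    std::vector<int> seen;

    std::thread consumer([&] {
        for (;;) {
            std::unique_ptr<Frame> frame = queue.pop();
            if (!frame) break;              // a null pointer is the shutdown signal
            seen.push_back(frame->id);
        }                                    // each frame is freed when `frame` goes out of scope
    });

    for (int i = 0; i < count; ++i)
        queue.push(std::unique_ptr<Frame>(new Frame{i}));
    queue.push(nullptr);                     // shutdown signal
    consumer.join();
    return seen;
}
```

Because the consumer holds the only owner of each frame, no explicit release is needed in the loop body.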

UMat multithreading issues when using producer-consumer pattern

I've implemented a processing pipeline using a producer-consumer pattern. Each consumer runs in its own thread, waits for data, and handles it as soon as it arrives, while a producer pushes data into the consumer's processing queue. Well, that's the simplified version...

As long as I use cv::Mat all is fine. But when I use cv::UMat to enable GPU support, the program behaves strangely (timing issues). It looks as if the data has sometimes not finished being processed, or has not yet been copied back from the GPU to the CPU...

Code to reproduce issue

Here is a simplified example:

(1) A producer loads an image and forwards copies of it to a consumer ("Processor").

(2) The processor converts the image to grayscale, applies Canny edge detection, and forwards the result to the final consumer.

(3) The final consumer just displays the received images in a window.

Using cv::Mat works... cv::UMat produces an empty window most of the time; sometimes random data appears, and sometimes the delivered image is shown. When I add an intermediate consumer that, e.g., applies a grayscale conversion followed by Canny edge detection, the displayed image is sometimes the grayscale image and sometimes the Canny result.

   #include <atomic>
   #include <chrono>
   #include <condition_variable>
   #include <cstdio>
   #include <mutex>
   #include <queue>
   #include <thread>
   #include <opencv2/imgproc.hpp>
   #include <opencv2/highgui.hpp>

   using namespace cv;

   // Minimal thread-safe FIFO: push() never blocks, pop() blocks until data arrives.
   template <typename T>
   class thread_queue {
   private:
       std::queue<T> queue;
       std::mutex mutex;
       std::condition_variable newdata;

   public:
       void push(T t) {
           std::unique_lock<std::mutex> lock(mutex);
           queue.push(t);
           newdata.notify_one();
       }

       T pop() {
           std::unique_lock<std::mutex> lock(mutex);
           // wait with a predicate so a spurious wakeup cannot pop from an empty queue
           newdata.wait(lock, [this] { return !queue.empty(); });
           T elem = queue.front();
           queue.pop();
           return elem;
       }

       size_t getSize() {
           std::unique_lock<std::mutex> lock(mutex);
           return queue.size();
       }
   };


   /* use either UMat or Mat ... */
   #define myMAT UMat
   // #define myMAT Mat

   thread_queue<myMAT> process_queue;
   thread_queue<myMAT> output_queue;

   // written by main(), read by the workers, hence atomic
   std::atomic<bool> shutdown(false);

   void Producer() {
       // Load Image...
       myMAT image;
       cv::imread("/PATH/TO/SOME/IMAGE.JPG", CV_LOAD_IMAGE_COLOR).copyTo(image);
       // an empty image would be mistaken for the shutdown signal downstream
       if (image.empty()) return;

       while (!shutdown) {
           // Push copy to process queue...
           myMAT copy;
           image.copyTo(copy);
           process_queue.push(copy);

           // Wait some time...
           std::this_thread::sleep_for(std::chrono::milliseconds(100)); // 100ms => 10 fps
       }
   }

   void Processor() {
       while (!shutdown) {
           myMAT image = process_queue.pop();
           if (image.empty()) break; // shutdown signal

           int m_thres1 = 100;
           int m_ksize = 3;
           myMAT gray, edges;
           cv::cvtColor(image, gray, CV_BGR2GRAY);
           cv::Canny(gray, edges, m_thres1, m_thres1 * 3, m_ksize);

           // Forward result to the final consumer
           output_queue.push(edges);
       }
   }



   void Consumer() {
       while (!shutdown) {
           myMAT image = output_queue.pop();
           if (image.empty()) break; // shutdown signal

           // Display image
           imshow("OUTPUT", image);

           // call waitKey (else window content isn't updated)
           waitKey(1);
       }
   }


   int main()
   {
       // Start workers...
       std::thread producer(Producer);
       std::thread processor(Processor);
       std::thread consumer(Consumer);

       getchar(); // Wait for key pressed....

       // Shutdown...
       shutdown = true;
       producer.join();
       process_queue.push(myMAT()); // shutdown signal...
       processor.join();
       output_queue.push(myMAT()); // shutdown signal...
       consumer.join();
       return 0;
   }
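
The shutdown above uses an empty matrix as the "stop" signal, so an empty image and a shutdown request look the same to a worker. As a hedged alternative, here is a minimal standard-library sketch of a queue with an explicit `close()`; `closable_queue` and `drain_through_queue` are hypothetical names introduced only for illustration, and plain `int` values stand in for images.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue with an explicit close() instead of a sentinel element.
template <typename T>
class closable_queue {
    std::queue<T> items;
    std::mutex mutex;
    std::condition_variable changed;
    bool closed = false;

public:
    void push(T t) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            items.push(std::move(t));
        }
        changed.notify_one();
    }

    // Announces that no more elements will be pushed; wakes every waiting consumer.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            closed = true;
        }
        changed.notify_all();
    }

    // Blocks until an element is available or the queue is closed and drained.
    // Returns false only in the latter case.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex);
        changed.wait(lock, [this] { return closed || !items.empty(); });
        if (items.empty()) return false;
        out = std::move(items.front());
        items.pop();
        return true;
    }
};

// Sends the given values through the queue and returns what the consumer received.
std::vector<int> drain_through_queue(const std::vector<int>& values) {
    closable_queue<int> queue;
    std::vector<int> received;

    std::thread consumer([&] {
        int value;
        while (queue.pop(value)) received.push_back(value);
    });

    for (int v : values) queue.push(v);
    queue.close();       // replaces pushing a sentinel element
    consumer.join();
    return received;
}
```

With this shape every value, including a "zero" or empty one, is delivered as data, and the consumer loop ends only once the queue has been closed and drained.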

Questions

(1) What could be causing this issue?

(2) How can I fix it?

* EDIT #3 *

As suggested by berak (in the comments), I changed everything to AVOID pointers to cv::UMat. This reduces the probability of the issue appearing, but it still occurs quite often.

UMat multithreading issues when using producer-consumer pattern

I've implemented a processing pipeline using a producer-consumer pattern. Each consumer runs in its own thread, waits for data, and handles it as soon as it arrives, while a producer pushes data into the consumer's processing queue. Well, that's the simplified version...

As long as I use cv::Mat all is fine. But when I use cv::UMat to enable GPU support, the program behaves strangely (timing issues). It looks as if the data has sometimes not finished being processed, or has not yet been copied back from the GPU to the CPU...

Used Library Versions, OS

  • OpenCV 3.4.1 (current), but occurs also using OpenCV 3.2.0
  • Statically compiled libraries, built from source with Visual Studio 2017
  • CMAKE-Options: BUILD_SHARED_LIBS = false, BUILD_TIFF = false, WITH_TIFF = false, WITH_CUDA=false, /MD and /MDd compiler options replaced with /MT and /MTd respectively; Mode: Release, x64
  • OS: Windows 10, 64bit
  • CPU: i7 7th Gen, 4 Cores (8 Hyper-Threads)
  • GPU: NVidia GeForce GTX 1060

Other machines:

  • NO PROBLEM WITH: Windows 7 machine, 64bit, but old Intel IGP GPU, with no OpenCL support
  • OCCURRING WITH: Windows 10 machine, 64bit, AMD Radeon HD 6570, OpenCL supported

So either it only works when cv::UMat falls back to the CPU implementation, or it is an OS-related issue. As it occurs with both AMD and NVIDIA GPUs, a specific graphics driver or driver version should not be the cause. I'll try it on a Linux (Ubuntu 16.04) machine with two GTX 1080 Ti GPUs as soon as possible to rule out an OS-related issue.

Code to reproduce issue

Here is a simplified example:

(1) a producer loads an image and forwards copies of it to a consumer ("Processor").

(2) OPTIONAL: a data processor converts the image to grayscale, applies Canny edge detection, and forwards the result to the final consumer.

(3) the final consumer just displays the received images in a window.

Using cv::Mat works. With cv::UMat, when the processor (2) is omitted, the window is empty most of the time; sometimes random data appears and sometimes the delivered image is shown. When I add the intermediate processor, which applies a grayscale conversion followed by Canny edge detection, the displayed image is sometimes the grayscale image and sometimes the Canny result. It seems that OpenCV schedules OpenCL filters as background jobs and that, when the cv::UMat changes threads, the last operation applied to it is not forced to finish before we continue working with it (e.g. display it in a window).

   #include <atomic>
   #include <chrono>
   #include <condition_variable>
   #include <cstdio>
   #include <mutex>
   #include <queue>
   #include <thread>
   #include <opencv2/imgproc.hpp>
   #include <opencv2/highgui.hpp>

   using namespace cv;

   // Minimal blocking queue used to hand images from one thread to the next
   template <typename T>
   class thread_queue {
   private:
          std::queue<T> queue;
          std::mutex mutex;
          std::condition_variable newdata;

   public:
          void push(T t) {
                 std::unique_lock<std::mutex> lock(mutex);
                 queue.push(t);
                 newdata.notify_one();
          }

          T pop() {
                 std::unique_lock<std::mutex> lock(mutex);
                 // loop (instead of a single if) guards against spurious wake-ups
                 while (queue.empty()) newdata.wait(lock);
                 T elem = queue.front();
                 queue.pop();
                 return elem;
          }

          size_t getSize() {
                 std::unique_lock<std::mutex> lock(mutex);
                 return queue.size();
          }
   };


   /* use either UMat or Mat ... */
   #define myMAT UMat
   // #define myMAT Mat

   thread_queue<myMAT> process_queue;
   thread_queue<myMAT> output_queue;

   // written by main(), read by all workers
   std::atomic<bool> shutdown(false);

   void Producer() {
          // Load image...
          myMAT image;
          cv::imread("/PATH/TO/SOME/IMAGE.JPG", CV_LOAD_IMAGE_COLOR).copyTo(image);

          while (!shutdown) {
                 // Push copy to processing queue...
                 myMAT copy;
                 image.copyTo(copy);
                 process_queue.push(copy);

                 // Wait some time...
                 std::chrono::milliseconds time(100); // 100ms => 10 fps
                 std::this_thread::sleep_for(time);
          }
   }

   void Processor() {
          while (!shutdown) {
                 myMAT image = process_queue.pop();
                 if (image.empty()) break; // empty image = shutdown signal

                 int m_thres1 = 100;
                 int m_ksize = 3;
                 myMAT gray, edges;
                 cv::cvtColor(image, gray, CV_BGR2GRAY);
                 cv::Canny(gray, edges, m_thres1, m_thres1 * 3, m_ksize);

                 // Forward result to the final consumer
                 output_queue.push(edges);
          }
   }



   void Consumer() {
          while (!shutdown) {
                 myMAT image = output_queue.pop();
                 if (image.empty()) break; // empty image = shutdown signal

                 // Display image
                 imshow("OUTPUT", image);

                 // call waitKey (else window content isn't updated)
                 waitKey(1);
          }
   }


   int main()
   {
          // Start workers...
          std::thread producer(Producer);
          std::thread processor(Processor);
          std::thread consumer(Consumer);

          getchar(); // Wait for key pressed....

          // Shutdown...
          shutdown = true;
          producer.join();
          process_queue.push(myMAT()); // shutdown signal...
          processor.join();
          output_queue.push(myMAT()); // shutdown signal...
          consumer.join();
          return 0;
   }

Questions

(1) What is possibly causing that issue?

(2) How to fix it?

* EDIT #3 *

As suggested by berak (in the comments), I changed everything to AVOID pointers to cv::UMat. This reduces the probability of the issue appearing, but it still occurs quite often.

* EDIT #4 *

The occurrence of the issue seems to be related to the creation/destruction of the cv::UMat. If I move the intermediate cv::UMat variables (gray, edges) outside the loop, the issue seems to disappear. This might be due to a hash collision caused by the way locking works in cv::UMat:

// it should be a prime number for the best hash function
enum { UMAT_NLOCKS = 31 };
static Mutex umatLocks[UMAT_NLOCKS]; 
[..]
static size_t getUMatDataLockIndex(const UMatData* u)
{
    size_t idx = ((size_t)(void*)u) % UMAT_NLOCKS;
    return idx;
}

However, increasing UMAT_NLOCKS to a very large number does NOT significantly reduce the probability of occurrence.
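
To illustrate why the lock count was a suspect at all: the index is just the UMatData address modulo the table size, so two distinct objects share a mutex whenever their addresses differ by a multiple of UMAT_NLOCKS. Below is a minimal standalone sketch of that mapping. Note that lock_index and shares_lock are hypothetical helper names made up for this illustration (they are not OpenCV functions), and the addresses used with them are invented.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for OpenCV's getUMatDataLockIndex():
// maps an object address to a slot in a fixed-size lock table.
inline std::size_t lock_index(const void* p, std::size_t nlocks) {
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p)) % nlocks;
}

// Two distinct objects end up behind the same mutex when their slots match.
inline bool shares_lock(const void* a, const void* b, std::size_t nlocks) {
    return lock_index(a, nlocks) == lock_index(b, nlocks);
}
```

With a table of 31 locks, the made-up addresses 0x1000 and 0x1000 + 31 * 8 land in the same slot, while a larger table separates them. Sharing a mutex should normally only add contention rather than corrupt data, which would fit the observation that a bigger table did not help.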

CAUSING NO (LESS?) ISSUES:

   void Processor() {
          cv::UMat gray, edges; // created once, reused in every iteration
          while (!shutdown) {
                 UMat image = process_queue.pop();
                 if (image.empty()) break;

                 int m_thres1 = 100;
                 int m_ksize = 3;

                 cvtColor(image, gray, CV_BGR2GRAY);
                 Canny(gray, edges, m_thres1, m_thres1 * 3, m_ksize);

                 // Forward result to the final consumer
                 output_queue.push(edges);
          }
   }

CAUSING ISSUES RELIABLY:

   void Processor() {
          while (!shutdown) {
                 UMat image = process_queue.pop();
                 if (image.empty()) break;

                 cv::UMat gray, edges; // created and destroyed in every iteration
                 int m_thres1 = 100;
                 int m_ksize = 3;

                 cvtColor(image, gray, CV_BGR2GRAY);
                 Canny(gray, edges, m_thres1, m_thres1 * 3, m_ksize);

                 // Forward result to the final consumer
                 output_queue.push(edges);
          }
   }

Note: The problem does not occur in a single-threaded version, even if the UMat variables gray and edges are destroyed/recreated each time within a loop (i.e. it is multi-threading and most probably OpenCL related).
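
One experiment that follows from this note: if OpenCL work queued by one thread is not guaranteed to be finished when another thread picks up the cv::UMat, then forcing completion right before the hand-over should make the symptom disappear. Below is a sketch of a queue that runs a caller-supplied barrier before every push. synced_queue is a hypothetical class written for this illustration and is plain standard C++; the idea would be to pass cv::ocl::finish() (from opencv2/core/ocl.hpp) as the barrier. Whether that actually cures the issue is untested here and rests on the assumption that OpenCV keeps a separate OpenCL command queue per thread.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical queue: runs a barrier in the pushing thread before the item
// becomes visible to the popping thread.
template <typename T>
class synced_queue {
public:
    explicit synced_queue(std::function<void()> barrier)
        : barrier_(std::move(barrier)) {}

    void push(T item) {
        // Wait for pending asynchronous work of the calling thread,
        // e.g. a lambda calling cv::ocl::finish() when T is cv::UMat.
        if (barrier_) barrier_();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form of wait() also handles spurious wake-ups.
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::function<void()> barrier_;
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable ready_;
};
```

In the example above this would mean declaring the queues as synced_queue<UMat> output_queue([] { cv::ocl::finish(); }); and likewise for process_queue. The barrier costs throughput because the pushing thread stalls until the GPU is idle, so it is meant as a diagnostic first and a fix only if nothing better turns up.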