UMat multithreading issues when using producer-consumer pattern
I've implemented a processing pipeline using a producer-consumer pattern. A consumer runs as an independent thread, waits for data, and handles it as soon as it arrives, while a producer pushes data into the consumer's processing queue. Well, that's the simplified version...
As long as I use cv::Mat, all is fine. But when I use cv::UMat to enable GPU support, the program behaves strangely (it has timing issues). It's as if the data has sometimes not yet finished being processed or copied back from the GPU to the CPU...
Used Library Versions, OS
- OpenCV 3.4.1 (current), but it also occurs with OpenCV 3.2.0
- Statically compiled libraries, built from source with Visual Studio 2017
- CMAKE-Options: BUILD_SHARED_LIBS = false, BUILD_TIFF = false, WITH_TIFF = false, WITH_CUDA=false, /MD and /MDd compiler options replaced with /MT and /MTd respectively; Mode: Release, x64
- OS: Windows 10, 64bit
- CPU: i7 7th Gen, 4 Cores (8 Hyper-Threads)
- GPU: NVidia GeForce GTX 1060
Other machines:
- NO PROBLEM WITH: Windows 7 machine, 64bit, but old Intel IGP GPU, with no OpenCL support
- OCCURRING WITH: Windows 10 machine, 64bit, AMD Radeon HD 6570, OpenCL supported
Either it only works when cv::UMat falls back to the CPU implementation, or it might be an OS-related issue. As it occurs with both AMD and NVIDIA GPUs, the graphics driver or a specific driver version should not be the cause. I'll try it on a Linux (Ubuntu 16.04) machine with 2 GTX 1080 Ti GPUs as soon as possible to make sure it's not an OS-related issue.
Code to reproduce issue
Here is a simplified example:
(1) a producer loads an image and forwards copies of it to a consumer ("Processor").
(2) OPTIONAL: a data processor converts the image to grayscale, applies a Canny edge detection, and forwards the result to the final consumer.
(3) the final consumer just displays the received images in a window.
Using cv::Mat works. With cv::UMat, when the processor (2) is omitted, the window is empty most of the time; sometimes random data appears and sometimes the delivered image is shown. When I add an intermediate consumer that, e.g., applies a grayscale conversion followed by a Canny edge detection, the shown image is sometimes the grayscale image and sometimes the Canny result. It seems that OpenCV schedules OpenCL filters as background jobs, and that after a thread switch the last operation applied to the cv::UMat is not forced to finish before we continue working with it (e.g. display it in a window).
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
using namespace cv;
template <typename T>
class thread_queue {
private:
    std::queue<T> queue;
    std::mutex mutex;
    std::condition_variable newdata;
public:
    thread_queue() {}
    ~thread_queue() {}
    // Append an element and wake one waiting consumer.
    void push(T t) {
        std::unique_lock<std::mutex> lock(mutex);
        queue.push(t);
        newdata.notify_one();
    }
    // Block until an element is available, then remove it from the queue.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex);
        // Loop instead of a single check: wait() may wake up spuriously.
        while (queue.empty()) newdata.wait(lock);
        T elem = queue.front();
        queue.pop();
        return ...
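The listing above is cut off in the post. As a rough, self-contained sketch of the same queue idea (not the author's original code), the following uses only the standard library. The names `blocking_queue` and `sum_through_queue` are made up for illustration. The predicate form of `wait()` re-checks the condition after every wakeup, so a spurious wakeup cannot pop from an empty queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for the thread_queue in the post.
template <typename T>
class blocking_queue {
public:
    // Store the element, then wake one waiting consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }  // mutex released here, before notifying
        newdata_.notify_one();
    }

    // Block until an element is available and return it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Predicate overload: keeps waiting while the queue is empty.
        newdata_.wait(lock, [this] { return !queue_.empty(); });
        T elem = std::move(queue_.front());
        queue_.pop();
        return elem;
    }

private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable newdata_;
};

// Illustrative helper: one producer pushes 0..n-1, one consumer sums them.
int sum_through_queue(int n) {
    blocking_queue<int> q;
    int sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) sum += q.pop();
    });
    for (int i = 0; i < n; ++i) q.push(i);
    consumer.join();  // join makes the consumer's writes to sum visible
    return sum;
}
```

Calling `sum_through_queue(100)` moves the integers 0 to 99 through the queue on a second thread. The sketch only covers the queueing part; it says nothing about when OpenCL work on a cv::UMat has finished.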
multithreading with opencv is usually a "problematic" idea, at least. waitKey() contains the window's internal message loop, which MUST stay in the main thread.
also, you should NEVER use pointers to cv::Mat, you're wrecking the internal refcounting like that (even worse with UMat)
do you absolutely HAVE TO use multithreading ?
Hm... interesting, I changed the code to not use pointers and it works with UMat. So that's actually the problem! Could you post this as an answer... and maybe explain in more detail why the refcounting does not work then? I know the difference between assigning a pointer to an object and assigning the object directly... and what refcounting is. However, I don't see where and why this causes this kind of behaviour in that special case.
cv::Mat is a smart pointer already.
and while this seems to have solved your momentary problems, for sure, there are more hidden in your current approach. (e.g. the gui code)
why the multi-threading at all ? (a lot of opencv functions are NOT thread-safe by design (e.g. to enable internal parallelism))
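On the refcounting point above: cv::Mat copies are shallow and share one reference-counted buffer. The sketch below is only an analogy, using `std::shared_ptr` as a stand-in for that behaviour; it is not OpenCV's implementation, and `Pixels`, `Image`, `Counts` and `refcount_demo` are hypothetical names. It shows that copying the handle registers a new owner, while taking a pointer to the handle does not.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in types for the analogy (assumptions, not OpenCV classes).
struct Pixels { std::vector<unsigned char> data; };
using Image = std::shared_ptr<Pixels>;  // plays the role of a cv::Mat header

struct Counts {
    long by_value;    // owners while a by-value copy exists
    long by_pointer;  // owners while only a raw pointer to the handle exists
};

Counts refcount_demo() {
    Image original = std::make_shared<Pixels>();
    Counts c{};
    {
        Image copy = original;  // shallow copy: shares the buffer, adds an owner
        c.by_value = original.use_count();
    }
    {
        Image* ptr = &original;  // pointer to the handle: no new owner is registered
        c.by_pointer = ptr->use_count();
    }
    return c;
}
```

With a by-value copy in a queue, the buffer stays alive as long as the queued copy exists. With a raw pointer, the queue does not own anything, so the receiver depends on whoever created the object keeping it alive and deleting it later.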
AND: I currently see no way around multithreading, as I need to make sure all CPU cores are used. Also the whole architecture of the program depends on it... It's already about 10000 lines of code, working fine except for not being able to use UMat and therefore no GPU so far.
The GUI is done via a web interface, which works fine. But currently I have these generalized queues which forward data as a "void*" to allow different types with manual type checking. I'll need to clean this up by using a wrapper class structure instead.
You never delete the mat. You never unlock your mutex.
I know, forgot that in the example. The final consumer needs to do that... (will fix the above code)
PS: A "std::unique_lock<std::mutex>" is unlocked as soon as its scope ends and its destructor is called. That's all fine there...
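The scope-based unlocking mentioned above can be checked with a small standard-library sketch. The helper names `free_for_other_thread`, `ScopeReport` and `unique_lock_scope_demo` are made up for illustration. The probe runs on a second thread because calling `try_lock()` on a `std::mutex` that the calling thread already owns is undefined behaviour.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Ask a second thread whether it can take the mutex right now.
bool free_for_other_thread(std::mutex& m) {
    bool acquired = false;
    std::thread probe([&] {
        if (m.try_lock()) {
            acquired = true;
            m.unlock();
        }
    });
    probe.join();
    return acquired;
}

struct ScopeReport {
    bool free_inside_scope;     // probe result while the unique_lock is alive
    bool reacquired_after_scope; // true once the mutex could be locked again
};

ScopeReport unique_lock_scope_demo() {
    std::mutex m;
    ScopeReport r{};
    {
        std::unique_lock<std::mutex> lock(m);
        r.free_inside_scope = free_for_other_thread(m);  // mutex is held here
    }  // lock's destructor runs here and releases the mutex
    {
        // This would block forever if the mutex were still held.
        std::lock_guard<std::mutex> again(m);
        r.reacquired_after_scope = true;
    }
    return r;
}
```

So no explicit `unlock()` call is needed in `push()` or `pop()`; leaving the function body is enough.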
OK thanks :D
Well... thank YOU for the inspection of my code! :P
Unfortunately, the issue wasn't resolved by not using pointers to cv::UMat (would be strange anyway). Seems to be a timing issue, that is somehow prevented when single threaded.
reason for downvote: your code above is entirely made-up, artificial, and does not represent any actual problem.
@berak: It is a practical problem from a research project; I just reduced it to the least amount of code needed to reproduce the issue. How should someone else help if I posted the whole thing, where even I would not be sure that the problem is not somewhere within the rest of the source code? Yesterday, I changed the whole queueing implementation to use non-pointer datatypes instead of void* converted HEAP-Objects (e.g. *UMat). I edited about 200 lines of code, just to find out that it somehow worked in the small example above, but not within the whole engine. // By the way: calling waitKey(1) in the main thread does not work (no content updates in the window); it seems it needs to be called in the thread that opened the window (e.g. via imshow()).
yea, ok. strike that, apologies.
@berak: Just to make sure, I modified the above example to a single-threaded version (see https://codepad.remoteinterview.io/SP...), still using queues; it works fine. Thus, it is certain that the issue appears in a multi-threaded context only and is not caused by the copy process during queueing. It seems that calls to functions such as cv::Canny just tell OpenCL to start processing in the background and then return, and that later access to the target UMat is prevented/delayed until processing has finished. This works fine in a single-threaded environment, but fails in a multi-threaded environment. Any idea where in the OpenCV/UMat source this could be? Maybe I can force waiting manually for OpenCL outside...
NB: I added a section in the posting above about which Versions/OS/Machines I tried so