OpenCV OpenCL Thread Safety - Deadlock (changing cv::Mat to UMat)

asked 2018-01-12 06:15:06 -0500

updated 2018-01-16 04:10:02 -0500

I've been converting an OpenCV program from using cv::Mat to cv::UMat with the intention of increasing performance (which it does). What I'm experiencing is a deadlock in the OpenCL code:

1  __lll_lock_wait                                                                                                                                                                                    lowlevellock.S                 135  0x7fffeeb6626d 
2  __GI___pthread_mutex_lock                                                                                                                                                                          pthread_mutex_lock.c           115  0x7fffeeb5fe42 
3  cv::ocl::OpenCLAllocator::copy(cv::UMatData *, cv::UMatData *, int, unsigned long const *, unsigned long const *, unsigned long const *, unsigned long const *, unsigned long const *, bool) const                                     0x7ffff2e9b44f 
4  cv::UMat::copyTo(cv::_OutputArray const&) const                                                                                                                                                                                        0x7ffff2ff13c8 
5  cv::UMat::clone                                                                                                                                                                                    mat.inl.hpp                    3685 0x7ffff75b78ff 


1  __lll_lock_wait                                                                                                                                                                                    lowlevellock.S                 135  0x7fffeeb6626d 
2  __GI___pthread_mutex_lock                                                                                                                                                                          pthread_mutex_lock.c           115  0x7fffeeb5fe42 
3  cv::ocl::OpenCLAllocator::copy(cv::UMatData *, cv::UMatData *, int, unsigned long const *, unsigned long const *, unsigned long const *, unsigned long const *, unsigned long const *, bool) const                                     0x7ffff2e9b44f 
4  cv::UMat::copyTo(cv::_OutputArray const&) const                                                                                                                                                                                        0x7ffff2ff13c8 
5  cv::UMat::clone                                                                                                                                                                                    mat.inl.hpp                    3685 0x7ffff75b78ff

The code I'm using is multi-threaded and had no issues at all while it was using the cv::Mat structure.

It seems the issue occurs on copying the data.

I'm using OpenCV 3.4 on Linux (Ubuntu 16.04) and have tried with both Intel and Nvidia GPUs; both have the same issue.

Is the OpenCL path used by UMat meant to be thread safe? At no point do I work on the same data from different threads at the same time. Data is passed between threads by pointers, which worked with the Mat structure.

I'm guessing it is supposed to be thread safe, as otherwise there would be no need for the lock at all.

Is this a bug or am I doing something wrong? Is there a workaround?

EDIT:

So, it seems the bug arises from the double lock at line 5419 of ocl.cpp:

UMatDataAutoLock src_autolock(src); 
UMatDataAutoLock dst_autolock(dst);

This happens when multiple UMats are used. The locks are taken from a fixed pool of mutexes defined in UMatrix.cpp:

enum { UMAT_NLOCKS = 31 }; 
static Mutex umatLocks[UMAT_NLOCKS];


void UMatData::lock() 
{ 
  umatLocks[(size_t)(void*)this % UMAT_NLOCKS].lock();
}

void UMatData::unlock()
{ 
  umatLocks[(size_t)(void*)this % UMAT_NLOCKS].unlock(); 
}

If (size_t)(void*)this % UMAT_NLOCKS evaluates to the same index for both the source and the destination matrix, the copy takes the same lock and then tries to lock it a second time, causing a deadlock. This is definitely a bug in OpenCV.
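
To illustrate the collision described above, here is a minimal C++ sketch. It is not OpenCV's actual code or the eventual patch: the names kNumLocks, g_slotLocks, slotFor and TwoSlotLock are hypothetical stand-ins, and std::mutex stands in for cv::Mutex. It shows one common way to take two locks from a hashed pool: lock each slot at most once, and always in ascending slot order, so that two threads copying in opposite directions cannot end up waiting on each other.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical stand-ins for the OpenCV internals quoted above.
constexpr std::size_t kNumLocks = 31;       // mirrors UMAT_NLOCKS
static std::mutex g_slotLocks[kNumLocks];   // mirrors umatLocks

// Same mapping as UMatData::lock(): object address modulo pool size.
inline std::size_t slotFor(const void* p)
{
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p)) % kNumLocks;
}

// RAII guard that locks the slots belonging to two objects.
class TwoSlotLock
{
public:
    TwoSlotLock(const void* a, const void* b)
        : lo_(slotFor(a)), hi_(slotFor(b))
    {
        if (lo_ > hi_) std::swap(lo_, hi_);  // fixed global order across threads
        g_slotLocks[lo_].lock();
        if (hi_ != lo_)                      // both objects share a slot: lock once
            g_slotLocks[hi_].lock();
    }

    ~TwoSlotLock()
    {
        if (hi_ != lo_) g_slotLocks[hi_].unlock();
        g_slotLocks[lo_].unlock();
    }

    TwoSlotLock(const TwoSlotLock&) = delete;
    TwoSlotLock& operator=(const TwoSlotLock&) = delete;

    // True when both objects hashed to the same slot.
    bool sameSlot() const { return lo_ == hi_; }

private:
    std::size_t lo_, hi_;
};
```

A call such as TwoSlotLock guard(src, dst); would take the place of the two separate auto-locks in a function shaped like copy(). Two addresses exactly kNumLocks bytes apart land in the same slot, which is the case where a second lock() on a non-recursive mutex would typically block forever.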

I've temporarily fixed the issue by increasing UMAT_NLOCKS to a much higher number, but this is not optimal.
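
As a rough check on why a larger pool only makes the problem rarer, the sketch below counts how often two distinct addresses share a slot under address % nLocks. The function name collidingPairFraction, the base address and the 64-byte stride are assumptions made up for this illustration; real allocator addresses will be distributed differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fraction of distinct pairs, among `count` synthetic addresses spaced
// `stride` bytes apart, that map to the same slot under `address % nLocks`.
double collidingPairFraction(std::size_t nLocks, std::size_t count, std::size_t stride)
{
    const std::uintptr_t base = 0x10000;  // arbitrary illustrative start address

    // Count how many synthetic addresses fall into each slot.
    std::vector<std::size_t> perSlot(nLocks, 0);
    for (std::size_t i = 0; i < count; ++i)
        ++perSlot[(base + i * stride) % nLocks];

    // k addresses in one slot form k * (k - 1) / 2 colliding pairs.
    double colliding = 0.0;
    for (std::size_t n : perSlot)
    {
        const double k = static_cast<double>(n);
        colliding += 0.5 * k * (k - 1.0);
    }

    const double c = static_cast<double>(count);
    return colliding / (0.5 * c * (c - 1.0));
}
```

With evenly spread addresses the fraction comes out near 1/nLocks, so roughly 1 pair in 31 collides with the original pool size versus about 1 in 521 after the change. Collisions become less likely but never go away.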


Comments


"The code I'm using is multi-threaded " -- why do you think, any of this is thread-safe ?

berak ( 2018-01-12 06:43:39 -0500 )

My code is completely fine. I'm not accessing the same data from different threads at any time, and I did not have any issue before changing to UMat. Looking at the OpenCV source, the issue is probably due to UMatDataAutoLock src_autolock(src); UMatDataAutoLock dst_autolock(dst); at line 5419 of ocl.cpp.

There is also another lock, cv::AutoLock lock(cleanupQueueMutex);, that may be related.

MalfunctioningDroid ( 2018-01-12 07:25:37 -0500 )

It should also be noted that the deadlocks occur with completely different UMats; it's not the same data.

MalfunctioningDroid ( 2018-01-12 07:31:36 -0500 )

Please provide some reproducer code

mshabunin ( 2018-01-12 08:56:49 -0500 )

I built OpenCV with debug symbols, and the issue was with acquiring a mutex. It seems that changing line 57 of UMatrix.cpp

from: enum { UMAT_NLOCKS = 31 };

to: enum { UMAT_NLOCKS = 521 };

has fixed, or massively decreased the chance of, this deadlock occurring. OpenCV uses static Mutex umatLocks[UMAT_NLOCKS]; to hold a fixed number of mutexes

and then locks with:

void UMatData::lock() { umatLocks[(size_t)(void*)this % UMAT_NLOCKS].lock(); }

void UMatData::unlock() { umatLocks[(size_t)(void*)this % UMAT_NLOCKS].unlock(); }

I'm not sure why it's done this way, but it does cause a deadlock in some circumstances.

MalfunctioningDroid ( 2018-01-15 14:25:02 -0500 )

@MalfunctioningDroid, there is a patch fixing this issue; could you please check it: https://github.com/opencv/opencv/pull... ?

mshabunin ( 2018-01-17 06:30:50 -0500 )

Tested and Working, Thanks!

MalfunctioningDroid ( 2018-01-17 09:09:55 -0500 )