Revision history - OpenCV Q&A Forum

Your code captures some, but not all, of the important elements of a task-parallelism framework.

A proper framework is more aptly called a "parallel task queue execution system" than a "thread pool", which is the older concept.

Some things to check:

Use thread-safe data structures everywhere inside the framework.
Accept new tasks while the framework is running (rather than requiring that all tasks be added during initialization).
Reuse threads instead of killing them (relevant on platforms where thread creation and destruction are expensive).
Avoid activating more threads than there are processors (physical or virtual). Activating more threads forces the CPUs to switch between tasks, which adds overhead.
On multi-socket CPU systems only: avoid migrating tasks from one socket to another, unless you take care of a number of issues (details omitted).
Provide a high-performance multi-threaded malloc. On some platforms, the library-provided malloc may contain a critical section that becomes a bottleneck under a heavily multi-threaded workload with concurrent memory allocations and releases.
Pop and execute the next task without entering sleep whenever the task queue is not empty (relevant on platforms where putting a thread to sleep and waking it is inefficient).
Wake threads efficiently when new data comes in. (On Windows, this is done with the "I/O completion port" feature.)
Hand off efficiently between two threads: if thread A sets a signal and immediately goes to sleep, while thread B is the only one waiting on that signal and begins executing, then thread B should basically pick up the CPU slice that thread A was using. This is an OS feature, not something that library software alone can mimic.
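
Several of the items above (thread-safe queue, accepting tasks while running, reusing threads, capping the thread count, and not sleeping while work is pending) can be illustrated with a minimal sketch in standard C++. This is not how OpenCV, TBB, PPL, or OpenMP implement their schedulers; the names `TaskQueue` and `run_task_queue_demo` are hypothetical, and the sketch ignores socket affinity, allocator contention, and OS-level hand-off entirely.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration only; not part of OpenCV or any other library.
class TaskQueue {
public:
    // requested == 0 means "one worker per hardware thread".
    explicit TaskQueue(unsigned requested = 0) {
        // Never start more workers than the hardware reports (at least one).
        unsigned hw = std::max(1u, std::thread::hardware_concurrency());
        unsigned n = (requested == 0) ? hw : std::min(requested, hw);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Drains the remaining tasks, then joins the workers.
    ~TaskQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    TaskQueue(const TaskQueue&) = delete;
    TaskQueue& operator=(const TaskQueue&) = delete;

    // Tasks can be added at any time while the workers are running.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stopping_ && "submit() called during shutdown");
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

    std::size_t worker_count() const { return workers_.size(); }

private:
    // Each worker is reused for many tasks instead of being recreated.
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Blocks only while the queue is empty; if work is pending,
                // the predicate is already true and wait() returns at once.
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and fully drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Hypothetical demo helper: submits n_tasks counters and returns how many ran.
int run_task_queue_demo(int n_tasks) {
    std::atomic<int> done{0};
    {
        TaskQueue q;
        for (int i = 0; i < n_tasks; ++i)
            q.submit([&done] { done.fetch_add(1); });
    }  // destructor drains the queue and joins the workers
    return done.load();
}
```

Calling `run_task_queue_demo(100)` from `main` should return 100, since the destructor waits for every queued task. A single mutex-protected queue like this is the simplest design; it is also exactly the kind of shared critical section that production engines work hard to avoid.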

As you can see, as long as you are concerned only with Linux, there is no need to over-design a parallel task queue execution engine. However, as soon as you cross the chasm to Windows, all of these concerns apply, and the engine design becomes vastly different.

OpenCV does not design its own engine. Instead, it delegates to whichever engine is available on the platform, such as TBB, PPL, or OpenMP. These big-vendor engines have been optimized for every platform they are designed to run on.

With regard to the thread-safety inside OpenCV:

Basically, you are on your own. Multithreading bugs have been found and fixed in OpenCV, but new ones continue to turn up. If you suspect a bug, you can open a bug report or, better yet, submit a test case together with a pull request to the GitHub repository.

When you access an OpenCV matrix from multiple threads, methods such as Mat::ptr let you access its data as if you were in a C program. You deal with raw pointers, you read your C++ platform's documentation on the thread-safety of compiler-generated code, and you write your code to be thread-safe.

There is no help or magic involved. You will need to decide what locking is necessary, and perform it yourself.
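
As a rough sketch of what "deciding on the locking yourself" can look like, the example below uses a plain buffer as a stand-in for matrix data so that it compiles without OpenCV. The `Image` struct, its `ptr(row)` helper (loosely modelled on row-pointer access), and `invert_and_sum` are all hypothetical names. Each thread writes only to its own disjoint range of rows, which needs no lock, and takes a mutex only for the one shared accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an 8-bit single-channel matrix.
struct Image {
    int rows, cols;
    std::vector<unsigned char> data;
    Image(int r, int c)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0) {}
    // Raw pointer to the start of a row.
    unsigned char* ptr(int row) {
        return data.data() + static_cast<std::size_t>(row) * cols;
    }
};

// Inverts every pixel in place and returns the sum of the inverted pixels.
long invert_and_sum(Image& img, int n_threads) {
    assert(n_threads > 0);
    long total = 0;           // shared between threads: needs protection
    std::mutex total_mutex;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&, t] {
            // Disjoint row ranges: no two threads touch the same pixel,
            // so the pixel writes themselves need no lock.
            int begin = img.rows * t / n_threads;
            int end = img.rows * (t + 1) / n_threads;
            long local = 0;
            for (int r = begin; r < end; ++r) {
                unsigned char* p = img.ptr(r);
                for (int c = 0; c < img.cols; ++c) {
                    p[c] = static_cast<unsigned char>(255 - p[c]);
                    local += p[c];
                }
            }
            // One short critical section per thread for the shared sum.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& th : threads) th.join();
    return total;
}
```

Accumulating into a thread-local variable and locking once per thread keeps the critical section short; locking per pixel would be correct too, but it would serialize most of the work.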
