"illegal memory access" when calling cv::cuda::ORB::detectAndComputeAsync() from multiple threads
I'm trying to run multiple threads (std::thread), each of which opens a VideoCapture on a different device, captures frames, uploads them to the GPU, and computes ORB keypoints and descriptors. If I run more than one thread, the program fails on the call to detectAndComputeAsync. The exact error message is not consistent; on each run I see one of ~9 different errors, almost all of which include "an illegal memory access was encountered".
I've posted the (simplest version of the) offending code here:
https://gist.github.com/jrussino/c10a...
I also pasted each variant of the error message that I observed into a comment below that code snippet.
Notes:
- I'm using a different cuda::Stream for each thread's CUDA operations.
- This runs without error if I comment out the call to detectAndComputeAsync on line 49.
- This runs without error if I only run one thread (i.e. comment out lines 66 and 70).
"I'm trying to run multiple threads" -- don't.
What made you think any of it would be thread-safe?
So then how is detectAndComputeAsync supposed to be used (or any of the asynchronous CUDA methods)? I've never seen an example of cv::Feature2DAsync in use, so if somebody has a non-trivial working example that would be immensely helpful. In what sense can this method be called asynchronously?