Actually HoughCircles is not multithreaded as you can see in the sources (link).
You are free to make a pull request.
For multithreading you should use OpenCV's parallel_for_ wrapper, so that it works with all the multithreading libraries OpenCV can be built with.
As far as I can see, parallelizing HoughCircles should be possible with the existing code; perhaps parts of the algorithm can also be vectorized using SIMD intrinsics.
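To make the idea behind such a wrapper more concrete, here is a rough sketch of the principle using only the C++ standard library. It is not OpenCV's implementation: `Range`, `naiveParallelFor` and `squareAll` are hypothetical names made up for this illustration, and I assume one plain `std::thread` per stripe, whereas OpenCV hands the work to whichever backend it was built with.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for cv::Range: a half-open interval [start, end).
struct Range {
    int start;
    int end;
};

// Hypothetical helper, NOT OpenCV code: splits `whole` into at most
// `nstripes` contiguous chunks and runs `body` on each chunk in its own thread.
void naiveParallelFor(const Range &whole,
                      const std::function<void(const Range &)> &body,
                      int nstripes)
{
    assert(nstripes > 0);
    const int total = whole.end - whole.start;
    if (total <= 0) return;
    const int chunk = (total + nstripes - 1) / nstripes; // rounded up
    std::vector<std::thread> workers;
    for (int begin = whole.start; begin < whole.end; begin += chunk) {
        const Range stripe{begin, std::min(begin + chunk, whole.end)};
        workers.emplace_back([&body, stripe] { body(stripe); });
    }
    // Wait until every stripe has been processed.
    for (std::thread &w : workers) w.join();
}

// Example body: each index is written by exactly one stripe,
// so the threads never touch the same element.
std::vector<int> squareAll(int n, int nstripes)
{
    std::vector<int> out(n);
    naiveParallelFor(Range{0, n}, [&out](const Range &r) {
        for (int i = r.start; i < r.end; ++i) out[i] = i * i;
    }, nstripes);
    return out;
}
```

The loop body only ever sees a sub-range, which is the same contract a class derived from cv::ParallelLoopBody has to fulfil in its operator().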
EDIT:
I think there is no good documentation, so I will try to explain it with a plain example. I hope it is correct so far.
Keep in mind that writing to the same memory location from several threads is problematic.
#include <opencv2/core.hpp>

class parallelClass : public cv::ParallelLoopBody {
public:
    // Constructor: pass in the variables coming from outside
    // and those that shall be handed back to the outside
    parallelClass(const cv::Mat &_in, cv::Mat &_out) :
        in(_in), out(_out) // bind the outside variables to the class members
    {
        // Do something which affects all threads and the shared members
        out.create(in.size(), in.type());
        temp.create(in.size(), in.type());
    }
    // For completeness
    ~parallelClass() {}
    parallelClass& operator=(const parallelClass&) { return *this; }
    // This overloaded () operator executes the calculation for every thread.
    // The range is split by parallel_for_ automatically.
    void operator()(const cv::Range &boundaries) const
    {
        // Do the loop which you want to parallelize.
        // boundaries.start and boundaries.end are set by parallel_for_ for every thread
        for (int i = boundaries.start; i < boundaries.end; ++i) {
            // Example loop body (makes no sense)
            for (int j = 0; j < in.cols; ++j) {
                temp.at<char>(i, j) = in.at<char>(i, j) - 1;
                out.at<char>(i, j) = temp.at<char>(i, j) + 1;
            }
        }
    }
private:
    // Members shared by all threads
    const cv::Mat &in;
    cv::Mat &out;
    // mutable, because operator() is const but writes to temp
    mutable cv::Mat temp;
};
The class is executed like this:
#include <opencv2/imgcodecs.hpp> // for cv::imread

cv::Mat src, dst;
src = cv::imread("Path/to/image.bmp", cv::IMREAD_GRAYSCALE);
int numberOfThreads = 4;
// Sequential equivalent of what parallel_class_obj does:
/*
dst.create(src.size(), src.type());
temp.create(src.size(), src.type());
for(int i = 0; i < src.rows; ++i) {
for(int j = 0; j < src.cols; ++j) {
temp.at<char>(i, j) = src.at<char>(i, j) - 1;
dst.at<char>(i, j) = temp.at<char>(i, j) + 1;
}
}
*/
// Creates object and calls the constructor of parallelClass
parallelClass parallel_class_obj(src, dst);
// Executes the calculation. The third argument (nstripes) is the number of chunks
// the range is split into; the thread count itself can be set with cv::setNumThreads()
cv::parallel_for_(cv::Range(0, src.rows), parallel_class_obj, numberOfThreads);
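On the warning about writing to the same memory location: the example above is safe because every row is written by exactly one thread. When all threads have to update one shared value instead, a common pattern is to accumulate into a local variable and merge it under a lock. The following is only a sketch with the standard library, not OpenCV code; `parallelSum` is a hypothetical helper and I assume one `std::thread` per chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using at most `nthreads` threads.
long long parallelSum(const std::vector<int> &data, int nthreads)
{
    assert(nthreads > 0);
    long long total = 0;   // shared result, written by every thread
    std::mutex totalMutex; // protects `total`
    const int n = static_cast<int>(data.size());
    const int chunk = (n + nthreads - 1) / nthreads; // rounded up
    std::vector<std::thread> workers;
    for (int begin = 0; begin < n; begin += chunk) {
        const int end = std::min(begin + chunk, n);
        workers.emplace_back([&, begin, end] {
            // Accumulate privately first, so the lock is taken once per chunk
            long long local = 0;
            for (int i = begin; i < end; ++i) local += data[i];
            std::lock_guard<std::mutex> lock(totalMutex);
            total += local;
        });
    }
    for (std::thread &w : workers) w.join();
    return total;
}
```

Without the mutex, two threads could read and write `total` at the same time and lose updates. Inside a cv::ParallelLoopBody the same reasoning applies to any member that more than one stripe writes to.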
EDIT 2:
I made a pull request for parallelizing HoughCircles here. Please test it; it would be great to get some feedback about performance and issues. The complete fork is here.
I think the for loop on line 1058 is independent of the rows, so you could put it into a class which inherits from ParallelLoopBody.
To give an example, look here starting from line 639. The function is called at line 820. I think this should be a relatively simple implementation.
My German is not perfect, but maybe we can talk about the implementation some other way? At the moment I am still struggling to understand how parallel_for_ works at all. I have seen a few examples, but given my weak C++ skills the emphasis is on 'relatively' simple.
Can you point me to resources that explain the basics I need to know for the implementation? I haven't looked through the parallel_for_ wrapper itself yet, although I have already seen several examples.