I have been reading lately (list of questions on Q&A at the end) about the use of parallelism for computations in OpenCV (Core i7 Intel CPU, no GPU, C++, Visual Studio). I am working on a project that might benefit from making some operations explicitly parallel. I repeat the same computation on different portions of the same image, and the iterations do not influence each other (at least in theory; I don't know of any hidden dependencies), other than reading from the same source image.
I am talking about roughly 100 iterations (two nested for loops, ~10 iterations each). On each iteration I compute a brute-force (BF) matching, a findHomography, and another computation/qualification metric, plus a push_back of the result into a vector, though the exact work done could change or get heavier.
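Schematically, the hot section looks like the sketch below; `Result` and `matchAndQualify` are placeholder names standing in for my actual matching/homography/metric code:

```cpp
#include <vector>

// Placeholder for one iteration's output (homography, metric, ...).
struct Result { double score = 0.0; };

// Stand-in for the real work: BF matching + findHomography + metric.
static Result matchAndQualify(int i, int j)
{
    Result r;
    r.score = 0.0 * (i + j); // dummy value; the real function is the expensive part
    return r;
}

int main()
{
    std::vector<Result> results;
    for (int i = 0; i < 10; ++i)       // ~10 outer iterations
        for (int j = 0; j < 10; ++j)   // ~10 inner iterations, mutually independent
            results.push_back(matchAndQualify(i, j));
    return 0;
}
```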
Now I wonder: with all the compiler optimization magic that happens when I click "Build Release", and with OpenCV 3 using the Intel Integrated Performance Primitives (IPP), is it worth taking the time to explicitly parallelize my for loops with cv::parallel_for_? (And, by the way, have things changed in OpenCV 3?)
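To make the question concrete, this is the kind of explicit parallelization I have in mind, rewriting the sketch above with cv::parallel_for_ (the class and variable names are mine, not from any library); writing results by index into a preallocated vector sidesteps the thread-safety problem of push_back:

```cpp
#include <opencv2/core/utility.hpp>  // cv::parallel_for_, cv::ParallelLoopBody
#include <vector>

// Same placeholders as in the serial sketch above.
struct Result { double score = 0.0; };
static Result matchAndQualify(int i, int j)
{
    Result r;
    r.score = 0.0 * (i + j); // dummy value; the real function is the expensive part
    return r;
}

// Loop body: OpenCV splits the flattened range across its thread pool.
class MatchLoop : public cv::ParallelLoopBody
{
public:
    explicit MatchLoop(std::vector<Result>& results) : results_(results) {}

    void operator()(const cv::Range& range) const override
    {
        for (int idx = range.start; idx < range.end; ++idx)
        {
            const int i = idx / 10, j = idx % 10;  // flatten the two nested loops
            results_[idx] = matchAndQualify(i, j); // write by index: no lock,
        }                                          // no data race on push_back
    }

private:
    std::vector<Result>& results_;
};

int main()
{
    std::vector<Result> results(100);  // preallocate instead of push_back
    cv::parallel_for_(cv::Range(0, 100), MatchLoop(results));
    return 0;
}
```

If I understand correctly, cv::parallel_for_ dispatches to whatever parallel backend OpenCV was built with (TBB, OpenMP, the Windows Concurrency runtime, ...), and newer 3.x releases also accept a plain lambda in place of the ParallelLoopBody subclass.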
Right now I have working code that executes in what seems to be an acceptable amount of time for the application and the cases I have tested, but:
1 - the two nested loops are the most time-consuming portion of my code;
2 - every millisecond counts, and this might be one of the few portions of the code that I can "optimize" given my somewhat limited programming experience;
3 - the computation inside those loops might get heavier over time to increase robustness. I will do my best to keep the loop body lean and avoid adding unnecessary code, but the workload may simply grow (more iterations, etc.).
Bottom line: when is it really fruitful to explicitly parallelize code, and have the parallelization methods changed in OpenCV 3?