Since you don't mention which version of OpenCV you use, I assume you are using the precompiled (Release) build of OpenCV (i.e. OpenCV 2.4.8). That build was not compiled with TBB, so parallel_for_ does not give any speed improvement. That is why, in your test, the MatIterator version is faster than parallel_for_.

For the pointer version (2.): the sqrt function of OpenCV is implemented with SSE2 instructions, which process 4 floats (or 2 doubles) per instruction, so it is naturally faster than your simple pointer implementation, which handles only 1 float at a time. If you use SSE in your pointer version with the two nested for loops, the speed should be about the same.

I have tested parallel_for_ with a TBB-enabled build of OpenCV, and it is usually about nCores times faster than the plain MatIterator (where nCores is the number of CPU cores on your system). So I suggest you build OpenCV with TBB yourself (the CMake option WITH_TBB=ON) and use parallel_for_; there is more on that at http://answers.opencv.org/question/22115/best-way-to-apply-a-function-to-each-element-of/. Hope this helps.
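To illustrate the SSE point, here is a minimal sketch of a scalar pointer loop next to an SSE one. It is plain C++ without OpenCV, and it is not how OpenCV itself implements sqrt. The function names sqrt_scalar and sqrt_sse are made up for this example, and it assumes an x86/x86-64 compiler that provides <xmmintrin.h>.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Hypothetical helper: plain pointer loop, one float per iteration.
void sqrt_scalar(const float* src, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = std::sqrt(src[i]);
    }
}

// Hypothetical helper: SSE loop, four floats per iteration.
// Elements left over when n is not a multiple of 4 go through a scalar tail loop.
void sqrt_sse(const float* src, float* dst, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);        // unaligned load of 4 floats
        _mm_storeu_ps(dst + i, _mm_sqrt_ps(v));  // 4 square roots, unaligned store
    }
    for (; i < n; ++i) {
        dst[i] = std::sqrt(src[i]);
    }
}
```

For a continuous cv::Mat of type CV_32F you could pass mat.ptr<float>(0) and mat.total() to either function; for a non-continuous Mat you would call it once per row. The unaligned load/store intrinsics are used so the sketch does not depend on how the buffer is aligned.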