MatchTemplate OCL and CPU Timing
Hi,
I was hoping for a performance boost from the OpenCL (OCL) implementation of matchTemplate(), but the Mat and UMat versions both take the same time. Looking more closely, it is the normalization that takes the majority of the time. Does anyone have more information on how to implement the fastest normalized template matching?
Results:
CPU
TM_CCORR: 16 ms
TM_CCORR_NORMED: 47 ms
OCL
TM_CCORR: 0 ms
TM_CCORR_NORMED: 47 ms
After digging further, I am not convinced that the OpenCL normalized functions are implemented for 32-bit float input.
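For reference, this is the score that the OpenCV documentation gives for TM_CCORR_NORMED, with T the template and I the image. The numerator is the plain TM_CCORR result. The denominator additionally needs the sum of squared image values under the template at every position, which is the extra work that normalization adds. That sum is commonly obtained from an integral image of the squared pixels, though whether a given backend does so for float input is an assumption here, not something verified.

```latex
R(x,y) = \frac{\sum_{x',y'} T(x',y') \, I(x+x',\, y+y')}
              {\sqrt{\sum_{x',y'} T(x',y')^{2} \cdot \sum_{x',y'} I(x+x',\, y+y')^{2}}}
```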