Using OpenCL is slower than the normal version
Hi!
I want to match some templates against an image. To speed the process up I wanted to use OpenCL, but the opposite happens: it gets slower.
Here is a small example source code:
import cv2
import numpy as np
img = np.random.randint(0, 256, (500, 500), dtype=np.uint8) # random image data (random_integers is deprecated)
tpls = [np.random.randint(0, 256, (20, 20), dtype=np.uint8) for i in range(100)] # random templates
# OPENCL off
s = []
a = cv2.getTickCount()
s = [cv2.matchTemplate(img,tpl,cv2.TM_CCORR_NORMED) for tpl in tpls]
b = cv2.getTickCount()
print((b-a)/cv2.getTickFrequency())
img = cv2.UMat(img) # convert img to UMat
tpls = [cv2.UMat(tpl) for tpl in tpls] # convert templates to UMat
# OPENCL on
s = []
a = cv2.getTickCount()
s = [cv2.matchTemplate(img,tpl,cv2.TM_CCORR_NORMED) for tpl in tpls]
b = cv2.getTickCount()
print((b-a)/cv2.getTickFrequency())
The results are as follows:
Hardware: Intel Core i7 8650U vs Intel UHD Graphics 620 vs NVIDIA GeForce GTX 1050
Software: OpenCV 3.4.2 on Python 3.6.6
CPU: 0.342378 s
UHD 620: 0.8755508 s
GTX 1050: 0.6655146 s
Task Manager shows that the intended GPU is used. (I set the environment variable OPENCV_OPENCL_DEVICE to :GPU:0 for the UHD and to :GPU:1 for the GTX.)
Any ideas what I can do to get better performance? Could someone try my code and tell me whether they get similar results?
Thank you in advance
gw
A common answer to this kind of question is: the first call to a GPU kernel takes ages because the kernel needs to be compiled first. Call it once, then measure the time on subsequent calls (unless you have something like 1k elements in tpls, where the one-time cost is amortized anyway).
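The warm-up advice above can be sketched roughly like this. It is only a minimal illustration: the helper name time_after_warmup and the SlowFirstCall stand-in are made up for this example and are not part of OpenCV, and the 0.05 s sleep merely imitates a one-time kernel compilation.

```python
import time

def time_after_warmup(fn, warmup=1, repeats=5):
    """Call fn `warmup` times untimed, then return the fastest of `repeats` timed calls (seconds)."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time costs such as kernel compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical stand-in workload whose first call is artificially slow.
class SlowFirstCall:
    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            time.sleep(0.05)  # imitates first-call compilation
        return sum(range(1000))


work = SlowFirstCall()
steady = time_after_warmup(work, warmup=1, repeats=5)
print(f"steady-state time per call: {steady:.6f} s")
```

With the code from the question, fn could be something like lambda: [cv2.matchTemplate(img, tpl, cv2.TM_CCORR_NORMED) for tpl in tpls]. For the UMat variant it may also be worth calling .get() on each result inside the timed function so that the work has really finished before the timer stops. Whether that changes the numbers here is an assumption that would need to be measured.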
Glancing at the source, I get the impression that the UMat is converted into a Mat.