Using OpenCL is slower than the normal (CPU) version

asked 2018-09-05 13:09:09 -0500

Hi!

I want to match a number of templates against an image. To speed the process up I tried to use OpenCL, but the opposite happens: it gets slower.

Here is a small example:

import cv2
import numpy as np

# Random 500x500 8-bit image (np.random.random_integers is deprecated; randint's upper bound is exclusive)
img = np.random.randint(0, 256, (500, 500), dtype=np.uint8)

# 100 random 20x20 8-bit templates
tpls = [np.random.randint(0, 256, (20, 20), dtype=np.uint8) for _ in range(100)]

# OpenCL off: plain NumPy arrays
a = cv2.getTickCount()
s = [cv2.matchTemplate(img, tpl, cv2.TM_CCORR_NORMED) for tpl in tpls]
b = cv2.getTickCount()
print((b - a) / cv2.getTickFrequency())  # elapsed time in seconds

img = cv2.UMat(img)  # convert image to UMat
tpls = [cv2.UMat(tpl) for tpl in tpls]  # convert templates to UMat

# OpenCL on: UMat inputs
a = cv2.getTickCount()
s = [cv2.matchTemplate(img, tpl, cv2.TM_CCORR_NORMED) for tpl in tpls]
b = cv2.getTickCount()
print((b - a) / cv2.getTickFrequency())  # elapsed time in seconds

The results are as follows (elapsed time in seconds for all 100 templates):

Hardware: Intel Core i7-8650U (CPU) vs Intel UHD Graphics 620 vs NVIDIA GeForce GTX 1050
Software: OpenCV 3.4.2 on Python 3.6.6
CPU: 0.342378
UHD 620: 0.8755508
GTX 1050: 0.6655146

My Task Manager shows that the intended GPU is used. (I select the device with the environment variable OPENCV_OPENCL_DEVICE, set to :GPU:0 for the UHD 620 and :GPU:1 for the GTX 1050.)
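As a cross-check from inside Python, something like the following sketch could report whether OpenCV sees an OpenCL runtime and has it enabled. The helper name opencl_status is made up for illustration, and setting the variable before cv2 is imported is a cautious assumption rather than a documented requirement. Note that this only shows that OpenCL is available and switched on, not that matchTemplate actually runs on the GPU.

```python
import os

# Assumption: set the device selector before cv2 is imported, to be safe.
# ":GPU:0" is the value used above for the UHD 620; setdefault keeps any
# value that is already present in the environment.
os.environ.setdefault("OPENCV_OPENCL_DEVICE", ":GPU:0")


def opencl_status():
    """Return a small dict describing OpenCV's OpenCL state (hypothetical helper)."""
    try:
        import cv2
    except ImportError:
        # OpenCV is not installed in this interpreter.
        return {"cv2_available": False}
    return {
        "cv2_available": True,
        "have_opencl": cv2.ocl.haveOpenCL(),  # an OpenCL runtime was found
        "use_opencl": cv2.ocl.useOpenCL(),    # OpenCL is currently enabled
    }


print(opencl_status())
```

If use_opencl comes back False although have_opencl is True, cv2.ocl.setUseOpenCL(True) switches it on.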

Any ideas what I can do to get better performance? Maybe someone could try my code and tell me whether they get similar results?

Thank you in advance

gw


Comments

A common answer to this is: the first call to a GPU kernel takes ages because the kernel has to be compiled first. Call it once, then measure the time on subsequent calls (unless you have something like 1k elements in tpls).

DrLu (2018-09-06 02:32:06 -0500)

Glancing at the source, I get the impression that the UMat is converted into a Mat.

DrLu (2018-09-06 04:10:23 -0500)