Transparent API performance discrepancy
Hello everyone, I'm using OpenCV's Transparent API (T-API) and I have two different computers. Laptop1 has a Radeon 8690M with OpenCL C 1.1. Laptop2 has a GTX 1050 (driver 384.130, power setting set to prefer the GPU) with OpenCL C 1.2. OpenCV was compiled with the exact same settings on both laptops, except for the addition of CUDA 8 on Laptop2. Both are running the exact same code on the same version of OpenCV (3.4.2-dev) on Ubuntu 16.04, and the OCL module recognizes both GPUs.
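For reference, here's a minimal check of which OpenCL device the T-API dispatches to (a sketch; it assumes OpenCV was built with OpenCL support):

#include <opencv2/core/ocl.hpp>
#include <iostream>

int main() {
    if (!cv::ocl::haveOpenCL()) {
        std::cout << "OpenCL is not available" << std::endl;
        return 1;
    }
    // Query the device the T-API will dispatch UMat operations to.
    cv::ocl::Device dev = cv::ocl::Device::getDefault();
    std::cout << "Device:  " << dev.name() << std::endl;
    std::cout << "Vendor:  " << dev.vendorName() << std::endl;
    std::cout << "Version: " << dev.OpenCLVersion() << std::endl;
    std::cout << "T-API enabled: " << cv::ocl::useOpenCL() << std::endl;
    return 0;
}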
The image I'm loading is a 2048x1536 color image.
code snippet:
cv::Mat m = cv::imread("img.jpg"); // 2048x1536, 3-channel
cv::UMat u;
while (1) {
    m.copyTo(u);
    startTime();
    cv::GaussianBlur(u, u, cv::Size(3, 3), 3.5);
    endTime();
}
On Laptop1 I get ~24 ms; on Laptop2 I get ~34 ms (I let the loop run for a couple of seconds). Now the fun begins: on Laptop2, if I change GaussianBlur to use m instead of u, my time drops to ~20 ms, while Laptop1 gets worse performance, as expected. The two machines behave as complete opposites. Is there some implementation under the hood that could be affecting the performance, or is there some other issue? Thanks
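To make the comparison explicit, the same UMat code can be forced down either path by toggling the T-API dispatch, rather than switching between m and u. A minimal sketch (timed with cv::getTickCount instead of my startTime/endTime helpers; the 50-iteration count is arbitrary):

#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>

int main() {
    cv::Mat m = cv::imread("img.jpg");
    cv::UMat u, dst;
    m.copyTo(u);
    // Same call, two dispatch modes: OpenCL first, then the CPU fallback.
    for (int useOcl = 1; useOcl >= 0; --useOcl) {
        cv::ocl::setUseOpenCL(useOcl != 0);
        int64 t0 = cv::getTickCount();
        for (int i = 0; i < 50; ++i)
            cv::GaussianBlur(u, dst, cv::Size(3, 3), 3.5);
        cv::ocl::finish(); // wait for any queued OpenCL work before stopping the clock
        double ms = (cv::getTickCount() - t0) * 1000.0 / cv::getTickFrequency();
        std::cout << (useOcl ? "OpenCL" : "CPU") << ": " << ms / 50 << " ms/iter" << std::endl;
    }
    return 0;
}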
With OpenCL, don't use the first loop iteration to estimate time: the first iteration = OpenCL kernel compilation + GaussianBlur.
I didn't. The first couple of timing outputs were large, then the values became stable, so I took the average of the middle values out of 50 runs.
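In code, the warm-up-then-average pattern looks something like this (a sketch using cv::TickMeter; the 5 warm-up and 50 timed iterations are arbitrary choices):

#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/imgproc.hpp>

// Time GaussianBlur on a UMat, discarding warm-up iterations that
// include the one-time OpenCL kernel compilation.
double timeBlurMs(const cv::UMat& src, int warmup = 5, int runs = 50) {
    cv::UMat dst;
    for (int i = 0; i < warmup; ++i)
        cv::GaussianBlur(src, dst, cv::Size(3, 3), 3.5);
    cv::ocl::finish();  // drain the queue before timing starts

    cv::TickMeter tm;
    tm.start();
    for (int i = 0; i < runs; ++i)
        cv::GaussianBlur(src, dst, cv::Size(3, 3), 3.5);
    cv::ocl::finish();  // make sure all enqueued kernels have finished
    tm.stop();
    return tm.getTimeMilli() / runs;
}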
I checked GPU utilization using the NVIDIA X Server Settings tool; both versions of the code showed about ~40% GPU utilization and ~6% PCIe bandwidth utilization.
But you are comparing OpenCL GPU processing versus CPU processing. By processing a single image in a loop, you have the bottleneck of pushing the data to GPU memory every single time, so to me it makes sense that if you use m instead of u, and thus run on the CPU with RAM memory access, it runs faster for that single image. GPUs are only faster if you first allocate, for example, 100 images in GPU memory, process them all, and then get the results back.

Sure, there is a Mat->UMat conversion, but I'm timing the actual computation, which shouldn't be affected by memory transfers, since the transfer should be complete before the computation starts.
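One caveat with that reasoning: OpenCL dispatch is asynchronous, so GaussianBlur on a UMat can return after merely enqueueing the kernel, and the copyTo upload may still be in flight when the timer starts. A sketch of the loop body with explicit sync points (startTime/endTime are the placeholders from the question):

m.copyTo(u);
cv::ocl::finish();   // ensure the Mat->UMat upload is complete before timing
startTime();
cv::GaussianBlur(u, u, cv::Size(3, 3), 3.5);
cv::ocl::finish();   // ensure the kernel has actually run before stopping the timer
endTime();

Without those sync points, the timed region may measure only the enqueue cost, or absorb part of the upload, which could explain counterintuitive numbers between the two machines.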
Try with this code.
So I changed how I went about performing the Gaussian blur on Laptop2, and ran the same test on Laptop1. Originally I was passing in the 3-channel 2048x1536 image, same as the code in the question.
Next, I converted the image to a single channel using inRange, but still timed only the GaussianBlur call.
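For concreteness, the single-channel variant looks something like this (a sketch; the inRange bounds are placeholders, and cv::inRange writes an 8-bit single-channel mask):

cv::UMat mask;
// inRange produces a CV_8UC1 mask from the 3-channel input;
// the lower/upper bounds here are placeholders.
cv::inRange(u, cv::Scalar(0, 0, 0), cv::Scalar(255, 255, 255), mask);
startTime();
cv::GaussianBlur(mask, mask, cv::Size(3, 3), 3.5);  // only the blur is timed
endTime();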