Different results for matrix multiplication with gemm() for Mat vs UMat

asked 2018-03-24 04:10:50 -0500

Udhav

updated 2018-03-24 06:37:30 -0500

I am trying to multiply very large matrices using the gemm() function. With Mat variables it takes a long time, so I switched to UMat. However, UMat gives very different results for the same operation, and some of the values are NaN. Here is the sample I ran afterwards:

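#include <iostream>
#include <opencv2/core.hpp>

int main(int argc, char** argv){
    cv::Mat m1 = cv::Mat::ones(5, 1, CV_32FC1);
    cv::Mat m2 = cv::Mat::zeros(1, 5, CV_32FC1);
    cv::Mat output;

    // Both operands are transposed: (1x5) * (5x1) gives a 1x1 result.
    cv::gemm(m1, m2, 1.0, cv::noArray(), 0.0, output, cv::GEMM_1_T + cv::GEMM_2_T);

    std::cout << output;
    return 0;
}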

Output: [0]

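#include <iostream>
#include <opencv2/core.hpp>

int main(int argc, char** argv){
    cv::UMat m1 = cv::UMat::ones(5, 1, CV_32FC1);
    cv::UMat m2 = cv::UMat::zeros(1, 5, CV_32FC1);
    cv::UMat output;

    // Same call as the Mat version, so the expected result is again a 1x1 zero.
    cv::gemm(m1, m2, 1.0, cv::noArray(), 0.0, output, cv::GEMM_1_T + cv::GEMM_2_T);

    std::cout << output;
    return 0;
}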

Output: [5.4256896e+35]

Can someone please tell me why the same operation produces different values, and how I can correct it? I cannot simply use Mat, since I want to use the GPU to reduce the time taken.

Edit: I narrowed down the problem. It seems to occur only when one of the matrices being multiplied is a single-row or single-column matrix.

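#include <iostream>
#include <opencv2/core.hpp>

int main(int argc, char** argv){
    cv::UMat m1(1, 5, CV_32FC1);
    cv::UMat m2(5, 1, CV_32FC1);
    cv::randu(m1, cv::Scalar::all(0), cv::Scalar::all(1));
    cv::randu(m2, cv::Scalar::all(0), cv::Scalar::all(1));

    cv::UMat output;

    // No transpose flags: (1x5) * (5x1) gives a 1x1 result.
    cv::gemm(m1, m2, 1.0, cv::noArray(), 0.0, output);

    std::cout << m1 << std::endl;
    std::cout << m2 << std::endl;
    std::cout << output << std::endl;
    return 0;
}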

Output: [image not available]

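#include <iostream>
#include <opencv2/core.hpp>

int main(int argc, char** argv){
    cv::UMat m1(2, 5, CV_32FC1);
    cv::UMat m2(2, 5, CV_32FC1);
    cv::randu(m1, cv::Scalar::all(0), cv::Scalar::all(1));
    cv::randu(m2, cv::Scalar::all(0), cv::Scalar::all(1));

    cv::UMat output;

    // Only the second operand is transposed: (2x5) * (5x2) gives a 2x2 result.
    cv::gemm(m1, m2, 1.0, cv::noArray(), 0.0, output, cv::GEMM_2_T);

    std::cout << m1 << std::endl;
    std::cout << m2 << std::endl;
    std::cout << output << std::endl;
    return 0;
}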

Output: [image not available]
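Since the matrices in the samples above are tiny, the expected values can be worked out independently of OpenCV and used as ground truth when comparing the Mat and UMat outputs. The following is only a minimal sketch in plain C++: the Matrix struct and referenceGemm function are hypothetical helpers written for illustration, not OpenCV types or OpenCV's actual implementation, and the two bool parameters merely play the roles of GEMM_1_T and GEMM_2_T.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major float matrix; a hypothetical helper, not an OpenCV type.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // rows * cols values, stored row by row

    Matrix(std::size_t r, std::size_t c, float fill)
        : rows(r), cols(c), data(r * c, fill) {}

    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
};

// Computes alpha * op(a) * op(b), where op() optionally transposes its
// argument (the roles GEMM_1_T and GEMM_2_T play in the question).
Matrix referenceGemm(const Matrix& a, const Matrix& b, float alpha,
                     bool transposeA, bool transposeB) {
    const std::size_t m = transposeA ? a.cols : a.rows;  // rows of op(a)
    const std::size_t k = transposeA ? a.rows : a.cols;  // cols of op(a)
    const std::size_t n = transposeB ? b.rows : b.cols;  // cols of op(b)

    // The inner dimensions of op(a) and op(b) must agree.
    assert(k == (transposeB ? b.cols : b.rows));

    Matrix out(m, n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                const float av = transposeA ? a.at(p, i) : a.at(i, p);
                const float bv = transposeB ? b.at(j, p) : b.at(p, j);
                sum += av * bv;
            }
            out.at(i, j) = alpha * sum;
        }
    }
    return out;
}
```

For the first sample, referenceGemm(Matrix(5, 1, 1.0f), Matrix(1, 5, 0.0f), 1.0f, true, true) yields a 1x1 matrix holding 0, which matches the Mat output and suggests the UMat value is the incorrect one.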


Comments

Please remove the screenshot and paste the program and results as text.

LBerger (2018-03-24 04:25:26 -0500)

Changed. Also, the problem only seems to occur when I have to transpose a matrix.

Udhav (2018-03-24 06:02:29 -0500)

I see no problem with the result using your code. My OpenCV version is 3.4.1-dev, the platform is Windows 10 (MSVC 2017, Win64), and the graphics card is: [ INFO:0] Preparing OpenCL cache configuration for context: NVIDIA_Corporation--GeForce_GTX_970--390_77.

What are your version and platform? (You can paste the output of getBuildInformation() into your post: cout << getBuildInformation() << endl;)

BUT there is an exception thrown: [ INFO:0] Preparing OpenCL cache configuration for context: NVIDIA_Corporation--GeForce_GTX_970--390_77 OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('transpose', dims=2, globalsize=1x8x1, localsize=32x8x1) sync=false

LBerger (2018-03-24 06:13:45 -0500)

My OpenCV version is 3.4.0 on Windows 8.1 64-bit, and the graphics card is: [ INFO 0: ] Preparing OpenCL cache configuration for context: 32-bit--Advanced_Micro_Devices__Inc_--Hainan--2348_4

Udhav (2018-03-24 06:34:42 -0500)

Also, it turns out that the transpose is not the problem: the result is wrong whenever a row or column matrix is involved.

Udhav (2018-03-24 06:35:36 -0500)

You can update to 3.4.1, but I don't think it will solve the problem. Can you update the GPU driver? On NVIDIA, the OpenCL compiler is nvopencl.dll. Can you update the equivalent compiler (the AMD one, of course)?

LBerger (2018-03-24 08:44:16 -0500)