Different results for matrix multiplication gemm() for mat vs umat

asked 2018-03-24 04:10:50 -0600

Udhav

updated 2018-03-24 06:37:30 -0600

I am trying to multiply very large matrices using the gemm() function. With Mat variables it takes a long time, so I switched to UMat. However, the same operation gives very different results with UMat, and some of the values are NaN. Here is a small sample I ran afterwards to reproduce it:

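#include <iostream>
#include <opencv2/core.hpp>

int main(int argc, char** argv){
    cv::Mat m1 = cv::Mat::ones(5, 1, CV_32FC1);
    cv::Mat m2 = cv::Mat::zeros(1, 5, CV_32FC1);
    cv::Mat output;

    // Both inputs are transposed: (1x5) * (5x1) gives a 1x1 result.
    cv::gemm(m1, m2, 1.0, cv::noArray(), 0.0, output, cv::GEMM_1_T + cv::GEMM_2_T);

    std::cout << output;
    return 0;
}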

Output: [0]

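#include <iostream>
#include <opencv2/core.hpp>

int main(int argc, char** argv){
    cv::UMat m1 = cv::UMat::ones(5, 1, CV_32FC1);
    cv::UMat m2 = cv::UMat::zeros(1, 5, CV_32FC1);
    cv::UMat output;

    // Same call as the Mat version, but with UMat inputs (OpenCL path).
    cv::gemm(m1, m2, 1.0, cv::noArray(), 0.0, output, cv::GEMM_1_T + cv::GEMM_2_T);

    std::cout << output;
    return 0;
}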

Output: [5.4256896e+35]

Can someone please tell me why the same operation gives different values, and how I can correct it? I cannot simply use Mat, since I want to use the GPU to reduce the time taken.
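As a cross-check that does not depend on OpenCV or OpenCL, the expected result of these calls can be computed by hand. The following is a minimal sketch, assuming row-major float storage. `Matrix` and `referenceGemm` are hypothetical helper names, not OpenCV API, and only the alpha * op(A) * op(B) part of gemm is modelled (beta = 0 and no third matrix, as in the calls in this question).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain row-major float matrix. Hypothetical helper type, not part of OpenCV.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}

    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
};

// Reference for alpha * op(A) * op(B), where op() optionally transposes.
// Assumption: beta = 0 and no third input, matching the question's calls.
Matrix referenceGemm(const Matrix& a, bool transA,
                     const Matrix& b, bool transB, float alpha) {
    const std::size_t m  = transA ? a.cols : a.rows;  // rows of op(A)
    const std::size_t k  = transA ? a.rows : a.cols;  // cols of op(A)
    const std::size_t kB = transB ? b.cols : b.rows;  // rows of op(B)
    const std::size_t n  = transB ? b.rows : b.cols;  // cols of op(B)
    assert(k == kB && "inner dimensions must match");

    Matrix out(m, n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;  // accumulate in double to limit rounding error
            for (std::size_t p = 0; p < k; ++p) {
                const float av = transA ? a.at(p, i) : a.at(i, p);
                const float bv = transB ? b.at(j, p) : b.at(p, j);
                acc += static_cast<double>(av) * bv;
            }
            out.at(i, j) = alpha * static_cast<float>(acc);
        }
    }
    return out;
}
```

For the first sample (5x1 ones, 1x5 zeros, both transposed), `referenceGemm(m1, true, m2, true, 1.0f)` yields a 1x1 matrix containing 0. That agrees with the Mat output, which suggests the UMat value is the incorrect one.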

Edit: I narrowed down the problem. It seems to occur only when either of the matrices being multiplied is a single-row or single-column matrix.

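#include <iostream>
#include <opencv2/core.hpp>

using namespace cv;
using namespace std;

int main(int argc, char** argv){
    UMat m1(1, 5, CV_32FC1);
    UMat m2(5, 1, CV_32FC1);
    // Fill both inputs with uniform random values in [0, 1).
    randu(m1, Scalar::all(0), Scalar::all(1));
    randu(m2, Scalar::all(0), Scalar::all(1));

    UMat output;

    // Single row times single column: (1x5) * (5x1), no transpose flags.
    gemm(m1, m2, 1.0, noArray(), 0.0, output);

    cout << m1 << endl;
    cout << m2 << endl;
    cout << output << endl;
    return 0;
}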

Output: (posted as a screenshot; not available as text)

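#include <iostream>
#include <opencv2/core.hpp>

using namespace cv;
using namespace std;

int main(int argc, char** argv){
    UMat m1(2, 5, CV_32FC1);
    UMat m2(2, 5, CV_32FC1);
    // Fill both inputs with uniform random values in [0, 1).
    randu(m1, Scalar::all(0), Scalar::all(1));
    randu(m2, Scalar::all(0), Scalar::all(1));

    UMat output;

    // Two-row inputs with the second one transposed: (2x5) * (5x2).
    gemm(m1, m2, 1.0, noArray(), 0.0, output, GEMM_2_T);

    cout << m1 << endl;
    cout << m2 << endl;
    cout << output << endl;
    return 0;
}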

Output: (posted as a screenshot; not available as text)
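For the huge matrices mentioned at the start, comparing the Mat and UMat results by eye is not practical. The sketch below counts disagreeing elements between two float buffers and treats any NaN or infinity as a mismatch. `countMismatches` and its tolerance are hypothetical choices for illustration, not part of OpenCV; the buffers are assumed to have been copied out of continuous CV_32FC1 matrices beforehand.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts positions where two result buffers disagree.
// A NaN or infinity in either buffer always counts as a mismatch.
// relTol is an assumed tolerance; tune it to the size of the products.
std::size_t countMismatches(const std::vector<float>& expected,
                            const std::vector<float>& actual,
                            float relTol = 1e-4f) {
    assert(expected.size() == actual.size());
    std::size_t bad = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const float e = expected[i];
        const float a = actual[i];
        if (!std::isfinite(e) || !std::isfinite(a)) {
            ++bad;
            continue;
        }
        // Scale the tolerance by the magnitude of the values, but never
        // below 1, so values near zero are compared absolutely.
        const float scale = std::max(1.0f, std::max(std::fabs(e), std::fabs(a)));
        if (std::fabs(e - a) > relTol * scale) {
            ++bad;
        }
    }
    return bad;
}
```

A nonzero count for the same inputs on the Mat and UMat paths would confirm the discrepancy in text form, which is easier to report than a screenshot.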


Comments

Please remove the screenshot and copy the program and results as text.

LBerger ( 2018-03-24 04:25:26 -0600 )

Changed. Also, the problem only seems to occur when I have to transpose a matrix.

Udhav ( 2018-03-24 06:02:29 -0600 )

I get the correct result with your code. My OpenCV version is 3.4.1-dev, the platform is Windows 10 (MSVC 2017, 64-bit), and the graphics card is: [ INFO:0] Preparing OpenCL cache configuration for context: NVIDIA_Corporation--GeForce_GTX_970--390_77.

What are your version and platform? You can paste the output of getBuildInformation() into your post: cout << getBuildInformation() << endl;

BUT an exception is thrown: [ INFO:0] Preparing OpenCL cache configuration for context: NVIDIA_Corporation--GeForce_GTX_970--390_77 OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('transpose', dims=2, globalsize=1x8x1, localsize=32x8x1) sync=false

LBerger ( 2018-03-24 06:13:45 -0600 )
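The sizes in that error message are worth a closer look. In OpenCL 1.x, one documented cause of CL_INVALID_WORK_GROUP_SIZE is a global work size that is not a whole multiple of the local work size in some dimension, and here the first dimension is 1 against a local size of 32. Whether this is what actually happens inside OpenCV's transpose kernel is an assumption, not something this thread confirms. The sketch below only illustrates the divisibility rule; `isUniformNDRange` and `padGlobalSize` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Size3 = std::array<std::size_t, 3>;

// True when every global dimension is a whole multiple of the matching
// local dimension, as OpenCL 1.x expects for an explicit local size.
bool isUniformNDRange(const Size3& global, const Size3& local) {
    for (std::size_t d = 0; d < 3; ++d) {
        assert(local[d] > 0);
        if (global[d] % local[d] != 0) {
            return false;
        }
    }
    return true;
}

// Rounds each global dimension up to the next multiple of the local size.
// A kernel launched this way must skip work-items outside the real data.
Size3 padGlobalSize(const Size3& global, const Size3& local) {
    Size3 padded{};
    for (std::size_t d = 0; d < 3; ++d) {
        assert(local[d] > 0);
        padded[d] = (global[d] + local[d] - 1) / local[d] * local[d];
    }
    return padded;
}
```

With the values from the log, a global size of 1x8x1 fails the check against a local size of 32x8x1, while the padded size 32x8x1 passes. This would be consistent with the problem appearing only for single-row or single-column matrices.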

My OpenCV version is 3.4.0, the platform is Windows 8.1 64-bit, and the graphics card is: [ INFO 0: ] Preparing OpenCL cache configuration for context: 32-bit--Advanced_Micro_Devices__Inc_--Hainan--2348_4

Udhav ( 2018-03-24 06:34:42 -0600 )

Also, it turns out that the transpose is not the problem: gemm gives wrong results for row/column matrices.

Udhav ( 2018-03-24 06:35:36 -0600 )

You can update to 3.4.1, but I don't think it will solve the problem. Can you update the GPU driver? With NVIDIA, the OpenCL compiler is nvopencl.dll. Can you update the equivalent compiler (for AMD, of course)?

LBerger ( 2018-03-24 08:44:16 -0600 )
