# High cv::solve error if trained on single line

I use cv::solve to solve a two-dimensional linear regression. If the training data happen to lie on a single line (that is, y is equal for all training samples), then the extrapolation produces a large error.

Say, the training data are:

X     Y     Z
----  ----  -----
4     7     458
5     7     554
7     7     735
8     7     826


The calculated coefficients are (notice the last two are very large numbers):

{92.8825, 3.74394e+007, -2.62076e+008}


If I use these coefficients to extrapolate the original values, a large error is produced:

X     Y     Z'
----  ----  -----
4     7     427.53
5     7     520.412
7     7     706.177
8     7     799.06


All extrapolated values are too small by roughly 27 to 34. This seems to be an edge case. In my use case, if all my values lie on a single line (horizontal or vertical), I will only predict values on that line, which effectively turns the problem into a one-dimensional linear regression. But the error is unacceptable.
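One plausible contributor to an offset of this size is single-precision rounding: the prediction adds b*y and c, two values near 2.6e8 that almost cancel, and a float cannot resolve small differences at that magnitude. The sketch below measures the gap between neighbouring float values; `spacingAt` is a hypothetical helper name, not part of OpenCV.

```cpp
#include <cmath>
#include <limits>

// Distance from |v| to the next larger representable float.
// Hypothetical helper, used only to illustrate the rounding granularity.
float spacingAt(float v) {
    const float mag = std::fabs(v);
    return std::nextafter(mag, std::numeric_limits<float>::infinity()) - mag;
}
```

With IEEE 754 single precision, `spacingAt(-2.62076e8f)` is 16, so a sum of terms of that size can easily be off by a few tens, while `spacingAt(458.0f)` is far below 0.001.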

Here is the code:

    #include <iostream>
    #include <opencv2/opencv.hpp>

    using namespace cv;
    using namespace std;

    // Evaluate the fitted plane z = a*x + b*y + c at (x, y) and print the result.
    static void print(float a, float b, float c, int x, int y) {
        cout << "x=" << x << ", y=" << y << ", z=" << (a*x + b*y + c) << endl;
    }

    int main() {
        // One row [x, y, 1] per training sample; matZ holds the target values.
        Mat matX(4, 3, CV_32F);
        Mat matZ(4, 1, CV_32F);

        int idx = 0;
        matX.at<float>(idx, 0) = 4;
        matX.at<float>(idx, 1) = 7;
        matX.at<float>(idx, 2) = 1;
        matZ.at<float>(idx++, 0) = 458;

        matX.at<float>(idx, 0) = 5;
        matX.at<float>(idx, 1) = 7;
        matX.at<float>(idx, 2) = 1;
        matZ.at<float>(idx++, 0) = 554;

        matX.at<float>(idx, 0) = 7;
        matX.at<float>(idx, 1) = 7;
        matX.at<float>(idx, 2) = 1;
        matZ.at<float>(idx++, 0) = 734;

        matX.at<float>(idx, 0) = 8;
        matX.at<float>(idx, 1) = 7;
        matX.at<float>(idx, 2) = 1;
        matZ.at<float>(idx++, 0) = 826;

        Mat res(3, 1, CV_32F);

        // Least-squares solve of matX * res = matZ for the coefficients (a, b, c).
        cv::solve(matX, matZ, res, DECOMP_QR);

        float a = res.at<float>(0);
        float b = res.at<float>(1);
        float c = res.at<float>(2);

        cout << "a=" << a << ", b=" << b << ", c=" << c << endl;
        print(a, b, c, 4, 7);
        print(a, b, c, 5, 7);
        print(a, b, c, 6, 7);
        print(a, b, c, 7, 7);
        print(a, b, c, 8, 7);
        return 0;
    }


A plane is defined by 3 points that are not on the same line. This seems to be an edge case for the algorithm, which should actually fail because there are infinitely many solutions. Although extrapolations can be made for points on the same line, the error seems to be prohibitively large. For points not on the line, the extrapolations are insane.
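To detect this situation before calling the solver, one option is to test whether all (x, y) sample positions are collinear. The following is a minimal sketch in plain C++ without OpenCV; the `Sample` struct, the function name `samplesAreCollinear`, and the tolerance value are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Sample { double x, y, z; };

// Returns true when all (x, y) positions lie on one line (or there are fewer
// than 3 samples), i.e. when the rows [x, y, 1] cannot pin down a unique plane.
// relTol is an illustrative relative tolerance, not a tuned value.
bool samplesAreCollinear(const std::vector<Sample>& s, double relTol = 1e-9) {
    assert(relTol >= 0.0);
    if (s.size() < 3) return true;

    // Pick the sample farthest from s[0] to define the reference direction.
    std::size_t farIdx = 0;
    double best = 0.0;  // largest squared distance from s[0]
    for (std::size_t i = 1; i < s.size(); ++i) {
        const double dx = s[i].x - s[0].x;
        const double dy = s[i].y - s[0].y;
        const double d2 = dx * dx + dy * dy;
        if (d2 > best) { best = d2; farIdx = i; }
    }
    if (best == 0.0) return true;  // all positions coincide

    const double dx = s[farIdx].x - s[0].x;
    const double dy = s[farIdx].y - s[0].y;
    for (std::size_t i = 1; i < s.size(); ++i) {
        // The 2D cross product is zero when s[i] lies on the reference line.
        const double cross = dx * (s[i].y - s[0].y) - dy * (s[i].x - s[0].x);
        if (std::fabs(cross) > relTol * best) return false;
    }
    return true;
}
```

For the four samples in the question (all with y = 7) this returns true; adding any sample with a different y makes it return false.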

A workaround seems to be to add a small value so that the input data are not precisely on the same line. In the example in the question, adding 0.0001 to the first row produced these results:

X     Y     Z'
----  ----  -----
4     7     463.17
5     7     553.736
7     7     734.869
8     7     825.435


Adding this to the program might be easier, but it would be more correct to avoid such cases or to use simple linear regression in this case.
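A simple linear regression for the single-line case could look like the sketch below. It is plain C++ rather than OpenCV, it fits z = slope * x + intercept by ordinary least squares, and the names `Line` and `fitLine` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Line { double slope, intercept; };

// Ordinary least-squares fit of z = slope * x + intercept.
// Requires at least two samples and non-constant x.
Line fitLine(const std::vector<double>& x, const std::vector<double>& z) {
    assert(x.size() == z.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    double meanX = 0.0, meanZ = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanZ += z[i];
    }
    meanX /= n;
    meanZ /= n;

    // Centered sums keep the arithmetic well conditioned.
    double sxx = 0.0, sxz = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxx += (x[i] - meanX) * (x[i] - meanX);
        sxz += (x[i] - meanX) * (z[i] - meanZ);
    }
    assert(sxx > 0.0);  // x must vary, otherwise the slope is undefined

    const double slope = sxz / sxx;
    return Line{slope, meanZ - slope * meanX};
}
```

Called with the x and z values from the question's code, `fitLine({4, 5, 7, 8}, {458, 554, 734, 826})` gives a slope of about 91.6 and an intercept of about 93.4, so the four training values are reproduced to within about 3.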
