High `cv::solve` error if trained on single line

asked 2016-02-05 16:55:55 -0600

Oliv

updated 2016-02-08 03:35:48 -0600

I use cv::solve to solve a two-dimensional linear regression. If the training data happen to lie on a single line (that is, y is equal for all training samples), then the extrapolation produces a large error.

Say, the training data are:

X     Y     Z
----  ----  -----
4     7     458
5     7     554
7     7     734
8     7     826

The calculated coefficients are (notice the last two are very large numbers):

{92.8825, 3.74394e+007, -2.62076e+008}

If I use these coefficients to recompute the original values, the error is large:

X     Y     Z'
----  ----  -----
4     7     427.53
5     7     520.412
7     7     706.177
8     7     799.06

All values are too small by roughly 27 to 34. This seems to be an edge case. In my use case, if all training values lie on a single line (horizontal or vertical), I will only predict values on that same line, which effectively turns the problem into one-dimensional linear regression. Even so, the error is unacceptable.
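One possible explanation for an offset of this size (an assumption, not something confirmed in this thread): the prediction adds b*y and c, which are both around 2.6e8 in magnitude and almost cancel. At that magnitude, neighbouring 32-bit float values are 16 apart, so rounding in the intermediate sums can easily account for an error of a few tens. The sketch below measures that spacing; `floatSpacingAt` is a hypothetical helper name.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: distance from v to the next representable
// 32-bit float above it (the local rounding granularity at v).
float floatSpacingAt(float v) {
    return std::nextafter(v, std::numeric_limits<float>::infinity()) - v;
}
```

Calling `floatSpacingAt(2.62076e8f)` returns 16, while near the actual Z values (a few hundred) the spacing is below 1e-4. If this is indeed the cause, the huge, nearly cancelling coefficients are what destroys the precision, not the regression itself.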

Here is the code:

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

static void print(float a, float b, float c, int x, int y) {
    cout << "x=" << x << ", y=" << y << ", z=" << (a*x + b*y + c) << endl;
}

int main() {
    Mat matX(4, 3, CV_32F);
    Mat matZ(4, 1, CV_32F);

    int idx = 0;
    matX.at<float>(idx, 0) = 4;
    matX.at<float>(idx, 1) = 7;
    matX.at<float>(idx, 2) = 1;
    matZ.at<float>(idx++, 0) = 458;

    matX.at<float>(idx, 0) = 5;
    matX.at<float>(idx, 1) = 7;
    matX.at<float>(idx, 2) = 1;
    matZ.at<float>(idx++, 0) = 554;

    matX.at<float>(idx, 0) = 7;
    matX.at<float>(idx, 1) = 7;
    matX.at<float>(idx, 2) = 1;
    matZ.at<float>(idx++, 0) = 734;

    matX.at<float>(idx, 0) = 8;
    matX.at<float>(idx, 1) = 7;
    matX.at<float>(idx, 2) = 1;
    matZ.at<float>(idx++, 0) = 826;

    Mat res(3, 1, CV_32F);

    cv::solve(matX, matZ, res, DECOMP_QR);

    float a = res.at<float>(0);
    float b = res.at<float>(1);
    float c = res.at<float>(2);

    cout << "a=" << a << ", b=" << b << ", c=" << c << endl;
    print(a, b, c, 4, 7);
    print(a, b, c, 5, 7);
    print(a, b, c, 6, 7);
    print(a, b, c, 7, 7);
    print(a, b, c, 8, 7);
}

1 answer

answered 2016-02-08 03:35:08 -0600

Oliv

A plane is defined by three points that are not on the same line. Here all training points are collinear, so infinitely many planes fit them equally well. This seems to be an edge case for the algorithm, which arguably should fail instead of returning coefficients. Although extrapolations can be made for points on the same line, the error seems to be prohibitively large. For points not on the line, the extrapolations are insane.
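The degenerate case can be detected before calling the solver. The following is a minimal sketch, not part of OpenCV: `Point2` and `samplesAreCollinear` are hypothetical names, and the tolerance is an arbitrary assumption that would need tuning for real data.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };  // hypothetical (x, y) training sample

// Returns true when all (x, y) samples lie on one straight line,
// i.e. when they cannot determine a unique plane z = a*x + b*y + c.
bool samplesAreCollinear(const std::vector<Point2>& pts, double tol = 1e-9) {
    if (pts.size() < 3) return true;

    // Find a second point that differs from the first to define a direction.
    std::size_t j = 1;
    while (j < pts.size() && pts[j].x == pts[0].x && pts[j].y == pts[0].y) ++j;
    if (j == pts.size()) return true;  // all samples are the same point

    const double dx = pts[j].x - pts[0].x;
    const double dy = pts[j].y - pts[0].y;
    for (const Point2& p : pts) {
        // 2D cross product: zero when p lies on the line through pts[0] and pts[j].
        const double cross = dx * (p.y - pts[0].y) - dy * (p.x - pts[0].x);
        if (std::fabs(cross) > tol) return false;
    }
    return true;
}
```

For the four samples in the question, all with y = 7, the check returns true, so the program can refuse the two-dimensional fit or switch to a one-dimensional one.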

A workaround seems to be to add a small value to the input, so that the data are not precisely on the same line. In the example in the question, adding 0.0001 to the first row produced these results:

X     Y     Z'
----  ----  -----
4     7     463.17
5     7     553.736
7     7     734.869
8     7     825.435

Adding this perturbation to the program might be easier, but it would be more correct to avoid such cases, or to use simple linear regression when they occur.
