machine learning split set not working properly?

Hi,

I am using C++ and OpenCV 2.4.9 to build a machine learning application. I have a .csv file with classes and features. I load it, store it in a Mat, and then want to split the data into a training set and a test set. However, when I read the ids of the two sets and store them in a Mat, I only get values between 0 and 255, although my data set has considerably more rows. The dimensions of the matrices are correct, but the values never exceed 255.

    // load data from csv
    CvMLData mlData;
    mlData.read_csv(dataDir.c_str());
    Mat mlMat = mlData.get_values();

    // split into training and test set
    float train_sample_portion = 0.7f;  // use 70% as training
    bool random_split = false;          // true = random
    CvTrainTestSplit spl(train_sample_portion, random_split);
    mlData.set_train_test_split(&spl);
    Mat trainSetIds = mlData.get_train_sample_idx();  // !!! values from 0 to 255 !!!
    Mat testSetIds = mlData.get_test_sample_idx();    // !!! values from 0 to 255 !!!

(I eventually want a random split, but I turned it off here.)

I figured the problem might have something to do with the type of the matrices, but declaring them explicitly as follows did not solve it:

    Mat trainSetIds(mlMat.size().width, train_sample_portion * mlMat.size().height, CV_32SC1);
    Mat testSetIds(mlMat.size().width, (1-train_sample_portion) * mlMat.size().height, CV_32SC1);

In fact, the output of get_train_sample_idx() itself already contains only values between 0 and 255. I hope someone can help here.

Cheers.

edit: to clarify the problem

My .csv has the class in the first column, followed by multiple columns with feature values ranging from -1 to 1. mlMat shows the correct values. trainSetIds and testSetIds should give me the row indices of the split data.

This mlMat

(class  feat1   feat2   ...) [not included]
1        0.3    0.6     ...
0        -0.6   -1      ...
0        -0.1   0.1     ...
1        0.2    0.8     ...

should give trainSetIds filled with (0 ; 1 ; 2) and testSetIds with (3). For this minimal example it works, but if there are more than 255 rows it does not.
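
For illustration only, here is a minimal sketch (plain C++, no OpenCV) of one way this exact symptom can arise: if row ids that are stored as 32-bit integers pass through an 8-bit element type at any point, everything up to 255 survives and everything above it wraps around. Whether this is what happens in my case is an assumption I have not verified, and narrowToBytes is a hypothetical helper, not an OpenCV function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): copies row ids into an 8-bit
// container. Converting to an unsigned 8-bit type keeps only the low byte,
// so every id is reduced modulo 256.
std::vector<std::uint8_t> narrowToBytes(const std::vector<std::int32_t>& ids) {
    std::vector<std::uint8_t> out;
    out.reserve(ids.size());
    for (std::size_t i = 0; i < ids.size(); ++i) {
        out.push_back(static_cast<std::uint8_t>(ids[i]));  // wraps above 255
    }
    return out;
}
```

With the ids 0, 1, 2, 255, 256, 300 and 1000, the narrowed values are 0, 1, 2, 255, 0, 44 and 232. That would match a small example looking correct while anything beyond 255 rows does not.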