lups789's profile - activity

2014-07-21 06:36:11 -0600 commented question machine learning split set not working properly?

It does not, but this is because there are no values greater than 255 in the csv. mlMat is showing all values correctly. I edited the original post for clarification.

2014-07-21 05:12:28 -0600 commented question machine learning split set not working properly?

I used "cout << trainSetIds / testSetIds / mlData.get_...._sample_idx()" to output the values.

2014-07-21 04:17:25 -0600 asked a question machine learning split set not working properly?

Hi,

I am using C++ and OpenCV 2.4.9 to write a machine learning application. I have a .csv file with classes and features. I load it, store it in a Mat, and then want to split the data into a training set and a test set. However, when I access the ids of the two sets and store them in a Mat, I only get values between 0 and 255, even though my original set has considerably more entries. The dimensions of the matrices are correct, but the values never exceed 255.

    //   load data from csv
    CvMLData mlData;
    mlData.read_csv(dataDir.c_str());
    Mat mlMat = mlData.get_values();

    //   split into training and test set
    float train_sample_portion = 0.7f;      // use 70% as training
    bool random_split = false;      // true = random
    CvTrainTestSplit spl(train_sample_portion, random_split);
    mlData.set_train_test_split(&spl);
    Mat trainSetIds = mlData.get_train_sample_idx();            // !!! values from 0 to 255 !!!
    Mat testSetIds = mlData.get_test_sample_idx();          // !!! values from 0 to 255 !!!

(I ultimately want a random split, but I turned it off here.)

I figured that the problem could have something to do with the type of the matrices, but adding the following did not solve the problem:

    Mat trainSetIds(mlMat.size().width, train_sample_portion * mlMat.size().height, CV_32SC1);
    Mat testSetIds(mlMat.size().width, (1-train_sample_portion) * mlMat.size().height, CV_32SC1);

Actually, the output of get_train_sample_idx() itself also contains only values between 0 and 255. I hope someone can help here.
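To make the type suspicion concrete, here is a minimal sketch that uses only the standard library. It is not a confirmed diagnosis of CvMLData. It only shows that if 32-bit row indices were stored or read as 8-bit values somewhere along the way, nothing above 255 could survive. `narrowTo8Bit` is a hypothetical helper name, not an OpenCV function.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): squeezes each row index through
// an unsigned 8-bit value, the way a wrongly typed buffer would.
std::vector<int> narrowTo8Bit(const std::vector<int>& rowIds) {
    std::vector<int> result;
    result.reserve(rowIds.size());
    for (int id : rowIds) {
        // Converting to an unsigned 8-bit type keeps only the low 8 bits,
        // i.e. the value modulo 256.
        const std::uint8_t narrowed = static_cast<std::uint8_t>(id);
        result.push_back(static_cast<int>(narrowed));
    }
    return result;
}
```

With this helper, indices up to 255 come back unchanged, while 256 and 300 come back as 0 and 44. That would match a set that works for small inputs and breaks once there are more than 255 rows.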

Cheers.

edit: to clarify the problem

My .csv has the class in the first column and then multiple columns with feature values ranging from -1 to 1. mlMat is showing correct values. trainSetIds and testSetIds should give me the row index of the split data.

This mlMat

(class  feat1   feat2   ...) [not included]
1        0.3    0.6     ...
0        -0.6   -1      ...
0        -0.1   0.1     ...
1        0.2    0.8     ...

should give trainSetIds filled with (0 ; 1 ; 2) and testSetIds with (3). For this minimal example it works, but if there are more than 255 rows it does not.
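For comparison, this is roughly what I expect the index sets to look like, computed by hand with only the standard library. It is a sketch under my own assumptions, not how CvMLData computes its split. `splitRowIds` and its parameters are made-up names, and the training count is passed in explicitly instead of being derived from train_sample_portion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: returns {training row ids, test row ids} for a data
// set with rowCount rows, of which the first trainCount go to training.
std::pair<std::vector<int>, std::vector<int>>
splitRowIds(std::size_t rowCount, std::size_t trainCount,
            bool shuffle, unsigned seed = 0) {
    assert(trainCount <= rowCount);

    // Row ids 0, 1, ..., rowCount - 1.
    std::vector<int> ids(rowCount);
    std::iota(ids.begin(), ids.end(), 0);

    // Optional random split: shuffle the ids before cutting.
    if (shuffle) {
        std::mt19937 rng(seed);
        std::shuffle(ids.begin(), ids.end(), rng);
    }

    const std::ptrdiff_t cut = static_cast<std::ptrdiff_t>(trainCount);
    std::vector<int> train(ids.begin(), ids.begin() + cut);
    std::vector<int> test(ids.begin() + cut, ids.end());
    return std::make_pair(std::move(train), std::move(test));
}
```

Calling `splitRowIds(4, 3, false)` reproduces the small example above: (0 ; 1 ; 2) for training and (3) for testing. With 1000 rows and 700 training samples, the test ids run from 700 to 999, which is the kind of value I never see in testSetIds.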