Machine learning: train/test split not working properly?
Hi,
I am using C++ and OpenCV 2.4.9 to write a machine learning application. I have a .csv file with classes and features. I load it, store it in a Mat and then want to split the data into a training set and a test set. However, when I access the ids of the two sets and save them to a Mat, I only get values between 0 and 255, although my original set has considerably more entries. The dimensions of the matrices are correct, but the values never exceed 255.
// load data from csv
CvMLData mlData;
mlData.read_csv(dataDir.c_str());
Mat mlMat = mlData.get_values();
// split into training and test set
float train_sample_portion = 0.7; // use 70% as training
bool random_split = false; // true = random
CvTrainTestSplit spl(train_sample_portion, random_split);
mlData.set_train_test_split(&spl);
Mat trainSetIds = mlData.get_train_sample_idx(); // !!! values from 0 to 255 !!!
Mat testSetIds = mlData.get_test_sample_idx(); // !!! values from 0 to 255 !!!
(I want a random split, but I turned random splitting off here.)
I figured that the problem could have something to do with the type of the matrices, but adding the following did not solve the problem:
Mat trainSetIds(mlMat.size().width, train_sample_portion * mlMat.size().height, CV_32SC1);
Mat testSetIds(mlMat.size().width, (1-train_sample_portion) * mlMat.size().height, CV_32SC1);
In fact, when I look at the output of get_train_sample_idx() directly, it also contains only values between 0 and 255. I hope someone can help here.
Cheers.
edit: to clarify the problem
My .csv has the class in the first column and then multiple columns with feature values ranging from -1 to 1. mlMat is showing correct values. trainSetIds and testSetIds should give me the row index of the split data.
This mlMat
(class feat1 feat2 ...) [not included]
1 0.3 0.6 ...
0 -0.6 -1 ...
0 -0.1 0.1 ...
1 0.2 0.8 ...
should give trainSetIds filled with (0 ; 1 ; 2) and testSetIds with (3). For this minimal example it works, but if there are more than 255 rows it does not.
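To make the expected result concrete, here is a minimal sketch in plain C++ (no OpenCV) of a non-random split into row indices. The helper name splitRowIndices and the floor rounding of the training count are assumptions made for illustration; this is not the library's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper, not part of OpenCV: splits the row indices
// 0 .. sampleCount-1 into a leading training part and a trailing test part.
// Assumption: the training count is floor(trainPortion * sampleCount).
std::pair<std::vector<int>, std::vector<int> >
splitRowIndices(int sampleCount, double trainPortion)
{
    assert(sampleCount >= 0);
    assert(trainPortion >= 0.0 && trainPortion <= 1.0);

    const int trainCount =
        static_cast<int>(std::floor(trainPortion * sampleCount));

    std::vector<int> trainIds;
    std::vector<int> testIds;
    for (int i = 0; i < sampleCount; ++i) {
        if (i < trainCount)
            trainIds.push_back(i);   // first trainCount rows
        else
            testIds.push_back(i);    // remaining rows
    }
    return std::make_pair(trainIds, testIds);
}
```

With this sketch, splitRowIndices(4, 0.75) gives training ids 0, 1, 2 and test id 3, and splitRowIndices(600, 0.5) gives training ids 0 to 299 and test ids 300 to 599, so ids above 255 are what I would expect once the set is large enough.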
How do you know that your values are only between 0 and 255? Did you use the template accessor to get an integer?
I used "cout << trainSetIds / testSetIds / mlData.get_...._sample_idx()" to output the values.
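The element type matters here because the same memory gives different numbers depending on how it is read. The following is a minimal sketch in plain C++ (no OpenCV), assuming the indices are stored as int: viewed byte by byte, every value falls in 0 to 255, while reading with the stored type recovers the full index. The helper names rawBytes and readAsInt are hypothetical, and the thread does not confirm that this is what happens in the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper, not an OpenCV call: copies the raw memory of a
// buffer of int indices and returns it as individual bytes.
std::vector<unsigned char> rawBytes(const std::vector<int>& ids)
{
    std::vector<unsigned char> bytes(ids.size() * sizeof(int));
    if (!bytes.empty())
        std::memcpy(&bytes[0], &ids[0], bytes.size());
    return bytes;
}

// Reads element i back using the element type it was stored with (int).
int readAsInt(const std::vector<unsigned char>& bytes, std::size_t i)
{
    assert((i + 1) * sizeof(int) <= bytes.size());
    int value = 0;
    std::memcpy(&value, &bytes[i * sizeof(int)], sizeof(int));
    return value;
}
```

Each entry of rawBytes(ids) is an unsigned char and therefore at most 255, whereas readAsInt returns the original index, for example 1000. In OpenCV the corresponding typed read for a CV_32SC1 matrix is Mat::at<int>, and Mat::type() shows which element type a matrix actually has.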
And does mlMat have values greater than 255?
It does not, but this is because there are no values greater than 255 in the csv. mlMat is showing all values correctly. I edited the original post for clarification.