
OpenCV C++ for decision tree based problems.

asked 2019-12-21 19:15:55 -0600

saanu

updated 2019-12-22 02:49:50 -0600

berak

I would like to use OpenCV (C++) for the problem below: using an ML decision tree to forecast whether one will be able to play the game in the given weather conditions.

Here are the train and test data sets. Based on the train set, I want to build a decision tree and predict the outcome for the test data.

Train Dataset (train.csv)

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Rainy     Mild         High      False  Yes
Sunny     Cool         Normal    False  Yes
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Overcast  Hot          Normal    False  Yes

Test Dataset (test.csv)

Id  Outlook   Temperature  Humidity  Windy
1   Sunny     Mild         Normal    True
2   Sunny     Mild         High      False
3   Overcast  Cool         Normal    True
4   Rainy     Mild         High      True

The prediction file (user_prediction.csv) would look like this:

Id  Play
1   Yes
2   Yes
3   No
4   No

Can I use OpenCV (C++) for this kind of problem, or is OpenCV only meant for image and video ML? Please let me know how to solve the problem above with an OpenCV C++ decision tree.

I would appreciate sample code that solves it. The problem details can be found at https://www.hackerearth.com/blog/deve...


1 answer


answered 2019-12-22 12:21:08 -0600

berak

updated 2019-12-22 13:07:46 -0600

opencv might not be the best ml toolkit for this purpose, but it is entirely possible to work with categorical values like the ones in your example. the main problem, as with all ml, is beating the data into submission ;)

but first, a mandatory read

you can use ml::TrainData::loadFromCSV(). for your train.csv it would look like this:

#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>
#include <iostream>
using namespace cv;
using namespace std;

// (inside main)
Ptr<ml::TrainData> data = ml::TrainData::loadFromCSV("train.csv", // our file
                                                     1,           // one header line to skip
                                                     -1,          // response start: -1 = labels are in the last column
                                                     -1,          // response end: -1 = a single label column
                                                     "",          // no type spec: non-numeric columns
                                                                  // are treated as categorical
                                                     '\t'         // delimiter
                                                     );
cout << data->getTrainSamples() << endl;
cout << data->getTrainResponses().t() << endl;

[1, 2, 3, 4;
 6, 7, 3, 4;
 1, 9, 10, 4;
 11, 2, 3, 4;
 6, 7, 3, 4;
 11, 2, 10, 4]

[5, 8, 8, 8, 8, 8]

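as you can see, each categorical name is simply replaced by its index in a lookup list, numbered in order of first appearance across all columns (Sunny=1, Hot=2, High=3, False=4, No=5, Rainy=6, ...).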
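to make the numbering concrete, here is a minimal sketch in plain c++ (no opencv) that reproduces it. encodeRows and nameToIndex are illustrative names, not opencv functions, and this is only a guess at the idea that matches the output above, not opencv's actual implementation:

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Illustrative helper (not an OpenCV function): replaces every distinct token
// with a 1-based index, assigned in order of first appearance. One lookup map
// is shared by all columns, which matches the numbering in the output above.
std::vector<std::vector<int>> encodeRows(const std::vector<std::string>& rows,
                                         std::map<std::string, int>& nameToIndex)
{
    std::vector<std::vector<int>> encoded;
    for (const std::string& row : rows) {
        std::istringstream in(row);
        std::string token;
        std::vector<int> codes;
        while (in >> token) {  // split on whitespace (tabs or spaces)
            auto it = nameToIndex.find(token);
            if (it == nameToIndex.end()) {
                // first time we see this name: give it the next free index
                int next = static_cast<int>(nameToIndex.size()) + 1;
                it = nameToIndex.emplace(token, next).first;
            }
            codes.push_back(it->second);
        }
        encoded.push_back(codes);
    }
    return encoded;
}
```

feeding it the first three train rows ("Sunny Hot High False No", "Rainy Mild High False Yes", "Sunny Cool Normal False Yes") gives 1 2 3 4 5, then 6 7 3 4 8, then 1 9 10 4 8: the same samples and responses as printed above, just not yet split into two matrices.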

decision trees (and their derivatives) can properly handle this, since they only test categories for equality at a node. if you want to use other ml algos, like knn, ann or svm (which use some concept of "distance"), you'd need to switch to "one-hot" encoding instead, because the distance between two arbitrary index numbers means nothing.
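a minimal sketch of one-hot encoding in plain c++, in case the term is new. oneHot and oneHotSample are hypothetical helper names, and the sketch assumes each feature's categories have already been numbered 0..n-1 per feature (unlike the shared 1-based numbering opencv printed above):

```cpp
#include <cstddef>
#include <vector>

// Illustrative helper (not part of OpenCV): turns one category index in
// [0, numCategories) into a vector with a single 1 at that position.
// An out-of-range index yields an all-zero vector.
std::vector<float> oneHot(int categoryIndex, int numCategories)
{
    std::vector<float> v(static_cast<std::size_t>(numCategories), 0.0f);
    if (categoryIndex >= 0 && categoryIndex < numCategories)
        v[static_cast<std::size_t>(categoryIndex)] = 1.0f;
    return v;
}

// Encodes a whole sample: each feature gets its own block of columns.
// sizes[i] is the number of distinct categories of feature i.
std::vector<float> oneHotSample(const std::vector<int>& sample,
                                const std::vector<int>& sizes)
{
    std::vector<float> out;
    for (std::size_t i = 0; i < sample.size(); ++i) {
        std::vector<float> block = oneHot(sample[i], sizes[i]);
        out.insert(out.end(), block.begin(), block.end());
    }
    return out;
}
```

with Outlook {Sunny=0, Overcast=1, Rainy=2}, Temperature {Hot=0, Mild=1, Cool=2}, Humidity {High=0, Normal=1} and Windy {False=0, True=1}, the row "Rainy Mild High False" is {2, 1, 0, 0}, and oneHotSample({2, 1, 0, 0}, {3, 3, 2, 2}) returns 0 0 1 | 0 1 0 | 1 0 | 1 0. every pair of different categories is now equally far apart, which is what the distance-based algos need.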

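i'd propose that you put both train and test data into the same csv file and split them after loading (e.g. with TrainData::setTrainTestSplit()), so that both sets are encoded with the same name-to-index list.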
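a small sketch of why a shared file helps, assuming the index list is rebuilt from scratch for every file that gets loaded (i believe that is the case, but treat it as an assumption). indexOf and encode are made-up names for illustration, plain c++ only:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the name-to-index lookup: returns the existing
// index of `name`, or assigns the next free 1-based index on first sight.
int indexOf(std::map<std::string, int>& names, const std::string& name)
{
    auto it = names.find(name);
    if (it != names.end())
        return it->second;
    int next = static_cast<int>(names.size()) + 1;
    names[name] = next;
    return next;
}

// Encodes a list of category names against the given lookup.
std::vector<int> encode(std::map<std::string, int>& names,
                        const std::vector<std::string>& tokens)
{
    std::vector<int> codes;
    for (const std::string& t : tokens)
        codes.push_back(indexOf(names, t));
    return codes;
}
```

if the lookup has already seen the train rows "Sunny Hot High False No" and "Rainy Mild High False Yes", the test row "Sunny Mild Normal True" encodes to 1 7 9 10. encoded against a fresh, empty lookup, the same row becomes 1 2 3 4, which is exactly the code of the train row "Sunny Hot High False", so a tree trained on the first numbering would misread it.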

take another look at the sample code here -- aaaaand good luck ;)


Comments

Thanks. I need to implement a solution for similar problems, and I am working on production code. From your reply, it looks like OpenCV is not suitable for this kind of problem.

Please clarify: is OpenCV suitable for this kind of problem, or should I use mlpack (C++) instead? Please suggest.

saanu ( 2019-12-22 14:17:50 -0600 )

sorry, but i have no idea what else you should use here.

(and the blog example you're quoting is as useless as it tries to be "general" (say, with 5 train samples...))

what is your actual "use-case"? which problem are you trying to solve in reality?

(it's an XY problem, and unless you get clear about it, this won't go anywhere.)

berak ( 2019-12-22 16:26:27 -0600 )

Thanks. Based on latitude, longitude, date and time, I want to predict whether the user needs to make a call or not. If the user needs to make a call, a notification message will be displayed. The prediction result would be 1 or 0, where 1 means Yes and 0 means No.

To start predicting, I need at least 500 rows of data; the maximum would be 50,000 rows.

latitude  longitude  date        time  label
10        20         10-12-2019  5pm   1 (Yes)
10        20         10-12-2019  4pm   0 (No)

I don't know which C++ library is best for this problem. I am exploring OpenCV and mlpack.

Can I use OpenCV (C++) for this kind of problem, or should I use mlpack? Please suggest which C++ library is best suited.

saanu ( 2019-12-22 17:08:09 -0600 )

