
Train Boost model by huge size of data

asked 2015-12-16 11:07:08 -0500 by JesseCh, updated 2015-12-16 11:10:33 -0500

Hi all, I am trying to train on my dataset (over 15 GB). Obviously, it is not possible to load the entire dataset into memory at once, so I am considering loading my data in separate chunks; this worked fine with my own implementation of AdaBoost.

Now, I would like to train on this dataset with OpenCV. I found that the training data is stored in a smart pointer:

Ptr<ml::TrainData>

It seems I need to load all my data at once, because OpenCV's training process accepts only a single TrainData object, according to the following OpenCV 3 source code (boost.cpp):

bool train( const Ptr<TrainData>& trainData, int flags )
{
    startTraining(trainData, flags);
    int treeidx, ntrees = bparams.weakCount >= 0 ? bparams.weakCount : 10000;
    vector<int> sidx = w->sidx;

    for( treeidx = 0; treeidx < ntrees; treeidx++ )
    {
        int root = addTree( sidx );
        if( root < 0 )
            return false;
        updateWeightsAndTrim( treeidx, sidx );
    }
    endTraining();
    return true;
}

Is there any other function that would let me divide my dataset into several chunks and feed them into the training process?

If not, does anyone know of another AdaBoost library that can handle very large datasets?


1 answer

answered 2015-12-19 08:40:24 -0500 by JesseCh, updated 2015-12-19 08:41:33 -0500

I found the answer, so this case is closed. I used a MATLAB toolbox, GML AdaBoost, to solve this problem. I converted my CSV files to .mat format; the dataset compressed from a 60 GB .csv down to a 2 GB .mat file. The model can be trained with this toolbox and exported as a text file, which my C++ program can then load using a parser provided by GML AdaBoost.



Stats

Asked: 2015-12-16 11:07:08 -0500

Seen: 131 times

Last updated: Dec 19 '15