Train a Boost model with a huge dataset

Hi all, I am trying to train on my dataset (over 15 GB). Obviously, it is not possible to load the entire dataset into memory at once, so I load the data in chunks; that worked fine with my own implementation of AdaBoost.
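Roughly, the chunked loading I mean looks like the sketch below (simplified for illustration; the CSV-style file format, the chunk size, and the helper name readChunk are just placeholders, not my real code):

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <opencv2/core.hpp>

    // Read up to maxRows samples (one row = "feature,...,feature,label")
    // into 'samples' (CV_32F) and 'labels' (CV_32S).
    // Returns false when the file is exhausted.
    bool readChunk(std::ifstream& in, int maxRows,
                   cv::Mat& samples, cv::Mat& labels)
    {
        std::vector<float> feat;
        std::vector<int>   lab;
        std::string line;
        int rows = 0;
        while (rows < maxRows && std::getline(in, line))
        {
            std::stringstream ss(line);
            std::vector<float> vals;
            std::string tok;
            while (std::getline(ss, tok, ','))
                vals.push_back(std::stof(tok));
            if (vals.empty()) continue;
            feat.insert(feat.end(), vals.begin(), vals.end() - 1); // features
            lab.push_back((int)vals.back());                       // label
            ++rows;
        }
        if (rows == 0) return false;
        samples = cv::Mat(feat, true).reshape(1, rows); // rows x dims
        labels  = cv::Mat(lab, true);                   // rows x 1
        return true;
    }

My own AdaBoost implementation simply consumes these chunks one at a time.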

Now I would like to train this dataset with OpenCV. I found that the training data is stored in a smart pointer:

Ptr<ml::TrainData>

It seems I have to load all my data at once, because OpenCV's training process only accepts a single TrainData object, according to the following OpenCV 3 source code (boost.cpp):

bool train( const Ptr<TrainData>& trainData, int flags )
{
    startTraining(trainData, flags);
    int treeidx, ntrees = bparams.weakCount >= 0 ? bparams.weakCount : 10000;
    vector<int> sidx = w->sidx;

    for( treeidx = 0; treeidx < ntrees; treeidx++ )
    {
        int root = addTree( sidx );
        if( root < 0 )
            return false;
        updateWeightsAndTrim( treeidx, sidx );
    }
    endTraining();
    return true;
}
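For comparison, the single-shot call I am currently making looks roughly like the sketch below (a minimal sketch; the function name trainAllAtOnce, the weak count, and the output file name are placeholders, and samples/responses stand in for the full 15 GB dataset already loaded into memory):

    #include <opencv2/ml.hpp>
    using namespace cv;
    using namespace cv::ml;

    void trainAllAtOnce(const Mat& samples,    // N x d, CV_32F, one sample per row
                        const Mat& responses)  // N x 1, CV_32S, class labels
    {
        // Everything must already fit in memory as two single Mats.
        Ptr<TrainData> data  = TrainData::create(samples, ROW_SAMPLE, responses);
        Ptr<Boost>     model = Boost::create();
        model->setWeakCount(100);   // number of weak trees
        model->train(data);         // consumes the whole TrainData in one call
        model->save("boost_model.xml");
    }

As far as I can see, the other train() overloads also expect the complete sample matrix up front, so I could not find a way to feed the data incrementally.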

Is there any function that would let me split my dataset into several chunks and feed them into the training process one at a time?

If not, does anyone know of another AdaBoost library that can handle data of this size?
