Preventing Over-fitting

I'm using cv::Boost to learn small image patches (Boost::DISCRETE currently gives me the best results).
I noticed that the more example images I have in my training set, the larger the model/predictor XML file is. It is almost as if the file is storing some of the images as samples.
I don't care so much about the file size, but I am afraid that this growth is due to over-fitting, where the classifier does very well on the training set (because it almost keeps an internal copy) but would not generalize well to new images.

How can I avoid over-fitting and ensure good generalization?

I currently use setWeightTrimRate(0.4); to keep the file size low.
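
To make the concern measurable, here is a minimal sketch of how I could compare the error on the training patches against the error on patches the classifier never saw. It is plain C++ with no OpenCV calls; `Sample`, `splitHoldout` and `errorRate` are hypothetical helper names, and `predict` stands for whatever wrapper calls the trained classifier.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Hypothetical container: one patch as a flattened feature vector plus its label.
struct Sample {
    std::vector<float> features;
    int label;
};

// Shuffle with a fixed seed, then split into (training part, held-out part).
std::pair<std::vector<Sample>, std::vector<Sample>>
splitHoldout(std::vector<Sample> samples, double holdoutFraction, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::shuffle(samples.begin(), samples.end(), rng);
    const std::size_t nHold =
        static_cast<std::size_t>(static_cast<double>(samples.size()) * holdoutFraction);
    // The last nHold shuffled samples become the held-out set.
    std::vector<Sample> held(samples.end() - static_cast<std::ptrdiff_t>(nHold),
                             samples.end());
    samples.resize(samples.size() - nHold);
    return std::make_pair(std::move(samples), std::move(held));
}

// Fraction of samples whose predicted label differs from the true label.
double errorRate(const std::vector<Sample>& samples,
                 const std::function<int(const std::vector<float>&)>& predict) {
    if (samples.empty()) return 0.0;
    std::size_t wrong = 0;
    for (const Sample& s : samples) {
        if (predict(s.features) != s.label) ++wrong;
    }
    return static_cast<double>(wrong) / static_cast<double>(samples.size());
}
```

The idea would be to train only on the first part returned by `splitHoldout` and then call `errorRate` on both parts. A training error near zero together with a much higher held-out error would suggest over-fitting, while similar error rates would suggest the growing file size is not a problem in itself.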