Training a classifier if feature size is greater than RAM size
Hello,
How can we train a classifier if the size of the Mat containing the features is greater than the available RAM? Can we train the classifier incrementally (for example an SVM classifier)? Is it possible to update the classifier using the previously saved XML?
The project is a forgery detection algorithm.
The feature matrix for one image comes to about [18157 x 300] floats ≈ 20 MB. For 500 images that comes to around 10 GB, while the RAM size is 6 GB. This leads to an out-of-memory error.
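(As a quick check of those numbers: 18157 x 300 floats x 4 bytes ≈ 21.8 MB per image, and 500 x 21.8 MB ≈ 10.9 GB, so the estimate of roughly 10 GB against 6 GB of RAM holds.)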
Is there a way to train the classifier with features of this size without resorting to dimensionality reduction methods like PCA?
20 MB of input per image is simply insane.
(are your features raw HD pixels? that's probably the bad idea here)
and you can train an ANN incrementally, but not an SVM
These are SCRM features. A 2000x1000 image is divided into 32x32 blocks with a stride of 16. Out of these blocks, around 150 are selected as tampered blocks and 150 as pristine blocks, so 300 blocks in total. For each block a total of 18157 SCRM features are calculated. Since it is a forgery detection algorithm we also cannot resize the image, as that would lead to loss of information.
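To make the block layout concrete, here is a minimal sketch of how the 32x32 / stride-16 windows could be enumerated (only an illustration; the actual SCRM feature extraction is not shown):

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Enumerate 32x32 windows with a stride of 16 over an image of the given size.
// For a 2000x1000 image this yields 124 x 61 = 7564 candidate blocks, from which
// ~150 tampered and ~150 pristine blocks would then be selected per image.
std::vector<cv::Rect> enumerateBlocks(const cv::Size& imgSize,
                                      int blockSize = 32, int stride = 16)
{
    std::vector<cv::Rect> blocks;
    for (int y = 0; y + blockSize <= imgSize.height; y += stride)
        for (int x = 0; x + blockSize <= imgSize.width; x += stride)
            blocks.emplace_back(x, y, blockSize, blockSize);
    return blocks;
}
```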
can we update any classifier other than ann, like random forest?
idk. but usually, if featurelen > featurecount, any kind of ml goes downhill. you'd need on the order of 40000 training samples to compensate for a feature length that huge.
yep, that is the plan
For 300 blocks per image and 500 images, the feature count will be 500*300 = 150,000
SCRM is Spatio–Color Rich Model used for steganalysis
Can any classifier learn incrementally other than ANN, like random forest?
Thanks for the inputs.
you can train a KNearest classifier incrementally, too, given an UPDATE flag in the train() method.
(though i fear that it has to retain ALL of the training data it has seen, so no real win, memory-wise)
(but no luck with any of the tree-based ones)
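Just to make that UPDATE flag concrete, a minimal sketch assuming the OpenCV 3.x/4.x ml API (variable names are illustrative); the caveat above still applies, since KNearest stores every sample it is trained on:

```cpp
#include <opencv2/ml.hpp>
#include <vector>
using namespace cv;
using namespace cv::ml;

// Feed the training data to KNearest in chunks instead of one huge Mat.
// Each chunk: features CV_32F with one sample per row, labels CV_32S with one label per row.
void trainKNearestInChunks(Ptr<KNearest>& knn,
                           const std::vector<Mat>& featureChunks,
                           const std::vector<Mat>& labelChunks)
{
    for (size_t i = 0; i < featureChunks.size(); ++i)
    {
        Ptr<TrainData> td = TrainData::create(featureChunks[i], ROW_SAMPLE, labelChunks[i]);
        // The first call trains from scratch; later calls append to the stored samples.
        knn->train(td, i == 0 ? 0 : StatModel::UPDATE_MODEL);
    }
}
```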
why NOT use an ANN here? all it has to retain is the weights, and you can throw infinite amounts of data at it (given you do that in small batches), and it won't ever need to grow.
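A rough sketch of that batch-wise ANN training, assuming the OpenCV 3.x/4.x ANN_MLP API (layer sizes, learning parameters and the +1/-1 target encoding are just placeholders):

```cpp
#include <opencv2/ml.hpp>
using namespace cv;
using namespace cv::ml;

// Build a small MLP: 18157 inputs -> one hidden layer -> 1 output (tampered vs. pristine).
Ptr<ANN_MLP> createNet(int featureLen)
{
    Ptr<ANN_MLP> net = ANN_MLP::create();
    Mat layers = (Mat_<int>(1, 3) << featureLen, 64, 1);   // hidden size is illustrative
    net->setLayerSizes(layers);
    net->setActivationFunction(ANN_MLP::SIGMOID_SYM, 1.0, 1.0);
    net->setTrainMethod(ANN_MLP::BACKPROP, 0.001, 0.1);    // learning rate, momentum
    return net;
}

// featureBatch: CV_32F, one sample per row; targetBatch: CV_32F, one value (+1/-1) per row.
// Features are assumed pre-normalized, so built-in input/output scaling is disabled to keep
// all batches consistent; only the weights persist between calls, so memory stays bounded.
void trainBatch(Ptr<ANN_MLP>& net, const Mat& featureBatch, const Mat& targetBatch, bool firstBatch)
{
    int flags = ANN_MLP::NO_INPUT_SCALE | ANN_MLP::NO_OUTPUT_SCALE;
    if (!firstBatch)
        flags |= ANN_MLP::UPDATE_WEIGHTS;   // keep the current weights and refine them
    Ptr<TrainData> td = TrainData::create(featureBatch, ROW_SAMPLE, targetBatch);
    net->train(td, flags);
}
```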
The main reasons for not using an ANN are
1) Long training time
2) From experience, other classifiers like SVM gave better accuracy than ANN
3) Wanted to make use of the trainAuto method of SVM (does any other classifier have an auto-training method? see the trainAuto sketch below)
But I will try making use of ANN as you suggested
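For reference, this is roughly what the SVM::trainAuto call from point 3 looks like (a minimal sketch assuming the OpenCV 3.x/4.x ml API). It cross-validates over parameter grids, which means the whole training set has to sit in memory and is passed through many times:

```cpp
#include <opencv2/ml.hpp>
using namespace cv;
using namespace cv::ml;

// features: CV_32F with one sample per row; labels: CV_32S with one class label per row.
Ptr<SVM> autoTrainSVM(const Mat& features, const Mat& labels)
{
    Ptr<SVM> svm = SVM::create();
    svm->setType(SVM::C_SVC);
    svm->setKernel(SVM::RBF);
    Ptr<TrainData> td = TrainData::create(features, ROW_SAMPLE, labels);
    svm->trainAuto(td, 10);   // 10-fold cross-validation over the default C/gamma grids
    return svm;
}
```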
true
Are you getting higher accuracy than SVM even with just 3 layers? I don't have much experience with deep networks.
Looks like trainAuto is not a practical solution for cases with such a high feature size.
Actually your point 1 is invalid. ANNs and all derivatives are slow to train from scratch, but there is a thing called transfer learning, which uses the pretrained weights of an existing model and fine-tunes them. In that case training can actually be quite fast.