Training a classifier if feature size is greater than RAM size

asked 2017-10-30 11:30:39 -0500

lama123

Hello,

How can we train a classifier if the Mat containing the features is larger than the available RAM? Can we train the classifier incrementally (for example, an SVM classifier)? Is it possible to update the classifier starting from a previously saved XML file?

The project is a forgery detection algorithm.

The feature size comes to about [18157*300] floats = 20 MB for one image. For 500 images it comes to around 10 GB. The RAM size is 6 GB. This leads to an out-of-memory error.

Is there a way to train the classifier with features of this size without resorting to dimensionality reduction methods like PCA?
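The size estimate in the question can be checked with a few lines of arithmetic. This is only a sketch: it assumes 32-bit floats (4 bytes each) and uses the block and image counts quoted in the thread.

```python
# Back-of-the-envelope memory estimate for the full feature matrix.
FEATURES_PER_BLOCK = 18157   # SCRM feature length, from the question
BLOCKS_PER_IMAGE = 300       # 150 tampered + 150 pristine blocks
NUM_IMAGES = 500
BYTES_PER_FLOAT = 4          # assumption: 32-bit floats

bytes_per_image = FEATURES_PER_BLOCK * BLOCKS_PER_IMAGE * BYTES_PER_FLOAT
total_bytes = bytes_per_image * NUM_IMAGES

print(f"per image:  {bytes_per_image / 1e6:.1f} MB")   # per image:  21.8 MB
print(f"all images: {total_bytes / 1e9:.1f} GB")       # all images: 10.9 GB
```

With 6 GB of RAM, the whole matrix cannot be resident at once, which is why the rest of the thread turns to classifiers that can be trained in batches.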

Comments

20 MB of input per image is simply insane.

(are your features raw HD pixels? that's probably the bad idea here)

and you can train an ANN incrementally, but not an SVM

berak (2017-10-30 11:36:03 -0500)

These are SCRM features. A 2000x1000 image is divided into 32x32 blocks with a stride of 16. Out of these blocks, around 150 are extracted as tampered blocks and 150 as pristine blocks, for a total of 300 blocks. For each block, a total of 18157 SCRM features are calculated. Since it is a forgery detection algorithm, we also cannot resize the image, as that would lead to loss of information.

Can we update any classifier other than ANN, like random forest?

lama123 (2017-10-30 11:43:54 -0500)
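To put the 300 selected blocks in context, the number of candidate blocks on the sliding grid described above can be counted as follows. This is a minimal sketch; it assumes only full 32x32 blocks are used (no padding at the image border), and `count_blocks` is a hypothetical helper, not part of any library.

```python
def count_blocks(width, height, block=32, stride=16):
    """Count the full block x block windows on a sliding grid (no padding)."""
    if width < block or height < block:
        return 0
    cols = (width - block) // stride + 1
    rows = (height - block) // stride + 1
    return cols * rows

total = count_blocks(2000, 1000)
print(total)  # 7564
```

Under these assumptions, the 300 blocks kept per image are roughly 4% of the available windows.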

idk. but usually, if featurelen > featurecount, any kind of ml goes downhill. you'd need 40000x train features to compensate for that huge size.

berak (2017-10-30 12:03:18 -0500)

Yep, that is the plan.

With 300 blocks per image and 500 images, the feature count will be 500*300 = 150,000.

SCRM is the Spatio-Color Rich Model, used for steganalysis.

Can any classifier other than ANN learn incrementally, like random forest?

Thanks for the inputs.

lama123 (2017-10-30 23:23:42 -0500)

you can train a KNearest classifier incrementally, too, by passing the UPDATE_MODEL flag to the train() method.

(though i fear that it has to retain ALL of the training data it has seen, so no real win memory-wise)

(but no luck with any of the tree-based ones)

why NOT use an ANN here? all it has to retain is the weights, and you can throw infinite amounts of data at it (given you do that in small batches), and it won't ever need to grow.

berak (2017-10-31 02:39:14 -0500)
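The point about small batches can be illustrated without any library. The sketch below is not OpenCV code and not the poster's method: it trains a single-layer logistic model (a stand-in for an ANN) with mini-batch gradient descent, and the `batches` generator is a hypothetical data source that, in a real pipeline, would load one image's features from disk at a time. Only the weights `w` and `b` persist between batches, so memory use does not grow with the amount of data seen. (OpenCV's `ANN_MLP` offers an `UPDATE_WEIGHTS` training flag for the same purpose.)

```python
import math
import random

def batches(num_batches, batch_size, dim, seed=0):
    """Hypothetical data source: yields one small synthetic batch at a time."""
    rng = random.Random(seed)
    true_w = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    for _ in range(num_batches):
        xs, ys = [], []
        for _ in range(batch_size):
            x = [rng.uniform(-1, 1) for _ in range(dim)]
            score = sum(w * v for w, v in zip(true_w, x))
            xs.append(x)
            ys.append(1 if score > 0 else 0)
        yield xs, ys

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_streaming(batch_iter, dim, lr=0.5):
    """Mini-batch gradient descent; only w and b are kept between batches."""
    w = [0.0] * dim
    b = 0.0
    for xs, ys in batch_iter:
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(xs)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        # The batch (xs, ys) can be discarded here; nothing else is retained.
    return w, b

def accuracy(w, b, xs, ys):
    hits = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        hits += (pred == y)
    return hits / len(ys)

w, b = train_streaming(batches(200, 32, dim=8), dim=8)
test_xs, test_ys = next(batches(1, 500, dim=8, seed=1))
print(f"held-out accuracy: {accuracy(w, b, test_xs, test_ys):.2f}")
```

A KNearest model, by contrast, has to keep every sample it was trained on, which is the memory concern raised in the comment above.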

The main reasons for not using ANN are:

1) Long training time

2) From experience, other classifiers like SVM gave better accuracy than ANN

3) I wanted to make use of the trainAuto method of SVM. (Does any other classifier have an auto-training method?)

But I will try making use of ANN as you suggested

lama123 (2017-10-31 05:56:47 -0500)

  1. true. bummer.
  2. my experience is exactly the other way round
  3. no. and train_auto will take ages, too, since it basically does an nfolds x ngrids train & test run. (though the latest master branch should have a parallel train_auto)

berak (2017-10-31 05:59:29 -0500)
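The "nfolds x ngrids" cost mentioned above can be made concrete with a small calculation. The fold count and grid sizes here are illustrative assumptions, not values taken from the thread, and `train_auto_runs` is a hypothetical helper.

```python
from math import prod

def train_auto_runs(k_fold, grid_sizes):
    """Train-and-test runs for a k-fold grid search:
    one run per fold for every combination of grid values."""
    return k_fold * prod(grid_sizes)

# Assumed example: 10 folds, 6 candidate values for C, 5 for gamma.
runs = train_auto_runs(k_fold=10, grid_sizes=[6, 5])
print(runs)  # 300
```

Every one of those runs trains an SVM on most of the data, so with a feature matrix of this size the total time scales accordingly.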
  1. true

  2. Are you getting higher accuracy than SVM even with just 3 layers? I don't have much experience with deep networks.

  3. Looks like trainAuto is not a practical solution for cases with such a large feature size.

lama123 (2017-10-31 08:55:46 -0500)

Actually, your point 1 is invalid. ANNs and all their derivatives are slow to train from scratch, but there is a thing called transfer learning, which takes the pretrained weights of an existing model and fine-tunes them. In that case training can actually be quite fast.

StevenPuttemans (2017-11-02 06:52:45 -0500)