unfortunately again, opencv's BagOfWords classes are not usable from java, so you'll have to improvise ;)
1) build a BoW dictionary (you only need to do that once). collect as many (SIFT or SURF, as you did before) descriptors as you can, from many images. if you want to retain 100 vocabulary words later, you need at least 10x as many training descriptors
Mat features = new Mat(); // bow train data, CV_32F, one descriptor per row
// for each training image:
descriptorExtractor.compute(sceneImage, sceneKeyPoints, sceneDescriptors);
features.push_back(sceneDescriptors);
// later, once you collected all features:
// kmeans-cluster the features; the retained centers (one per row) will be our BoW vocabulary:
Mat bestLabels = new Mat();
Mat vocab = new Mat();
TermCriteria crit = new TermCriteria(TermCriteria.EPS + TermCriteria.MAX_ITER, 100, 0.001);
Core.kmeans(features, 100, bestLabels, crit, 3, Core.KMEANS_PP_CENTERS, vocab);
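to illustrate what Core.kmeans does internally, here is a minimal plain-java sketch of lloyd's k-means (no opencv needed; the 2d sample points and the naive first-k-points initialization are made up for the demo - opencv's version uses smarter kmeans++ seeding):

```java
import java.util.Locale;

public class KMeansSketch {
    // lloyd's k-means on the rows of data; returns the k cluster centers
    static float[][] kmeans(float[][] data, int k, int iters) {
        float[][] centers = new float[k][];
        // naive init: take the first k points (opencv would use kmeans++ here)
        for (int c = 0; c < k; c++) centers[c] = data[c].clone();
        int[] label = new int[data.length];
        for (int it = 0; it < iters; it++) {
            // assignment step: label each point with its nearest center
            for (int p = 0; p < data.length; p++) {
                double best = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = dist2(data[p], centers[c]);
                    if (d < best) { best = d; label[p] = c; }
                }
            }
            // update step: move each center to the mean of its assigned points
            for (int c = 0; c < k; c++) {
                float[] sum = new float[data[0].length];
                int n = 0;
                for (int p = 0; p < data.length; p++) {
                    if (label[p] != c) continue;
                    for (int d = 0; d < sum.length; d++) sum[d] += data[p][d];
                    n++;
                }
                if (n > 0)
                    for (int d = 0; d < sum.length; d++) centers[c][d] = sum[d] / n;
            }
        }
        return centers;
    }

    static double dist2(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static void main(String[] args) {
        // two well-separated blobs; we expect one center near each blob mean
        float[][] data = { {0, 0}, {0, 1}, {1, 0}, {9, 9}, {9, 10}, {10, 9} };
        for (float[] c : kmeans(data, 2, 10))
            System.out.printf(Locale.US, "(%.1f, %.1f)%n", c[0], c[1]);
        // prints (0.3, 0.3) then (9.3, 9.3)
    }
}
```

in the real pipeline each "point" is a descriptor row (e.g. 128-d for SIFT) and k is the vocabulary size (100 above).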
2) now, for any actual image, we calculate a signature (a bag-of-words feature). instead of matching the sceneDescriptors against a trainImage (as you did before), we match them against our vocabulary, and keep an array of counters: a histogram with one bin per vocabulary word. it simply counts which vocabulary word each feature in our test image matched.
// (pseudo code, sorry, i don't have java to test)
// for each image:
descriptorExtractor.compute(sceneImage, sceneKeyPoints, sceneDescriptors);
// match the image descriptors (query) against the vocabulary (train):
matcher.match(sceneDescriptors, vocab, matches);
float hist[] = new float[vocab.rows()]; // one bin per vocabulary word
for (DMatch m : matches.toList()) {
    hist[m.trainIdx]++; // trainIdx = index of the matched vocabulary word
}
Mat feature = new Mat(1, vocab.rows(), CvType.CV_32F);
feature.put(0, 0, hist);
Core.normalize(feature, feature);
// compare two such signatures:
double distance = Core.norm(feature1, feature2);
this histogram is the BoW signature for the image. we can compare signatures with norm() for simple similarity, train a classifier on object classes (e.g. an SVM) with them, or use them for knn search.
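to make the signature + comparison step concrete, here is a tiny plain-java sketch (the trainIdx lists below are made-up stand-ins for real match results):

```java
import java.util.Locale;

public class BowCompare {
    // build an l2-normalized histogram over vocabSize bins from matched word indices
    static float[] signature(int[] wordIdx, int vocabSize) {
        float[] hist = new float[vocabSize];
        for (int i : wordIdx) hist[i]++;
        double n = 0;
        for (float v : hist) n += v * v;
        n = Math.sqrt(n);
        if (n > 0) for (int i = 0; i < hist.length; i++) hist[i] /= n;
        return hist;
    }

    // euclidean distance between two signatures (what norm(feature1, feature2) computes)
    static double distance(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        int[] imgA = {0, 0, 2, 3}; // made-up trainIdx values for image A
        int[] imgB = {0, 0, 2, 3}; // same words as A -> distance 0
        int[] imgC = {1, 1, 1, 1}; // completely different words -> larger distance
        float[] sa = signature(imgA, 4), sb = signature(imgB, 4), sc = signature(imgC, 4);
        System.out.printf(Locale.US, "A-B: %.3f  A-C: %.3f%n",
                distance(sa, sb), distance(sa, sc));
        // prints A-B: 0.000  A-C: 1.414
    }
}
```

note the normalization: it makes images with different feature counts comparable, so the distance depends on the word distribution, not on how many keypoints were found.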