Tips for improving standard BoW:
Features:
- SIFT descriptors are typically computed densely over the whole image (i.e. use 'Dense' as the detector and try different step sizes).
- Encode some locality: either append the (normalized) x,y coordinates to the SIFT descriptors or, as is currently more common, use a spatial pyramid. For example, with a spatial pyramid of level 2 you divide the image into 4 parts and compute a BoW descriptor for each part in addition to the regular whole-image BoW descriptor, which gives a final descriptor 5 times larger that you pass on to the classifier.
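The level-2 spatial pyramid described above can be sketched in a few lines of plain Python. This is only an illustration, not the code from the question: the function names `spatial_pyramid_bow` and `dense_grid` are made up here, the keypoints are assumed to be already quantized to visual-word indices, and normalizing each block separately is one possible choice among several.

```python
def dense_grid(width, height, step):
    """Keypoint positions on a regular grid (the idea behind a 'Dense' detector)."""
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


def spatial_pyramid_bow(points, words, width, height, vocab_size):
    """Whole-image BoW histogram followed by one histogram per 2x2 cell.

    points: list of (x, y) keypoint positions in pixels
    words:  list of visual-word indices, one per keypoint (assumed precomputed)
    """
    # Block 0 is the whole image; blocks 1..4 are the four quadrants.
    hist = [0.0] * (5 * vocab_size)
    for (x, y), w in zip(points, words):
        hist[w] += 1.0                       # whole-image histogram
        col = min(int(2 * x / width), 1)     # 0 = left half, 1 = right half
        row = min(int(2 * y / height), 1)    # 0 = top half, 1 = bottom half
        cell = 1 + 2 * row + col
        hist[cell * vocab_size + w] += 1.0   # quadrant histogram

    # L1-normalize each block on its own (an assumption of this sketch).
    for b in range(5):
        start, end = b * vocab_size, (b + 1) * vocab_size
        total = sum(hist[start:end])
        if total > 0:
            hist[start:end] = [v / total for v in hist[start:end]]
    return hist
```

With a vocabulary of size K the result has 5*K entries, which is the "5 times larger" descriptor mentioned above. Varying `step` in `dense_grid` corresponds to trying different step sizes for the dense sampling.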
Vocabulary:
- Your dictionary size seems very low. Typical values range from 10^3 to 10^5, although this depends heavily on the application.
Classification:
- Use grid search to find the optimal SVM parameters, and also try kernels other than the linear one.
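A grid search is just an exhaustive loop over parameter combinations. The sketch below is library-agnostic: `grid_search` and `score_fn` are hypothetical names, and `score_fn` is assumed to train an SVM with the given parameters and return a validation score (for example, mean cross-validated accuracy computed on the training set only). The exponentially spaced values for C and gamma are a common starting point, not a prescription.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return (best_params, best_score).

    score_fn(params) is a user-supplied callback (hypothetical here) that
    returns a validation score where higher is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Example grid: two kernels, exponentially spaced C and gamma.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [2.0 ** k for k in range(-5, 16, 4)],
    "gamma": [2.0 ** k for k in range(-15, 4, 4)],
}
```

Once the coarse grid has found a promising region, a second, finer grid around the best values usually pays off.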
A more general piece of advice: always keep your training and test sets separate. The implementation seems to mix them, in which case the test results will always be biased.
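A minimal way to enforce that separation is to split the labeled samples once, up front, and never touch the test part until the final evaluation. The helper below is a sketch with a made-up name (`split_train_test`); the 25% test fraction and the fixed seed are arbitrary choices for illustration.

```python
import random


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples once and return disjoint (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Everything that is learned from data, i.e. the vocabulary, the SVM and its parameters, should be fitted on the training part only; the test part is used solely to report the final accuracy.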