# MNIST and Local Binary Patterns

I'm trying to cluster the MNIST dataset, using PCA for dimensionality reduction and k-means for clustering. For now I'm using just the raw pixel vectors. I tried converting them with Local Binary Patterns, but the results are still bad. I think I need to compute histograms from the LBP codes, but I'm not sure how. I'm using the code from here.

Can you provide some examples?


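First, try applying PCA directly; I think that is already sufficient. Also note that k-means and PCA are closely related mathematically: Ding and He show that the principal components are the continuous solutions to the discrete cluster membership indicators of k-means. See: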

• Ding C., He X. "K-means Clustering via Principal Component Analysis", in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. [PDF available online]

So there's no need to run a k-means clustering on the projected samples. What I would try instead is to see what clusters a Linear Discriminant Analysis yields. A Linear Discriminant Analysis is available as cv::LDA in the contrib module of OpenCV.
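To make the idea of a discriminant projection concrete, here is a minimal sketch of the two-class Fisher discriminant for 2-D points in plain Python. It is not cv::LDA and it is far too small for MNIST (10 classes, 784 dimensions); the function name `fisher_direction` and the toy data are assumptions made only for illustration.

```python
def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant for 2-D points: w = Sw^-1 (mean_b - mean_a)."""

    def mean(pts):
        return [sum(col) / len(pts) for col in zip(*pts)]

    def scatter(pts, m):
        # Symmetric 2x2 within-class scatter matrix, stored as (sxx, sxy, syy).
        sxx = sum((x - m[0]) ** 2 for x, _ in pts)
        sxy = sum((x - m[0]) * (y - m[1]) for x, y in pts)
        syy = sum((y - m[1]) ** 2 for _, y in pts)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = (sa[i] + sb[i] for i in range(3))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Apply the inverse of [[sxx, sxy], [sxy, syy]] to the mean difference.
    return [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]


# Toy data: two identical point clouds, the second shifted along x.
class_a = [(0, 0), (0, 2), (1, 1), (-1, 1)]
class_b = [(4, 0), (4, 2), (5, 1), (3, 1)]
w = fisher_direction(class_a, class_b)
projections_a = [w[0] * x + w[1] * y for x, y in class_a]
projections_b = [w[0] * x + w[1] * y for x, y in class_b]
```

Unlike PCA, the direction is chosen using the labels, so the projected classes end up as far apart as possible relative to their spread. For real image data you would use a library implementation on the PCA-reduced vectors rather than this hand-written 2x2 inverse.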

Then I would play around with OpenCV's awesome machine learning library and see how a Multi-Layer Perceptron or an SVM (with different kernels) performs on the image data.

Regarding Local Binary Patterns, there's a much simpler (and tested!) implementation in the face recognition code I provide. The OpenCV implementation in cv::FaceRecognizer is the same. I am linking to the original project (libfacerec), because everything is in one file there.
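Since the question asks how to get histograms out of LBP, here is a minimal sketch in plain Python. It is not the libfacerec or OpenCV code: it assumes the basic 3x3 operator, a grayscale image given as a list of rows, and a hypothetical 2x2 grid of cells. The function names are made up for this example.

```python
def lbp_image(img):
    """Basic 3x3 LBP codes for the interior pixels of a 2-D list of intensities."""
    h, w = len(img), len(img[0])
    # Eight neighbours, clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour at least as bright as the center sets its bit.
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def spatial_histogram(codes, grid=(2, 2), bins=256):
    """Concatenate the normalised per-cell histograms of an LBP code image."""
    h, w = len(codes), len(codes[0])
    gy, gx = grid
    feature = []
    for cy in range(gy):
        for cx in range(gx):
            y0, y1 = cy * h // gy, (cy + 1) * h // gy
            x0, x1 = cx * w // gx, (cx + 1) * w // gx
            hist = [0] * bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    hist[codes[y][x]] += 1
            total = max(1, (y1 - y0) * (x1 - x0))
            feature.extend(count / total for count in hist)
    return feature


# A blank 28x28 image stands in for an MNIST digit here.
img = [[0] * 28 for _ in range(28)]
codes = lbp_image(img)
feature = spatial_histogram(codes, grid=(2, 2))
```

The point is that the feature vector you feed to PCA is the concatenated histogram, not the LBP code image itself. Splitting the image into cells keeps some information about where each pattern occurs, which a single global histogram would throw away.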

If you need some ideas on how to work with OpenCV machine learning, you can have a look at my Guide to Machine Learning with OpenCV.

There are also tutorials on using an SVM in the official OpenCV documentation.


With k-means I cluster unlabeled data: for example, I have 10 classes (the digits 0-9) and cluster into 30 clusters, then I look at the clusters and maybe merge some if they are similar. PCA is just the base step (for reducing dimensions), so I don't know how to use PCA for clustering. To test preprocessing before PCA, I project all the data to 2D and see how separable it is (for that purpose I use the labels). For example, I tried the separable histograms that came in the package with LBP for preprocessing thresholded binary digits.

(2012-08-22 04:43:22 -0600)
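The workflow described in the comment (cluster into more groups than there are classes, then merge) could be sketched roughly as follows. This is plain Python on toy 2-D points, not OpenCV's k-means; the deterministic initialisation (evenly spaced samples instead of random ones) and the helper `cluster_to_label`, which merges clusters by majority label when labels are available for evaluation, are assumptions made for illustration.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    # Simplification: seed the centroids with evenly spaced samples.
    step = max(1, len(points) // k)
    centroids = [list(points[i * step]) for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def cluster_to_label(assign, labels):
    """Map each cluster id to the most common label among its members."""
    votes = {}
    for a, y in zip(assign, labels):
        votes.setdefault(a, Counter())[y] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}


# Two classes, deliberately over-clustered into four groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
centroids, assign = kmeans(points, k=4)
mapping = cluster_to_label(assign, labels)
```

Clusters that map to the same label are the candidates for merging. On MNIST the points would be the PCA-reduced feature vectors, and without labels the merge decision has to come from inspecting the clusters, as the comment describes.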
