This problem has definitely been tackled before. Have a look at the following papers (in no particular order, and the list is probably not up to date):

Monay, F., & Gatica-Perez, D. (2004). pLSA-based image auto-annotation: constraining the latent space. ACM Multimedia, 1–4.
Zhang, R., Zhang, L., Wang, X., & Guan, L. (2011). Multi-Feature pLSA for Combining Visual Features in Image Annotation. ACM Multimedia.
Li, L.-J., Socher, R., & Fei-Fei, L. (2009). Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. IEEE Conference on Computer Vision and Pattern Recognition, 2036–2043.
Li, J., & Wang, J. Z. (2008). Real-time computerized annotation of pictures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(6), 985–1002.
Wang, C., Blei, D., & Fei-Fei, L. (2009). Simultaneous image classification and annotation. IEEE Conference on Computer Vision and Pattern Recognition, 1903–1910.
Blei, D. M., & Jordan, M. I. (2003). Modeling annotated data. ACM SIGIR Conference on Research and Development in Information Retrieval, 127.
Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D. M., & Jordan, M. I. (2003). Matching words and pictures. Journal of Machine Learning Research, 3, 1107–1135.
Cusano, C., Ciocca, G., & Schettini, R. (2003). Image annotation using SVM. Proceedings of SPIE, (1), 330–338.
Sinha, P., & Jain, R. (2008). Classification and annotation of digital photos using optical context data. Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval, 309–318.
Pham, T., Maillot, N. E., Lim, J., & Chevallet, J. (2007). Latent Semantic Fusion Model for Image Retrieval and Annotation. ACM Conference on Information and Knowledge Management, 439–443.