First of all, there are several names for codebooks, so maybe you've heard of them in another context: visual vocabularies, bag of visual words, textons... The Wikipedia article on the bag-of-words principle is pretty informative, I would say. You should check it out: http://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision

But the basic idea is rather simple and inspired by text classification tasks. Consider a text classification problem: typically one would count the number of occurrences of certain words to classify a text. E.g. if a text contains words like 'price', 'credit card' and so on, one might guess this mail is some sort of financial text. So depending on the task, some words are counted and some are not: e.g. 'and', 'he' and so on are not helpful for recognizing a financial text, while 'price' and 'credit card' are. Thus the latter words would be part of the codebook (or vocabulary). Such a codebook is then used to describe an entire text by counting the occurrences of every codebook word in the text, resulting in a so-called word frequency histogram.
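Just to make that concrete, here is a tiny sketch in Python of what such a word frequency histogram boils down to (the codebook and the mail text are of course made up for the example):

```python
# Made-up example codebook for "financial text" detection
codebook = ['price', 'credit card', 'loan', 'interest']

def word_histogram(text, codebook):
    text = text.lower()
    # one bin per codebook word; words outside the codebook are simply ignored
    return [text.count(word) for word in codebook]

mail = "Your credit card limit and the loan interest rate depend on the price..."
print(word_histogram(mail, codebook))  # -> [1, 1, 1, 1]
```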

For an image/object classification task (or even segmentation) this principle is also used, but instead of words we need to obtain so-called 'visual words'. This is typically achieved by collecting a lot of feature descriptors from images and clustering them. The resulting cluster centers are then your codebook.
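A rough sketch of how that could look with OpenCV (assuming a build where cv2.SIFT_create is available; any other descriptor like ORB works the same way, and the parameter values are just placeholders):

```python
import cv2
import numpy as np

def build_codebook(image_paths, k=100):
    sift = cv2.SIFT_create()
    all_descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, descriptors = sift.detectAndCompute(img, None)
        if descriptors is not None:
            all_descriptors.append(descriptors)
    data = np.vstack(all_descriptors).astype(np.float32)

    # cluster all collected descriptors; the k cluster centers are the 'visual words'
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-4)
    _, _, centers = cv2.kmeans(data, k, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
    return centers  # shape (k, 128) for SIFT: this is the codebook
```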

How is a codeword defined? - typically as a cluster center of feature descriptors

How are codebooks computed? - typically with some sort of clustering algorithm (e.g. KMeans)
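And to close the loop, this is roughly how you would then describe a new image with that codebook, i.e. quantize its descriptors to the nearest visual word and count them. This is just a plain NumPy sketch; OpenCV also has BOWKMeansTrainer / BOWImgDescriptorExtractor helpers in features2d that wrap these steps for you:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    # distance of every descriptor to every codebook center
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)  # nearest visual word per descriptor
    hist, _ = np.histogram(words, bins=np.arange(len(codebook) + 1))
    return hist / max(hist.sum(), 1)  # normalized word frequency histogram
```

That histogram is the final image representation you would feed to a classifier (e.g. an SVM).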