Neural network input/output using bag of words

asked 2017-01-26 09:10:45 -0500

cflavs

I'm trying to implement an object detection module which contains the following steps:

1) extract image descriptors with SURF, creating a matrix of size [x, 64], where x is the number of keypoints found in the image;

2) fix the descriptor size to a [k, 64] format using a bag of features/words approach, where k is the number of clusters created with k-means;

3) feed a neural network using the resulting bag of words matrix as trainingSamples.
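
For reference, here is a minimal sketch of the quantization in step 2. It assumes the common bag-of-words variant in which each image is reduced to a single k-bin histogram of nearest cluster centers (one [1, k] row per image), which is not the [k, 64] layout described above. The function names and the toy 2-D descriptors are hypothetical; real SURF descriptors would have 64 columns, and plain Python lists stand in for OpenCV matrices.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, centers):
    """Return a normalized k-bin histogram for one image.

    Each descriptor votes for its nearest cluster center, so the
    result always has len(centers) entries, however many keypoints
    the image produced.
    """
    counts = [0] * len(centers)
    for d in descriptors:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(d, centers[i]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # An image with no keypoints yields an all-zero histogram.
        return [0.0] * len(centers)
    return [c / total for c in counts]


# Hypothetical toy data: k = 3 centers, x = 4 descriptors, 2-D instead of 64-D.
centers = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
histogram = bow_histogram(descriptors, centers)
```

Under this assumption every image contributes exactly one row of length k, regardless of x.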

So far I've implemented steps 1 and 2, but I'm not sure how to format the output vector of the NN. In OpenCV's CvANN_MLP, the output matrix must have the same number of rows as the input matrix (otherwise an exception is thrown and reported via what()), but the number of input rows is the number of clusters k from step 2, so I don't understand how to build the output matrix from that.

I know the output matrix should have n columns, one per output class (e.g. 3 classes, cat, dog and bird, give a matrix with 3 columns), but how do I organize the rows of this matrix based on the input rows? I thought the number of rows was equal to the number of columns, so that for cat, dog and bird the output would be, for instance, the following matrix:




But that doesn't match the number of clusters, so I'm not sure how to proceed.
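
To make the row constraint concrete, here is a rough sketch of how the two matrices could line up. It assumes each input row represents one training image (for example a k-bin histogram) and each output row is a one-hot class vector. The helper `one_hot_labels`, the labels, and the input values are all hypothetical, and plain Python lists stand in for the OpenCV matrices.

```python
CLASSES = ["cat", "dog", "bird"]


def one_hot_labels(labels, classes):
    """Build one output row per training sample, one column per class."""
    rows = []
    for label in labels:
        row = [0.0] * len(classes)
        row[classes.index(label)] = 1.0  # mark the column of the true class
        rows.append(row)
    return rows


# Hypothetical training set: 4 images, each summarized by k = 4 values.
inputs = [
    [0.50, 0.25, 0.25, 0.00],  # image 0
    [0.00, 0.25, 0.25, 0.50],  # image 1
    [0.25, 0.50, 0.00, 0.25],  # image 2
    [0.50, 0.00, 0.25, 0.25],  # image 3
]
image_labels = ["cat", "bird", "dog", "cat"]

outputs = one_hot_labels(image_labels, CLASSES)

# The row counts match because both matrices have one row per image;
# the column counts (k versus number of classes) are independent.
rows_match = len(inputs) == len(outputs)
```

In this layout the number of output rows follows the number of training images, not k or the number of classes.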
