Hi,
I am trying to reproduce the method of a paper, and I have a problem understanding the implementation steps. In the offline procedure, the authors extract SIFT features from the training set and store them in an inverted-index form. During the online procedure, a query mass is given and matched against all training images through Hough voting of SIFT features. A similarity score is also calculated to estimate the similarity between the query image and the retrieved images.
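As I understand it, the inverted index maps each visual word id to the list of training features quantized to that word. My mental model is something like the sketch below (the structure and field names are my own guesses, not from the paper):

```python
from collections import defaultdict

# My guess at the inverted index: visual word id -> list of
# (training image id, offset of the feature from the object center).
inverted_index = defaultdict(list)  # v_k -> [(image_id, (dx, dy)), ...]

def add_feature(image_id, word_id, offset):
    """Store one quantized training feature under its visual word."""
    inverted_index[word_id].append((image_id, offset))
```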
I have done the following steps:
- extracted SIFT features from the database and query (test) images, and saved the descriptors into des_train and des_query, respectively (see the first sketch after this list).
- created the visual vocabulary by k-means clustering, and built the tf-idf table with vocab_size=100 (for example) for both the training and testing datasets (second sketch below).
- I am stuck here: I have to extract a tuple {(v_k, p_k) for k = 1:n} for a training image x, where n is the number of features extracted from image x, v_k is the k-th visual word id, and p_k is the position of v_k relative to the object center in training image x, denoted by p_k = [x_k, y_k]. My tentative understanding is in the third sketch below.
- for a given query image q, it is matched with all training images d; a similarity map is calculated as a matrix of the same size as q, and its element at position p indicates the similarity between the region of q centered at position p and the training image d. The matching is based on generalized Hough voting of SIFT features (my rough attempt is in the last sketch below).
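For reference, the first step of what I have done so far looks roughly like this (OpenCV; `train_paths` and `query_paths` are placeholder lists of file names):

```python
import cv2

sift = cv2.SIFT_create()

def extract_sift(path):
    """Return SIFT keypoints and descriptors for one grayscale image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    kp, des = sift.detectAndCompute(img, None)
    return kp, des

# Keypoints and descriptors for every training / query image.
kp_train, des_train = zip(*(extract_sift(p) for p in train_paths))
kp_query, des_query = zip(*(extract_sift(p) for p in query_paths))
```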
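The second step (vocabulary and tf-idf) is roughly the following; I used scikit-learn's KMeans with vocab_size=100 as in my example:

```python
import numpy as np
from sklearn.cluster import KMeans

vocab_size = 100
all_des = np.vstack(des_train)  # stack descriptors of all training images
kmeans = KMeans(n_clusters=vocab_size, random_state=0).fit(all_des)

def bow_histogram(des):
    """Normalized term-frequency histogram of visual words for one image."""
    words = kmeans.predict(des.astype(np.float32))
    tf = np.bincount(words, minlength=vocab_size).astype(float)
    return tf / max(tf.sum(), 1)

tf_train = np.array([bow_histogram(d) for d in des_train])
# idf from the training set: log(N / number of images containing each word)
df = (tf_train > 0).sum(axis=0)
idf = np.log(len(des_train) / np.maximum(df, 1))
tfidf_train = tf_train * idf
```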
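Regarding the third point, my tentative understanding (and this is exactly what I want to confirm) is that each training descriptor is quantized to its nearest cluster center, so v_k comes from vocabulary assignment, and p_k is just the keypoint coordinate minus the object-center coordinate. Something like this, where `centers` is my assumed list of per-image object centers:

```python
def image_tuples(kp, des, center):
    """My guess at extracting {(v_k, p_k)} for one training image:
    v_k = visual word of the k-th descriptor,
    p_k = keypoint position relative to the object center."""
    words = kmeans.predict(des.astype(np.float32))
    cx, cy = center
    return [(int(w), (k.pt[0] - cx, k.pt[1] - cy))
            for w, k in zip(words, kp)]

# Fill the inverted index from all training images.
for img_id, (kp, des) in enumerate(zip(kp_train, des_train)):
    for w, offset in image_tuples(kp, des, centers[img_id]):
        inverted_index[w].append((img_id, offset))
```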
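And for the fourth point, here is my rough attempt at the generalized Hough voting, which I am not sure matches the paper: each query feature with word v looks up all training features with the same word, and each match votes for a candidate object center at the query position minus the stored offset. The accumulator for a training image d then acts as the similarity map, and I take its peak as the matching score (that scoring choice is my own assumption; `query_shape` is a placeholder for the query image size):

```python
def hough_similarity(kp_q, des_q, query_shape, img_id):
    """Accumulate votes for centers of training image img_id inside the
    query image; return the similarity map and its peak (naive loop)."""
    h, w = query_shape
    acc = np.zeros((h, w))
    words = kmeans.predict(des_q.astype(np.float32))
    for word, kp in zip(words, kp_q):
        for d_id, (dx, dy) in inverted_index[word]:
            if d_id != img_id:
                continue
            # candidate center = query feature position - stored offset
            cx = int(round(kp.pt[0] - dx))
            cy = int(round(kp.pt[1] - dy))
            if 0 <= cx < w and 0 <= cy < h:
                acc[cy, cx] += 1.0
    return acc, acc.max()

# Score the query against every training image and rank by peak vote.
scores = [hough_similarity(kp_query[0], des_query[0], query_shape, d)[1]
          for d in range(len(des_train))]
```

Please correct me if this is not what the paper intends.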
My question is two-fold:
- How do I extract the information in the third point above? Is this visual-vocabulary matching or descriptor matching?
- Does anyone know how I can get the matching score between the query image and the training images through Hough voting of SIFT features (fourth point)?
I am quite new to CBIR, so your expert opinion is really appreciated. If there are any resources or code, could you please share them with me?