# Problem in the result of kNN

Hi,

I have been working with kNN in OpenCV.

I simply passed some (x, y) coordinates as the sample data for kNN, with their indices as the responses. Then I gave it an (x, y) value to find the nearest neighbour (I chose this case because it is easy to visualize). It works fine if k=1, but when I pass k=3, I get a weird result. As far as I know, knn.find_nearest() returns the following:

- retval: I don't know what it is.
- results: it has the label of the nearest neighbour.
- neighborResponses: labels of the k nearest neighbours, with distance in increasing order.
- dists: their corresponding distances, in increasing order.

In short, results and neighborResponses[0] should give the same result, right? But I am not getting it. See some examples below:

result     neighborResponses      dists
----------------------------------------------------------------------------
[[ 1.]]    [[ 19.  16.   1.]]     [[  36.  293.  338.]]
[[ 2.]]    [[ 24.   2.  21.]]     [[ 306.  490.  557.]]
[[ 1.]]    [[  4.  10.   1.]]     [[  65.  185.  464.]]


As you can see, the result is not the label with the lowest distance. Why is that? Or did I misunderstand something?


I think you have a slightly wrong idea of how a kNN classifier works. Quoting scikit-learn, which has a nice explanation (source: http://scikit-learn.org/stable/modules/neighbors.html):

Classification is computed from a simple majority vote of the nearest neighbors of each point: a query point is assigned the data class which has the most representatives within the nearest neighbors of the point.

Let's take the first row of your example. For the given point it computes the 3 nearest neighbors. Unfortunately they all come from different classes, so each class gets exactly one vote and the vote is tied. It has simply picked the one with the lowest class number (it could have picked any of the others as well).
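To make the voting concrete, here is a minimal sketch in plain Python. It is not OpenCV's actual code: `majority_vote` is a hypothetical helper, and breaking ties by the lowest label is an assumption chosen only because it matches the three rows in the question.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the label with the most votes among the k neighbours.

    Assumption: ties are broken by taking the lowest label, which is
    what the rows in the question appear to show.
    """
    counts = Counter(neighbor_labels)
    best = max(counts.values())
    # Among all labels that reach the top vote count, take the smallest.
    return min(label for label, n in counts.items() if n == best)

# Three different labels: a three-way tie, so the lowest label wins.
print(majority_vote([19.0, 16.0, 1.0]))   # prints 1.0

# Two neighbours share a label: that label wins, whatever the distances.
print(majority_vote([19.0, 19.0, 1.0]))   # prints 19.0
```

Note that the distances never enter the function, which is why the closest neighbour (label 19 in the first row) has no special weight.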

Note that this is the typical behavior of a kNN classifier. However, some implementations, such as the one in scikit-learn, let you specify that the output also depends on the distances to the k nearest neighbors.


+1. OK, I thought the classification in the OpenCV implementation was based on distance (modified kNN). So it depends only on the number of members from each class, right? Distance plays no role. If we want distance to be taken into account, we need to add extra code that analyses the returned distances, right?

(2013-06-26 03:31:34 -0500)

Exactly. If your example had been [[ 19. 19. 1.]] instead of [[ 19. 16. 1.]], then 19 would have been chosen, so only the number of labels counts, not their distances. If you want the output to depend on the distance, you have to code that yourself, i.e. take the distances and weight each label accordingly.

(2013-06-26 10:33:41 -0500)
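One possible way to do that weighting yourself is sketched below in plain Python. `weighted_vote` and its `eps` parameter are hypothetical names, and inverse-distance weighting is just one common choice, not something OpenCV does for you. The inputs are assumed to be one row of neighborResponses and the matching row of dists, as returned by knn.find_nearest().

```python
from collections import defaultdict

def weighted_vote(neighbor_labels, dists, eps=1e-9):
    """Pick the label whose neighbours carry the largest total weight.

    Assumption: each neighbour votes with weight 1 / (distance + eps),
    so closer neighbours count more. eps only guards against a zero
    distance.
    """
    scores = defaultdict(float)
    for label, d in zip(neighbor_labels, dists):
        scores[label] += 1.0 / (d + eps)
    # Label with the highest accumulated weight.
    return max(scores, key=scores.get)

# First row from the question: label 19 is by far the closest neighbour.
print(weighted_vote([19.0, 16.0, 1.0], [36.0, 293.0, 338.0]))  # prints 19.0

# Two slightly farther neighbours with the same label can still outvote
# a single closer one.
print(weighted_vote([5.0, 7.0, 7.0], [100.0, 110.0, 120.0]))   # prints 7.0
```

With this weighting the first row of the question yields 19 rather than 1, which is the behaviour the question expected.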
