
Problem in the result of kNN

asked 2013-06-25 00:22:59 -0600

Abid Rahman K

Hi,

I have been working with kNN in OpenCV.

I simply passed some (x, y) coordinates as sample data for kNN, with their indices as responses. Then I gave it an (x, y) value to find the nearest neighbour (I chose this case because it is easy to visualize). It works fine if k=1, but when I pass k=3 I get weird results. As far as I know, knn.find_nearest() returns the following:

retval - I don't know what it is
results - It has the label of the nearest neighbour.
neighborResponses - labels of the k nearest neighbours, in order of increasing distance
dists - Their corresponding distances, in increasing order
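To make the outputs listed above concrete, here is a minimal brute-force sketch in plain Python. It is not OpenCV's implementation: the function name `k_nearest` and the sample points are hypothetical, and it assumes squared Euclidean distance as the metric.

```python
def k_nearest(samples, labels, query, k):
    """Return (neighbor_labels, dists) for the k samples closest to query.

    Both lists are ordered by increasing distance, mirroring the
    description of neighborResponses and dists given above.
    """
    qx, qy = query
    # Squared Euclidean distance from the query to every sample (assumption).
    scored = [((x - qx) ** 2 + (y - qy) ** 2, label)
              for (x, y), label in zip(samples, labels)]
    scored.sort(key=lambda pair: pair[0])  # closest first
    nearest = scored[:k]
    return [label for _, label in nearest], [d for d, _ in nearest]


# Hypothetical data: each point's label is simply its index.
samples = [(0, 0), (10, 0), (3, 4), (6, 8)]
labels = [0, 1, 2, 3]
neighbor_labels, dists = k_nearest(samples, labels, query=(0, 0), k=3)
print(neighbor_labels, dists)  # [0, 2, 1] [0, 25, 100]
```

With labels equal to indices, every label is unique, so for k > 1 the neighbours never agree on a class.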

In short, results and the first entry of neighborResponses should be the same, right? But that is not what I get. See some examples below:

result    neighborResponses           dists
----------------------------------------------------------------------------
[[ 1.]]    [[ 19.  16.   1.]]        [[  36.  293.  338.]]
[[ 2.]]    [[ 24.   2.  21.]]        [[ 306.  490.  557.]]
[[ 1.]]    [[  4.  10.   1.]]        [[  65.  185.  464.]]

As you can see, the result is not the label with the lowest distance. Why is that? Or did I misunderstand something?


1 answer


answered 2013-06-25 14:17:31 -0600

Guanta

I guess you have a slightly wrong idea of how a kNN classifier works. Quoting scikit-learn, which has a nice explanation (source: http://scikit-learn.org/stable/modules/neighbors.html):

Classification is computed from a simple majority vote of the nearest neighbors of each point: a query point is assigned the data class which has the most representatives within the nearest neighbors of the point.

Take the first row of your example: for the given point it computes the 3 nearest neighbors. Unfortunately they all come from different classes, so every class gets exactly one vote, and it has simply picked the one with the lowest class number (it could just as well have picked any of the others).
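As a rough illustration of that voting, here is a minimal sketch in plain Python. The function name `majority_vote` is hypothetical, and the lowest-label tie-break is only an assumption inferred from the rows in the question, not documented OpenCV behavior.

```python
from collections import Counter


def majority_vote(neighbor_labels):
    """Pick the most frequent label; break ties by the lowest label.

    The tie-break rule is an assumption inferred from the rows in the
    question, not documented OpenCV behavior.
    """
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


# neighborResponses rows from the question
rows = [[19, 16, 1], [24, 2, 21], [4, 10, 1]]
print([majority_vote(r) for r in rows])  # [1, 2, 1], same as the result column
# Hypothetical row with a repeated label: two votes beat one.
print(majority_vote([19, 19, 1]))        # 19
```

Distances never enter this function, which is why the closest neighbor does not automatically win.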

Note that this is the typical behavior of a kNN classifier. However, in some implementations, like the one from scikit-learn, you can specify that the output should also depend on the distances to the k nearest neighbors.


Comments

+1 - OK. I thought classification in the OpenCV implementation was based on distance (modified kNN). So it depends only on the number of members from each class, right? Distance plays no role. If we want distance to be taken into account, we need to add extra code that analyzes the returned distances, right?

Abid Rahman K (2013-06-26 03:31:34 -0600)

Exactly. If in your example it had been [[ 19. 19. 1.]] instead of [[ 19. 16. 1.]], then 19 would have been returned, so only the number of labels counts, not their distances. If you want the output to depend on the distance, you have to code that yourself, i.e. take the distances and weight each label accordingly.

Guanta (2013-06-26 10:33:41 -0600)
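One possible way to code the distance weighting mentioned in the last comment is sketched below in plain Python. The function name `weighted_vote`, the inverse-distance weight 1/d, and the `eps` guard are all assumptions chosen for illustration; nothing in this thread says OpenCV provides or uses this scheme.

```python
from collections import defaultdict


def weighted_vote(neighbor_labels, dists, eps=1e-9):
    """Sum 1/distance per label and return the label with the largest total."""
    scores = defaultdict(float)
    for label, d in zip(neighbor_labels, dists):
        scores[label] += 1.0 / (d + eps)  # eps avoids division by zero
    return max(scores, key=scores.get)


# Rows from the question: (neighborResponses, dists)
rows = [
    ([19, 16, 1], [36, 293, 338]),
    ([24, 2, 21], [306, 490, 557]),
    ([4, 10, 1], [65, 185, 464]),
]
print([weighted_vote(l, d) for l, d in rows])       # [19, 24, 4]

# Hypothetical rows: two similar-distance neighbors can outvote one nearer one,
# but a much nearer neighbor still wins on its own.
print(weighted_vote([5, 7, 7], [100, 150, 160]))    # 7
print(weighted_vote([5, 7, 7], [10, 150, 160]))     # 5
```

When all k labels are distinct, as in the question's rows, this scheme always returns the nearest neighbor's label; it only differs from a plain nearest-neighbor lookup when labels repeat.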
