k-nearest neighbors interruption

I downloaded kNN C++ code. The code takes two text files, a training file and a test file, each containing vectors and their labels. The vectors in the text file look like:

    1.2,2.2,5.6,....,1
    2.1,1,5,5.5,....,1
    1.2,2.5,3.6,....,2

The last value in each vector is the label for that vector. The code reads the data from these files with this function:

    bool readData4File (char *filename, TRAINING_EXAMPLES_LIST *rlist, 
                        int *rlistExamples)
    {
        FILE *fp = NULL;
        int len = 0;
        char line[LINE_MAX+1];
        int lineSize = LINE_MAX;
        TrainingExample *TEObj;
        int index = 0;
        int numExamples = 0;

        *rlistExamples = 0;

        line[0] = 0;

        if((fp = fopen (filename, "r")) == NULL)
        {
            cout<<"Error in opening file."<<endl;
            return false;
        }

        //Initialize weights to random values
        srand (time(NULL));

        char *tmp;
        int tmpParams = 0; //NO_OF_ATT;
        int i = 0;
        double cd = 0.0;

        /* Read the data file line by line */
        while((len = GetLine (line, lineSize, fp))!=0) 
        {
            // One TrainingExample is created for every line GetLine returns
            TEObj = new TrainingExample ();
            tmp = strtok (line,",");
            while (tmp != NULL)
            {
                cd = atof (tmp);
                TEObj->Value[tmpParams] = cd;
                tmpParams ++;

                tmp = strtok (NULL, ",");

                // tmpParams is reset (and numExamples incremented) only
                // once NO_OF_ATT values have been read
                if(tmpParams == NO_OF_ATT)
                {
                    tmpParams = 0;
                    cd = 0.0;
                    line[0] = 0;
                    numExamples ++;

                    //Not using this normalization anymore. 
                    // N(y) = y/(1+y)
                    // Doing normalization by standard deviation and mean
                    //TEObj->NormalizeVals ();

                    /* Generating random weights for instances. */
                    /* These weights are used in instance WKNN  */
                    double rno = (double)(rand () % 100 + 1);
                    TEObj->Weight = rno/100;
                    TEObj->index = index++;
                    TEObj->isNearest2AtleastSome = false;
                    break;
                }
            }

            // Inserted for every line read, whether or not NO_OF_ATT
            // values were parsed from it
            rlist->insert (rlist->end(), *TEObj);

            delete TEObj;
        }

        // ... (rest of the function not shown in the post)
    }

The code then passes the lists of vectors on to classify the test vectors. I have 1405 training vectors and 810 test vectors. When the code runs, it reports the number of vectors as 1405 and 810, but the training and test lists contain 2810 and 1620 entries. What is the content of these lists, and why do they hold double the number of training and test vectors? My accuracy comes out as 183%, because the code passes the whole list rather than just the examples. Why?