Why is my knn network so bad?

asked 2016-10-25 10:48:36 -0600

BenNG

updated 2016-10-26 08:47:43 -0600

Hello everyone !

I used a k-Nearest Neighbors algorithm (kNN) and trained it with the MNIST database.
Here is the training code:

Ptr<ml::KNearest> getKnn()
{
    Ptr<ml::KNearest> knn(ml::KNearest::create());

    FILE *fp = fopen("/keep/Repo/USELESS/_sandbox/cpp/learning-cpp/sudoku/assets/train-images-idx3-ubyte", "rb");
    FILE *fp2 = fopen("/keep/Repo/USELESS/_sandbox/cpp/learning-cpp/sudoku/assets/train-labels-idx1-ubyte", "rb");

    if (!fp || !fp2)
    {
        cout << "can't open file" << endl;
        return knn; // bail out instead of reading from a null FILE*
    }

    // IDX header: magic number, image count, rows, cols (all big-endian)
    int magicNumber = readFlippedInteger(fp);
    int numImages = readFlippedInteger(fp);
    int numRows = readFlippedInteger(fp);
    int numCols = readFlippedInteger(fp);
    fseek(fp2, 0x08, SEEK_SET); // skip the label file header

    int size = numRows * numCols;

    cout << "size: " << size << endl;
    cout << "rows: " << numRows << endl;
    cout << "cols: " << numCols << endl;

    Mat_<float> trainFeatures(numImages, size);
    Mat_<int> trainLabels(1, numImages);

    BYTE *temp = new BYTE[size];
    BYTE tempClass = 0;
    for (int i = 0; i < numImages; i++)
    {
        fread((void *)temp, size, 1, fp);
        fread((void *)(&tempClass), sizeof(BYTE), 1, fp2);

        trainLabels[0][i] = (int)tempClass;

        // kNN expects CV_32F features, so convert each byte pixel to float
        for (int k = 0; k < size; k++)
        {
            trainFeatures[i][k] = (float)temp[k];
        }
    }
    delete[] temp; // was leaked in the original
    fclose(fp);
    fclose(fp2);

    knn->train(trainFeatures, ml::ROW_SAMPLE, trainLabels);

    return knn;
}
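For reference, readFlippedInteger is called above but not shown; it just reads a 32-bit big-endian integer from the IDX header and returns it in host byte order. A minimal sketch (the name and signature come from the calls above; the body is my assumption):

```cpp
#include <cstdio>
#include <cstdint>

// Reads a 32-bit big-endian integer, as used by the MNIST IDX headers,
// and returns it in host byte order.
int readFlippedInteger(FILE *fp)
{
    uint8_t b[4];
    if (fread(b, 1, 4, fp) != 4)
        return -1; // read error / truncated header
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
}
```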

When I test the algorithm with the 10k-image test file MNIST provides, I get Accuracy: 96.910000, which is good news :) Here is the code that tests the trained kNN:

void testKnn(Ptr<ml::KNearest> knn, bool debug)
{
    int totalCorrect = 0;

    FILE *fp = fopen("/keep/Repo/USELESS/_sandbox/cpp/learning-cpp/sudoku/assets/t10k-images-idx3-ubyte", "rb");
    FILE *fp2 = fopen("/keep/Repo/USELESS/_sandbox/cpp/learning-cpp/sudoku/assets/t10k-labels-idx1-ubyte", "rb");

    if (!fp || !fp2)
    {
        cout << "can't open file" << endl;
        return;
    }

    // IDX header (big-endian), same layout as the training files
    int magicNumber = readFlippedInteger(fp);
    int numImages = readFlippedInteger(fp);
    int numRows = readFlippedInteger(fp);
    int numCols = readFlippedInteger(fp);
    fseek(fp2, 0x08, SEEK_SET);

    int size = numRows * numCols;

    Mat_<float> testFeatures(numImages, size);
    Mat_<int> expectedLabels(1, numImages);

    BYTE *temp = new BYTE[size];
    BYTE tempClass = 0;

    int K = 1;
    Mat response, dist, m;

    for (int i = 0; i < numImages; i++)
    {
        if (i % 1000 == 0 && i != 0)
        {
            cout << i << endl; // progress indicator
        }

        fread((void *)temp, size, 1, fp);
        fread((void *)(&tempClass), sizeof(BYTE), 1, fp2);

        expectedLabels[0][i] = (int)tempClass;

        // convert the byte pixels to the CV_32F row findNearest expects
        for (int k = 0; k < size; k++)
        {
            testFeatures[i][k] = (float)temp[k];
        }

        // sanity check that createMatFromMNIST and createMatToMNIST behave as expected
        m = testFeatures.row(i);

        knn->findNearest(m, K, noArray(), response, dist);

        if (debug)
        {
            cout << "response: " << response << endl;
            cout << "dist: " << dist << endl;
            Mat m2 = createMatFromMNIST(m);
            showImage(m2);
            // Mat m3 = createMatToMNIST(m2);
            // showImage(m3);
        }

        if (expectedLabels[0][i] == response.at<float>(0))
        {
            totalCorrect++;
        }
    }
    delete[] temp; // was leaked in the original
    fclose(fp);
    fclose(fp2);

    printf("Accuracy: %f\n", (double)totalCorrect * 100 / (double)numImages);
}

By the way, you can test the kNN I have implemented in my project here (see the actions part): https://bitbucket.org/BenNG/sudoku-recognizer

But when I use my own data with the algorithm, it behaves badly.
What data do I give to the algorithm?

To answer that, let me present my project a bit. My project is a sudoku grabber: given a picture that contains a sudoku, I am able to find the puzzle and extract it, and then to extract every cell in it. Each cell is preprocessed before I send it to the kNN. By ... (more)


Comments

knn != network. can you start to make it more clear by removing that misinterpretation ?

berak ( 2016-10-25 11:32:49 -0600 )

your code is behind a login (not accessible)

berak ( 2016-10-26 00:19:26 -0600 )

Hello, I made some modifications regarding the bad wording. You were right, the code was private, but maybe you still have to log in to bitbucket to see the code? Thank you

BenNG ( 2016-10-26 03:13:50 -0600 )

i won't get an account there just to see your code, sorry.

(can't you just paste your code here, or make a short, reproducible example?)

berak ( 2016-10-26 03:39:21 -0600 )

I double checked and it is accessible here, even if you don't have an account: https://bitbucket.org/BenNG/sudoku-re...

BenNG ( 2016-10-26 07:39:31 -0600 )

yea, right. still, it's better to have the relevant part of your code here..

(nope, no one will parse your whole repo to understand a problem)

berak ( 2016-10-26 08:11:56 -0600 )

Well, you are right that it's better to have the code on the website, but it started to get big and I thought it was better for the reader to have all the code there. As for your answer, I don't understand what it means.

BenNG ( 2016-10-26 08:26:07 -0600 )

I added some code !

BenNG ( 2016-10-26 08:51:28 -0600 )

cool, i promise to test it later !

berak ( 2016-10-26 08:53:49 -0600 )

thx a lot :)

BenNG ( 2016-10-26 08:56:42 -0600 )

Do you think I can train the cnn with a CV_8UC1 Mat? I think there is a problem between the type of the training data and the data I extract from a puzzle

BenNG ( 2016-10-26 12:27:58 -0600 )

idk, what kind of cnn you're trying there, but yes, you'll have to convert to float for any version i know.

berak ( 2016-10-26 12:34:39 -0600 )

That's annoying, I don't know what to do now ... It works but it sucks ... Is kNN efficient for this use case?

BenNG ( 2016-10-26 13:19:40 -0600 )