Why is my knn classifier so bad?
Hello everyone !
I used a k-Nearest Neighbors algorithm (knn) and trained it on the MNIST database.
Here is the code for the training:
Ptr<ml::KNearest> getKnn()
{
    Ptr<ml::KNearest> knn(ml::KNearest::create());
    FILE *fp = fopen("/keep/Repo/USELESS/_sandbox/cpp/learning-cpp/sudoku/assets/train-images-idx3-ubyte", "rb");
    FILE *fp2 = fopen("/keep/Repo/USELESS/_sandbox/cpp/learning-cpp/sudoku/assets/train-labels-idx1-ubyte", "rb");
    if (!fp || !fp2)
    {
        cout << "can't open file" << endl;
        if (fp) fclose(fp);
        if (fp2) fclose(fp2);
        return knn; // nothing to train on: the returned model is untrained
    }
    // Image file header: magic number, image count, rows, cols (big-endian)
    int magicNumber = readFlippedInteger(fp);
    int numImages = readFlippedInteger(fp);
    int numRows = readFlippedInteger(fp);
    int numCols = readFlippedInteger(fp);
    // Skip the 8-byte label file header (magic number + item count)
    fseek(fp2, 0x08, SEEK_SET);
    int size = numRows * numCols;
    cout << "size: " << size << endl;
    cout << "rows: " << numRows << endl;
    cout << "cols: " << numCols << endl;
    // One row per image, one column per pixel
    Mat_<float> trainFeatures(numImages, size);
    Mat_<int> trainLabels(1, numImages);
    BYTE *temp = new BYTE[size];
    BYTE tempClass = 0;
    for (int i = 0; i < numImages; i++)
    {
        fread((void *)temp, size, 1, fp);
        fread((void *)(&tempClass), sizeof(BYTE), 1, fp2);
        trainLabels[0][i] = (int)tempClass;
        for (int k = 0; k < size; k++)
        {
            trainFeatures[i][k] = (float)temp[k];
        }
    }
    delete[] temp;
    fclose(fp);
    fclose(fp2);
    knn->train(trainFeatures, ml::ROW_SAMPLE, trainLabels);
    return knn;
}
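readFlippedInteger is not shown in this post. The MNIST IDX headers store their 32-bit integers most significant byte first, so a minimal sketch of such a helper could look like the following. The name decodeBigEndian32 and the -1 return value on a short read are assumptions made for illustration, not code taken from the project.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: combine 4 big-endian bytes into a host integer.
int decodeBigEndian32(const unsigned char *bytes)
{
    assert(bytes != nullptr);
    std::uint32_t value = (static_cast<std::uint32_t>(bytes[0]) << 24) |
                          (static_cast<std::uint32_t>(bytes[1]) << 16) |
                          (static_cast<std::uint32_t>(bytes[2]) << 8) |
                          static_cast<std::uint32_t>(bytes[3]);
    return static_cast<int>(value);
}

// Sketch of readFlippedInteger: reads the next 4 bytes of the file as a
// big-endian integer. Returns -1 (an assumed convention) on a short read.
int readFlippedInteger(std::FILE *fp)
{
    unsigned char bytes[4];
    if (std::fread(bytes, 1, sizeof(bytes), fp) != sizeof(bytes))
    {
        return -1;
    }
    return decodeBigEndian32(bytes);
}
```

With this sketch, the first integer of an MNIST image file (bytes 00 00 08 03) decodes to 2051, which is a quick way to check that the header is being read in the right byte order.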
When I test the algorithm on the 10k test images that MNIST provides, I get Accuracy: 96.910000, which is good news :) Here is the code that tests the trained knn:
void testKnn(Ptr<ml::KNearest> knn, bool debug)
{
    int totalCorrect = 0;
    FILE *fp = fopen("/keep/Repo/USELESS/_sandbox/cpp/learning-cpp/sudoku/assets/t10k-images-idx3-ubyte", "rb");
    FILE *fp2 = fopen("/keep/Repo/USELESS/_sandbox/cpp/learning-cpp/sudoku/assets/t10k-labels-idx1-ubyte", "rb");
    if (!fp || !fp2)
    {
        cout << "can't open file" << endl;
        if (fp) fclose(fp);
        if (fp2) fclose(fp2);
        return;
    }
    // Image file header: magic number, image count, rows, cols (big-endian)
    int magicNumber = readFlippedInteger(fp);
    int numImages = readFlippedInteger(fp);
    int numRows = readFlippedInteger(fp);
    int numCols = readFlippedInteger(fp);
    // Skip the 8-byte label file header (magic number + item count)
    fseek(fp2, 0x08, SEEK_SET);
    int size = numRows * numCols;
    Mat_<float> testFeatures(numImages, size);
    Mat_<int> expectedLabels(1, numImages);
    BYTE *temp = new BYTE[size];
    BYTE tempClass = 0;
    int K = 1;
    Mat response, dist, m;
    for (int i = 0; i < numImages; i++)
    {
        // Progress indicator every 1000 images
        if (i % 1000 == 0 && i != 0)
        {
            cout << i << endl;
        }
        fread((void *)temp, size, 1, fp);
        fread((void *)(&tempClass), sizeof(BYTE), 1, fp2);
        expectedLabels[0][i] = (int)tempClass;
        for (int k = 0; k < size; k++)
        {
            testFeatures[i][k] = (float)temp[k];
        }
        // Row i as a single sample; in debug mode it is also used to check
        // that createMatFromMNIST and createMatToMNIST work correctly.
        m = testFeatures.row(i);
        knn->findNearest(m, K, noArray(), response, dist);
        if (debug)
        {
            cout << "response: " << response << endl;
            cout << "dist: " << dist << endl;
            Mat m2 = createMatFromMNIST(m);
            showImage(m2);
            // Mat m3 = createMatToMNIST(m2);
            // showImage(m3);
        }
        if (expectedLabels[0][i] == response.at<float>(0))
        {
            totalCorrect++;
        }
    }
    delete[] temp;
    fclose(fp);
    fclose(fp2);
    printf("Accuracy: %f ", (double)totalCorrect * 100 / (double)numImages);
}
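For reference, with K = 1 the prediction is simply the label of the closest training sample. Below is a minimal standalone sketch of that idea, assuming a brute-force search with squared Euclidean distance over raw pixel values. The function nearestLabel and its parameters are hypothetical names for illustration; this is not the OpenCV implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical standalone 1-NN: returns the label of the training row that is
// closest to `query` (squared Euclidean distance), or -1 if there are no rows.
int nearestLabel(const std::vector<std::vector<float>> &trainFeatures,
                 const std::vector<int> &trainLabels,
                 const std::vector<float> &query)
{
    assert(trainFeatures.size() == trainLabels.size());
    int bestLabel = -1;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < trainFeatures.size(); i++)
    {
        assert(trainFeatures[i].size() == query.size());
        float dist = 0.0f;
        for (std::size_t k = 0; k < query.size(); k++)
        {
            float d = trainFeatures[i][k] - query[k];
            dist += d * d;
        }
        // Keep the closest row seen so far
        if (dist < bestDist)
        {
            bestDist = dist;
            bestLabel = trainLabels[i];
        }
    }
    return bestLabel;
}
```

Because the distance is computed directly on pixel values, a query is only comparable to the training rows if it has the same size and intensity range as they do.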
By the way, you can test the knn I have implemented in my project here (see the actions part): https://bitbucket.org/BenNG/sudoku-recognizer
But when I run the algorithm on my own data, it behaves badly.
What data do I give to the algorithm?
To answer that, I will briefly present my project, which is a sudoku grabber. Given a picture that contains a sudoku, I am able to find the puzzle and extract it, and then extract every cell in the puzzle. Each cell is preprocessed before I send it to the knn. By ...
knn != network. can you start by making that clearer and removing the misinterpretation?
your code is behind a login (not accessible)
Hello, I made some modifications to fix the bad wording. You were right, the code was private, but maybe you still have to log in to Bitbucket to see the code? Thank you
i won't get an account there, just to see your code, sorry.
(can't you just paste your code here, or make a short, reproducible example?)
I double-checked and it is accessible here! https://bitbucket.org/BenNG/sudoku-re... even if you don't have an account!
yea, right. still, it's better to have the relevant part of your code here..
(nope, no one will parse your whole repo to understand a problem)
Well, you are right about the code: it's better to have it on the website, but it has started to get big and I thought it was better for the reader to have all the code. As for your answer, I don't understand what it means.
I added some code !
cool, i promise to test it later !
thx a lot :)