# character recognition

I have almost reached my goal, which is to extract data from a sudoku puzzle.

• I found the 4 vertices of the puzzle in a picture
• I isolated the puzzle in a new Mat

Right now, I have images like these, each representing one cell of the puzzle:

For the next step, I would like to "focus on the middle" of each cell, because I think the OCR engine is picking up the border. Is there a way to focus on the middle, or will I have to find contours and keep the contour closest to the middle? What would your next step be?

I know there are a few sudoku grabber articles on the web, but I would rather not copy/paste without understanding.

Thank you !


I pre-processed the cell images like this:

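// remove noise
// (note: a kernel size of 1 leaves the image unchanged; medianBlur expects an odd size greater than 1, e.g. 3)
medianBlur(cell, cell_no_noise, 1);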
// remove background/light
cell_no_light = removeLight(cell_no_noise, calculateLightPattern(cell),2);
// binarize image


After that, I tried to identify the areas that I did not want. Because it is a grid, the numbers are surrounded by lines, so I filtered out the areas that are long enough to be grid lines:

int cell_height = cell.rows;
int cell_width = cell.cols;

// setting parameters for long lines filtering
float percent = 0.23;
float width_threshold = cell_width - cell_width * percent;
float height_threshold = cell_height - cell_height * percent;

// skip areas that span almost the whole cell: these are grid lines, not digits
if (width > width_threshold) continue;
if (height > height_threshold) continue;
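
The same test can be written as a small standalone predicate. This is only a sketch in plain Python (no OpenCV), assuming `width` and `height` are the bounding-box size of one connected area; the function name `is_grid_line` and the sample boxes are made up for illustration.

```python
def is_grid_line(width, height, cell_width, cell_height, percent=0.23):
    """Return True when a bounding box is long enough to be a grid line."""
    # Same thresholds as above: the cell size minus `percent` of it.
    width_threshold = cell_width - cell_width * percent
    height_threshold = cell_height - cell_height * percent
    return width > width_threshold or height > height_threshold


# Hypothetical (width, height) bounding boxes found in a 50x50 cell.
cell_width, cell_height = 50, 50
boxes = [(48, 3), (4, 47), (18, 26)]

# Keep only the boxes that are not long horizontal or vertical lines.
digit_candidates = [
    box for box in boxes
    if not is_grid_line(box[0], box[1], cell_width, cell_height)
]
```

With `percent = 0.23`, anything wider or taller than about 77% of the cell is treated as a line, so a digit that fills most of the cell would also be rejected; the value probably needs tuning per image size.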


I did the same for areas that are too small or too large to be a digit:

if(boundingArea < 220 || boundingArea > 900) continue;
if(area < 110) continue; // area of the connected object
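
To show where values such as `width`, `height`, `boundingArea` and `area` can come from, here is a rough plain-Python stand-in for a connected-components pass with statistics. It is a sketch, not the project's code: it assumes the cell is a binarized list of rows (0 = black, non-zero = white) and uses 4-connectivity for brevity. The names `component_stats` and `keep_by_area` are hypothetical; the default thresholds are the ones quoted above.

```python
from collections import deque


def component_stats(binary):
    """Label 4-connected white regions; return bounding box and pixel count."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    stats = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited white pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right, area = r, r, c, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            stats.append({
                "left": left, "top": top,
                "width": right - left + 1, "height": bottom - top + 1,
                "area": area,
            })
    return stats


def keep_by_area(stats, min_box=220, max_box=900, min_area=110):
    """Drop components whose bounding box or pixel count is out of range."""
    kept = []
    for s in stats:
        bounding_area = s["width"] * s["height"]
        if bounding_area < min_box or bounding_area > max_box:
            continue
        if s["area"] < min_area:
            continue
        kept.append(s)
    return kept


# Tiny made-up image, with thresholds scaled down to match its size.
tiny = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
stats = component_stats(tiny)
kept = keep_by_area(stats, min_box=3, max_box=4, min_area=3)
```

The fixed pixel thresholds (220, 900, 110) only make sense for one cell size, so expressing them as a fraction of the cell area may be more robust if the input resolution varies.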


You will find the source code here if you want to play with it! I would be very thankful if you could find something that improves the project, especially in terms of performance, as I'm not a C++/OpenCV developer.


Maybe you can manually set the corner and border regions of the resulting image to black.

e.g.

image[0:r, 0:w] = 0

Do it 4 times: for the upper, lower, right, and left borders. That way, you eliminate whatever is at the border and focus on the middle.

EDIT:

"Set the region to black" means turning its pixels black, using image[0:r, 0:w] = 0

The region to set depends on the size of your picture. For example, suppose your picture has 300 rows and 500 columns.

Now you want to eliminate the upper border, e.g.

image[0:20,0:500]=0


This sets every pixel in rows 0 to 19 and columns 0 to 499 to black. That strip is the upper border, so this "eliminates" the upper border.

Note that the 20 again depends on your picture's size and on how far in you want to eliminate. Be careful with this number so that you do not eliminate the digit too.

Do this for the left, right, and bottom borders too! At the end, you will have eliminated all the white pixels except the digit itself.
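
Putting the four steps together, a minimal sketch could look like this. It uses plain Python lists so it runs without any library; `clear_border` and `margin` are names made up for this example. With a NumPy image, the equivalent would presumably be four slice assignments such as `image[:m, :] = 0`, `image[-m:, :] = 0`, `image[:, :m] = 0` and `image[:, -m:] = 0`.

```python
def clear_border(image, margin):
    """Return a copy of `image` (a list of rows) with a frame of
    `margin` pixels on every side set to 0 (black)."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]  # copy, so the input stays untouched
    for r in range(rows):
        for c in range(cols):
            in_frame = (r < margin or r >= rows - margin
                        or c < margin or c >= cols - margin)
            if in_frame:
                result[r][c] = 0
    return result


# Made-up 6x6 all-white cell; after clearing, only the 2x2 centre stays white.
cell = [[255] * 6 for _ in range(6)]
cleaned = clear_border(cell, 2)
```

A fixed margin is simple, but it will cut into any digit that touches the edge of the cell, so it is worth checking a few cells visually before relying on it.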


Thanks for your answer, but what does "set the region" mean? Sorry, I'm not a C++/OpenCV developer, so I'm struggling a bit, as you can see! I managed to isolate some numbers, but even with that the recognition is really bad. Is it possible to focus only on the numbers?

( 2016-09-21 04:15:34 -0500 )

@BenNG I edited my answer. I believe it is clear now.

( 2016-09-23 01:07:26 -0500 )

Thank you Alienmon for your answer! Unfortunately, I can't rely on this trick too much because I can have edge cases like the "4" I have just added. I am currently trying to detect long lines that span the whole width or height and delete them with the help of "connectedComponentsWithStats".

( 2016-09-23 01:29:21 -0500 )
