character recognition

asked 2016-09-21 02:34:39 -0600

BenNG

updated 2016-09-23 01:25:11 -0600

I have almost reached my goal, which is to extract the data from a sudoku puzzle. So far I have:

  • found the 4 vertices of the puzzle in a picture
  • isolated the puzzle in a new Mat

Right now, I have images like these, each representing one cell of the puzzle:

[14 images of individual puzzle cells]

For the next step, I would like to "focus on the middle" of each cell, because I think the OCR engine focuses on the border. Is there a way to focus on the middle, or will I have to find the contours and keep the one closest to the middle? What would you do as the next step?

I know that there are a few sudoku grabber articles on the web, but I would rather not copy and paste without understanding.

Thank you!


2 answers


answered 2016-09-23 05:21:28 -0600

BenNG

I finally made it!

I pre-processed the cell images like this:

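// remove noise (note: OpenCV documents ksize as odd and greater than 1,
// so a value of 1 likely does no filtering; try 3 for a real median blur)
medianBlur(cell, cell_no_noise, 1);
// remove background/light (removeLight and calculateLightPattern are helper
// functions from the project source, not part of OpenCV)
cell_no_light = removeLight(cell_no_noise, calculateLightPattern(cell), 2);
// binarize image: inverted adaptive mean threshold, 3x3 neighbourhood, C = 1
adaptiveThreshold(cell_no_light, final_cell, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY_INV, 3, 1);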
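To make the binarization step less of a black box, here is a minimal sketch of what an inverted adaptive mean threshold computes, written in plain C++ without OpenCV. The `GrayImage` struct and the `adaptiveMeanThresholdInv` function are hypothetical names made up for this illustration, and the edge handling (repeating the nearest edge pixel) is an assumption; the real `adaptiveThreshold` is faster and may round slightly differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major 8-bit grayscale image (hypothetical stand-in for cv::Mat).
struct GrayImage {
    int rows;
    int cols;
    std::vector<std::uint8_t> data;  // size == rows * cols

    // Read a pixel; coordinates outside the image are clamped so that
    // the nearest edge pixel is repeated (assumed border handling).
    std::uint8_t at(int r, int c) const {
        r = std::max(0, std::min(r, rows - 1));
        c = std::max(0, std::min(c, cols - 1));
        return data[static_cast<std::size_t>(r) * cols + c];
    }
};

// A pixel becomes maxValue when it is NOT brighter than
// (mean of its blockSize x blockSize neighbourhood - C), otherwise 0.
GrayImage adaptiveMeanThresholdInv(const GrayImage& src, int blockSize, double C,
                                   std::uint8_t maxValue = 255) {
    GrayImage dst{src.rows, src.cols,
                  std::vector<std::uint8_t>(src.data.size(), 0)};
    const int half = blockSize / 2;
    const double count = static_cast<double>(blockSize) * blockSize;
    for (int r = 0; r < src.rows; ++r) {
        for (int c = 0; c < src.cols; ++c) {
            double sum = 0.0;
            for (int dr = -half; dr <= half; ++dr) {
                for (int dc = -half; dc <= half; ++dc) {
                    sum += src.at(r + dr, c + dc);
                }
            }
            const double threshold = sum / count - C;
            if (src.at(r, c) <= threshold) {
                dst.data[static_cast<std::size_t>(r) * src.cols + c] = maxValue;
            }
        }
    }
    return dst;
}
```

With a 3x3 block and C = 1, as in the call above, a dark pixel surrounded by brighter ones turns white and uniform regions turn black, which is why digits and grid lines both end up white on a black background.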

After that I tried to identify the areas that I did not want. As it is a grid, the digits are surrounded by lines, so I filtered out areas that are long enough to be grid lines:

int cell_height = cell.rows;
int cell_width = cell.cols;

// setting parameters for long lines filtering
float percent = 0.23f;
float width_threshold = cell_width - cell_width * percent;
float height_threshold = cell_height - cell_height * percent;

// inside the loop over the connected areas: width and height are the
// bounding-box size of the current area; skip it if it spans most of the cell
if(width > width_threshold ) continue;
if(height > height_threshold) continue;

I did the same for areas that are too small or too large:

if(boundingArea < 220 || boundingArea > 900) continue;
if(area < 110) continue; // area of the connected object
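Put together, the two filters above amount to a single yes/no test per connected area. The sketch below restates them as a standalone function, using the same thresholds as in the snippets. The `ComponentStats` struct and the `looksLikeDigit` name are hypothetical, and it assumes that `boundingArea` means bounding-box width times height; with OpenCV the three values would be read from the stats returned by `connectedComponentsWithStats`.

```cpp
// Statistics of one connected area (hypothetical struct for this sketch).
struct ComponentStats {
    int width;   // bounding-box width in pixels
    int height;  // bounding-box height in pixels
    int area;    // number of pixels that belong to the area
};

// Returns true when the area passes every filter, i.e. it is neither a long
// grid line nor too small or too large to be a digit.
bool looksLikeDigit(const ComponentStats& s, int cellWidth, int cellHeight) {
    const float percent = 0.23f;
    const float widthThreshold = cellWidth - cellWidth * percent;
    const float heightThreshold = cellHeight - cellHeight * percent;

    // Areas that span most of the cell are treated as grid lines.
    if (s.width > widthThreshold) return false;
    if (s.height > heightThreshold) return false;

    // Assumption: boundingArea is the area of the bounding box.
    const int boundingArea = s.width * s.height;
    if (boundingArea < 220 || boundingArea > 900) return false;

    // Too few pixels: most likely noise.
    if (s.area < 110) return false;

    return true;
}
```

For a 50x50 cell, for example, an area with a 20x30 bounding box and 300 pixels passes, while a 48-pixel-wide strip is rejected as a line. The fixed pixel limits (220, 900, 110) only make sense for cells of roughly the size used here, so they would need rescaling for other resolutions.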

You will find the source code here if you want to play with it! I would be very thankful if you could find something that improves the project, especially in terms of performance, as I'm not a C++/OpenCV developer.


answered 2016-09-21 02:47:08 -0600

alienmon

updated 2016-09-23 01:10:43 -0600

Maybe you can manually set the regions at the corners and borders of the resulting image to black.


image[0:r, 0:w] = 0

Do it 4 times: for the upper, lower, right, and left borders. That way, you eliminate things at the border and focus on the middle.


Setting the region to black means turning its pixels black, using image[0:r, 0:w] = 0.

The region to set depends on the size of your picture. For example, say your picture has 300 rows and 500 columns.

Now you want to eliminate the upper border, e.g.

image[0:20, 0:500] = 0

This sets all pixels from row 0 up to (but not including) row 20, across columns 0 to 500, to black. That region is the upper border, so this "eliminates" it.

Note that the 20 again depends on your picture's size and on how far in you want to erase. Be careful with this number, so that you do not erase the digit too.


Do this for the left, right, and bottom borders too! In the end, you will have eliminated all the white pixels except the digit itself.
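Since the question is about C++, here is a rough C++ equivalent of the slicing idea that clears all four borders in one pass. It is only a sketch on a plain row-major pixel buffer: the `clearBorder` name and the `margin` parameter are made up for this illustration, and a single margin for all four sides is an assumption (each side could just as well get its own value).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sets a frame of `margin` pixels along all four sides of a row-major
// 8-bit grayscale image to black (0), leaving the middle untouched.
void clearBorder(std::vector<std::uint8_t>& pixels, int rows, int cols, int margin) {
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const bool inBorder = r < margin || r >= rows - margin ||
                                  c < margin || c >= cols - margin;
            if (inBorder) {
                pixels[static_cast<std::size_t>(r) * cols + c] = 0;
            }
        }
    }
}
```

As with the 20 in the example above, the margin has to be chosen per image size: too small leaves grid lines behind, too large cuts into the digit.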



Thanks for your answer, but what does "set the region" mean? Sorry, I'm not a C++/OpenCV developer, so I'm struggling a bit, as you can see! I managed to isolate some digits, but even with that the recognition is really bad. Is it possible to focus only on the digits?

BenNG ( 2016-09-21 04:15:34 -0600 )

@BenNG I edited my answer. I believe it is clear now.

alienmon ( 2016-09-23 01:07:26 -0600 )

Thank you, alienmon, for your answer! Unfortunately I can't rely on this trick too much, because I can have edge cases like the "4" I have just added. I am currently trying to recognize long lines that take up the whole width or height and delete them with the help of "connectedComponentsWithStats".

BenNG ( 2016-09-23 01:29:21 -0600 )
