# Handwritten digit recognition from table

Hi.

The big picture of my project is that I have a paper with multiple tables and I need to recognize those tables and all of their cells. I did this with no problem.

Then, above one of the tables there are 2 boxes, and the person filling in the paper chooses one of them by writing a number from 1 to 4 in it. I managed to locate the box containing the digit, as shown in the attachment (I used thresholding, Canny and contours to find the boxes).

Now what I need to do is recognize the digit in that given box. I tried to train some models on the MNIST dataset, but I don't know how to preprocess my images (the boxes) so that such a model can predict the digit reliably.

Any idea of how I can approach this problem?

Thanks

EDIT: For anyone wondering how I managed to do it: I took the box with the digit inside it. My goal was to make it look as much like an MNIST digit as possible, so I had to get rid of the border. For the border I used 2 structuring elements (one horizontal, one vertical) and created a mask that I applied over the initial image.

import cv2

# image: BGR crop of the box that contains the digit
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)[1]
inv = 255 - thresh  # border and digit become white on black
horizontal_img = inv.copy()
vertical_img = inv.copy()
# keep only long horizontal runs: erosion removes anything smaller than the kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (100, 10))
horizontal_img = cv2.erode(horizontal_img, kernel, iterations=1)
horizontal_img = cv2.dilate(horizontal_img, kernel, iterations=2)
# same idea for long vertical runs
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10, 100))
vertical_img = cv2.erode(vertical_img, kernel, iterations=1)
vertical_img = cv2.dilate(vertical_img, kernel, iterations=3)


Then I set all the pixels within a fixed margin of the image edge to white (the background value), because in some cases a bit of noise remained there.

# image.shape is (rows, cols), i.e. (height, width); image is single-channel here
h, w = image.shape[:2]
threshold = 20  # margin width in pixels
image[:threshold, :] = 255       # top
image[h - threshold:, :] = 255   # bottom
image[:, :threshold] = 255       # left
image[:, w - threshold:] = 255   # right


The last thing I did was train a model on the MNIST dataset, keeping only the digits 1, 2, 3 and 4 (as those were the only ones I needed). The model is a simple linear SVC (I used a HOG transform of the image as the feature vector), which has about 99% accuracy on the MNIST test data.

Unfortunately, the above approach reaches only about 87% accuracy on the 149 test papers I have. There is also a problem with the mask: it sometimes breaks a handwritten 1 by detecting a line where there should not be one.


btw, you probably want to retrain your mnist model on 4 digits only

(2019-12-07 11:14:29 -0500)

basically, 3 steps, (assuming your image is 'ocv'):

# 1. invert it. mnist needs white numbers on black bg
ocv = ~ocv

# 2. crop the inner rectangle:
BH = 19 # border
BV = 21 # border
ocv = ocv[BH:-BH, BV:-BV]

#3. resize to what mnist wants:
ocv = cv2.resize(ocv, (28,28))



@berak

The solution is interesting but it will work only for some of the cases because the ones I need to test my solution on are ... pretty badly written digits. Some examples of processed images are below.

Those problems may appear because of the way I chose to get the boxes:

(2019-12-07 09:18:40 -0500)

Code:

import cv2
import numpy as np

# img: BGR image of the region that holds the two boxes
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)[1]
edges = cv2.Canny(gray, 30, 200)
# OpenCV 4.x returns (contours, hierarchy)
contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

# get bottom region
rect_bottom = cv2.boundingRect(contours[0])
x, y, w, h = rect_bottom
bottom_box = thresh[y:y + h, x:x + w]

# get top region
rect_top = cv2.boundingRect(contours[1])
x, y, w, h = rect_top
top_box = thresh[y:y + h, x:x + w]

# digit_box is a placeholder name for whichever box holds the digit
if np.sum(top_box == 0) > np.sum(bottom_box == 0):
    # more black pixels on top, so the number is in the top box
    digit_box = top_box
else:
    # otherwise it is in the bottom box
    digit_box = bottom_box

(2019-12-07 09:19:50 -0500)

because of the way I chose to get the boxes:

well you never showed us, so there cannot ever be an exact answer to your fairly vague question

(2019-12-07 09:21:07 -0500)

@daneel95, your code is doing the wrong thing. Don't set the threshold value too high; set it between 50 and 127. Also, this line doesn't do anything: if np.sum(top_box == 0) > np.sum(bottom_box == 0):. It should test top_box > 0 and bottom_box > 0 instead.

Snippet code:

for i, ctr in enumerate(sorted_ctrs):
    # Get bounding box
    x, y, w, h = cv2.boundingRect(ctr)

    # Getting ROI
    roi = image[y:y+h, x:x+w]

    #cv2.imshow('segment no:'+str(i),roi)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    if w > 15 and h > 15:
        cv2.imwrite('{}.png'.format(i), roi)

cv2.imshow('Marked Numbers', image)
cv2.waitKey(0)


(2019-12-07 10:09:27 -0500)

@supra56 Thanks for the answer. The problem is not finding the boxes with the numbers or where the number is. The problem is finding the exact number without any noise (which may not even be possible).

I omitted some information because I did not think it was necessary, but here is everything I have:

1. An example of an image I start from: https://i.imgur.com/Ws9UKII.png (used imgur as the forum doesn't let me upload that file ...)
2. The full code I have now: Break the image in 3 parts, get the bottom right one (as it is the one I need to handle right now), try to get the selection: https://pastebin.com/5DyTh8Ln

It is a lot of code with a lot of things not really necessary for the question but it may help.

(2019-12-07 10:28:42 -0500)

So you wanted to put box in 1 and box in x?

(2019-12-07 10:41:10 -0500)

@supra56 Nope, the goal is to score that test example. To do that I need to get the 2 tables in the image: bottom left and bottom right. Bottom left is simple, I can get the X locations no sweat. Bottom right is simple too: I managed to get the table and the X locations with no problem. The trick is that on the bottom right table there is a choice that the person taking the exam must make: choose between the 2 subjects above the table. When you choose between the 2 subjects, you must write 1, 2, 3 or 4 there, depending on the number your exam paper has.

To be able to properly score it I need to know what number there is in the choice.

As you can see I did like 90% of the work but can't properly handle the digit recognition because ...(more)

(2019-12-07 10:50:18 -0500)

As far as I understand, what you are looking for is a multiple choice scanner. Unfortunately, I am unable to help right now because of the Xmas season. I will try if I have time.

(2019-12-07 11:10:13 -0500)

@Supra not really, I managed to do that part. All I need is to properly handle the digit inside the 2 boxes above the right side table :) but thanks for the link.

(2019-12-07 11:15:39 -0500)