
Handwritten digit recognition from table

asked 2019-12-07 06:02:19 -0500

daneel95

updated 2020-01-03 06:50:06 -0500

supra56


The big picture of my project is that I have a paper with multiple tables, and I need to recognize those tables and all of their cells. I did this with no problem.

Then, for one of the tables there is a choice between 2 options, made by writing a number from 1 to 4 in a box above the table. I managed to find the box containing the digit, as shown in the attachment (I used thresholding, Canny and contours to find the boxes).

Now what I need to do is recognize the digit in that box. I tried to train some models on the MNIST dataset, but I don't know how to preprocess my images (the boxes) so that such a model can reliably predict the digit.

Any idea of how I can approach this problem?


extracted 1

extracted 2

extracted 3

EDIT: For anyone wondering how I managed to do it: I took the box with the digit inside it. My goal was to make it look as much as possible like an MNIST digit, so I had to get rid of the border. For the border I used 2 structuring elements (one horizontal, one vertical) to build a mask of the long lines, which I then applied to the thresholded image.

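import cv2
import numpy as np

# `image` is assumed to be the BGR crop of the box (e.g. loaded with cv2.imread)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)[1]
inv = 255 - thresh  # ink and border become white on black
# Long horizontal lines: erode, then dilate with a wide kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (100, 10))
horizontal_img = cv2.erode(inv, kernel, iterations=1)
horizontal_img = cv2.dilate(horizontal_img, kernel, iterations=2)
# Long vertical lines: same idea with a tall kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10, 100))
vertical_img = cv2.erode(inv, kernel, iterations=1)
vertical_img = cv2.dilate(vertical_img, kernel, iterations=3)
# Combine with a bitwise OR; a plain uint8 `+` wraps around where both masks are 255
mask_img = cv2.bitwise_or(horizontal_img, vertical_img)
# Paint the masked lines white in the thresholded image
no_border = cv2.bitwise_or(thresh, mask_img)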

Then I set every pixel within a fixed margin of the edges to white (the background color here), because a bit of noise remained there in some cases.

h, w = image.shape  # single-channel image: shape is (rows, cols)
threshold = 20  # margin width in pixels
image[0:threshold, :] = 255      # top
image[h - threshold:h, :] = 255  # bottom
image[:, 0:threshold] = 255      # left
image[:, w - threshold:w] = 255  # right

The last thing I did was train a model on the MNIST dataset using only the digits 1, 2, 3 and 4 (the only ones I need). The model is a simple Linear SVC (I used HOG features of the image as the feature vector), which has about 99% accuracy on the MNIST test data.
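Restricting the training data to the four digits can be sketched roughly as follows. This is only an illustration, not the code used above: `keep_classes` and the `demo_*` arrays are hypothetical names, and the tiny synthetic arrays stand in for the real MNIST images and labels, which would be loaded separately.

```python
import numpy as np

def keep_classes(images, labels, wanted=(1, 2, 3, 4)):
    """Return only the samples whose label is one of `wanted`."""
    labels = np.asarray(labels)
    mask = np.isin(labels, wanted)
    return np.asarray(images)[mask], labels[mask]

# Synthetic stand-in for MNIST: 6 blank 28x28 images with labels 0..5.
demo_images = np.zeros((6, 28, 28), dtype=np.uint8)
demo_labels = np.arange(6)

sub_images, sub_labels = keep_classes(demo_images, demo_labels)
print(sub_images.shape, sub_labels)
```

The filtered arrays would then go through feature extraction (HOG in this post) before being passed to the classifier.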

Unfortunately, the above approach has only about 87% accuracy on the 149 test papers I have. There is also a problem with the mask: it sometimes breaks a handwritten 1 by detecting a line where there is none.
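One possible way to reduce the risk of erasing a handwritten 1 is a projection-profile check, which is a different technique from the erode/dilate mask above: a row or column is whitened only when almost all of it is ink, which a digit stroke usually is not. The sketch below is untested on the real scans and rests on assumptions: the box is axis-aligned (not skewed), the image is black ink (0) on white (255) like `thresh`, and `remove_border_lines` and the 0.8 cutoff are hypothetical choices.

```python
import numpy as np

def remove_border_lines(binary, min_fraction=0.8):
    """Whiten rows/columns whose share of black pixels is >= min_fraction.

    `binary` is a 2-D uint8 image with black ink (0) on white (255).
    """
    ink = binary == 0
    full_rows = ink.mean(axis=1) >= min_fraction  # rows that are mostly ink
    full_cols = ink.mean(axis=0) >= min_fraction  # columns that are mostly ink
    cleaned = binary.copy()
    cleaned[full_rows, :] = 255
    cleaned[:, full_cols] = 255
    return cleaned

# Synthetic 60x60 box: 3-pixel black frame plus a vertical stroke like a "1".
demo_box = np.full((60, 60), 255, dtype=np.uint8)
demo_box[:3, :] = 0
demo_box[-3:, :] = 0
demo_box[:, :3] = 0
demo_box[:, -3:] = 0
demo_box[15:45, 29:32] = 0  # the stroke covers half of the box height

demo_cleaned = remove_border_lines(demo_box)
print((demo_cleaned == 0).sum())  # number of ink pixels left after cleaning
```

In this synthetic case the stroke covers only part of its columns, so it stays below the cutoff and survives while the frame is removed. A skewed scan or a very tall stroke would need a different cutoff or a deskew step first.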



btw, you probably want to retrain your mnist model on 4 digits only

berak ( 2019-12-07 11:14:29 -0500 )

1 answer


answered 2019-12-07 07:34:28 -0500

berak

basically, 3 steps, (assuming your image is 'ocv'):

# 1. invert it. mnist needs white numbers on black bg
ocv = ~ocv

# 2. crop the inner rectangle:
BH = 19 # border
BV = 21 # border
ocv = ocv[BH:-BH, BV:-BV]

#3. resize to what mnist wants:
ocv = cv2.resize(ocv, (28,28))
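For badly placed or small digits, a fixed crop followed by a direct resize may leave the digit off-center. MNIST digits are size-normalized to fit a 20x20 box and centered by center of mass in the 28x28 image, so a closer match could look like the sketch below. It is not part of the answer above: `to_mnist_layout` and `nearest_resize` are hypothetical helpers, the input is assumed to be already inverted (white ink on black) with the border removed, and the NumPy nearest-neighbour resize is only there to keep the example dependency-free (cv2.resize with cv2.INTER_AREA would be the more natural choice in a real pipeline).

```python
import numpy as np

def nearest_resize(img, new_h, new_w):
    """Nearest-neighbour resize using only NumPy indexing."""
    rows = np.arange(new_h) * img.shape[0] // new_h
    cols = np.arange(new_w) * img.shape[1] // new_w
    return img[rows][:, cols]

def to_mnist_layout(digit, inner=20, size=28):
    """Fit the digit into an inner x inner box, centered in a size x size image.

    `digit`: 2-D uint8 array, white ink on black, border already removed.
    """
    ys, xs = np.nonzero(digit)
    if ys.size == 0:
        return np.zeros((size, size), dtype=np.uint8)  # empty box
    # Crop to the bounding box of the ink.
    crop = digit[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Scale the longer side to `inner` pixels, keeping the aspect ratio.
    scale = inner / max(crop.shape)
    new_h = max(1, round(crop.shape[0] * scale))
    new_w = max(1, round(crop.shape[1] * scale))
    small = nearest_resize(crop, new_h, new_w)
    # Center of mass of the resized digit (geometric center if no ink survived).
    mass = small.astype(float)
    total = mass.sum()
    if total > 0:
        cy = (mass.sum(axis=1) @ np.arange(new_h)) / total
        cx = (mass.sum(axis=0) @ np.arange(new_w)) / total
    else:
        cy, cx = (new_h - 1) / 2, (new_w - 1) / 2
    # Shift so the center of mass lands on the canvas center, staying in bounds.
    top = int(np.clip(round((size - 1) / 2 - cy), 0, size - new_h))
    left = int(np.clip(round((size - 1) / 2 - cx), 0, size - new_w))
    canvas = np.zeros((size, size), dtype=np.uint8)
    canvas[top:top + new_h, left:left + new_w] = small
    return canvas

# Synthetic example: an off-center vertical stroke in a 60x60 box.
demo_digit = np.zeros((60, 60), dtype=np.uint8)
demo_digit[10:50, 40:44] = 255
demo_out = to_mnist_layout(demo_digit)
print(demo_out.shape)
```

The result can be fed to the same feature extraction as the MNIST training images, so train and test inputs share one layout.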






The solution is interesting but it will work only for some of the cases because the ones I need to test my solution on are ... pretty badly written digits. Some examples of processed images are below.





Those problems may appear because of the way I chose to get the boxes:

daneel95 ( 2019-12-07 09:18:40 -0500 )


import cv2
import numpy as np

# `img` is assumed to be the BGR image of the region containing both boxes
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)[1]
edges = cv2.Canny(gray, 30, 200)
contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

# get bottom region
rect_bottom = cv2.boundingRect(contours[0])
x, y, w, h = rect_bottom
bottom_box = thresh[y: y + h, x: x + w]

# get top region
rect_top = cv2.boundingRect(contours[1])
x, y, w, h = rect_top
top_box = thresh[y: y + h, x: x + w]

if np.sum(top_box == 0) > np.sum(bottom_box == 0):
    # more black pixels on top, so the number is in the top box
    digit_box = top_box  # `digit_box` is an illustrative name, not from the original code
else:
    # otherwise it is in the bottom box
    digit_box = bottom_box

daneel95 ( 2019-12-07 09:19:50 -0500 )

because of the way I chose to get the boxes:

well you never showed us, so there cannot ever be an exact answer to your fairly vague question

berak ( 2019-12-07 09:21:07 -0500 )

@daneel95, your code is doing the wrong thing. Don't set the threshold value too high; set it between 50 and 127. Also, this doesn't do anything: if np.sum(top_box == 0) > np.sum(bottom_box == 0):. It should be top_box > 0 and bottom_box > 0:

Snippet code:

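# sorted_ctrs is not defined in this snippet; it is assumed to be a list of
# contours from cv2.findContours, and `image` the image they were found in
for i, ctr in enumerate(sorted_ctrs):
    # Get bounding box
    x, y, w, h = cv2.boundingRect(ctr)

    # Getting ROI
    roi = image[y:y+h, x:x+w]

    #cv2.imshow('segment no:'+str(i),roi)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Save only regions large enough to hold a digit
    if w > 15 and h > 15:
        cv2.imwrite('{}.png'.format(i), roi)

cv2.imshow('Marked Numbers', image)
cv2.waitKey(0)  # keep the window open until a key is pressed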

As @berak said, you never showed us, so sadly we can't help you.

supra56 ( 2019-12-07 10:09:27 -0500 )

@supra56 Thanks for the answer. The problem is not finding the boxes with the numbers or where the number is. The problem is finding the exact number without any noise (which may not even be possible).

Now, I omitted some information because I thought it was not necessary, but I will add here all the information I have:

  1. An example of an image I start from: (used imgur as the forum doesn't let me upload that file ...)
  2. The full code I have now: Break the image in 3 parts, get the bottom right one (as it is the one I need to handle right now), try to get the selection:

It is a lot of code with a lot of things not really necessary for the question but it may help.

daneel95 ( 2019-12-07 10:28:42 -0500 )

So you wanted to put box in 1 and box in x?

supra56 ( 2019-12-07 10:41:10 -0500 )

@supra56 Nope, the goal is to score that test example. To do that I need to get the 2 tables in the image: bottom left and bottom right. Bottom left is simple, I can get the X locations no sweat. Bottom right is simple too: I managed to get the table and the X locations no problem. The trick is that for the bottom right table there is a choice that the person taking the exam must make: choose between the 2 subjects above the table. When you choose between the 2 subjects, you must write 1, 2, 3 or 4 there, depending on the number your exam paper has.

To be able to properly score it I need to know what number there is in the choice.

As you can see I did like 90% of the work but can't properly handle the digit recognition because ...(more)

daneel95 ( 2019-12-07 10:50:18 -0500 )

As far as I understand, what you are looking for is a multiple choice scanner. Unfortunately, I'm unable to help right now because of the Xmas season. I will try if I have time.

supra56 ( 2019-12-07 11:10:13 -0500 )

@Supra not really, I managed to do that part. All I need is to properly handle the digit inside the 2 boxes above the right side table :) but thanks for the link.

daneel95 ( 2019-12-07 11:15:39 -0500 )