How can I extract handwritten text from lined paper without the noise caused by the lines to use in a text detection algorithm?

asked 2012-10-03 10:31:52 -0600

ZachTM

I have been using a single test sheet of paper while learning OpenCV. So far I have taken an image like this: [image]

I identify the corners of the page and perform a perspective transform. Then I subtract a mean-shift-filtered version of the image from the original: [image]

This gives me a white page with very little shadow: [image]

This fixed the problems I was having with adaptive thresholding. The paper can be any size and in any lighting, so I think this step really cleaned up a lot of the problems I was having.

The big problem I have now is that some images contain ruled lines, which cause a lot of noise, as shown here: [image]

I think I can get rid of all the other noise easily, but when a lot of the letters on the page are very small (fitting in between the lines on the page), it becomes really hard to separate everything. In an ideal world I would like to have just a blank background with all of the written letters and symbols on it, and none of the noise and lines. I do not know if this is possible, but if anyone has an idea of how I can get closer to that, I would be extremely grateful. Each letter will always be separated from the next by whitespace, so a complex text recognition algorithm would not be needed in this case if I manage to remove all the noise. Thanks for your time!

I cannot see the images. Is anybody else having this problem?

elmiguelao (2012-10-04 08:23:34 -0600)

They are hosted publicly on Dropbox, so I'm pretty sure this is just on your end.

ZachTM (2012-10-04 16:27:12 -0600)

1 answer

answered 2012-10-04 08:56:17 -0600

Ben

It is hard to extract information from images as noisy as your last binarized one. Basically, what you need is a clever binarization algorithm. You could try adaptiveThreshold() or Otsu's method, for example. The problem is that your image has very weak contrast, and having lines on the paper doesn't make it any easier, of course. If you used a thicker pen, you might be able to get rid of the lines with erode().

I am new to computer vision, so I really wasn't sure. You sound right, so I will mark this as correct. I am going to try other methods. Thank you for pointing me in a new direction!

ZachTM (2012-10-04 16:31:07 -0600)


Seen: 2,436 times

Last updated: Oct 04 '12