Scanning Dates and Pertinent Numbers from Degraded Documents

asked 2018-03-21 11:41:44 -0600

Joaqim

updated 2018-03-21 11:48:41 -0600

Hi, I have a task digitizing old scanned documents (filled-in forms that are hand-written, typewritten, or stamped with the relevant dates). It would be tedious to open every document and read the dates manually.

I'm hoping to benefit from the fact that A LOT of the documents are already labeled with the relevant dates. The documents follow a pattern by time period; the forms change over time (they were filled in between 1920 and roughly 1985), but there are 3-4 major form-fill patterns.

The stamps are placed arbitrarily on most pages, most often overlapping whatever is underneath, which makes them tricky to process. I've succeeded in detecting stamps on the page as long as they don't intersect with anything. This makes a good start, since I can now process large amounts of mostly legible stamps and assign labels.
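To make the "isolated stamp" case concrete, here is a minimal sketch of one possible approach, not necessarily the one used above: label connected ink blobs on a binarized page and keep only those whose bounding box is roughly stamp-sized. The function name `find_isolated_blobs`, the 0/1 page representation, and the size limits are all assumptions made for illustration. A stamp that touches other ink merges into a larger blob and is rejected by the size filter, which matches the limitation described above.

```python
from collections import deque

def find_isolated_blobs(page, min_w, min_h, max_w, max_h):
    """Return (left, top, width, height) boxes of 8-connected ink blobs
    whose size lies in the expected stamp range.

    `page` is a list of rows of 0/1 values, where 1 means ink
    (a hypothetical, already binarized scan)."""
    rows, cols = len(page), len(page[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if page[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one connected component, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and page[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            w, h = right - left + 1, bottom - top + 1
            # Keep only blobs that are roughly stamp-sized.
            if min_w <= w <= max_w and min_h <= h <= max_h:
                boxes.append((left, top, w, h))
    return boxes
```

With OpenCV, `cv2.connectedComponentsWithStats` returns the same kind of per-blob bounding boxes directly from a binarized image, so the pure-Python flood fill here is only meant to show the idea.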

There are maybe 5 different stamps used over different time periods.

I have high hopes of successfully using an SVM for handwriting and digit recognition. I'm just hoping for advice on how to use it to train a neural network to learn where the digits are on the page, or their context (like a stamp).

The documents contain multiple dates, many of which (i.e. the irrelevant ones) can easily be discarded afterwards. I just need to find as many legible digits as possible, each in its context, so I can decipher the relevant information.
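As a rough illustration of the "discard afterwards" step, the sketch below filters recognized text for dates that fall inside the 1920 to 1985 range mentioned earlier. The day-month-year order, the separators, and the name `plausible_dates` are assumptions; the actual forms may use a different layout, in which case the pattern would need adjusting.

```python
import re
from datetime import date

# Assumed format: day, month, 4-digit year, separated by '.', '-', '/' or a space.
DATE_RE = re.compile(r"\b(\d{1,2})[.\-/ ](\d{1,2})[.\-/ ](\d{4})\b")

def plausible_dates(text, first_year=1920, last_year=1985):
    """Return the dates found in `text` that are valid calendar dates
    and whose year lies within [first_year, last_year]."""
    found = []
    for day, month, year in DATE_RE.findall(text):
        try:
            d = date(int(year), int(month), int(day))
        except ValueError:
            # Impossible date (e.g. month 13), likely a misread digit.
            continue
        if first_year <= d.year <= last_year:
            found.append(d)
    return found
```

Rejecting impossible dates this way also gives a cheap sanity check on the digit recognizer: a string that parses to month 13 or day 32 almost certainly contains a misread digit.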

Example of one kind of form-fill (not a stamp): [image]

Degraded stamp, hand-written dates: [image]

Degraded stamp, not legible by machine: [image]


please add images

berak ( 2018-03-21 11:44:18 -0600 )