Extracting data from scanned documents from predefined tables with known position

asked 2018-07-23 13:54:16 -0600

I have a table, for example, a 5x5 grid on a piece of paper. In each box in that grid there is something written (doesn't matter what).

What I need is to extract the data from that grid. I created the grid/table myself, so I know exactly its position, its rotation, etc. I have read some material and a few questions on SO, so I know the basic steps to recognise and extract data.

I would like to know: can the fact that I know the position of the grid/table (since I created it) help me in the process of recognising it?
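To make the idea concrete, here is a minimal sketch of how a known layout could be used directly: if the grid's origin and size in the scan are known, each cell's bounding box follows from arithmetic alone, with no detection step. The function name `grid_cell_boxes`, its parameters, and the pixel values are hypothetical, and the sketch assumes the scan has already been aligned so that the grid is axis-aligned (a known rotation would have to be undone first).

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def grid_cell_boxes(origin_x: int, origin_y: int, grid_w: int, grid_h: int,
                    rows: int = 5, cols: int = 5, margin: int = 0) -> List[List[Box]]:
    """Compute one box per cell of an axis-aligned grid with a known position.

    `margin` shrinks each box on every side so the printed grid lines
    are not included in the crop (an assumption about the layout).
    """
    boxes: List[List[Box]] = []
    for r in range(rows):
        # Cell edges are computed from the grid size, so rounding never drifts.
        y0 = origin_y + round(r * grid_h / rows)
        y1 = origin_y + round((r + 1) * grid_h / rows)
        row: List[Box] = []
        for c in range(cols):
            x0 = origin_x + round(c * grid_w / cols)
            x1 = origin_x + round((c + 1) * grid_w / cols)
            row.append((x0 + margin, y0 + margin,
                        x1 - x0 - 2 * margin, y1 - y0 - 2 * margin))
        boxes.append(row)
    return boxes


# Hypothetical layout: a 500x500 px grid whose top-left corner is at (100, 200).
boxes = grid_cell_boxes(100, 200, 500, 500, margin=4)
```

With the scan loaded as a NumPy array `img`, a cell could then be cropped as `img[y:y + h, x:x + w]` for each `(x, y, w, h)` in `boxes`, and each crop passed on to whatever recognition step is used.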

Also, a small additional question. I noticed that when creating a binary image, people mostly create an inverted black-and-white image (i.e. black text becomes white, paper becomes black). Why? I know it probably helps the detection, but I am interested in the reasons. I cannot see why the value 0 would be better than 255 or vice versa. Both are extremes, just opposite. Maybe you can point to some papers or similar.



e.g. findContours() expects white things on black bg

berak ( 2018-07-23 23:59:23 -0600 )

As @berak mentioned. Also, I assume the reason 255 (white) is used for data is probably the similarity to binary data: 0 = no data, 1 = data. So in this case black means no data.

MikeyR ( 2018-07-25 08:50:42 -0600 )
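The convention described in the two comments above (nonzero = data, zero = background) can be illustrated with a toy example. This is a pure-Python sketch, not OpenCV's implementation: the helper `count_foreground_blobs` is hypothetical and simply counts 4-connected regions of nonzero pixels. On a normal scan the "foreground" it finds is the paper; after inversion it is the ink. In OpenCV itself the inversion is typically done at threshold time, for example with the `cv2.THRESH_BINARY_INV` flag.

```python
from typing import List


def count_foreground_blobs(img: List[List[int]]) -> int:
    """Count 4-connected regions of nonzero pixels (nonzero = foreground)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] == 0 or seen[sy][sx]:
                continue
            # New unvisited foreground pixel: flood-fill its whole region.
            blobs += 1
            seen[sy][sx] = True
            stack = [(sy, sx)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] != 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return blobs


# Toy "scan": 255 = white paper, 0 = black ink (two separate marks).
page = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255,   0, 255],
    [255,   0,   0, 255, 255,   0, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
inverted = [[255 - v for v in row] for row in page]

blobs_plain = count_foreground_blobs(page)         # the paper is one big region
blobs_inverted = count_foreground_blobs(inverted)  # each ink mark is its own region
```

Here `blobs_plain` is 1 and `blobs_inverted` is 2: only the inverted image yields one region per written mark, which is what a contour or blob step is usually after.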