Extracting region(s) from parking ticket

OCR

asked 2014-06-07 14:50:47 -0600

tleyden

updated 2014-06-09 10:27:16 -0600

I want to extract several text regions from a parking ticket like this one:

For privacy reasons, I've doctored the image and removed a few fields by drawing a box over each one and deleting its contents (e.g., the Citation and VIN fields both have a white box where the text would normally be).

How would I extract the text of the Date field (04/21/2014) or any of the other fields (Time Issued, Year, etc.)? I want to feed the extracted text into Tesseract OCR so I can index the document on these fields.

I'm new to OpenCV. Can someone point me in the right direction on how to do this?

answered 2014-06-09 16:22:52 -0600

GilLevi

http://gilscvblog.com/

You can use the Scene Text Detection code (ERFilter) in the object detection module (in OpenCV 3.0 and later it lives in the text module of opencv_contrib):

http://docs.opencv.org/trunk/modules/objdetect/doc/erfilter.html

answered 2014-06-09 18:02:56 -0600

Witek

updated 2014-06-10 10:01:42 -0600

1. Scan your ticket with a flatbed scanner, using a black background.

1.1. If you cannot scan it, make sure the ticket is flat and take a picture. You can flatten the ticket by putting it under a sheet of flat glass, but then you need to make sure there are no reflections (shoot at an angle and see 3.1).

2. Threshold your image to get the region of the ticket.

3. Use findContours, then find the minAreaRect of the largest contour and straighten your ticket:

3.1. Get the 4 corner points and warpPerspective the ticket to a predefined size.

3.2. If you scan, you can simply counter-rotate the ticket by RotatedRect.angle. However, if you place the ticket carefully, it should be straight enough that you can skip step 3 entirely.

4. With a straightened ticket of known size, set ROIs to the previously measured regions of interest and pass them to Tesseract.

Comments

The tickets will be photographed with mobile devices. Do you think this approach will still work, or would you recommend a different approach in that case?

tleyden (2014-06-10 09:37:23 -0600)

If the tickets are flat enough, photographed on a dark background, and the camera doesn't distort the image too much, I think it should work.

Witek (2014-06-10 10:01:10 -0600)

answered 2014-06-09 09:09:52 -0600

Haris

It seems the image you provided has an alpha channel and, as you said, the white region is completely transparent (alpha = 0). So just do the following.

Load the source image with its alpha channel by passing flags = -1 (IMREAD_UNCHANGED) to imread().

Now split the image into its four channels:

Mat splitedBGRA[4];
split(src,splitedBGRA);

Note that OpenCV's channel order is BGRA, so the alpha channel is splitedBGRA[3].

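Now threshold the alpha channel with THRESH_BINARY_INV (a threshold value of 0 works), so the fully transparent regions become white (255) and everything else becomes black.
Then find the contours and calculate the bounding box of each contour.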

See the result I got with the above algorithm.

Comments

Haris, thanks so much for your reply. I should have mentioned, though, that I doctored the image to remove a few fields for privacy reasons. I just updated the original question to make it clearer:

I've doctored the image and removed a few fields by drawing a box over each one and deleting its contents (e.g., the Citation and VIN fields both have a white box where the text would normally be).

How would I extract the text of the Date field (04/21/2014) or any of the other fields (Time Issued, Year, etc.)?

tleyden (2014-06-09 10:29:52 -0600)

See the answer here; it might be helpful.

Haris (2014-06-09 11:47:05 -0600)
