
Extracting region(s) from parking ticket

asked 2014-06-07 14:50:47 -0600

tleyden

updated 2014-06-09 10:27:16 -0600

I want to extract several regions containing text from a parking ticket like this one:

http://cl.ly/image/450h2z3t3j17

For personal privacy reasons, I've doctored the image and removed a few fields by drawing a box around each and deleting the contents (e.g., the Citation and VIN fields both have a white box where the text would normally be).

How would I extract the text for the Date field (04/21/2014) or any of the other fields (Time Issued, Year, etc.)? I want to extract the text so I can feed it into Tesseract OCR and index the document on these fields.

I'm new to OpenCV. Can someone point me in the right direction on how to do this?


3 answers


answered 2014-06-09 16:22:52 -0600

You can use the "Scene Text Detection" code under the object detection module:

http://docs.opencv.org/trunk/modules/objdetect/doc/erfilter.html


answered 2014-06-09 18:02:56 -0600

Witek

updated 2014-06-10 10:01:42 -0600

1. Scan your ticket with a flatbed scanner against a black background.

1.1. If you cannot scan it, make sure the ticket is flat and take a picture. You can flatten the ticket by putting it under a sheet of glass, but then you need to make sure there are no reflections (shoot at an angle and see 3.1).

2. Threshold your image to get the region of your ticket.

3. Use findContours, then find the minAreaRect of the largest contour and straighten your ticket:

3.1. Get the 4 corner points and warpPerspective your ticket to a predefined size.

3.2. If you scan, you can simply counter-rotate your ticket by your RotatedRect.angle. However, if you place the ticket carefully, it should be straight enough that you can skip step 3 entirely.

4. Once you have a straight ticket of known size, set ROIs to the regions of interest you measured beforehand and pass them to Tesseract.


Comments

The tickets will be photographed with mobile devices. Do you think this approach will still work, or would you recommend a different approach in that case?

tleyden (2014-06-10 09:37:23 -0600)

If the tickets are flat enough, photographed on a dark background, and the camera does not distort the image too much, I think it should work.

Witek (2014-06-10 10:01:10 -0600)

answered 2014-06-09 09:09:52 -0600

Haris

It seems the image you provided has an alpha channel, and the white regions you mentioned are completely transparent (alpha = 0), so just do the following:

  • Load the source with its alpha channel by passing flags = -1 to imread().

  • Split the image into its four channels:

    Mat splitedBGRA[4];
    split(src, splitedBGRA);  // splitedBGRA[3] holds the alpha channel


Note that OpenCV channel order is BGRA.

See the result I got with the above algorithm.



Comments

Haris, thanks so much for your reply. I should have mentioned, though, that I doctored the image to remove a few fields for personal privacy reasons. I just updated the original question to make this clearer:

I've doctored the image and removed a few fields by drawing a box around each and deleting the contents (e.g., the Citation and VIN fields both have a white box where the text would normally be).

How would I extract the text for the Date field (04/21/2014) or any of the other fields (Time Issued, Year, etc.)?

tleyden (2014-06-09 10:29:52 -0600)

The answer here might be helpful.

Haris (2014-06-09 11:47:05 -0600)


Stats


Seen: 1,317 times

Last updated: Jun 10 '14