Not able to extract text from attached image

asked 2016-10-04 05:16:07 -0500


Hi All,

I'm new to OpenCV. Starting from a colour image, I first smoothed it and then converted it to greyscale, but I'm still not able to extract the English text from the image. Could somebody please help me?

Regards, Umesh


Comments

This is one of the most solved problems ever. Take a look at OCR techniques!

StevenPuttemans ( 2016-10-04 06:18:03 -0500 )

Hey Steven, I'm already using Tesseract to extract text from the image. For a few similar images I am able to extract text, but for this particular image Tesseract produces no output at all.

umeshsinha ( 2016-10-04 13:42:00 -0500 )

Then you should dig into that. My guess is that for Tesseract to work properly, the text needs to be binary and aligned. Start by doing that.

StevenPuttemans ( 2016-10-05 04:05:03 -0500 )

Hey Steven, some really good progress. After aligning the image, I'm able to extract the following text:

INCOMETAX DEPARTMENT *1- GOVT. OF INDIA
I bump pum
' MOHAN PURI :
0410211995 -
Permanent Account Number
. CKBPPGIBOM a
3.5%.? ’“‘“ , \ 1..

I'm fine with the junk in the last line and with the issue in the fourth line, where "/" gets converted to "1". But how can I fix the output of the second line ("I bump pum") and the second-to-last line ("CKBPPGIBOM")? Ideally these should read DILIP PURI and CKBPP8160M.

And yes, thanks for teaching me OCR :)

Regards, Umesh

umeshsinha ( 2016-10-05 06:57:31 -0500 )

The problem is that OCR is not fail-proof. My experience with it is also quite limited, but I am guessing you need to make a binary, skeletonized image to obtain better results.

StevenPuttemans ( 2016-10-05 07:01:09 -0500 )