Java OpenCV + Tesseract OCR “code” recognition

asked 2013-08-02 14:49:39 -0600

TimK

I'm trying to automate a process in which someone manually converts a code like the one below into digital form.

image description

I started reading about OCR, installed Tesseract OCR, and tried it on some images. The output isn't even close to the code.

After reading some questions on Stack Overflow, I figured that the images need some preprocessing, such as deskewing the image so the text is horizontal, which can be done with OpenCV, for example.

Now my questions are:

What kind of preprocessing or other methods should be used in a case like the above image?
Secondly, can I rely on the output? Will it always work in cases like the above image?

I hope someone can help me!

1 answer

answered 2014-06-21 14:12:53 -0600

tleyden

Using Tesseract via OpenOCR running on Google Compute Engine, I OCR'd your original image and got the following output:

E' ,‘YHwacpMTDCH ; 3?". ‘ V‘L"~m> I shah-r}. I’VMU' i 5: 1“”. A"

I then tried pre-processing it with the Stroke Width Transform, using this Docker image and the following commands:

wget http://answers.opencv.org/upfiles/1375472915202212.png
cd /opt/DetectText && ./DetectText 1375472915202212.png out.png 1

This produced the following pre-processed image:

image description

When I re-ran Tesseract on it, I got this output:

YH XMCDMTDC

which isn't perfect, but is a pretty big improvement.
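
Since the question asks about doing the preprocessing from Java, here is a minimal sketch of a much simpler step that is often tried before OCR: global binarization with Otsu's threshold. To be clear, this is not the Stroke Width Transform used above, and I have not run it on your image, so whether it helps there is an assumption. It uses only the JDK (no OpenCV), and the class and method names (OtsuBinarizer, binarize, and so on) are made up for illustration.

```java
import java.awt.image.BufferedImage;

/** Illustrative helper (hypothetical name): Otsu binarization with the JDK only. */
final class OtsuBinarizer {

    /** Converts a packed RGB pixel to an 8-bit luma value (BT.601 weights). */
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /**
     * Returns the threshold t that maximizes the between-class variance.
     * Pixels with luma <= t belong to the dark class.
     */
    static int otsuThreshold(int[] histogram, int totalPixels) {
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * histogram[i];
        }
        long sumBelow = 0;
        int countBelow = 0;
        double bestVariance = -1.0;
        int bestThreshold = 0;
        for (int t = 0; t < 256; t++) {
            countBelow += histogram[t];
            sumBelow += (long) t * histogram[t];
            if (countBelow == 0) {
                continue; // no dark pixels yet
            }
            int countAbove = totalPixels - countBelow;
            if (countAbove == 0) {
                break; // every pixel is already in the dark class
            }
            double meanBelow = (double) sumBelow / countBelow;
            double meanAbove = (double) (sumAll - sumBelow) / countAbove;
            double diff = meanBelow - meanAbove;
            double variance = (double) countBelow * countAbove * diff * diff;
            if (variance > bestVariance) {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    /** Returns a 1-bit copy of src: dark class becomes black, the rest white. */
    static BufferedImage binarize(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        int[] histogram = new int[256];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                histogram[luma(src.getRGB(x, y))]++;
            }
        }
        int threshold = otsuThreshold(histogram, w * h);
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean dark = luma(src.getRGB(x, y)) <= threshold;
                out.setRGB(x, y, dark ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }
}
```

To try it, load the image with javax.imageio.ImageIO.read, pass it to binarize, write the result with ImageIO.write, and run Tesseract on the written file. A single global threshold tends to struggle with uneven lighting or textured backgrounds, so treat this as a first experiment, not a fix.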

Stats

Asked: 2013-08-02 14:49:39 -0600

Seen: 4,784 times

Last updated: Jun 21 '14