
Java OpenCV + Tesseract OCR “code” recognition

asked 2013-08-02 14:49:39 -0500

TimK

I'm trying to automate a process in which someone manually converts a code, like the one in the image below, into digital form.

image description

Then I started reading about OCR, so I installed Tesseract OCR and tried it on some images. The output isn't even close to the code.

After reading some questions on Stack Overflow, I figured that the images need some preprocessing, such as deskewing them so the text is horizontal, which can be done with OpenCV, for example.

Now my questions are:

  • What kind of preprocessing or other methods should be used in a case like the above image?
  • Secondly, can I rely on the output? Will it always work in cases like the above image?

I hope someone can help me!


1 answer


answered 2014-06-21 14:12:53 -0500

tleyden

Using Tesseract via OpenOCR running on Google Compute Engine, I OCR'd your original image and got the following output:

E' ,‘YHwacpMTDCH ; 3?". ‘ V‘L"~m> I shah-r}. I’VMU' i 5: 1“”. A"

I then tried pre-processing it with the Stroke Width Transform, using this Docker image and the following command:

cd /opt/DetectText && ./DetectText 1375472915202212.png out.png 1

Which resulted in this pre-processed image:

image description

When I re-ran Tesseract on it, I got this output:


which isn't perfect, but is a pretty big improvement.
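Since the question asks about preprocessing from Java, here is a minimal sketch of a much simpler step than the Stroke Width Transform: global Otsu binarization, which turns the image into pure black and white before it is handed to Tesseract. This is not what DetectText does and not what I ran above; it is only an illustration. The class and method names (`OtsuBinarizer`, `otsuThreshold`, `binarize`) are made up for this example, and it uses only the JDK so it runs without OpenCV. If you already use OpenCV, `Imgproc.threshold` with the `THRESH_OTSU` flag should do the same job.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only; not part of OpenCV, Tesseract or DetectText.
class OtsuBinarizer {

    /** Converts a packed RGB pixel to an 8-bit luminance value (0..255). */
    static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /** Returns the threshold that maximizes between-class variance for a 256-bin histogram. */
    static int otsuThreshold(int[] histogram) {
        long total = 0;
        long weightedSum = 0;
        for (int i = 0; i < 256; i++) {
            total += histogram[i];
            weightedSum += (long) i * histogram[i];
        }
        long darkCount = 0;
        long darkSum = 0;
        double bestVariance = -1.0;
        int bestThreshold = 0;
        for (int t = 0; t < 256; t++) {
            darkCount += histogram[t];
            if (darkCount == 0) {
                continue; // no pixels at or below t yet
            }
            long lightCount = total - darkCount;
            if (lightCount == 0) {
                break; // every pixel is already in the dark class
            }
            darkSum += (long) t * histogram[t];
            double meanDark = (double) darkSum / darkCount;
            double meanLight = (double) (weightedSum - darkSum) / lightCount;
            double diff = meanDark - meanLight;
            double variance = (double) darkCount * lightCount * diff * diff;
            if (variance > bestVariance) {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    /** Returns a 1-bit copy of the image: luminance above the Otsu threshold becomes white, the rest black. */
    static BufferedImage binarize(BufferedImage source) {
        int width = source.getWidth();
        int height = source.getHeight();
        int[] histogram = new int[256];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                histogram[luminance(source.getRGB(x, y))]++;
            }
        }
        int threshold = otsuThreshold(histogram);
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                boolean light = luminance(source.getRGB(x, y)) > threshold;
                result.setRGB(x, y, light ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java OtsuBinarizer <input image> <output.png>");
            return;
        }
        BufferedImage input = ImageIO.read(new File(args[0]));
        if (input == null) {
            System.err.println("Could not decode " + args[0]);
            return;
        }
        ImageIO.write(binarize(input), "png", new File(args[1]));
    }
}
```

A single global threshold only helps when the lighting is fairly even. On photos with shadows or a textured background it can erase parts of the code, so treat it as a first experiment rather than a guarantee of readable output.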

