Basic image enhancing for OCR purposes

asked 2013-03-11 14:01:35 -0600

Scheidecker
31 ●1 ●1 ●4

Greetings all!

I am hoping to get some pointers or samples using OpenCV/JavaCV for image enhancement steps that improve OCR results.

I will use JavaCV for the OpenCV calls because this has to run inside a Java app.

Here is what I would like to do:

1) Deskewing. Some of the scanned images are not aligned properly on the scanner bed, so the text and images are skewed. How can I detect the skew and correct it automatically with JavaCV?

2) Black border removal. Maybe the Hough transform or some sample within the JavaCV wiki can help me detect the borders and remove them (i.e. fill them with the background color, white).

3) Image binarization. Ideally, I want the image to be strictly black and white (pixel values 0 and 255 only) for OCR performance. How can I do that via JavaCV?

4) Punch hole removal from scanned papers. It looks like circular Hough detection would help here. I need to remove the punch holes, that is, fill them with the background color.

5) Skeletonize the image so that the characters are thinner and more precise for the OCR process. How can I do that? This is particularly useful for handwritten characters.

6) Noise removal. Back-to-front interference (bleed-through) removal from scans of old papers or newspapers.

7) Remove image compression if any. How can you detect that there is compression and how to remove it?

So, are there any examples of how to accomplish this with JavaCV that anyone could point me to? Basically, I am trying to learn how to write a set of image pre-processing filters that can be applied automatically to images.

Any pointers are welcome.

Thanks in advance,

Carlos.


answered 2013-03-12 02:55:33 -0600

Mathieu Barnachon

4678 ●18 ●53 http://www.math-barnac...

Your question has many sub-questions; I will try to give a pointer for each point.

1) See getRotationMatrix2D and this sample.
2) No idea, but you seem to know how to do it...
3) Use the threshold function. With THRESH_BINARY, each pixel above thresh becomes white (maxval, typically 255) and every other pixel becomes black (0); THRESH_BINARY_INV reverses the two.
4) You can detect each hole with Hough, or use a simple medianBlur, for example. It depends on your holes; with a sample image we could give more relevant ideas.
5) See this post, where the thinning methods seem suited to your context.
6) A simple filter could be tested, but you will probably have to read up on that topic (look at research papers, which could be implemented with OpenCV features).
7) I don't understand what you mean here. Filter JPEG blocks? Check whether the image was a JPEG? Please give some more explanation.
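To make the threshold pointer in the list above more concrete, here is a minimal library-free sketch of what a fixed binary threshold does, plus Otsu's method for picking the threshold automatically. It is plain Java on an int array of 8-bit gray values, not JavaCV code; the class and method names (BinarizeSketch, thresholdBinary, otsuThreshold) are made up for illustration. OpenCV's own threshold function offers the same behaviour through its THRESH_BINARY and THRESH_OTSU flags, so in a real pipeline you would call that instead.

```java
final class BinarizeSketch {

    /** Same rule as THRESH_BINARY with maxval 255: value > thresh gives 255, otherwise 0. */
    static int[] thresholdBinary(int[] gray, int thresh) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = gray[i] > thresh ? 255 : 0;
        }
        return out;
    }

    /** Otsu's method: the threshold that maximizes the between-class variance. */
    static int otsuThreshold(int[] gray) {
        // Histogram of the 256 possible gray levels (input assumed to be 0..255).
        int[] hist = new int[256];
        for (int v : gray) {
            hist[v]++;
        }
        long total = gray.length;
        long sumAll = 0;
        for (int t = 0; t < 256; t++) {
            sumAll += (long) t * hist[t];
        }

        long weightBg = 0;   // number of pixels with value <= t
        long sumBg = 0;      // sum of those pixel values
        double bestVar = -1.0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];
            if (weightBg == 0) {
                continue;    // no pixels at or below t yet
            }
            long weightFg = total - weightBg;
            if (weightFg == 0) {
                break;       // everything is already in the lower class
            }
            sumBg += (long) t * hist[t];
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            double diff = meanBg - meanFg;
            // Proportional to the between-class variance (constant factor dropped).
            double between = (double) weightBg * weightFg * diff * diff;
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] gray = {10, 12, 11, 200, 210, 205};   // dark ink and bright paper
        int t = otsuThreshold(gray);
        System.out.println("threshold = " + t);
        System.out.println(java.util.Arrays.toString(thresholdBinary(gray, t)));
    }
}
```

With dark text on light paper this polarity keeps the text black and the paper white. Unevenly lit scans may do better with a local (adaptive) threshold than with a single global value.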

To use any OpenCV function through the Java bindings, see the doc here.
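Before wiring up the bindings, it can help to see what the deskewing pointer actually computes. The sketch below is plain Java, not JavaCV, and builds the 2x3 affine matrix that the OpenCV documentation gives for getRotationMatrix2D (center, angle in degrees, scale). The class and helper names (RotationSketch, rotationMatrix2D, apply) are hypothetical. Estimating the skew angle itself is a separate step that this sketch does not cover.

```java
final class RotationSketch {

    /**
     * 2x3 matrix for a rotation by angleDeg about (cx, cy) with an isotropic scale.
     * Positive angles rotate counter-clockwise in image coordinates
     * (origin at the top-left, y pointing down), as documented for OpenCV.
     */
    static double[][] rotationMatrix2D(double cx, double cy, double angleDeg, double scale) {
        double rad = Math.toRadians(angleDeg);
        double a = scale * Math.cos(rad);
        double b = scale * Math.sin(rad);
        return new double[][] {
            { a,  b, (1 - a) * cx - b * cy },
            { -b, a, b * cx + (1 - a) * cy }
        };
    }

    /** Applies the 2x3 matrix to the point (x, y). */
    static double[] apply(double[][] m, double x, double y) {
        return new double[] {
            m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2]
        };
    }

    public static void main(String[] args) {
        // Hypothetical case: a 200x100 page rotated about its center by 3 degrees.
        double[][] m = rotationMatrix2D(100, 50, 3.0, 1.0);
        double[] corner = apply(m, 0, 0);
        System.out.println(corner[0] + ", " + corner[1]);
    }
}
```

In OpenCV the resulting matrix is what you would pass to warpAffine, with a white border value so the corners exposed by the rotation match the paper background.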
