
Basic image enhancement for OCR purposes

Greetings, all!

I am hoping to get some pointers or samples using OpenCV/JavaCV for image-enhancement steps that would improve OCR accuracy.

I would use JavaCV for the OpenCV work, since everything has to run inside a Java application.

Here is what I would like to do:

1) Deskewing. Some pages were not aligned properly on the scanner bed, so the text and images are skewed. How can I detect the skew and automatically correct it with JavaCV?

2) Black border removal. Maybe the Hough transform, or some sample within the JavaCV wiki, could help me detect these borders and remove them (i.e., fill them with the background color, white).

3) Image binarization. Ideally, I want the image to be pure black and white (only the values 0 and 255) for better OCR performance. How can I do this with JavaCV?

4) Punch-hole removal from scanned papers. It looks like circular Hough detection would help here. I need to remove the punch holes, that is, fill them with the background color.

5) Skeletonization, so that the characters are thinner and more precise for the OCR process. How can I do that? This is particularly useful for handwritten characters.

6) Noise removal, including back-to-front interference (show-through) in scans of old papers or newspapers.

7) Removal of image compression, if any. How can I detect that an image has been compressed, and how can I remove the effects of that compression?
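For item 1, one common way to estimate skew is a projection profile: try a range of candidate angles, project the ink pixels onto rows at each angle, and keep the angle whose row histogram is most sharply peaked (text lines then fall into few rows). The sketch below is plain Java on a `boolean[][]` binary page, not JavaCV code; the class name `ProjectionDeskew`, the angle range and the step size are illustrative assumptions. Correcting the skew then means rotating the page by the opposite angle, for example with OpenCV's `getRotationMatrix2D` and `warpAffine`; check the sign convention of whichever rotation routine you use.

```java
/** Hypothetical helper: projection-profile skew estimation on a binary page. */
final class ProjectionDeskew {
    private ProjectionDeskew() {}

    /**
     * Returns the estimated skew in degrees. ink[y][x] == true marks a dark pixel.
     * A positive result means text lines descend to the right (the image y axis points down).
     * maxDegrees and stepDegrees are tuning assumptions.
     */
    static double estimateSkewDegrees(boolean[][] ink, double maxDegrees, double stepDegrees) {
        int h = ink.length, w = ink[0].length;
        int offset = w + h; // |y*cos - x*sin| <= w + h, so this keeps row indices non-negative
        double bestAngle = 0, bestScore = -1;
        int steps = (int) Math.round(2 * maxDegrees / stepDegrees);
        for (int i = 0; i <= steps; i++) {
            double deg = -maxDegrees + i * stepDegrees;
            double rad = Math.toRadians(deg);
            double sin = Math.sin(rad), cos = Math.cos(rad);
            long[] rows = new long[2 * offset + 1];
            // Rotate each ink pixel by -deg and count how many land in each output row.
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                    if (ink[y][x]) rows[(int) Math.round(y * cos - x * sin) + offset]++;
            // Sum of squared row counts peaks when ink is concentrated in few rows, i.e. lines are level.
            double score = 0;
            for (long c : rows) score += (double) c * c;
            if (score > bestScore) { bestScore = score; bestAngle = deg; }
        }
        return bestAngle;
    }
}
```

Every candidate angle visits every ink pixel, so on a full-resolution scan it is usually worth downscaling first, or doing a coarse search followed by a finer one around the best angle.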
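For item 2, a simpler alternative to the Hough transform is a flood fill: treat any black region connected to the image edge as scanner border and paint it white. This is a different technique from the one suggested above, shown because it needs no line detection. The sketch assumes the page is already binarized into an `int[][]` of 0 (black) and 255 (white); `BorderCleaner` is a hypothetical name. OpenCV's `floodFill` can perform the same kind of fill on a `Mat`.

```java
import java.util.ArrayDeque;

/** Hypothetical helper: whitens black regions that touch the image edge. */
final class BorderCleaner {
    private BorderCleaner() {}

    /** Modifies bin in place; bin[y][x] is 0 (black) or 255 (white). */
    static void removeEdgeConnectedBlack(int[][] bin) {
        int h = bin.length, w = bin[0].length;
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        // Seed the fill with every black pixel on the four edges.
        for (int x = 0; x < w; x++) { seed(bin, queue, x, 0); seed(bin, queue, x, h - 1); }
        for (int y = 0; y < h; y++) { seed(bin, queue, 0, y); seed(bin, queue, w - 1, y); }
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        // Breadth-first spread through 4-connected black neighbours.
        while (!queue.isEmpty()) {
            int[] p = queue.poll();
            for (int[] d : dirs) {
                int nx = p[0] + d[0], ny = p[1] + d[1];
                if (nx >= 0 && ny >= 0 && nx < w && ny < h) seed(bin, queue, nx, ny);
            }
        }
    }

    /** If (x, y) is black, paint it white (which also marks it visited) and enqueue it. */
    private static void seed(int[][] bin, ArrayDeque<int[]> queue, int x, int y) {
        if (bin[y][x] == 0) { bin[y][x] = 255; queue.add(new int[]{x, y}); }
    }
}
```

Text or rules that genuinely touch the page edge will be erased as well, so this works best when the content has a margin.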
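For item 3, Otsu's method picks the single global threshold that maximizes the between-class variance of the gray-level histogram, then maps every pixel to 0 or 255. In OpenCV this is available by passing the `THRESH_OTSU` flag to `threshold`; the plain-Java sketch below (hypothetical class `OtsuBinarizer`, 8-bit grayscale in a flat `int[]`) only shows what that flag computes. For unevenly lit scans, a local method such as OpenCV's `adaptiveThreshold` often works better than one global threshold.

```java
/** Hypothetical helper: Otsu's global threshold for 8-bit grayscale pixels (values 0..255). */
final class OtsuBinarizer {
    private OtsuBinarizer() {}

    /** Returns t maximizing between-class variance; pixels <= t form the dark class. */
    static int otsuThreshold(int[] gray) {
        long[] hist = new long[256];
        for (int v : gray) hist[v]++;            // assumes every value is in 0..255
        long total = gray.length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double) i * hist[i];

        double lowSum = 0;                       // sum of intensities <= t
        long lowCount = 0;                       // number of pixels <= t
        double bestVar = -1;
        int bestT = 0;
        for (int t = 0; t < 256; t++) {
            lowCount += hist[t];
            if (lowCount == 0) continue;         // no dark class yet
            long highCount = total - lowCount;
            if (highCount == 0) break;           // no bright class left
            lowSum += (double) t * hist[t];
            double diff = lowSum / lowCount - (sumAll - lowSum) / highCount;
            double betweenVar = (double) lowCount * highCount * diff * diff;
            if (betweenVar > bestVar) { bestVar = betweenVar; bestT = t; }
        }
        return bestT;
    }

    /** Maps pixels above the Otsu threshold to 255 (white) and the rest to 0 (black). */
    static int[] binarize(int[] gray) {
        int t = otsuThreshold(gray);
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) out[i] = gray[i] > t ? 255 : 0;
        return out;
    }
}
```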
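For item 4, OpenCV's `HoughCircles` is one option. The sketch below uses a different, simpler heuristic instead: label the black connected components and whiten those that sit near a page edge, are roughly square, and fill close to π/4 of their bounding box, as a solid disc does (small digital discs come out somewhat below π/4). Everything here is an assumption for illustration: the class name, the 0.6 to 0.9 fill range, the aspect-ratio limits, and the `minDiameter`, `maxDiameter` and `margin` parameters, which depend on scan resolution. It also assumes the holes scan as solid black; holes that scan white would need a different test.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: removes disc-shaped black blobs near the page edges. */
final class PunchHoleRemover {
    private PunchHoleRemover() {}

    /** bin[y][x] is 0 (black) or 255 (white), modified in place. Returns the number of blobs removed. */
    static int removeHoles(int[][] bin, int minDiameter, int maxDiameter, int margin) {
        int h = bin.length, w = bin[0].length;
        boolean[][] seen = new boolean[h][w];
        int removed = 0;
        for (int sy = 0; sy < h; sy++) {
            for (int sx = 0; sx < w; sx++) {
                if (bin[sy][sx] != 0 || seen[sy][sx]) continue;
                // Collect one 4-connected black component with a BFS, tracking its bounding box.
                List<int[]> pixels = new ArrayList<>();
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[]{sx, sy});
                seen[sy][sx] = true;
                int minX = sx, maxX = sx, minY = sy, maxY = sy;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    pixels.add(p);
                    minX = Math.min(minX, p[0]); maxX = Math.max(maxX, p[0]);
                    minY = Math.min(minY, p[1]); maxY = Math.max(maxY, p[1]);
                    int[][] nbrs = {{p[0] + 1, p[1]}, {p[0] - 1, p[1]}, {p[0], p[1] + 1}, {p[0], p[1] - 1}};
                    for (int[] n : nbrs) {
                        if (n[0] >= 0 && n[1] >= 0 && n[0] < w && n[1] < h
                                && bin[n[1]][n[0]] == 0 && !seen[n[1]][n[0]]) {
                            seen[n[1]][n[0]] = true;
                            queue.add(n);
                        }
                    }
                }
                int bw = maxX - minX + 1, bh = maxY - minY + 1;
                double fill = pixels.size() / (double) (bw * bh); // solid disc: roughly pi/4
                double aspect = bw / (double) bh;
                int cx = (minX + maxX) / 2, cy = (minY + maxY) / 2;
                boolean nearEdge = cx < margin || cy < margin || cx >= w - margin || cy >= h - margin;
                boolean sizeOk = bw >= minDiameter && bw <= maxDiameter && bh >= minDiameter && bh <= maxDiameter;
                if (nearEdge && sizeOk && aspect > 0.8 && aspect < 1.25 && fill > 0.6 && fill < 0.9) {
                    for (int[] p : pixels) bin[p[1]][p[0]] = 255; // fill with background
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

The fill-ratio test is what keeps solid squares (ratio near 1) and thin strokes (low ratio) from being mistaken for holes.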
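For item 5, a classic thinning algorithm is Zhang-Suen: it repeatedly peels boundary pixels in two alternating sub-passes, removing a pixel only when that keeps the stroke connected (exactly one 0-to-1 transition around its 8 neighbours), until nothing changes. The sketch is plain Java on an `int[][]` where 1 is ink and 0 is background; the class name is hypothetical. OpenCV's contrib `ximgproc` module also provides a `thinning` function, but check whether your JavaCV build includes it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: Zhang-Suen thinning of a binary image. */
final class ZhangSuenThinning {
    private ZhangSuenThinning() {}

    /** Thins strokes in place. img[y][x] == 1 is ink; pixels outside the image count as 0. */
    static void thin(int[][] img) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int pass = 0; pass < 2; pass++) {
                List<int[]> toClear = new ArrayList<>();
                for (int y = 0; y < img.length; y++) {
                    for (int x = 0; x < img[0].length; x++) {
                        if (img[y][x] != 1) continue;
                        // Neighbours P2..P9, clockwise starting from north.
                        int[] p = {
                            at(img, y - 1, x), at(img, y - 1, x + 1), at(img, y, x + 1), at(img, y + 1, x + 1),
                            at(img, y + 1, x), at(img, y + 1, x - 1), at(img, y, x - 1), at(img, y - 1, x - 1)
                        };
                        int b = 0, a = 0;
                        for (int i = 0; i < 8; i++) {
                            b += p[i];                                  // ink neighbour count
                            if (p[i] == 0 && p[(i + 1) % 8] == 1) a++;  // 0 -> 1 transitions around the ring
                        }
                        if (b < 2 || b > 6 || a != 1) continue;
                        // p[0]=P2 (N), p[2]=P4 (E), p[4]=P6 (S), p[6]=P8 (W)
                        boolean ok = pass == 0
                            ? p[0] * p[2] * p[4] == 0 && p[2] * p[4] * p[6] == 0
                            : p[0] * p[2] * p[6] == 0 && p[0] * p[4] * p[6] == 0;
                        if (ok) toClear.add(new int[]{y, x});
                    }
                }
                // Clear only after the whole sub-pass, so decisions use the same snapshot.
                for (int[] c : toClear) img[c[0]][c[1]] = 0;
                if (!toClear.isEmpty()) changed = true;
            }
        }
    }

    private static int at(int[][] img, int y, int x) {
        return (y < 0 || x < 0 || y >= img.length || x >= img[0].length) ? 0 : img[y][x];
    }
}
```

Isolated specks have fewer than two ink neighbours and therefore survive thinning, so it makes sense to binarize and denoise before this step.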
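For item 6, a 3x3 median filter is a common first step against salt-and-pepper speckle: each pixel is replaced by the median of its neighbourhood, so isolated dots vanish while edges stay sharper than with an averaging blur. OpenCV's `medianBlur` does this on a `Mat`; the sketch below (hypothetical class `MedianFilter`, grayscale `int[][]`, replicated borders) shows the operation itself. Faint show-through from the back of the page is a different problem: it is usually lighter than the real ink, so a well-chosen binarization threshold often removes much of it.

```java
import java.util.Arrays;

/** Hypothetical helper: 3x3 median filter for grayscale images. */
final class MedianFilter {
    private MedianFilter() {}

    /** Returns a new image; edge pixels use clamped (replicated) neighbours. */
    static int[][] median3x3(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        int[] window = new int[9];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int k = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(w - 1, Math.max(0, x + dx));
                        window[k++] = gray[yy][xx];
                    }
                Arrays.sort(window);
                out[y][x] = window[4]; // middle of 9 sorted values
            }
        }
        return out;
    }
}
```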
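For item 7, whether a file is JPEG-compressed can be read from its header: JPEG data starts with the SOI marker bytes `FF D8`, followed by another marker beginning with `FF`, while PNG files start with a fixed 8-byte signature. This only identifies the container format; TIFF files, for instance, can themselves hold JPEG-compressed data. JPEG is lossy, so the discarded detail cannot be recovered; the practical remedies are rescanning to a lossless format such as PNG or uncompressed TIFF, or smoothing block artefacts (for example with a mild median filter) before binarizing. The class and method names below are hypothetical.

```java
/** Hypothetical helper: identifies JPEG and PNG data from their leading bytes. */
final class CompressionSniffer {
    private CompressionSniffer() {}

    /** True if the bytes start with the JPEG SOI marker (FF D8) followed by another marker prefix (FF). */
    static boolean looksLikeJpeg(byte[] head) {
        return head.length >= 3
            && (head[0] & 0xFF) == 0xFF
            && (head[1] & 0xFF) == 0xD8
            && (head[2] & 0xFF) == 0xFF;
    }

    /** True if the bytes start with the 8-byte PNG signature (lossless compression). */
    static boolean looksLikePng(byte[] head) {
        int[] sig = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        if (head.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) if ((head[i] & 0xFF) != sig[i]) return false;
        return true;
    }
}
```

Reading the first 8 bytes of a file (for example with `InputStream.readNBytes(8)`, Java 11+) is enough for both checks.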

So, are there any examples of how to accomplish this with JavaCV that anyone could point me to? Basically, I am trying to learn how to write a set of image pre-processing filters that can be applied automatically.

Any pointers are welcome.

Thanks in advance,

Carlos.