
Noise removal from scanned documents

asked 2014-10-01 17:16:03 -0600

Peter55555

Hi,

I'm just beginning with OpenCV and image processing. I need to write a program that processes scanned documents (some of them are old) and recognizes certain objects in them. I'd like to ask for help on where to look and which functions to use for the initial processing. I'm thinking of removing all the noise first. My scanned documents are grayscale only (written in black or similar ink on a bright background).

1) What function should I use to remove noise from this kind of scanned document? I've found fastNlMeansDenoising. Is it OK for this purpose?

2) Some documents can have ... (I don't know what this is called in English). I mean the undesirable effect where the 'shadow' of the other side of the paper is visible. What is the best method to remove this effect? Is thresholding a good idea?


1 answer


answered 2014-10-01 22:28:49 -0600

rwong

Yes, thresholding, also known as binarization, is the right idea. All of the artifacts (defects) you mentioned are taken into account when binarization algorithms are designed and evaluated, because these algorithms are created for digitizing printed matter and therefore have to cope with every kind of practical issue.
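As a rough illustration of global thresholding, here is a minimal pure-Python sketch of Otsu's method, a classic global binarization technique chosen only as an example (this answer does not name a specific algorithm). The 'shadow' from the other side of the page is usually called show-through (or bleed-through); when it is lighter than the ink, a single global threshold can push it to the paper side. The sketch assumes a flat list of 8-bit gray values, and the function names are hypothetical. In OpenCV the same idea is available as `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold t for a flat list of 8-bit gray values.

    Pixels <= t form the dark (ink) class, pixels > t the bright (paper) class.
    t is chosen to maximize the between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_b = 0    # number of pixels in the dark class (values <= t)
    sum_b = 0  # sum of gray values in the dark class
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0 or w_f == 0:
            continue  # one class is empty, so there is no split to score
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map pixels > t to 255 (paper) and the rest to 0 (ink)."""
    return [255 if p > t else 0 for p in pixels]
```

With 40 ink pixels at 30, 20 show-through pixels at 170 and 40 paper pixels at 220, `otsu_threshold` returns 30, so the show-through ends up white. Darker show-through would not be separated this cleanly by a global threshold.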

When reading research papers on binarization, make the following distinctions to judge how well they apply to your own needs:

  • Whether the paper targets pristine pages (e.g. book pages printed in modern times), degraded/weathered pages, or ancient documents.
  • Whether the paper targets binarization of machine-printed text (including typeset text produced before the computer era) or handwritten text.
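For degraded or unevenly lit pages, a single global threshold often fails, so local (adaptive) methods compute a separate threshold for each pixel from its neighborhood. Below is a naive sketch of Sauvola's method, one well-known local technique from the document binarization literature, picked as an example rather than as a recommendation from this answer. The function name and the parameter values (radius=7, k=0.5, R=128) are assumptions to tune, and the per-pixel window loop is written for clarity only; practical implementations use integral images. In OpenCV, `cv2.adaptiveThreshold` offers a simpler mean- or Gaussian-weighted local threshold.

```python
import math


def sauvola_binarize(img, radius=7, k=0.5, R=128.0):
    """Binarize a 2-D list of 8-bit gray values with Sauvola's local threshold.

    For each pixel, the mean m and standard deviation s of the
    (2*radius+1) x (2*radius+1) window around it (clipped at the image
    border) give T = m * (1 + k * (s / R - 1)).
    Pixels darker than T become ink (0); all others become paper (255).
    """
    h, w = len(img), len(img[0])
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            n = len(window)
            m = sum(window) / n
            s = math.sqrt(sum((v - m) ** 2 for v in window) / n)
            t = m * (1 + k * (s / R - 1))
            if img[y][x] < t:
                out[y][x] = 0
    return out
```

For a 9x18 page whose left half is paper at 220 with one ink pixel at 100, and whose right half is paper at 90 with one ink pixel at 20, `sauvola_binarize(page, radius=3)` marks both ink pixels as 0 and keeps paper away from them at 255. No single global threshold can do that here, because the ink on the bright half (100) is lighter than the paper on the dark half (90). Sharp illumination steps can still produce false ink along the step, so the window size matters.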

You can read about academic research in the following venues:

  • ICDAR (International Conference on Document Analysis and Recognition)
  • IJDAR (International Journal on Document Analysis and Recognition)
  • Various other image processing research venues, such as CVPR, ICASSP, SIGGRAPH, PAMI, etc.

There is a notable algorithm competition, DIBCO (Document Image Binarization Contest), which was held in 2009 and 2011. In these two contests, a very large number of algorithms created by researchers and commercial entities around the world were evaluated systematically. All of the evaluation results are available online.
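To get a feel for how such an evaluation works: DIBCO compares each binarized output with a hand-made ground-truth image using several measures, and F-measure and PSNR are among them. Here is a minimal sketch of those two, assuming both images are flat lists of 0/1 values with 1 marking ink. The function name is hypothetical, and this is not the official evaluation tool.

```python
import math


def f_measure_and_psnr(result, truth):
    """Compare a binarized image with its ground truth.

    Both inputs are equal-length flat lists of 0/1 values where 1 marks
    ink (foreground). Returns (F-measure in percent, PSNR in dB).
    """
    if len(result) != len(truth):
        raise ValueError("images must have the same size")
    tp = sum(1 for r, g in zip(result, truth) if r == 1 and g == 1)
    fp = sum(1 for r, g in zip(result, truth) if r == 1 and g == 0)
    fn = sum(1 for r, g in zip(result, truth) if r == 0 and g == 1)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # For binary images the peak difference C is 1, so PSNR = 10 * log10(1 / MSE).
    mse = (fp + fn) / len(truth)
    psnr = float("inf") if mse == 0 else 10 * math.log10(1 / mse)
    return 100 * f, psnr
```

For a ten-pixel image with four ink pixels, where the output misses one ink pixel and adds one spurious one, this gives an F-measure of 75% and a PSNR of about 6.99 dB.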

For proprietary reasons, this is all I can say; I won't be able to comment any further. Good luck with your search.


Comments

You don't need binarization for object detection! It really depends on what you want to do and how you want to proceed. Often a simple Gaussian blur does the job; more advanced methods like NL-means are of course better for qualitative results.
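To illustrate the simple option from this comment: in OpenCV, `cv2.GaussianBlur(img, (3, 3), 0)` smooths with a small Gaussian kernel, while `cv2.fastNlMeansDenoising(img, None, h)` is the NL-means alternative, where `h` sets the filter strength. The sketch below is a pure-Python 3x3 binomial blur (a common small approximation of a Gaussian) that shows what such smoothing does to an isolated noise pixel. The function name is hypothetical, and borders are handled by repeating the edge pixel rather than by OpenCV's default reflection, so edge values can differ from OpenCV's output.

```python
def gaussian_blur_3x3(img):
    """Smooth a 2-D list of gray values with the 3x3 binomial kernel.

    The 1-D kernel [1, 2, 1] / 4 is applied along rows, then along columns;
    together they equal [[1, 2, 1], [2, 4, 2], [1, 2, 1]] / 16.
    """
    h, w = len(img), len(img[0])
    taps = ((-1, 1), (0, 2), (1, 1))  # (offset, weight) pairs of [1, 2, 1] / 4

    def clamp(i, n):
        # Repeat the edge pixel for indices that fall outside the image.
        return min(max(i, 0), n - 1)

    # Horizontal pass, kept as floats so rounding happens only once.
    tmp = [[sum(wt * img[y][clamp(x + d, w)] for d, wt in taps) / 4
            for x in range(w)]
           for y in range(h)]
    # Vertical pass, then round half up (all values are non-negative).
    return [[int(sum(wt * tmp[clamp(y + d, h)][x] for d, wt in taps) / 4 + 0.5)
             for x in range(w)]
            for y in range(h)]
```

A single bright speck of 160 on a black 5x5 image comes out as 40 at its center, with 20 and 10 spread onto its neighbors: the noise is diluted rather than removed, and text edges get softened the same way. That trade-off is one reason to reach for NL-means when output quality matters.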

Guanta (2014-10-02 02:24:23 -0600)

@Guanta: it seems like object detection is the subject of a different question, #43291.

rwong (2014-10-02 03:35:02 -0600)


Stats


Seen: 2,324 times

Last updated: Oct 01 '14