Yes, thresholding, also known as binarization, is the right idea. All of the artifacts (defects) you mentioned are taken into account in the design and evaluation of binarization algorithms, because these algorithms are created for digitizing printed matter and therefore have to deal with every kind of practical issue.
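To make the idea concrete, here is a minimal sketch of global thresholding in plain Python. It uses Otsu's method to pick the threshold, which is my choice for illustration and not something specific to your problem; the function names `otsu_threshold` and `binarize` and the tiny sample image are made up for this example. It assumes 8-bit grayscale values (0 to 255) with dark ink on light paper.

```python
def otsu_threshold(pixels):
    """Pick the threshold (0..255) that maximizes between-class variance.

    `pixels` is a flat list of 8-bit grayscale values.
    Values <= threshold form one class, values > threshold the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # dark class still empty
        if w1 == 0:
            break           # light class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map each pixel to 1 (paper) if above the threshold, else 0 (ink)."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Hypothetical 3x4 image: dark ink (about 25..40) on light paper (about 205..225).
image = [
    [210, 205, 30, 215],
    [220, 25, 35, 208],
    [212, 218, 40, 225],
]
pixels = [p for row in image for p in row]
t = otsu_threshold(pixels)
binary = binarize(image, t)
for row in binary:
    print(row)
```

A single global threshold like this is only the simplest form of binarization. Treat it as a starting point for experimenting, not as a solution to the defects you listed.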
When reading research papers on binarization, you should make the following distinctions in terms of their applicability to your own needs:
You can read about academic research from the following sources:
There is a notable algorithm competition, known as DIBCO (Document Image Binarization Contest), which was held in 2009 and 2011. During these two contests, a very large number of algorithms created by researchers and commercial entities all over the world were evaluated systematically. All of the evaluation results are available online.
For proprietary reasons, this is all I can say, and I will not be able to comment any further. Good luck with your findings.