Remove all background and show only words for OCR

asked 2015-10-02

fredreload

I am currently working on OCR with tesseract-ocr. The problem is that it does not recognize words on a green background, so I need to pre-process the image by making the background completely white and leaving only the black words for Tesseract. I would like to know how to do that, and whether there are other pre-processing techniques I can use to raise the accuracy. Thanks.

answered 2015-10-02

David_86

If your letters are always black and only the background changes colour, you can convert the image to grayscale and apply a binary threshold. Every pixel brighter than the threshold level turns white, and everything at or below it (the dark text) turns black.

Cool, I used an Otsu threshold and it works nicely. Also note that increasing the font size helps produce more accurate OCR.

fredreload ( 2015-10-04 )

Can you share some Python code to do the same?

sundeep ( 2018-11-30 )

