0

Remove all background and show only words for OCR

asked 2015-10-02 08:25:28 -0500

fredreload

I am currently working on OCR with tesseract-ocr. The problem is that it does not recognize words on a green background, so I need to pre-process the image by making the background completely white and leaving only the black words for Tesseract. I would like to know how to do that, and whether there are other pre-processing techniques I can use to raise the accuracy. Thanks.


1 answer

2

answered 2015-10-02 09:03:15 -0500

David_86

If your letters are always black and only the background changes colour, you can convert the image to gray-scale and apply a binary threshold. Every pixel above the threshold level turns white and everything at or below it turns black, which leaves only the letters.


Comments

Cool, I used an Otsu threshold and it works nicely. Also note that increasing the font size helps produce more accurate OCR results.

fredreload (2015-10-04 07:31:33 -0500)

Can you share some Python code to do the same?

sundeep (2018-11-30 02:31:11 -0500)

Stats

Asked: 2015-10-02 08:25:28 -0500

Seen: 1,101 times

Last updated: Oct 02 '15