Treating broken characters in image to improve OCR accuracy

asked 2019-11-21 10:18:52 -0600

Hi,

I have lots of images in my dataset that look like this: [image]. OCR libraries like Tesseract cannot read these images and print gibberish.

After applying some processing to the image above, I was able to improve it to something like this: [image]. For this version, Tesseract gives the output 27837 "

I used cv2.threshold with the cv2.THRESH_BINARY flag and a threshold value of 80 to turn the first image into the second. (Median blur, Gaussian blur, Otsu binarization, morphological transformations, etc. did not work in general for images like these: the characters are small, and these operations make the images fuzzy, which Tesseract again cannot OCR.)

transformed_img = cv2.threshold(input_img, 80, 255, cv2.THRESH_BINARY)[1]
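
For reference, the fixed threshold in the line above follows a simple per-pixel rule. The sketch below spells it out in NumPy, assuming an 8-bit grayscale input (for example, an image loaded with cv2.imread(path, cv2.IMREAD_GRAYSCALE)); binarize_fixed is a hypothetical helper name, not an OpenCV function. Pixels strictly greater than the threshold become 255 and all others become 0, which is the cv2.THRESH_BINARY rule.

```python
import numpy as np


def binarize_fixed(gray: np.ndarray, thresh: int = 80, maxval: int = 255) -> np.ndarray:
    """Global fixed threshold using the same rule as cv2.THRESH_BINARY.

    Hypothetical helper for illustration: pixels strictly above `thresh`
    become `maxval`, every other pixel becomes 0.
    """
    # Assumption: a single-channel (grayscale) image, as thresholding a
    # 3-channel BGR image would treat each channel independently.
    if gray.ndim != 2:
        raise ValueError("expected a single-channel (grayscale) image")
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)


if __name__ == "__main__":
    # Tiny synthetic row of pixel values around the threshold of 80.
    sample = np.array([[60, 80, 81, 255]], dtype=np.uint8)
    print(binarize_fixed(sample))
```

Because the comparison is strict, a pixel value of exactly 80 maps to 0 and 81 maps to 255. If the thresholded text ends up with the opposite polarity from what the next step expects, cv2.THRESH_BINARY_INV applies the inverted rule.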

Can someone suggest a better method to treat these kinds of images so that, in turn, OCR accuracy improves?

Thanks

Comments

You can threshold the images to separate the characters, then create your own OCR model by training on those images.

(Or retrain Tesseract.)

Ziri (2019-11-21 19:06:19 -0600)
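
One way to read "threshold to separate them" is connected-component labeling on the binarized image: each group of touching foreground pixels gets a bounding box, and the crops can then be labeled by hand to build a training set. The sketch below is a plain NumPy version under these assumptions: foreground (text) pixels are nonzero, 8-connectivity is used, and character_boxes and min_area are hypothetical names. OpenCV's cv2.connectedComponentsWithStats(binary, connectivity=8) computes the same kind of per-component statistics in one call (label 0 is the background).

```python
from collections import deque

import numpy as np


def character_boxes(binary: np.ndarray, min_area: int = 1):
    """Label 8-connected foreground blobs and return their bounding boxes.

    Hypothetical helper: `binary` is a 2-D array in which nonzero pixels are
    foreground (text). Returns (x, y, w, h, area) tuples sorted left to right.
    Components with fewer than `min_area` pixels are dropped as noise.
    """
    fg = binary != 0
    h, w = fg.shape
    seen = np.zeros_like(fg, dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not fg[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                # Visit all 8 neighbours (diagonals count as connected).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            if len(ys) >= min_area:
                x0, y0 = min(xs), min(ys)
                boxes.append((x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1, len(ys)))
    return sorted(boxes, key=lambda b: b[0])


if __name__ == "__main__":
    # Two separate vertical strokes -> two components.
    demo = np.array([[255, 0, 0, 255],
                     [255, 0, 0, 255]], dtype=np.uint8)
    for box in character_boxes(demo):
        print(box)
```

Each box can be cropped with binary[y:y + h, x:x + w]. If the text is black on white after cv2.THRESH_BINARY, invert it first (for example 255 - binary) so the characters are the nonzero pixels. Note that a character broken into pieces, as in the images in the question, produces several small components; merging horizontally overlapping boxes or a slight dilation before labeling are possible workarounds, though either may also join neighbouring characters.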

1 answer

answered 2020-03-03 10:52:44 -0600

Harisha, you may also find some good ideas on increasing OCR accuracy here: https://www.bisok.com/grooper-data-ca... Good luck.

Stats

Seen: 547 times

Last updated: Nov 21 '19