Treating broken characters in image to improve OCR accuracy

asked 2019-11-21 10:18:52 -0600

Hi,

I have lots of images in my dataset that look like this: [image] OCR libraries such as Tesseract cannot read these images and print gibberish instead.

After applying some processing to the image above, I am able to improve it to something like this: [image] For this one, Tesseract outputs: 27837 "

I used cv2.threshold with the cv2.THRESH_BINARY flag and a threshold value of 80 to turn the first image into the second:

    transformed_img = cv2.threshold(input_img, 80, 255, cv2.THRESH_BINARY)[1]

Median blur, Gaussian blur, Otsu binarization, morphological transformations, etc. did not work in general for images like these: the characters are small, and these operations make the images fuzzy, which Tesseract again cannot OCR.

Can someone suggest a better method to treat these kinds of images so that it, in turn, improves OCR accuracy?

Thanks


Comments

You can threshold to separate the characters, then create your own OCR model by training on those images.

(Or retrain Tesseract.)

Ziri (2019-11-21 19:06:19 -0600)

