How to extract text from a picture using the Tesseract library

asked 2018-10-11 13:50:59 -0600

msaidm

updated 2018-10-12 09:01:50 -0600

berak

I am trying to extract the text from this picture using the Tesseract library, but it does not work. In the code below I tried to remove the noise from the picture, thresholded the image, and then used slicing to keep only the region containing the text, without the surrounding noise. Tesseract still does not read the text.
When I cropped the text from the thresholded image by hand, it worked, but I want the cropping to be done in code. I also tried building a mask that separates the black pixels from the rest of the image, but the output was wrong: it printed the word 'Wits'. Here is the image I am working on and the mask that I tried earlier. Can anyone help?

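import cv2
import numpy as np
import pytesseract

img = cv2.imread("7.png")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Note: a 1x1 kernel leaves the image unchanged; a larger kernel is needed to remove noise.
kernel = np.ones((1, 1), np.uint8)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
# cv2.imshow("thresh", img)

# Copy only the text region onto a white canvas of the same size.
# The canvas must be uint8 and filled with 255: np.ones((R, C)) is float64 with
# value 1, which imshow displays as white but which is almost black on the 0-255 scale.
R, C = img.shape[:2]
text = img[280:350, 135:220]
image = np.full((R, C), 255, dtype=np.uint8)
image[280:350, 135:220] = text
cv2.imshow("image", image)
cv2.imwrite("image.png", image)
result = pytesseract.image_to_string(image)
print(result)

cv2.waitKey(0)
cv2.destroyAllWindows()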

Here I copied only the text region onto an otherwise white image and tried to read the text from that, but it does not work either. When I took a screenshot of the displayed output image and ran Tesseract on it manually, it worked, so I don't know where the problem is.
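One plausible explanation, offered as an assumption since the picture itself is not available here: a canvas built with np.ones((R, C)) is float64 with value 1. cv2.imshow scales floating-point images from [0, 1] to [0, 255], so that background looks white on screen (and in a screenshot), while the raw array handed to Tesseract has an almost black background. The sketch below uses a synthetic stand-in for the thresholded image, and the helper name paste_on_white is hypothetical; it only contrasts the two canvases and does not call Tesseract.

```python
import numpy as np

def paste_on_white(thresh, rows, cols):
    """Copy thresh[rows, cols] onto an all-white uint8 canvas of the same shape."""
    canvas = np.full(thresh.shape, 255, dtype=np.uint8)
    canvas[rows, cols] = thresh[rows, cols]
    return canvas

# Synthetic stand-in for the thresholded image: white background,
# black "text" inside the region of interest, black "noise" outside it.
thresh = np.full((400, 300), 255, dtype=np.uint8)
thresh[300:320, 150:200] = 0   # text pixels
thresh[10:20, 10:20] = 0       # noise pixels

rows, cols = slice(280, 350), slice(135, 220)

# float64 canvas: background value 1, text region holds 0 and 255.
bad = np.ones(thresh.shape)
bad[rows, cols] = thresh[rows, cols]

# uint8 canvas: background value 255, same text region.
good = paste_on_white(thresh, rows, cols)

print("float canvas:", bad.dtype, "background =", bad[0, 0])
print("uint8 canvas:", good.dtype, "background =", good[0, 0])
```

With the uint8 canvas, the array can be passed to pytesseract.image_to_string(good) or saved with cv2.imwrite and will contain what is shown on screen. If a float canvas is kept for other reasons, converting it explicitly before OCR, for example (canvas * 255).clip(0, 255).astype(np.uint8) for values in [0, 1], is another option.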

[Two attached images: the input picture and the earlier mask attempt]


Comments

We can't help much with Tesseract problems, I'm afraid.

berak (2018-10-12 09:09:50 -0600)