How to correctly extract letters after cleaning the background of the image with opencv and python?

asked 2018-11-20 08:27:09 -0500

I'm trying to extract the letters separately from an image with OpenCV, but I'm having difficulty in some cases. Sometimes a single letter is detected and then split down the middle. In other cases, such as the letter "i", the dot is not grouped with the stem and is treated as a separate character. Below are 3 examples showing the input image, the result after applying the erosion function, and the result after finding the contours to extract the letters.

Image Example

And my code:

import cv2
import numpy as np
import imutils

img = cv2.imread('captchas/image.jpg')

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU|cv2.THRESH_BINARY_INV)[1]

kernel = np.ones((5,4), np.uint8)

img_erode = cv2.erode(thresh, kernel, iterations = 1)

contours = cv2.findContours(img_erode.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

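# findContours returns 2 or 3 values depending on the OpenCV version;
# imutils.grab_contours picks out the contour list in either case
contours = imutils.grab_contours(contours)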

letter_image_regions = []

output = img_erode.copy()

for contour in contours:

    (x, y, w, h) = cv2.boundingRect(contour)

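    # Ignore small specks left over after erosion
    if cv2.contourArea(contour) > 200:
        if w / h > 0.75:
            # Wide region: assume two joined letters and split it in half
            half_width = int(w / 2)
            cv2.rectangle(output, (x, y), (x + half_width, y + h), (70,0,70), 3)
            cv2.rectangle(output, (x + half_width, y), (x + w, y + h), (70,0,70), 3)
        else:
            # Narrow region: draw a single box around the letter
            cv2.rectangle(output, (x, y), (x + w, y + h), (70,0,70), 3)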

cv2.imshow("Input", img)
cv2.imshow("Erode", img_erode)
cv2.imshow("Output", output)
cv2.waitKey(0)
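
To make the "i" problem concrete, here is a minimal sketch of one possible way to rejoin a dot with its stem: merge bounding boxes whose horizontal ranges overlap. The helper name `merge_stacked_boxes` and the `min_overlap` threshold are hypothetical and not part of OpenCV or imutils; the function only works on `(x, y, w, h)` tuples such as those returned by `cv2.boundingRect`.

```python
def merge_stacked_boxes(boxes, min_overlap=0.5):
    """Merge (x, y, w, h) boxes whose horizontal extents overlap.

    Two boxes are merged when the overlap of their x-ranges covers at
    least `min_overlap` of the narrower box, which is usually true for
    the dot and the stem of an "i" or "j". The threshold is a guess.
    """
    merged = []
    for x, y, w, h in sorted(boxes):  # process boxes left to right
        for i, (mx, my, mw, mh) in enumerate(merged):
            # Width of the shared x-range (negative if the boxes are apart)
            overlap = min(x + w, mx + mw) - max(x, mx)
            if overlap >= min_overlap * min(w, mw):
                # Replace the stored box with the union of both boxes
                nx, ny = min(x, mx), min(y, my)
                nw = max(x + w, mx + mw) - nx
                nh = max(y + h, my + mh) - ny
                merged[i] = (nx, ny, nw, nh)
                break
        else:
            # No horizontally overlapping box found: keep it as its own region
            merged.append((x, y, w, h))
    return merged


# Example: a stem, its dot directly above it, and an unrelated letter
boxes = [(10, 20, 6, 30), (11, 5, 4, 4), (40, 10, 20, 40)]
result = merge_stacked_boxes(boxes)
```

For this to help, the boxes would have to be collected before the `cv2.contourArea(contour) > 200` filter is applied, since a dot is likely to fall below that area. Erosion on the inverted threshold also shrinks the white letter pixels, so a small dot may disappear entirely at that step.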