Filling gaps in letters using cv2

asked 2018-12-30 04:41:40 -0500

mayur

updated 2018-12-30 04:52:36 -0500

I have an image file containing text that I want to extract using OCR. However, a diagonal line of text overlaps it (top right), like this. I remove this line using:

  import cv2

  image = cv2.imread(image_path)  # image_path: path to the input image
  # Upscale 2x with bicubic interpolation
  image = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
  image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
  image = cv2.GaussianBlur(image, (5, 5), 0)
  # Threshold at 100 because the diagonal line is grey
  image = cv2.threshold(image, 100, 255, cv2.THRESH_BINARY)[1]

This results in an image like this.

Notice the thick characters in "shear stress": this is one of the regions where the diagonal line overlapped the text. I then apply OCR. However, the previous steps remove some pixels. For instance, the "e" in "edge dislocation" is incomplete.

This leads to poor OCR output such as "edve dislocation". I tried erosion and dilation, but with no significant improvement.

Is there any way to fill the holes in the characters?

Is there any way to reduce the thickness of the characters that overlap with the diagonal line?
