How can I remove the borders around text and get clean text for OCR?
I have a document image in which the text sits inside rectangles. I need to read the text with Tesseract, but as long as the borders are there, Tesseract can't read it well.
I'm using OpenCV with Python. The result I need is the whole document image showing only the text, without any boxes or borders.
How can I do this?
-- EDIT
This is my code:
What does this code do?
- finds the contour of the outer box
- crops 10% inward to remove the outline border
The problem is that the middle lines (dividers) are still there, so I tried hardcoding the crop using multiples of the height.
from __future__ import print_function
import argparse

import cv2

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (11, 11), 0)
edged = cv2.Canny(blurred, 30, 150)

# findContours returns 3 values in OpenCV 3 and 2 values in OpenCV 2 and 4;
# the contour list is the second-to-last element in every version.
cnts = cv2.findContours(edged.copy(),
                        cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
print("I Count {} in this image".format(len(cnts)))

for (i, c) in enumerate(cnts):
    (x, y, w, h) = cv2.boundingRect(c)
    print("Coin #{}".format(i + 1))
    # Estimated number of square cells in the box. Integer division keeps
    # range() and the slices below valid on Python 3 as well.
    de = (w // h) + 1
    if i > 2:  # hardcoded: skip the first three contours
        for ii in range(0, de):
            print('Block #{}'.format(ii))
            # Crop 10% of the box height from every side to drop the outline.
            garis = h * 10 // 100
            y1 = y + garis
            y2 = y + h - garis
            x1 = x + garis
            x2 = x + w - garis
            somecoin = image[y1:y2, x1:x2]
            height, width, channels = somecoin.shape
            # Show the ii-th square cell (its width equals the crop height).
            cv2.imshow("Box", somecoin[0:height, (height*ii):height+(height*ii)])
            cv2.waitKey(0)
This is the result, along with a new sample test image:
Result:
I hope OpenCV has a better solution for this case.
Show what you've tried so far.
I've edited the question to show what I've tried.