Line segmentation in handwritten text

asked 2016-10-09 14:26:59 -0500

I'm developing a simple script to extract features from each line of an image that contains handwritten text.

After thresholding the image, I prepend two rows to the NumPy matrix: one completely white and one completely black. I want to compute the cosine similarity between the white row and each row of the image matrix, and likewise for the black row. I then want to use these two similarities (to the black row and to the white row) as input features to train a k-NN classifier with scikit-learn.

The code:

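import cv2
import numpy as np
from scipy.spatial import distance

img = cv2.imread('test.jpg', 0)  # load as grayscale
ret2, t = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu binarization
white = np.full((t.shape[1]), 255, dtype=np.uint8)  # reference row: all white
black = np.full((t.shape[1]), 1, dtype=np.uint8)    # reference row: "black" (1 rather than 0)
tn = np.vstack((white, black, t))  # rows 0 and 1 are the reference rows

tn[tn == 0] = 1  # set all 0 values to 1
cdist = distance.cdist(tn, tn, 'cosine')  # pairwise cosine distances between all rows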

The problem is that cdist is all 0, so I don't get the expected values. What am I missing?
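
To make the setup concrete, here is a minimal sketch of the same computation on a small synthetic array, so it runs without 'test.jpg'. The 4x6 array and the names rows, refs, dist and sim are made up for illustration; only cdist and the 0-to-1 replacement come from the code above. It compares every image row against just the two reference rows instead of computing the full pairwise matrix.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical 4x6 "thresholded image": 255 = background, 0 = ink.
t = np.array([
    [255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255,   0, 255],
    [  0,   0,   0,   0,   0, 255],
    [255, 255, 255, 255, 255, 255],
], dtype=np.uint8)

# Work in float and apply the same zero-avoidance step as above.
rows = t.astype(np.float64)
rows[rows == 0] = 1.0

# The two reference rows, stacked into a (2, width) matrix.
white = np.full((1, rows.shape[1]), 255.0)
black = np.full((1, rows.shape[1]), 1.0)
refs = np.vstack((white, black))

# cdist with 'cosine' returns cosine *distance* (1 - similarity).
# Result shape: (number of image rows, 2).
dist = distance.cdist(rows, refs, 'cosine')
sim = 1.0 - dist

print(sim)
```

One thing this sketch makes visible: cosine similarity ignores vector magnitude, and the white and black rows are both constant vectors, so they point in the same direction. The two columns of sim therefore come out identical, which means the pair may carry only one feature's worth of information.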

Are there any other techniques to consider when splitting an image with text into lines?
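
One commonly used alternative is a horizontal projection profile: count the ink pixels in each row and treat every run of consecutive rows that contain ink as one text line. The sketch below assumes a binarized image where 0 is ink and 255 is background; the function name segment_lines, the min_ink parameter and the tiny test page are hypothetical, and real handwriting with skewed or touching lines would likely need more than this.

```python
import numpy as np

def segment_lines(binary, min_ink=1):
    """Return (start, end) row ranges, end exclusive, for each text line.

    Assumes `binary` is a 2-D array with 0 = ink and 255 = background.
    """
    # Horizontal projection profile: number of ink pixels per row.
    ink_per_row = np.count_nonzero(binary == 0, axis=1)
    has_ink = ink_per_row >= min_ink

    # Pad with False so every run of ink rows has a start and an end edge.
    padded = np.concatenate(([False], has_ink, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])

    # Edges alternate: start of a run, then one past its last row.
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))

# Tiny made-up page: rows 1-2 form one line, row 4 another.
page = np.full((6, 5), 255, dtype=np.uint8)
page[1, 1:4] = 0
page[2, 2] = 0
page[4, 0:2] = 0

lines = segment_lines(page)
print(lines)
```

Each returned range can then be used to slice the image, for example page[start:end], before extracting per-line features.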

Thank you.
