
Extract Handwriting and Print from Background

asked 2019-10-31 06:20:21 -0600

Billie

updated 2019-10-31 08:52:36 -0600

Hi! I'm not the smartest person when it comes to image processing. I have a fairly simple problem I need some help with. I have images with a red frame as the background:

[image]

The goal is to remove the red background and keep only the foreground information: the handwriting and the printed text. I came close by using only the red channel of the image and applying a threshold, but the result has some noise and the characters are thin. So I need a smarter approach that gives me clean, bold characters for OCR without noise.

Best regards, Billie


Comments

Where is the image?

jsxyhelu (2019-10-31 08:04:43 -0600)

@Billie , please edit your question, and use the "upload image" button.

(your current link points to a website, not an image)

berak (2019-10-31 08:10:07 -0600)

I can see the image on both desktop and smartphone (Android). It is a BMP; maybe that is the problem? I tried to upload it directly, but that didn't seem to work.

Billie (2019-10-31 08:30:38 -0600)

I only get five rows.

supra56 (2019-10-31 09:25:26 -0600)

Try using color clustering with kmeans.

holger (2019-10-31 09:32:37 -0600)

3 answers


answered 2019-11-03 15:19:47 -0600

Billie

I found a possible solution. First, I use only the red channel of the image, since the print and handwriting contain less red than the background:

[image]

Now we're back to a thresholding problem, where I found that ADAPTIVE_THRESH_MEAN_C works well:

Imgproc.adaptiveThreshold(srcMat, dst, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 21, 32);

Here blockSize = 21 and C = 32. A bigger blockSize makes the characters bolder, and a bigger C value reduces noise.

[image]


answered 2019-10-31 10:03:16 -0600

supra56

updated 2019-10-31 20:53:26 -0600

I solved the problem. You don't need a threshold; cv2.inRange will suit your needs.

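#!/usr/bin/env python3
# OpenCV 4.1.2-pre, Thonny IDE
# Raspberry Pi 3/4
# Date: 31 October, 2019

import cv2
import numpy as np

## Read the input image (BGR)
img = cv2.imread('handwriting.jpg')
if img is None:
    raise FileNotFoundError('handwriting.jpg could not be read')

## Convert to HSV and keep hues 70..140
## (8-bit OpenCV hue is degrees / 2, so the red hues near 0 and 179 are excluded)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (70, 25, 25), (140, 255, 255))

## Copy the masked pixels onto a black canvas; everything else stays black
imask = mask > 0
mask_black = np.zeros_like(img, np.uint8)
mask_black[imask] = img[imask]

## Save
cv2.imwrite('handwriting_1.jpg', mask_black)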

Output: [image: handwriting]

I can't get any further with this; you cannot get bold strokes this way. The pay slip is only faintly visible (about 25%), which is why it comes out black.

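#!/usr/bin/env python3
# OpenCV 4.1.2-pre, Thonny IDE
# Raspberry Pi 3/4
# Date: 31 October, 2019

import cv2
import numpy as np

## Read the input image (BGR)
img = cv2.imread('handwriting.jpg')
if img is None:
    raise FileNotFoundError('handwriting.jpg could not be read')

## Convert to HSV and keep hues from 100 up, with saturation and value in 15..165
## (8-bit hue only goes up to 179, so the upper hue bound of 190 sets no real limit)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (100, 15, 15), (190, 165, 165))

## Copy the masked pixels onto a black canvas; everything else stays black
imask = mask > 0
mask_black = np.zeros_like(img, np.uint8)
mask_black[imask] = img[imask]

## Save
cv2.imwrite('handwriting_2.jpg', mask_black)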

Output: [image: handwriting]


Comments


Happy Halloween to every one!

supra56 (2019-10-31 10:07:49 -0600)

Btw, I'm using Linux on a Raspberry Pi 3/4. On a PC you may need to change the 70 and 140 hue bounds.

supra56 (2019-10-31 10:10:21 -0600)

Happy Halloween! But I'm sorry, the output is missing the printed parts; for example, "30+" indicates the payment slip type and is therefore crucial for the OCR software. Also, "Michael Tester" is not as bold as I expected.

Billie (2019-10-31 16:08:01 -0600)

Welcome to the funny world of OCR: even a CNN does not automatically solve this.

holger (2019-10-31 20:45:40 -0600)

answered 2019-10-31 09:49:44 -0600

mvuori

updated 2019-10-31 09:50:42 -0600

For your purpose, the red and green channels are just noise. Use only the blue channel and threshold it.

Use split() to extract the blue channel.


Comments

Since I'm a new user I have to wait two days before I can answer, but I found a good solution. I used only the red channel of the image: the handwriting and printed content have less red, so it's a good starting point. Then I applied ADAPTIVE_THRESH_MEAN_C with blockSize = 21 and C = 32. A large blockSize makes the text bolder, and a large C value reduces noise. I will post the results in two days. I still have to check whether the OCR software is happy with it as well.

Billie (2019-10-31 17:08:27 -0600)

If it's Tesseract, well, good luck :-)

holger (2019-10-31 20:46:58 -0600)


Stats

Asked: 2019-10-31 06:20:21 -0600

Seen: 3,162 times

Last updated: Nov 03 '19