
Detecting blocks of text

asked 2016-02-14 08:49:16 -0600

joed

I have an image that contains columns of text broken up into sections, as shown below. (image: source image)

I would like to get the blocks marked by the red rectangles in the image below. (image: image blocks)

I have tried the following sequence: threshold, erode, dilate, find contours. It does not give me the results I want.

Is there another way to do this? I just need to identify the blocks as shown, not extract the text.


1 answer


answered 2016-02-14 11:36:26 -0600

updated 2016-02-14 11:43:13 -0600

a) First, split the image vertically down the middle and process each half individually. Then use cv::HoughLines (http://docs.opencv.org/2.4/doc/tutori...) to find the horizontal lines that separate the segments. cv::HoughLines will return a lot of lines, but you can easily filter them by angle and length.
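The filtering step in a) could look roughly like the sketch below. It assumes the detector hands back line segments as end points (x1, y1, x2, y2), which is the layout cv::HoughLinesP produces. The `Segment` struct, the function name, and both thresholds are made up for illustration and would need tuning on the real image.

```cpp
#include <cmath>
#include <vector>

// A line segment in pixel coordinates (hypothetical type for this sketch),
// laid out like the (x1, y1, x2, y2) entries of a probabilistic Hough result.
struct Segment {
    int x1, y1, x2, y2;
};

// Keep only segments that are nearly horizontal and long enough to be a
// block separator. maxAngleDeg and minLength are illustrative parameters.
std::vector<Segment> keepHorizontalSeparators(const std::vector<Segment>& segments,
                                              double maxAngleDeg,
                                              double minLength) {
    const double kPi = 3.14159265358979323846;
    std::vector<Segment> kept;
    for (const Segment& s : segments) {
        const double dx = s.x2 - s.x1;
        const double dy = s.y2 - s.y1;
        if (std::hypot(dx, dy) < minLength) continue;  // too short
        // Angle to the x-axis, folded into [0, 90] degrees so the
        // direction in which the segment was reported does not matter.
        const double angle = std::atan2(std::fabs(dy), std::fabs(dx)) * 180.0 / kPi;
        if (angle <= maxAngleDeg) kept.push_back(s);
    }
    return kept;
}
```

Calling `keepHorizontalSeparators(segments, 2.0, 0.5 * halfWidth)` would keep segments within 2 degrees of horizontal that span at least half of the split image, where `halfWidth` is whatever width the half image has.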

b) (Again, split vertically first.) Binarize the image, e.g. with cv::adaptiveThreshold. Divide the image into horizontal strips with a height of 2-3 pixels. Then, for each strip, count the number of pixel columns that contain at least one black pixel. The strips that contain the horizontal lines should have a significantly higher number of columns with a black pixel than the other strips. However, finding good parameters for the thresholding could be difficult.
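A minimal sketch of the counting in b), assuming the image is already binarized and stored as rows of bytes where 0 is black and 255 is white, and that all rows have the same length. The `BinaryImage` alias, the function names, and the `minFraction` cut-off are assumptions for this example, not OpenCV API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical image type for this sketch: rows of pixels, 0 = black, 255 = white.
using BinaryImage = std::vector<std::vector<unsigned char>>;

// For every horizontal strip of stripHeight rows, count the pixel columns
// that contain at least one black pixel inside that strip.
std::vector<std::size_t> inkedColumnsPerStrip(const BinaryImage& img,
                                              std::size_t stripHeight) {
    std::vector<std::size_t> counts;
    if (img.empty() || stripHeight == 0) return counts;
    const std::size_t rows = img.size();
    const std::size_t cols = img[0].size();
    for (std::size_t top = 0; top < rows; top += stripHeight) {
        const std::size_t bottom = std::min(top + stripHeight, rows);
        std::size_t inked = 0;
        for (std::size_t x = 0; x < cols; ++x) {
            for (std::size_t y = top; y < bottom; ++y) {
                if (img[y][x] == 0) { ++inked; break; }
            }
        }
        counts.push_back(inked);
    }
    return counts;
}

// Indices of the strips in which at least minFraction of all columns are
// inked. Text strips have gaps between letters and words, so a separator
// line should stand out with a fraction close to 1.
std::vector<std::size_t> separatorStrips(const BinaryImage& img,
                                         std::size_t stripHeight,
                                         double minFraction) {
    std::vector<std::size_t> hits;
    if (img.empty() || img[0].empty()) return hits;
    const std::vector<std::size_t> counts = inkedColumnsPerStrip(img, stripHeight);
    const double cols = static_cast<double>(img[0].size());
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (counts[i] / cols >= minFraction) hits.push_back(i);
    }
    return hits;
}
```

Strip `i` covers rows `i * stripHeight` up to, but not including, `(i + 1) * stripHeight`, so the hit indices convert directly into y coordinates for the block boundaries.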

c) Manually create a template image of the horizontal line and use cv::matchTemplate to find all instances.
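To show what the matching in c) boils down to, here is a plain C++ sketch that slides a small template over the image and scores each position by the sum of squared differences. This is a stand-in written without OpenCV, not cv::matchTemplate itself; the `GrayImage` alias, the `Match` struct, and `maxSsd` are hypothetical names for this example, and both images are assumed to be rectangular.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical image type for this sketch: rows of 8-bit gray values.
using GrayImage = std::vector<std::vector<unsigned char>>;

// Top-left corner of a template placement and its score.
struct Match {
    std::size_t x, y;
    long long ssd;  // sum of squared differences, 0 = identical
};

// Try every placement of tpl inside img and return those whose sum of
// squared differences does not exceed maxSsd.
std::vector<Match> findTemplate(const GrayImage& img, const GrayImage& tpl,
                                long long maxSsd) {
    std::vector<Match> matches;
    if (img.empty() || tpl.empty() || tpl[0].empty()) return matches;
    const std::size_t H = img.size(), W = img[0].size();
    const std::size_t h = tpl.size(), w = tpl[0].size();
    if (h > H || w > W) return matches;
    for (std::size_t y = 0; y + h <= H; ++y) {
        for (std::size_t x = 0; x + w <= W; ++x) {
            long long ssd = 0;
            // Stop adding rows as soon as this placement is already too bad.
            for (std::size_t ty = 0; ty < h && ssd <= maxSsd; ++ty) {
                for (std::size_t tx = 0; tx < w; ++tx) {
                    const long long d =
                        static_cast<long long>(img[y + ty][x + tx]) - tpl[ty][tx];
                    ssd += d * d;
                }
            }
            if (ssd <= maxSsd) matches.push_back({x, y, ssd});
        }
    }
    return matches;
}
```

A long separator line matches the template at many neighbouring x positions, so the hits would still have to be grouped by their y coordinate to get one entry per line.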


Comments


I am looking for a more general solution where the number of columns could be different and the blocks can also span a variable number of columns. However, the horizontal line below the block is always going to be there to mark the end of the block.

Here is a thought, but I am not sure how to code it:

  1. If the vertical "white lanes" between the columns are drawn
  2. the horizontal line is drawn to intersect the white lanes
  3. the outermost bounding box of all the text intersects the "white lane" lines and the horizontal lines

This should produce rectangles that resemble the blocks.
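One possible way to code step 1 is a vertical projection: a "white lane" is a run of neighbouring pixel columns with no black pixel. The sketch below assumes a binarized image stored as rows of bytes (0 = black, 255 = white) with equal row lengths, and it only looks at a band of rows, because a separator line that spans the full width would otherwise close every lane. `BinaryImage`, `whiteLanes`, and `minWidth` are names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical image type for this sketch: rows of pixels, 0 = black, 255 = white.
using BinaryImage = std::vector<std::vector<unsigned char>>;

// Find the "white lanes" inside the row band [top, bottom): maximal runs of
// columns without any black pixel that are at least minWidth columns wide.
// Each lane is returned as an inclusive (firstColumn, lastColumn) pair.
std::vector<std::pair<std::size_t, std::size_t>> whiteLanes(const BinaryImage& img,
                                                            std::size_t top,
                                                            std::size_t bottom,
                                                            std::size_t minWidth) {
    std::vector<std::pair<std::size_t, std::size_t>> lanes;
    if (img.empty()) return lanes;
    bottom = std::min(bottom, img.size());
    const std::size_t cols = img[0].size();
    std::size_t runStart = 0;
    bool inRun = false;
    // x == cols acts as a sentinel that closes a run touching the right edge.
    for (std::size_t x = 0; x <= cols; ++x) {
        bool blank = false;
        if (x < cols) {
            blank = true;
            for (std::size_t y = top; y < bottom; ++y) {
                if (img[y][x] == 0) { blank = false; break; }
            }
        }
        if (blank && !inRun) { runStart = x; inRun = true; }
        if (!blank && inRun) {
            if (x - runStart >= minWidth) lanes.emplace_back(runStart, x - 1);
            inRun = false;
        }
    }
    return lanes;
}
```

With the separator rows known from one of the approaches in the answer, each band between two consecutive separators can be passed in as `[top, bottom)`. The column ranges between the returned lanes, combined with that band, then give candidate rectangles for the blocks. Page margins also show up as lanes if they are wider than `minWidth`.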

joed ( 2016-02-14 12:13:22 -0600 )
