
How to remove text region in a document image?

Given a document image (e.g. a newspaper page), how can I extract the photos in it or remove the text regions?

I think traditional OCR methods may not be suitable here: I don't need to recognize the text, and OCR is slow and not very accurate. I believe text regions (i.e. text blocks) and image regions should be distinguishable by some threshold-based method in image processing. Any suggestions or example code in OpenCV would be appreciated. Thanks!

BTW, what if the background color is not white, or the background color of certain blocks is not white?

Example image:

image description