Recognize oval shaped illustrations in scanned book pages

asked 2013-07-13 18:19:46 -0600

LA2

updated 2016-01-21 09:38:59 -0600


How can I automatically detect, crop and save the oval shaped portraits in scanned book pages like this one? http://runeberg.org/spg/17/0183.html I have 20,000 portraits in 6000 pages in this format to begin with. After that I have more books where the shapes might be slightly different. (This is not face recognition. I just want to find the oval shapes.) This question was also asked on the Wikimedia Commons graphics forum, https://commons.wikimedia.org/wiki/Commons%3aGraphics_village_pump#Pulling_illustrations_from_scanned_book_pages

A different example, still with oval portraits, but with a different page layout and no contours, is here, http://runeberg.org/pgsmal/0141.html


2

answered 2013-07-13 20:20:27 -0600

Mathieu Barnachon


I suppose the ovals are almost always in the same area (the first or last column of the page). I would restrict the search to these columns only, run contour detection with findContours, and try to fit an ellipse to each contour with fitEllipse. If the results are poor, you could try face detection to see whether it works on the drawings, and then fit the ellipse only in the neighborhood of the detected faces.

If none of these solutions work, you will probably have to train a classifier: try a binary SVM with samples of your ellipses and samples of text words. Or look at the Haar and LBP cascade classifiers here and here.


Comments

1

i like your ideas :-)

albertofernandez ( 2013-07-15 06:04:04 -0600 )

In addition to this, I would suggest blurring your images first with a Gaussian blur and then trying to fit an ellipse. This helps ensure that you don't fit ellipses to shapes inside the illustration.

StevenPuttemans ( 2013-07-15 06:34:08 -0600 )


1

answered 2013-07-15 06:01:22 -0600

albertofernandez

updated 2013-07-15 06:02:59 -0600

I'd start with simple, fast things, and if they fail I'd look for a more difficult but more powerful strategy, as the other answer suggested.

I'd try a simple template matching technique, since the object to recognize is so controlled. See this example to learn how template matching works. The template can be a rectangle containing the oval on a white background (if the background is always white):

[image: template example]

It is a fast approach and if it fails, you can try more complicated things.


Comments

I tried this and it looks promising. The example has the drawback that it only looks for the best match, even if that match is rather poor (for example, in a page without portraits). I settled for CV_TM_SQDIFF_NORMED and skipped the normalization. Then I tried to pick out the best candidates. However, my medium-grey mask missed some of the portraits (that were dark grey) and instead suggested hits in the text of the page. I think the page image needs preprocessing for this approach to be successful. Perhaps erosion can make the text go away.

LA2 ( 2013-07-16 11:21:00 -0600 )

Take a look at "How to handle template matching with multiple occurences":

http://opencv-code.com/quick-tips/how-to-handle-template-matching-with-multiple-occurences/

You can also use two templates: one for the "normal" cases and another for the "peculiar" cases with bad illumination, blurred images or dark grey portraits.

As a preprocessing step, you can blur your images first with a Gaussian blur in order to deal with the text of the page (as Steven suggested).

albertofernandez ( 2013-07-17 01:26:09 -0600 )
