
Group glyphs images from a book page scan

Hello!

I am a type designer working on a type revival project. I have a set of high-quality scans of a book printed with movable type. My goal is to group all the images of the same glyph into a dedicated subfolder.

Using PIL, I can easily split the original scan into single-glyph images like these:

[Images: sample glyph crops ('o', 'n')]
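The post does not show how the page is split. One common way to get per-glyph crops is connected-component labelling on a binarized page. The sketch below is a hypothetical illustration, not the author's code: it assumes `ink` is a 2D list where 1 means an ink pixel, and it returns boxes as `(left, upper, right, lower)` with exclusive right/lower edges, the same convention PIL's `Image.crop` uses. Building `ink` from a PIL image (for example with `convert("L")`, `getpixel` and a threshold of your choosing) is left out so the sketch runs without PIL.

```python
from collections import deque
from typing import List, Tuple

# (left, upper, right, lower); right/lower are exclusive, like a PIL crop box
Box = Tuple[int, int, int, int]


def glyph_boxes(ink: List[List[int]]) -> List[Box]:
    """Bounding boxes of 4-connected ink regions, in row-major discovery order."""
    h = len(ink)
    w = len(ink[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not ink[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, upper, right, lower = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                upper, lower = min(upper, cy), max(lower, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and ink[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, upper, right + 1, lower + 1))
    return boxes


page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(glyph_boxes(page))  # [(0, 0, 2, 2), (4, 1, 5, 3)]
```

Note that glyphs with separate parts (the dot of an 'i', accents) come out as several components and would need merging in a real pipeline.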

I started writing a script for the grouping task, using these tutorials (1, 2) as a starting point.

Here you can find my draft.
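The draft itself is not reproduced here. As a hypothetical sketch of the grouping step only, assuming some pairwise predicate `is_match` that decides whether two crops show the same glyph, union-find puts every chain of pairwise matches into one group; each group could then be copied into its own subfolder. The names `group_glyphs` and `is_match` are illustrative, not from the post.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def group_glyphs(items: Sequence[T], is_match: Callable[[T, T], bool]) -> List[List[T]]:
    """Group items so that any chain of pairwise matches ends up in one group."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the groups of matching items.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if is_match(items[i], items[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups: Dict[int, List[T]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


# Toy example: file names stand in for glyph crops; the first letter is the "truth".
names = ["o1.png", "n1.png", "o2.png", "n2.png", "e1.png"]
print(group_glyphs(names, lambda a, b: a[0] == b[0]))
# [['o1.png', 'o2.png'], ['n1.png', 'n2.png'], ['e1.png']]
```

A design caveat: because grouping is transitive, a single false positive merges two whole groups, while a glyph with no match stays in a group of its own.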

The script partially works: I got some matches and managed to group a decent number of glyphs together, but I am wondering whether I can improve the results.

A significant number of glyphs don't get any match, and some of the matches are simply wrong. Given how simple these images are, there may be room for improvement.

A few questions:

  • I started with the SURF detection algorithm. Is it a good choice, or is there an algorithm better suited to this kind of image?
  • The two criteria I use to declare a match are:

    • maximum distance below 0.58
    • number of matches above 60

Should I take anything else into consideration?
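Read literally, the two criteria amount to the check sketched below. This is one interpretation, not the author's code: it assumes each descriptor match has a distance (with OpenCV, for example, `[m.distance for m in matches]` from a `BFMatcher.match` result), that only matches with distance below 0.58 count, and that a pair of glyphs is accepted when more than 60 such matches remain. The function name is hypothetical.

```python
from typing import Iterable

MAX_DISTANCE = 0.58  # threshold quoted in the post
MIN_MATCHES = 60     # threshold quoted in the post


def is_same_glyph(distances: Iterable[float],
                  max_distance: float = MAX_DISTANCE,
                  min_matches: int = MIN_MATCHES) -> bool:
    """Accept a pair when more than `min_matches` matches are closer than `max_distance`."""
    good = sum(1 for d in distances if d < max_distance)
    return good > min_matches


print(is_same_glyph([0.2] * 61))  # True
print(is_same_glyph([0.2] * 60))  # False (needs strictly more than 60)
```

One consequence of an absolute count is that it depends on how many keypoints a glyph produces, so small or simple glyphs may never reach 60 good matches even against an identical sort.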

Any comment or suggestion would be really appreciated.

All the best

EDIT

–––––––

Extra images, all taken from a single group collected by the script.

The correct ones (lowercase 'o', 29/35):

[Images: sample correctly grouped lowercase 'o' glyphs]

The wrong ones (6/35):

[Images: sample wrongly grouped glyphs]
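To compare settings (detector, thresholds) on more than one group, a simple measure is group purity: the share of a group taken up by its most common true label. The sketch below assumes the glyphs in a group have been labelled by hand; "other" is a placeholder label for the six wrong glyphs, whose actual letters are not listed in the post.

```python
from collections import Counter
from typing import Hashable, Sequence


def group_purity(labels: Sequence[Hashable]) -> float:
    """Fraction of a group belonging to its most common (hand-assigned) label."""
    if not labels:
        raise ValueError("empty group")
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


# The 'o' group above: 29 correct, 6 wrong ("other" is a placeholder label).
labels = ["o"] * 29 + ["other"] * 6
print(f"{group_purity(labels):.1%}")  # prints 82.9%
```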