Group glyph images from a book page scan
Hello!
I am a type designer working on a type revival project. I have a bunch of high quality scans from a book printed with movable type. My goal is to group all the images of the same glyph into a dedicated subfolder.
Using PIL I can easily split the original scan into single glyph images like these:
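The splitting itself is just a series of PIL crops, roughly like this (the bounding boxes below are placeholders for whatever segmentation step actually produces the single-glyph crops):

```python
from PIL import Image

page = Image.open("page_scan.png").convert("L")  # grayscale page scan

# hypothetical (left, upper, right, lower) boxes, one per glyph
boxes = [(120, 200, 160, 250), (165, 200, 205, 250)]

for i, box in enumerate(boxes):
    glyph = page.crop(box)
    glyph.save("glyphs/glyph_{:04d}.png".format(i))
```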
I started designing a script for the grouping task, using these tutorials (1, 2) as a starting point.
Here you can find my draft.
The script partially works: I get some matches and I managed to group together a decent number of glyphs, but I am wondering whether I can improve the results.
A significant number of glyphs don't get any match, and some of the matches are just wrong. Given that the complexity of these images is very low, there should be room for improvement.
A few questions:
- I started with the SURF detection algorithm. Is it a good choice? Is there an algorithm better suited to this kind of image?
the two variables I am using to declare a match as true are (see the sketch after this list):
- maximum distance below 0.58
- number of matches above 60
Should I take anything else into consideration?
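To make those thresholds concrete, here is a minimal sketch of this kind of pairwise check. It uses ORB instead of SURF (SURF needs a non-free opencv-contrib build) and applies the two values as a Lowe-style ratio test plus a good-match count; the actual logic in my draft may differ in detail.

```python
import cv2

def glyphs_match(path_a, path_b, ratio=0.58, min_matches=60):
    """Return True if the two glyph crops look like the same character."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    # ORB stands in for SURF here; both give keypoints + descriptors.
    orb = cv2.ORB_create(nfeatures=500)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return False

    # Keep the two nearest neighbours per descriptor, then apply the ratio
    # test: the best match must be clearly better than the second best.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]

    return len(good) >= min_matches
```

Running a check like this against one reference crop per group gives the yes/no decision used to move files into subfolders.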
Any comment or suggestion would be really appreciated.
All the best
EDIT
–––––––
Extra images. They all come from a single group collected by the script.
The right ones (lowercase 'o', 29/35):
The wrong ones (6/35):
Take a look at shape_example.cpp.
Sorry, but C++ is not my thing. Any Python resources on the topic?
Please provide two matched sample images and one that is different.
Some images added at the bottom of the question!
Again, the ShapeContextDistanceExtractor class is good (as I pointed out at first). Try to implement it with Python, or search for a sample of Python code.
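A rough Python starting point for that shape-context comparison might look like this (the shape module ships in opencv-contrib-python builds; the sample_contour helper and the point count are simplifications of the sampling done in shape_example.cpp, not part of the original sample):

```python
import cv2
import numpy as np

def sample_contour(path, n_points=100):
    """Binarize a glyph image and return n_points evenly spaced outline points."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY_INV)
    # OpenCV 4.x return signature
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    points = np.vstack([c.reshape(-1, 2) for c in contours])
    idx = np.linspace(0, len(points) - 1, n_points).astype(int)
    return points[idx].reshape(-1, 1, 2).astype(np.float32)

# Smaller distance means more similar outlines.
extractor = cv2.createShapeContextDistanceExtractor()
dist = extractor.computeDistance(sample_contour("glyph_a.png"),
                                 sample_contour("glyph_b.png"))
print(dist)
```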