Should I try to OCR shorthand?
This is not standard Gregg; it's a shorthand I created for machine readability. I've never written an OCR solution, so trying would have a big up-front cost. The problem, as you can see, is that each shape is a whole word. Each inflection point starts a new letter, somewhat as in cursive.
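One way to picture the "each inflection point is a new letter" idea is as a geometry step. Assume an outline is available as an ordered list of (x, y) pen positions. That is an assumption: a scanned image would first need thinning and tracing into such a path. An inflection point is then where the stroke switches from bending one way to bending the other. This minimal sketch detects the switch from the sign of the cross product of consecutive segment vectors and cuts the stroke there. The function names are hypothetical. Apart from the small tolerance `eps`, it does nothing about pen jitter, so real handwriting would need smoothing first.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def turn_sign(a: Point, b: Point, c: Point, eps: float = 1e-9) -> int:
    """+1 for a left (counter-clockwise) turn at b, -1 for a right turn, 0 if straight."""
    cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
    if cross > eps:
        return 1
    if cross < -eps:
        return -1
    return 0


def inflection_indices(stroke: List[Point]) -> List[int]:
    """Indices where the bending direction flips, i.e. where curvature changes sign."""
    cuts: List[int] = []
    last_sign, last_index = 0, 0
    for i in range(1, len(stroke) - 1):
        sign = turn_sign(stroke[i - 1], stroke[i], stroke[i + 1])
        if sign == 0:
            continue  # straight stretches don't change the current bending direction
        if last_sign != 0 and sign != last_sign:
            # Cut midway between the last bend and this one (the middle of any straight run).
            cuts.append((last_index + i) // 2)
        last_sign, last_index = sign, i
    return cuts


def split_at_inflections(stroke: List[Point]) -> List[List[Point]]:
    """Split a stroke into candidate letter segments; neighbours share the cut point."""
    segments: List[List[Point]] = []
    start = 0
    for cut in inflection_indices(stroke):
        segments.append(stroke[start:cut + 1])
        start = cut
    segments.append(stroke[start:])
    return segments


# An S-shaped stroke: bends right, runs straight, then bends left.
s_curve = [(0, 0), (1, 1), (2, 1), (3, 0), (4, -1), (5, -1), (6, 0)]
print(inflection_indices(s_curve))  # [3]
```

Each resulting segment would then be classified on its own, which is the "split each image into parts" strategy.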
Two strategies seem possible. Either I'd train a model on the meaning of each curve and split each image into its component curves, or I'd train it on every possible word as a whole. I'm game for either one, if either is reasonable.
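The second strategy (learning every word as a whole) has a cheap baseline that needs no training: treat each machine-generated outline as a template and pick the nearest one. This minimal sketch is loosely modelled on the $1 Unistroke Recognizer. It resamples each stroke to evenly spaced points, moves the centroid to the origin, scales it, and compares strokes point by point. It assumes the same ordered (x, y) point lists as input, and `prepare`, `recognize` and the template shapes are hypothetical. Unlike $1, it skips rotation alignment and scales uniformly. The assumption is that orientation and proportions carry meaning in this shorthand (a flat line versus a hump).

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def resample(points: List[Point], n: int = 32) -> List[Point]:
    """Return n points spaced evenly along the polyline's arc length."""
    pts = list(points)
    length = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    interval = length / (n - 1)
    out = [pts[0]]
    acc = 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # the next segment starts at the newly emitted point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # floating-point rounding can leave the final point missing
        out.append(pts[-1])
    return out[:n]


def normalize(points: List[Point]) -> List[Point]:
    """Move the centroid to the origin and scale uniformly so the larger side is 1."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / size, (y - cy) / size) for x, y in points]


def prepare(points: List[Point], n: int = 32) -> List[Point]:
    return normalize(resample(points, n))


def recognize(stroke: List[Point], templates: Dict[str, List[Point]],
              n: int = 32) -> Tuple[str, float]:
    """Nearest-template match; templates must be prepared with the same n."""
    probe = prepare(stroke, n)
    best_word, best_score = "", math.inf
    for word, tmpl in templates.items():
        score = sum(math.dist(p, q) for p, q in zip(probe, tmpl)) / n
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical generated outlines, one per vocabulary entry.
raw = {"hump": [(0, 0), (1, 1), (2, 0)],
       "flat": [(0, 0), (2, 0)],
       "swoop": [(0, 0), (0, -2), (1, -3)]}
templates = {word: prepare(pts) for word, pts in raw.items()}
print(recognize([(0, 0), (1, 0.9), (2, 0.1)], templates)[0])  # hump
```

With one generated template per dictionary word, the returned distance could also serve as a rejection threshold for outlines that match nothing well.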
Should I try this?
This is a research problem (and it should be your research, not ours), not an OpenCV API one.
I concur and thank you for your comment. My hope is that someone will tell me whether I'm chasing a purple squirrel. Does this sound like a 20 hour problem or a 2000 hour problem? I'm too ignorant to know whether it's reasonable and am hoping for someone with experience to give me a quick guess. I'm not an OCR researcher. I'm a shorthand user who happens to write code for a living.
Don't take my word on the numbers, but I'd say it's more of a 2000-hour problem than a 20-hour one.
No, there is no existing OCR solution for this (AFAIK).
Thank you!
That’s pretty fascinating. You’d think OCR would be language-agnostic?
P.S. I guess you wouldn’t be making a character recognition program, but a word recognition program.
@kevin, do you have more data?
Well, I can create a lot more computer-generated data, with one outline for each word. I don't have much handwritten data yet, though: a few hundred handwritten words at most.
So the images above are machine-generated? (From single-letter curves, I assume?)
Yes. The A sound is formed by blending the little hump with the flat line after it. One complication is that the vowels separate leading marks from trailing marks: the downward swoop for the P sound is reused for the G sound after the vowels. There's a lot of that kind of reuse, but each curve inflection has a single meaning for consonants, and each pair of inflections has a meaning for vowels.
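These position-dependent rules (one inflection per consonant, two per vowel, and the same swoop reading as P before a vowel and G after it) can be expressed as a small decoder over segment labels. This assumes an earlier step has already labelled each curve segment. In this sketch, only the A (hump + flat) and P/G (swoop) entries come from the description above. The label strings and the E, T and D entries are hypothetical placeholders. "Trailing" is assumed to mean "after the first vowel in the word". The decoder returns every possible reading instead of guessing, because a vowel pair and a consonant could in principle share a segment shape.

```python
from typing import Dict, List, Tuple

# Hypothetical tables: only A (hump + flat) and the P/G swoop come from the thread.
VOWELS: Dict[Tuple[str, str], str] = {
    ("hump", "flat"): "A",
    ("hump", "hook"): "E",  # hypothetical
}
# A consonant's meaning depends on whether a vowel has already appeared.
CONSONANTS: Dict[str, Dict[str, str]] = {
    "swoop": {"leading": "P", "trailing": "G"},
    "tick": {"leading": "T", "trailing": "D"},  # hypothetical
}


def decode(segments: List[str]) -> List[str]:
    """Return every letter string consistent with a sequence of segment labels."""
    results: List[str] = []

    def walk(i: int, seen_vowel: bool, letters: str) -> None:
        if i == len(segments):
            results.append(letters)
            return
        # Option 1: the next two segments together form a vowel.
        pair = tuple(segments[i:i + 2])
        if len(pair) == 2 and pair in VOWELS:
            walk(i + 2, True, letters + VOWELS[pair])
        # Option 2: this single segment is a consonant, read by position.
        meanings = CONSONANTS.get(segments[i])
        if meanings:
            position = "trailing" if seen_vowel else "leading"
            walk(i + 1, seen_vowel, letters + meanings[position])

    walk(0, False, "")
    return results


print(decode(["swoop", "hump", "flat", "swoop"]))  # ['PAG']
```

Checking the candidate readings against a word list, such as the vocabulary used to generate the outlines, would then resolve most remaining ambiguity.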