Should I try to OCR shorthand?

asked 2018-07-07 17:15:15 -0600

image description

This is not common Gregg. It's a shorthand I created for machine readability. I've never coded any OCR solution, so there's a big initial cost to trying. The problem, as you can see, is that each shape is a word. Each inflection point is a new letter, somewhat as in cursive.

Two strategies seem possible. Either I'd need to train the meaning of each curve and split each image into parts, or I'd need to train every possible word. I'm game for either one, if either is reasonable.

Should I try this?


Comments

this is a research problem (and it should be your research, not ours), not an opencv api one.

berak ( 2018-07-07 18:16:27 -0600 )

I concur and thank you for your comment. My hope is that someone will tell me whether I'm chasing a purple squirrel. Does this sound like a 20 hour problem or a 2000 hour problem? I'm too ignorant to know whether it's reasonable and am hoping for someone with experience to give me a quick guess. I'm not an OCR researcher. I'm a shorthand user who happens to write code for a living.

Kevin Knox ( 2018-07-07 18:28:38 -0600 )

don't take my word on numbers, but i'd say: more a 2000 hours problem, than 20.

no, there is no existing OCR solution for this (afaik)

berak ( 2018-07-07 18:33:38 -0600 )

Thank you!

Kevin Knox ( 2018-07-07 18:39:11 -0600 )

That’s pretty fascinating. You’d think that OCR would be language agnostic?

sjhalayka ( 2018-07-07 22:37:06 -0600 )

P.S. I guess you’d not be making a character recognition program, but a word recognition program.

sjhalayka ( 2018-07-08 12:51:05 -0600 )

@kevin , do you have more data ?

berak ( 2018-07-09 05:24:12 -0600 )

Well, I have the ability to create a lot more computer-generated data. I can show one outline for each word. I don't have a lot of hand-written data yet, though. Think hundreds of words written at most.

Kevin Knox ( 2018-07-09 07:16:00 -0600 )

so, above images are machine-generated ? (from single-letter curves, i assume ?)

berak ( 2018-07-09 07:22:57 -0600 )

Yes. The A sound is created by blending the little hump and the flat line afterward. One complexity is that the vowels separate leading marks from trailing marks. The downward swoop for the P sound is re-used for the G sound after the vowels. There's a lot of that kind of re-use, but each curve inflection has a single meaning for consonants and each pair of inflections has a meaning for vowels.

Kevin Knox ( 2018-07-09 10:43:30 -0600 )
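
For anyone weighing the "split each image into parts" strategy from the question, here is a minimal sketch of cutting one stroke wherever its turning direction flips, which is roughly what "each inflection point is a new letter" implies. It assumes the outline is already available as an ordered list of (x, y) points, as it might be for the machine-generated outlines; getting such a list out of a scanned handwritten image is a separate and harder step that this does not cover. The function names and the `eps` tolerance are made up for illustration and are not part of OpenCV or of the shorthand itself.

```python
def turn_signs(points, eps=1e-9):
    """Turn direction at each interior vertex: +1 left, -1 right, 0 straight.

    Uses the z-component of the cross product of consecutive edge vectors.
    `eps` is an assumed tolerance for treating near-collinear points as straight.
    """
    signs = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        signs.append((cross > eps) - (cross < -eps))
    return signs


def split_at_inflections(points, eps=1e-9):
    """Split a polyline into pieces wherever the turn direction flips sign.

    The cut is placed at the vertex just before the first turn in the new
    direction, which only approximates the true inflection point.
    Adjacent pieces share their boundary vertex.
    """
    segments, start, last = [], 0, 0
    for i, sign in enumerate(turn_signs(points, eps)):
        if sign == 0:
            continue  # straight runs do not change the current direction
        if last != 0 and sign != last:
            # signs[i] belongs to vertex i + 1, so vertex i precedes the flip
            segments.append(points[start:i + 1])
            start = i
        last = sign
    segments.append(points[start:])
    return segments


# A small S-shaped stroke: two right turns followed by two left turns.
stroke = [(0, 0), (1, 1), (2, 1), (3, 0), (4, 0), (5, 1)]
pieces = split_at_inflections(stroke)
print(len(pieces))  # 2
```

Real pen strokes are noisy, so something like this would most likely need smoothing or resampling first, and it says nothing about which letter each piece stands for. That classification step is where most of the effort berak estimates would go.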
