How does this Computer Vision Toy work?

asked 2019-03-08 08:11:32 -0600

joinity

Looking through the internet I found an interesting toy called LUKA, which claims to read your books aloud as you lay them in front of it. On further investigation, Luka uses an Ingenic T20 chip and has a camera that points toward the book. They almost certainly have to use some form of 2D feature matching to see which book is in front of the toy. But what really baffles me is that it reportedly works offline with a bunch of picture books! So the hardware has to compare around 50 picture pages against the camera image in real time, with rotation invariance, and all that on a small SoC!

Does anyone have a guess as to how they managed to do this?


Comments

1

No idea, sorry, but it does remind me of the book reader that Ray Kurzweil made for Stevie Wonder! That was back in the day.

sjhalayka ( 2019-03-08 08:34:32 -0600 )
2

hard to find, btw: https://luka.ling.ai/

Luka can recognise over 6,000 English, 2,000 Spanish and 30,000 Chinese picture books

imho, they scanned a ton of known picture books. once you know which one it is (that's probably the hard part), it's only about finding the most likely page, it does not have to "understand" any text

that it reportedly works offline

imho, only finding out which book it is requires an online connection. once you've found out, and downloaded it (and some graphical page clues) from the central webserver, sure, it will work offline.

berak ( 2019-03-09 04:01:35 -0600 )

Thanks for your comments, guys! I'm still fascinated and curious about how they managed to do feature matching in real time against, let's say, 100 images on this hardware!

joinity ( 2019-03-12 04:12:34 -0600 )
1

^^ they might use a flann::Index or similar, not the usual feature matching.

(think of it: you can use pre-serialized descriptors, and only have to extract descriptors for the image from the camera; then it's just a knn search with L2 or Hamming distance)

berak ( 2019-03-12 04:57:22 -0600 )
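
To make the last suggestion concrete, here is a minimal sketch of nearest-neighbour page lookup over precomputed binary descriptors. It is only an illustration of the idea in the comment, not a description of how Luka actually works. Everything in it is an assumption: the descriptors are random 256-bit integers standing in for ORB-style descriptors, the names `build_database`, `identify_page`, `flip_bits` and the `max_dist` threshold are made up for this example, and a plain linear scan is used in place of a real index such as flann::Index.

```python
import random
from collections import Counter

DESC_BITS = 256  # ORB-style binary descriptors are 32 bytes (256 bits)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def build_database(pages):
    """Flatten {page_id: [descriptor, ...]} into a list of (descriptor, page_id)."""
    return [(d, page_id) for page_id, descs in pages.items() for d in descs]


def identify_page(database, query_descs, max_dist=40):
    """Each query descriptor votes for the page owning its nearest neighbour."""
    votes = Counter()
    for q in query_descs:
        # Linear scan; a real system would likely use an approximate index.
        dist, page_id = min((hamming(q, d), pid) for d, pid in database)
        if dist <= max_dist:  # hypothetical threshold: ignore weak matches
            votes[page_id] += 1
    return votes.most_common(1)[0][0] if votes else None


def flip_bits(desc, n, rng):
    """Simulate camera noise by flipping n distinct random bits."""
    for bit in rng.sample(range(DESC_BITS), n):
        desc ^= 1 << bit
    return desc


rng = random.Random(0)

# Stand-in for descriptors precomputed offline from scanned pages.
pages = {
    page_id: [rng.getrandbits(DESC_BITS) for _ in range(50)]
    for page_id in range(20)
}
database = build_database(pages)

# Stand-in for descriptors extracted from a live camera frame of page 7.
camera = [flip_bits(d, 10, rng) for d in pages[7][:30]]
best_page = identify_page(database, camera)
print(best_page)
```

The point of the sketch is that the expensive step (extracting descriptors) happens only once per camera frame, while the stored pages cost nothing but memory and a distance search. For scale, assuming 500 descriptors per page (OpenCV's default ORB feature count) at 32 bytes each, 100 pages come to 100 × 500 × 32 bytes = 1.6 MB of descriptor data.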