general ocr training question [closed]
Hello, I need to ask a general question about OCR (optical character recognition) - maybe you can help me?
- I trained a model which can detect number plates (using YOLO). This is working fine and I am happy with the detection rate.
Now I want to extract the characters from the number plates. I could either train letters & digits on top of my existing model, or I could train a separate model for this. What would you recommend? Here are some thoughts I have right now:
Training on top:
- maybe a performance gain, as I detect everything with one forward pass
- maybe a performance loss, as I don't restrict character recognition to the number plate (it will be applied to the whole image)
Training a separate model:
- separation of concerns - in general, I like to split up different responsibilities
- maybe a performance gain because of the small input image size (not the whole image, just a portion of it - the number plate)
- maybe a performance loss, as I need to evaluate a second model
Any comments on this are highly welcome. Greetings, Holger
I don't quite understand what you mean by "training on top" (and what the difference to "training a separate model" is).
By training on top I mean that I will include the letters & digits in my existing dataset. In the end I will have one model.
In the other scenario I will have two datasets and two models - one for number plate / car detection and one for character recognition.
wouldn't you get many false positives (letters that are not part of a license plate) in the 1st case?
also, training one model for 2 different purposes at the same time seems like a bad idea to me.
Yes, I am also concerned about getting letter bboxes (and yes - tons of false positives) all over the image. During evaluation I would only keep letter boxes that lie inside the bboxes of the license plates - something like the sketch below. Still, I think I will really train two models.
But my biggest concern with that is performance. I know that one forward pass with YOLO takes around 20 ms, which is fine for real time. Double this and you are not real-time anymore.
Anyway, thank you for your opinion and hints - that's why I was asking :-) Have a nice weekend + greetings, Holger
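Roughly what I had in mind for that filtering (just a sketch - the (x, y, w, h) box format and the numbers are only placeholders):

```python
def center_inside(char_box, plate_box):
    # keep a character detection only if its center point
    # falls inside the detected plate region
    cx = char_box[0] + char_box[2] / 2
    cy = char_box[1] + char_box[3] / 2
    px, py, pw, ph = plate_box
    return px <= cx <= px + pw and py <= cy <= py + ph

# hypothetical detections from a single combined model
plate_box = (420, 310, 160, 40)
char_boxes = [(430, 315, 15, 30), (50, 60, 15, 30)]  # 2nd one is a false positive elsewhere
kept = [b for b in char_boxes if center_inside(b, plate_box)]
```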
imho, you NEED 2 passes.
maybe some "classical" findContours() or connectedComponents() to separate the characters, then a classification network (you've already done the detection) on, say, 20x30 cropped letter images.
or, if you go for YOLO again, remember that your input is much smaller than the street image (CNNs spend most of their time in the first few layers, convolving the large image). you could also try a cheaper architecture, like tiny-yolo (remember, you only have very few classes: capital letters and numbers).
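something like this, as a rough sketch (the size thresholds are just guesses, and it assumes the cropped plate is a grayscale image):

```python
import cv2

# assume `plate` is the cropped license plate as a grayscale uint8 image
plate = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)

# binarize; characters are dark on a bright plate, so invert to make them foreground
_, binary = cv2.threshold(plate, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# connected components: each blob is (hopefully) one character
num, labels, stats, _ = cv2.connectedComponentsWithStats(binary)

chars = []
for i in range(1, num):                      # label 0 is the background
    x, y, w, h, area = stats[i]
    # crude size filter to drop screws, rims and noise (thresholds are guesses)
    if h < 0.3 * plate.shape[0] or area < 50:
        continue
    crop = cv2.resize(binary[y:y + h, x:x + w], (20, 30))  # fixed size for the classifier
    chars.append((x, crop))

chars.sort(key=lambda c: c[0])               # left-to-right, reading order
# each 20x30 crop would then go into the small classification network
```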
Thank you for the additional insights. I think I worry too much about the 2nd evaluation - just keeping the input image small (cropping the license plate detections, see the sketch below) and picking an architecture with few layers should really do the job.
Good to have someone to discuss these things with - thank you very much - I am closing this as answered. Have a nice day!
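Just to make sure I understood the cropping part - a minimal sketch, assuming the plate detector returns (x, y, w, h) boxes in pixel coordinates (the numbers and `char_model` are placeholders):

```python
import cv2

frame = cv2.imread("street.jpg")             # full street image

# hypothetical output of the plate detector: one (x, y, w, h) box per plate
plate_boxes = [(420, 310, 160, 40)]

for (x, y, w, h) in plate_boxes:
    plate = frame[y:y + h, x:x + w]          # small crop instead of the full frame
    # the 2nd (character) model only ever sees this small crop,
    # so its forward pass stays cheap compared to the full-frame pass
    # chars = char_model(plate)              # hypothetical second model
```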