Number plate recognition using tessract
I am planning to perform ocr on Indian number plates.I used tessract 4.0 beta which uses LSTM engine for ocr. Although recognized characters are not coming out to be correct. I used cv2.Laplacian() while picking up images without blur and performed noise reduction using cv2.fastNlMeansDenoisingColored() on the image. For preprocessing I performed following steps
1)Sharpened image
2)performed otsu thresholding
3)Eliminated smaller noises/contours
4)Then inverted image
5)OCR using tesseract 4.0 --oem 1(Which uses lstm as detectio module )
Test images look like these.
Detected output: OL 1CT 5079 (Which seems ok)
Can you please suggest any other preprocessing required to improve image(Reduce noise)?
Also is there a way to restrict special characters in tesseract?(Could not find it in latest version)
(ps. I am using python)
Thanks in advance
You can perform dilation/erosion operation to highlight text. If you mean the Tesseract library you can put the special characters in a "black list" to avoid detecting them (I've used the JS version of Tesseract, I don't know what language are you using)
Did you take a look of openalpr. They use Tesseract as OCR backend.