Well, this is a somewhat complex problem and an active area of research; in fact, I am working on the same problem myself. Here is the procedure I have adopted.
Now that you have your image loaded:

1) Extract the pixels representing individual letters of text.
Approach 1: SWT (Stroke Width Transform) REF. SWT needs to know whether your text is light-on-dark-background or dark-on-light-background; if you do not know this in advance, you have to make two passes of the algorithm, one for each polarity (a sketch of arranging the two passes follows below).
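One simple way to arrange the two passes, when polarity is unknown, is to run the same detector on the image and on its inverse, since inverting the intensities swaps the two polarities. This is only a sketch: the LetterDetector signature is a hypothetical stand-in for whatever SWT implementation you actually use (for example the GitHub project linked at the end).

    // Two-pass polarity handling (sketch only).
    // LetterDetector is a HYPOTHETICAL signature for an SWT-style
    // detector that assumes dark-on-light text.
    #include <opencv2/opencv.hpp>
    #include <functional>
    #include <vector>

    typedef std::function<std::vector<cv::Rect>(const cv::Mat&)> LetterDetector;

    std::vector<cv::Rect> detectBothPolarities(const cv::Mat& gray,
                                               const LetterDetector& swt)
    {
        std::vector<cv::Rect> letters = swt(gray);   // dark-on-light pass

        cv::Mat inverted;
        cv::bitwise_not(gray, inverted);             // flip text polarity
        std::vector<cv::Rect> more = swt(inverted);  // light-on-dark pass

        letters.insert(letters.end(), more.begin(), more.end());
        return letters;
    }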
Approach 2: Flood fill your image repeatedly until all pixels are covered. For every seed pixel, mark the pixels that were included in the flood fill and do not run the flood fill again from those pixels.
For each seed pixel, collect the reached pixels and call the collection a component. Then filter out non-text components using heuristics similar to those used in the SWT case (one possibility is sketched below).
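For illustration, here is one possible component filter. The specific thresholds are assumptions of this sketch, not values from any reference; tune them for your images.

    // Heuristic letter filter (sketch; all thresholds are assumptions):
    // discard components that are too small, too large, too elongated,
    // or too solid/sparse to plausibly be a letter.
    #include <opencv2/opencv.hpp>
    #include <vector>

    bool looksLikeLetter(const std::vector<cv::Point>& comp,
                         const cv::Size& imgSize)
    {
        cv::Rect box = cv::boundingRect(comp);
        if (box.height < 8 || box.height > imgSize.height / 2)
            return false;                                  // size limits
        double aspect = (double)box.width / box.height;
        if (aspect < 0.1 || aspect > 10.0)
            return false;                                  // elongation limit
        // Letters are strokes, so they fill only part of their box.
        double fill = (double)comp.size() / box.area();
        return fill > 0.05 && fill < 0.95;
    }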
2) Group the components into text words, then into text lines, and finally into text blocks. A minimal grouping heuristic is sketched below.
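As an illustration of the word-level grouping, here is a simple heuristic of my own (not a standard algorithm): sort the component bounding boxes left to right and merge neighbours whose vertical offset and horizontal gap are small relative to the letter height. Real pipelines usually apply more robust clustering, and the same idea extends to lines and blocks.

    // Merge letter boxes into word boxes (sketch; thresholds assumed).
    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    std::vector<cv::Rect> groupIntoWords(std::vector<cv::Rect> boxes)
    {
        // Sort letter boxes left to right.
        std::sort(boxes.begin(), boxes.end(),
                  [](const cv::Rect& a, const cv::Rect& b) { return a.x < b.x; });

        std::vector<cv::Rect> words;
        for (const cv::Rect& b : boxes) {
            if (!words.empty()) {
                cv::Rect& last = words.back();
                // Same line if the vertical offset is small; same word if
                // the horizontal gap is smaller than the letter height.
                bool sameLine = std::abs(b.y - last.y) < last.height / 2;
                bool close    = b.x - (last.x + last.width) < last.height;
                if (sameLine && close) { last |= b; continue; }  // grow word box
            }
            words.push_back(b);
        }
        return words;
    }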
3) Pass each text block to Tesseract; a sketch using its C++ API follows.
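A minimal sketch of feeding one cropped block to Tesseract through its C++ API (TessBaseAPI): it assumes the block is an 8-bit single-channel cv::Mat and that the English language data is installed; recognizeBlock is my own name.

    // Recognize one cropped text block with Tesseract (sketch).
    #include <opencv2/opencv.hpp>
    #include <tesseract/baseapi.h>
    #include <string>

    std::string recognizeBlock(const cv::Mat& block)  // CV_8UC1 crop
    {
        tesseract::TessBaseAPI api;
        if (api.Init(NULL, "eng"))                    // 0 means success
            return "";
        api.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
        api.SetImage(block.data, block.cols, block.rows,
                     1 /* bytes per pixel */, static_cast<int>(block.step));
        char* raw = api.GetUTF8Text();
        std::string text = raw ? raw : "";
        delete[] raw;                                 // caller owns the buffer
        api.End();
        return text;
    }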
The current implementation of floodFill in OpenCV (check here) returns only the number of pixels filled from a seed pixel, not their locations. Hence, to get the pixel locations in order to form components, some modification needs to be made. The marked pixel locations can be collected in a Boost::unordered_map for better performance.
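One way to recover the filled pixel locations without patching OpenCV is to pass floodFill its optional mask argument together with the FLOODFILL_MASK_ONLY flag: every pixel the fill reaches is marked in the mask, which can then be scanned. A minimal sketch under that assumption (the tolerance value and the extractComponents name are my own, and a plain std::vector per component is used for brevity instead of Boost::unordered_map):

    // Repeated flood fills that recover component pixels via the mask
    // argument instead of a modified floodFill (sketch; tolerance assumed).
    #include <opencv2/opencv.hpp>
    #include <vector>

    typedef std::vector<cv::Point> Component;

    std::vector<Component> extractComponents(cv::Mat gray)  // 8-bit, 1 channel
    {
        // Masks are 2 px larger than the image; pixel (x, y) maps to (x+1, y+1).
        cv::Mat visited = cv::Mat::zeros(gray.rows + 2, gray.cols + 2, CV_8UC1);
        std::vector<Component> components;

        for (int y = 0; y < gray.rows; ++y)
            for (int x = 0; x < gray.cols; ++x) {
                if (visited.at<uchar>(y + 1, x + 1))
                    continue;                        // seed already covered

                cv::Mat mask = cv::Mat::zeros(gray.rows + 2, gray.cols + 2, CV_8UC1);
                cv::Rect box;
                int flags = 4 | (255 << 8) | cv::FLOODFILL_MASK_ONLY;
                cv::floodFill(gray, mask, cv::Point(x, y), cv::Scalar(), &box,
                              cv::Scalar::all(8), cv::Scalar::all(8), flags);

                // Every reached pixel is now 255 in mask; collect the new
                // ones and remember them so no later fill starts from them.
                Component comp;
                for (int my = box.y; my < box.y + box.height; ++my)
                    for (int mx = box.x; mx < box.x + box.width; ++mx)
                        if (mask.at<uchar>(my + 1, mx + 1) &&
                            !visited.at<uchar>(my + 1, mx + 1)) {
                            comp.push_back(cv::Point(mx, my));
                            visited.at<uchar>(my + 1, mx + 1) = 255;
                        }
                components.push_back(comp);
            }
        return components;
    }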
Also, this is a GitHub project that extracts text components from images using the SWT approach.