The problem you are describing is called "page layout detection and character segmentation". The generic steps are as follows:
1- Detect page zones such as text headers, text paragraphs, graphics and pictures, tables, ....
2- For text zones (header, table cell, paragraph) do the following.
In your case you only have one paragraph, so you can split the paragraph into lines using a horizontal histogram (the count of ink pixels in each row) and cutting at its local minima. Alternatively, you can use contours and merge regions whose vertical extents overlap by some height threshold into one line.
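Both ideas can be sketched in plain Python. This is only a rough illustration, not a tuned implementation: it assumes the page is already binarized into rows of 0/1 values (1 = ink), and that contours have already been reduced to `(x, y, w, h)` bounding boxes. The function names, `gap_threshold` and `min_overlap` are made up for this example. The histogram version also simplifies "cut at local minima" to "cut where the row count drops to `gap_threshold` or below", which only works when the lines are cleanly separated.

```python
def row_profile(binary):
    """Horizontal histogram: number of ink pixels (value 1) in each row."""
    return [sum(row) for row in binary]


def split_lines(binary, gap_threshold=0):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A row is treated as a gap when its ink count is <= gap_threshold
    (hypothetical parameter, a simplification of a local-minimum search).
    """
    lines = []
    start = None
    for y, ink in enumerate(row_profile(binary)):
        if ink > gap_threshold and start is None:
            start = y                  # entering a text line
        elif ink <= gap_threshold and start is not None:
            lines.append((start, y))   # leaving a text line
            start = None
    if start is not None:              # line touches the bottom edge
        lines.append((start, len(binary)))
    return lines


def group_boxes_into_lines(boxes, min_overlap=0.5):
    """Group (x, y, w, h) boxes into lines by vertical overlap.

    A box joins an existing line when the overlap between its vertical
    extent and the line's extent is at least min_overlap * box height
    (min_overlap is an assumed threshold for this sketch).
    """
    lines = []  # each entry: [top, bottom, list_of_boxes]
    for box in sorted(boxes, key=lambda b: b[1]):
        x, y, w, h = box
        for line in lines:
            overlap = min(line[1], y + h) - max(line[0], y)
            if overlap >= min_overlap * h:
                line[0] = min(line[0], y)
                line[1] = max(line[1], y + h)
                line[2].append(box)
                break
        else:
            lines.append([y, y + h, [box]])
    # Order the boxes of each line from left to right.
    return [sorted(line[2]) for line in lines]


# Tiny synthetic "page": two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
]
print(split_lines(page))

# Four made-up character boxes forming two lines.
boxes = [(0, 10, 5, 10), (8, 12, 5, 9), (0, 30, 5, 10), (7, 31, 5, 10)]
print(group_boxes_into_lines(boxes))
```

On a real scan you would probably need a non-zero `gap_threshold` (or some smoothing of the histogram) because noise and descenders rarely leave perfectly empty rows between lines.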
Finally, if you need a ready-made solution, Tesseract can do all of the previous tasks plus the recognition.