# Using HoughLinesP to straighten skewed receipt scans

I am trying to use HoughLinesP in Python to detect the lines of text in a batch of images so that I can eventually straighten them. The images are scanned receipts at fairly high DPI, so I scale them down and pre-process them with noise removal and thresholding so that they look like this:

I was hoping that the majority of the lines found by HoughLinesP would align with the text flow direction (the text's horizontal axis). I have played around with the different parameters of the HoughLinesP method but cannot find a good way to do this accurately. For some reason, most of the lines found run along the text's vertical axis instead, which seems odd to me, since the lines are definitely longer and more precise along the text's horizontal axis. Here is an example (Hough lines drawn as thin grey lines) with the following input values:

minLineLength = 45 (one sixth of image width)
maxLineGap = 5
pixelRes = 1
rotationRes = pi/180
threshold = 200


Typically the receipts are off by ±10 to 45 degrees or so, so the text flow is almost always closer to horizontal than vertical. I am not sure what I'm missing here. Is there any way to tweak the HoughLinesP method to better identify the general flow of the text in this type of image?
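
Since the skew is expected to stay within 45 degrees of horizontal, one possible way to reduce the detected segments to a single estimate is to discard the near-vertical ones and take the median angle of the rest. The sketch below assumes segments in the `x1, y1, x2, y2` layout that `HoughLinesP` returns. The function name `estimate_skew_degrees` and the `max_abs_angle` parameter are hypothetical.

```python
import numpy as np


def estimate_skew_degrees(segments, max_abs_angle=45.0):
    """Median angle (in degrees) of the segments lying near the horizontal.

    Angles are measured in image coordinates (y grows downward), so a
    positive value means the text slopes downward to the right.
    Returns None when no segment falls within the allowed range.
    """
    segments = np.asarray(segments, dtype=float).reshape(-1, 4)
    dx = segments[:, 2] - segments[:, 0]
    dy = segments[:, 3] - segments[:, 1]
    angles = np.degrees(np.arctan2(dy, dx))
    # A segment has no direction, so fold every angle into [-90, 90).
    angles = (angles + 90.0) % 180.0 - 90.0
    # Drop segments that follow the text's vertical axis.
    keep = np.abs(angles) <= max_abs_angle
    if not np.any(keep):
        return None
    return float(np.median(angles[keep]))
```

The median is used rather than the mean so that a handful of stray segments does not pull the estimate away from the dominant direction.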


Can you post the original RGB image? Maybe you could find the rectangle directly and align it easily.

( 2017-07-26 06:23:09 -0500 )


I would not use Hough lines for this purpose. Instead, I would follow one of these two strategies:

Strategy 1:

1. Blur your image a lot, so that you lose those black spaces between letters.
2. Once your letters become "white united pixels", go for contour detection and capture them in a bounding box.
3. Once you have your blob in a box, you can then play with it as much as you want. This may guide you through.

Strategy 2:

1. Use a feature detector, which may work with this, like in the given example.
2. Once you detect the bill and put a bounding box around it, again, it is a matter of geometry to figure out how much it is tilted.
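
The geometry in the last step could look something like the sketch below. It assumes the box is given as four corners listed in order around the rectangle, and that the receipt is taller than it is wide, so the text runs along the shorter edge. The function name `tilt_from_box` and the `text_along_short_edge` flag are hypothetical.

```python
import numpy as np


def tilt_from_box(corners, text_along_short_edge=True):
    """Tilt in degrees of a rotated box relative to the image's x axis.

    corners: four (x, y) points in order around the rectangle.
    The result lies in [-90, 90) and uses image coordinates (y downward).
    """
    corners = np.asarray(corners, dtype=float).reshape(4, 2)
    # Vectors from each corner to the next one around the box.
    edges = np.roll(corners, -1, axis=0) - corners
    lengths = np.linalg.norm(edges, axis=1)
    # Opposite edges are parallel, so the first two cover both directions.
    if text_along_short_edge:
        idx = int(np.argmin(lengths[:2]))
    else:
        idx = int(np.argmax(lengths[:2]))
    ex, ey = edges[idx]
    angle = np.degrees(np.arctan2(ey, ex))
    # An edge has no direction, so fold the angle into [-90, 90).
    return float((angle + 90.0) % 180.0 - 90.0)
```

For a nearly square box the short and long edges are ambiguous, so this assumption would need another cue in that case.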