Using HoughLinesP to straighten skewed receipt scans
I am trying to use HoughLinesP in Python (OpenCV) to detect lines in text so that I can eventually straighten a batch of images. The images are high-DPI scanned receipts; I scale them down and preprocess them with noise removal and thresholding, after which they look like this:
I was hoping that most of the lines found by HoughLinesP would align with the direction of the text flow (the text's horizontal axis). I have experimented with the parameters of HoughLinesP but cannot find a combination that does this reliably. For some reason, most of the detected lines run along the text's vertical axis instead, which seems odd to me, since the text is clearly longer and better defined along its horizontal axis. Here is an example (detected lines drawn in thin grey) with the following input values:
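To see what the detector actually returns, it can help to convert each segment to an angle and total up the segment length in each orientation. The sketch below assumes the usual `cv2.HoughLinesP` output (either `None` or an array of shape (N, 1, 4) holding `x1, y1, x2, y2`); `segment_angles` and `orientation_summary` are hypothetical helper names. Angles are folded into [-90, 90) so that a segment and its reverse agree, with 0 meaning horizontal; because image y grows downward, a positive angle is a clockwise tilt on screen.

```python
import numpy as np


def segment_angles(lines):
    """Return (angles_deg, lengths) for HoughLinesP-style output (hypothetical helper)."""
    if lines is None:  # HoughLinesP returns None when nothing is found
        return np.empty(0), np.empty(0)
    x1, y1, x2, y2 = np.asarray(lines, dtype=float).reshape(-1, 4).T
    dx, dy = x2 - x1, y2 - y1
    # arctan2 gives (-180, 180]; fold into [-90, 90) so direction does not matter.
    angles = (np.degrees(np.arctan2(dy, dx)) + 90.0) % 180.0 - 90.0
    return angles, np.hypot(dx, dy)


def orientation_summary(lines):
    """Total segment length closer to horizontal vs. closer to vertical."""
    angles, lengths = segment_angles(lines)
    near_horizontal = np.abs(angles) < 45.0
    return {
        "horizontal": float(lengths[near_horizontal].sum()),
        "vertical": float(lengths[~near_horizontal].sum()),
    }
```

Comparing the two totals before and after a parameter change shows whether the change actually shifted the balance toward the text direction, rather than judging from the drawn overlay alone.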
minLineLength = 45 (one sixth of image width)
maxLineGap = 5
pixelRes = 1
rotationRes = pi/180
threshold = 200
The receipts are typically skewed by roughly 10 to 45 degrees in either direction, so the text flow is almost always closer to horizontal than vertical. What am I missing? Is there a way to tune HoughLinesP so that it better identifies the general direction of the text in this type of image?
Can you post the original RGB image? You might be able to find the receipt's rectangle directly and align it easily.