Tesseract accuracy

asked 2019-04-15 07:29:03 -0600

Bleach

updated 2019-04-15 07:59:06 -0600

I am trying to use Tesseract through OpenCV, but the recognition accuracy is terrible compared to using Tesseract directly.

Test code:

// Via OpenCV's text module
cv::Ptr<text::OCRTesseract> ocr = text::OCRTesseract::create(ssTessdata.c_str(), ssLang.c_str());
ocr->run(CurImg, ssRecText);

// Via the Tesseract API directly
tesseract::TessBaseAPI *ocr1 = new tesseract::TessBaseAPI();
if (ocr1->Init(ssTessdata.c_str(), ssLang.c_str()) == -1) {
    // handle the initialization error
}
ocr1->SetImage(CurImg.data, CurImg.cols, CurImg.rows, CurImg.channels(), static_cast<int>(CurImg.step));
ssRecText.append("\n\nTesseract direct:\n");
char *directText = ocr1->GetUTF8Text();  // the caller owns this buffer
ssRecText.append(directText);
delete[] directText;

Sample output:

ContentsPropyleneGlycolVegetabie
GlycerolWaterItalian FlavoursNotsuitable
forpregnantorbreastfeedingwoman
NottobesoldtominorsKeepinadark
coolplaceKeepawayfrom children

Tesseract direct:
Contents: Propylene Glycol, Vegetabie
Glycerol, Water, Italian Flavours. Not suitable
for pregnant or breastfeeding woman.
Not to be sold to minors. Keep in a dark
cool place. Keep away from children.

Sample image: [image]

Tesseract, Leptonica and OpenCV were built from source.

Has anyone encountered such a problem? Any ideas how to fix it?


1 answer


answered 2020-08-04 15:59:39 -0600

@Bleach It depends on what manipulations you perform on the image data before it reaches Tesseract. My guess is that the letters end up closer together in the input Tesseract receives once the image has been opened or manipulated with OpenCV. I don't know whether this has an impact, but reading an image with OpenCV strips all of its metadata, including the DPI. The missing DPI metadata might affect how Tesseract interprets the image when parsing it for text, and could be why the letters appear even closer together.
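One quick way to narrow this down is to compare the two sample outputs from the question while ignoring everything except letters and digits. The sketch below does that with the standard library only. The names `alnumOnly`, `sameCharacters`, `kViaOpenCv`, `kDirect` and `checkSamples` are made up for this illustration, and the two strings are copied from the sample output above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Keep only letters and digits, so that differences in spacing,
// line breaks and punctuation are ignored.
std::string alnumOnly(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c)) out.push_back(static_cast<char>(c));
    }
    return out;
}

// True when both OCR results contain the same letters and digits in the same order.
bool sameCharacters(const std::string& a, const std::string& b) {
    return alnumOnly(a) == alnumOnly(b);
}

// Sample output from the question, as produced through OpenCV.
const std::string kViaOpenCv =
    "ContentsPropyleneGlycolVegetabie\n"
    "GlycerolWaterItalian FlavoursNotsuitable\n"
    "forpregnantorbreastfeedingwoman\n"
    "NottobesoldtominorsKeepinadark\n"
    "coolplaceKeepawayfrom children\n";

// Sample output from the question, as produced by Tesseract directly.
const std::string kDirect =
    "Contents: Propylene Glycol, Vegetabie\n"
    "Glycerol, Water, Italian Flavours. Not suitable\n"
    "for pregnant or breastfeeding woman.\n"
    "Not to be sold to minors. Keep in a dark\n"
    "cool place. Keep away from children.\n";

// Passes for the two samples: only spacing and punctuation differ.
void checkSamples() {
    assert(sameCharacters(kViaOpenCv, kDirect));
}
```

For these two samples the check passes, so the letters themselves were recognized identically and only the spaces and punctuation went missing. That may point to a configuration difference between the two code paths rather than image quality or DPI. The `char_whitelist` and `psmode` arguments of `OCRTesseract::create` would be one place to look, although that is an assumption and not something confirmed in this thread.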
