Tesseract accuracy

asked 2019-04-15 07:29:03 -0500

Bleach gravatar image

updated 2019-04-15 07:59:06 -0500

I try use Tesseract from Opencv, but the recognition accuracy is terrible compared to the direct use Tesseract.

Test code:

cv::Ptr<text::OCRTesseract> ocr = text::OCRTesseract::create(ssTessdata.c_str(), ssLang.c_str());
ocr->run(CurImg, ssRecText);

tesseract::TessBaseAPI *ocr1 = new tesseract::TessBaseAPI();
if (ocr1->Init(ssTessdata.c_str(), ssLang.c_str()) == -1) {//catch err}
ocr1->SetImage(CurImg.data, CurImg.cols, CurImg.rows, CurImg.channels(), static_cast<int>(CurImg.step));
ssRecText.append("\n\nTesseract direct:\n");
ssRecText.append(string(ocr1->GetUTF8Text()));

Sample output:

ContentsPropyleneGlycolVegetabie
GlycerolWaterItalian FlavoursNotsuitable
forpregnantorbreastfeedingwoman
NottobesoldtominorsKeepinadark
coolplaceKeepawayfrom children

Tesseract direct:
Contents: Propylene Glycol, Vegetabie
Glycerol, Water, Italian Flavours. Not suitable
for pregnant or breastfeeding woman.
Not to be sold to minors. Keep in a dark
cool place. Keep away from children.

Sample image: image description

Tesseract, Leptonica and Opencv was build from source.

Has anyone encountered such problem? Any ideas how to fix it?

edit retag flag offensive close merge delete