Recognizing and reading text from an image using Tesseract and OpenCV in C++?

I am trying to recognize text in an image using Tesseract and OpenCV. The code I am using is below.

#include "stdafx.h"
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>  // cout, endl
#include <string>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main(int argc, char* argv[])
{
    string outText;
    string imPath = "Images/newspaper2.jpeg";

    // Create Tesseract object
    tesseract::TessBaseAPI *ocr = new tesseract::TessBaseAPI();

    // Initialize tesseract to use English (eng) and the LSTM OCR engine. 
    ocr->Init("tessdata", "eng", tesseract::OEM_LSTM_ONLY);

    // Set Page segmentation mode to PSM_AUTO (3)
    ocr->SetPageSegMode(tesseract::PSM_AUTO);

    // Open input image using OpenCV
    Mat im = cv::imread(imPath, IMREAD_COLOR);

    // Set image data
    ocr->SetImage(im.data, im.cols, im.rows, 3, im.step);

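    // Run Tesseract OCR on image
    char* text = ocr->GetUTF8Text();
    outText = string(text);
    delete[] text; // GetUTF8Text() returns a buffer that the caller must free

    // Print recognized text
    cout << outText << endl;

    // Destroy used object and release memory
    ocr->End();
    delete ocr;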

    return EXIT_SUCCESS;
}

But I get an error when I build this: namespace "tesseract" has no member "OEM_LSTM_ONLY". The problem is in the line that initializes Tesseract to use English (eng) and the LSTM OCR engine. I am using Tesseract version 4.0, so the LSTM OCR engine should be available. Can anyone tell me why I am getting this error and how I can solve it?
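One thing that may be worth checking is which Tesseract version the build actually picks up. tesseract::OEM_LSTM_ONLY is only declared by the headers of Tesseract 4.0 and later, so an older 3.x installation on the include path would give exactly this message. The sketch below is only a diagnostic aid: majorVersion and headersShouldDeclareLstm are hypothetical helper names, not part of Tesseract, and the assumption is that the version string starts with the major number (as in "4.0.0" or "3.05.02").

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Tesseract): returns the leading major
// number of a version string such as "4.0.0" or "3.05.02", or -1 if the
// string does not start with a digit.
int majorVersion(const std::string& version)
{
    if (version.empty() || version[0] < '0' || version[0] > '9')
        return -1;
    return std::atoi(version.c_str()); // atoi stops at the first non-digit
}

// Hypothetical helper: OEM_LSTM_ONLY is declared only by Tesseract 4.0+
// headers, so a major version below 4 would explain the error.
bool headersShouldDeclareLstm(const std::string& version)
{
    return majorVersion(version) >= 4;
}

// Possible use in the program above (needs Tesseract to build and link):
//   cout << tesseract::TessBaseAPI::Version() << endl;
//   cout << headersShouldDeclareLstm(tesseract::TessBaseAPI::Version()) << endl;
```

Since the program does not build as it stands, the Init line would have to be commented out temporarily to run this check. Note also that TessBaseAPI::Version() reports the library that is linked, which may differ from the headers the compiler finds, so the include directories in the project settings are worth checking as well.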
