Can openCV be used to extract character strings from images?

asked 2014-02-11 17:49:47 -0600

sherrellbc

updated 2014-02-12 02:15:16 -0600

berak

I have an idea for sorting IC chips by type for a design project. The information about each device is printed on top of the package as a string of characters:

[Image: a DIP chip marked 74AC139PC]

The idea is to take a picture of the device, process the image, and extract the string "74AC139PC". The project would take a bin of random DIP chips and sort through them to find the ones matching a part number entered by the user.
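Once OCR returns the text printed on a chip, the sorting step reduces to comparing it with the part number the user typed. A minimal sketch, assuming the OCR output arrives as a `std::string` with one marking line per text line; `normalize_marking` and `chip_matches` are hypothetical helper names, not part of OpenCV or Tesseract:

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: keeps only letters and digits, upper-cased, so stray
// spaces, carriage returns or punctuation in the OCR output do not break
// the comparison.
std::string normalize_marking(const std::string& s)
{
    std::string out;
    for (unsigned char c : s)
        if (std::isalnum(c))
            out += static_cast<char>(std::toupper(c));
    return out;
}

// Hypothetical helper: returns true if any line of the OCR output, after
// normalization, equals the normalized part number the user asked for.
// Assumes the part number sits on its own line of the chip marking.
bool chip_matches(const std::string& ocr_text, const std::string& wanted)
{
    const std::string target = normalize_marking(wanted);
    if (target.empty())
        return false;

    std::istringstream lines(ocr_text);
    std::string line;
    while (std::getline(lines, line))
        if (normalize_marking(line) == target)
            return true;
    return false;
}
```

Comparing whole normalized lines, rather than searching for a substring, keeps a request for "74AC13" from matching a chip marked "74AC139PC"; relax this if you want to accept package or speed-grade variants of the same part.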

How difficult would it be to extract such information from an image? The task is simplified because most chips have white or gold text printed on a black background. Also, the text is usually formatted just like the image above, so no fancy fonts are used.
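Because the marking is light text on a dark package, a single global threshold usually separates the two, and inverting the result gives the dark-on-light text most OCR engines expect. The sketch below illustrates Otsu's method, which picks that threshold automatically, in plain C++ on a flat 8-bit grayscale buffer (an assumed layout, chosen so the example runs without OpenCV). With OpenCV the equivalent call is `cv::threshold(gray, bin, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU)`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Otsu's threshold for an 8-bit grayscale image stored as a flat buffer
// (assumed layout): returns the level t that maximizes the between-class
// variance of the two classes {v <= t} and {v > t}.
int otsu_threshold(const std::vector<std::uint8_t>& gray)
{
    std::array<double, 256> hist{};
    for (std::uint8_t v : gray)
        hist[v] += 1.0;

    const double total = static_cast<double>(gray.size());
    double sum_all = 0.0;
    for (int i = 0; i < 256; ++i)
        sum_all += i * hist[i];

    double w_bg = 0.0, sum_bg = 0.0, best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t)
    {
        w_bg += hist[t];                  // pixel count with value <= t
        if (w_bg == 0.0)
            continue;
        const double w_fg = total - w_bg; // pixel count with value > t
        if (w_fg == 0.0)
            break;
        sum_bg += t * hist[t];
        const double mean_bg = sum_bg / w_bg;
        const double mean_fg = (sum_all - sum_bg) / w_fg;
        const double diff = mean_bg - mean_fg;
        const double var = w_bg * w_fg * diff * diff; // unnormalized
        if (var > best_var)
        {
            best_var = var;
            best_t = t;
        }
    }
    return best_t;
}

// Light text on a dark package becomes dark text (0) on a white
// background (255), like THRESH_BINARY_INV combined with THRESH_OTSU.
std::vector<std::uint8_t> binarize_inverted(const std::vector<std::uint8_t>& gray)
{
    const int t = otsu_threshold(gray);
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i)
        out[i] = static_cast<std::uint8_t>(gray[i] > t ? 0 : 255);
    return out;
}
```

A global threshold assumes even lighting across the chip; under glare or shadows an adaptive threshold is usually the safer choice.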

Any suggestions on where to start?



2 answers


answered 2014-02-12 03:16:54 -0600

dervish

updated 2014-02-12 03:17:36 -0600

The easiest path is to integrate the Tesseract OCR library into your OpenCV project and use its API to recognize the characters. You can also retrain Tesseract on your own character set if your characters are unusual.

To install Tesseract, see this tutorial.

Here is an example from one of my projects:

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

using namespace std;
using namespace cv;

// Runs Tesseract on an image and prints the recognized text and its mean confidence.
void read_image(const Mat& image)
{
    // You may need to crop to the region of interest where the text is found.
    // Tesseract generally works best on a single-channel (grayscale) image.
    Mat gray;
    cvtColor(image, gray, COLOR_BGR2GRAY);

    // Initialize the Tesseract API.
    tesseract::TessBaseAPI *tess_api = new tesseract::TessBaseAPI();
    // "eng" selects the trained language data; if you trained your own
    // language (e.g. "XYZ"), pass that name here instead.
    if (tess_api->Init(NULL, "eng"))
    {
        cerr << "Could not initialize tesseract." << endl;
        delete tess_api;
        return;
    }
    // Restrict recognition to the characters expected in chip markings.
    tess_api->SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-");
    // 1 byte per pixel; bytes per line is the row stride of the Mat.
    tess_api->SetImage(gray.data, gray.cols, gray.rows, gray.channels(), (int)gray.step);
    tess_api->Recognize(0);

    char *out = tess_api->GetUTF8Text();
    int confidence = tess_api->MeanTextConf();
    cout << "OCR output: " << out << "  with confidence " << confidence << endl;

    // Free the text buffer and shut down the engine.
    delete[] out;
    tess_api->End();
    delete tess_api;
}

int main(int argc, char** argv)
{
    cout << "OCR: starts" << endl;
    Mat scene_plate = imread("plate.jpg", IMREAD_COLOR);
    if (scene_plate.empty())
    {
        cerr << "Could not read plate.jpg" << endl;
        return 1;
    }
    read_image(scene_plate);
    return 0;
}