How can I split these characters for character recognition?

asked 2016-01-19 21:08:40 -0500

cv_new

updated 2016-01-20 04:24:59 -0500

I want to split the following characters for character recognition. How can I achieve my goal?

image description image description

Should I use color clustering or some other method? The characters come in four different, though similar, colors, but I don't know how to separate them.

The following code uses the k-means method (cv::kmeans) to try to split those characters.

Mat src= imread(name);
cv::Mat reshaped_image = src.reshape(1, src.cols * src.rows);
Mat reshaped_image32f;
reshaped_image.convertTo(reshaped_image32f, CV_32FC1, 1.0 / 255.0);

cv::Mat labels;
int cluster_number = 5;
cv::TermCriteria criteria{ cv::TermCriteria::COUNT, 100, 1 };
cv::Mat centers;
cv::kmeans(reshaped_image32f, cluster_number, labels, criteria, 1, cv::KMEANS_RANDOM_CENTERS, centers);
// Copy the flat N x 1 label vector returned by kmeans into an image-shaped label map.
Mat label(src.size(), CV_32SC1);
const int* clusters_p = labels.ptr<int>();
int* label_p = label.ptr<int>();
const size_t size = static_cast<size_t>(src.cols) * src.rows;

for (size_t i = 0; i < size; i++)
{
    label_p[i] = clusters_p[i];
}

double minH, maxH;
minMaxLoc(labels, &minH, &maxH);

cout << "minH = "  << minH<<endl;
cout << "maxH = " << maxH << endl;

Mat outImg(src.size(), CV_8UC3);
Vec3b colorPix[5] = { { 221, 37, 49 }, { 242, 130, 54 }, { 241, 234, 84 }, { 182, 228, 33 }, { 0, 164, 228 } };

for (int y = 0; y < label.rows; y++)
{
    for (int x = 0; x < label.cols; x++)
    {
        // kmeans labels lie in [0, cluster_number), so each one indexes colorPix directly.
        outImg.at<Vec3b>(y, x) = colorPix[label.at<int>(y, x)];
    }
}

return outImg;

My resulting images are different in colors. I don't know whether I have done something wrong...

image description image description
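
One possible explanation, offered as an assumption rather than something confirmed in this thread: cv::kmeans numbers its clusters in arbitrary order, and with KMEANS_RANDOM_CENTERS the numbering can change from run to run, so a fixed palette indexed by label may paint the same cluster differently each time. A minimal sketch of one workaround is to renumber the clusters by the brightness of their centers before coloring. The names Color, stableLabelOrder and relabel are hypothetical helpers, not OpenCV functions, and the centers are assumed to have been copied out of the centers matrix into plain arrays.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <vector>

// One cluster center as a B, G, R triple (hypothetical helper type).
using Color = std::array<float, 3>;

// Builds a lookup table where remap[oldLabel] is the rank of that cluster
// when the centers are sorted by the sum of their channels (darkest first).
std::vector<int> stableLabelOrder(const std::vector<Color>& centers)
{
    std::vector<int> order(centers.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        const float sa = centers[a][0] + centers[a][1] + centers[a][2];
        const float sb = centers[b][0] + centers[b][1] + centers[b][2];
        return sa < sb;
    });

    std::vector<int> remap(centers.size());
    for (int rank = 0; rank < static_cast<int>(order.size()); ++rank)
        remap[order[rank]] = rank;
    return remap;
}

// Rewrites every label in place using the lookup table.
void relabel(std::vector<int>& labels, const std::vector<int>& remap)
{
    for (int& l : labels)
    {
        assert(l >= 0 && l < static_cast<int>(remap.size()));
        l = remap[l];
    }
}
```

After relabeling, label 0 always refers to the darkest cluster and the last label to the brightest, so a fixed palette gives the same colors on every run as long as the clusters themselves are similar. Raising the attempts argument of cv::kmeans or using KMEANS_PP_CENTERS may also make the clusters more repeatable.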


Comments

Good luck with that. Besides splitting them, which is difficult because the letters merge, you will need to train a new OCR module for that font, because it is not a standard typeface.

StevenPuttemans (2016-01-20 03:03:59 -0500)

I tried to use the k-means method to differentiate them, but the result is not very good...

cv_new (2016-01-20 03:39:14 -0500)

Nope, because on what data will you tell k-means to split? The characters all consist of black and white pixel constellations.

StevenPuttemans (2016-01-20 03:57:59 -0500)

In fact, there are subtle color differences between these characters.

cv_new (2016-01-20 04:22:54 -0500)

So, every character has a different colour every time?

Pedro Batista (2016-01-20 04:25:36 -0500)

Yes, they will have different colors.

cv_new (2016-01-20 19:40:40 -0500)
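
Since the thread turns on whether the characters differ only subtly in color, one thing that might be worth trying is to cluster on hue instead of raw B, G, R values, because hue ignores brightness and can separate tints that look close in BGR. The sketch below is plain C++ for illustration only, and whether hue actually separates these particular characters is untested. hueDegrees and hueDistance are hypothetical helper names; in OpenCV the conversion itself is available through cv::cvtColor with COLOR_BGR2HSV.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hue in degrees, in [0, 360), from 8-bit B, G, R values.
// Returns -1 for gray pixels (black, white, or any B == G == R), which have no hue.
float hueDegrees(int b, int g, int r)
{
    const int maxC = std::max({b, g, r});
    const int minC = std::min({b, g, r});
    const int delta = maxC - minC;
    if (delta == 0)
        return -1.0f;

    float h;
    if (maxC == r)
        h = 60.0f * (g - b) / delta;           // between yellow and magenta
    else if (maxC == g)
        h = 60.0f * (b - r) / delta + 120.0f;  // between cyan and yellow
    else
        h = 60.0f * (r - g) / delta + 240.0f;  // between magenta and cyan

    if (h < 0.0f)
        h += 360.0f;
    return h;
}

// Distance between two hues on the color circle, so that 350 and 10
// degrees count as 20 degrees apart rather than 340.
float hueDistance(float h1, float h2)
{
    assert(h1 >= 0.0f && h1 < 360.0f && h2 >= 0.0f && h2 < 360.0f);
    const float d = std::fabs(h1 - h2);
    return std::min(d, 360.0f - d);
}
```

Pixels for which hueDegrees returns -1 would be the pure black-and-white ones mentioned in the comments above. They carry no color information and would need to be excluded before clustering.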