How to remove small black spots & make digits more clear, complete & sharp in image?

asked 2019-07-21 11:13:42 -0600 by Bhavtosh

updated 2019-07-24 10:37:26 -0600 by Witek

PS: The numbers are enrollment IDs, each a combination of course + class + other IDs, etc.

I'm trying to write a program that removes a logo from an image and cleans it up before sending it to an OCR program. Here is the input image:

[image: input image]

I'm totally new to OpenCV and C++. I googled and merged the code below from various sources. So far it reads a black-and-white image and removes a mark from it.

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;

// Fragment from inside a function; fpath is the image path without extension.
Mat im = imread(fpath + ".jpg", IMREAD_GRAYSCALE);
// make a copy & approximate the background
Mat bg = im.clone();
// get the structuring element (11x11) & apply a morphological closing
Mat kernel2 = getStructuringElement(MORPH_RECT, Size(2 * 5 + 1, 2 * 5 + 1));
morphologyEx(im, bg, MORPH_CLOSE, kernel2);
// difference image; the post never defines dif, so bg - im is assumed here
Mat dif = bg - im;
// threshold the difference image
Mat bw;
threshold(dif, bw, 0, 255, THRESH_BINARY_INV | THRESH_OTSU);
// threshold the background image so we get the dark region
Mat dark;
threshold(bg, dark, 0, 255, THRESH_BINARY_INV | THRESH_OTSU);
// extract the pixels in the dark region
vector<unsigned char> darkpix(countNonZero(dark));
int index = 0;
for (int r = 0; r < dark.rows; r++)
{
    for (int c = 0; c < dark.cols; c++)
    {
        if (dark.at<unsigned char>(r, c))
        {
            darkpix[index++] = im.at<unsigned char>(r, c);
        }
    }
}
// threshold the dark region so we get the darker pixels inside it
threshold(darkpix, darkpix, 0, 255, THRESH_BINARY | THRESH_OTSU);
// paste the extracted darker pixels back at their original positions
index = 0;
for (int r = 0; r < dark.rows; r++)
{
    for (int c = 0; c < dark.cols; c++)
    {
        if (dark.at<unsigned char>(r, c))
        {
            bw.at<unsigned char>(r, c) = darkpix[index++];
        }
    }
}
// clean the image to make it more readable and clear
Mat dst;
adaptiveThreshold(bw, dst, 75, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 3, -15);
Mat image_out = bw - dst;
imshow("Final", image_out);
waitKey(0); // keep the window open until a key is pressed

[image: output of the code above]

With the above output image, I'm now badly stuck on two open issues:

  1. Remove the small black spots and noise
  2. Make the numbers (for example 0, 2, 6, 8, 9) and the text NA more complete, sharp and readable

Please advise & suggest, thanks a lot...

I'm very new to OpenCV, but I'm able to load and display an image. Please refer to this image for more clarity on what I mean:

[image]

Going ahead with the help of the getStructuringElement, morphologyEx and threshold functions, I could try a little image processing.

The last sample code I tried before posting this query used boundingRect and findContours, and below is the result. It looks like some rectangles include spots and noise, and I don't know how to move ahead from here.

[image: result with bounding rectangles]

I'm badly stuck on the issues below:

  1. Clean the black spots, noise and anything else except the ID numbers
  2. Make numbers like 0, 2, 6, 8, 9 thicker, more complete and sharper

Kindly help out with some code sample, link or article which can help me.

Thanks in advance...

Comments

I merged your two questions into one.

The last image looks promising. How about removing small spots based on their contour area, axis length and/or axis ratio? Can you increase resolution of the input image?

Witek ( 2019-07-24 10:40:46 -0600 )
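
One way to read the suggestion above about filtering spots by area is sketched here. It is a dependency-free C++ illustration, not code from this thread: the `Binary` type, the `removeSmallBlobs` name and the `minArea` parameter are all hypothetical, and it assumes the image has already been binarized with ink stored as 1. In OpenCV the same idea would normally be built on `findContours` with `contourArea`, or on `connectedComponentsWithStats`.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Binary image, row-major: 1 = ink (dark pixel), 0 = background.
using Binary = std::vector<std::vector<std::uint8_t>>;

// Erase every 8-connected ink blob whose pixel count is below minArea.
// minArea is a hypothetical tuning parameter that must be chosen per image.
Binary removeSmallBlobs(const Binary& img, int minArea) {
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    Binary out = img;
    std::vector<std::vector<bool>> seen(rows, std::vector<bool>(cols, false));

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (!img[r][c] || seen[r][c]) continue;
            // Flood-fill one blob and remember its pixels.
            std::vector<std::pair<int, int>> blob;
            std::queue<std::pair<int, int>> todo;
            todo.push(std::make_pair(r, c));
            seen[r][c] = true;
            while (!todo.empty()) {
                const std::pair<int, int> p = todo.front();
                todo.pop();
                blob.push_back(p);
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        const int ny = p.first + dy;
                        const int nx = p.second + dx;
                        if (ny < 0 || ny >= rows || nx < 0 || nx >= cols) continue;
                        if (!img[ny][nx] || seen[ny][nx]) continue;
                        seen[ny][nx] = true;
                        todo.push(std::make_pair(ny, nx));
                    }
                }
            }
            // Blobs that are too small are treated as noise and cleared.
            if (static_cast<int>(blob.size()) < minArea) {
                for (std::size_t i = 0; i < blob.size(); ++i) {
                    out[blob[i].first][blob[i].second] = 0;
                }
            }
        }
    }
    return out;
}
```

The threshold has to stay below the area of the smallest broken digit fragment, otherwise parts of digits are erased together with the noise.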

Hi Witek, thanks for merging; the post now has more info. I confess that, being new to OpenCV, it's hard for me to understand code samples or try any code myself, but I'm trying to sort it out.

To answer your query: increasing the resolution of the input image will dither the digits more. And how would a higher input resolution help when the current one is already good enough to create this output image?

I just want to (1) clear these spots and (2) make the digits more clear and complete.

Bhavtosh ( 2019-07-25 07:36:26 -0600 )

Higher resolution might make the digits easier to separate, since they will not blend together so much. It should also improve the thresholded results, as transitions between black and white will be smoother. This should improve the shape of the digits, which could be important for OCR at a later stage. I played a little with your problem, and I must admit I was not able to get a clear image. It is not going to be easy, especially if the big gray watermark is going to be different (brighter or darker) in different images. Perhaps using simple template matching would be easier? That would also solve the OCR problem.

Witek ( 2019-07-25 08:55:38 -0600 )
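
The template matching idea mentioned above can be illustrated with a plain sum-of-squared-differences search. This is only a sketch under assumptions: `Gray`, `Match` and `bestMatch` are hypothetical names, the images are small grayscale grids, and no normalization is applied. OpenCV offers the same kind of search through `matchTemplate` (for example with `TM_SQDIFF`) followed by `minMaxLoc`.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Grayscale image, row-major, 0 = black, 255 = white.
using Gray = std::vector<std::vector<std::uint8_t>>;

struct Match {
    int row;          // top-left row of the best placement
    int col;          // top-left column of the best placement
    long long score;  // sum of squared differences, lower is better
};

// Slide tmpl over img and return the placement with the lowest SSD.
// Assumes both images are non-empty and tmpl is no larger than img.
Match bestMatch(const Gray& img, const Gray& tmpl) {
    const int ih = static_cast<int>(img.size());
    const int iw = static_cast<int>(img[0].size());
    const int th = static_cast<int>(tmpl.size());
    const int tw = static_cast<int>(tmpl[0].size());

    Match best{-1, -1, std::numeric_limits<long long>::max()};
    for (int r = 0; r + th <= ih; ++r) {
        for (int c = 0; c + tw <= iw; ++c) {
            long long ssd = 0;
            for (int y = 0; y < th; ++y) {
                for (int x = 0; x < tw; ++x) {
                    const long long d =
                        static_cast<long long>(img[r + y][c + x]) - tmpl[y][x];
                    ssd += d * d;
                }
            }
            if (ssd < best.score) best = Match{r, c, ssd};
        }
    }
    return best;
}
```

For digit recognition, one template per character would be needed, and the character whose template scores lowest at a given position would be reported.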

Making image patch templates would be another added task here, and for so many numbers too :) Right now the watermark is not a problem because it has been removed, but that left a few spots and took out the edges of a few numbers. Is it not possible to play with contours in some more ways?

Bhavtosh ( 2019-07-25 10:20:13 -0600 )

Since you enclosed the numbers in rectangles, you already removed the small spots, I guess? Increasing the resolution should help you achieve point 2.

Witek ( 2019-07-25 15:13:28 -0600 )

I only used the rects to help others understand and visualize the problem. Yes, I agree with your second suggestion: zooming the image a bit does help clarity, but finding and filling the edges still has to be solved, and so do these spots.

Bhavtosh ( 2019-07-25 22:53:03 -0600 )

You can remove the spots by their properties: find the contours and their bounding rectangles, and remove all contours that are singular, that is, ones that do not have a neighbor of similar size in close vicinity. This is not likely to work in 100% of cases, as there might always be a spot that is similar to a number in terms of bounding box size and position, and some numbers might be broken into two small parts that could be treated as spots.

Witek ( 2019-07-26 04:06:15 -0600 )
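
The "singular contour" rule described above could look roughly like this once bounding rectangles are available (for example from `boundingRect`). Everything here is an assumption made for illustration: the `Box` struct, the function names, and the `maxGap` and `tol` parameters. "Similar size" is interpreted as similar height, and "close vicinity" as a small horizontal gap on the same text line.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Axis-aligned bounding box of one contour (top-left corner plus size).
struct Box {
    int x, y, w, h;
};

// Heights count as similar when the smaller is at least (1 - tol) of the larger.
bool similarHeight(const Box& a, const Box& b, double tol) {
    const double larger = std::max(a.h, b.h);
    const double smaller = std::min(a.h, b.h);
    return smaller >= (1.0 - tol) * larger;
}

// Horizontal distance between two boxes, 0 if they overlap horizontally.
int horizontalGap(const Box& a, const Box& b) {
    const int left = std::max(a.x, b.x);
    const int right = std::min(a.x + a.w, b.x + b.w);
    return std::max(0, left - right);
}

// True when the two boxes share at least one image row.
bool sameLine(const Box& a, const Box& b) {
    return a.y < b.y + b.h && b.y < a.y + a.h;
}

// Keep only boxes that have a similar-height neighbor within maxGap pixels
// on the same line; boxes without such a neighbor are treated as noise.
std::vector<Box> dropLonelyBoxes(const std::vector<Box>& boxes,
                                 int maxGap, double tol) {
    std::vector<Box> kept;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        bool hasNeighbor = false;
        for (std::size_t j = 0; j < boxes.size() && !hasNeighbor; ++j) {
            if (i == j) continue;
            hasNeighbor = similarHeight(boxes[i], boxes[j], tol) &&
                          sameLine(boxes[i], boxes[j]) &&
                          horizontalGap(boxes[i], boxes[j]) <= maxGap;
        }
        if (hasNeighbor) kept.push_back(boxes[i]);
    }
    return kept;
}
```

As the comment warns, two specks that happen to sit next to each other would pass this test, and a digit split into small fragments could fail it.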

In my opinion it will be extremely difficult to design an algorithm based on thresholding and morphology that provides 100% accurate results. There is a very thin line between filling edges and removing spots. These are opposite requirements: on the one hand you want to enhance the small, thin lines of the numbers, and on the other you want to remove small spots that are noise. I don't think it is possible to separate these two classes, as they sometimes overlap. I will repeat my suggestion of using template matching, which might solve all your problems. It is quite easy and quick to try. Or perhaps use a deep network like YOLO. This would require much more effort to deploy, but might work better and faster.

Witek ( 2019-07-26 04:09:16 -0600 )
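
The conflict described above can be seen with a tiny 3x3 dilation, which is the basic operation for thickening strokes. This is a hand-rolled sketch with hypothetical names (`Binary`, `dilate3x3`, `inkCount`), assuming a binarized image with ink stored as 1; OpenCV provides the operation as `dilate`, and `morphologyEx` with `MORPH_CLOSE` combines it with an erosion.

```cpp
#include <cstdint>
#include <vector>

// Binary image, row-major: 1 = ink (dark pixel), 0 = background.
using Binary = std::vector<std::vector<std::uint8_t>>;

// 3x3 binary dilation: an output pixel is ink if it or any 8-neighbor is ink.
Binary dilate3x3(const Binary& img) {
    const int rows = static_cast<int>(img.size());
    const int cols = rows ? static_cast<int>(img[0].size()) : 0;
    Binary out(rows, std::vector<std::uint8_t>(cols, 0));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            for (int dy = -1; dy <= 1 && !out[r][c]; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int ny = r + dy;
                    const int nx = c + dx;
                    if (ny < 0 || ny >= rows || nx < 0 || nx >= cols) continue;
                    if (img[ny][nx]) {
                        out[r][c] = 1;
                        break;
                    }
                }
            }
        }
    }
    return out;
}

// Number of ink pixels, handy for comparing an image before and after.
int inkCount(const Binary& img) {
    int n = 0;
    for (const auto& row : img)
        for (std::uint8_t v : row) n += v ? 1 : 0;
    return n;
}
```

Dilating a stroke with a one-pixel gap closes the gap, but the same call turns a one-pixel speck into a 3x3 blob, so any spot removal would have to run before the thickening step.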

I will explore Yolo, thanks.

Bhavtosh ( 2019-07-26 09:46:24 -0600 )