How to remove small black spots and make digits clearer, more complete and sharper in an image?
PS: The numbers are enrollment IDs that combine course, class and other IDs.
I'm trying to write a program that removes a logo from an image and cleans it up before sending it to an OCR program. Here is the input image:
I'm totally new to coding in OpenCV and C++. I googled and merged the code below from various sources; so far it reads a b/w image and removes a mark from it.
im = imread(fpath + ".jpg", IMREAD_GRAYSCALE);
// 1. make a copy & approximate the background
bg = im.clone();
// get the structure & apply morphology
kernel2 = getStructuringElement(MORPH_RECT, Size(2 * 5 + 1, 2 * 5 + 1));
morphologyEx(im, bg, MORPH_CLOSE, kernel2);
// subtract the image from the background so the dark strokes stand out
dif = bg - im;
// threshold the difference image
threshold(dif, bw, 0, 255, THRESH_BINARY_INV | THRESH_OTSU);
// threshold the background image so we get dark region
threshold(bg, dark, 0, 255, THRESH_BINARY_INV | THRESH_OTSU);
// extract pixels in the dark region
vector<unsigned char> darkpix(countNonZero(dark));
int index = 0;
for (int r = 0; r < dark.rows; r++)
{
    for (int c = 0; c < dark.cols; c++)
    {
        if (dark.at<unsigned char>(r, c))
        {
            darkpix[index++] = im.at<unsigned char>(r, c);
        }
    }
}
// threshold the dark region so we get the darker pixels inside it
threshold(darkpix, darkpix, 0, 255, THRESH_BINARY | THRESH_OTSU);
// paste the extracted darker pixels
index = 0;
for (int r = 0; r < dark.rows; r++)
{
    for (int c = 0; c < dark.cols; c++)
    {
        if (dark.at<unsigned char>(r, c))
        {
            bw.at<unsigned char>(r, c) = darkpix[index++];
        }
    }
}
// Clean image to make more readable and clear
adaptiveThreshold(bw, dst, 75, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 3, -15);
image_out = bw - dst;
imshow("Final", image_out);
With the output image above, I'm now badly stuck on 2 open issues:
- Remove the small black spots and noise
- Make the numbers, for example 0, 2, 6, 8, 9 etc., and the text NA more complete, sharp and readable
Please advise & suggest, thanks a lot...
I'm very new to OpenCV, but I am able to load and display an image. Please refer to this image for more clarity on what I mean:
Going ahead, with the help of the getStructuringElement, morphologyEx and threshold functions, I could try a little image processing.
The last sample code I tried before posting this query used boundingRect and findContours, and below is the result I got. It looks like some rectangles include spots and noise, and I don't know how to move ahead from here...
I'm badly stuck on the issues below:
1. Clean the black spots, noise and anything else except the ID numbers
2. Make numbers like 0, 2, 6, 8, 9 thicker, more complete and sharper
Kindly help out with a code sample, link or article that can help me.
Thanks in advance...
I merged your two questions into one.
The last image looks promising. How about removing small spots based on their contour area, axis length and/or axis ratio? Can you increase resolution of the input image?
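To make the area/ratio idea concrete, here is a minimal, library-free sketch. It assumes the cleaned image is already binary (1 = ink, 0 = paper); the names `Binary` and `removeSmallBlobs` and the `minArea` / `maxRatio` knobs are made up for illustration and would need tuning on real scans. With OpenCV, the same filtering would typically be built on `findContours` with `contourArea` and `boundingRect`, or on `connectedComponentsWithStats`.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Binary image stored row-major: 1 = ink (foreground), 0 = paper.
struct Binary {
    int w, h;
    std::vector<std::uint8_t> px;
};

// Erase every 4-connected blob whose pixel count is below minArea or whose
// bounding box is more elongated than maxRatio (long side / short side).
// minArea and maxRatio are hypothetical tuning knobs.
Binary removeSmallBlobs(const Binary& in, int minArea, double maxRatio) {
    assert(static_cast<int>(in.px.size()) == in.w * in.h);
    Binary out = in;
    std::vector<std::uint8_t> seen(in.px.size(), 0);
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    for (int y0 = 0; y0 < in.h; ++y0) {
        for (int x0 = 0; x0 < in.w; ++x0) {
            const int start = y0 * in.w + x0;
            if (!in.px[start] || seen[start]) continue;

            // Flood-fill one blob, recording its pixels and bounding box.
            std::vector<int> blob;
            std::queue<int> todo;
            todo.push(start);
            seen[start] = 1;
            int minX = x0, maxX = x0, minY = y0, maxY = y0;
            while (!todo.empty()) {
                const int i = todo.front();
                todo.pop();
                blob.push_back(i);
                const int x = i % in.w, y = i / in.w;
                if (x < minX) minX = x;
                if (x > maxX) maxX = x;
                if (y < minY) minY = y;
                if (y > maxY) maxY = y;
                for (int k = 0; k < 4; ++k) {
                    const int nx = x + dx[k], ny = y + dy[k];
                    if (nx < 0 || ny < 0 || nx >= in.w || ny >= in.h) continue;
                    const int j = ny * in.w + nx;
                    if (in.px[j] && !seen[j]) {
                        seen[j] = 1;
                        todo.push(j);
                    }
                }
            }

            // Decide from the blob's own properties whether it is noise.
            const double boxW = maxX - minX + 1, boxH = maxY - minY + 1;
            const double ratio = boxW > boxH ? boxW / boxH : boxH / boxW;
            if (static_cast<int>(blob.size()) < minArea || ratio > maxRatio) {
                for (int i : blob) out.px[i] = 0;
            }
        }
    }
    return out;
}
```

A call such as `removeSmallBlobs(img, 3, 4.0)` would drop isolated specks of one or two pixels and long thin scratches while leaving digit-sized blobs alone. Note that a broken digit fragment can be just as small as a speck, so the thresholds are a compromise.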
Hi Witek, thanks for merging; the post now has more info. I confess that, being new to OpenCV, it's hard for me to understand code samples or try any code myself, but I'm trying to sort it out...
To answer your query: increasing the resolution of the input image will dither the digits more, but in what way will input resolution help when it is already good enough to create this output image?
Somehow I just want to (1) clear these spots and (2) make the digits more clear and complete.
Higher resolution might make the digits easier to separate - they will not blend together so much. Also it should improve thresholded results, as transitions between black and white will be smoother. This should improve the shape of the digits, which could be important for OCR at the later stage. I played a little with your problem, and I must admit, I was not able to get a clear image. It is not going to be easy, especially if the big gray watermark is going to be different (brighter or darker) in different images. Perhaps using simple template matching would be easier? That would also solve the OCR problem.
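For reference, template matching in its simplest form slides a small digit image over the page and scores every position. The sketch below is library-free and assumes 8-bit grayscale data stored row-major; `Gray`, `Match` and `bestMatch` are illustrative names, not part of any library. With OpenCV the same search would normally be done with `matchTemplate` (for example with `TM_SQDIFF`) followed by `minMaxLoc`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// 8-bit grayscale image stored row-major.
struct Gray {
    int w, h;
    std::vector<std::uint8_t> px;
};

// Top-left corner of the best placement and its score (lower is better).
struct Match {
    int x, y;
    long long ssd;
};

// Slide tmpl over image and return the position with the smallest
// sum of squared differences (SSD). A score of 0 is a pixel-exact match.
Match bestMatch(const Gray& image, const Gray& tmpl) {
    assert(tmpl.w > 0 && tmpl.h > 0);
    assert(tmpl.w <= image.w && tmpl.h <= image.h);
    Match best{0, 0, std::numeric_limits<long long>::max()};
    for (int y = 0; y + tmpl.h <= image.h; ++y) {
        for (int x = 0; x + tmpl.w <= image.w; ++x) {
            long long ssd = 0;
            for (int ty = 0; ty < tmpl.h; ++ty) {
                for (int tx = 0; tx < tmpl.w; ++tx) {
                    const int a = image.px[(y + ty) * image.w + (x + tx)];
                    const int b = tmpl.px[ty * tmpl.w + tx];
                    ssd += static_cast<long long>(a - b) * (a - b);
                }
            }
            if (ssd < best.ssd) best = Match{x, y, ssd};
        }
    }
    return best;
}
```

Running one such search per digit template (0 to 9, plus "NA") and keeping the placements with the lowest scores would locate and classify the characters in a single step, which is why it could replace the separate OCR stage when the font is fixed.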
Making an image patch template will be another added task here, and with so many numbers too :) Right now the watermark is not a problem because it is removed, but yes, removing it left a few spots and took out the edges of a few numbers... Is it not possible to play with contours in some more ways?
Since you enclosed the numbers in rectangles, you already removed the small spots, I guess? Increasing the resolution should help you achieve point 2.
I only used the rects to help others understand and visualize the problem. Yes, I agree with your 2nd suggestion: zooming the image a bit does help with clarity, BUT finding and filling the edges is still to be solved, plus these spots.
You can remove the spots by their properties: find the contours and their bounding rectangles, then remove all contours that are singular, i.e. ones that do not have a neighbor of similar size in close vicinity. This is not likely to work in 100% of cases, as there might always be a spot that is similar to a number in terms of bounding box size and position, AND some numbers might be broken into two small parts that could be treated as spots.
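A rough sketch of that neighbor test, working only on bounding boxes (for example the rectangles returned by `boundingRect` for each contour). The `Box` struct, the function names and the `sizeTol` / `maxGap` parameters are assumptions for illustration; here "similar size" is taken to mean similar height, and "close vicinity" means a small horizontal gap on the same text line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned bounding box: top-left corner plus width and height.
struct Box {
    int x, y, w, h;
};

// True when b has a height similar to a (ratio at most sizeTol), overlaps a
// vertically (same text line) and lies within maxGap pixels horizontally.
bool isNeighbor(const Box& a, const Box& b, double sizeTol, int maxGap) {
    const double tall = std::max(a.h, b.h);
    const double small = std::min(a.h, b.h);
    if (small <= 0 || tall / small > sizeTol) return false;
    const bool sameLine = std::max(a.y, b.y) < std::min(a.y + a.h, b.y + b.h);
    // Horizontal gap between the boxes; zero or negative when they overlap.
    const int gap = std::max(a.x, b.x) - std::min(a.x + a.w, b.x + b.w);
    return sameLine && gap <= maxGap;
}

// Keep only the boxes that have at least one neighbor of similar size nearby.
std::vector<Box> dropSingular(const std::vector<Box>& boxes,
                              double sizeTol, int maxGap) {
    assert(sizeTol >= 1.0 && maxGap >= 0);
    std::vector<Box> kept;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        bool hasNeighbor = false;
        for (std::size_t j = 0; j < boxes.size() && !hasNeighbor; ++j) {
            if (i != j && isNeighbor(boxes[i], boxes[j], sizeTol, maxGap)) {
                hasNeighbor = true;
            }
        }
        if (hasNeighbor) kept.push_back(boxes[i]);
    }
    return kept;
}
```

The limitations mentioned above show up directly in this sketch: two specks that happen to sit next to each other count as neighbors and both survive, and a digit broken into two small pieces may fail the height test against its intact neighbors.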
In my opinion it will be extremely difficult to design an algorithm based on thresholding and morphology that will provide 100% accurate results. There is a very thin line between filling edges and removing spots - these are opposite requirements - on the one hand you want to enhance the small, thin lines of the numbers and on the other you want to remove small spots that are noise. I don't think it is possible to separate these two classes, as they sometimes overlap. I will repeat my suggestion of using template matching, which might solve all your problems. It is quite easy and quick to try. Or perhaps use a deep network like Yolo - this will require much more effort to deploy, but might work better and faster.
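The conflict between the two requirements can be seen on a toy example. The sketch below is library-free and assumes a binary image (1 = ink) and a fixed 3x3 square kernel; `Binary`, `morph3x3`, `open3x3` and `inkCount` are made-up names. In OpenCV the corresponding calls would be `erode`, `dilate` and `morphologyEx` with `MORPH_OPEN`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Binary image stored row-major: 1 = ink (foreground), 0 = paper.
struct Binary {
    int w, h;
    std::vector<std::uint8_t> px;
};

// 3x3 erosion (erode = true) or dilation (erode = false).
// Pixels outside the image are treated as paper.
Binary morph3x3(const Binary& in, bool erode) {
    assert(static_cast<int>(in.px.size()) == in.w * in.h);
    Binary out = in;
    for (int y = 0; y < in.h; ++y) {
        for (int x = 0; x < in.w; ++x) {
            bool all = true, any = false;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    const bool inside =
                        nx >= 0 && ny >= 0 && nx < in.w && ny < in.h;
                    const bool on = inside && in.px[ny * in.w + nx] != 0;
                    all = all && on;
                    any = any || on;
                }
            }
            out.px[y * in.w + x] = (erode ? all : any) ? 1 : 0;
        }
    }
    return out;
}

// Opening = erosion followed by dilation; removes anything thinner than the kernel.
Binary open3x3(const Binary& in) {
    return morph3x3(morph3x3(in, true), false);
}

// Number of ink pixels, used here only to compare results.
int inkCount(const Binary& b) {
    int n = 0;
    for (std::uint8_t v : b.px) n += (v != 0);
    return n;
}
```

On an image containing a one-pixel-wide stroke and an isolated speck, `open3x3` erases both, while `morph3x3(img, false)` thickens both. Neither operation can tell a thin digit stroke from noise of the same width, which is the overlap described above.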
I will explore Yolo, thanks.