Select only gray pixels in color image

asked 2013-11-26 06:08:27 -0600

To aid my OCR component in recognizing text, I'd like to binary-threshold my image first.

I'd like to try a new method of thresholding where I not only define a threshold value, but also require that the R, G and B component values be very close to each other.

This way I hope to separate dark-grey text from dark-blue background, where the pixels fall into the same intensity range, but a human could easily distinguish them because of their color.

Example:

RGB(9, 9, 9) => becomes black
RGB(1, 1, 10) => becomes white
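
The rule described above could be sketched per pixel as follows. This is only an illustration: the class name, the method name and the two limits (maxValue for darkness, maxSpread for how far apart the channels may be) are assumptions, not an existing OpenCV function.

```java
/** Hypothetical helper; the names and limits are illustrative assumptions. */
final class GrayThreshold {

    /**
     * Returns true when the pixel is dark (every channel <= maxValue)
     * and nearly colorless (largest channel gap <= maxSpread).
     */
    static boolean isDarkGray(int r, int g, int b, int maxValue, int maxSpread) {
        int max = Math.max(r, Math.max(g, b));
        int min = Math.min(r, Math.min(g, b));
        return max <= maxValue && (max - min) <= maxSpread;
    }
}
```

With maxValue = 100 and maxSpread = 5, RGB(9, 9, 9) passes the test (becomes black) and RGB(1, 1, 10) fails it (becomes white), because its channels differ by 9.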

Now I can figure out how to iterate over every pixel and do just that. The question is, do you know if this type of thresholding algorithm is already implemented or if there already exists a name for it?

Thank you very much!


3 answers


answered 2013-11-28 10:57:49 -0600

Linke Seitentasche

It can be done, and here are my results. This routine combines a regular threshold with my custom color-component-dependent threshold (Java code):

Code

import org.opencv.core.Mat;

// initialization
Mat colorsub = ...
byte[] data = new byte[4];                      // one 4-channel pixel, read below as R, G, B (+ alpha)
byte[] black = new byte[] { 0, 0, 0, -1 };      // -1 is 0xff as a signed byte
byte[] white = new byte[] { -1, -1, -1, -1 };

int thresh = 100;   // per-channel brightness limit for the simple threshold
int maxdev = 80;    // maximum summed deviation of the channels from their mean

// iterate over the image
for (int y = 0; y < colorsub.height(); y++) {
    for (int x = 0; x < colorsub.width(); x++) {

        // extract color component values as unsigned integers (byte is a signed type in Java)
        colorsub.get(y, x, data);
        int r = data[0] & 0xff;
        int g = data[1] & 0xff;
        int b = data[2] & 0xff;

        // do the simple threshold first: any bright channel makes the pixel white
        if (r > thresh || g > thresh || b > thresh) {
            colorsub.put(y, x, white);
            continue;
        }

        // adjust the blue component (camera-specific correction)
        b = (int) (b * 1.3);

        // quantify how far the component values spread around their mean
        int mean = (r + g + b) / 3;
        int diffr = Math.abs(mean - r);
        int diffg = Math.abs(mean - g);
        int diffb = Math.abs(mean - b);

        // strongly colored pixels become white, near-gray pixels become black
        if ((diffr + diffg + diffb) > maxdev) {
            colorsub.put(y, x, white);
        } else {
            colorsub.put(y, x, black);
        }
    }
}

With my camera I noticed that the blue channel is too low in dark-gray text, so I crudely boost it a little. A real histogram correction would be an improvement.
Also, the first threshold operation is not really based on pixel intensity, because it tests each channel separately, but I think the effect is negligible for this demonstration.
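
For comparison, here is a rough sketch of the same per-pixel decision with an intensity-based first step. It is not the code used for the results below. The class and method names are made up for illustration, and the Rec. 601 luma weights (0.299, 0.587, 0.114) are an assumed choice of intensity measure.

```java
/** Hypothetical variant of the per-pixel test; not the code used for the results. */
final class PixelClassifier {

    /** Approximate intensity using Rec. 601 luma weights (an assumption). */
    static int luma(int r, int g, int b) {
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    /**
     * Returns true for pixels that should become black (dark and near-gray).
     * thresh, blueGain and maxDev play the roles of 100, 1.3 and 80 in the code above.
     */
    static boolean isText(int r, int g, int b, int thresh, double blueGain, int maxDev) {
        // intensity-based threshold instead of a per-channel test
        if (luma(r, g, b) > thresh) {
            return false;
        }
        // same blue boost and deviation measure as in the loop above
        int bAdj = (int) (b * blueGain);
        int mean = (r + g + bAdj) / 3;
        int dev = Math.abs(mean - r) + Math.abs(mean - g) + Math.abs(mean - bAdj);
        return dev <= maxDev;
    }
}
```

Calling PixelClassifier.isText(r, g, b, 100, 1.3, 80) for each pixel and writing black or white accordingly would replace the body of the inner loop.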

Results

The first image is the regular image caught by the classifier, in a green border.
The second image is the grayscale image by OpenCV, for reference.
The third image is OpenCV's binary threshold.
The fourth image is the output from my custom threshold function above.

On a regular desk. This doesn't look too bad:

image description

On a sofa in low-light conditions. Here the threshold performs better. I had to correct the values for my custom threshold after this one:

image description

On a stack of paper, reduced artifacts:

image description

On the kitchen bar. Since the metal is gray, we obviously cannot filter it out:

image description

I think this is the most interesting image. There are far fewer segments, and they are smaller:

image description

Conclusion

As with every thresholding algorithm, fine tuning is paramount. Given a thresholded image with finely tuned parameters, a color-coded threshold can still further improve the picture.

It looks useful for removing inline emblems and pictures from the text, e.g. smileys or colored bullet points.

Maybe the information in this post can be of use for someone else.


answered 2013-11-26 11:04:00 -0600

albertofernandez

I'm not an expert in image segmentation techniques, but since nobody has answered yet, here are my two cents:

One simple approach is to segment by the Euclidean distance between colors. For example, RGB(10, 10, 10) is closer to RGB(9, 9, 9) than to RGB(1, 1, 10). One key point to take into account, however, is the correlation between your variables: gray pixels are correlated because all RGB components of a gray value are very similar. That is why the Mahalanobis distance is often used in image segmentation. It differs from the Euclidean distance in that it takes the correlations of the data set into account.
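
To make the Euclidean comparison concrete, here is a minimal sketch. The class and method names are invented for illustration, and each color is assumed to be an int array {r, g, b}.

```java
/** Hypothetical helper for comparing RGB colors; the names are illustrative. */
final class RgbDistance {

    /** Euclidean distance between two colors given as {r, g, b}. */
    static double euclidean(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < 3; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

For the colors in the example, the distance from (10, 10, 10) to (9, 9, 9) is sqrt(3), about 1.73, while the distance to (1, 1, 10) is sqrt(162), about 12.73. Note that two grays of different brightness, such as (9, 9, 9) and (100, 100, 100), are also far apart under this measure, which is where the correlation argument comes in.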

In order to use the Mahalanobis distance to classify a test point as belonging to one of N classes, one first estimates the covariance matrix of each class, usually based on samples known to belong to each class. Then, given a test sample, one computes the Mahalanobis distance to each class, and classifies the test point as belonging to that class for which the Mahalanobis distance is minimal.
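
The classification procedure could look roughly like the plain-Java sketch below. It is an illustration under stated assumptions, not a reference implementation: the class name, the layout of the samples as rows of {r, g, b}, and the ridge term added to the covariance diagonal are all my own choices. The ridge is there because perfectly gray samples (r == g == b) would otherwise give a singular covariance matrix.

```java
/**
 * Illustrative sketch only: a per-class color model for Mahalanobis classification.
 * The class name, the ridge term and the sample layout are assumptions.
 */
final class MahalanobisModel {
    final double[] mean = new double[3];
    private final double[][] invCov;

    /** samples: rows of {r, g, b}, at least two; ridge: value added to the diagonal. */
    MahalanobisModel(double[][] samples, double ridge) {
        int n = samples.length;
        for (double[] s : samples)
            for (int i = 0; i < 3; i++) mean[i] += s[i] / n;

        // sample covariance matrix of the three channels
        double[][] cov = new double[3][3];
        for (double[] s : samples)
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    cov[i][j] += (s[i] - mean[i]) * (s[j] - mean[j]) / (n - 1);
        // regularize so that the matrix stays invertible
        for (int i = 0; i < 3; i++) cov[i][i] += ridge;

        invCov = invert3x3(cov);
    }

    /** Mahalanobis distance of pixel x = {r, g, b} to this class. */
    double distance(double[] x) {
        double sum = 0;
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                sum += (x[i] - mean[i]) * invCov[i][j] * (x[j] - mean[j]);
        return Math.sqrt(Math.max(sum, 0));   // guard against tiny negative rounding
    }

    /** Index of the model with the smallest distance to x. */
    static int classify(double[] x, MahalanobisModel... models) {
        int best = 0;
        for (int k = 1; k < models.length; k++)
            if (models[k].distance(x) < models[best].distance(x)) best = k;
        return best;
    }

    /** Inverse of a 3x3 matrix via cofactors; throws if the matrix is singular. */
    private static double[][] invert3x3(double[][] a) {
        double det = a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                   - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                   + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]);
        if (Math.abs(det) < 1e-12) throw new ArithmeticException("singular covariance");
        double[][] inv = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) {
                int r1 = (j + 1) % 3, r2 = (j + 2) % 3;
                int c1 = (i + 1) % 3, c2 = (i + 2) % 3;
                inv[i][j] = (a[r1][c1] * a[r2][c2] - a[r1][c2] * a[r2][c1]) / det;
            }
        return inv;
    }
}
```

One model would be built per class, for example one from hand-picked text pixels and one from hand-picked background pixels, and MahalanobisModel.classify(pixel, textModel, backgroundModel) would then return 0 or 1 for each pixel.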

This is an example that performs image segmentation using the Mahalanobis distance:

Mahalanobis distance segmentation

First, you use the mouse to select pixels that act as the "training set", from which the covariance matrix is built. Then the Mahalanobis distance is used to segment your images.

The Mahalanobis distance is also used in background subtraction (discriminating between foreground and background pixels by building and maintaining a model of the background). You could also try to build a model of the background (the dark-blue background) and use it to segment the foreground (the dark-gray text).