OpenCV: detect ellipse around text

asked 2016-09-21 06:04:20 -0600

TheMarcus

updated 2016-09-21 06:08:43 -0600

I'm working on OCR software for table detection using the Java version of OpenCV. I can detect almost all text borders in the images, but I have problems with "circled" words/numbers.

For text detection I do the following:

Starting image

I detect the horizontal and vertical lines of the table using morphological operations (from this answer). The detected lines are then removed from the original Mat image by subtraction (I'm not sure this is the best approach):

    Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY);
    // invert the grayscale image, then binarize it with Otsu's threshold
    Mat bw = new Mat();
    Core.bitwise_not(gray, gray);
    Imgproc.threshold(gray, bw, 0.0, 255.0, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

    // create the images that will be used to extract the horizontal and vertical lines
    Mat horizontal = bw.clone();
    Mat vertical = bw.clone();

    // Specify size on horizontal axis
    int horizontalSize = horizontal.cols() / 20;
    Mat horizontalStructure = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(horizontalSize, 1));

    // apply morphology operations
    Imgproc.erode(horizontal, horizontal, horizontalStructure, new Point(-1, -1), 1);
    Imgproc.dilate(horizontal, horizontal, horizontalStructure, new Point(-1, -1), 1);
    // Imgproc.blur(horizontal, horizontal, new Size(3,3));
    // MORPH_DILATE is an operation type, not a shape; it has the same value as MORPH_CROSS
    Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_CROSS, new Size(3, 3));
    Imgproc.dilate(horizontal, horizontal, kernel, new Point(-1, -1), 1);

    // Specify size on vertical axis
    int verticalSize = vertical.rows() / 20;
    Mat verticalStructure = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(1, verticalSize));

    // apply morphology operations
    Imgproc.erode(vertical, vertical, verticalStructure, new Point(-1, -1), 1);
    Imgproc.dilate(vertical, vertical, verticalStructure, new Point(-1, -1), 1);
    kernel = Imgproc.getStructuringElement(Imgproc.MORPH_CROSS, new Size(3, 3));
    Imgproc.dilate(vertical, vertical, kernel, new Point(-1, -1), 1);

    // Is this the correct way to do it?
    // delete lines from binary image
    Core.subtract(bw, horizontal, bw);
    Core.subtract(bw, vertical, bw);

Then I get the word boundaries with the findContours method:

    ArrayList<MatOfPoint> contours = new ArrayList<MatOfPoint>();
    Mat hierarchy = new Mat();
    Imgproc.findContours(bw, contours, hierarchy, Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_SIMPLE);

The intermediate and final results are shown in the image:

Inv binary image and final result

The problem is that circled numbers are not recognized, and I am not able to detect and remove the circles around the numbers at the bottom of the table. I've tried the _Hough Circle Transform_ and the fitEllipse method, with no decent results. Is matching the circles against an ellipse the best approach for removing them?
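To make the ellipse-matching idea concrete, here is a minimal plain-Java sketch of one possible check, not something I have working on the real images. It compares a closed contour's area with the area of the ellipse inscribed in its axis-aligned bounding box: a ratio near 1 suggests an ellipse, while a rectangle gives about 1.27. The class `EllipseLikeness`, its methods and the tolerance are hypothetical names chosen for illustration. With OpenCV I assume the same ratio could be built from `Imgproc.contourArea` and the `RotatedRect` returned by `Imgproc.fitEllipse`, which would also cope with tilted ellipses.

```java
// Hypothetical helper, not part of OpenCV: scores how closely a closed
// contour fills the ellipse inscribed in its axis-aligned bounding box.
final class EllipseLikeness {

    // Polygon area via the shoelace formula; each point is {x, y}.
    static double polygonArea(double[][] pts) {
        double twiceArea = 0.0;
        for (int i = 0; i < pts.length; i++) {
            double[] p = pts[i];
            double[] q = pts[(i + 1) % pts.length];
            twiceArea += p[0] * q[1] - q[0] * p[1];
        }
        return Math.abs(twiceArea) / 2.0;
    }

    // Contour area divided by the area of the ellipse inscribed in the bounding box.
    static double fillRatio(double[][] pts) {
        double minX = Double.POSITIVE_INFINITY, maxX = Double.NEGATIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : pts) {
            minX = Math.min(minX, p[0]);
            maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]);
            maxY = Math.max(maxY, p[1]);
        }
        double w = maxX - minX;
        double h = maxY - minY;
        if (w <= 0 || h <= 0) {
            return 0.0; // degenerate contour (empty, a point or a straight line)
        }
        return polygonArea(pts) / (Math.PI * (w / 2.0) * (h / 2.0));
    }

    // True when the fill ratio is within the given tolerance of 1.
    static boolean isEllipseLike(double[][] pts, double tolerance) {
        return Math.abs(fillRatio(pts) - 1.0) <= tolerance;
    }
}
```

The idea would be to call `EllipseLikeness.isEllipseLike(points, 0.1)` on the outer contour of each candidate and erase the contours that pass before running the text detection. The tolerance of 0.1 is an untested guess, and this simple version only handles ellipses whose axes are roughly aligned with the image axes.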

Can anyone suggest an effective procedure to achieve this?
