Implementation Question: How to create bounding boxes around answers on worksheets

asked 2015-03-01 12:29:03 -0600

updated 2017-08-25 13:52:52 -0600

I've played around with template matching to try to standardize the submitted assignment against the template assignment, but it's not getting the job done reliably. I've also experimented with keypoint detection, but I believe that approach is overkill for my use case.

Here is an example image of the TEMPLATE:

And a sample image I'm trying to match against:

What would be the best approach to normalize the sample image (scaling, deskewing), subtract out the template image, and be left with only the remaining text?

Comments

I think the template matching approach is overkill, isn't it? You would need a template for each different sheet. I think you should go for a more dynamic approach.

theodore ( 2015-03-02 06:43:29 -0600 )

I will have access to each newly generated template, so that wouldn't be an issue. I've also thought about writing a classifier to identify handwritten digits only, but it would still give false positives on the typed digits within the answer.

jacobmarley138 ( 2015-03-02 08:43:13 -0600 )

How about applying a small keypoint detector to the Homework title, matching that, aligning both images based on the matched keypoints, and then simply performing background subtraction on the two images?

StevenPuttemans ( 2015-03-02 09:17:25 -0600 )

@jacobmarley138 Steven has a point here. What you want to do can actually be done in many different ways; it is up to you to decide which one to use.

theodore ( 2015-03-02 09:45:40 -0600 )

answered 2015-03-02 09:08:53 -0600

theodore

For the fun of it I tried a quick and dirty approach, assuming that you will always have these "result" separation lines. Have a look below:

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main()
{
    Mat src = imread("worksheet.jpg");

    if(src.empty())
    {
        cerr << "Problem loading image!!!" << endl;
        return EXIT_FAILURE;
    }

//        imshow("src", src);

    // resizing for practical reasons
    Mat rsz;
    Size size(800, 1132);
    resize(src, rsz, size);

//        imshow("rsz", rsz);

    Mat gray;
    cvtColor(rsz, gray, COLOR_BGR2GRAY);

    // Invert the grayscale image (note the ~) so that ink becomes bright,
    // then apply an adaptive threshold to get a binary image
    Mat bw;
    adaptiveThreshold(~gray, bw, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 15, -2);

    // Dilate a bit in order to correct possible gaps
    Mat kernel = Mat::ones(2, 2, CV_8UC1);
    dilate(bw, bw, kernel);

    // Show binary image
    imshow("bin", bw);

    // Create the image that will be used to extract the horizontal lines
    Mat horizontal = bw.clone();

    // Specify size on horizontal axis
    int horizontalsize = horizontal.cols / 30;

    // Create structure element for extracting horizontal lines through morphology operations
    Mat horizontalStructure = getStructuringElement(MORPH_RECT, Size(horizontalsize,1));

    // Apply morphology operations
    erode(horizontal, horizontal, horizontalStructure, Point(-1, -1));
    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1));

    // Show extracted horizontal lines
    imshow("horizontal", horizontal);

    // Find the contours of the extracted horizontal line segments
    vector<Vec4i> hierarchy;
    vector<vector<Point> > contours;
    findContours(horizontal, contours, hierarchy, RETR_CCOMP, CHAIN_APPROX_SIMPLE, Point(0, 0));

    vector<vector<Point> > contours_poly( contours.size() );
    vector<Rect> boundRect( contours.size() );
    for( size_t i = 0; i < contours.size(); i++ )
    {
        approxPolyDP( Mat(contours[i]), contours_poly[i], 3, true );
        boundRect[i] = boundingRect( Mat(contours_poly[i]) );
    }

    // Filter the contours by length, then draw each remaining contour and its adjusted bounding rectangle
    for (size_t i = 0; i < contours.size(); i++)
    {
//        cout << boundRect[i].tl() << endl;
//        cout << boundRect[i].br() << endl << endl;

//        cout << arcLength(cv::Mat(contours[i]), true) << endl;
        double length = arcLength(cv::Mat(contours[i]), true);
        // skip any noise lines
        if(length < 75)
            continue;

        if(length > 200) // long line: draw the box above the line
        {
            boundRect[i] += Size(0, -40); // height becomes negative, so the drawn box extends about 40 px upward
            boundRect[i] -= Point(0, 3); // shift the box up by 3 px
        }else{ // short line: draw the box below the line
            boundRect[i] += Size(0, 40); // grow the box 40 px downward
            boundRect[i] -= Point(0, -4); // shift the box down by 4 px
        }

        drawContours( rsz, contours, i, Scalar(0, 0, 255), 1, 8, vector<Vec4i>(), 0, Point() );
        rectangle( rsz, boundRect[i].tl(), boundRect[i].br(), Scalar(0, 255, 0), 1, 8, 0 );
    }

    imshow("src", rsz);

    /* Now you can do whatever post processing you want
     * with the data within the rectangles. */

    waitKey(0);
    return 0;
}

Enjoy. But be aware that some tuning might be needed to fit all your use cases.

Comments

Hehe nice one :D

StevenPuttemans ( 2015-03-02 09:25:47 -0600 )

Indeed, it is quite funny :-D, though with some work I think it has some potential. Moreover, it can be combined with the classifier that @jacobmarley138 wants to create.

theodore ( 2015-03-02 09:32:03 -0600 )

Thanks! I'm new to the opencv package, so many of these functions were unknown to me. I'm going to dig deeper into this tonight and contribute my Python implementation of what you shared... I'm wondering whether 2.4.9 is sufficient, or whether I would need to upgrade to the development 3.0.0 to have Python bindings for all the methods you chose to use?

jacobmarley138 ( 2015-03-02 12:03:52 -0600 )

I think you will not have any problem finding the corresponding functions in Python, either in the latest stable 2.4.10 or in the 3.0.0 beta/development git version.

theodore ( 2015-03-02 13:02:26 -0600 )
