Detect the type of a document in real time

asked 2015-04-05 09:15:07 -0500

nicolausYes

updated 2015-04-05 11:07:27 -0500

I'm new to OpenCV and computer vision, so maybe you can point me in the right direction.

The task is to detect the type of a document (from a fixed set of known documents) in real time, using the camera of an Android device. No OCR yet, just type recognition.

Examples of documents:
- Example 1
- Example 2
- Example 3

Documents always have a strict pattern of data placement.

What I've already tried, and my main ideas:

  1. As you can see, all documents have a company logo. Using the logo and its position, we can filter out a significant number of the patterns to search.

    • First I tried Template Matching.
      Pros: none.
      Cons: very slow, with bad results in my case. The problem is that it slides the template pixel by pixel across the camera picture, so it fails when the logo in the template and the logo in the camera picture have different resolutions. So it doesn't seem suitable for my needs.

    • Then I tried Feature Matching with FLANN (but only on a PC).
      Pros: really good match results.
      Match 1
      Match 2
      Cons: very slow. It takes a long time to detect keypoints, compute descriptors and perform the match. Either it's not suitable for real-time use or I'm doing something wrong.

  2. So I need a really simple and effective way to detect document types quickly. Adding new document types shouldn't decrease performance significantly.

    Now the idea is...

    • Get all contours first.
      Find the contours in the image, possibly after running the Canny edge detector, which does a great job (one, two). The downside is that it reduces overall performance significantly. On a Nexus 5, with a 1280x960 sample, Canny takes 250-350 ms and findContours (after Canny) takes up to 60 ms. The result is about 3 fps, which is not very responsive. Maybe some thresholding functions could be used here to filter out unnecessary data.

    • Then try to convert the contours to rectangles.

    • Then try to recognize the document type from the rectangles' sizes and positions.
      That is, try to match the document against something like this pattern (roughly).

    The downside is that I think it will sometimes be hard, and not very accurate, to make rectangles from contours, especially from blocks of text, diagrams or barcodes.
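    A minimal sketch of the last step above (matching rectangle sizes and positions against known patterns), in plain Python so it can be tried without OpenCV. Everything named here is an assumption for illustration: the template names, the box coordinates (normalized to 0..1 of the page as x, y, w, h) and the 0.5 acceptance threshold are hypothetical, and intersection-over-union is just one possible similarity measure. Each extra document type adds only a few arithmetic comparisons per frame.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def layout_score(detected, template):
    """Mean, over template boxes, of the best IoU with any detected box."""
    if not template:
        return 0.0
    total = sum(max((iou(t, d) for d in detected), default=0.0) for t in template)
    return total / len(template)


def classify(detected, templates, min_score=0.5):
    """Return (best template name or None, its score)."""
    best_name, best = None, 0.0
    for name, boxes in templates.items():
        score = layout_score(detected, boxes)
        if score > best:
            best_name, best = name, score
    return (best_name, best) if best >= min_score else (None, best)


# Hypothetical layouts: a logo box plus one large content block each.
TEMPLATES = {
    "invoice": [(0.05, 0.05, 0.25, 0.10), (0.05, 0.30, 0.90, 0.40)],
    "receipt": [(0.35, 0.05, 0.30, 0.10), (0.10, 0.20, 0.80, 0.70)],
}

# Rectangles as they might come out of the contour step, slightly off.
detected = [(0.06, 0.05, 0.24, 0.10), (0.05, 0.31, 0.90, 0.39)]
name, score = classify(detected, TEMPLATES)
print(name, round(score, 2))
```

    Because the boxes are normalized to the page size, the comparison does not depend on the camera resolution, but it does assume the page has already been located and roughly straightened.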

Any advice or ideas on which way I should go?




Only some general advice. (1) To get good accuracy, you must combine a lot of different methods. Therefore, don't fall into the trap of thinking that you can just keep the single best-performing method and throw away the rest. (2) To combine different methods, you will need to look into statistics and machine learning to build a (mathematical) function that combines their results. (3) Regarding performance, your choices are: (a) send the image to a server for processing, (b) do feature extraction on the mobile device and send "features / signatures" to a server for database lookup or feature matching, (c) do everything on the mobile device. Since nobody else has access to your source code, you are the only person who can run performance tests for each approach.

rwong ( 2015-04-05 10:34:46 -0500 )
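
To illustrate point (2), here is a rough sketch of one way to combine several per-method scores into a single number: a logistic function over a weighted sum. The method names, weights and bias below are hypothetical placeholders; in practice they would be fitted on labelled sample images (for example with logistic regression), not chosen by hand.

```python
import math

# Hypothetical weights; real values would be learned from labelled samples.
WEIGHTS = {"logo_match": 2.5, "layout_iou": 3.0, "aspect_ratio": 1.0}
BIAS = -3.0


def combined_probability(scores):
    """Map per-method scores (each in 0..1) to one value in 0..1."""
    z = BIAS + sum(w * scores.get(k, 0.0) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def best_type(candidates):
    """candidates: {document type: {method name: score}}."""
    return max(candidates, key=lambda t: combined_probability(candidates[t]))


candidates = {
    "type_a": {"logo_match": 0.90, "layout_iou": 0.80, "aspect_ratio": 1.0},
    "type_b": {"logo_match": 0.95, "layout_iou": 0.20, "aspect_ratio": 1.0},
}
print(best_type(candidates))
```

In this made-up example type_b has the best logo score, yet type_a wins once the layout score is taken into account, which is the reason not to rely on a single method.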

I think you can use inRange instead of Canny. To speed up the process, you can resize your image.

sturkmen ( 2015-04-05 19:10:23 -0500 )

You could also use some morphological operations or a Hough transform to detect vertical and horizontal lines and then apply findContours() to the result, perhaps filtering it a bit to discard unwanted lines. Moreover, as @sturkmen pointed out, resizing your image could help.

theodore ( 2015-04-05 21:36:03 -0500 )