How can I detect logos in scanned documents, and identify if two logos are similar [closed]

asked 2018-06-29 06:21:12 -0600

viraf

I have files that consist of scanned / faxed pages from multiple documents. I am trying to identify the related pages of each document, so that I can separate the documents into individual PDF files. I'm new to CV and would appreciate any guidance.

Logos appear to be a criterion that can identify the start of a document. How can I go about detecting a logo in a scanned page? My initial thought was to find large connected components. As I have OCR data, I can filter out large text areas using the text bounding boxes.

At times, the logo may occur on subsequent pages of a document, indicating that the page is possibly part of the same document. How can I determine whether the logos are the same? (I am assuming some similarity measure is required.)




Closed for the following reason: the question is answered, right answer was accepted. Closed by sturkmen
close date 2020-11-13 13:48:34.260481


Logo matching should be easy; search for "template matching".

Alternatively you could train a classifier to detect your logo, but maybe this is overkill for your problem.

holger (2018-06-29 06:44:25 -0600)

@berak I hope my comment makes sense?

holger (2018-06-29 06:50:01 -0600)

Thanks - I'll look into template matching. I was unsure whether template matching would work with poor-quality images (scanned several times at low resolution) that may be skewed and/or scaled. Are there any alternative suggestions for locating logos and extracting the relevant features, which could then be used for template matching or some other matching scheme?

viraf (2018-06-29 06:54:19 -0600)

@viraf, maybe you have some smallish example images / logos you can show us?

berak (2018-06-29 07:02:42 -0600)

"Alternatively you could train a classifier to detect your logo"

You would use this approach if you have a lot of variation to handle (the logo is rotated or distorted, for example). Usually logo matching is pretty easy and template matching is enough.

holger (2018-06-29 07:06:51 -0600)

The set of logos is undefined, so I don't believe that training will work.

What are alternative ways of detecting logos on letters, and what are the pros/cons? I'm assuming that I will have to test out various algorithms to see what gives me the best result for the images I receive.

Given that SIFT/SURF are proprietary, what are alternate means for matching if the images are (a) scaled and/or (b) skewed? In general I see little skewing, but do see documents that have been scaled. I'll try template matching first, as it should cover more than 80% of the cases.

viraf (2018-06-29 07:20:49 -0600)

"Given that SIFT/SURF are proprietary, what are alternate means for matching"

First, that only matters if you plan to sell your software. And there are alternatives, such as AKAZE, ORB, and BRIEF.

Note that this approach (matching 2D features) only works IF the logo is actually there; on its own it can never tell you whether it is there or not. (It's not a tool for measuring "similarity".)

"In general I see little skewing, but do see documents that have been scaled"

IF you go with template matching, you can try to compensate for that by rotating/scaling your query logo and making several attempts.

berak (2018-06-29 07:26:48 -0600)

Ouch... Is there a way to detect that a logo exists and, if so, whether it is similar?

viraf (2018-06-29 07:49:51 -0600)

@viraf, there simply is no silver bullet.

berak (2018-06-29 07:57:27 -0600)