How can I detect logos in scanned documents, and identify if two logos are similar [closed]
I have files that consist of scanned or faxed pages from multiple documents. I am trying to identify which pages belong to the same document, so that I can separate the documents into individual PDF files. I'm new to CV and would appreciate any guidance.
Logos appear to be one criterion that potentially identifies the start of a document. How can I go about detecting a logo in a scanned page? An initial thought was to find large connected components. As I have OCR data, I can filter out large text areas given the text bounding boxes.
At times, the logo may also occur on subsequent pages of a document, indicating that the page is possibly part of the same document. How can I determine whether two logos are the same? (I am assuming some similarity measure is required.)
Thanks
Viraf
Logo matching should be easy; search for template matching: https://docs.opencv.org/ref/master/de...
Alternatively you could train a classifier to detect your logo, but maybe this is overkill for your problem.
@berak I hope my comment makes sense?
Thanks - I'll look into template matching. I was unsure whether template matching would work with poor-quality images (scanned several times at low resolution) that may be skewed and/or scaled. Any alternative suggestions for locating logos and extracting relevant features that could be used for template matching or some other matching scheme?
@viraf, maybe you have some smallish example images / logos you can show us?
[snip] "Alternatively you could train a classifier to detect your logo" [snip]
https://docs.opencv.org/3.4/dc/d88/tu... You would use this approach if you have a lot of variation (the logo is rotated or distorted, for example). Usually logo matching is pretty easy and template matching is enough.
The set of logos is undefined, so I don't believe that training will work.
What are alternative ways of detecting logos on letters, and what are the pros/cons? I'm assuming that I will have to test out various algorithms to see what gives me the best result for the images I receive.
Given that SIFT/SURF are proprietary, what are alternative means of matching if the images are (a) scaled and/or (b) skewed? In general I see little skewing, but I do see documents that have been scaled. I'll try template matching first, as it should cover > 80% of the cases.
First, that only matters if you plan to sell your software. Second, there are alternatives such as AKAZE, ORB, and BRIEF.
Note that this approach (matching 2D features) only works IF the logo is actually there; the matcher itself cannot tell you whether it is or not. (It is not a tool for measuring "similarity".)
IF you go with template matching, you can try to compensate for scaling and skew by rotating/scaling your query logo and making several attempts.
Ouch... Is there a way to detect whether a logo exists and, if so, whether it is similar?
@viraf, there simply is no silver bullet.