How can I detect logos in scanned documents, and determine whether two logos are similar?

I have files that consist of scanned or faxed pages from multiple documents. I am trying to identify which pages belong to the same document, so that I can split each file into individual PDF files, one per document. I'm new to CV and would appreciate any guidance.

Logos appear to be one criterion that could identify the start of a document. How can I go about detecting a logo in a scanned page? My initial thought was to find large connected components. Since I have OCR data, I can filter out large text areas using the text bounding boxes.
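
To make the idea concrete, here is a minimal sketch of what I have in mind, written in plain Python so it runs without any libraries. It assumes the page has already been binarized into a 2D list of 0/1 values and that the OCR boxes are given as (x, y, w, h) tuples; the function names and the `min_area` threshold are made up for illustration. On real scans I would expect OpenCV's `cv2.connectedComponentsWithStats` to do the labeling step much faster.

```python
from collections import deque


def connected_components(binary):
    """Return (x, y, w, h, area) for each 4-connected blob of non-zero pixels."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1, area))
    return boxes


def overlaps(a, b):
    """True if two boxes, given as (x, y, w, h, ...), intersect."""
    ax, ay, aw, ah = a[:4]
    bx, by, bw, bh = b[:4]
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def logo_candidates(binary, text_boxes, min_area=20):
    """Keep large blobs that do not intersect any OCR text box.

    min_area is an assumed threshold and would need tuning per scan resolution.
    """
    return [c for c in connected_components(binary)
            if c[4] >= min_area and not any(overlaps(c, t) for t in text_boxes)]
```

A logo made of several disconnected strokes would come out as several small blobs here, so some dilation or merging of nearby boxes would probably be needed before applying the area threshold.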

At times, the logo also occurs on subsequent pages of a document, which suggests that the page is part of the same document. How can I determine whether two logos are the same? (I am assuming some similarity measure is required.)
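
As an example of the kind of similarity measure I mean, here is a rough sketch of an average hash compared by Hamming distance, again in plain Python. It assumes each logo has been cropped to a grayscale 2D list of at least 8x8 pixels; the function names are hypothetical, and this is only one possible measure (template matching or feature matching may well be better for noisy faxes).

```python
def average_hash(gray, size=8):
    """Reduce a grayscale image (2D list, at least size x size) to size*size bits.

    Each bit says whether a block of the image is brighter than the mean block.
    """
    height, width = len(gray), len(gray[0])
    cells = []
    for r in range(size):
        y0, y1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            x0, x1 = c * width // size, (c + 1) * width // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if value > mean else 0 for value in cells]


def similarity(gray_a, gray_b, size=8):
    """Fraction of matching hash bits: 1.0 is an identical hash, 0.0 is its complement."""
    hash_a = average_hash(gray_a, size)
    hash_b = average_hash(gray_b, size)
    differing = sum(a != b for a, b in zip(hash_a, hash_b))
    return 1.0 - differing / len(hash_a)
```

Two crops would then be treated as the same logo when `similarity` exceeds some cutoff, which I would have to choose by experiment. This hash tolerates uniform brightness changes and rescaling but not rotation or skew, so the pages would need deskewing first.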

Thanks

Viraf