Is this type of audio analysis possible with OpenCV, or a related library?

I'm aware that OpenCV may not be able to do everything I outline below. This forum seemed like a good place to ask, since there seem to be quite a few people here with extensive knowledge of OpenCV and related libraries.

I'm working on a project that primarily boils down to this:

  1. Analyze three audio files and translate them into audio fingerprint images. The idea is to use the Chromaprint/AcoustID technology, which I think is available in C and Python.

  2. Compare the resulting audio fingerprint images and determine how similar each pair of fingerprints is.

  3. Output some display of how similar they are.
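
For step 1, one route that skips image processing entirely is Chromaprint's `fpcalc` command-line tool, which with the `-raw` flag prints the fingerprint as a comma-separated list of 32-bit integers. The sketch below assumes `fpcalc` is installed and on the PATH and that its output contains `DURATION=` and `FINGERPRINT=` lines; the helper names `parse_fpcalc_output` and `raw_fingerprint` are hypothetical:

```python
import subprocess


def parse_fpcalc_output(text):
    """Parse `fpcalc -raw` output into (duration, list of unsigned 32-bit ints)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first "=" only, so values containing "=" stay intact.
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    duration = float(fields["DURATION"])
    # Mask to 32 bits so that values printed as signed ints become unsigned.
    fingerprint = [int(v) & 0xFFFFFFFF for v in fields["FINGERPRINT"].split(",") if v]
    return duration, fingerprint


def raw_fingerprint(path):
    """Run fpcalc on one audio file (assumes fpcalc is on the PATH)."""
    result = subprocess.run(
        ["fpcalc", "-raw", path],
        capture_output=True, text=True, check=True,
    )
    return parse_fpcalc_output(result.stdout)
```

Calling `raw_fingerprint` on each of the three files would give three integer lists to work with in the later steps. If running a subprocess is undesirable, the pyacoustid package also wraps Chromaprint for Python.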

Is there a way to use Chromaprint with OpenCV to do this?

Here is what the actual audio fingerprint images look like:

[Image: example audio fingerprint images]
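
As far as I understand Chromaprint, a fingerprint "image" is just the raw integer list drawn bit by bit: one line of pixels per 32-bit sub-fingerprint, one pixel per bit. A minimal NumPy sketch of that rendering, assuming the raw (unsigned) integers are already available; the function name `fingerprint_to_image` is hypothetical, and drawing one row per integer (rather than one column) is an arbitrary choice:

```python
import numpy as np


def fingerprint_to_image(fingerprint):
    """Render a raw Chromaprint fingerprint as a black-and-white image.

    Each 32-bit integer becomes one row of 32 pixels (bit 31 on the left),
    so the result has shape (len(fingerprint), 32). 255 = bit set, 0 = clear.
    """
    values = np.asarray(fingerprint, dtype=np.uint32)
    # Shift amounts 31..0, so the most significant bit lands in column 0.
    shifts = np.arange(31, -1, -1, dtype=np.uint32)
    # Broadcasting (n, 1) >> (32,) gives an (n, 32) array of individual bits.
    bits = (values[:, None] >> shifts) & 1
    return (bits * 255).astype(np.uint8)
```

To view the result with OpenCV, `cv2.imwrite("fingerprint.png", img)` should save the `uint8` array as a grayscale PNG. Since comparing these images pixel by pixel is the same as comparing the underlying bits, OpenCV may not be needed for the comparison itself.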

Does anyone have a suggested process for solving this?
Or should I be using a different library instead of OpenCV?
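
One non-OpenCV option is to compare the raw integer lists directly: count the bits that differ (XOR, then popcount) between aligned sub-fingerprints, and try a range of time offsets in case the recordings do not start at the same point. The sketch below assumes raw fingerprints given as lists of unsigned 32-bit ints. The names `similarity`, `similarity_table`, `max_offset` and `min_overlap` are hypothetical, and scoring by 1 minus the bit-error rate at the best offset is one simple choice, not AcoustID's actual matching algorithm:

```python
from itertools import combinations


def similarity(fp_a, fp_b, max_offset=120, min_overlap=20):
    """Return (score, offset) for the best alignment of two raw fingerprints.

    score is 1 - bit-error rate over the overlap (1.0 = identical overlap).
    At the returned offset, fp_a[i] is compared with fp_b[i - offset].
    Returns (0.0, 0) if no offset gives at least min_overlap integers.
    """
    best_score, best_offset = 0.0, 0
    for offset in range(-max_offset, max_offset + 1):
        # Keep only indices i where both fp_a[i] and fp_b[i - offset] exist.
        start = max(0, offset)
        end = min(len(fp_a), len(fp_b) + offset)
        overlap = end - start
        if overlap < min_overlap:
            continue
        diff_bits = sum(
            bin(fp_a[i] ^ fp_b[i - offset]).count("1") for i in range(start, end)
        )
        score = 1.0 - diff_bits / (32 * overlap)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def similarity_table(fingerprints):
    """Print and return pairwise similarities for a dict of name -> fingerprint."""
    rows = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        score, offset = similarity(fp_a, fp_b)
        rows.append((name_a, name_b, score, offset))
        print(f"{name_a} vs {name_b}: {score:.1%} similar (offset {offset})")
    return rows
```

With three files, passing a dict of their three raw fingerprints to `similarity_table` prints one line per pair. Unrelated audio tends to score around 50%, because random bits agree half the time, so the scores are best read relative to that baseline rather than to 0%.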

I'm not expecting a complete step-by-step guide, just whatever information I can gather. There are other aspects of this, such as translating the data in the images into specific colors, but I figured I should ask one question at a time rather than everything at once.

Thank you!