# Find an image by nearest color?

Hi OpenCV’ers!

I’ve got a project in mind and need some expert input. I have a collection of about 1100 abstract images (desktop wallpaper tiles) that I want to be searchable by predominant color.

I’ve already found Python code to help me use the OpenCV libraries to do k-means clustering and extract the top 3 (for example) colors in each image. From that, I can get the three colors as RGB values and the percentage of each.
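For anyone following along, here is roughly what that step produces. This is a minimal pure-Python sketch, not the OpenCV code mentioned above (`cv2.kmeans` would be far faster on real images). The function name `dominant_colours` and the toy pixel list are made up for illustration, and the farthest-point initialisation is just one simple, deterministic choice.

```python
def dominant_colours(pixels, k=3, iterations=10):
    """Cluster (r, g, b) pixels with plain k-means.

    Returns [(centre, share), ...] sorted by share, largest first,
    where share is the fraction of pixels assigned to that centre.
    """
    def dist_sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic farthest-point initialisation: start from the first pixel,
    # then repeatedly add the pixel farthest from all centres chosen so far.
    centres = [pixels[0]]
    while len(centres) < k:
        centres.append(max(pixels, key=lambda p: min(dist_sq(p, c) for c in centres)))

    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centre.
        clusters = [[] for _ in centres]
        for p in pixels:
            nearest = min(range(k), key=lambda i: dist_sq(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centres[i] = tuple(sum(ch) / len(members) for ch in zip(*members))

    total = len(pixels)
    result = [(centres[i], len(clusters[i]) / total) for i in range(k)]
    return sorted(result, key=lambda item: item[1], reverse=True)


# Toy "image" (hypothetical data): 6 reddish, 3 greenish and 1 bluish pixel.
pixels = [(250, 0, 0)] * 6 + [(0, 250, 0)] * 3 + [(0, 0, 250)]
top3 = dominant_colours(pixels)
```

Each entry of `top3` is a colour plus the fraction of pixels it covers, which is the "three colors and the percentage of each" described above.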

I plan to stuff all that data into a database for searching.

Ideally, I want the program user to select a color (using a standard color-picker dialog box), and then query the database for the top X images with the “closest matching” predominant color.

The problem is that I don’t know how to store and search this data effectively. That’s where I need your help. The math involved here is miles over my head.

From doing some reading (Wikipedia), I have a hunch that storing and searching HSV values makes more sense than RGB values. I can’t fully explain why, except that it seems easier to find “more or less blue-ish” using HSV.
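To make that hunch concrete, here is a small sketch using Python's standard `colorsys` module. The helper name `rgb255_to_hsv` and the two sample colours are only illustrative. Note that OpenCV itself stores hue as 0-179 for 8-bit images, whereas this sketch reports hue in degrees (0-360).

```python
import colorsys


def rgb255_to_hsv(r, g, b):
    """Convert 0-255 RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# Two colours most people would call "blue-ish" (illustrative values).
navy = rgb255_to_hsv(0, 0, 128)
sky = rgb255_to_hsv(135, 206, 235)

print("navy HSV:", navy)
print("sky  HSV:", sky)
```

The two RGB triples have little in common, but both hues fall in the same blue region of the colour wheel. That is why "more or less blue-ish" is easier to express in HSV.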

Is anyone willing to help me untangle this? I thought this project would be straightforward, until I started digging into image analysis. This led me to k-means clustering, which led me here.


Using k-means, you get a feature vector for each image. If you keep N colors, the feature has 4N values (3 for RGB + 1 for percentage, per color).

You can use this post to build a data space and then look up the nearest image.
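As a rough illustration of the 4N layout, assuming each image's k-means output is a list of `((r, g, b), share)` pairs (the function name `build_feature` and the sample values are hypothetical):

```python
def build_feature(colours_with_share):
    """Flatten [((r, g, b), share), ...] into [r, g, b, share, r, g, b, share, ...]."""
    feature = []
    for (r, g, b), share in colours_with_share:
        feature.extend([r, g, b, share])
    return feature


# Hypothetical k-means output for one image with N = 3 colours.
image_colours = [((20, 40, 200), 0.6), ((200, 200, 30), 0.3), ((220, 20, 20), 0.1)]
feature = build_feature(image_colours)
print(len(feature))  # 4 * N values
```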

( 2018-03-14 11:05:06 -0500 )


How to store these values?

Just store the raw HSV values of each colour. This makes things easier, especially when it comes to finding the closest match.

In general, you have a distance minimisation problem. When the user selects their colour from the dialog box, you want to search through your database for the colour whose Euclidean distance to the selected one is the smallest.

If we assume that a database does not exist, then your function would look something like this:

    import math
    from collections import namedtuple

    # each colour is an (H, S, V) triple; the five entries below are placeholders
    # on OpenCV's 8-bit scale (H 0-179, S and V 0-255), swap in your own data
    Colour = namedtuple("Colour", ["H", "S", "V"])
    all_colours_in_hsv = [
        Colour(0, 255, 255), Colour(30, 255, 255), Colour(60, 255, 255),
        Colour(120, 255, 255), Colour(150, 255, 255),
    ]

    def closest_colour(selected_colour):
        # set the distance to be a really big number
        # initialise the closest colour to empty
        shortest_distance, closest = math.inf, None

        # iterate through all the colours
        # for each colour in the list, find the Euclidean distance to the one selected by the user
        for colour in all_colours_in_hsv:
            # since your colours are in 3D space, sum the squared difference along each axis
            current_distance = math.sqrt(
                (colour.H - selected_colour.H) ** 2
                + (colour.S - selected_colour.S) ** 2
                + (colour.V - selected_colour.V) ** 2
            )

            # unless you truly care about the exact length, you don't need the sqrt() operation.
            # it is a rather expensive one, and the squared distance ranks colours in the same order,
            # so you can just do this instead
            # current_distance = (colour.H - selected_colour.H) ** 2 + (colour.S - selected_colour.S) ** 2 + (colour.V - selected_colour.V) ** 2

            # update the distance along with the corresponding colour
            if current_distance < shortest_distance:
                shortest_distance = current_distance
                closest = colour

        return shortest_distance, closest
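One caveat with plain Euclidean distance on HSV: hue is an angle, so it wraps around, and a hue of 359° is almost the same red as a hue of 1°. A possible adjustment is sketched below. It assumes hue in degrees and S, V in 0-1; the function name `hsv_distance_sq` and the choice to scale hue to 0-1 are assumptions for illustration, not part of the original answer.

```python
def hsv_distance_sq(a, b):
    """Squared distance between two (H degrees, S 0-1, V 0-1) colours,
    treating hue as circular so that 359 and 1 count as neighbours."""
    dh = abs(a[0] - b[0]) % 360.0
    dh = min(dh, 360.0 - dh) / 180.0  # shortest way round the wheel, scaled to 0-1
    ds = a[1] - b[1]
    dv = a[2] - b[2]
    return dh * dh + ds * ds + dv * dv


# Two reds on either side of the 0/360 seam, and a magenta for comparison.
red_a = (359.0, 1.0, 1.0)
red_b = (1.0, 1.0, 1.0)
magenta = (300.0, 1.0, 1.0)

print(hsv_distance_sq(red_a, red_b))    # small: only 2 degrees apart
print(hsv_distance_sq(red_a, magenta))  # larger: 59 degrees apart
```

Scaling the hue difference to the same 0-1 range as S and V keeps any single channel from dominating the sum. How to weight the channels is a judgment call worth experimenting with.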


Since you have a database but have not told us what kind it is (MySQL, MS Access, NoSQL, etc.), it is hard to tell you exactly what you need to do, but the general query would be something like

Select top 3 from table colours order by Euclidean Distance calculation


Thank you. It seems you understand what I'm trying to do.

My database will be SQLite, so I can easily pack the database in with the application and not need to worry about a client/server setup.

I plan to pre-populate the tables with the color values, running k-means on each image and storing the results in columns like "color1_H", "color1_S", and "color1_V". Repeat for colors 2 and 3. I also plan to store the percentage of each color so I have another option for ranking the results.
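With SQLite, the "order by distance" idea could look something like the sketch below, written with Python's built-in `sqlite3` module. The table name `images`, the `color1_pct` column, and the sample rows are assumptions for illustration. Only color 1 is shown, and H, S, and V are all assumed to be stored on a 0-1 scale so that no channel dominates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    "filename TEXT, color1_H REAL, color1_S REAL, color1_V REAL, color1_pct REAL)"
)
# Hypothetical rows: (filename, H, S, V, share of the image).
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?, ?)",
    [
        ("red.png", 0.0, 1.0, 1.0, 0.7),
        ("teal.png", 0.5, 1.0, 1.0, 0.5),
        ("blue.png", 0.667, 1.0, 1.0, 0.6),
    ],
)


def closest_images(conn, h, s, v, limit=3):
    """Return (filename, squared distance) rows, nearest color first."""
    query = """
        SELECT filename,
               (color1_H - :h) * (color1_H - :h)
             + (color1_S - :s) * (color1_S - :s)
             + (color1_V - :v) * (color1_V - :v) AS dist_sq
        FROM images
        ORDER BY dist_sq
        LIMIT :limit
    """
    return conn.execute(query, {"h": h, "s": s, "v": v, "limit": limit}).fetchall()


matches = closest_images(conn, 0.64, 1.0, 1.0, limit=2)
print(matches)
```

The SQL itself does not depend on Python, so the same query text should carry over to the C++ application. Colors 2 and 3 could be handled by computing the same expression on their columns and taking the smallest of the three.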

I'm going to use Python for the k-means crunching and loading the database. The end application will be C++, so hopefully the distance calculation(s) will be fast.

I've got a follow-up question, but I'm running out of characters. Please see the next comment! Thanks!

( 2018-03-14 11:49:38 -0500 )

Can I also use the distance calculation you described to determine the level of "contrast" between the top 3 colors in the image? By "contrast" I mean how far apart the top 3 are on the spectrum. If the image is all shades of blue, I presume your algorithm would produce what amounts to a "low difference" between the three colors. But if the image is blue, yellow, and red, the distances would be much larger. Am I understanding this correctly?

That way, I could pick images that are close to the chosen color AND contain low-contrast companion colors. Or the chosen color and high-contrast companion colors.
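One rough way to put a number on that idea is the average pairwise distance between an image's top colors. This is only a sketch of one possible "spread" measure, not an established definition of contrast. It uses RGB triples to sidestep hue wrap-around, and the function name `colour_spread` and the sample palettes are made up for illustration.

```python
import math
from itertools import combinations


def colour_spread(colours):
    """Mean Euclidean distance over all pairs of colours (needs at least two)."""
    pairs = list(combinations(colours, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 palettes as (r, g, b) on a 0-255 scale.
all_blues = [(0, 0, 255), (0, 0, 200), (30, 30, 220)]
blue_yellow_red = [(0, 0, 255), (255, 255, 0), (255, 0, 0)]

print(colour_spread(all_blues))        # small spread
print(colour_spread(blue_yellow_red))  # much larger spread
```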

If you're curious to see what I'm talking about, or if it makes the situation clearer, here's my plan:

http://www.bundito.com/index.php/2018...

Thanks!

( 2018-03-14 11:57:12 -0500 )

@bundito Your website tries to extract a device fingerprint: that is not very fair.

"It seems you understand what I'm trying to do." Maybe you should try to understand what flann is.

( 2018-03-14 12:26:01 -0500 )

A couple of things. 1: The method suggested by @LBerger does exactly what I described above (I didn't notice his comment until after I posted this). It is actually better because it is optimized for image operations and hence faster. All I did was give you an introductory overview of your problem and how to solve it on a small scale. flann not only implements this, but also gives you an array of other optimised operations. There are many ways to tackle distance minimisation. 2: I am still not 100% sure about the contrast measure because there are different types of contrast. My advice: get the Euclidean stuff implemented, then run experiments to see what conclusions you can draw about contrast as well.

( 2018-03-14 12:58:57 -0500 )

If no conclusion can be drawn, then refer to the different contrast calculations and see which one best suits your needs. If you are relatively new to all this, then I'd recommend you first implement my suggested method and run it on a couple of images before even connecting it to your DB. Once you fully understand and can interpret the results, switch to flann for scalability.

( 2018-03-14 13:03:56 -0500 )

@LBerger: I wasn't aware of the device fingerprinting going on. It must be part of the generic WordPress install. I'll find it and kill it. I don't like it any more than you do. Sorry about that, but thanks for pointing it out.

@LBerger & @eshirima: I'll read up on flann and try to make sense of it.

I appreciate the help. I know what I'm trying to accomplish, but I never got enough mathematics education to understand these kinds of calculations. But as I'm writing in a new blog post, this is an open-source project, and open-source projects almost always have helpful community members. It's true once more. I really appreciate it.

( 2018-03-14 13:13:40 -0500 )

I think you should also read about the hash module in opencv_contrib. If you want to use it in Python, you will need to build opencv and opencv_contrib (there is no official package for opencv_contrib).

( 2018-03-14 13:55:48 -0500 )

Hello again @LBerger and @eshirima ! I successfully ran all my images through a K-Means implementation from the sklearn library. I was also able to successfully search for the nearest color using the formula @eshirima outlined above. It worked great and ran quickly. Thank you both for helping me understand how that works.

But I've got a problem I don't fully understand. My extracted colors are "muddier" and darker than what an image-processing website generates. Theirs are more in line with what I want, and what I'd expect.

Rather than cluttering up this helpful answer, I posted my dilemma on Stack Overflow. Can someone take a look and take a stab at an answer? https://stackoverflow.com/q/49336210/...

Thanks. Very much appreciated.

( 2018-03-17 12:43:31 -0500 )
