I have a chat. I have a webhook that triggers when someone adds an image to the chat. I have to say whether this image was previously posted or not. An image counts as previously posted if it looks similar to one that was posted before, so rescaled, skewed etc. images should be recognized. That's all the backstory here.

maybe that problem (finding (close) duplicates) can be solved far more easily (and much more cheaply) using image hashes instead of SIFT/BOW features. e.g. phash results in a 64-bit entry that can be easily stored and searched in a common relational database (like postgres, which supports the Hamming norm, iirc)
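to make the "64-bit entry + Hamming distance" idea concrete, here is a minimal sketch in plain C++ (no OpenCV, no database). it assumes the hash has already been packed into a `uint64_t`; `hamming64` and `find_near_duplicates` are hypothetical helper names, and the threshold is something you would have to tune on your own images.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hamming distance between two 64-bit hashes: the number of differing bits.
inline int hamming64(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Hypothetical helper: linear scan over the stored hashes, returning the
// indices whose distance to `query` is at most `max_distance` bits.
std::vector<std::size_t> find_near_duplicates(
        const std::vector<std::uint64_t>& stored,
        std::uint64_t query,
        int max_distance) {
    assert(max_distance >= 0 && max_distance <= 64);
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < stored.size(); ++i) {
        if (hamming64(stored[i], query) <= max_distance) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

a linear scan like this is probably fine for a chat-sized collection; the same XOR + bit count is what you would ask the database to do once the table grows.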

[EDIT]:

while i would not say your 2 lena images are the same, or even "sufficiently similar", it's you who has to decide that, not me...

i wrote a little test program:

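#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/img_hash.hpp>   // img_hash is part of opencv_contrib
using namespace cv;
using namespace std;

// hash three test images with the given algorithm and print the pairwise compare() values
void hash_fun(Ptr<img_hash::ImgHashBase> hasher) {
    Mat i1; hasher->compute(imread("lena1.png"), i1);
    Mat i2; hasher->compute(imread("lena2.png"), i2);
    Mat i3; hasher->compute(imread("solvay.jpg"), i3);

    cout << "lena1 -> lena2 : " << hasher->compare(i1,i2) << endl;
    cout << "lena1 -> solvay: " << hasher->compare(i1,i3) << endl;
    cout << "lena2 -> solvay: " << hasher->compare(i2,i3) << endl;
}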

int main(int argc, char** argv)
{
    cout << endl << "average" << endl;
    hash_fun(img_hash::AverageHash::create());
    cout << endl << "blockmean" << endl;
    hash_fun(img_hash::BlockMeanHash::create());
    cout << endl << "colormoments" << endl;
    hash_fun(img_hash::ColorMomentHash::create());
    cout << endl << "marrhidreth" << endl;
    hash_fun(img_hash::MarrHildrethHash::create());
    cout << endl << "radialvariance" << endl;
    hash_fun(img_hash::RadialVarianceHash::create());
    cout << endl << "phash" << endl;
    hash_fun(img_hash::PHash::create());
}

resulting in:

average
lena1 -> lena2 : 31
lena1 -> solvay: 30
lena2 -> solvay: 23

blockmean
lena1 -> lena2 : 131
lena1 -> solvay: 113
lena2 -> solvay: 136

colormoments
lena1 -> lena2 : 0.6555
lena1 -> solvay: 22.5625
lena2 -> solvay: 22.4344

marrhidreth
lena1 -> lena2 : 273
lena1 -> solvay: 307
lena2 -> solvay: 324

radialvariance
lena1 -> lena2 : 0.522041
lena1 -> solvay: 0.30779
lena2 -> solvay: 0.485308

phash
lena1 -> lena2 : 28
lena1 -> solvay: 30
lena2 -> solvay: 40

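so, as always, we have to look closely at our data and pick the best-fitting algo for it. with these three images, only colormoments clearly separates the lena pair (0.6555) from the two solvay comparisons (about 22.5); for average and blockmean the lena pair is not even the closest pair.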
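one rough way to put a number on "best fitting" is to divide the smallest distance between different images by the distance between the two lena images: a ratio well above 1 means the algorithm separates them, a ratio below 1 means it does not. the sketch below uses only the values printed above; `HashScores`, `separation_ratio`, `measured_scores` and `best_separated` are hypothetical names. radialvariance is left out, assuming its compare() value is a correlation peak where larger means more similar, so the ratio would not mean the same thing there.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One row of the results: a distance for the pair that should match
// and the distances for the pairs that should not.
struct HashScores {
    std::string name;
    double same;                    // lena1 -> lena2
    std::vector<double> different;  // lena1 -> solvay, lena2 -> solvay
};

// Smallest "different image" distance divided by the "same image" distance.
double separation_ratio(const HashScores& s) {
    assert(s.same > 0.0 && !s.different.empty());
    double min_diff = *std::min_element(s.different.begin(), s.different.end());
    return min_diff / s.same;
}

// Values copied from the test output above (distance-style scores only).
std::vector<HashScores> measured_scores() {
    return {
        {"average",      31.0,   {30.0, 23.0}},
        {"blockmean",    131.0,  {113.0, 136.0}},
        {"colormoments", 0.6555, {22.5625, 22.4344}},
        {"marrhildreth", 273.0,  {307.0, 324.0}},
        {"phash",        28.0,   {30.0, 40.0}},
    };
}

// Name of the algorithm with the largest separation ratio.
std::string best_separated(const std::vector<HashScores>& all) {
    assert(!all.empty());
    auto it = std::max_element(all.begin(), all.end(),
        [](const HashScores& a, const HashScores& b) {
            return separation_ratio(a) < separation_ratio(b);
        });
    return it->name;
}
```

three images are of course far too few to draw conclusions from; the point is only that the same check can be run over a larger labelled set of your own duplicates and non-duplicates.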