How to search for near-identical images?
I want to implement an image DB. If I've understood correctly, I need to use BoW, and then I would be able to tell whether I have seen a new image before.
But all the examples I have seen before (e.g. this) have some set of training images; once training is done, BoW is used to classify the rest.
My situation is different: I have to search through the BoW for an image, and if there is no match we consider it a new image; otherwise we don't add it and say "already exists". Skewed, rescaled and otherwise modified images should be considered the same, so I used SIFT as the feature detector.
So the problem in a nutshell: at the beginning I have an empty database of images (so I have no data to train on). I have a stream of images. I can calculate the features of each image. Then I have to check whether there is a stored image whose descriptors match the descriptors of this image by more than some threshold. If so, I say "existing"; otherwise I add these descriptors as a new image and return "new image". If I get the same (or a similar) image in the future, I say "existing".
This requires rebuilding the BoW with every new image, which seems to be a very complicated solution. All search engines on the internet do this, so I'm surprised there is almost no information about it on the web.
Okay, I tried to use PHash to perform the search, but it fails to recognize slightly different images. Here is some sample code (it's a thin Rust wrapper over the C++ interface, so don't be afraid of seeing an alien language):
pub trait Database {
    fn save_image(&mut self, image: &[u8]);
    fn load_images(&self) -> Vec<Vec<u8>>;
}

pub enum ImageVariant {
    New,
    AlreadyExists,
}

pub struct Storage<T: Database> {
    database: T,
    hasher: PHash,
    // hashes of all images seen so far
    images: Vec<Mat>,
}

impl<T: Database> Storage<T> {
    pub fn new(database: T) -> Self {
        Self {
            database,
            hasher: PHash::new(),
            images: Vec::new(),
        }
    }
}

// the main execution happens here
impl<T: Database> Storage<T> {
    pub fn save_image_if_new(&mut self, image: &[u8], filename: &str) -> ImageVariant {
        // largest hash distance still treated as "the same image"
        const DIFF: f64 = 0.5;
        let mat = Mat::image_decode(image, ImageReadMode::Grayscale);
        let mat = self.hasher.compute(&mat);
        // find the smallest distance to any stored hash
        let mut last_diff = f64::INFINITY;
        for stored in self.images.iter() {
            let diff = self.hasher.compare(&mat, stored);
            if diff < last_diff {
                last_diff = diff;
            }
        }
        if last_diff < DIFF {
            return ImageVariant::AlreadyExists;
        }
        self.database.save_image(image);
        self.images.push(mat);
        ImageVariant::New
    }
}
Then I run the test on following images:
So PHash.compare gives a distance of 28 for images 1 and 2 (it should be small), while it gives 30 for images 1 and 3 (it should be big). So in practice it doesn't let me know whether I have already seen this image before.
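A note on the scale of these numbers: assuming PHash produces a 64-bit hash and compare returns the Hamming distance between two hashes (that is my understanding of OpenCV's img_hash module, not something checked against this wrapper), distances range from 0 to 64, and two unrelated hashes land around 32 on average if their bits are roughly independent. Under that assumption, 28 and 30 both read as "unrelated", and a cut-off of 0.5 would only ever accept identical hashes. A minimal sketch of the comparison on plain u64 values, where MAX_DISTANCE and the function names are hypothetical and the cut-off would need tuning on real data:

```rust
/// Hamming distance between two 64-bit perceptual hashes:
/// the number of bit positions in which they differ (0..=64).
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Smallest distance from `query` to any stored hash,
/// or `None` when nothing has been stored yet.
fn nearest_distance(stored: &[u64], query: u64) -> Option<u32> {
    stored.iter().map(|&h| hamming(h, query)).min()
}

/// Hypothetical cut-off (an assumption, not a recommended value).
const MAX_DISTANCE: u32 = 10;

/// True when some stored hash is within `MAX_DISTANCE` bits of `query`.
fn is_duplicate(stored: &[u64], query: u64) -> bool {
    matches!(nearest_distance(stored, query), Some(d) if d <= MAX_DISTANCE)
}
```

A cut-off of this kind tolerates a handful of flipped bits; a large edit such as an added frame or caption may still change the hash too much for it to help.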
The entire code is available here and here
Test code:
#[test]
fn it_works() {
    let lenna = fs::read(get_asset_path("lenna.png")).unwrap();
    let lenna_demotivator = fs::read(get_asset_path("lenna_demotivator.png")).unwrap();
    let ...
sorry, but your question is too fuzzy, or something, to be answered, or something.
also, you're kinda asking 5 questions at the same time.
split it up into small, separate questions / problems, show what you tried so far, and at what point you got stuck, then we can (try to) help with it.
@berak okay, I removed some of the considerations, if you like. My question is straightforward: how can such a DB be implemented? All the examples I have seen (e.g. this) have a training set of images and a working set of images, while I need dynamically growing storage. There are no examples or information on how to do that, and rebuilding the BoW every time I add an image seems impractical.
no, you only build the bow vocabulary once (and save it somehow, e.g. using cv::FileStorage). you need enough data for this, though, so you find representative clusters even for unseen (future) images.
if you're using a flann::Index, (on BowimageDescriptors) you'll have to train it on startup of your program (it's pretty fast), you can serialize the train data, though, and add new items on demand.
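For a small collection, an exhaustive scan can stand in for the index while prototyping. The sketch below is not the flann::Index API, just a hypothetical LinearIndex over fixed-length vectors (such as BoW histograms) that is filled from saved data at startup and accepts new items on demand:

```rust
/// Exhaustive nearest-neighbour search over fixed-length feature vectors.
/// A stand-in for an approximate index; there is nothing to train.
struct LinearIndex {
    items: Vec<Vec<f32>>,
}

impl LinearIndex {
    /// "Training" at startup is just taking ownership of the saved features.
    fn from_saved(items: Vec<Vec<f32>>) -> Self {
        Self { items }
    }

    /// Adds one feature vector and returns its id.
    fn add(&mut self, feature: Vec<f32>) -> usize {
        self.items.push(feature);
        self.items.len() - 1
    }

    /// Id and squared Euclidean distance of the closest stored vector,
    /// or `None` when the index is empty.
    fn nearest(&self, query: &[f32]) -> Option<(usize, f32)> {
        let mut best: Option<(usize, f32)> = None;
        for (id, item) in self.items.iter().enumerate() {
            let d: f32 = item
                .iter()
                .zip(query)
                .map(|(a, b)| (a - b) * (a - b))
                .sum();
            if best.map_or(true, |(_, best_d)| d < best_d) {
                best = Some((id, d));
            }
        }
        best
    }
}
```

The cost of a query grows linearly with the number of stored vectors, so this only postpones the question of a real index.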
how to store it ? idk. opencv does not have any (good) means for that. using a db for this is a bit unpractical, since you need ALL of the training data at once, to train your index. the better ideas i've seen here went like: make chunks of ~1000 feature vectors , and append new data to the last. once it's full, add a new chunk.
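The chunking idea could look roughly like this. It is an in-memory sketch only: ChunkedStore and its methods are hypothetical names, and in a real program each chunk would presumably be written to its own file so that only the last one has to be rewritten when data is appended.

```rust
/// Append-only store that keeps feature vectors in fixed-size chunks.
struct ChunkedStore {
    chunk_size: usize,
    chunks: Vec<Vec<Vec<f32>>>,
}

impl ChunkedStore {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be positive");
        Self { chunk_size, chunks: Vec::new() }
    }

    /// Appends to the last chunk, opening a new one when it is full.
    /// Returns the index of the chunk that was modified.
    fn push(&mut self, feature: Vec<f32>) -> usize {
        let needs_new = self
            .chunks
            .last()
            .map_or(true, |chunk| chunk.len() >= self.chunk_size);
        if needs_new {
            self.chunks.push(Vec::with_capacity(self.chunk_size));
        }
        let last = self.chunks.len() - 1;
        self.chunks[last].push(feature);
        last
    }

    /// All features in insertion order, e.g. to retrain an index at startup.
    fn iter(&self) -> impl Iterator<Item = &Vec<f32>> {
        self.chunks.iter().flatten()
    }

    /// Total number of stored feature vectors.
    fn len(&self) -> usize {
        self.chunks.iter().map(Vec::len).sum()
    }
}
```

The chunk index returned by push tells the caller which chunk is dirty and needs to be saved again.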
@berak I have rewritten my question to make it clearer; please feel free to give more feedback if I'm doing something wrong. I tried to provide some context (just in case) along with the exact question: "How can BoW be incrementally filled and used simultaneously?"
no, wait, Bow does not work like this.
instead you compare your image features to the BoW cluster centers, and increase a bin for the best matching cluster feature. the resulting histogram is the final, BoW feature, that goes into your index (for either testing or training)
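The quantisation step described above can be sketched without OpenCV. This is only an illustration on plain Vec<f32> descriptors: the function names are made up, the vocabulary is assumed to have been built beforehand (e.g. by k-means), and normalising the histogram by the descriptor count is one possible choice.

```rust
/// Squared Euclidean distance between two descriptors of equal length.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the vocabulary word (cluster centre) closest to `descriptor`.
/// Returns 0 for an empty vocabulary, so callers must check that first.
fn nearest_word(vocabulary: &[Vec<f32>], descriptor: &[f32]) -> usize {
    let mut best = 0;
    let mut best_dist = f32::INFINITY;
    for (i, word) in vocabulary.iter().enumerate() {
        let d = sq_dist(word, descriptor);
        if d < best_dist {
            best = i;
            best_dist = d;
        }
    }
    best
}

/// Fixed-length BoW feature: one bin per vocabulary word,
/// normalised so that the bins sum to 1 (when there are descriptors).
fn bow_histogram(vocabulary: &[Vec<f32>], descriptors: &[Vec<f32>]) -> Vec<f32> {
    let mut hist = vec![0.0f32; vocabulary.len()];
    if vocabulary.is_empty() {
        return hist;
    }
    for d in descriptors {
        hist[nearest_word(vocabulary, d)] += 1.0;
    }
    let total = descriptors.len() as f32;
    if total > 0.0 {
        for bin in hist.iter_mut() {
            *bin /= total;
        }
    }
    hist
}
```

Whatever the number of descriptors in an image, the result has exactly one entry per vocabulary word, which is what makes it usable as input to an index.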
Well, maybe BoW is a dead end here and this is an XY problem. That is why I added so much context, so that the decision can be a well-weighed one. What approach should I choose then? I would love to mark it as the answer if you have any idea how it could be implemented.
and again, afaik, you cannot retrain an index, and retain the old tree structure (append data).
you have to retrain it from scratch, but at least you can try to store the features in a (more or less) dynamic way
again, you cannot use matching sift features from images directly here, because most of the matches are insignificant for clustering/classification. also the dimension problem (3 features in 1 image, 17 in another) makes it impossible to use with an index (or any other ml)
the Bow approach is an attempt to overcome those problems. (and commonly used for CBIR like situations)
maybe you have to look at improvements of that idea, like VLAD or fisher-vectors ?
to get over the XY problem, you'd have to go all the way back, and give us a specification of what you're trying to achieve, using what kind of data, etc.
I have a chat. I have a webhook that triggers when someone adds an image to the chat. I have to say whether this image was previously posted or not. An image is considered previously posted if it looks similar to one posted before, so rescaled, skewed, etc. images should be recognized. That's the whole backstory.