
Is there an optimal size for images used with SIFT in object recognition?

asked 2016-12-05 04:47:39 -0600

lovaj

In this answer I asked whether there is any difference between image formats for SIFT. Now my next question is: is there any optimal/suggested/standard/better/whatever image size?

Results from papers are preferred, but personal experience obviously matters as well.


Comments

imho, you can't go smaller than the internal patch size (32, iirc), but from then on it's only: the larger, the slower.

berak ( 2016-12-05 06:46:58 -0600 )

1 answer


answered 2016-12-05 05:01:03 -0600

Vintez

updated 2016-12-05 05:05:28 -0600

I can't give you an answer backed by a paper or anything similar (I don't even know whether anyone has tested this). But I can tell you the advantages and disadvantages of a small versus a large image size.

You probably already know that SIFT is not the fastest feature descriptor in existence. If you want to accelerate recognition, you can try smaller image sizes; the processing will be far faster. For example, in an Android app I used a combined FAST detector and SIFT descriptor (so an already sped-up variant of SIFT). The initial size I used was 3264 x 2448 (a large resolution for a smartphone), and training an image took ~10 seconds on a Nexus 5. After that I tested the 1920 x 1080 resolution on the same device, and training took ~4 seconds.

So if you want recognition or training to run as fast as possible, you should use a small image size.
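The timing difference above tracks the pixel count fairly closely; here is a quick sanity check with the two resolutions mentioned in the answer (pure arithmetic, not a benchmark):

```python
# SIFT cost grows roughly with the number of pixels processed.
full_px = 3264 * 2448    # full smartphone resolution from the answer
small_px = 1920 * 1080   # the reduced resolution
ratio = full_px / small_px
print(round(ratio, 2))   # -> 3.85, i.e. ~3.85x more pixels at full size
```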

But if you take a small image size, you will probably get a smaller image pyramid, which also means a smaller range of scales your algorithm can cover. E.g. the pyramid built from the 3264 x 2448 image was slower to build, but it contained more images, so my object could be detected over a bigger range of scales than with the pyramid built from the 1920 x 1080 image.

So if you want to support a large set of distances (scales), it is better to use a large image size, at the cost of slower processing. If you want fast recognition and training of an image, you should pick a smaller size.
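To make the scale-range point concrete: SIFT implementations typically derive the octave count from the smaller image side. The function below is a rough approximation of how OpenCV's SIFT does this (my own sketch, not the exact implementation; `first_octave=-1` models the default initial upsampling):

```python
import math

def approx_num_octaves(width, height, first_octave=-1):
    # Rough model: one octave per factor-of-two halving of the smaller
    # side, minus a couple of octaves so the coarsest level stays usable,
    # plus one extra octave when the image is initially upsampled.
    return round(math.log2(min(width, height)) - 2) - first_octave

print(approx_num_octaves(3264, 2448))  # -> 10
print(approx_num_octaves(1920, 1080))  # -> 9
```

Under this model, the larger image buys you one extra octave, i.e. roughly a factor-of-two wider range of detectable scales.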

A good combination would be: use a rather large image to train your object for recognition, so you get a wide range of scales at which the object can be detected, and at recognition time compare a smaller image (but not too small!) against that reference, so you get good performance while tracking your object.


Comments

Thanks for your answer. I'm trying to use SIFT on the Oxford Buildings dataset (http://www.robots.ox.ac.uk/~vgg/data/...), which consists of ~5k images with the largest side around 1024. On average ~10k keypoints per image are detected, so we run out of memory in no time: 10^4 (avg # keypoints per image) * 5 * 10^3 (images) * 128 (descriptor dimension) * 4 (bytes per float32) = 2.56 * 10^10 bytes, i.e. ~25.6 GB! Did I do something wrong in the math?
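Spelled out, the descriptor-storage arithmetic for that dataset looks like this (assuming float32 descriptors kept entirely in RAM):

```python
# Storage needed to keep every raw SIFT descriptor in memory.
keypoints_per_image = 10_000    # observed average on the dataset
num_images = 5_000
dims = 128                      # standard SIFT descriptor length
bytes_per_float = 4             # float32

total_bytes = keypoints_per_image * num_images * dims * bytes_per_float
print(total_bytes / 1e9)        # -> 25.6 (GB)
```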

lovaj ( 2016-12-06 12:45:42 -0600 )

(Just to be clear) are you trying to build a KD-tree with that number of keypoints and that descriptor dimension? Or are you saving the images too? You only have to build one KD-tree from the descriptors, nothing else. If you are already doing that, try asking a new question, because I'm no expert in memory usage.

Vintez ( 2016-12-07 01:57:50 -0600 )

KD-trees are inefficient for high-dimensional vectors such as SIFT descriptors. LSH or other approximate nearest neighbor solutions are used instead :)

lovaj ( 2016-12-07 07:41:58 -0600 )

OK, and do you save only the descriptors? What about the (3x3x4) 36-element variant? According to Lowe it is only ~10% worse than the original 128-element vector, and with it you would use somewhat less memory. Also, what about a vocabulary tree or spill forest solution? (I don't have memory problems in my application, because I only use a single reference image -> BFM.)
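The memory saving from the shorter descriptor mentioned above is easy to quantify (float32 components assumed):

```python
# Per-descriptor storage with float32 components.
full_bytes = 128 * 4    # standard SIFT: 512 bytes per descriptor
short_bytes = 36 * 4    # 3x3x4 variant: 144 bytes per descriptor
saving = 1 - short_bytes / full_bytes
print(saving)           # -> 0.71875, i.e. ~72% less memory per descriptor
```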

Vintez ( 2016-12-07 08:00:48 -0600 )
