Confusion on Transfer Learning Models

asked 2018-10-05 09:41:27 -0500 by magic56

Maybe this isn't the place to ask, but I am a beginner and I feel someone here may find this an easy question. A little background: I am building a real-time object detection program that can detect some US traffic signs and possibly traffic lights. I have decided that OpenCV together with Keras/TensorFlow is probably my best option for achieving this goal. Within that, I figured (I'm still new to this field) that transfer learning from an ImageNet-pretrained model would be my best option for building an image classifier. I have downloaded the LISA traffic sign dataset and decided this is what I want to use for the transfer learning. The part that's hanging me up: would it be better to train on street views with signs in them (which is what the LISA dataset contains), or to gather my own close-up pictures of these signs from the internet? Also, if any other part of my logic is flawed, please let me know; I am very interested in learning.




  • Does the LISA data fit your situation?

  • Do you know what the annotations look like? You'll probably need bounding boxes.

  • Did you already decide which pretrained model to use for this? Transfer learning here would mean: take a network pretrained e.g. on ImageNet, freeze the convolutional layers, and retrain only the box-related layers with your own data (classes/bounding boxes).
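The freeze-and-retrain recipe could be sketched in Keras like this. This is a minimal sketch, not a full detector: the class count of 47 is a placeholder for the LISA classes, and in practice you would pass `weights="imagenet"` instead of `None`, which is used here only to keep the sketch runnable offline.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 47  # placeholder; set to your actual number of sign classes

# In practice use weights="imagenet" to start from pretrained features;
# weights=None here only avoids a download in this sketch.
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights=None)

base.trainable = False  # freeze the convolutional feature extractor

# New classifier head trained on your own data
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Note this sketch is a classifier head only; for detection (boxes, not just labels) you would use a detection architecture such as SSD or YOLO on top of the frozen backbone.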

berak ( 2018-10-05 09:53:13 -0500 )
  • Not necessarily sure, as this is my first time ever doing a project this complex.
    • The annotations are labeled within the downloaded dataset, which also includes some scripts to separate the images into folders by class.
    • I planned on using MobileNet, in hopes of moving this application to a mobile device (eventually).
magic56 ( 2018-10-05 13:59:33 -0500 )

Well, if I understand you right, you are asking how to build a dataset? Some hints:


  • Around 1000 samples per class, as an approximate value.

  • Images with your object in natural situations. This is important because you want your object to be detected in those situations, for example a car on a street.

  • Various image and object sizes. This is important for learning a scale-invariant representation of your object, for example a toy truck versus a truck.

  • Various angles, so your object can be detected from various positions.

  • Supply negatives. This helps during learning to separate your object from the background.

In my opinion you can achieve the same thing by putting the background images into a separate class "other". I think of negatives as a pseudo-class.
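The pseudo-class idea could be sketched like this. A minimal sketch: the function name and the `(filename, label)` input format are made up for illustration, not part of the LISA tooling.

```python
import os

def build_class_folders(samples, root):
    """Sort (filename, label) pairs into per-class folders.

    Samples with an empty label (pure background) land in the
    pseudo-class "other", so negatives are just one more class.
    """
    for fname, label in samples:
        cls = label if label else "other"
        dst = os.path.join(root, cls)
        os.makedirs(dst, exist_ok=True)
        # In practice you would shutil.copy(fname, dst); an empty
        # placeholder file keeps this sketch self-contained.
        open(os.path.join(dst, os.path.basename(fname)), "w").close()
```

A classifier trained on such a folder layout then simply treats "other" like any other class.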

holger ( 2018-10-05 14:53:09 -0500 )

imho, the LISA dataset is already quite a good choice for this.

it has ~70 classes and ~100 images per class at various scales. (for multi-class detection, it does not need any "background" samples)

berak ( 2018-10-06 02:26:14 -0500 )

Here is my thought on the approach then:

To train:

  • Preprocess the LISA images to get only the signs? (crop the background out)
  • Put them in separate class folders.
  • Train the neural network on these images.

In a real scenario (after training):

  • Identify if something is a sign by using the different contour functions?
  • Feed it through the CNN
  • Profit?

The ?s are where I'm mostly confused

magic56 ( 2018-10-08 11:12:40 -0500 )


Well, sampling the background and using those samples as negatives is OK; I think you could gain from this. But berak has a point in saying "...for multi-class detection, it does not need any "background" samples". On the other hand, I have had good experiences with negatives even in a multi-class scenario, and the YOLO framework encourages you to do so:

"...Desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without bounded box (empty .txt files) - use as many images of negative samples as there are images with objects..."

But please don't crop out the traffic signs as training images; that could hurt your detection abilities. Just mark the bounding box(es) and feed in the image unprocessed.
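For darknet-style YOLO training, an empty label file is exactly how a negative sample is marked. A minimal sketch of writing labels that way; the function name and the annotation dict format are made up for illustration:

```python
import os

def write_yolo_labels(annotations, label_dir):
    """annotations: {image_name: [(class_id, cx, cy, w, h), ...]}.

    Boxes use the YOLO convention: center x/y, width, height, all
    normalized to [0, 1]. An image mapped to an empty list gets an
    empty .txt file, which darknet/YOLO treats as a negative sample.
    """
    os.makedirs(label_dir, exist_ok=True)
    for image, boxes in annotations.items():
        stem = os.path.splitext(image)[0]
        with open(os.path.join(label_dir, stem + ".txt"), "w") as f:
            for cls, cx, cy, w, h in boxes:
                f.write(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}\n")
```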

holger ( 2018-10-08 16:03:56 -0500 )


This is called inference. For object detection you usually run a model (feed the image to the network in non-training mode) and get back a matrix with the bounding boxes and probabilities. That's it. Forget about contours in that scenario.

I recommend you actually run a pretrained model for testing purposes before training, so you get familiar with the detection "API".
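As a concrete example, SSD-style detectors (such as the pretrained MobileNet-SSD models runnable through OpenCV's dnn module) typically return a matrix of shape (1, 1, N, 7), each row being [batch_id, class_id, confidence, x1, y1, x2, y2] with coordinates normalized to [0, 1]. A minimal sketch of parsing such a matrix; the helper name and the threshold are just illustrative:

```python
import numpy as np

def parse_detections(dets, conf_thresh=0.5, img_w=640, img_h=480):
    """Parse an SSD-style (1, 1, N, 7) detection matrix.

    Returns a list of (class_id, confidence, (x1, y1, x2, y2)) tuples
    above the confidence threshold, with the box scaled to pixels.
    """
    results = []
    for det in dets[0, 0]:
        conf = float(det[2])
        if conf < conf_thresh:
            continue
        cls = int(det[1])
        x1, y1, x2, y2 = (float(v) for v in det[3:7])
        # round() guards against float artifacts when scaling to pixels
        box = (int(round(x1 * img_w)), int(round(y1 * img_h)),
               int(round(x2 * img_w)), int(round(y2 * img_h)))
        results.append((cls, conf, box))
    return results
```

With a real model this matrix would come from something like `net.forward()` after `cv2.dnn.blobFromImage`; here the parsing logic is shown on its own so you can see what the "API" hands back.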

holger ( 2018-10-08 16:08:26 -0500 )

Happy labeling :)

holger ( 2018-10-08 16:14:00 -0500 )