This is a brief description of the training process used to produce res10_300x300_ssd_iter_140000.caffemodel. The model was created with the SSD framework using a ResNet-10-like architecture as a backbone. The channel counts in the ResNet-10 convolution layers were significantly reduced (2x or 4x fewer channels). The model was trained in the Caffe framework on a large dataset available online.
Prepare training tools
You need to use the "ssd" branch from this repository: https://github.com/weiliu89/caffe/tre... . Check out this branch and build it (see the instructions in the repo's README).
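A quick, optional sanity check of the build (assuming you also built pycaffe and it is on your PYTHONPATH): the ssd branch adds the AnnotatedDatum protobuf message, so it should be importable.

    # Sanity check: confirm the ssd branch of Caffe was built (assumption:
    # pycaffe was built as well). AnnotatedDatum is a message type added by
    # the ssd branch, so it is absent from mainline Caffe.
    import caffe
    from caffe.proto import caffe_pb2

    assert hasattr(caffe_pb2, "AnnotatedDatum"), "this does not look like the ssd branch"
    print("Caffe SSD build looks OK")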
Prepare training data
The data preparation pipeline can be represented as:
(a) Download the original face detection dataset -> (b) Convert the annotations to the PASCAL VOC format -> (c) Create an LMDB database with images and annotations for training
a) Find a dataset with face bounding box annotations. For some reasons I can't provide links here, but you can easily find one on your own. Also study the data: it may contain small or low-quality faces which can spoil the training process. Often the annotations carry special flags about object quality. Remove such faces from the annotations (smaller than 16 pixels along at least one side, or blurred, or highly occluded, or similar).
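As an illustration, here is a minimal Python sketch of such a filter. The box fields (w, h, blur, occlusion) and the thresholds are assumptions; adapt them to whatever your dataset's annotations actually provide.

    # Hypothetical filter: drop boxes that are too small or flagged as low quality.
    # The field names below (w, h, blur, occlusion) are assumptions -- adjust
    # them to the real annotation format of your dataset.
    MIN_SIDE = 16  # discard faces smaller than 16 px along any side

    def keep_box(box):
        """Return True if the face box is large and clean enough for training."""
        if box["w"] < MIN_SIDE or box["h"] < MIN_SIDE:
            return False
        if box.get("blur", 0) > 0 or box.get("occlusion", 0) > 1:
            return False
        return True

    def filter_annotations(images):
        """images: {image_path: [box, ...]} -> same mapping with bad boxes removed."""
        filtered = {}
        for path, boxes in images.items():
            good = [b for b in boxes if keep_box(b)]
            if good:  # skip images that end up with no usable faces
                filtered[path] = good
        return filtered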
b) The downloaded dataset will have its own annotation format. It may be a single file for all images, a separate file for each image, or something else. But to train SSD in Caffe you need to convert the annotations to the PASCAL VOC format. PASCAL VOC annotations consist of one .xml file per image. In this xml file all face bounding boxes should be listed as:
<annotation>
  <size>
    <width>300</width>
    <height>300</height>
  </size>
  <object>
    <name>face</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>100</xmin>
      <ymin>100</ymin>
      <xmax>200</xmax>
      <ymax>200</ymax>
    </bndbox>
  </object>
  <object>
    <name>face</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>0</xmin>
      <ymin>0</ymin>
      <xmax>100</xmax>
      <ymax>100</ymax>
    </bndbox>
  </object>
</annotation>
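As a sketch of this conversion step, the snippet below writes one such .xml file with Python's standard xml.etree.ElementTree. The function name, the (xmin, ymin, xmax, ymax) box tuples, and the example output path are assumptions for illustration.

    import xml.etree.ElementTree as ET

    def write_voc_xml(xml_path, width, height, boxes):
        """Write one PASCAL VOC annotation file.
        boxes is assumed to be a list of (xmin, ymin, xmax, ymax) tuples."""
        root = ET.Element("annotation")
        size = ET.SubElement(root, "size")
        ET.SubElement(size, "width").text = str(width)
        ET.SubElement(size, "height").text = str(height)
        for xmin, ymin, xmax, ymax in boxes:
            obj = ET.SubElement(root, "object")
            ET.SubElement(obj, "name").text = "face"
            ET.SubElement(obj, "difficult").text = "0"
            bndbox = ET.SubElement(obj, "bndbox")
            ET.SubElement(bndbox, "xmin").text = str(xmin)
            ET.SubElement(bndbox, "ymin").text = str(ymin)
            ET.SubElement(bndbox, "xmax").text = str(xmax)
            ET.SubElement(bndbox, "ymax").text = str(ymax)
        ET.ElementTree(root).write(xml_path)

    # Example: two faces in a 300x300 image (path is illustrative and must exist)
    write_voc_xml("annotations_val/0.jpg.xml", 300, 300,
                  [(100, 100, 200, 200), (0, 0, 100, 100)])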
So, convert your dataset's annotations to the format above. Also, you should create a labelmap.prototxt file with the following content:

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "face"
  label: 1
  display_name: "face"
}
You need this file to establish the correspondence between class names and numeric class labels.
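If you built pycaffe, you can optionally verify that the file parses correctly. This sketch assumes the ssd branch's caffe.proto (which defines the LabelMap message) is on your PYTHONPATH.

    # Optional check: parse labelmap.prototxt and print the name <-> label mapping.
    from caffe.proto import caffe_pb2
    from google.protobuf import text_format

    labelmap = caffe_pb2.LabelMap()
    with open("labelmap.prototxt") as f:
        text_format.Merge(f.read(), labelmap)

    for item in labelmap.item:
        print(item.label, item.name, item.display_name)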
For the next step we also need a file where all our image/annotation file name pairs are listed. Each line of this file should look like:

images_val/0.jpg annotations_val/0.jpg.xml
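A small sketch that generates such a list file; the output file name val.txt and the directory names are assumptions taken from the example line above.

    # Write "image annotation" pairs, one per line, for every .jpg found.
    import os

    with open("val.txt", "w") as out:  # "val.txt" is an assumed name
        for fname in sorted(os.listdir("images_val")):
            if not fname.endswith(".jpg"):
                continue
            out.write("images_val/%s annotations_val/%s.xml\n" % (fname, fname))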
c) To create the LMDB you need to use the create_data.sh tool from the caffe/data/VOC0712 directory of Caffe's source code. This script calls create_annoset.py internally, so check which arguments you need to pass to the script.
You need to prepare two LMDB databases: one for the training images and one for the validation images.
Train your detector
For training you need to have 3 files: train.prototxt, test.prototxt and solver.prototxt. You can ...
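As an illustration only, training can be started from Python roughly as below, assuming the three prototxt files and the LMDBs above are in place (running the caffe command-line binary with the same solver file is the more common route).

    # Minimal sketch: launch training through pycaffe, assuming solver.prototxt
    # already points at train.prototxt/test.prototxt and the LMDB databases.
    import caffe

    caffe.set_mode_gpu()
    caffe.set_device(0)

    solver = caffe.SGDSolver("solver.prototxt")
    solver.solve()  # runs until max_iter from solver.prototxt is reached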
yes i would like to know that dataset too and use it :) Alternatively you can use the VOC dataset and filter by the person class, then label the faces.
I can also recommend the Labeled Faces in the Wild dataset - as its name suggests - it's labeled.
Hello @vikasguptaiisc, @Eduardo, @berak, I checked the wiki page on OpenCV but still cannot figure out which dataset is used for training the current Caffe model "res10_300x300_ssd_iter_140000_fp16.caffemodel". Any information on that will be helpful.