
Transparency (alpha) handling in cascade training?

asked 2016-07-13 01:55:24 -0500

virtualnobi

updated 2016-07-19 05:24:16 -0500

Since I was not getting good recognition results (on aerial photography of cattle on fields), I tried to use transparency in the positives, to let the cows stand out more. Ultimately, I want to count the cows on the image.

But using transparency seems to be a dumb idea, and I would just like to confirm that:

  • transparency in the positive images is ignored by the createsamples and traincascade tools
  • only the -bgcolor and -bgthresh options of createsamples determine what is considered transparent

Is that correct?
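For reference, the createsamples documentation describes -bgcolor as the transparent color (grayscale images are assumed) and says that all pixels within bgcolor-bgthresh and bgcolor+bgthresh are interpreted as transparent. The minimal NumPy sketch below restates that rule; it is our own illustration, not OpenCV's code, and it assumes RGB(A) channel order and uses hypothetical function names. It shows why an alpha channel has no effect: the channel is dropped before the gray value is compared with the band.

```python
import numpy as np


def to_gray(img: np.ndarray) -> np.ndarray:
    """Drop any alpha channel and convert to 8-bit grayscale (assumes RGB(A) order)."""
    if img.ndim == 2:
        return img.astype(np.uint8)
    rgb = img[..., :3].astype(np.float64)  # channel 3 (alpha), if present, is discarded here
    # BT.601 luma weights, as documented for OpenCV's RGB -> GRAY conversion
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def transparent_mask(img, bgcolor: int, bgthresh: int) -> np.ndarray:
    """True where a pixel falls inside [bgcolor - bgthresh, bgcolor + bgthresh]."""
    gray = to_gray(np.asarray(img)).astype(np.int32)
    return np.abs(gray - bgcolor) <= bgthresh
```

With this rule, a fully transparent black pixel and an opaque black pixel are treated identically, so only the gray value, never the alpha, decides what gets masked.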

The problem I see is that the backgrounds of the positive cow images vary a lot (grass vs. sand, grass texture, etc.), so I will have trouble specifying a single -bgcolor. Before I set out to mask the background, I'd like to know how to do it correctly.
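One way around the varying backgrounds, sketched here under the assumption that each positive already has an alpha mask, is to "bake" the alpha into the only form createsamples understands: paint every background pixel with one flat gray value and pass that value as -bgcolor. Object pixels that happen to fall inside the transparent band (likely with dark cattle) must be nudged out of it, or they would be masked too. The function name and the alpha cutoff of 128 are hypothetical choices for illustration.

```python
import numpy as np


def bake_alpha(rgba: np.ndarray, bgcolor: int = 0, bgthresh: int = 0,
               alpha_cutoff: int = 128) -> np.ndarray:
    """Convert an RGBA positive into a grayscale image whose background equals bgcolor.

    Pixels with alpha < alpha_cutoff become exactly `bgcolor`; opaque pixels whose
    gray value lies inside [bgcolor - bgthresh, bgcolor + bgthresh] are moved just
    outside that band so they are not treated as transparent.
    """
    rgba = np.asarray(rgba)
    rgb = rgba[..., :3].astype(np.float64)
    gray = np.rint(rgb @ np.array([0.299, 0.587, 0.114]))  # BT.601 luma, RGB order assumed
    gray = np.clip(gray, 0, 255).astype(np.int32)
    background = rgba[..., 3] < alpha_cutoff

    lo, hi = bgcolor - bgthresh, bgcolor + bgthresh
    escape = hi + 1 if hi < 255 else lo - 1  # nearest value outside the band
    if not 0 <= escape <= 255:
        raise ValueError("the transparent band covers the whole 0..255 range")
    inside = (~background) & (gray >= lo) & (gray <= hi)
    gray[inside] = escape
    gray[background] = bgcolor
    return gray.astype(np.uint8)
```

The result should be saved losslessly (e.g. PNG) and used with matching options, for example `-bgcolor 0 -bgthresh 10`; lossy compression would blur the flat background into values outside the band, which is exactly what -bgthresh is meant to tolerate.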

Thanks, nobi

EDIT: Here are samples of the images I use:

Overview of a field from which cows (and calves) should be counted (cropped): [image]

Positive image, extracted from one of the overview images: [image]

Resulting sample file (since .vec files cannot be uploaded, this is a screenshot of a .vec file): [image]

The sample shown is not the one created from the positive image above: because opencv_createsamples renames the files, I cannot easily find the corresponding one.

What you can clearly see in the sample file is the background of the positive, which could lead to the (very) low recognition rate (actually, zero).
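Since a .vec file cannot be browsed directly, it can help to dump its samples into ordinary arrays and inspect them. The sketch below assumes the layout we believe opencv_createsamples writes: a 12-byte header (int32 sample count, int32 pixels per sample, two int16 padding fields), then for each sample one padding byte followed by the pixels as little-endian int16 values. This comes from our reading of the OpenCV sources and community helper scripts such as mergevec.py, not from official documentation, so please verify it against your OpenCV version. `read_vec` is a hypothetical helper name.

```python
import struct
from typing import List

import numpy as np


def read_vec(data: bytes, width: int, height: int) -> List[np.ndarray]:
    """Split the bytes of a .vec file into height x width grayscale samples (assumed layout)."""
    count, vec_size, _, _ = struct.unpack_from("<iihh", data, 0)
    if vec_size != width * height:
        raise ValueError(f"vec size {vec_size} does not match {width}x{height}")
    samples = []
    offset = 12  # int32 count + int32 vec_size + two int16 padding fields
    for _ in range(count):
        offset += 1  # one padding byte precedes every sample
        pixels = np.frombuffer(data, dtype="<i2", count=vec_size, offset=offset)
        samples.append(pixels.reshape(height, width).astype(np.uint8))
        offset += 2 * vec_size  # pixels are stored as 16-bit values
    return samples
```

For example, `read_vec(open("cows.vec", "rb").read(), 24, 24)` (hypothetical file name; use the same -w/-h values as for createsamples) returns a list of 24x24 arrays that any image library can save for viewing.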



  • there is no transparency in computer vision at all; it is only a rendering concept.
  • cows are not "rigid objects". I think training any kind of cascade is the wrong tool for that. Rather try something parts-based (DPM has a pretrained cow model).
  • createsamples has an option to produce synthetic samples from one image. Don't try it that way; rather, get enough real positive and negative images for training.
berak (2016-07-13 03:28:34 -0500)
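Following the advice to use real positives: instead of distorting one image, opencv_createsamples can read an annotation ("info") file in which each line lists an image path, the number of objects, and then x y width height for every object, and pack those regions with `-info cows.info -vec cows.vec -w 24 -h 24`. A minimal sketch that formats such lines from your own annotations; the file names, the 24x24 size and the helper name `info_lines` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels


def info_lines(annotations: Dict[str, List[Box]]) -> List[str]:
    """Format per-image bounding boxes as lines of an opencv_createsamples -info file."""
    lines = []
    for path, boxes in sorted(annotations.items()):
        if not boxes:
            continue  # images without cows belong in the negatives (background) list instead
        coords = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in boxes)
        lines.append(f"{path} {len(boxes)} {coords}")
    return lines
```

Writing `"\n".join(lines) + "\n"` to a file then gives the -info input; the negatives are a separate plain list of image paths for traincascade's -bg option.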

I don't think transparency will improve the results much, unless it clearly separates the cattle from the background. I would separate cows from the ground based on the cows' color, since it varies less, to get "true" groups of cows, and then create artificial positives and negatives: copy-paste the true groups into photos of varied fields for positives, and use empty fields or fields cluttered with other animals, houses, etc. for negatives. Just an idea.

bio_c (2016-07-13 03:31:22 -0500)
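The copy-paste step suggested above can be sketched with a boolean mask: only the pixels marked as cow are copied into the field photo, so the original ground around the cow is discarded. This is an illustrative NumPy sketch; how the mask is obtained (e.g. a color threshold) is left open, and `paste_patch` is a hypothetical name.

```python
import numpy as np


def paste_patch(background: np.ndarray, patch: np.ndarray, mask: np.ndarray,
                x: int, y: int) -> np.ndarray:
    """Copy the masked pixels of `patch` into a copy of `background`, top-left at (x, y)."""
    out = background.copy()
    h, w = mask.shape
    if patch.shape[:2] != (h, w):
        raise ValueError("mask and patch must have the same height and width")
    if x < 0 or y < 0 or y + h > out.shape[0] or x + w > out.shape[1]:
        raise ValueError("patch does not fit inside the background")
    region = out[y:y + h, x:x + w]  # a view into `out`, so writes land in `out`
    region[mask] = patch[mask]       # cow pixels only; the new field shows through elsewhere
    return out
```

The same box (x, y, w, h) can be recorded as the annotation for the synthetic positive. Works for grayscale (2-D) and color (3-D) arrays alike, because the 2-D mask indexes the first two axes.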

@berak, thanks for the hints. I saw your answer to the linked question, but our photographs are taken from above, so the cow is seen only from the top (neck and back), not from the front as in the linked question. I'll try to add a sample positive.

virtualnobi (2016-07-19 04:35:48 -0500)

@bio_c, that's what I tried (I think ;-) Unfortunately, we have Black Angus, which don't differ much from their shadows, and as far as I've seen, color is reduced to grayscale anyway, isn't it? The "copy-paste of true groups into photos of varied fields" is done automagically by the opencv_createsamples program, I figure; or what would you use? I prepared samples with (trial, comic cow) positives on a transparent background, but opencv_createsamples ignores the transparency in the positives, so the samples still don't show the silhouette nicely...

virtualnobi (2016-07-19 05:06:30 -0500)
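The grayscale question matters because the cascade tools work on gray values, and the usual RGB-to-gray conversion (the BT.601 weights documented for OpenCV's cvtColor) can map clearly different colors to nearly the same gray. The RGB values below are hypothetical, chosen only to illustrate the effect; the excess-green index 2G - R - B is a common vegetation cue, used here on raw 8-bit values as a simplification, to show that a color cue can still separate what gray cannot. A color-based cow/ground mask would therefore have to be computed before any grayscale step.

```python
def luma(r: int, g: int, b: int) -> float:
    """Gray value from RGB using the BT.601 weights (0.299, 0.587, 0.114)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def excess_green(r: int, g: int, b: int) -> int:
    """Excess-green index 2G - R - B on raw values (simplified vegetation cue)."""
    return 2 * g - r - b


# Hypothetical colors: a brown cow and green grass end up with similar gray values,
# but have excess-green indices of opposite sign.
brown_cow = (150, 60, 40)
green_grass = (70, 110, 40)
gray_gap = abs(luma(*brown_cow) - luma(*green_grass))
exg_gap = abs(excess_green(*brown_cow) - excess_green(*green_grass))
```

Here the gray values differ by only a few levels, while the excess-green values differ by well over a hundred; for black cattle against dark shadows even a color cue may not help much.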

Basically you need to train your own part-based object model, which, if I remember correctly, is impossible through OpenCV. Like @berak said, your object is far from rigid, so forget rigid models. They will never reach satisfying results!

StevenPuttemans (2016-07-19 07:33:22 -0500)
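For orientation, the part-based models of Felzenszwalb et al. (the latent SVM / DPM line of work) score one placement of a root filter and n part filters roughly as follows. This is our summary of the published formulation, not something stated in this thread: $p_0$ is the root position, $p_1, \dots, p_n$ the part positions, $F_i$ the filters, $H$ the HOG feature pyramid, $(dx_i, dy_i)$ the displacement of part $i$ from its anchor, $d_i$ the learned deformation weights, and $b$ a bias.

```latex
\mathrm{score}(p_0, \dots, p_n)
  = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i)
  - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b,
\qquad
\phi_d(dx, dy) = (dx,\; dy,\; dx^2,\; dy^2)
```

The deformation term is what lets parts (head, back, legs) shift relative to the root, which a single rigid template cannot do.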

@StevenPuttemans Thanks for the clear statement, but since I am new to computer vision, can you give me some pointers to "non-rigid" models? I thought the pedestrian, face, cat, etc. recognition videos found on YouTube were created with this technique... wrong assumption?

virtualnobi (2016-07-20 00:54:47 -0500)

1 answer


answered 2016-07-20 03:47:01 -0500

updated 2016-07-20 03:48:15 -0500

Yep, wrong assumption. Let me add some clarity with a few publications.

  • The cat FACE detection and the face detection use this technique. This works because a face on its own is fairly rigid. This of course applies to all the models found in
  • You can see that we also have full-body models there, but they perform rather weakly, not so much because of the rigid content but because the technique has been outperformed by many others.
  • Another rigid approach is the HOG+SVM approach described by Dalal and Triggs.
  • If you really want rigid models that perform well, go for ICF/ACF, for example by Piotr Dollár.
  • Non-rigid models are part-based models, like the latent SVM models of Felzenszwalb.
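To make "part-based" concrete: a latent SVM / DPM style model scores a detection as the root filter response plus, for each part, the best part response minus a quadratic deformation cost for moving away from the part's anchor. The brute-force NumPy sketch below covers only that part-placement step and assumes the filter response maps are already computed. Real DPM works on HOG feature pyramids, places parts at twice the root resolution, and uses generalized distance transforms instead of a full search; the function names are hypothetical.

```python
import numpy as np


def best_part_placement(response: np.ndarray, anchor: tuple, d: tuple):
    """Max over positions of (part response - deformation cost).

    response: 2-D array of part-filter scores, indexed [y, x].
    anchor:   (ax, ay), the ideal part position.
    d:        (d1, d2, d3, d4), weights on (dx, dy, dx**2, dy**2).
    Returns (best_score, (x, y)).
    """
    h, w = response.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - anchor[0], ys - anchor[1]
    cost = d[0] * dx + d[1] * dy + d[2] * dx**2 + d[3] * dy**2
    total = response - cost
    y, x = np.unravel_index(np.argmax(total), total.shape)
    return float(total[y, x]), (int(x), int(y))


def star_model_score(root_score: float, parts, bias: float = 0.0) -> float:
    """Root response plus each part's best placement score plus a bias.

    parts: iterable of (response, anchor, d) tuples, one per part filter.
    """
    return root_score + sum(best_part_placement(r, a, d)[0] for r, a, d in parts) + bias
```

A rigid template corresponds to forcing every part to sit exactly at its anchor; the deformation weights trade off part response against displacement instead.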

But if you want it even more exotic, go for the latest trend, neural networks. Techniques like CNN, R-CNN, DNN, ... are top-notch object detectors in their field!

But keep in mind: the more recent the technique, the more complex it gets, and most of them do not have a stable interface in OpenCV yet, though through GSoC they are slowly pouring in!




Steven, thanks a lot for this answer. It'll take a while to take that in... I already stumbled across the LatentSVM thingy (whatever that means :-) so maybe I'll try that first. I don't really want it exotic, just working ;-)

virtualnobi (2016-07-22 01:13:43 -0500)


Seen: 557 times

Last updated: Jul 20 '16