
Detecting car's windshield and rear window

asked 2018-08-30 06:26:37 -0500

pouya.1991

Hi, I'm working on a car occupant counting system that must count the occupants of cars in a video stream (see attached image). I read some related papers and found that all of the models first define two ROIs by detecting the windshield and the rear window, then look for occupants in these ROIs. I searched for object detectors in OpenCV and found two implemented ones (Cascade Classifier and YOLO). I have already used YOLO for detecting cars, but I'm unsure which model to pick for detecting windows. Any advice?

Sample Image


1 answer


answered 2018-08-30 06:44:44 -0500

Since car windshields have quite a variable shape, I would suggest going for YOLO. My guess is that the cascade classifier is not flexible enough and works better for more rigid objects. If you want to stick with classical methods rather than deep learning, have a look at ICF/ACF/DeformableParts, but they have no integrated implementation in OpenCV.

Also consider the SSD detector, which is also very efficient. If you want to go a step further (and move outside OpenCV), have a look at Mask-RCNN, a detector and segmenter that will give you the exact window area within your detection, all in a single pass through the architecture.



I am just curious whether this makes sense: can you just use a pretrained YOLO model (it can detect cars) and run an image segmentation network (I think such a thing exists) on the detections? Or is it better to train on windshields directly?

holger ( 2018-08-30 06:50:00 -0500 )

I would suggest training the network to detect the windows/windshields directly. This will require a lot of labeling work, but you'll be able to have different classes for windshield, front and back window. It will make further processing easier.

...or what about directly labeling the vehicle occupants, or windows with and without occupants? It will hopefully give you the number of passengers directly.

kbarni ( 2018-08-30 07:14:53 -0500 )

@holger I think it is possible; however, it would require you to annotate the complete cars for segmentation. That is a lot more difficult. You can extend the window detector with an SVM that classifies each detection as person or not.

StevenPuttemans ( 2018-08-30 08:05:31 -0500 )

Thank you - also to the author of this thread for asking this!

holger ( 2018-08-30 08:36:51 -0500 )

Thanks for your responses. @StevenPuttemans, I think using Mask-RCNN is a good idea, because it can give me the exact shape rather than a rectangle. Do you suggest I train a new Mask-RCNN to do just window segmentation, or car detection and window segmentation together? Or, as @holger and @kbarni suggested, should I just train a network that detects/segments car windows? Recent related work uses two cameras (one photographing the windshield to detect front occupant(s), and another photographing the rear window to detect back-seat occupant(s)), but I want to use one camera and a video stream, so I need to distinguish between windshield, front rear and back rear windows. Can a Mask-RCNN network make this distinction?

pouya.1991 ( 2018-08-30 08:38:15 -0500 )

I personally would combine the proposed solutions:

Train a YOLO model (Darknet YOLO) as initially suggested by Steven. That model will have one class, 0: windshield with person.

You will need to label (draw bounding boxes around) windshields with persons from different angles. A wide variety of occupant and windshield sizes/positions, in many combinations, is necessary for good detection.
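For reference, here is a minimal sketch of turning one drawn box into a Darknet-style label line. It assumes the usual `class x_center y_center width height` text format with all values normalized by the image size. The function name `to_yolo_label` and the example numbers are made up for illustration; most labeling tools will do this conversion for you.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one Darknet label line."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    # Reject empty boxes and boxes that leave the image: both produce bad labels.
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} is empty or outside a {img_w}x{img_h} image")
    # Darknet expects the box centre and size, each divided by the image size.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: class 0 ("windshield with person") in a 1280x720 frame.
print(to_yolo_label(0, (320, 180, 960, 540), (1280, 720)))
```

Each image would get one such line per labeled windshield, written to a text file next to the image.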

Alternatively, you can define more classes (windshield with person, windshield without persons, windshield top, etc.), but I personally think one class should be enough, and Darknet YOLO should be able to learn the concept of "person behind windshield" if supplied with enough data.

You can also try DNN frameworks other than YOLO. TensorFlow with an SSD model, for example, is an option, but a bit slower than YOLO. Good luck.

holger ( 2018-08-30 11:57:47 -0500 )

Some training hints for Darknet YOLO:

  1. Use pictures with different aspect ratios.
  2. Supply negatives too. It is best to sample the negatives from the pictures that contain your objects. For example, an empty windshield would be a good negative. You can also take negatives from somewhere else, but I think you get the point.
  3. Use a big variety of windshields and persons from different angles and positions.
  4. It is possible to mark multiple objects in one picture. Supply such pictures too (one, two, n persons in the car).
  5. Supply at least around 1000 samples. In your case, since there are many person <> windshield combinations, more would be good.

YOLO can be "crashy" during training. For example, a label whose coordinates are all 0 leads to a segmentation fault during resizing. Verify your data!
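A rough sketch of what "verify your data" could look like for the label files, again assuming the `class x_center y_center width height` format with normalized values. The helper `check_label_line` is hypothetical, not part of Darknet; it only flags the obvious problems, including the all-zero case mentioned above.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one Darknet label line (empty if none)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["unparsable field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} out of range")
    # All-zero coordinates end up here: a box with no area.
    if w <= 0 or h <= 0:
        problems.append("zero or negative box size")
    if not (0 <= x <= 1 and 0 <= y <= 1 and w <= 1 and h <= 1):
        problems.append("value outside [0, 1]")
    return problems


# Hypothetical label lines: the second one is the all-zero case.
for line in ["0 0.5 0.5 0.2 0.3", "0 0 0 0 0"]:
    print(repr(line), check_label_line(line, num_classes=1))
```

Running something like this over every label file before training is cheap compared to a crashed training run.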

holger ( 2018-08-30 12:57:51 -0500 )

Thanks @holger. I already use a pre-trained YOLO3-spp model for car and occupant detection. Do you think training a YOLO3-tiny could effectively handle the window detection task?

pouya.1991 ( 2018-08-30 13:56:23 -0500 )

Yes, definitely.

One thing I want to add: if your final task is "count all cars with occupants and tell how many are in each car", you would also need a car class.

  • Get the bboxes for cars
  • Get the bboxes for occupants (persons behind windshield)

If an occupant bbox lies inside a car bbox -> count it -> that car has n occupants.
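The counting rule above could be sketched like this. It assumes boxes are `(x_min, y_min, x_max, y_max)` in pixels, and it treats an occupant as "in" a car when the centre of the occupant box falls inside the car box, which is one possible reading of "bbox in bbox" and a bit more forgiving than full containment. The function names and the sample boxes are made up.

```python
def contains(outer, inner):
    """True if the centre of `inner` lies inside `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def count_occupants(car_boxes, occupant_boxes):
    """Return one occupant count per car, in the order of car_boxes."""
    counts = [0] * len(car_boxes)
    for occupant in occupant_boxes:
        for i, car in enumerate(car_boxes):
            if contains(car, occupant):
                counts[i] += 1
                break  # assign each occupant to at most one car (first match)
    return counts


# Hypothetical detections: two cars, three occupants, one stray person box.
cars = [(0, 0, 100, 100), (200, 0, 300, 100)]
occupants = [(10, 10, 30, 30), (60, 10, 80, 30), (220, 20, 240, 40), (400, 400, 420, 420)]
print(count_occupants(cars, occupants))
```

When car boxes overlap, the first matching car gets the occupant; picking the car with the largest overlap instead would be a reasonable refinement.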

I know that YOLO v3 tiny has a pretrained car class and performs well for that detection. It also has a person class.

It already sometimes detects persons in cars through the windshield. Maybe fine-tuning that model is also a way to go.

holger ( 2018-08-31 02:13:27 -0500 )

I think the proposals made by @holger are all valid. If you really want the segmentation part, know that Mask-RCNN does not segment multiple elements within a detection window. It first detects and then segments the detected class. So going for a car detector that then segments the whole car would need YOLO plus a separate segmentation network, instead of Mask-RCNN.

StevenPuttemans ( 2018-08-31 07:23:04 -0500 )


Asked: 2018-08-30 06:26:37 -0500

Seen: 331 times

Last updated: Aug 30 '18