Which method for object detection at 25 fps, full HD?

asked 2019-10-14 12:42:34 -0500

Erik Langskjegg

updated 2019-10-14 13:37:33 -0500

I promised to prototype an object detection model, trained on my own labeled videos, that runs in real time on full HD video @25 fps. I have spent quite some time learning Mask R-CNN. Now that the model is running, I have realized that this library is too slow.

I have googled OpenCV, browsed through LearnOpenCV, searched these forums, peeked at the tutorials, etc. I understand that using the DNN module with C++ will let me train my own model and do object detection at some frame rate.

Which OpenCV based method would you choose for training an object detection model to work @25 fps, full HD?



Based on this blog post, it looks like that specific method has an inference time of ~300 milliseconds? Would video inference then take ~8 seconds for 25 frames?

Erik Langskjegg ( 2019-10-14 13:26:44 -0500 )
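
A rough check of the arithmetic in the comment above, as a minimal sketch. It assumes frames are processed strictly one at a time, with no batching or pipelining, and it takes the ~300 ms figure from the comment as given. The function names are made up for illustration.

```cpp
#include <cassert>

// Seconds needed to process `frames` frames one after another
// when each frame takes `ms_per_frame` milliseconds.
double sequential_seconds(int frames, double ms_per_frame) {
    assert(frames >= 0 && ms_per_frame >= 0.0);
    return frames * ms_per_frame / 1000.0;
}

// Time available per frame, in milliseconds, at a given frame rate.
double frame_budget_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate that a given per-frame latency can sustain.
double achievable_fps(double ms_per_frame) {
    assert(ms_per_frame > 0.0);
    return 1000.0 / ms_per_frame;
}
```

Under these assumptions, 25 frames at 300 ms each take 7.5 seconds, while real-time 25 fps leaves a budget of 40 ms per frame. A 300 ms model would therefore sustain only about 3.3 fps.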

Sorry if I'm perceived as lazy for asking instead of trying. I would just like a friendly heads-up if my goal is not feasible.

Erik Langskjegg ( 2019-10-14 13:46:20 -0500 )

1 answer


answered 2019-10-14 14:42:34 -0500

kbarni

updated 2019-10-15 04:38:20 -0500

Come on, please make a minimal effort if you want to get some help!

Anyway, don't expect an answer to a question like that! It depends on many factors:

  • What do you want to detect? (for detecting simple shapes you don't need complex algorithms)
  • Do you really need full HD? (Hint: no. Anyway, no DNN works on full HD; they resample the image to a lower resolution.)
  • What precision do you need for the object identification?
  • Do you need 25 fps real time? (Hint: no. Your scene won't change a lot in 1 second, so there is no need to analyse the same image 25 times.)
  • What kind of hardware do you have? (If you have a strong desktop computer and an NVIDIA GPU, nothing will beat a DNN on CUDA. If you have a Raspberry Pi or a basic laptop, don't even dream about DNN.)
  • Do you have a lot of training data? (DNNs will overfit on a small dataset.)
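
To illustrate the resampling point in the list above: if the network sees a downscaled copy of the frame, the boxes it reports can be mapped back to full HD coordinates. This is only a sketch. The `Box` type and `scale_to_frame` are hypothetical names, and it assumes the frame was stretched to the network size rather than letterboxed. Some detector wrappers already return boxes in original image coordinates, in which case this step is unnecessary.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical box type: top-left corner plus size, in pixels.
struct Box { int x, y, w, h; };

// Scale one coordinate from a `from`-pixel axis to a `to`-pixel axis, rounding to nearest.
static int rescale(int v, int from, int to) {
    return static_cast<int>(std::lround(static_cast<double>(v) * to / from));
}

// Map a box from network-input coordinates (net_w x net_h) back to the
// original frame (frame_w x frame_h), assuming a plain stretch resize.
Box scale_to_frame(const Box& b, int net_w, int net_h, int frame_w, int frame_h) {
    assert(net_w > 0 && net_h > 0 && frame_w > 0 && frame_h > 0);
    return Box{rescale(b.x, net_w, frame_w), rescale(b.y, net_h, frame_h),
               rescale(b.w, net_w, frame_w), rescale(b.h, net_h, frame_h)};
}
```

For example, with an assumed 416x416 network input, a box at (208, 208) of size 104x52 maps to (960, 540) with size 480x135 on a 1920x1080 frame.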


@kbarni, you are right, sorry. I do have my own Mask R-CNN model working offline (separate still frames).

  • I want to detect moving road signs filmed from a car with an HD TV camera. The bounding boxes should be filled with a solid color and move as smoothly as possible. The video output is full HD TV broadcast quality at 25 fps.

  • Right now I have a laptop with a 2 GB GPU, but I can perhaps ask for a workstation with a decent GPU. I thought OpenCV did not support CUDA?

  • I have to make all the training data myself, right now I have only a handful of labels. Those labels were enough for a simple test of the Mask model.

Erik Langskjegg ( 2019-10-14 15:06:41 -0500 )

I still don't understand why you need broadcast quality output. Do you want to run a TV show with traffic signs? Normally you should only be interested in what kind of sign is near the road...

kbarni ( 2019-10-15 04:50:56 -0500 )

Thanks for asking @kbarni! If I can get a proof of concept running, the solution might eventually be shown on TV. The input will be a live HD video feed; the model should detect road signs and replace them with colors; the output will be an HD video feed for TV, delayed by at most 5 seconds compared to the input. I am sorry I can't disclose more details.

Erik Langskjegg ( 2019-10-15 05:04:47 -0500 )

I do not need pixel-precise segmentation or advanced polygons by the way. Rectangular bounding boxes with 4 corners will be sufficient. I expect quite a bit of zooming and camera pan, so the objects could move relatively fast.

Erik Langskjegg ( 2019-10-15 06:01:12 -0500 )
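
Because the output may lag the input by up to 5 seconds, one possible approach (not proposed in the thread itself) is to run the detector only on some frames and interpolate the boxes for the frames in between. This keeps the rectangles moving smoothly at 25 fps. The sketch below uses hypothetical names, assumes the same object has already been matched across the two detected frames, and uses plain linear interpolation, which may lag during fast pans or zooms.

```cpp
#include <cassert>

// Hypothetical box type with sub-pixel coordinates.
struct Box { double x, y, w, h; };

// Number of whole frames that fit in a delay window.
int frames_in_delay(double fps, double delay_seconds) {
    assert(fps > 0.0 && delay_seconds >= 0.0);
    return static_cast<int>(fps * delay_seconds);
}

// Linearly interpolate the box for frame `f`, given box `a` detected at
// frame `fa` and box `b` detected at a later frame `fb` (fa <= f <= fb).
Box lerp_box(const Box& a, int fa, const Box& b, int fb, int f) {
    assert(fa < fb && f >= fa && f <= fb);
    const double t = static_cast<double>(f - fa) / (fb - fa);
    auto mix = [t](double p, double q) { return p + t * (q - p); };
    return Box{mix(a.x, b.x), mix(a.y, b.y), mix(a.w, b.w), mix(a.h, b.h)};
}
```

At 25 fps a 5 second delay corresponds to a buffer of 125 frames, so the next detection can be available before the intermediate frames have to be emitted.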

Okay, now you are getting clearer. So:

  • You don't need masks, just bounding boxes!
  • You don't need HD quality output, just bounding boxes drawn on an HD image.

I suggest using YoloV3; it is very fast (much faster than Mask R-CNN) and runs in real time on GPUs or NVIDIA Tegra cards. If you prefer OpenCV, it can take cv::Mat images as input (otherwise just pass the memory buffer). Then you can draw the results on the HD image:

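Detector det("network.cfg", "network.weights");   // load the network definition and weights
std::vector<bbox_t> markers = det.detect(img);    // img: the cv::Mat holding the HD frame
for (size_t i = 0; i < markers.size(); i++) {
    if (markers[i].prob > 0.5) {                   // keep confident detections only
        rectangle(img,
                  Point2i(markers[i].x, markers[i].y),
                  Point2i(markers[i].x + markers[i].w, markers[i].y + markers[i].h),
                  Scalar(0, 0, 255), 5);           // red outline (BGR), 5 px thick
    }
}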
kbarni ( 2019-10-15 08:47:10 -0500 )
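
The snippet above draws outlines, whereas the question asks for boxes filled with a solid color. With OpenCV, passing cv::FILLED as the thickness argument of cv::rectangle fills the rectangle. The sketch below shows the same idea without any library, on a packed 3-channel BGR buffer. The `Detection` struct and `fill_detections` are hypothetical names that only mirror the fields used in the snippet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical detection record: box in pixels plus confidence.
struct Detection { int x, y, w, h; float prob; };

// Fill every detection whose confidence exceeds `threshold` with a solid
// BGR color, clamping each box to the frame. Returns the number of boxes filled.
int fill_detections(std::vector<std::uint8_t>& bgr, int width, int height,
                    const std::vector<Detection>& dets, float threshold,
                    std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    assert(width > 0 && height > 0);
    assert(bgr.size() == static_cast<std::size_t>(width) * height * 3);
    int filled = 0;
    for (const Detection& d : dets) {
        if (d.prob <= threshold) continue;           // skip weak detections
        const int x0 = std::max(d.x, 0);
        const int y0 = std::max(d.y, 0);
        const int x1 = std::min(d.x + d.w, width);
        const int y1 = std::min(d.y + d.h, height);
        if (x0 >= x1 || y0 >= y1) continue;          // box lies outside the frame
        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                std::uint8_t* px = &bgr[(static_cast<std::size_t>(y) * width + x) * 3];
                px[0] = b; px[1] = g; px[2] = r;
            }
        }
        ++filled;
    }
    return filled;
}
```

Clamping matters here because a detector can report boxes that extend slightly past the frame edge, and writing outside the buffer would be undefined behavior.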

Thank you so much! I will look into YoloV3 now, starting with Learn OpenCV.

Erik Langskjegg ( 2019-10-15 09:19:25 -0500 )
