# How to run the UINT8 DNN face detector example?

I downloaded dkurt's great pre-trained models from the contrib repo following Adrian Rosebrock's blog, and I am able to run the floating-point Caffe models, but not the uint8 model, which is in TensorFlow format. I'm using a Raspberry Pi 3, but the issue seems unrelated to the architecture: the errors are thrown at the model-import stage.

Please help! Solving this would help a lot with performance on the Pi - the Caffe models take about 800 ms per frame, which is just a little too slow for real-time processing. This must be working for others, as I can see a checked-in test that runs this model...

If I don't use a pbtxt, it stops on unknown nodes, which I believe is expected:

```
OpenCV(3.4.1) Error: Unspecified error (Unknown layer type Square in op conv4_3_norm/l2_normalize/Square) in populateNet, file /home/pi/opencv/opencv-3.4.1/modules/dnn/src/tensorflow/tf_importer.cpp, line 1582
Traceback (most recent call last):
  File "detect_faces_video.py", line 33, in <module>
    net = cv2.dnn.readNetFromTensorflow(model)
cv2.error: OpenCV(3.4.1) /home/pi/opencv/opencv-3.4.1/modules/dnn/src/tensorflow/tf_importer.cpp:1582: error: (-2) Unknown layer type Square in op conv4_3_norm/l2_normalize/Square in function populateNet
```

If I use the pbtxt I found in the repo, then I get this:

```
OpenCV(3.4.1) Error: Assertion failed (layer.input_size() == 1) in populateNet, file /home/pi/opencv/opencv-3.4.1/modules/dnn/src/tensorflow/tf_importer.cpp, line 1485
Traceback (most recent call last):
  File "detect_faces_video.py", line 32, in <module>
    net = cv2.dnn.readNetFromTensorflow(model, prototxt)
cv2.error: OpenCV(3.4.1) /home/pi/opencv/opencv-3.4.1/modules/dnn/src/tensorflow/tf_importer.cpp:1485: error: (-215) layer.input_size() == 1 in function populateNet
```

If I export my own pbtxt with this:

```
python tf_text_graph_ssd.py --input opencv_face_detector_uint8.pb --output opencv_face_detector_uint8.pbtxt
```

then I get the following:

```
Traceback (most recent call last):
  File "detect_faces_video.py", line 31, in <module>
    net = cv2.dnn.readNetFromTensorflow(model, prototxt)
cv2.error: OpenCV(3.4.1) /home/pi/opencv/opencv-3.4.1/modules/dnn/src/tensorflow/tf_importer.cpp:553: error: (-2) Input layer not found: BoxPredictor_0/ClassPredictor/BiasAdd in function connect
```

```python
import cv2

#prototxt = "deploy.prototxt.txt"
#model = "res10_300x300_ssd_iter_140000.caffemodel"
#model = "res10_300x300_ssd_iter_140000_fp16.caffemodel"
#net = cv2.dnn.readNetFromCaffe(prototxt, model)    # This works with either model

prototxt = "opencv_face_detector_uint8.pbtxt"
model = "opencv_face_detector_uint8.pb"
net = cv2.dnn.readNetFromTensorflow(model, prototxt)   # This doesn't
```
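For context, once a net like this loads, `net.forward()` on this detector returns an SSD `detection_out` blob of shape (1, 1, N, 7), where each row is `[image_id, class_id, confidence, x1, y1, x2, y2]` with box corners normalized to [0, 1]. A minimal sketch of parsing that output (the helper name and threshold are my own, not from the original script):

```python
import numpy as np

def parse_detections(detections, frame_w, frame_h, conf_threshold=0.5):
    """Turn the (1, 1, N, 7) SSD output into pixel-space boxes.

    Each row is [image_id, class_id, confidence, x1, y1, x2, y2],
    with box corners normalized to [0, 1].
    """
    boxes = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence < conf_threshold:
            continue
        x1, y1, x2, y2 = detections[0, 0, i, 3:7]
        boxes.append((int(x1 * frame_w), int(y1 * frame_h),
                      int(x2 * frame_w), int(y2 * frame_h), confidence))
    return boxes

# Synthetic example: one strong detection, one below the threshold.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.9, 0.1, 0.2, 0.5, 0.6]
fake[0, 0, 1] = [0, 1, 0.1, 0.0, 0.0, 1.0, 1.0]
print(parse_detections(fake, 640, 480))
```

In the real script the `fake` array would instead be `detections = net.forward()` after `net.setInput(blob)`.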



@BViktor, please try to import the model again with a newer state of OpenCV. Actually, although the weights are in UINT8, they are converted to FP32 because only FP32 computations are supported on the CPU for now. However, you can achieve better efficiency by varying the input image size. The model has been trained on 300x300 images, but it also works well at lower resolutions. For example, one of the tutorials resizes inputs to 128x96. Note that the lower the resolution, the less accurate the predictions may be. So you can use the original caffemodel without recompiling OpenCV.

(2018-05-21 00:33:05 -0500)

Wow, this is amazing! I played around with the settings and managed to get pretty reasonable detection with 80x80 inputs at 10 FPS on the Raspberry Pi 3! I'm using a low confidence threshold (0.15), but I guess due to the low resolution it doesn't seem to produce many false positives. At 128x128 the speed is about 3.5 FPS, but interestingly at 96x96 or 88x88 the speed is still the same. There's some sweet spot at 80x80 where the speed increases drastically. I wonder why? Also, very interestingly, the 80x80 case uses two CPU cores, whereas 88x88, 96x96, etc. only use one.

(2018-05-21 01:43:07 -0500)

@BViktor, good luck! You may also keep the original aspect ratio, so the height could be even less than 80.
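As a concrete example of keeping the original aspect ratio (the helper name is mine): pick a target width and derive the height from the frame's proportions.

```python
def scaled_size(frame_w, frame_h, target_w):
    """Return a (width, height) input size at target_w, preserving aspect ratio."""
    return target_w, round(target_w * frame_h / frame_w)

# A 640x480 camera frame scaled to an 80-pixel-wide network input:
print(scaled_size(640, 480, 80))  # (80, 60)
```

The resulting tuple can be passed straight to `blobFromImage` as its `size` argument.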

(2018-05-21 04:40:39 -0500)


Changing this line may be useful:

```cpp
// cv::Mat blob = cv::dnn::blobFromImage(frame, 1./255., Size(300, 300));
cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0, Size(300, 300));
```

The model is uint8, so there is no need to normalize.


I got it running using the pbtxt from here, but you'll probably need to update your OpenCV libs / cv2 to latest master.

(That said, the uint8 model has significantly less accuracy, so I'd also recommend sticking with the fp16 caffemodel and your current codebase for now.)



@berak, actually, at the original 300x300 resolution it's even a bit better. The thing is that the weights are just converted back to FP32 during import, and all the computations are done in FP32.

(2018-05-21 00:42:22 -0500)

@dkurt, I wonder where that weird 128x96 shape comes from?

And indeed, accuracy is much better at 300x300 (~90%), but then it's very slow.

(2018-05-21 00:45:51 -0500)

Thank you - I wouldn't mind some accuracy loss if it could bring the FPS to 5+, which would be enough for controlling a small robot. A Haar cascade is the other alternative, but that only detects literally frontal faces, not a face that is slightly angled.

I have to admit that I'm a bit scared to check out and recompile master directly - it is fairly time-consuming on the Pi... but your helpful responses give me hope that I'm not on my own and can get help if I get stuck. Thank you!

(2018-05-21 00:47:15 -0500)

@berak, we just experimented with a balance between accuracy and efficiency suitable for the tutorial demo, because opencv.js turns most compiler optimizations off. 128x96 is just 640x480 divided by 5 :)

(2018-05-21 04:37:44 -0500)

Thanks for all your help. Here are my results:

I managed to successfully rebuild OpenCV from master with NEON & VFPV3, on top of a nightly build of VC4CL. I also have TensorFlow 1.8.0 installed from the tensorflow-on-arm project here: https://github.com/lhelontra/tensorfl.... I tried to enable TBB, but I seem to get defaulted to pthreads.

I can now load the uint8 model with the provided prototxt, and the detection works, by and large.

1. The detection boundaries appear slightly off if I downscale the input - this seems specific to this model, as it doesn't happen with either of the Caffe models.
2. Speed is essentially the same as with the Caffe model. Again, 80x80 is the sweet spot where it can reach 10 FPS; anything larger and it drops to 3-4 FPS.
3. The speed of the Caffe model is also roughly the same as it was with 3.4.1.
4. When I run it as the pi user, it only uses OpenCL for the camera capture (v4l) and complains about lack of root privileges. If I run it as root, it does seem to allocate a chunk of GPU memory, but speed is not any faster. Annoyingly, the process hangs upon exit and does not relinquish the GPU until reboot.
5. It seems to use all 4 CPUs evenly in all cases, but it only reaches about 60% usage, which leads me to think that there is some other bottleneck at play here.
6. I also noticed that detection times for 128x128 are fairly jittery compared to 80x80 (200-350 ms versus 80-100 ms). (As I understand it, feeding forward a neural net should always take pretty much the same time, right?)
7. I tried playing around with setNumThreads to see if the perf jump is due to threading overhead. Running things on a single core is in fact only about half as fast, so the parallelization does have a large overhead. However, the sweet spot at 80x80 is the same on a single core.
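To quantify per-frame jitter like that, a simple timing loop around the forward pass helps. Here is a sketch with a cheap placeholder workload standing in for `net.forward()` (the helper name and run counts are illustrative, not from the original script):

```python
import statistics
import time

def time_workload(fn, warmup=3, runs=20):
    """Time fn() after a few warm-up calls; return per-run times in ms."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    return times

# Placeholder standing in for: net.setInput(blob); net.forward()
workload = lambda: sum(i * i for i in range(10000))

times = time_workload(workload)
print(f"min {min(times):.1f} ms, median {statistics.median(times):.1f} ms, "
      f"max {max(times):.1f} ms")
```

The spread between min and max is the jitter; comparing it across input sizes (and across `cv2.setNumThreads` settings) would show whether the 128x128 variance comes from the net itself or from scheduling.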

Maybe I'll try the Raspberry Pi Zero next for a really minimalistic system...



## Stats

Seen: 2,050 times

Last updated: May 25 '18