Bottleneck on dnn network .forward()
Hi,
I’m doing object detection using the dnn module and OpenCV 3.3.
I’m getting 3 FPS on an ARM board (SSD + MobileNet), but I can’t figure out what the bottleneck is. Here are my observations (same results for Python and C++):
The board has 4 little and 4 big cores, but the best FPS comes from running on the big cores only (using taskset). Using all 8 cores, whether by manually increasing the thread count or by letting it pick its own thread/core combination, gives worse results.
RAM usage is not a problem.
With the 4 big cores, total CPU usage doesn’t go above 300% (out of 400%), and no single core goes above 80% (out of 100%).
Grabbing frames from the webcam is not the issue: I tried grabbing in a separate thread and the results didn’t change.
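For anyone reproducing this, the taskset pinning mentioned above can also be done from inside the Python process. This is only a minimal sketch, assuming Linux; the helper name `pin_to_cores` and the core IDs 4 to 7 for the big cluster are hypothetical and depend on the board.

```python
import os

def pin_to_cores(wanted):
    """Restrict the calling thread to the given core IDs (Linux only).

    Returns the resulting affinity set, or None if the platform has no
    sched_setaffinity. Requested cores that are not available are ignored.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows: nothing to do
    available = os.sched_getaffinity(0)
    target = set(wanted) & available
    if not target:
        return available  # none of the requested cores exist; leave as-is
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # ASSUMPTION: cores 4-7 are the big cluster; check the board's layout.
    BIG_CORES = {4, 5, 6, 7}
    print("running on cores:", pin_to_cores(BIG_CORES))
```

The mask is inherited by threads created afterwards, so it probably makes sense to call this early, before the dnn module spins up its worker threads.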
So, any idea how I can tune my code to get 100% CPU usage? What could the bottleneck be? Memory speed? CPU cache? Here are my board’s stats while running: https://m.imgur.com/a/D9tdp
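One way to narrow this down is to time each stage of the loop separately, so it is clear how much of each frame is spent in .forward() versus everything around it. Below is a minimal, stdlib-only sketch; `profile_stages` and the stage names are hypothetical, and the stand-in lambdas would be replaced by the real webcam read, blob creation, and .forward() call.

```python
import time
from collections import defaultdict

def profile_stages(stages, iterations=50):
    """Run the (name, fn) stages in order, feeding each output to the next.

    Returns a dict mapping stage name to mean wall-clock seconds per call.
    """
    totals = defaultdict(float)
    for _ in range(iterations):
        data = None
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            totals[name] += time.perf_counter() - start
    return {name: total / iterations for name, total in totals.items()}

if __name__ == "__main__":
    # Stand-ins only: swap in the real capture, preprocessing and forward pass.
    stages = [
        ("grab", lambda _: [0] * 1000),
        ("preprocess", lambda frame: [x + 1 for x in frame]),
        ("forward", lambda blob: sum(blob)),
    ]
    for name, mean in profile_stages(stages).items():
        print(f"{name}: {mean * 1e3:.3f} ms")
```

If the forward stage dominates and the cores still sit below 100%, that would point away from capture and preprocessing and towards how the work inside the forward pass is being parallelised or fed from memory.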
Thanks.