1 | initial version |
This issue is known for any object detection algorithm out there. Although not always visible for the human eye, between two consecutive frames there is the possibility of illumination differences, pixel shifting, noise, ... which can slightly change where the face is finally found and located.
Only way to solve this problem is by implementing some temporal smoothing, where you average out the face location over time (a number of frames). This can be done by adding a tracker for example and putting the weight of the new detection in relation to the shift of the frame compared to the previous one.