The images seem to be already aligned (the camera didn't move?).
I would do the following:
- Subtract the images from each other => wherever the difference is not close to zero, something is moving, which gives you a mask of the moving parts.
- Get the bounding box of each moving part (where the user is).
- For each bounding box, there is one image with the user and two images without. Copy the bounding box from the image containing the user (the one whose difference with the other two is not "mostly" zero).
- Invert the mask of moving parts and, for those static pixels, copy any of the three images to the final image => aka copy the "background".
I can't say in advance what quality to expect, but give it a try: if there are no illumination changes in your scene, it should work.
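The steps above could be sketched roughly as follows with NumPy. This is only an illustration under some assumptions of mine: the images are already aligned arrays of the same shape, the user appears exactly once per image, the bounding boxes do not overlap, and the names `changed`, `bounding_box`, `composite` and the `threshold` value are made up for this example and would need tuning on real photos.

```python
import numpy as np


def changed(a, b, threshold):
    """Boolean mask of pixels where two aligned images differ noticeably."""
    # Work in a signed type so the subtraction of uint8 images cannot wrap.
    diff = np.abs(a.astype(np.int32) - b.astype(np.int32))
    if diff.ndim == 3:  # color image: keep the largest per-channel difference
        diff = diff.max(axis=2)
    return diff > threshold


def bounding_box(mask):
    """Return (top, bottom, left, right) of the True pixels, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def composite(images, threshold=30):
    """Merge aligned images so the moving part of each one ends up in the result.

    Assumes one moving subject per image and non-overlapping bounding boxes;
    `threshold` is an arbitrary starting value, not a recommended setting.
    """
    # Any image can serve as the background for the static pixels.
    result = images[0].copy()
    for i, img in enumerate(images):
        others = [o for j, o in enumerate(images) if j != i]
        # The user is in this image where it differs from ALL the other images.
        mask = np.logical_and.reduce([changed(img, o, threshold) for o in others])
        box = bounding_box(mask)
        if box is None:
            continue  # nothing moved in this image
        top, bottom, left, right = box
        result[top:bottom, left:right] = img[top:bottom, left:right]
    return result
```

You would call it as `final = composite([img1, img2, img3])` once the three images are loaded as arrays. Taking a single bounding box per image is a simplification: with noise or several moving objects you would probably want to clean the mask first and extract one box per connected region.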