Well, the answer is quite simple:
Take more images. Calibrating from multiple images taken in different poses gives a better estimate of the intrinsic matrix.
So, two steps:
- The first image is the main (reference) image and is used to determine the x, y, z axes for all the cameras, so its content has to be visible to every camera.
- Then take more images for each camera. These improve the estimate of that camera's intrinsic matrix, which in turn improves the projection of 3D points into the image.
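
To see why the second step matters, here is a minimal sketch of the pinhole projection, assuming zero skew and no lens distortion. The function name `project` and all the numbers (focal lengths, principal point, pose, test point) are made up for illustration and are not from my actual setup. It projects the same 3D point with a "good" and a "rough" intrinsic matrix and measures how far apart the two pixels land.

```python
def project(point_world, K, R, t):
    """Project a 3D world point to pixels with the pinhole model x ~ K (R X + t).

    Assumes zero skew and no lens distortion (a simplification).
    """
    # World frame -> camera frame: the extrinsics, fixed by the shared main image.
    xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    # Camera frame -> pixels: the intrinsics, refined by the extra per-camera images.
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


# Hypothetical values, for illustration only.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # camera aligned with the world axes
t = [0.0, 0.0, 2.0]  # world origin sits 2 units in front of the camera

K_good = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
K_rough = [[760.0, 0.0, 320.0], [0.0, 760.0, 240.0], [0.0, 0.0, 1.0]]  # 5% focal length error

point = [0.5, 0.25, 0.0]
u_good, v_good = project(point, K_good, R, t)
u_rough, v_rough = project(point, K_rough, R, t)

# Distance in pixels between the two projections of the same 3D point.
pixel_error = ((u_good - u_rough) ** 2 + (v_good - v_rough) ** 2) ** 0.5
print(f"good: ({u_good:.1f}, {v_good:.1f})  rough: ({u_rough:.1f}, {v_rough:.1f})  "
      f"error: {pixel_error:.2f} px")
```

With these made-up numbers, a 5% error in the focal length already moves the projected point by roughly 11 pixels, even though the extrinsics are identical. If you happen to be using OpenCV (an assumption on my part), the per-camera step would typically be `cv2.calibrateCamera` over all the views, and the pose from the shared image could be obtained with `cv2.solvePnP`.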