After some research, I think I found the answer myself. There were three things I didn't get right.
- Tag size and number: As pointed out by Eduardo, the tags in my images were too small and too few of them were detected. I took new captures with the charts closer to the camera and ended up with at least 50 tags detected per image and 20 different chart orientations. It helped a little, but the calibration results were not that different from the ones I got previously.
- Image rectification: What concerned me the most in my question was the rectification between camera 1->2, which looked completely wrong. Initially, I thought this was caused by an imprecise calibration, but it wasn't. Since the rectification between 1->3 was not so bad, I did a bit of research and found out that my rectification of 1->2 is probably failing because the epipole is located inside my images, as shown in the image below. I computed the epipolar lines on my image and they do intersect at a point inside the frame.
- Pose: I assumed the pose between my two cameras to be the identity. However, this is not quite the case: after running the calibration, I found a yaw, pitch, and roll of `[-4.81677474, 3.30338745, 1.37281632]` degrees between camera 1->3. This explains why I found a `tvec` with non-zero values on the Y and Z axes between camera 1->3.
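
For anyone who wants to run the same epipole check, here is a minimal sketch of how it could be done with NumPy. It is not my exact code. The intrinsic matrix `K`, the image size, and the two example poses are made-up values for illustration, and the helper names (`fundamental_from_pose`, `epipoles`, `epipole_in_image`) are hypothetical. The idea is that the epipoles are the null vectors of the fundamental matrix `F`, so if one of them falls inside the image bounds, a standard planar rectification (which sends the epipoles to infinity) will distort the image badly.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """F such that x2^T F x1 = 0, assuming the convention X2 = R @ X1 + t."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipoles(F):
    """Return homogeneous epipoles (e1, e2) with F @ e1 = 0 and F.T @ e2 = 0."""
    U, _, Vt = np.linalg.svd(F)
    return Vt[-1], U[:, -1]

def epipole_in_image(e, width, height, eps=1e-9):
    """True if the homogeneous point e projects inside a width x height image."""
    if abs(e[2]) < eps * np.linalg.norm(e):
        return False  # epipole at infinity: the easy case for rectification
    x, y = e[0] / e[2], e[1] / e[2]
    return bool(0 <= x < width and 0 <= y < height)

# Made-up intrinsics for a 640x480 camera (assumption, not my real calibration).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
W, H = 640, 480

# Camera 2 displaced along the optical axis: the epipole lands in the image.
F_forward = fundamental_from_pose(K, K, np.eye(3), np.array([0.0, 0.0, 1.0]))
# Camera 2 displaced sideways: the epipole is at infinity.
F_side = fundamental_from_pose(K, K, np.eye(3), np.array([1.0, 0.0, 0.0]))

for name, F in [("forward", F_forward), ("sideways", F_side)]:
    e1, e2 = epipoles(F)
    print(name, epipole_in_image(e1, W, H), epipole_in_image(e2, W, H))
```

With a real calibration you would plug in your own `K1`, `K2`, `R`, and `tvec` (or an `F` estimated from matches) instead of the toy values above.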