Rectification rotates and translates the left and right images into epipolar geometry.
Recall that with epipolar geometry, corresponding features of real objects appear on the same row of the left and right images. A feature shows a disparity offset between the two camera views that is proportional to the baseline length and inversely proportional to the feature's distance from the cameras. (Note: I'm not exactly sure where cross-eyed camera correction is done, but it is accounted for in the stereo/extrinsic camera R and T from the stereo calibration step.)
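The disparity relationship above can be sketched in a few lines. This is a minimal illustration that assumes an ideal rectified pinhole pair, where disparity is focal length (in pixels) times baseline divided by depth. The function names and the numeric values are my own, chosen only for the example.

```python
def disparity_from_depth(focal_px, baseline_m, depth_m):
    """Disparity in pixels for a point at depth_m, assuming an ideal
    rectified pinhole stereo pair (an assumption of this sketch)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Inverse relation: depth in metres from a measured disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 800 px focal length, 0.125 m baseline.
d_near = disparity_from_depth(800.0, 0.125, 2.0)   # point 2 m away
d_far = disparity_from_depth(800.0, 0.125, 4.0)    # twice the distance
d_wide = disparity_from_depth(800.0, 0.25, 2.0)    # twice the baseline
print(d_near, d_far, d_wide)
```

Doubling the distance halves the disparity, and doubling the baseline doubles it, which is the proportionality described above.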
Rectification also includes translation of an image along the axis normal to the baseline between the two cameras (up and down for a left/right camera pair).
If rectification were to translate one of the camera images along the baseline axis, then that camera's rotation and translation vectors, determined during multi-camera stereo or extrinsic calibration, would no longer be valid for that image. Nor would the camera intrinsics matrix, distortion coefficients, or projection matrix.
Furthermore, translating a rectified image does not increase or decrease the volume of overlap between the two cameras' views. The math and geometry are simpler if one does not complicate things with an unnecessary translation.
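To illustrate why a shift along the baseline axis is a problem, here is a rough sketch under the same ideal rectified pinhole assumption. It shifts the right image by a fixed number of pixels along the baseline axis and then computes depth with the original, unshifted calibration values. All numbers are hypothetical.

```python
# Hypothetical calibration values for an ideal rectified pinhole pair.
focal_px = 800.0
baseline_m = 0.125

# A feature at a true depth of 2 m, seen at column 450 in the left image.
true_depth_m = 2.0
x_left = 450.0
x_right = x_left - focal_px * baseline_m / true_depth_m  # column in right image

# Suppose the right image were translated 10 px along the baseline axis
# while the calibration (focal length, baseline, principal point) is left as is.
shift_px = 10.0
x_right_shifted = x_right + shift_px

disparity_ok = x_left - x_right
disparity_bad = x_left - x_right_shifted

depth_ok = focal_px * baseline_m / disparity_ok    # matches the true depth
depth_bad = focal_px * baseline_m / disparity_bad  # biased by the shift

print(depth_ok, depth_bad)
```

In this sketch the 10 px shift turns a 2 m point into an apparent 2.5 m point, because the original calibration no longer describes the shifted image.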