When implementing monocular SLAM or Structure from Motion with a single camera, translation can only be estimated up to an unknown scale. It is well established that, without external information, this scale cannot be determined. My question is: how can this scale be made consistent across all the relative translations? For example, suppose we have 3 frames (Frame 0, Frame 1 and Frame 2) and track them as follows:
- Frame 0 -> Frame 1: R01, T01 (R and T can be extracted from the fundamental matrix F and the intrinsic matrix K via essential matrix decomposition)
- Frame 1 -> Frame 2: R12, T12
The problem is that T01 and T12 are normalized, so each has magnitude 1. In reality, however, the magnitude of T01 may be, say, twice that of T12.
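To make the problem concrete, here is a minimal NumPy sketch. It is not my actual pipeline: the helper names, the rotation and the translation values are made up for illustration. It builds two essential matrices whose true translations differ by a factor of 2 and shows that the direction recovered from each one has unit norm, so the factor of 2 is lost.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def translation_direction(E):
    """Unit translation direction encoded in E = [t]_x R (sign is ambiguous)."""
    U, _, _ = np.linalg.svd(E)
    # t spans the left null space of E (t^T E = 0), i.e. the last column of U.
    return U[:, 2]

# Hypothetical ground truth: a small rotation about the y axis,
# and two translations where |T01| = 2 * |T12|.
a = np.deg2rad(5.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
T12_true = np.array([0.3, 0.0, 1.0])
T01_true = 2.0 * T12_true

d01 = translation_direction(skew(T01_true) @ R)
d12 = translation_direction(skew(T12_true) @ R)

print(np.linalg.norm(T01_true) / np.linalg.norm(T12_true))  # true ratio of magnitudes
print(np.linalg.norm(d01), np.linalg.norm(d12))             # recovered magnitudes
print(abs(d01 @ d12))                                       # directions agree up to sign
```

Both recovered vectors are unit length and parallel, so nothing in a single pairwise decomposition tells me that the first motion was twice as long as the second.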
How can I recover the relative magnitude between T01 and T12?
P.S. I do not need to know the exact values of T01 or T12. I just want to know that, for example, |T01| = 2 * |T12|.
I think this is possible because monocular SLAM and SfM algorithms already exist and work well, so there should be some way to do this.