# Handling unknown scale in structure from motion initialization

I am trying to build a structure-from-motion pipeline and am struggling with the initialization/bootstrapping of the system, specifically with how to handle the ambiguous scale difference between image pairs.

My goal is to generate input for e.g. cvsba (http://www.uco.es/investiga/grupos/av...) for further refinement. The required input is:

- Initial intrinsic/extrinsic calibrations for all cameras.
- Image positions and corresponding initial 3D locations.

For simplicity, say I have 3 overlapping images of an object and know the intrinsic calibration (roughly). Image #1 overlaps image #2, and image #2 overlaps image #3. The camera undergoes general motion.

To initialize the system, I have done the following:

- Find (sparse) correspondences between #1 and #2 using keypoint matching.
- Compute the essential matrix between #1 and #2 (using findEssentialMat(...)).
- Recover the extrinsic calibration of camera #2, with camera #1 fixed at the origin (using recoverPose(...)).
- Triangulate all the correspondences using the projection matrices of camera #1 and camera #2 (using triangulatePoints(...)).

Then I repeat the procedure for the correspondences between #2 and #3. The problem is the pose-recovery step: the extrinsic translation is only recovered up to an unknown scale, so the points triangulated in the final step are not comparable to the 3D points I triangulated in the first iteration. It seems I need to apply a scaling somewhere, but I am unsure how to approach this.
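One common way to estimate that missing scale (this is my understanding of the standard approach, not something from the question) is to use points matched across all three images: such points are triangulated once by the #1/#2 pair and once by the #2/#3 pair, and the two reconstructions differ by a similarity transform. Pairwise distances between points are invariant to rotation and translation, so the ratio of matching distances isolates the scale factor. A minimal numpy sketch, with `relative_scale` a hypothetical helper name:

```python
import numpy as np

def relative_scale(pts_a, pts_b):
    """Estimate s such that pts_a is (a rigidly moved copy of) s * pts_b.
    pts_a[i] and pts_b[i] must be the SAME physical point triangulated
    in the two reconstructions (Nx3 arrays, N >= 2)."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    # Distances between consecutive shared points in each reconstruction;
    # rotation/translation cancel out, leaving only the scale ratio.
    d_a = np.linalg.norm(np.diff(pts_a, axis=0), axis=1)
    d_b = np.linalg.norm(np.diff(pts_b, axis=0), axis=1)
    valid = d_b > 1e-9
    # Median is robust against a few bad matches or triangulations.
    return float(np.median(d_a[valid] / d_b[valid]))
```

The idea would then be to multiply the #2/#3 translation and triangulated points by this factor before chaining the poses into a common world frame and handing everything to cvsba.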