What is meant by "scale" in the case of monocular visual odometry?

The following are two reasons why monocular visual odometry is not the best option:

1) Results from monocular sequences can only be recovered up to a 'scale'; without additional information, absolute measurements are not possible. (src)

2) All pose estimates from a mono VO algorithm are relative to some unknown 'scaling factor'. (src)
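
As a minimal sketch of what these two statements appear to describe, assume an ideal pinhole camera with unit focal length and no rotation between frames. The scene points, the camera translation, and the factor `s` are made-up values, and `project` and `scale` are hypothetical helpers used only for illustration. Multiplying both the scene and the camera translation by the same factor leaves the image coordinates unchanged:

```python
# Pinhole projection with unit focal length: (X, Y, Z) -> (X/Z, Y/Z).
def project(point, translation):
    """Project a 3D world point into a camera displaced by `translation`.

    Rotation is taken to be identity to keep the sketch short (assumption).
    """
    x, y, z = (p - t for p, t in zip(point, translation))
    return (x / z, y / z)


def scale(vec, s):
    """Multiply every component of a 3-vector by s."""
    return tuple(s * v for v in vec)


# Hypothetical scene points and camera motion, made up for illustration.
points = [(1.0, 2.0, 10.0), (-3.0, 0.5, 8.0), (0.0, -1.0, 12.0)]
translation = (0.5, 0.0, 1.0)

s = 7.0  # arbitrary scale factor applied to the whole world

# Image coordinates seen by the moved camera in the original world ...
original = [project(p, translation) for p in points]
# ... and in a world where scene and motion are both s times larger.
scaled = [project(scale(p, s), scale(translation, s)) for p in points]

# Largest difference between the two sets of image coordinates.
max_diff = max(
    abs(a - b) for o, c in zip(original, scaled) for a, b in zip(o, c)
)
print(max_diff)
```

The printed difference is zero up to floating-point rounding, so the images alone cannot distinguish a small scene with a short camera motion from a large scene with a proportionally longer one. A metric scale would have to come from extra information such as a known object size, a stereo baseline, or an IMU.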

Can someone please explain what exactly is meant by the 'scaling factor' and how it affects visual odometry?