So, looking for a solution on the Web, I found that what I want to do is called extrinsic calibration.

To find the best rotation/translation between two sets of 3D points, in the sense of minimizing the root mean square error, the solution is:

  • Theory: https://igl.ethz.ch/projects/ARAP/svd_rot.pdf
  • Easier explanation: http://nghiaho.com/?page_id=671
  • Python code (from the easier explanation site): http://nghiaho.com/uploads/code/rigid_transform_3D.py

So, to transform 3D points from the camera reference frame to the world reference frame, do the following:

  1. Define some 3D points with known positions in the world reference frame, for example (but not necessarily) with Z=0. Put the coordinates in an Nx3 matrix P.
  2. Get the corresponding 3D points in the camera reference frame. Put them in an Nx3 matrix Q.
  3. Using the Python code linked above, call rigid_transform_3D(Q, P), with the camera points first: the function maps its first argument onto its second, so this returns a 3x3 rotation matrix R and a 3x1 translation vector t that take camera coordinates to world coordinates.

Then, for any 3D point p in the camera reference frame, given as a 3x1 vector, you can obtain the corresponding world point q with q = R.dot(p) + t.
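
For reference, below is a minimal numpy sketch of the SVD-based method the links above describe. It is my own illustration, not the code from nghiaho.com (prefer the linked file for real use); it assumes Nx3 arrays of corresponding points and returns R, t such that B ≈ R·A + t:

    import numpy as np

    def rigid_transform_3d(A, B):
        """Best-fit R (3x3) and t (3x1) minimizing the RMS of (R @ A[i] + t) - B[i].

        A, B: Nx3 arrays of corresponding 3D points (N >= 3, not all collinear).
        """
        assert A.shape == B.shape
        centroid_A = A.mean(axis=0)
        centroid_B = B.mean(axis=0)
        # Cross-covariance of the centered point sets.
        H = (A - centroid_A).T @ (B - centroid_B)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        # Guard against a reflection (det(R) = -1): flip the last singular direction.
        if np.linalg.det(R) < 0:
            Vt[2, :] *= -1
            R = Vt.T @ U.T
        t = centroid_B - R @ centroid_A
        return R, t.reshape(3, 1)

    # Usage: Q holds camera-frame points, P the corresponding world points.
    # R, t = rigid_transform_3d(Q, P)
    # q_world = R @ p_camera + t      # p_camera as a 3x1 column vector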


THE PROBLEM WITH THIS SOLUTION:

In the first step, this only works if I have a known reference for the location of the points (for example, using a rectangle of known dimensions, I would be able to find the correct world coordinates). Now assume that those points are chosen by a user by clicking on the original 2D depth image. For those points I would only know the local (camera) coordinates, not the global ones. So how will I know the corresponding 3D world coordinates? Is there a way to say "I have some 3D camera coordinates that belong to the ground plane; find me the rotation matrix that maps those points to a horizontal plane"?

THE SOLUTION (taken from an answer to my question at https://stackoverflow.com/questions/57820572/find-the-transformation-matrix-that-maps-3d-local-coordinates-to-global-coordina):

  1. Take the selected 3D points in the camera reference frame; let's call them q'i.
  2. Fit a plane to these points, for example as described in https://www.ilikebigbits.com/2015_03_04_plane_from_points.html. The result of this will be a normal vector n. To fully specify the plane, you also need to choose a point on it, for example the centroid (average) of the q'i.
  3. As the points surely don't lie perfectly in the plane, project them onto it, for example as described in "How to project a point onto a plane in 3D?". Let's call these projected points qi.
  4. At this point you have a set of 3D points qi that lie on a perfect plane, which should correspond closely to the ground plane (Z=0 in the world coordinate frame). The coordinates are still in the camera reference frame, though.
  5. Now we need to specify an origin and the direction of the X and Y axes in this ground plane. You don't seem to have any criteria for this, so an option is to arbitrarily set the origin just "below" the camera center and align the X axis with the camera optical axis. For this:
  6. Project the point (0,0,0) onto the plane, as you did in step 3. Call this o. Project the point (0,0,1) onto the plane and call it a. Compute the vector a − o, normalize it, and call it i.
  7. o is the origin of the world reference frame, and i is the X axis of the world reference frame, expressed in camera coordinates. Call j = n × i (cross product). j is the Y axis and we are almost finished.
  8. Now obtain the X and Y coordinates of the points qi in the world reference frame by projecting them onto i and j, measured from the origin o. That is, take the dot product of each (qi − o) with i to get the X values, and the dot product of each (qi − o) with j to get the Y values. The Z values are all 0. Call these (X, Y, 0) coordinates pi.
  9. Use these corresponding pi and qi to estimate R and t, as in the first part of the answer (see the sketch after this list).
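
Putting these steps together, here is a numpy sketch of the whole recipe. Again, this is my own illustration of the answer, not code from it; it assumes the camera looks along its +Z axis (the usual optical-axis convention) and reuses rigid_transform_3d from the sketch in the first part:

    import numpy as np

    def ground_plane_extrinsics(q_prime):
        """q_prime: Nx3 array of user-selected ground points, camera frame.

        Returns R, t mapping camera coordinates to a world frame whose
        Z=0 plane is the fitted ground plane (steps 1-9 above).
        """
        # Step 2: fit a plane. The normal is the right singular vector of the
        # centered points with the smallest singular value; the sign of n is
        # arbitrary (flip it if you want world Z to point away from the ground).
        c = q_prime.mean(axis=0)                  # centroid = a point on the plane
        _, _, Vt = np.linalg.svd(q_prime - c)
        n = Vt[2, :]                              # unit plane normal

        def project(p):
            # Orthogonal projection of a 3D point onto the plane (c, n).
            return p - np.dot(p - c, n) * n

        # Step 3: project the selected points onto the fitted plane.
        q = np.array([project(p) for p in q_prime])

        # Step 6: origin "below" the camera center, X axis along the optical
        # axis (degenerate if the camera looks straight down at the ground).
        o = project(np.zeros(3))                  # camera center, projected
        a = project(np.array([0.0, 0.0, 1.0]))    # point on the optical axis, projected
        i = (a - o) / np.linalg.norm(a - o)       # world X axis, in camera coords
        # Step 7: Y axis perpendicular to both (unit length, since i lies in the plane).
        j = np.cross(n, i)

        # Step 8: world coordinates of the projected points, measured from o.
        rel = q - o
        p_world = np.stack([rel @ i, rel @ j, np.zeros(len(q))], axis=1)

        # Step 9: estimate the camera-to-world transform from the correspondences.
        return rigid_transform_3d(q, p_world)

After this, any camera-frame point p (3x1) maps into the world with R @ p + t, and the user-selected ground points should land on Z ≈ 0.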