1D->2D pose estimation - detect 2D camera location

asked 2020-04-09 07:56:43 -0500

noshky

I believe I am on the wrong track. I need to achieve something like a 1D->2D pose estimation, which I was hoping to reduce to the 3D pose estimation problem, for which widely used solutions exist, such as OpenCV's solvePnP() implementations.

What I am trying to solve: My camera is pointed _along a surface_, i.e. it sees the surface from the side and sees everything that is on the surface. That means I already know the z-coordinate, which is z=0. I have 2 cameras pointed at the surface and I want to detect objects on the surface by intersecting their projection lines. But for that, I need to know the 2D location of my cameras with respect to the 2D coordinate system of the surface.
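
The intersection step described above can be sketched as follows. This is only a minimal illustration, assuming each camera's 2D position and heading on the surface are already known, that a heading of 0 means the optical axis points along +y with image x along +x, and that `x_norm` is the undistorted, normalized image coordinate `(u - cx) / fx`. The function names are hypothetical, not OpenCV API.

```python
import math

def ray_direction(heading, x_norm):
    """World-frame direction of the ray through normalized image coordinate x_norm.

    Assumed convention: heading 0 means the optical axis points along +y.
    """
    a, b = math.cos(heading), math.sin(heading)
    # Camera-frame direction (x_norm, 1) rotated into the world frame.
    return (a * x_norm - b, b * x_norm + a)

def intersect_rays(p1, d1, p2, d2):
    """Intersect the lines p1 + s*d1 and p2 + t*d2 in the surface plane."""
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < 1e-12:
        raise ValueError("rays are parallel, no unique intersection")
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    s = (rx * d2[1] - ry * d2[0]) / cross
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])
```

Each camera contributes one ray per detected object, so two cameras are the minimum needed to fix a point on the surface.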

The cameras are fixed, so I should be able to determine each camera's location by placing objects on the surface in a known setup and then deriving the camera location from the object positions in the image. Although my camera images are 2D, I think I can treat them as 1D, because the objects will all be at the same y-coordinate in the image (assuming a leveled and aligned camera setup).

Because of that, I believe the problem I need to solve is something like a 1D->2D pose estimation.

I can't find a geometrical solution to figuring out the 2D location from a known set of 1D points, so I was trying to use solvePnP(), feeding it only the dimensions I need. I have calibrated my camera to get the camera matrix and distortion vector, and also applied solvePnP() in 3D space - which gave me okay results and showed me that my implementation seems alright.

But when I provide the 1D image points (i.e. points which all have the same y-coordinate), the result is not usable. Once, by luck, I used points with slightly different y-coordinates, which actually produced a valid camera location in world coordinates (z was near zero). So I think this is exactly a limitation of the PnP solvers: they need some asymmetry in the image points. Using them as 1D points puts them all on the same line, which gives the solvers too little to work with.

But now, what can I do? Shouldn't the 1D-2D pose estimation be a simpler problem than the 2D-3D one? Can anyone help me think of a geometrical solution, or guide me to some other way of interpreting the problem of locating objects on a surface?

Any feedback, hints, discussions are highly appreciated!

Below I am providing some sketches to (hopefully) illustrate the setup and how I am trying to solve it with pose estimation.

The world, looking at the surface from the top, and from the side (as the camera).

  • C: The camera, with its measured real-world location around (20,-340).
  • o<N>: object points 1-4. These could be some pins or anything.
  • x/y/z: The world coordinate system, how I define it.
  • FOV ...

Comments

4 points on a line is a degenerate configuration for the PnP pose estimation problem. You are feeding it points on a line, and behind the scenes it still has to solve for 3D translation + 3D rotation.

What do you want to estimate? Translation tx and ty? Orientation? If the camera image plane is parallel to the surface, you should probably look at an affine transformation.

Eduardo ( 2020-04-09 11:55:32 -0500 )

Yeah, that approach just doesn't seem to work. I am looking for tx/ty; rotation is not needed, I believe. However, the camera image plane is orthogonal to the surface, i.e. the camera z-axis lies _in/on_ the surface.

noshky ( 2020-04-09 12:13:56 -0500 )

Isn't all I actually want to do triangulation? I could measure the angles to my known beacons, and from only 3 known beacons I should be able to determine the exact position. Does that fit with the camera and projection models?

noshky ( 2020-04-09 12:15:34 -0500 )
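
Under a pinhole model, the angles to the beacons follow directly from pixel columns. A minimal sketch, assuming the image is already undistorted, the camera is level, and `fx` and `cx` are the focal length and principal point (in pixels) from the calibrated camera matrix. The function names are hypothetical.

```python
import math

def pixel_to_bearing(u, fx, cx):
    """Horizontal angle (radians) between the optical axis and the ray through column u.

    Assumes an undistorted image and a level camera.
    """
    return math.atan2(u - cx, fx)

def subtended_angle(u_a, u_b, fx, cx):
    """Angle at the camera between the rays to two beacons seen at columns u_a and u_b."""
    return pixel_to_bearing(u_b, fx, cx) - pixel_to_bearing(u_a, fx, cx)
```

The subtended angles between pairs of beacons do not depend on the camera heading, which is what makes them usable as input to a position fix from three known beacons.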

If you just want tx and ty, a single point should be enough. If the point is centered in the image, tx and ty are zero.

So, take the point's x and y image coordinates and use the camera intrinsics to transform them into the normalized camera frame (from pixels to metric units). Then scale by tz to get the final translation.


Else I did not understand your problem.

Eduardo ( 2020-04-11 07:52:32 -0500 )
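
The single-point suggestion above amounts to a back-projection at known depth. A minimal sketch, assuming an undistorted pixel `(u, v)`, intrinsics `fx, fy, cx, cy` from the camera matrix, a known depth `tz`, and no rotation between camera and target. The function name is hypothetical.

```python
def backproject_at_depth(u, v, fx, fy, cx, cy, tz):
    """Return (tx, ty): the point's offset from the optical axis at depth tz.

    Assumes an undistorted pixel and no rotation between camera and target.
    """
    x_norm = (u - cx) / fx  # normalized image coordinates (unitless)
    y_norm = (v - cy) / fy
    return x_norm * tz, y_norm * tz  # same unit as tz
```

A point at the principal point gives tx = ty = 0, matching the comment above. The result is only meaningful when tz is known and orientation is ignored.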

Are you suggesting that I can figure out the camera x/y position by viewing one point? That doesn't make sense to me. I want to determine the x/y position of the camera on a surface. Surely, one point isn't enough and neither are 2. Essentially, I believe it comes down to the Snellius–Pothenot problem.

noshky ( 2020-04-11 08:00:10 -0500 )
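
One possible way to solve this three-beacon resection numerically is sketched below. It is not the classical geometric construction for the Snellius–Pothenot problem but a linear formulation: each beacon gives one homogeneous equation in (cos h, sin h, t1, t2), and the solution is the null vector of the resulting 3x4 system. Assumptions: known intrinsics, undistorted normalized coordinates `x_norm = (u - cx) / fx`, heading 0 meaning the optical axis points along +y, and all names hypothetical.

```python
import math

def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def resect_camera(beacons, x_norm):
    """Camera (px, py, heading) from three known beacons and their 1D image coordinates.

    beacons: three (X, Y) points on the surface.
    x_norm:  their normalized image coordinates (u - cx) / fx.
    """
    # Camera frame: xc = a*X + b*Y + t1, zc = -b*X + a*Y + t2, with x_norm = xc / zc.
    # Rearranged: a*(X - xn*Y) + b*(Y + xn*X) + t1 - xn*t2 = 0 for each beacon.
    rows = [[X - xn * Y, Y + xn * X, 1.0, -xn]
            for (X, Y), xn in zip(beacons, x_norm)]
    # Null vector of the 3x4 system from signed 3x3 minors.
    n = []
    for j in range(4):
        minor = [[r[k] for k in range(4) if k != j] for r in rows]
        n.append((-1) ** j * _det3(minor))
    scale = math.hypot(n[0], n[1])
    if scale < 1e-9:  # crude check, e.g. camera on the circle through the beacons
        raise ValueError("degenerate beacon configuration")
    a, b, t1, t2 = (v / scale for v in n)
    # Pick the sign that puts the first beacon in front of the camera (zc > 0).
    X0, Y0 = beacons[0]
    if -b * X0 + a * Y0 + t2 < 0:
        a, b, t1, t2 = -a, -b, -t1, -t2
    px = -(a * t1 - b * t2)
    py = -(b * t1 + a * t2)
    return px, py, math.atan2(b, a)
```

Three beacons are the minimum here because there are three unknowns (px, py, heading). The solve fails when the camera lies on the circle through the three beacons, which is the known degenerate case of this problem.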

Only if you don't care about the orientation, and if you already know tz.

Eduardo ( 2020-04-13 14:51:56 -0500 )