Hello!
TL;DR: I need a function similar to solvePnP() that can estimate the pose of a model using information from multiple cameras instead of only one.
I am trying to find the pose (rotation and translation) of a simple object covered with markers, using n cameras placed around the object.
The pose of each camera is known: I already have a matrix Ci for each camera i such that, for a point X=(x,y,z,1) in real world coordinates, Ci*X gives me the coordinates of that point in the camera's coordinate system.
The object whose pose I am trying to estimate is composed of m points, and I know the position of each of them in the object's coordinate system.
I am already able to find the coordinates of the object's points in the image plane of each camera.
So if I put all this together, for each point j seen in a camera i I get this:
sij * Pij = Ci * A * Xj
where:
sij is an unknown scalar that multiplies the projection of the point j on the camera i; it is here because we don't know how far from the camera the detected point is (unknown)
Pij is the coordinates of the point j projected on the camera i: (x',y',1)T (known)
Ci is the matrix that describes the rotation and translation of the camera i (known)
A is the matrix I'm looking for, it describes the transformation between the object's coordinate system and real world coordinates (unknown)
Xj is the point j in the object's coordinate system: (x,y,z,1)T (known)
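To make the structure of this equation explicit, here is how the unknown sij can be eliminated. This is a sketch under two assumptions that are not stated above: the third component of Pij is exactly 1 (normalized image coordinates), and c_i^(1), c_i^(2), c_i^(3) denote the first three rows of Ci.

```latex
% Third row of  s_{ij} P_{ij} = C_i A X_j  gives the scale directly:
s_{ij} = c_i^{(3)} A X_j
% Substituting it into the first two rows removes s_{ij}:
\left( x'_{ij}\, c_i^{(3)} - c_i^{(1)} \right) A X_j = 0
\left( y'_{ij}\, c_i^{(3)} - c_i^{(2)} \right) A X_j = 0
```

Each detected point therefore gives two equations that are linear in the entries of A. If A is a rigid transform, its last row is (0,0,0,1), which leaves 12 unknown entries.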
I will typically see 4 different points in each of 3 different cameras (the 12 points found are all distinct), which would give me a set of 12 of those linear systems.
How do I find the matrix A that best satisfies this set of linear systems?
This problem looks like something that could be solved using DLT (https://en.wikipedia.org/wiki/Direct_linear_transformation), but I'm not able to transform my systems to fit the form shown on that Wikipedia page.
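For illustration, here is a minimal sketch of how the systems might be stacked into one DLT-style least-squares problem with NumPy. It is not an OpenCV routine: the function name estimate_pose_multicam and its input layout are hypothetical, and it assumes that the image points are normalized (camera intrinsics already removed, or folded into Ci), that A is a rigid transform with last row (0,0,0,1), and that the object points are not all coplanar. It minimizes an algebraic error rather than the reprojection error, so the result would probably be best used as a starting point for a nonlinear refinement.

```python
import numpy as np

def estimate_pose_multicam(observations):
    """Linear (DLT-style) estimate of the object-to-world transform A.

    observations: iterable of (C, p, X) tuples, one per detected marker:
      C: 3x4 or 4x4 world-to-camera matrix Ci
      p: normalized image point (x', y') taken from Pij
      X: marker position (x, y, z) in object coordinates
    Returns a 4x4 matrix A whose last row is (0, 0, 0, 1).
    """
    rows, rhs = [], []
    for C, p, X in observations:
        C = np.asarray(C, dtype=float)[:3, :]   # keep the [R | t] part
        M, t = C[:, :3], C[:, 3]
        Xh = np.append(np.asarray(X, dtype=float), 1.0)
        # With A's last row fixed, Ci*A*Xj = G @ a + t, where a holds the
        # top three rows of A flattened row by row (12 unknowns).
        G = M @ np.kron(np.eye(3), Xh[None, :])  # shape (3, 12)
        x, y = p
        # Eliminate sij: x' * q3 - q1 = 0 and y' * q3 - q2 = 0.
        rows.append(x * G[2] - G[0])
        rhs.append(t[0] - x * t[2])
        rows.append(y * G[2] - G[1])
        rhs.append(t[1] - y * t[2])

    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    A3 = a.reshape(3, 4)

    # The linear solve does not enforce orthonormality, so project the
    # 3x3 block onto the nearest proper rotation via SVD.
    U, _, Vt = np.linalg.svd(A3[:, :3])
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])

    A = np.eye(4)
    A[:3, :3] = U @ D @ Vt
    A[:3, 3] = A3[:, 3]
    return A
```

With 4 points in each of 3 cameras this stacks 24 equations for 12 unknowns. A point does not need to be visible in more than one camera, but the cameras must have different centers, otherwise the scale of the solution is not constrained.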
My question is similar to this one: http://answers.opencv.org/question/131660/multi-view-solvepnp-routine/, but the answer there does not solve my problem because it requires that the points used to solve for the pose of the model be seen in multiple cameras, whereas each of my points is seen by only one camera.