
Figure out location of computer screen in world coordinates

asked 2016-04-05 17:59:29 -0600

bfc_opencv

I'm using OpenCV and have a ray as a vector in world coordinates. I want to figure out the point in world coordinates where this ray intersects my computer screen. How do I figure out the location of my computer screen in world coordinates?


Comments

Could you add a little more information? Do you mean you are displaying an image and want to know where in the image the ray is? Or is the ray coming from something in front of the screen, like an eye? Or what?

Tetragramm ( 2016-04-05 18:50:14 -0600 )

I have a video of a person and a gaze vector originating from that person's eye. The center of the eye and the gaze vector are in 3D world coordinates. I want to figure out where the user is looking, and consequently where the gaze ray intersects the computer screen.

bfc_opencv ( 2016-04-05 18:53:28 -0600 )

Alright, do you know the coordinates of the corners of the screen in world coordinates? That's step one.

Tetragramm ( 2016-04-05 19:34:06 -0600 )

No, that's where I'm struggling. I'm trying to figure out how to get those so that I can construct a plane and intersect the gaze vector (or ray) with it. Any ideas/pointers would be much appreciated!

bfc_opencv ( 2016-04-05 19:50:44 -0600 )
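
(Editor's note: the intersection step itself is simple once the corners are known. A minimal sketch, assuming the eye center, gaze direction, and screen corners are already expressed in the same world coordinates; all names are illustrative.)

#include <opencv2/core.hpp>
#include <cmath>

// Intersect the gaze ray origin + t*dir with the plane of the screen.
// Returns false if the ray is parallel to the plane or points away from it.
bool intersectRayScreen(const cv::Point3d& origin, const cv::Point3d& dir,
                        const cv::Point3d& topLeft, const cv::Point3d& topRight,
                        const cv::Point3d& bottomLeft, cv::Point3d& hit)
{
    cv::Point3d u = topRight - topLeft;    // screen x edge
    cv::Point3d v = bottomLeft - topLeft;  // screen y edge
    cv::Point3d n = u.cross(v);            // plane normal
    double denom = n.dot(dir);
    if (std::abs(denom) < 1e-9) return false;   // gaze parallel to the screen
    double t = n.dot(topLeft - origin) / denom;
    if (t < 0) return false;                    // screen is behind the eye
    hit = origin + t * dir;                     // world-coordinate hit point
    return true;
}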

Ok, where is the origin of your world system? The camera? There are a couple of ways to do this. One is to carefully align the camera with the screen and measure the distance to the corners.

If you have another camera, you can do something more precise. Place a chessboard at a forty-five degree angle in front of your primary camera and find the transformation from world coordinates to chessboard coordinates. Then use the second camera to take a picture (or pictures) containing both the first chessboard and a chessboard pattern displayed on the screen. You can then find the location of the screen in the chessboard coordinates and translate that to world coordinates.

Are you comfortable enough with the transformations to do that?

Tetragramm ( 2016-04-05 20:33:19 -0600 )

The camera is the origin of my world system. If I'm using a laptop with a built-in webcam, the z coordinate of the screen corners would be 0, correct?

For the second method, I'm assuming you're referring to the findChessboardCorners() function OpenCV has. But I'm confused as to how the second camera's picture would help determine the location of the screen.

bfc_opencv ( 2016-04-05 22:12:50 -0600 )

If you're using a laptop, then yes, you can likely assume zero, at least for the top corners. If the focal plane of the camera is tilted with respect to the screen, the bottom corners could be pretty far off.

The calibrateCamera function returns the rvec and tvec of the camera with respect to the chessboard. So once you use multiple cameras and a chessboard in space and on the screen you have Primary Camera -> Chessboard in Space and Secondary Camera -> Chessboard in Space and Secondary Camera -> Chessboard on screen. Just invert the second one, and then apply them all in sequence to get Primary Camera -> Chessboard on Screen.

Tetragramm ( 2016-04-05 23:53:29 -0600 )

Thanks! Could you just clarify how I would get the bottom corners using the first method? I have the intrinsic camera matrix.

bfc_opencv ( 2016-04-06 12:13:10 -0600 )

Honestly, I can't think of a way without simply carefully measuring.

Wait, I have an idea. Get a mirror and set it up so that the camera is looking straight back into itself. Then show a chessboard on the screen and get the rotation and translation. Then measure the distance from mirror to camera, and you have everything you need. It's a bit hard to hold the mirror in place, but not impossible. Set the laptop in front of the bathroom mirror and carefully arrange the screen.

Tetragramm ( 2016-04-06 18:48:51 -0600 )

How would getting the rvecs and tvecs help with the screen corners? I'm still a bit confused.

One other point came up. I have fx, fy in pixel units. Is there a conversion to mm? I've looked this up but the answers have been inconsistent.

bfc_opencv ( 2016-04-06 18:56:31 -0600 )
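
(Editor's note: the fx/fy question above goes unanswered in the thread. The standard conversion needs the physical sensor size, which OpenCV cannot recover for you; it has to come from the camera's datasheet. A sketch, with a hypothetical sensor width:)

// fx (pixels) -> focal length (mm): scale by the physical pixel pitch.
double fx_px = 1000.0;          // from the camera matrix (example value)
double sensorWidth_mm = 4.8;    // hypothetical, from the camera datasheet
double imageWidth_px = 1280.0;  // resolution used during calibration
double fx_mm = fx_px * sensorWidth_mm / imageWidth_px;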

1 answer


answered 2016-06-04 13:37:22 -0600

Tetragramm

updated 2016-06-12 00:33:06 -0600

I'm moving this to an answer so I can write longer.

Here is the setup you should be using (the "real" chessboard should be flatter than the one shown; I just grabbed what was handy). Oh, and the picture on the screen should be shown in full-screen mode so the corner of the board and the corner of the screen match.

[image: the two-chessboard calibration setup]

The reason you need to do it this way is because when you find the location of the chessboard, it finds the position of the top left corner of the chessboard and the orientation of the plane of the chessboard. Since you are trying to find the top left corner of the screen and the plane of the screen, you need them to match up. So you put the screen chessboard flat on the screen with its corner in the corner of the screen.

So you have two cameras and two chessboards. You should take a picture from both cameras with both chessboards in the same spot. For better results, move the secondary camera (the one taking this picture) around and average the results.

From these pictures you will get three transformations.

  1. From primary camera to real chessboard.
  2. From secondary camera to real chessboard.
  3. From secondary camera to screen chessboard.

What you want to find is from primary camera to screen chessboard. So you need to reverse the transformation from secondary camera to real chessboard, and now you have the three you need: primary -> real -> secondary -> screen. And from that, obviously, you can get the inverse if you need it.

EDIT: Code I used.

//Secondary Camera to Space Chessboard
//(each of the three blocks below uses the rvecs/tvecs from its own
// calibration run on the corresponding image set; those calls are snipped)
<Snip Calculation>
Rodrigues(rvecs[0], R2);   // board -> secondary camera rotation
tvecs[0].copyTo(T2);

//Secondary Camera to Screen Chessboard (inverted: camera -> board)
Rodrigues(rvecs[0], R3);
R3 = R3.t();
T3 = -R3*tvecs[0];

//Primary Camera to Space Chessboard (inverted: camera -> board)
flip(image, image, 1);     // mirror the webcam image to keep a right-handed system
Rodrigues(rvecs[0], R1);
R1 = R1.t();
T1 = -R1*tvecs[0];

//Chain the three transforms, starting from a (0,0,0) corner point
Mat translation(3, 1, CV_64F);
translation.setTo(0);
translation = R3*translation + T3;
translation = R2*translation + T2;
translation = R1*translation + T1;

This gives the result below, which is a little off, but probably because I only used one image for the secondary camera and got a poor camera matrix and distortion. The primary camera needs more images too, but I mentioned that below.

[-96.78414539354907;
 -471.6594258771706;
 16.61072172695242]

Comments

Thank you for that detailed answer, I am close and just have two small questions before I can finish this calibration.

  1. findChessboardCorners() only finds the internal corners of the chessboard. If the corner of the chessboard and the screen should match, how can I get that outer corner?

  2. Once I've rotated and translated to get the first corner of the screen, what's the most precise way of getting the other 2 corners? I know the length in mm of a side of one of the chessboard squares. Do I divide that by the length in pixels to get a ratio, and then multiply it by 2880? (my screen is 2880x1800)

bfc_opencv ( 2016-06-05 17:59:01 -0600 )

You know the size of a square; the board's outer corner is just one square beyond the outermost internal corner.

And yes, that is how I would find the other corners.

Tetragramm ( 2016-06-05 21:07:22 -0600 )
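
(Editor's note: a sketch of both steps, assuming the outermost internal corner and the board rotation have already been recovered as noted in the comments; the square's on-screen pixel count and the board values are illustrative.)

#include <opencv2/core.hpp>
using namespace cv;

// Inputs assumed from the calibration above:
Mat R;                          // 3x3 screen-plane rotation, from Rodrigues(rvec, R)
Mat innerCorner(3, 1, CV_64F);  // outermost internal corner, camera coords, mm

// findChessboardCorners only reports internal corners, so the physical
// board/screen corner is one square further out along both board axes
// (the sign depends on the detected corner ordering).
double squareMm = 20.0;                   // square size on screen, in mm
Mat xAxis = R.col(0), yAxis = R.col(1);   // screen-plane axes in camera coords
Mat topLeft = innerCorner - squareMm * (xAxis + yAxis);

// mm-per-screen-pixel from the displayed board, then the other corners.
double squarePx = 113.0;                  // on-screen square size, in screen pixels
double mmPerPixel = squareMm / squarePx;
Mat topRight   = topLeft + (2880 * mmPerPixel) * xAxis;   // 2880x1800 panel
Mat bottomLeft = topLeft + (1800 * mmPerPixel) * yAxis;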

Thank you for that, hopefully this is my last question about the rotation and translation.

I'm confused as to how to apply the rvec and tvec to get the transformation. Do I just do (0,0,0)*rvec + tvec? If this is the case, how do I invert the transformation? I'll need this for something else as well.

After doing research online I realized I may have to use the Rodrigues() function to get the rotation matrix from rvec, augment that with the tvec, and then add a row of (0,0,0,1) to get a 4x4 transformation matrix. This way, I would be able to get the inverse. However, if that is the case, how do I multiply that by (0,0,0)? Am I just supposed to do (0,0,0,1)*(rotm|tvec, 0,0,0,1)?

I'm not sure which of ... [comment truncated]

bfc_opencv ( 2016-06-07 11:36:07 -0600 )

Invert like this:

Rodrigues(rvec, R);   // rotation vector -> 3x3 rotation matrix
R = R.t();            // a rotation matrix's transpose is its inverse
t = (-R * tvec);      // inverted translation

Same results, easier to understand and use.

Tetragramm ( 2016-06-07 23:38:19 -0600 )
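
(Editor's note: the 4x4 homogeneous form the asker describes works too and gives identical results; a sketch, in case chaining is easier to read that way. The helper name is illustrative.)

#include <opencv2/calib3d.hpp>
using namespace cv;

// Pack rvec/tvec into a 4x4 homogeneous transform.
Mat toHomogeneous(const Mat& rvec, const Mat& tvec)
{
    Mat R;
    Rodrigues(rvec, R);
    Mat T = Mat::eye(4, 4, CV_64F);
    R.copyTo(T(Rect(0, 0, 3, 3)));                   // top-left 3x3 = rotation
    tvec.reshape(1, 3).copyTo(T(Rect(3, 0, 1, 3)));  // last column = translation
    return T;
}

// Transforms then chain by multiplication, e.g. p' = H1 * H2 * H3 * p with
// p = (x, y, z, 1)^T, and H.inv() is the same inverse as R.t(), -R.t()*tvec above.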

I’ve performed the calculations, and even after verifying the math independently, my results are quite far off.

Using a tape measure, I obtained (-165,5,0) as my upper left screen corner (in mm with origin as camera).

If I stay consistent with chessboard units (i.e. passing (0,0,0), (1,0,0), etc.) into calibrateCamera() as objectPoints, feeding (0,0,0) into the transform yields (-17,18,100) as the top left corner.

If I use mm as objectPoints for calibrateCamera, noting that my real chessboard has a square length of 50 mm and my screen chessboard 20 mm, it yields (-32,846,2114) as the top left corner.

This is how I got rvecs, tvecs. Pics

bfc_opencv ( 2016-06-08 21:19:35 -0600 )

Ok, so, step one for debugging. Take each image, find the rvecs and tvecs for each chessboard, and use drawAxis from the aruco module to verify that it worked.

Secondly, use the code from that tutorial to draw the detected points on the image. Make sure it's finding the right ones, and it's got the right corner as (0,0).

Thirdly, both chessboards need to be in the same units. So if one is in mm, the other needs to be in mm also. Otherwise your coordinates aren't going to match up.

Tetragramm ( 2016-06-08 22:28:20 -0600 )
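
(Editor's note: a sketch of the verification step described above, using solvePnP plus the contrib aruco module's drawAxis; in newer OpenCV versions cv::drawFrameAxes in calib3d does the same job. Board dimensions are illustrative.)

#include <opencv2/calib3d.hpp>
#include <opencv2/aruco.hpp>   // drawAxis lives in the contrib aruco module
#include <vector>
using namespace cv;

// Draw the chessboard's axes into the image so the pose can be eyeballed:
// x is drawn red, y green, z blue, all starting at the board's (0,0,0).
void verifyPose(Mat& image, Size boardSize, float squareSize,
                const Mat& cameraMatrix, const Mat& distCoeffs)
{
    std::vector<Point2f> corners;
    if (!findChessboardCorners(image, boardSize, corners))
        return;
    std::vector<Point3f> objectPoints;
    for (int i = 0; i < boardSize.height; i++)
        for (int j = 0; j < boardSize.width; j++)
            objectPoints.emplace_back(j * squareSize, i * squareSize, 0.f);
    Mat rvec, tvec;
    solvePnP(objectPoints, corners, cameraMatrix, distCoeffs, rvec, tvec);
    aruco::drawAxis(image, cameraMatrix, distCoeffs, rvec, tvec, 2 * squareSize);
}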

I've kept the units consistent and done as you've said. The code finds the chessboard points and draws the axes correctly, although I'm not sure about the blue one since I'm not familiar with the drawAxis function; maybe you can verify.

Original Pics

Pics of Axes

Pics of Chessboard Points

If you have any idea what might be wrong, or how to proceed, please let me know. Thank you!

bfc_opencv ( 2016-06-10 00:24:00 -0600 )

All right, I see the problem. For your first and second pics of the axes, you can see that the axes are not representing the same thing. They are both in a different place, and rotated differently. So when you combine your transformations, you are assuming a 90 degree rotation and a step to the side that doesn't exist, so of course you get the wrong results.

Two ways to fix it. One: alter your second picture's rvecs and tvecs so it aligns with the first. This will involve adding PI/2 to one of the rvec elements and 6 × (square size) mm to one of the tvecs (I think; you may need to rotate that first).

Two: re-arrange the world points so that they line up the same as in the first picture (see the sketch below). It's patterned, so it should be pretty easy to re-arrange.

Tetragramm ( 2016-06-10 07:46:57 -0600 )
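
(Editor's note: a sketch of option two, re-ordering the object points so the recovered axes line up; the 90-degree mapping shown is one example and may need adjusting to the actual detection order.)

#include <opencv2/core.hpp>
#include <vector>
using namespace cv;

int rows = 6, cols = 9;        // internal corners, example board
float squareSize = 50.f;       // mm

// Standard ordering: origin at the first detected corner, x along a row.
std::vector<Point3f> normalOrder;
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        normalOrder.emplace_back(j * squareSize, i * squareSize, 0.f);

// Same detection order, but coordinates rotated 90 degrees so the board's
// origin and axes match the orientation of the other chessboard.
std::vector<Point3f> rotatedOrder;
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        rotatedOrder.emplace_back(i * squareSize, (cols - 1 - j) * squareSize, 0.f);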

I made sure all 3 chessboards had the same axes, but actually rotated the first and third to match the second, as the second had x and y axes that matched my world system x and y, and its origin was closest to the top left corner of the chessboard.

However, my results are still far off from the tape-measured (-165,5,0), as I'm getting (294.49, -60.72, 2030.44) after passing (0,0,0) and translating to the top left corner.

Any ideas what might be off? Is it the axes? The z axis of my world system is the opposite of that on the pics, but I don’t think that should affect the results like this. I’ve included updated pics below.

Chessboard Points

Axes

bfc_opencv ( 2016-06-10 20:56:24 -0600 )

Is one of those mirrored? The chessboard on the chair is backwards in the two pictures.

Also, both of the cameras are calibrated with camera matrix and distortion matrix?

Tetragramm ( 2016-06-10 23:48:09 -0600 )

I believe my laptop's webcam mirrored the chessboard on the chair.

And I have the camera matrix and distortion matrix of both cameras via the calibrateCamera() function I used to get the rvecs and tvecs.

bfc_opencv ( 2016-06-10 23:52:53 -0600 )

All right. Well, you need to fix the axes then. They were in the right position, though not the right orientation the first time.

So you're using just the one picture to get the camera and distortion matrix? I would move the chessboard around in front of the cameras and pass multiple sets like the tutorial shows to get a good camera matrix. Then pass that to the calibrateCamera or solvePnPRansac functions to get the rvec and tvec. It's more likely to be accurate if it already has good estimates for focal length and distortion.

Tetragramm ( 2016-06-11 11:24:33 -0600 )
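
(Editor's note: calibrateCamera does return rvecs and tvecs along with the intrinsics; the suggestion above is to split the two jobs. A sketch, with the point lists as assumed placeholders:)

#include <opencv2/calib3d.hpp>
#include <vector>
using namespace cv;

// Once: intrinsics from many varied chessboard views.
std::vector<std::vector<Point3f>> objectPointsAll;  // one grid per view (filled from detections)
std::vector<std::vector<Point2f>> imagePointsAll;   // detected corners per view
Size imageSize(1280, 720);                          // example resolution
Mat cameraMatrix, distCoeffs;
std::vector<Mat> rvecs, tvecs;
calibrateCamera(objectPointsAll, imagePointsAll, imageSize,
                cameraMatrix, distCoeffs, rvecs, tvecs);

// Per image afterwards: pose only, with the intrinsics held fixed.
std::vector<Point3f> board;    // the board grid for this image
std::vector<Point2f> corners;  // its detected corners
Mat rvec, tvec;
solvePnP(board, corners, cameraMatrix, distCoeffs, rvec, tvec);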

I fixed the axes on the first (mirrored) chessboard and my results are still far off.

I'm using multiple pics for each setup to get the camera and distortion matrix; I only included the first one for each in the links I showed you. But I'm confused: doesn't the calibrateCamera() function give you the camera matrix and distortion, along with the rvecs and tvecs, all in one go? Or is my logic there wrong?

Right now I'm generating objectPoints based on the mm of each square, then getting imagePoints with findChessboardCorners(). Then I pass both into calibrateCamera() to get rvecs, tvecs. Then I use Rodrigues on the rvec to get rot_mat, and then do rot_mat * [0,0,0]^T + tvec. I take that result and pass it into the second (inverted) transform, and then the third after. Am I missing something?

bfc_opencv ( 2016-06-11 14:26:13 -0600 )

Possibly. Can you upload your pictures and code somewhere so I can take a look at it?

Tetragramm ( 2016-06-11 15:25:10 -0600 )

Ok. A few things. One, you're inverting the wrong sets of rotations and translations. Invert Primary viewing Space and Secondary viewing Screen. (0,0,0) is the corner of the chessboard, so the rotation and translation is from chessboard space to camera space.

Second, there is not enough information present to get good results on the camera and distortion matrices for the Primary camera. Move the chessboard around to different spots, different distances, etc.

Lastly, because of how the coordinate systems work, the images from the primary camera need to be mirrored before doing anything else. Otherwise you've got a left-handed coordinate system, which is wrong.

The final calculation I got (using only the first image of each set) won't fit here, I'm editing the answer.

Tetragramm ( 2016-06-12 00:29:16 -0600 )

Thanks for the update! I'll go ahead and try this out and report back, but just a few quick q's about what you said.

  1. If I move the chessboard around with the primary camera, will I have to take new pics with the secondary camera, as the chessboard is stationary in all the secondary camera pics?

  2. Why would the y coordinate be negative in your answer, as the y axis is positive moving down from the camera?

  3. The z coordinate is approximately 16. Would it be the same for all corners? Or should I just assume 0 for z?

bfc_opencv ( 2016-06-12 00:40:08 -0600 )

Just use the new pics to calibrate the camera matrix and distortion matrix. You can use the pics you currently have to do the rest of the calculations.

This is just like the other rvec and tvecs. The chessboard corner is (0,0,0) and this is the rotation and translation from there to the camera. Keep in mind the Z-axes are in more or less opposite directions.

Not necessarily. If the camera is tilted in its housing they could be different. Change the (0,0,0) you start with to whatever point in the plane of the screen you wish to find the relationship to.

Tetragramm ( 2016-06-12 01:43:04 -0600 )

Were you using mm or chessboard units in your code? I've been using mm since I need the final results in mm, and my results are way off. I even applied all the fixes you mentioned, including changing the order of transforms, more pics for the primary camera, and averaging the results produced by the rvec, tvec pairs for the secondary camera. Here's my code and new pics; any clue what's wrong?

Primary cam (new pics only used for the new camera matrix and distortion).

Secondary cam to chessboard in space

Secondary cam to chessboard on screen

Updated Python code

bfc_opencv ( 2016-06-12 19:54:39 -0600 )

I'm afraid I don't know what's wrong. I'm getting [241, 151, -155]. I'm getting pretty weird results for the Secondary Camera Matrix, though, and I doubt the Primary Camera Matrix is better. The two sets of pictures from the secondary (Screen and Space) produce results that don't match. When I combine the two sets, I get better results, though still not good: [430, 10, -12]

I suggest doing a very thorough job calibrating the cameras. Set them to record video and move the chessboard around so it occupies every part of the screen at some point. Then find the chessboard and run a combined camera calibration on every frame, several hundred of them. Write the results out so you can just load them. Then use those and solvePnP to get your tvecs and rvecs.

Tetragramm ( 2016-06-13 22:50:38 -0600 )
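
(Editor's note: a sketch of the video-based workflow just described, saving the intrinsics with FileStorage so they only have to be computed once; the filename and board values are illustrative.)

#include <opencv2/calib3d.hpp>
#include <opencv2/videoio.hpp>
#include <vector>
using namespace cv;

VideoCapture cap("calibration.avi");            // hypothetical recording
Size boardSize(9, 6);                           // internal corners, example
float squareSize = 50.f;                        // mm
std::vector<std::vector<Point2f>> imagePoints;
Size imageSize;
Mat frame;
while (cap.read(frame)) {
    imageSize = frame.size();
    std::vector<Point2f> corners;
    // Consider subsampling (e.g. every Nth frame): calibrateCamera gets
    // very slow as the number of views climbs into the hundreds.
    if (findChessboardCorners(frame, boardSize, corners))
        imagePoints.push_back(corners);
}

// One copy of the board's 3D grid per accepted frame.
std::vector<Point3f> board;
for (int i = 0; i < boardSize.height; i++)
    for (int j = 0; j < boardSize.width; j++)
        board.emplace_back(j * squareSize, i * squareSize, 0.f);
std::vector<std::vector<Point3f>> objectPoints(imagePoints.size(), board);

Mat cameraMatrix, distCoeffs;
std::vector<Mat> rvecs, tvecs;
calibrateCamera(objectPoints, imagePoints, imageSize,
                cameraMatrix, distCoeffs, rvecs, tvecs);

// Write out once; later runs load these and just call solvePnP.
FileStorage fs("intrinsics.yml", FileStorage::WRITE);
fs << "cameraMatrix" << cameraMatrix << "distCoeffs" << distCoeffs;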

So I've been working on this, and I noticed the calibrateCamera function freezes if I try to pass more than about 100 sets of objectPoints and imagePoints to it. Why is that? Should I just take batches of 100 random frames from the video and average the resulting cameraMatrix and distortion outputs?

bfc_opencv ( 2016-06-16 21:49:59 -0600 )

There's no limit in the code, and last time I used it I passed about 1300 frames. Is there an assert failing? Does one of your frames at about 100 have a bad detection and not have more than 3 points?

You should be able to use the flag CALIB_USE_INTRINSIC_GUESS to use the passed-in camera and distortion matrices as a starting point and use the new points to update them. So you can try that.

Tetragramm ( 2016-06-16 23:28:54 -0600 )

I tried using that flag, but the code still seems to hang on the calibrateCamera() step. At 150 images it's been running for a while now with still no result. At about 750 frames (the length of the full video) I left it running all night and it was still on the same step. For reference, just 100 frames took 6 minutes.

The only thing I saw that might throw it off was that some frames had the chessboard corners found in a rotated order compared to others. Is that an issue?

Also, when I used CALIB_USE_INTRINSIC_GUESS for 100 frames (my guess being the results from the original calibration of 10 images), my results were quite different from when I didn't use the flag. Which is better?

bfc_opencv ( 2016-06-17 00:11:33 -0600 )

Stats

Asked: 2016-04-05 17:59:29 -0600

Seen: 1,719 times

Last updated: Jun 12 '16