2019-03-20 04:50:59 -0500 received badge ● Popular Question (source)

2018-05-08 08:33:14 -0500 marked best answer: Can OpenCV read images with a different 'byte ordering'?

I am working on a highly performance-critical image-processing pipeline on a Jetson TX2 (with an ARM processor), which involves reading a set of images and then performing deep-learning-based object detection through Darknet. Darknet, written in C, has its own representation of how images are stored, which is different from how OpenCV's IplImage or a Python numpy array stores them. Because of the requirements of my project, I currently interface with this C code by first reading the images through Python (cv2.imread()) and then passing them to Darknet, but I can also call cvLoadImage() from C. By default, OpenCV stores the data as one contiguous, interleaved block: the channel index varies fastest, then the column, then the row (HWC order). The Darknet format instead needs the data arranged as one plane per channel, within which the column varies fastest, then the row (CHW order). (A picture illustrating the difference was attached to the original post.) This means I need to reorder the pixels from the OpenCV format to the Darknet format, which is additional overhead I would like to avoid. Is there a way to make OpenCV read the image differently, either through cv2.imread() or the IplImage format in cvLoadImage(), so as to directly get the format Darknet expects?

2018-03-14 18:55:12 -0500 received badge ● Notable Question (source)

2017-12-06 20:23:32 -0500 asked a question: How to compute the covariance of an inter-camera relative pose measurement?
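The HWC-to-CHW reordering asked about above can be sketched in pure Python on a flat pixel buffer (a hedged sketch: with numpy this is essentially `img.transpose(2, 0, 1).ravel()`, and Darknet's own loader additionally converts BGR to RGB and scales to [0, 1], which this sketch ignores):

```python
# Reorder an interleaved HWC buffer (OpenCV layout: channel varies fastest)
# into a planar CHW buffer (Darknet layout: one full plane per channel).
def hwc_to_chw(buf, h, w, c):
    """buf: flat list of h*w*c values in HWC order. Returns CHW order."""
    out = [0] * (h * w * c)
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                # HWC index: y*w*c + x*c + ch  ->  CHW index: ch*h*w + y*w + x
                out[ch * h * w + y * w + x] = buf[y * w * c + x * c + ch]
    return out
```

For a 2x2 image with 2 channels, the interleaved buffer [0, 1, 2, 3, 4, 5, 6, 7] becomes [0, 2, 4, 6, 1, 3, 5, 7]: all channel-0 samples first, then all channel-1 samples.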
If I'm doing pose estimation using a single …

2017-09-29 17:09:31 -0500 received badge ● Popular Question (source)

2017-08-14 08:57:34 -0500 received badge ● Famous Question (source)

2017-01-19 22:35:47 -0500 received badge ● Notable Question (source)

2016-08-23 18:50:32 -0500 asked a question: Epipolar geometry: pose estimate detects translation in wrong axis

I am trying to use epipolar-geometry concepts, through the findEssentialMat, recoverPose and solvePnP functions in OpenCV, to estimate the relative motion between two images. I currently get my camera images from a simulator, so the distortion is zero and I know the camera parameters beforehand. I am using the OpenCV coordinate convention in this question. When the two images are taken from positions displaced only along the X axis, essential-matrix-based estimation works perfectly: I get a translation vector like [-0.9999, 0.0001, 0.0001]. But when the camera is moved along the forward/backward Z axis, the t vector contains a suspiciously large value on the Y axis, whereas X and Z are correct. Example results from solvePnP for these two cases:

    Image 1 taken from (0, 0, 0), image 2 from (-2, 0, 0): t = [-1.988, 0.023, 0.046] (Y and Z acceptably close to 0)
    Image 1 taken from (0, 0, 0), image 2 from (-2, 0, 4): t = [-2.028, -0.441, 4.0983] (Y is totally off)

I don't understand where the 0.4 m on the Y axis is coming from. The simulator is very accurate in terms of physics and there is absolutely no movement of that magnitude on the Y axis. The same behavior shows up in the relative R/t output from the essential-matrix decomposition. Any tips on solving this, or suggestions for further debugging, would be helpful.

EDIT: I was reading David Nistér's paper on the five-point algorithm, which is the core of findEssentialMat, and it says this: "The 5-point method significantly outperforms the other non-iterative methods for sideways motion.
The 5-point results are quite good, while the results of the other non-iterative methods are virtually useless. As the noise grows large, the other methods are swamped and begin placing the epipole somewhere inside the image regardless of its true position. This phenomenon has been observed previously by e.g. [15]. It is particularly marked for the 6 and 8 point methods. For the forward motion cases, the results are quite different however. This is partly due to a slight deterioration of the results for the 5-point method, but mainly due to a vast improvement of the results for the other methods. In particular the 8-point method gives excellent results on forward motion."

I wonder if the optimal solution is some kind of 'best of both worlds' scenario? Has anyone worked on similar problems before who can share some wisdom? Thanks.

2016-07-22 07:24:28 -0500 received badge ● Popular Question (source)

2016-05-26 04:34:39 -0500 marked best answer: How do I cast Point2f to Point2d?

I am trying to copy a vector of Point2f elements into a vector of Point2d elements. I am unable to directly cast the Point2f elements to Point2d, as in:

    for (int i = 0; i < (int)vector1.size(); i++) {
        vector2.push_back((cv::Point2d)vector1[i]);
    }

The usual std way does not work either. Example:

    std::vector<cv::Point2d> vector2(vector1.begin(), vector1.end());

2016-02-20 19:07:58 -0500 commented question solvePnP fails in a specific case

An update: I think I got confused because I was trying a bunch of different things when I mentioned it in the question; CV_EPNP did work when I tried it now with solvePnPRansac. The accuracy is not as good as what CV_ITERATIVE used to give me with other datasets, though. I have a Kalman filter running alongside, so I am trying to see if I can tweak it to make the estimates better. I grabbed a few more datasets today of the same scene with my reference at different baselines etc. CV_ITERATIVE still fails, RANSAC or not.
If I give it an initial estimate of the extrinsics, it just returns [0 0 0]. POSIT fails too. Right next to where I was, there's a big set of solar panels arranged like a ceiling (picture linked below). If I use that scene, ITERATIVE works again. http://imgur.com/a/FP6Y

2016-02-20 14:27:55 -0500 commented question solvePnP fails in a specific case

Thanks for the ideas, Eduardo. I am trying them out and will let you know if I have any luck. Just one more question: you mentioned that the singular-points configuration would be a failure case, which I am assuming is due to most of the points being coplanar? When I was first developing this pipeline, I ran some tests using a simulator, where I could place models of buildings and look at them through simulated cameras. In that case the cloud was almost flat, yet PnP worked perfectly even without initial estimates, although the number of points in those point clouds was upwards of 1000. Do you see anything strange in that case?

2016-02-19 21:21:42 -0500 asked a question: solvePnP fails in a specific case

I am using OpenCV's solvePnPRansac function to estimate the pose of my camera given a point cloud made from tracked features. My pipeline starts by forming a point cloud from matched features between two cameras, and uses that as a reference to estimate the pose of one of the cameras as it starts moving. I have tested this in multiple settings and it works as long as there are enough features to track while the camera is in motion. Strangely, during a test I did today, I encountered a failure case where solvePnP would just return junk values all the time. What's confusing is that in this data set my point cloud is much denser, it is reconstructed quite accurately from the two views, and the number of tracked points (currently visible features vs. features in the point cloud) at any given time was much higher than what I usually have, so it should in theory have been a breeze for solvePnP, yet it fails terribly.
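One quick sanity check for the coplanarity concern raised in the comment above is to measure how flat the reference cloud actually is. A hedged pure-Python sketch (function name is mine; it fits z = a*x + b*y + c by least squares via Cramer's rule, so it assumes the plane is not vertical):

```python
def plane_rms_residual(points):
    """RMS residual (along z) of 3D points from the best-fit plane z = a*x + b*y + c.
    A value near zero flags a near-planar cloud, a known hard case for PnP."""
    n = len(points)
    sx = sum(p[0] for p in points); sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points); syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points); syz = sum(p[1] * p[2] for p in points)
    # Normal equations A @ [a, b, c] = rhs of the least-squares problem
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det3(A)
    coeffs = []
    for col in range(3):  # Cramer's rule, one column substitution per unknown
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][col] = rhs[r]
        coeffs.append(det3(Ai) / d)
    a, b, c = coeffs
    res = [p[2] - (a * p[0] + b * p[1] + c) for p in points]
    return (sum(r * r for r in res) / n) ** 0.5
```

A perfectly planar cloud returns a residual near zero; a cloud with real depth variation returns something on the order of that variation.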
I tried CV_ITERATIVE, CV_EPNP and even the non-RANSAC version of solvePnP. Am I missing something basic here? The scene I am looking at can be seen in these images (image 1 is the scene and the feature matches between the two perspectives; image 2 is the reference point cloud).

2016-02-15 11:17:56 -0500 asked a question: Is it possible to correct for changing exposure/contrast etc. in images?

I am doing feature detection and matching between two cameras for my application. Feature matching works perfectly when the cameras are close to each other, but when the positions change such that the auto-exposure of one camera makes its images darker or lighter than the other's, matching fails. I believe this happens because the difference in intensities becomes too large. Is it possible to somehow 'normalize' the images with respect to each other so that this difference is eliminated, so I can check whether matching works then? Here is a pair of images where matching works (blue circles are matched features), and a failure case (notice the left image is darker than the right because the camera moved into a different area).

2015-12-22 15:34:27 -0500 commented question OpenCV Kalman Filter used for position/orientation tracking: wrong results

There's no transformation after the solvePnP: I use the T vector directly and R converted to Euler angles. But my case is slightly different from the classic solvePnP setup: neither the camera nor the object is really the origin; I am "localizing" this camera with respect to another main camera (which is the origin) in the scene.

2015-12-21 22:12:18 -0500 asked a question: OpenCV Kalman Filter used for position/orientation tracking: wrong results

Hi, I am following the Kalman Filter tutorial mentioned in the real-time pose estimation tutorial on OpenCV 2.4.11, but in my case it is the camera that moves with respect to a certain origin in the world.
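One standard answer to the exposure-normalization question above is histogram equalization applied to each grayscale image independently (OpenCV provides cv2.equalizeHist for this). A hedged pure-Python sketch of the same mapping, which sends each gray level to its scaled cumulative frequency:

```python
def equalize_hist(img, levels=256):
    """img: list of rows of ints in [0, levels). Returns the equalized image."""
    flat = [v for row in img for v in row]
    n = len(flat)
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    # Cumulative distribution function of the gray levels
    cdf, run = [], 0
    for h in hist:
        run += h
        cdf.append(run)
    cdf_min = next(c for c in cdf if c > 0)
    denom = max(n - cdf_min, 1)
    # Lookup table: stretch the CDF over the full output range
    lut = [round((c - cdf_min) / denom * (levels - 1)) for c in cdf]
    return [[lut[v] for v in row] for row in img]
```

An underexposed image whose values occupy only the bottom of the range gets stretched to span the full [0, 255] range, which makes two differently exposed views of the same scene look more alike before matching.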
When I solve the iterative PnP problem for my scene, I get R and T that look quite accurate. But when I pass them to the Kalman Filter (with the transition and measurement matrices used as in the tutorial), the predicted state comes out all zeros, the corrected R is very close to the input R, and the corrected T in almost all cases looks something like this:

    [4.7072399e+35; -0.33483902; -5.4207187e-33]

whereas the input T from PnP is

    [-196.0274563197073; 2.626786046055864; 0.9834547400622746]

Am I missing some change I need to make relative to the tutorial? Since it is still a pose-estimation problem, I figured I did not have to change much code-wise: although my process noise and measurement noise would be different, I just wanted to get an initial idea of how the KF behaves. Any suggestions or pointers would be very helpful.

2015-12-11 23:31:23 -0500 asked a question: OpenCV findEssentialMat() and recoverPose() sign convention?

If I have two images L and R taken from the left and right cameras, respectively, and I call the essential-matrix function as E = findEssentialMat(rightImagePoints, leftImagePoints, ...), and subsequently recoverPose(E, rightImagePoints, leftImagePoints, ...), the sign of the translation vector I get in the end is not [-1, 0, 0] as it should be (because I am calculating the pose of camera L from camera R) but [1, 0, 0]. Can anyone explain why? I naturally assumed the function would check cheirality between points1 and points2 in the order they are passed. Which camera is being considered the origin in my function call?

2015-12-01 02:23:13 -0500 asked a question: SolvePNP consistently gives completely wrong results

Hello, I am trying to solve a pose-estimation problem for a multiple-camera system where one camera acts as the origin, using OpenCV.
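The predict/correct cycle that the Kalman question above wires up with transition and measurement matrices can be sketched in one dimension (a hedged sketch, not the cv::KalmanFilter API: state is position and velocity, only position is measured, and the noise values q and r are illustrative assumptions):

```python
def kalman_1d(measurements, dt=1.0, q=1e-3, r=0.25):
    """Minimal constant-velocity Kalman filter; returns filtered positions."""
    x = [measurements[0], 0.0]            # state: [position, velocity]
    P = [[1.0, 0.0], [0.0, 1.0]]          # state covariance
    out = []
    for z in measurements:
        # Predict: x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]]
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1], P[1][1] + q]]
        # Correct with a position measurement z (H = [1, 0])
        s = P[0][0] + r                   # innovation covariance
        k = [P[0][0] / s, P[1][0] / s]    # Kalman gain
        y = z - x[0]                      # innovation
        x = [x[0] + k[0] * y, x[1] + k[1] * y]
        P = [[(1 - k[0]) * P[0][0], (1 - k[0]) * P[0][1]],
             [P[1][0] - k[1] * P[0][0], P[1][1] - k[1] * P[0][1]]]
        out.append(x[0])
    return out
```

Fed a clean ramp of positions, the filter learns the velocity within a few steps and its estimates lock onto the input; corrected values exploding to 1e+35, as in the question, typically point at inconsistent matrix dimensions or an uninitialized covariance rather than at the noise settings.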
So let's assume a case where camera 1 is at (0,0,0) and camera 2 is exactly 4 meters to its left; I build the 3D model of the scene using this metric information. Now camera 2 is displaced to the left by another 4 m, for a total of 8 m, but we treat the change as unknown to the code. I run solvePnP on the new set of image points and the previously reconstructed 3D scene in the hope of getting 8 m as the answer: because the triangulation is done with P1 = [I|0], everything OpenCV does should be with respect to camera 1. But I am getting extremely bad results (note that the units are in cm):

    CV_EPNP: translation vector is [223.1857790065309; -45.44370476199799; 357.0838060856186]
    CV_ITERATIVE: translation vector is [142.2985654354176; 1001.722097074433; -6323.072276118809]
    CV_ITERATIVE with RANSAC: translation vector is [0; 0; 0]

According to the essential-matrix-based estimation, the translation vector comes out as [-0.9999999, 0.000001, 0.000002], so the triangulation is being done properly. Also, if I compare the scale of the reconstructed scene between the two cases, the ratio of the scale factors comes out as 0.5, so it should be obvious to OpenCV that the cameras' baseline has doubled. I have checked and rechecked my code and I am at my wits' end; I was hoping someone could shed some light on what might be happening here, as I am clearly missing something. I am posting a snippet of my code for reference. Thanks a lot for your time!

CODE: Assume trainOld and trainNew are from camera 1 (static) at time instants 1 and 2, and queryOld and queryNew are from camera 2, which is moving. I take trainNew and check which points from the old set are still visible, then run PnP on the 3D points that are still visible and their corresponding projections in the camera frame. pointsObj[0] is the list of 3D points from time instant 1.
    for (unsigned int i = 0; i < (unsigned int)trainOld.size(); i++)
    {
        vector<Point2f>::iterator iter = find(trainNew.begin(), trainNew.end(), trainOld[i]);
        size_t index = distance(trainNew.begin(), iter);
        if (index != trainNew.size())   // trainOld[i] is still visible at time 2
        {
            P3D.push_back(pointsObj[0][i]);
            P2D.push_back(queryNew[index]);
        }
    }
    Mat rvec, tvec;
    solvePnP(P3D, P2D, K, d, rvec, tvec, false, CV_ITERATIVE);

2015-11-22 00:49:58 -0500 asked a question: How can I calculate positions of two moving cameras looking at the same scene if the initial baseline is known?

Say I have two cameras, under the assumption that at any given moment a finite number of features is visible from both. They start off with a certain baseline b between them, displaced along X (OpenCV convention). They are then free to move in X, Y, Z and rotate about all axes, as long as features remain visible from both. Given that I know b, my final goal is to accurately calculate the subsequent positions of both cameras in real time (although for now I don't need real-time performance). I have two approaches in mind:

Approach 1: Triangulate the first pair of images (image0_L, image0_R) and pick a pair of points in the 3D set. Calculate the distance between those points and store it as d1. After the cameras have moved a little, we obtain image1_L and image1_R. Triangulate image1_L with image0_R, track that pair of points again, and calculate the new distance d2. Dividing d1/d2 should tell us how much camera 1 has moved with respect to camera 2. Repeat with image1_R and image0_L to get camera 2's position w.r.t. camera 1. This is pretty computationally expensive, and also feels like a very naive implementation.

Approach 2: Construct the 3D point cloud from image0_L and image0_R. Match image1_L with image0_R to see how many features from image0_R are still visible.
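The distance-ratio idea in Approach 1 above can be sketched directly (a hedged sketch with hypothetical helper names: reconstructions from an essential matrix are only up to scale, so the ratio of distances between the SAME tracked pair of 3D points in two reconstructions gives the relative scale between them):

```python
def dist(p, q):
    """Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def relative_scale(pair_old, pair_new):
    """pair_old, pair_new: ((x,y,z), (x,y,z)) for the same two tracked points
    in the old and new reconstructions. Returns d1/d2 from the question."""
    return dist(*pair_old) / dist(*pair_new)
```

If the new reconstruction comes out shrunk by a factor of two, relative_scale returns 2.0, and multiplying every new 3D point (and the new up-to-scale translation) by it brings both reconstructions onto a common scale. In practice one would average the ratio over many tracked pairs, since a single pair is sensitive to triangulation noise.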
Take only the points in the original 3D point cloud that correspond to these "tracked, still visible" features, use those object points as the reference, and run a PnP algorithm on the image points from image1_R and image1_L to obtain the positions of both cameras. This is not as computationally intensive as Approach 1, but it gives me very erratic and unreliable results. I am not very knowledgeable about the theoretical side of computer vision, so I feel I am failing to spot a more elegant solution. Any suggestions or comments would be very helpful, thanks!

2015-11-19 22:10:33 -0500 commented question Pose estimation using PNP: Strange wrong results

Could be, but it happens with every two pairs of images, which is strange. Anyway, I am appending that particular piece of my code to the question.

2015-11-18 12:12:46 -0500 commented question Pose estimation using PNP: Strange wrong results

I've tried displaying the feature points, the epipolar lines on the images, and the triangulated points through PCL, and they all make sense. I've also noticed something curious: say I have X 3D points from a correspondence. If I run PnP on these X points and the corresponding image points, I get the correct answer. But if I take a subset of these X points (i.e., tracking only those points that are visible from the next pair of images) and run PnP on those and the corresponding 2D points, I get this junk value. Another observation: if I track two 3D points in both case 1 and case 2, compute the distance between them in both cases, and take the ratio of those distances, that ratio is in accordance with the change in baseline too.

2015-11-17 08:08:11 -0500 commented question Pose estimation using PNP: Strange wrong results

Sorry, the (1,1,1) and (-1,1,1) are just the positions in the simulator world, which I mentioned for clarity.
OpenCV does not use these values; the projection matrices are computed using the R and t obtained by decomposing the essential matrix.

2015-11-16 21:27:48 -0500 asked a question: Pose estimation using PNP: Strange wrong results

Hello, I am trying to use the PnP algorithm implementations in OpenCV (EPNP, ITERATIVE, etc.) to get metric pose estimates of the cameras in a two-camera pair (not a conventional stereo rig; the cameras are free to move independently of each other). My current source of images is a robot simulator (Gazebo), where two cameras are simulated in a scene of objects. The images are almost ideal: zero distortion, no artifacts. So to start off, this is my first pair of images. I take the right camera as the "origin". In metric world coordinates, the left camera is at (1,1,1) and the right at (-1,1,1) (a 2 m baseline along X). Using feature matching, I construct the essential matrix and thereby the R and t of the left camera w.r.t. the right. This is what I get:

    R in Euler angles: [-0.00462468, -0.0277675, 0.0017928]
    t vector: [-0.999999598978524; -0.0002907901840156801; -0.0008470441900959029]

This is right, because the displacement is only along the X axis in the camera frame. For the second pair, the left camera is now at (1,1,2) (moved up by 1 m). Now the R and t of left w.r.t. right become:

    R in Euler angles: [0.0311084, -0.00627169, 0.00125991]
    t vector: [-0.894611301085138; -0.4468450866008623; -0.0002975759140359637]

This again makes sense: there is no rotation, and the displacement along Y is half the baseline (along X), and so on, although this t does not give me real metric estimates. So, in order to get metric estimates of the pose in case 2, I constructed the 3D points using the points from camera 1 and camera 2 in case 1 (taking the known 2 m baseline into account), and then ran the PnP algorithm with those 3D points and the image points from case 2.
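The "known baseline" step described above reduces to simple scaling: recoverPose-style translations are unit vectors, so the metric translation is the unit vector scaled by the true baseline length, and the triangulated points must be scaled by the same factor to stay consistent. A hedged sketch (function name is mine):

```python
def to_metric(t_unit, baseline_m):
    """Scale an up-to-scale translation vector to metric units, given the
    true baseline length in meters. Returns the metric translation."""
    norm = sum(v * v for v in t_unit) ** 0.5
    s = baseline_m / norm
    return [v * s for v in t_unit]
```

Applied to the first t vector quoted above with the known 2 m baseline, this yields approximately [-2, 0, 0]: a 2 m displacement along negative X, matching the simulator ground truth.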
Strangely, both the ITERATIVE and EPNP algorithms give me a similar and completely wrong result that looks like this:

    Pose according to final PnP calculation:
    Rotation Euler angles: [-9.68578, 15.922, -2.9001]
    Metric translation in m: [-1.944911461358863; 0.11026997013253; 0.6083336931263812]

Am I missing something basic here? I thought this should be a relatively straightforward calculation for PnP, given that there is no distortion etc. Any comments or suggestions would be very helpful, thanks!

EDIT: Code for the PnP implementation. Say pair 1 consists of queryImg1 and trainImg1, and pair 2 consists of queryImg2 and trainImg2 (2D vectors of points). Triangulation with pair 1 produces a vector of 3D points, points3D. I iterate through trainImg1 and check whether the same point exists in trainImg2 (because that camera does not move). If the same feature is tracked in trainImg2, I find the corresponding match from queryImg2, forming the vectors P3D_tracked (subset of tracked 3D points) and P2D_tracked (subset of tracked 2D points).

    for (int i = 0; i < (int)trainImg1.size(); i++)
    {
        vector<Point2f>::iterator iter = find(trainImg2.begin(), trainImg2.end(), trainImg1[i]);
        size_t index = distance(trainImg2.begin(), iter ...

2015-08-29 19:06:44 -0500 received badge ● Critic (source)

2015-08-28 16:47:00 -0500 asked a question: Is it possible to compute the scale factor of subsequent stereo reconstructions if the initial scale is known?

I am trying to do relative pose estimation in a stereo setup using OpenCV, where I compute the pose using the essential matrix and then reconstruct the scene with cv::triangulatePoints. As this gives me the reconstruction only up to an arbitrary scale, I thought I could calculate the subsequent scale factors if the initial baseline (~ scale) is known (and the cameras are purely translated in X).
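The "R in Euler angles" values quoted throughout these posts come from decomposing a 3x3 rotation matrix. The exact axis convention is the poster's; the sketch below uses the common ZYX decomposition (angles about X, Y, Z) as an illustrative assumption:

```python
import math

def euler_zyx_deg(R):
    """R: 3x3 rotation matrix (list of rows) -> (x, y, z) angles in degrees."""
    sy = math.hypot(R[0][0], R[1][0])
    if sy > 1e-6:
        x = math.atan2(R[2][1], R[2][2])
        y = math.atan2(-R[2][0], sy)
        z = math.atan2(R[1][0], R[0][0])
    else:  # gimbal lock: pitch near +/-90 degrees, z is unrecoverable
        x = math.atan2(-R[1][2], R[1][1])
        y = math.atan2(-R[2][0], sy)
        z = 0.0
    return tuple(math.degrees(a) for a in (x, y, z))
```

A pure 30-degree rotation about Y decomposes to (0, 30, 0), which is the kind of near-zero-x, near-zero-z output the posts quote for translation-only motion.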
But even if the cameras do not move and the scene does not change, if I take multiple pairs of images, each reconstruction comes out at a different scale; I know this by taking two points and comparing the distance between them. The whole scene is either magnified or shrunk; the reconstruction itself is not wrong per se. To get around this, I tried something naive:

1. Say the distance between 3D points A and B in pair 2 = s * the distance between the same points in pair 1.
2. Rescale the whole scene obtained from pair 2 by s.
3. Use a perspective-n-point algorithm (solvePnP in OpenCV) with these rescaled 3D points and the 2D correspondences from pair 2.

But the answer I get is s*t1, where t1 is the translation in the first case. This confuses me greatly, because the scene has barely changed, and the 2D point correspondences between case 1 and case 2 have barely changed. So is it still impossible to determine the scale of a reconstruction even though I know the initial "scale" for sure (I thought this would be straightforward)? Why is the reconstruction giving me arbitrary results at every step, and how does PnP "magically" know that I have done this whole rescaling?

2015-08-24 00:58:58 -0500 asked a question: Strange issue with stereo triangulation: TWO valid solutions

I am currently using OpenCV for pose-estimation-related work, in which I triangulate points between pairs for reconstruction and scale-factor estimation. I have encountered a strange issue while working on this, especially with the OpenCV functions recoverPose() and triangulatePoints(). Say I have camera 1 and camera 2, spaced apart along X, with cam1 at (0,0,0) and cam2 to its right (positive X). I have two arrays, points1 and points2, that are the matched features between the two images.
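The rescaling puzzle above has a geometric explanation that a toy pinhole model can demonstrate (a hedged sketch with made-up numbers): scaling every 3D point by s leaves the image projections reproducible only by a camera whose translation is also scaled by s, so PnP returning s*t1 is forced by the geometry, not "magic".

```python
def project(p, t, f=500.0):
    """Pinhole projection of world point p by a camera at center t with
    identity rotation: u = f*(X - tx)/(Z - tz), v = f*(Y - ty)/(Z - tz)."""
    x, y, z = p[0] - t[0], p[1] - t[1], p[2] - t[2]
    return (f * x / z, f * y / z)
```

Scaling both the point and the camera center by the same factor s multiplies every camera-frame coordinate by s, and the division by depth cancels it, so the pixel is unchanged. Given fixed 2D observations and a scene scaled by s, PnP therefore has no choice but to report a translation scaled by s.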
According to the OpenCV documentation and code, I have noted two points: recoverPose() assumes that points1 belongs to the camera at (0,0,0), and triangulatePoints() is called twice: once from recoverPose() to tell us which of the four R/t combinations is valid, and then again from my code. For

    cv::triangulatePoints(P1, P2, points1, points2, points3D)

the documentation says points1 corresponds to P1 and points2 to P2. Hence, as in the case of recoverPose(), it should be safe to assume that P1 is [I|0] and P2 is [R|t]. What I actually found: it doesn't work that way. Although my camera 1 is at (0,0,0) and camera 2 is at (1,0,0) (1 being up to scale), the only correct configuration is obtained if I run

    recoverPose(E, points2, points1, ...)
    triangulatePoints([I|0], [R|t], points2, points1, pts3D)

which should be incorrect, because points2 is the set from the [R|t] camera, not points1. I tested an image pair of a scene in my room with three noticeable objects visible after triangulation: a monitor and two posters on the wall behind it. Here are the point clouds resulting from the triangulation (excuse the MS Paint): done OpenCV's prescribed way, the poster points are dispersed in space and the result looks weird; done my (wrong?) way, the layout looks as expected. Can anybody share their views on what's going on here? Technically both solutions are valid, because all points fall in front of both cameras, and I didn't know which to pick until I rendered the point cloud. Am I doing something wrong, or is this an error in the documentation? I am not that knowledgeable about the theory of computer vision, so I might be missing something fundamental here. Thanks for your time!

2015-08-23 21:55:03 -0500 asked a question: Question about the coordinate system in OpenCV's five-point.cpp

Hi, I am trying to use the functions from five-point.cpp: findEssentialMat() and recoverPose(). I have a very fundamental question about them.
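The "all points fall in front of both cameras" test mentioned above is the cheirality check that recoverPose() performs internally to pick among the four (R, t) decompositions of an essential matrix. A hedged pure-Python sketch of that check (function name is mine):

```python
def in_front_of_both(points3d, R, t):
    """Count triangulated points with positive depth in both camera 1,
    whose projection matrix is [I|0], and camera 2, which is [R|t]."""
    count = 0
    for p in points3d:
        z1 = p[2]  # depth in camera 1 is just the z coordinate
        # Depth in camera 2: third row of R @ p + t
        z2 = R[2][0] * p[0] + R[2][1] * p[1] + R[2][2] * p[2] + t[2]
        if z1 > 0 and z2 > 0:
            count += 1
    return count
```

The decomposition maximizing this count is the one recoverPose returns; when two configurations score similarly (as in the question), noise in the matches can tip the choice either way, which is one plausible reason the "prescribed" argument order and the swapped order yield different but individually self-consistent clouds.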
I have image1 from one location and image2 taken from some distance to its right, i.e., displaced only in positive X. After feature detection and matching, I have two sets of points: points1 and points2. Because I need the pose of the camera providing points2 with respect to the camera providing points1, I initially thought my arguments should be:

    Mat E = findEssentialMat(pts2, pts1, focal1, focal2, pp1, pp2, CV_RANSAC, 0.99, 1, mask);
    recoverPose(E, pts2, pts1, rvec, tvec, focal1, focal2, pp1, pp2, mask);

This gave me the expected answer, [1, 0, 0]. But then I noticed that inside recoverPose(), while checking the triangulation and making sure all points are in front of the camera, the code assumes that the camera providing the first points argument has projection matrix [I | 0], so what I had done should be wrong. So I swapped the arguments from pts2, pts1 to pts1, pts2, and the result became [-1, 0, 0]. If camera 1 is considered the origin, why does recoverPose's triangulation think that having the second camera to the left is the correct configuration? And why does incorrectly letting camera 2 be the origin give me the result I expected? The documentation clearly states that whenever triangulatePoints(P1, P2, points1, points2, points3D) is called, P1 goes with points1 and P2 with points2.

2015-08-23 18:41:56 -0500 asked a question: cv::triangulatePoints drifting for points from the same pose

Hi, I am encountering a strange issue with the triangulation function of OpenCV. For my two-view pose-estimation tests, I was doing this:

Case 1: a single camera taking pictures of a scene from two views, then computing the essential matrix and triangulating the scene. This worked reasonably well.

Case 2: I am trying to do this with two cameras (both calibrated), but the triangulation is failing. For image pairs from the same poses, the triangulated points should be more or less the same, but they are drifting.
Example:

    t1 = [0.9950924870426178, 0.0518147862900856, -0.08429810290303305] (cam 2 at (0, 0, 0))
    Homogenized feature match: (-0.1558, -0.1504, 1); (-0.0574, -0.1619, 1)
    Triangulated point: [-4.8177, -4.6760, 30.92588]

    t2 = [0.9962326376676536, 0.05676863562110455, -0.08508433760191333] (approx. the same as before)
    Homogenized feature match: (-0.1557, -0.1504, 1); (-0.0574, -0.1619, 1)
    Triangulated point: [-2.3599, -2.3053, 15.1603]

What am I doing wrong here? For the next set of points, triangulatePoints comes up with yet another weird estimate.

2015-08-10 21:36:17 -0500 edited question: findEssentialMat() pose estimation: wrong translation vector in some cases

I am currently using epipolar-geometry-based pose estimation to estimate the pose of one camera w.r.t. another, with a non-zero baseline between the cameras. I am using the five-point algorithm (implemented as findEssentialMat in OpenCV) to determine the up-to-scale translation and rotation between the two cameras. I have found two interesting problems when working with this, and it would be great if someone could share their views (I don't have a strong theoretical background in computer vision):

1. If the rotation of the camera is about the Z axis, i.e., parallel to the scene, and the translation is non-zero, the translation between camera 1 and camera 2 (which is along X in the real world) is wrongly estimated to be along Z. Example case: cam1 and cam2 spaced by approx. 0.5 m on the X axis, cam2 rotated clockwise by 45 deg:

    Translation vector: [-0.02513, 0.0686, 0.9973] (wrong, should be along X)
    Rotation Euler angles: [-7.71364, 6.0731, -43.7583] (correct)

2. The geometry between image1 and image2 is not the exact inverse of the geometry between image2 and image1. Again, while an image1<=>image2 correspondence produces the correct translation, image2<=>image1 is way off (the rotation values are close, though).
Example below, where camera 2 was displaced along X and rotated by ~30 degrees about Y:

    Image 1 to image 2: Rotation [-1.578, 24.94, -0.1631] (close); Translation [-0.0404, 0.035, 0.998] (wrong)
    Image 2 to image 1: Rotation [2.82943, -30.3206, -3.32636]; Translation [0.99366, -0.0513, -0.0999] (correct)

It looks like it has no trouble figuring out the rotations, but the translations are hit or miss. As for question 1, I was initially concerned that because the rotation is about the Z axis, the points might appear to be all coplanar. But the five-point algorithm paper specifically states: "The 5-point method is essentially unaffected by the planar degeneracy and still works". Thank you for your time!
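The triangulation questions in this feed can be grounded with a minimal two-ray triangulator (a hedged sketch using the midpoint method rather than the DLT that cv::triangulatePoints implements; cameras are given by their centers and ray directions, not necessarily unit length, in world coordinates):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(c1, d1, c2, d2):
    """Triangulate a 3D point as the midpoint of the shortest segment
    between the rays c1 + s*d1 and c2 + t*d2."""
    w0 = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b        # approaches 0 as the rays become parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]   # closest point on ray 1
    p2 = [ci + t * di for ci, di in zip(c2, d2)]   # closest point on ray 2
    return [(u + v) / 2 for u, v in zip(p1, p2)]
```

For rays that intersect exactly, the midpoint is the intersection itself; for noisy rays it returns the point halfway along their common perpendicular. Note how the result scales linearly with the assumed baseline |c2 - c1|, which is exactly the up-to-scale behavior the drifting-triangulation post observed.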