2019-03-20 04:50:59 -0500  received badge  ● Popular Question (source) 
2018-05-08 08:33:14 -0500  marked best answer  Can OpenCV read images with a different 'byte ordering'? I am working on a highly performance-critical image processing pipeline on a Jetson TX2 (with an ARM processor), which involves reading a set of images and then performing deep-learning-based object detection through Darknet. Darknet, written in C, has its own representation of how images are stored, which is different from how OpenCV's IplImage or a Python numpy array would store them. Because of the requirements of my project, I am currently interfacing with this C code by first reading the images through Python (cv2.imread()) and then passing them to Darknet, but I can also call cvLoadImage() from C. By default, OpenCV stores the data as one contiguous block arranged column-wise, then row-wise, then channel-wise. The Darknet format, on the other hand, is arranged by channel first, then by column, then by row. The following picture illustrates the difference: This means I need to reorder the pixels from the OpenCV format to the Darknet format, which is additional overhead I would like to avoid. Is there a way to make OpenCV read the image differently, either through cv2.imread() or the IplImage format in cvLoadImage(), so as to directly get the format Darknet is expecting? 
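For reference, the reorder being described is an HWC-to-CHW transpose; a minimal numpy sketch of it (the function name and the [0, 1] float normalisation step are illustrative assumptions, not from the post, and any BGR-to-RGB swap Darknet may also need is ignored here):

```python
import numpy as np

# OpenCV returns images as an H x W x C array (interleaved, channel fastest).
# Darknet expects a planar C x H x W layout (one full plane per channel).
def opencv_to_darknet(img):
    """Reorder an HWC uint8 image into a planar CHW float layout."""
    # transpose yields a CHW view; ascontiguousarray materialises the copy
    chw = np.ascontiguousarray(img.transpose(2, 0, 1), dtype=np.float32)
    return chw / 255.0  # assumed normalisation to [0, 1]

h, w = 2, 3
img = np.arange(h * w * 3, dtype=np.uint8).reshape(h, w, 3)
out = opencv_to_darknet(img)
assert out.shape == (3, h, w)
# channel-0 plane holds the first element of every original pixel triple
assert np.allclose(out[0] * 255.0, img[:, :, 0])
```

The copy is unavoidable with this approach: cv2.imread() itself offers no flag to change the in-memory layout at load time, so the cost can only be moved, not removed, on the Python side.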
2018-05-05 00:33:34 -0500  asked a question  Can OpenCV read images with a different 'byte ordering'? Can OpenCV read images with a different 'byte ordering'? I am working on a highly performance-critical image processing 
2018-03-14 18:55:12 -0500  received badge  ● Notable Question (source) 
2017-12-06 20:23:32 -0500  asked a question  How to compute the covariance of an inter-camera relative pose measurement? How to compute the covariance of an inter-camera relative pose measurement? If I'm doing pose estimation using a single 
2017-09-29 17:09:31 -0500  received badge  ● Popular Question (source) 
2017-08-14 08:57:34 -0500  received badge  ● Famous Question (source) 
2017-01-19 22:35:47 -0500  received badge  ● Notable Question (source) 
2016-08-23 18:50:32 -0500  asked a question  Epipolar geometry: pose estimate detects translation in wrong axis I am trying to use epipolar geometry concepts through the findEssentialMat, recoverPose, solvePnP functions in OpenCV to estimate the relative motion between two images. I am currently using a simulator to get my camera images, so the distortion is zero and I know the camera parameters beforehand. I am using the OpenCV coordinate convention in this question. When two images are taken from positions displaced only along the X axis, essential-matrix-based estimation works perfectly: I get a translation vector like [0.9999, 0.0001, 0.0001]. But when the camera is moved along the forward/backward Z axis, the t vector contains a higher-than-usual value on the Y axis, whereas X and Z are correct. Example results from solvePnP for these two cases are:
I don't understand where the 0.4m on the Y axis is coming from. The simulator is very accurate in terms of physics and there's absolutely no movement of that magnitude on the Y axis. The same behavior is reflected in the relative R/t output from the essential matrix decomposition. Any tips on solving this/suggestions for further debugging this issue would be helpful. EDIT: I was reading through David Nister's paper about the 5-point algorithm, which is the core of findEssentialMat, and it says this: "The 5-point method significantly outperforms the other non-iterative methods for sideways motion. The 5-point results are quite good, while the results of the other non-iterative methods are virtually useless. As the noise grows large, the other methods are swamped and begin placing the epipole somewhere inside the image regardless of its true position. This phenomenon has been observed previously by e.g. [15]. It is particularly marked for the 6- and 8-point methods. For the forward motion cases, the results are quite different however. This is partly due to a slight deterioration of the results for the 5-point method, but mainly due to a vast improvement of the results for the other methods. In particular the 8-point method gives excellent results on forward motion". I wonder if the optimal solution is some kind of a 'best of both worlds' scenario? Has anyone worked on similar problems before, who can share some wisdom? Thanks. 
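What the quoted passage calls "placing the epipole" is exactly the translation direction being estimated: for an essential matrix E = [t]_x R, t is the left null vector of E, so noise in E perturbs the recovered direction. A small numpy sketch of that relationship (synthetic pose, not the poster's data):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Ground-truth relative pose: pure forward (Z) translation, no rotation.
t_true = np.array([0.0, 0.0, 1.0])
R_true = np.eye(3)
E = skew(t_true) @ R_true            # essential matrix E = [t]_x R

# t satisfies t^T E = 0: it is the left-singular vector for the zero
# singular value, i.e. the last column of U.
U, S, Vt = np.linalg.svd(E)
t_est = U[:, 2]

# Recovered only up to sign and scale -- compare directions, not values.
assert np.allclose(abs(float(t_est @ t_true)), 1.0, atol=1e-9)
```

With noise added to E, the recovered direction tilts, which is consistent with a spurious Y component appearing under near-forward motion.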
2016-07-22 07:24:28 -0500  received badge  ● Popular Question (source) 
2016-05-26 04:34:39 -0500  marked best answer  How do I cast Point2f to Point2d? I am trying to copy a vector of Point2f's into a vector of Point2d's. I am unable to directly cast the Point2f elements to Point2d, as in: The usual std way does not work either. 
2016-02-20 19:07:58 -0500  commented question  solvePnP fails in a specific case An update:

2016-02-20 14:27:55 -0500  commented question  solvePnP fails in a specific case Thanks for the ideas, Eduardo. I am trying them out and will let you know if I have any luck. Just one more question: you mentioned that the singular points configuration would be a failure case, which I am assuming is due to most of them being coplanar? When I was first developing this pipeline, I was running some tests using a simulator, where I could place models of buildings and look at them through simulated cameras. In that case, the cloud was almost flat, yet PnP worked perfectly even without initial estimates, although the number of points I had in those point clouds was upwards of 1000. Do you see anything strange in that case? 
2016-02-19 21:21:42 -0500  asked a question  solvePnP fails in a specific case I am using OpenCV's solvePnPRansac function to estimate the pose of my camera given a point cloud made from tracked features. My pipeline starts with forming a point cloud from matched features between two cameras, and uses that as a reference to estimate the pose of one of the cameras as it starts moving. I have tested this in multiple settings and it works as long as there are enough features to track while the camera is in motion. Strangely, during a test I did today, I encountered a failure case where solvePnP would just return junk values all the time. What's confusing here is that in this data set my point cloud is much denser and reconstructed pretty accurately from the two views, and the tracked number of points (currently visible features vs. features in the point cloud) at any given time was much higher than what I usually have. Theoretically it should have been a breeze for solvePnP, yet it fails terribly. I tried with CV_ITERATIVE, CV_EPNP and even the non-RANSAC version of solvePnP. I was just wondering if I am missing something basic here? The scene I am looking at can be seen in these images (image 1 is the scene and feature matches between two perspectives, image 2 is the point cloud for reference) 
2016-02-15 11:17:56 -0500  asked a question  Is it possible to correct for changing exposure/contrast etc in images? I am trying to do feature detection and matching between two cameras for my application. Feature matching works perfectly when the cameras are close to each other, but when the positions change such that the auto-exposure of one of the cameras makes its images darker/lighter compared to the other, matching fails. I suspect this is happening because the difference in intensities is too large. Is it possible to somehow 'normalize' the images with respect to each other so that this difference is eliminated, and I can then check whether matching works? Here's a pair of images where matching works: (blue circles are matched features) And a failure case: (notice the left image is darker than the right because the camera moved into a different area) 
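One crude normalisation worth trying before full histogram matching or CLAHE is a global mean/std transfer between the two frames; an illustrative numpy sketch (match_intensity is a hypothetical helper, not an OpenCV API):

```python
import numpy as np

def match_intensity(src, ref):
    """Linearly map src's mean/std onto ref's (a crude global exposure fix)."""
    src = src.astype(np.float32)
    s_mu, s_sd = float(src.mean()), float(src.std())
    r_mu, r_sd = float(ref.mean()), float(ref.std())
    # gain/offset transfer; guard against a flat source image
    out = (src - s_mu) * (r_sd / max(s_sd, 1e-6)) + r_mu
    return np.clip(out, 0, 255).astype(np.uint8)

ref = np.full((4, 4), 128, dtype=np.uint8)    # toy 'correctly exposed' frame
dark = np.full((4, 4), 40, dtype=np.uint8)    # toy under-exposed frame
fixed = match_intensity(dark, ref)
assert abs(int(fixed.mean()) - 128) <= 1
```

This only equalises global gain and offset; for local exposure differences, per-tile methods (e.g. adaptive histogram equalisation) are the usual next step.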
2015-12-22 15:34:27 -0500  commented question  OpenCV Kalman Filter used for position/orientation tracking: wrong results There's no transformation after the solvePnP: I use the T matrix directly and R converted to Euler angles. But my case is slightly different from the classic solvePnP: Neither the camera nor the object is really the origin, I am "localizing" this camera with respect to another main camera (that's the origin) in the scene. 
2015-12-21 22:12:18 -0500  asked a question  OpenCV Kalman Filter used for position/orientation tracking: wrong results Hi, I am following the Kalman Filter tutorial mentioned in the Real time pose estimation tutorial on OpenCV 2.4.11, but in my case, it is the camera that moves with respect to a certain origin in the world. When I solve the iterative PnP problem for my scene, I get R and T matrices that look quite good in terms of accuracy. But when I pass them on to the Kalman Filter (transition and measurement matrices were used as mentioned in the tutorial), the prediction matrix comes out to be all zeros, the corrected R matrix is very close to the input R matrix, and the corrected T matrix in almost all cases looks nothing like the input T from PnP. Am I missing some sort of change I need to make from the tutorial? Because it is still a pose estimation problem, I figured I did not have to change much code-wise: although my process noise and measurement noise would be different, I just wanted to test for some initial idea of how the KF is working. Any suggestions/pointers would be very helpful. 
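For comparison, the predict/correct cycle that cv::KalmanFilter runs reduces to a few lines of linear algebra. A self-contained 1-D constant-velocity sketch in numpy (all matrices here are illustrative toy values, not the tutorial's full 18-state setup); an all-zeros prediction usually means the transition matrix or the initial state/covariance was never set:

```python
import numpy as np

# Minimal 1-D constant-velocity Kalman filter (state = [position, velocity]).
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])              # we measure position only
Q = 1e-4 * np.eye(2)                    # process noise covariance
Rm = np.array([[1e-2]])                 # measurement noise covariance

x = np.zeros((2, 1))                    # initial state estimate
P = np.eye(2)                           # initial state covariance

for z in [1.0, 2.0, 3.0, 4.0]:          # target moving 1 unit per step
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # correct
    y = np.array([[z]]) - H @ x         # innovation
    S = H @ P @ H.T + Rm
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

assert abs(x[0, 0] - 4.0) < 0.2         # position tracks the measurements
assert x[1, 0] > 0.5                    # velocity converges toward +1
```

If F were left as all zeros (a common initialisation slip), the predicted state would be identically zero every step, matching the symptom described above.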
2015-12-11 23:31:23 -0500  asked a question  OpenCV findEssentialMat() and recoverPose() sign convention? If I have two images L and R taken from left and right cameras, respectively, and I am calling the essential matrix function as follows: E = findEssentialMat(rightImagePoints, leftImagePoints, ...), and then subsequently recoverPose(E, rightImagePoints, leftImagePoints). The sign of the translation vector I am getting in the end is not [-1, 0, 0] as it should be (because I am calculating the pose of camera L from camera R) but [1, 0, 0]. Can anyone explain why? I naturally assumed it would be performing the cheirality check between points1 and points2 if the functions are passed points in the order points1, points2. Which camera is being considered as the origin in my function call? 
2015-12-01 02:23:13 -0500  asked a question  SolvePNP consistently gives completely wrong results Hello, I am trying to solve a pose estimation problem for a multiple camera system where one camera acts as the origin, using OpenCV. So let's assume a case where camera 1 is at (0,0,0) and camera 2 is exactly to the left of it by 4 meters; I build the 3D model of the scene using this metric information. Now camera 2 has moved to the left by another 4 m, so the total is now 8 m, but we consider the change unknown to the code. Now I am running solvePnP on the new set of image points and the previously reconstructed 3D scene in the hopes of getting 8 m as the answer: because the triangulation is done with P1 = [I|0], everything that OpenCV does should be with respect to camera 1, but I am getting extremely bad results. (please note that units are in cm) According to essential-matrix-based estimation, the translation vector turns up as [0.9999999, 0.000001, 0.000002] etc., so the triangulation is being done properly. Also, if I compare the scale of the reconstructed scene between the two cameras in both cases, the ratio of the scale factors is computed as 0.5, so it should be obvious to OpenCV that the cameras' baseline has doubled. I have checked and rechecked my code and I am at my wits' end; I was hoping someone could throw some light on what might be happening here, as I am clearly missing something. I am posting a snippet of my code here for reference. Thanks a lot for your time! CODE: Assume trainOld and trainNew are from camera 1 (static) from time instants 1 and 2, and queryOld and queryNew are from camera 2, which is moving. I take trainNew into account and check which points are still visible from the old set, and run PnP on the 3D points that are still visible and their corresponding projections in the camera frame. pointsObj[0] is the list of 3D points from time instant 1. 
2015-11-22 00:49:58 -0500  asked a question  How can I calculate positions of two moving cameras looking at the same scene if the initial baseline is known? Let's say I have two cameras adhering to the assumption that at any given moment, there is a finite number of features visible from both. Both of them start off at a certain baseline b between them, displaced in X (OpenCV convention). They are then free to move in X, Y, Z and rotate about all axes, but there are always features visible from both. Given that I know b, my final goal is to accurately calculate the subsequent positions of both cameras in real time (although for now, I don't need real-time performance). I have two approaches in mind for this: Approach 1: Triangulate the first pair of images (image0_L, image0_R), and pick a pair of points in the 3D set. Calculate the distance between those points and store it as d1. After the cameras have moved a little, we obtain image1_L, image1_R. Triangulate image1_L and image0_R, track that pair of points again, and calculate the new distance d2. Dividing d1/d2 should tell us how much camera1 has moved with respect to camera2. Repeat with image1_R and image0_L for getting camera2's position w.r.t. camera1. This is pretty computationally expensive, and also feels like a very naive implementation. Approach 2: Construct the 3D point cloud from image0_L and image0_R. Match image1_L and image0_R to see how many features from image0_R are still visible. Take only those points from the original 3D point cloud that correlate to these "tracked, still visible" features, use those object points as reference, and run the PnP algorithm on the image points from image1_R and image1_L to obtain positions of both cameras. This is not as computationally intensive as approach 1, but it's giving me very erratic and unreliable results. I am not really knowledgeable about the theoretical aspects of CV, so I feel I am failing to spot a more elegant solution for this. 
Any suggestions or comments would be very helpful, thanks! 
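Approach 1's core arithmetic can be sketched as follows (illustrative numpy with hypothetical point values): two triangulations of the same rigid scene differ only by a global scale, so the ratio of distances between the same tracked point pair fixes the scale of the second reconstruction once the first is metrically anchored by the known baseline b.

```python
import numpy as np

# Reconstruction from pair 0, metrically anchored by the known baseline.
pts_metric = np.array([[0.0, 0.0, 5.0],
                       [1.0, 0.0, 5.0]])      # same two tracked points, metres
# Reconstruction from pair 1 of the same rigid points, at an arbitrary scale.
pts_new = 0.25 * pts_metric

d_metric = np.linalg.norm(pts_metric[0] - pts_metric[1])
d_new = np.linalg.norm(pts_new[0] - pts_new[1])
s = d_metric / d_new        # scale that maps the new cloud back to metres

assert np.isclose(s, 4.0)
assert np.allclose(s * pts_new, pts_metric)
```

In practice one would average this ratio over many tracked point pairs rather than trust a single pair, since triangulation noise enters the ratio directly.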
2015-11-19 22:10:33 -0500  commented question  Pose estimation using PNP: Strange wrong results Could be, but it happens with every two pairs of images, which is strange. Anyway, I am appending that particular piece of my code to the question. 
2015-11-18 12:12:46 -0500  commented question  Pose estimation using PNP: Strange wrong results I've tried displaying the feature points, epipolar lines on the images and triangulated points through PCL and they all make sense. I've also noticed something curious here: let's say I have X number of 3D points from a correspondence; if I run PnP on these X points and the corresponding image points, I get the correct answer. But if I take a subset of these X points (i.e., tracking only those points that are visible from the next pair of images) and then run PnP on those and the corresponding 2D points, I get this junk value. Another point is that if I, say, track two 3D points in both case 1 and case 2, compute the distance between them in both cases and take the ratio of those distances, that ratio is in accordance with the change in baseline too. 
2015-11-17 08:08:11 -0500  commented question  Pose estimation using PNP: Strange wrong results Sorry, the (1,1,1) and (-1,1,1) are just the positions in the simulator world which I mentioned for clarity. OpenCV does not use these values, and the projection matrices are computed using the R and t obtained through decomposing the essential matrix. 
2015-11-16 21:27:48 -0500  asked a question  Pose estimation using PNP: Strange wrong results Hello, I am trying to use the PnP algorithm implementations in OpenCV (EPNP, Iterative etc.) to get the metric pose estimates of cameras in a two-camera pair (not a conventional stereo rig; the cameras are free to move independently of each other). My source of images currently is a robot simulator (Gazebo), where two cameras are simulated in a scene of objects. The images are almost ideal: i.e., zero distortion, no artifacts. So to start off, this is my first pair of images. I assume the right camera as "origin". In metric world coordinates, the left camera is at (-1,1,1) and the right is at (1,1,1) (2 m baseline along X). Using feature matching, I construct the essential matrix and thereby the R and t of the left camera w.r.t. the right. This is what I get. Which is right, because the displacement is only along the X axis in the camera frame. For the second pair, the left camera is now at (-1,1,2) (moved upwards by 1 m). Now the R and t of left w.r.t. right become: Which again makes sense: there is no rotation; the displacement along the Y axis is half of what the baseline (along X) is, and so on, although this t doesn't give me the real metric estimates. So in order to get metric estimates of pose in case 2, I constructed the 3D points using points from camera 1 and camera 2 in case 1 (taking the known baseline into account: which is 2 m), and then ran the PnP algorithm with those 3D points and the image points from case 2. Strangely, both ITERATIVE and EPNP algorithms give me a similar and completely wrong result that looks like this: Am I missing something basic here? I thought this should be a relatively straightforward calculation for PnP given that there's no distortion etc. Any comments or suggestions would be very helpful, thanks! EDIT: Code for PNP implementation Let's say pair 1 consists of queryImg1 and trainImg1; and pair 2 consists of queryImg2 and trainImg2 (2d vectors of points). 
Triangulation with pair 1 results in a vector of 3D points points3D.

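A quick way to tell whether a PnP result is genuinely junk or just expressed in an unexpected frame is to reproject the 3D points with the returned pose and measure the pixel error; an illustrative numpy sketch (the intrinsics, poses and points here are hypothetical, not from the question):

```python
import numpy as np

def reproject_rmse(points3d, points2d, R, t, K):
    """RMSE between observed pixels and 3-D points reprojected with (R, t)."""
    cam = (R @ points3d.T + t.reshape(3, 1)).T           # into camera frame
    proj = (K @ (cam / cam[:, 2:3]).T).T[:, :2]          # pinhole projection
    return float(np.sqrt(np.mean(np.sum((proj - points2d) ** 2, axis=1))))

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
pts3d = np.array([[0.0, 0.0, 5.0], [1.0, -1.0, 6.0], [-1.0, 1.0, 7.0]])
pts2d = (K @ (pts3d / pts3d[:, 2:3]).T).T[:, :2]         # synthetic observations

# A consistent pose reprojects with ~zero error; a wrong one does not.
assert reproject_rmse(pts3d, pts2d, R, t, K) < 1e-9
assert reproject_rmse(pts3d, pts2d, R, t + np.array([0.5, 0.0, 0.0]), K) > 10
```

If the reprojection error of the "wrong" PnP answer is actually small, the pose is valid but in a different frame convention (e.g. world-to-camera vs. camera-to-world) than expected.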
2015-08-29 19:06:44 -0500  received badge  ● Critic (source) 
2015-08-28 16:47:00 -0500  asked a question  Is it possible to compute scale factor of subsequent stereo reconstructions if initial scale is known? I am trying to do relative pose estimation in a stereo setup using OpenCV, where I compute pose using the essential matrix and then try to reconstruct the scene using cv::triangulatePoints. As this gives me reconstruction up to an arbitrary scale, I thought I could calculate the subsequent scale factors if the initial baseline (~ scale) is known (and the cameras are purely translated in X). But even if the cameras do not move and the scene does not change, if I take multiple pairs of images, each reconstruction comes up with a different scale: I know this by taking two points and comparing the distances between them. The whole scene is either magnified or shrunk; the reconstruction itself is not wrong, per se. To get around this, I tried something naive: I got the ratio of scale change between pair1 and pair2, and tried this: But the answer I am getting is s*t1, where t1 is the translation in the first case. This is confusing me greatly. Because the scene has not changed by much at all, the 2D point correspondences between 1 and 2 have not changed by much at all. So is it still impossible to determine the scale of a reconstruction even though I know the initial "scale" for sure (I thought this would be straightforward)? Why is the reconstruction giving me arbitrary results at every step, and how does PnP magically know that I have done this whole rescaling? 
2015-08-24 00:58:58 -0500  asked a question  Strange issue with stereo triangulation: TWO valid solutions I am currently using OpenCV for a pose estimation related work, in which I am triangulating points between pairs for reconstruction and scale factor estimation. I have encountered a strange issue while working on this, especially in the OpenCV functions recoverPose() and triangulatePoints(). Say I have camera 1 and camera 2, spaced apart in X, with cam1 at (0,0,0) and cam2 to the right of it (positive X). I have two arrays points1 and points2 that are the matched features between the two images. According to the OpenCV documentation and code, I have noted two points:
Hence, as in the case of recoverPose(), it is safe to assume that P1 is [I|0] and P2 is [R|t]. What I actually found: It doesn't work that way. Although my camera1 is at (0,0,0) and camera2 is at (1,0,0) (1 being up to scale), the only correct configuration is obtained if I run which should be incorrect, because points2 is the set from [R|t], not points1. I tested an image pair of a scene in my room where there are three noticeable objects after triangulation: a monitor and two posters on the wall behind it. Here are the point clouds resulting from the triangulation (excuse the MS Paint) If I do it OpenCV's prescribed way: (poster points dispersed in space, weird looking result) If I do it my (wrong?) way: Can anybody share their views about what's going on here? Technically, both solutions are valid because all points fall in front of both cameras, and I didn't know which to pick until I rendered it as a point cloud. Am I doing something wrong, or is it an error in the documentation? I am not that knowledgeable about the theory of computer vision, so it might be possible I am missing something fundamental here. Thanks for your time! 
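The cheirality test recoverPose() relies on to pick among the decompositions can be illustrated in a few lines (synthetic one-point example in normalised coordinates; the real code counts such votes over all matches):

```python
# A correspondence seen at the image centre of camera 1 (x1 = (0, 0)) and at
# x2 = (-0.2, 0) in camera 2, with camera 1 = [I|0] and camera 2 = [I|t].
# Since x1 = (0, 0), the point lies on camera 1's optical axis (X = Y = 0),
# so camera 2's projection gives  x2_x = t_x / Z,  hence  Z = t_x / x2_x.
x2_x = -0.2
for t_x, expect_in_front in ((-1.0, True), (1.0, False)):
    Z = t_x / x2_x                      # triangulated depth under hypothesis t
    assert (Z > 0) == expect_in_front   # only one sign of t is physical
```

Flipping the sign of t mirrors the triangulated point through the origin, which is why both renderings above can look "valid" until the depths are actually checked against both cameras.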
2015-08-23 21:55:03 -0500  asked a question  Question about coordinate system in opencv's fivepoint.cpp Hi, I am trying to use the functions from fivepoint.cpp: the findEssentialMat() and recoverPose() functions. I have a very fundamental question about them. I have image1 from a location and image2 taken from some distance to its right, i.e., displaced only in positive X. After feature detection and matching, I have two sets of points: points1, points2. I initially thought that because I need the pose of the camera providing points2 with respect to the camera providing points1, my arguments should be: This gave me the expected answer [1, 0, 0]. But then I noticed that inside recoverPose(), while checking for triangulation and making sure all points are in front of the camera, the code assumes that the projection matrix of the camera providing the first argument of points is [I|0], so what I had done should be wrong: so I swapped the arguments from pts2, pts1 to pts1, pts2: now the result became [-1, 0, 0]. If camera1 was being considered as the origin, why is recoverPose's triangulation concluding that having the second camera to the left is the correct configuration? And why does incorrectly letting camera2 be the origin give me the result I expected? The documentation clearly states that whenever triangulatePoints(P1, P2, points1, points2, points3D) is called, P1 is for points1 and P2 is for points2. 
2015-08-23 18:41:56 -0500  asked a question  cv::TriangulatePoints drifting for points from same pose Hi, I am encountering a strange issue with the triangulation function of OpenCV. For my two-view pose estimation tests, I was doing this: Case 1: A single camera taking pictures of a scene from two views, then computing the essential matrix and triangulating the scene: which worked reasonably well. Case 2: I am trying to do this with two cameras (both calibrated), but triangulation is failing. For image pairs from the same poses, the triangulated points should be more or less the same, but now they are drifting. Example: What am I doing wrong here? For the next set of points, triangulatePoints comes up with another weird estimate. 
2015-08-10 21:36:17 -0500  edited question  findEssentialMat() pose estimation: wrong translation vector in some cases I am currently using epipolar geometry based pose estimation for estimating the pose of one camera w.r.t. another, with a non-zero baseline between the cameras. I am using the five-point algorithm (implemented as findEssentialMat in OpenCV) to determine the up-to-scale translation and rotation matrix between the two cameras. I have found two interesting problems when working with this; it would be great if someone could share their views, as I don't have a great theoretical background in computer vision:
Image pair Output:
Image 1 to image 2 Output: Rotation [1.578, 24.94, 0.1631] (Close) Translation [0.0404, 0.035, 0.998] (Wrong) Image 2 to image 1 Output: Rotation [2.82943, 30.3206, 3.32636] Translation [0.99366, 0.0513, 0.0999] (Correct) Looks like it has no issues figuring the rotations out, but the translations are a hit or miss. As to question 1, I was initially concerned because, the rotation being along the Z axis, the points might appear to be all coplanar. But the five-point algorithm paper specifically states: "The 5-point method is essentially unaffected by the planar degeneracy and still works". Thank you for your time! 