Inconsistent Disparity Map Coloration Between Frames.

asked 2019-07-20 20:04:31 -0600

JackInaBox

updated 2019-07-22 14:14:40 -0600

Hi,

I'm running into a bit of trouble with depth map computation. I'm taking a live feed from a pair of stereo cameras and computing disparity maps in real time, but somehow the computed value for each pixel (or block of pixels, for that matter) seems to shift constantly between frames, which results in very inconsistent depth estimation across frames.

For my process, I first performed stereo calibration on the cameras (with an error of around 0.57), then, using the calibration results, I managed to rectify the stereo images successfully. The rectified images are then fed through a StereoBM object (plus a matching right-view matcher) for disparity map generation, and the result is smoothed out with a weighted least squares (WLS) filter.
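Roughly, the disparity stage looks like the sketch below (Python, OpenCV with the ximgproc contrib module; rect_l and rect_r stand in for the rectified left/right frames, and the numeric parameters are placeholders rather than my exact settings):

    import cv2

    # Left/right block matchers; numDisparities and blockSize are placeholder values
    left_matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

    # Weighted least squares (WLS) disparity filter, tied to the left matcher
    wls = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
    wls.setLambda(8000.0)
    wls.setSigmaColor(1.5)

    # rect_l, rect_r: rectified grayscale frames from the calibration/rectification step
    disp_left = left_matcher.compute(rect_l, rect_r)
    disp_right = right_matcher.compute(rect_r, rect_l)
    filtered = wls.filter(disp_left, rect_l, disparity_map_right=disp_right)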

I have attached a gif demonstrating the issue: https://i.imgur.com/yi87y2G.mp4

I am still very new to this field and would appreciate any pointers. Also, if I have used any incorrect terms or failed to provide sufficient explanation, please feel free to correct me.

Comments

Disparity maps are not necessarily the best thing next to depth maps.

sjhalayka ( 2019-07-20 21:15:51 -0600 )
@sjhalayka interesting. Can you elaborate?

JackInaBox ( 2019-07-22 10:54:45 -0600 )

1 answer

answered 2019-07-22 13:59:37 -0600

updated 2019-07-22 14:09:21 -0600

You are seeing calculated depths that, frame to frame, come out as very different disparities from your stereo cameras. Some of the fluctuation even has the appearance of a pattern of depth changes moving across the scene.

Your depth calculation problems are a direct result of several things:

  1. Pulsed lighting
  2. Short exposure
  3. Uncoordinated shutters
  4. Large low contrast areas in scene

Pulsed lighting and short exposure combine to create a problem where each camera picks up a different overall scene illumination intensity. Also, frame to frame, the images from any individual camera vary wildly in intensity.

To correct the pulsed lighting and short exposure problem, change the lighting to a constant source, e.g. incandescent, so that the lighting intensity level is constant relative to a camera frame period. Alternatively, increase the exposure time to longer than the lighting pulse period - typically, an exposure of 30 ms or so is long enough to smooth these out - with a corresponding decrease in aperture, ISO, or gain setting so that the images aren't blown out.
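If your cameras expose these controls, something along the following lines may work for lengthening the exposure (a sketch only; whether these properties are honored, and the value conventions they expect, depend entirely on the camera and capture backend):

    import cv2

    cap = cv2.VideoCapture(0)  # camera index is an assumption

    # Many UVC cameras accept these properties, but value conventions vary by driver/backend
    cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 1)  # request manual exposure (value is backend-dependent)
    cap.set(cv2.CAP_PROP_EXPOSURE, -5)      # longer exposure; scale and units are driver-specific
    cap.set(cv2.CAP_PROP_GAIN, 0)           # reduce gain so the brighter frames aren't blown out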

Uncoordinated shutters cause multiple problems. If the shutters are offset in time but have the same frame period, then the two cameras (with the pulsed lighting and short exposure time) see a much different scene intensity. Especially in low contrast areas (more on this in a moment), the disparity algorithm interprets this intensity offset as a difference in depth from the camera, which accounts for a lot of the massive frame to frame fluctuation in calculated depth in your video. As an aside, another problem with uncoordinated shutters is that a moving object captured at different times by the left and right cameras will produce a different depth calculation, depending on the direction and speed of movement left or right.

The second major uncoordinated shutter problem arises if the two cameras are running on internal frame timers; one may be taking exposures at a rate that is a few percent different from the other. This creates a periodic pattern on top of the depth fluctuations already intrinsic to the short exposure and uncoordinated shutter problems. I think that is what you see with the pattern of disparity errors repeating about every second or so - it's a beat frequency effect between the two out-of-sync frame timers and the lighting intensity changes.
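As a rough back-of-the-envelope check (the frame rates below are assumptions, not measurements), two free-running timers that differ by about 1 fps produce a beat with period 1 / |f_left - f_right|, i.e. a pattern that repeats roughly once per second:

    # Illustrative numbers only: two free-running cameras at slightly different frame rates
    f_left, f_right = 30.0, 29.0               # frames per second (assumed)
    beat_period = 1.0 / abs(f_left - f_right)  # = 1.0 s
    print(beat_period)  # the disparity error pattern would repeat about once per second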

To coordinate frame start times, the exposures of the cameras should be slaved to a common externally generated trigger signal. With these changes, any pulsed lighting intensity change will become much less significant frame-to-frame.

Large low contrast areas in the scene means big areas of the scene (larger than the disparity comparison block size) with no high-contrast spots or edges. Without high-contrast spots or edges that are unique, crisply located, and easy for the algorithm to identify as the same feature in both cameras, the disparity algorithm has to fall back on a local intensity comparison between the left and right frames to compute disparity. With short exposures, pulsed lighting, and uncoordinated shutters, this is a recipe for ...
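You can also make the block matcher itself more conservative about weak matches in those low contrast regions. A sketch of the relevant StereoBM settings (the values are starting points to tune, not recommendations):

    import cv2

    bm = cv2.StereoBM_create(numDisparities=64, blockSize=15)  # placeholder parameters

    # Reject matches in areas with too little texture, and require the best match
    # to clearly beat the runner-up candidate
    bm.setTextureThreshold(10)
    bm.setUniquenessRatio(15)

    # Post-filter small speckles of inconsistent disparity
    bm.setSpeckleWindowSize(100)
    bm.setSpeckleRange(32)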

Comments

@opalmirror thank you for the thoughtful reply. It's definitely a good knowledge gain for me, and it will take me a while to fully grasp the different pointers here. I'm working with some hardware constraints; namely, I have no control over what the hardware does (I'm sourcing video input much like I would get a feed from a webcam), so I'm not quite certain how I would control exposure or shutters. I'm curious whether there is a solution that's purely software based. Regardless, your reply is definitely appreciated.

JackInaBox ( 2019-07-25 20:37:59 -0600 )

@JackInaBox this is a good discussion. If your real subject is just the scene as pictured, you will have errors.

- Perhaps you can reduce those errors by averaging the depth algorithm input over many frames? Try a decaying averaging algorithm on the input images (see the sketch below).
- If you are interested in objects that move in and out of the scene, like people, then, presuming they are lit with enough contrast in their surface features, they will probably yield more accurate depth estimates than the low-feature flat surfaces do.
- You may be able to average frames of visiting objects when you can deduce that little motion is happening.
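A minimal sketch of a decaying (exponentially weighted) average on the input images, assuming Python/OpenCV; ALPHA is an arbitrary smoothing factor to experiment with:

    import cv2
    import numpy as np

    ALPHA = 0.1  # assumed smoothing factor: smaller = smoother but more lag
    acc = None   # running exponentially decaying average of the input frames

    def smooth(frame):
        # Blend the new frame into the running average and return an 8-bit image
        global acc
        f = frame.astype(np.float32)
        if acc is None:
            acc = f.copy()
        else:
            cv2.accumulateWeighted(f, acc, ALPHA)
        return cv2.convertScaleAbs(acc)

Feed the smoothed left and right frames into the block matcher in place of the raw frames; moving objects will blur, but the static background depth should fluctuate less.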

opalmirror ( 2019-07-29 15:04:37 -0600 )

@opalmirror After sitting on this question for a while, I have decided to move forward and accept your answer. Thank you for the good advice. I am thinking of a way to do depth estimation based purely on outlines, since their uniqueness is more pronounced and I think they should be invariant to lighting and environment parameters that are difficult to control. Once I've estimated depth along the edges, I might be able to interpolate depth values across the enclosed area? And maybe add some noise averaging on top, over frames? Would that be a doable thing?

JackInaBox ( 2019-07-31 14:13:07 -0600 )

@JackInaBox thank you for the correct answer vote!

Your ideas sound worth trying.

I haven't used outlines to solve for distance. I would suppose it may have an error that varies with the shape of the object's cross section. If the object is relatively flat and face-on to the camera (a straight-line cross section), this would be accurate. If it has a circular cross section, you might be able to make some assumptions about the error. If the object is faceted (like a block), the assumptions get complicated, depending on rotation and whether the cameras see the same feature. With all these visual correspondence approaches, reflective surfaces (glossy glancing surfaces or reflected images) introduce their own complications.

opalmirror ( 2019-07-31 16:16:50 -0600 )
