Getting to grips with video stabilisation in Python

asked 2017-11-23 11:59:56 -0600

dbrb2

updated 2017-11-24 08:41:30 -0600

I've been trying to get to grips with video stabilisation using OpenCV and Python. The code below seems to run, and successfully keeps track of points from frame to frame. However, my attempt to apply the offset between frames to prevent jitter fails completely - not with an error, just without any obviously useful effect.

I suspect I am doing something very obviously wrong here, but am getting square eyes, and would appreciate any guidance!

import numpy as np
import cv2
import sys

vid=sys.argv[1]
border_crop=10
show_points=True
inter_frame_delay=20
smoothing_window=100

rolling_trajectory_list=[]


cap = cv2.VideoCapture(vid)

# Parameters for Shi-Tomasi corner detection
feature_params = dict( maxCorners = 50,
                       qualityLevel = 0.3,
                       minDistance = 7,
                       blockSize = 7 )

# Parameters for Lucas-Kanade optical flow
lk_params = dict( winSize  = (15,15),
                  maxLevel = 4,
                  criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Take first frame and find corners in it
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
transformation_matrix_avg = cv2.estimateRigidTransform(old_frame, old_frame, False)


rows,cols = old_gray.shape
print "Video resolution: "+str(cols)+"*"+str(rows)
raw_input("Press Enter to continue...")

points_to_track = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params)

print "Trackable points detected in first frame:"
print points_to_track

frame_mask = np.zeros_like(old_frame)

while(1):
    ret,frame = cap.read()
    if not ret:
        break

    #Read a frame and convert it to greyscale
    new_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    #calculate optical flow between the latest frame (new_gray) and the last one we examined
    print "Searching for optical flow between this frame and the last..."
    new_points, matched, err = cv2.calcOpticalFlowPyrLK(old_gray, new_gray, points_to_track, None, **lk_params)

    # Select good tracked points - matched==1 if the point has been found in the new frame
    new = new_points[matched==1]
    old = points_to_track[matched==1]

    print "Old point coordinates:"
    print old

    print "New point coordinates:"
    print new

    # This should return a transformation matrix mapping the points in "new" to "old"
    transformation_matrix = cv2.estimateRigidTransform(new, old, False)
    print "Transform from new frame to old frame..."
    print transformation_matrix
    # Not sure about this... trying to create a smoothed average of the frame movement over the last X frames
    rolling_trajectory_list.append(transformation_matrix)
    if len(rolling_trajectory_list) > smoothing_window:
        rolling_trajectory_list.pop(0)

    transformation_matrix_avg=sum(rolling_trajectory_list)/len(rolling_trajectory_list)

    print "Average transformation over last "+str(smoothing_window)+" frames:"
    print transformation_matrix_avg
    #Apply the transformation to the frame
    stabilized_frame = cv2.warpAffine(frame,transformation_matrix_avg,(cols,rows),flags=cv2.INTER_NEAREST|cv2.WARP_INVERSE_MAP)
    cropped_stabilized_frame = stabilized_frame[border_crop:rows-border_crop, border_crop:cols-border_crop]

    if show_points:
        for point in new:
            corner_x=int(point[0])
            corner_y=int(point[1])
            frame = cv2.circle(frame,(corner_x,corner_y),2,(0,255,0),-1)

        for point in old:
            corner_x=int(point[0])
            corner_y=int(point[1])
            frame = cv2.circle(frame,(corner_x,corner_y),2,(255,255,255),-1)

    cv2.imshow('original frame',frame)
    cv2.imshow('stabilised frame',stabilized_frame)
    cv2.imshow('cropped stabilised frame',cropped_stabilized_frame)
    cv2.waitKey(inter_frame_delay)

    old_gray = new_gray.copy()
    points_to_track = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params)

raw_input("Press Enter to continue...")
cv2.destroyAllWindows()
cap.release()

Comments

I'm using Python 3.5 and OpenCV 3.3.1 on a Raspberry Pi 3. Change vid=sys.argv[1] to vid=sys.argv[0].

supra56 ( 2017-11-25 06:22:57 -0600 )

Video resolution: 480*640
Press Enter to continue...
Trackable points detected in first frame:
[[[ 576.  232.]]

 [[ 396.  244.]]

 [[ 419.  252.]]]
Searching for optical flow between this frame and the last...
Old point coordinates:
[[ 576.  232.]
 [ 396.  244.]
 [ 419.  252.]]
New point coordinates:
[[ 576.35992432  231.73887634]
 [ 396.36810303  243.77870178]
 [ 419.3380127   251.82849121]]
Transform from new frame to old frame...
[[  9.99931348e-01  -3.35097346e-04  -2.42246632e-01]
 [  3.35097346e-04   9.99931348e-01   7.91289805e-02]]
Average transformation over last 100 frames:
[[  9.99931348e-01  -3.35097346e-04  -2.42246632e-01]
 [  3.35097346e-04   9.99931348e-01   7.91289805e-02]]
Searchi
supra56 ( 2017-11-26 06:32:26 -0600 )

Is that what the output looks like? I got it working as above.

supra56 ( 2017-11-26 06:36:46 -0600 )

Yep. That's the output... but the stabilisation isn't currently having any useful effect. My latest attempt lives here, taking on board the suggestions below: http://bbarker.co.uk/stabilize_video_...

dbrb2 ( 2017-11-26 09:52:32 -0600 )

@dbrb2: I'm going to try the link you posted.

supra56 ( 2017-11-26 12:10:57 -0600 )

Video resolution: 640*480
Press Enter to continue...
Trackable points detected in first frame:
[[[ 631.  207.]]

 [[ 432.  223.]]

 [[ 439.  220.]]

 [[ 213.   81.]]]
Searching for optical flow between this frame and the last...
Old point coordinates:
[[ 631.  207.]
 [ 432.  223.]
 [ 439.  220.]
 [ 213.   81.]]
New point coordinates:
[[ 632.31616211  206.50134277]
 [ 433.11505127  222.3952179 ]
 [ 440.26553345  219.28543091]
 [ 213.00798035   81.24056244]]
Transform from new frame to old frame...
[[ 0.99808415 -0.00279836  0.40730982]
 [ 0.00279836  0.99808415 -0.45865991]]
Average transformation over last 100 frames:
[[ 0.99808415 -0.00279836  0.40730982]
 [ 0.00279836  0.99808415 -0.45865991]]
supra56 ( 2017-11-26 12:59:44 -0600 )

This is the second link you posted.

supra56 ( 2017-11-26 13:00:43 -0600 )

@dbrb2: Both source codes are working. But if I move my eyes or move my head, I get an error:

Traceback (most recent call last):
  File "/home/pi/opencv3.3.1_projects/temp/test7.py", line 70, in <module>
    transformation_matrix_avg=sum(rolling_trajectory_list)/len(rolling_trajectory_list)
TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'
supra56 ( 2017-11-26 13:07:10 -0600 )
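That traceback happens because cv2.estimateRigidTransform returns None when it cannot estimate a transform, so None ends up in rolling_trajectory_list and the sum() fails. A hedged guard for that case (my own suggestion, not code from the thread) would be something like:

    # cv2.estimateRigidTransform returns None when it cannot find a transform
    # (e.g. too few points were tracked between the frames), so guard before
    # appending it to rolling_trajectory_list:
    if transformation_matrix is None:
        transformation_matrix = transformation_matrix_avg   # fall back to the last average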

1 answer


answered 2017-11-25 20:37:13 -0600

Tetragramm

So I believe I see your problem.

You are applying the average motion to each new frame. Simplifying to one dimension, if your motion were

0, -5, 0, -5, 0, -5, 0, -5

Your average motion would be -2.5. You then correct every frame by +2.5, and everything is still jittery. What you need is to correct everything by the difference from -2.5, or rather, by the difference from your average transformation matrix.

Hopefully that's enough to get you started.
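As a rough illustration of that idea, here is a minimal one-dimensional sketch (the motion values and names are made up, not taken from the question):

    import numpy as np

    # Hypothetical per-frame x-shifts measured between consecutive frames
    motions = np.array([0., -5., 0., -5., 0., -5., 0., -5.])

    avg = motions.mean()             # roughly -2.5 for this sequence
    corrections = avg - motions      # shift each frame by its deviation from the average

    # After correction every frame moves by the smooth average amount,
    # so the +/-5 jitter is gone and only the steady drift remains.
    smoothed = motions + corrections
    print(smoothed)                  # every element equals the average, -2.5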


Comments

Thanks,

I think you may be right - in fact I had been trying that (see below). I suspect my current problem is that I am calculating the delta from the average by subtracting the transformation matrices, and I suspect that is not valid... I likely have to break the matrices down into their component translations, then find the delta, then recreate the new transform...

    transformation_matrix_avg=sum(rolling_trajectory_list)/len(rolling_trajectory_list)
    print "Average transformation over last "+str(smoothing_window)+" frames:"
    print transformation_matrix_avg

    correction_delta=transformation_matrix_avg-transformation_matrix
    print "Delta from average from current frame"
    print correction_delta
dbrb2 ( 2017-11-26 03:35:24 -0600 )
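For what it's worth, one common way to do that decomposition (a hedged sketch; decompose/compose are illustrative helper names, not part of the code above) is to split each 2x3 matrix into a translation and a rotation angle, average those components, and rebuild the matrix:

    import numpy as np

    def decompose(m):
        """Split a 2x3 partial-affine matrix into (dx, dy, da)."""
        dx = m[0, 2]
        dy = m[1, 2]
        da = np.arctan2(m[1, 0], m[0, 0])   # rotation angle in radians
        return dx, dy, da

    def compose(dx, dy, da):
        """Rebuild a 2x3 rigid transform from (dx, dy, da), ignoring scale."""
        return np.array([[np.cos(da), -np.sin(da), dx],
                         [np.sin(da),  np.cos(da), dy]], dtype=np.float32)

    # Average the components rather than the raw matrices, e.g.:
    # dx_avg, dy_avg, da_avg = np.mean([decompose(m) for m in rolling_trajectory_list], axis=0)
    # transformation_matrix_avg = compose(dx_avg, dy_avg, da_avg)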

That's correct. That is not a valid way to combine matrices.

Try limiting it to just translation to start, to make sure your process and understanding are correct. Then add in the more complicated components, so all you have to check is the math.

Tetragramm ( 2017-11-26 21:30:29 -0600 )

OK - I tried the below, just correcting for X and Y translation. Very odd behaviour... Any ideas? I've updated the link above.

    x_shift=transformation_matrix[0,2]
    y_shift=transformation_matrix[1,2]

    rolling_x_shift.append(y_shift)
    rolling_y_shift.append(x_shift)

    if len(rolling_x_shift) > smoothing_window:
        rolling_x_shift.pop(0)
        rolling_y_shift.pop(0)

    x_shift_avg=sum(rolling_x_shift)/len(rolling_x_shift)
    y_shift_avg=sum(rolling_y_shift)/len(rolling_y_shift)
    correction_delta=transformation_matrix_avg-transformation_matrix
    stabilized_frame = cv2.warpAffine(frame,correction_transformation,(cols,rows),flags=cv2.INTER_NEAREST|cv2.WARP_INVERSE_MAP)
dbrb2 ( 2017-11-27 04:04:46 -0600 )
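In case it helps, the snippet above never actually builds correction_transformation. A guess at what was intended for a translation-only correction (illustrative only, not the code at the link) would be something like:

    import numpy as np

    # Correct each frame by its deviation from the average shift, expressed as
    # a 2x3 affine matrix with identity rotation/scale and only a translation.
    dx_correction = x_shift_avg - x_shift
    dy_correction = y_shift_avg - y_shift
    correction_transformation = np.array([[1., 0., dx_correction],
                                          [0., 1., dy_correction]],
                                         dtype=np.float32)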

The code at the link looks reasonable. What kind of odd behavior?

Tetragramm ( 2017-11-27 17:38:49 -0600 )

Well, despite a definite bias on my part towards wanting to see some improvement in the amount of video shake, objectively, when testing against various video samples, the output looks as poor as the input. Taking as an example this video clip from an unrelated git project: https://github.com/francocurotto/Vide... my three video windows show the original, stabilised, and cropped frames... all of which look pretty much the same :-)

dbrb2 ( 2017-11-27 17:43:59 -0600 )

You're still missing a step. Your final shift is relative to the previous image. But you've shifted that image.

So for example (0, 0, 0, 0, 5, 0, 0, 0). You shift the 5 down to match the 0s. But then the next shift is -5, which you shift up, and suddenly you get the exact same jitter, just one frame later. Think through this carefully, there's a lot of gotchas here.

Also, try more than 50 points. That could very well be part of the problem.

Tetragramm ( 2017-11-27 19:02:55 -0600 )

Ah yes, OK. So taking your example above, the average transform should be very close to zero - the point is not moving except for the jitter. So when we get to the first offset point, we shift down by 5. The next frame shows the point back where it should be - which is an offset of 5 from the frame before. So in fact here we need the cumulative transform: +5, -5, which gives us zero again.

So the more generic transform will be (cumulative average shift)-(cumulative actual shift)

The corrected code at the link above now works as expected!

dbrb2 ( 2017-11-28 08:18:52 -0600 )
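A hedged one-dimensional sketch of that cumulative idea (the function and window size are illustrative, not taken from the linked code):

    import numpy as np

    def trajectory_correction(frame_shifts, window=4):
        """Correct each frame so its cumulative path follows a moving average."""
        trajectory = np.cumsum(frame_shifts)                       # cumulative actual shift
        kernel = np.ones(window) / window
        smoothed = np.convolve(trajectory, kernel, mode='same')    # cumulative average shift
        return smoothed - trajectory                               # per-frame correction

    # Example: a single +5/-5 jitter around an otherwise static scene
    print(trajectory_correction([0, 0, 0, 0, 5, -5, 0, 0]))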

Good! Now, of course, for longer videos or videos with differing motion, the simple average may not work, so you would need to decay older measurements and so forth. There are lots of ways of doing that, especially finite impulse response (FIR) filters. I can't help you much past here, but googling estab and FIR should point you at some useful papers.

Tetragramm ( 2017-11-28 17:42:23 -0600 )
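For example, a fixed-window weighted average where older measurements are down-weighted is one simple FIR-style option (a sketch of the idea, not something from the thread; the decay factor is arbitrary):

    import numpy as np

    def weighted_window_average(values, decay=0.9):
        """FIR-style weighted average over a fixed window: the newest
        measurement gets weight 1, each older one is scaled down by `decay`."""
        weights = decay ** np.arange(len(values))[::-1]   # oldest gets the smallest weight
        return np.dot(weights, values) / weights.sum()

    # Usage with the rolling lists from the snippets above, e.g.:
    # x_shift_avg = weighted_window_average(rolling_x_shift)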


Stats

Asked: 2017-11-23 11:59:56 -0600

Seen: 1,331 times

Last updated: Nov 25 '17