Ask Your Question

Simple augmented reality using opencv

asked 2016-10-23 08:17:25 -0500

cornBuddy

Hi everyone! I'm a total newbie at OpenCV and computer vision in general, and I'm facing some problems with it. I'm trying to build a simple program that detects a planar object and draws a cube (or at least 3d axes) on it. I used this guide to build it, and here is my code:

import sys
import numpy as np
import cv2

GAUSIAN = (5, 5)
# assumed values; these constants were missing from the post:
CANNY_LOW, CANNY_HIGH = 50, 150
IS_CLOSED = True

DELTA = 0.01

BLUE = (255, 0, 0)
GREEN = (0, 255, 0)
RED = (0, 0, 255)
BLACK = (0, 0, 0)

def show(image):
    cv2.namedWindow('image', cv2.WINDOW_NORMAL)
    cv2.imshow('image', image)

def filt(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edged = cv2.Canny(gray, CANNY_LOW, CANNY_HIGH)
    blured = cv2.GaussianBlur(edged, GAUSIAN, 5)
    return blured

def search_for_table_corners(raw_image):
    filtered = filt(raw_image)
    _, cnts, _ = cv2.findContours(filtered,
            cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    for cnt in cnts:
        cnt_len = cv2.arcLength(cnt, IS_CLOSED)
        approx = cv2.approxPolyDP(cnt, DELTA * cnt_len, IS_CLOSED)
        if len(approx) == 4:
            cv2.drawContours(raw_image, [approx], -1, BLACK, 4)
            return np.float32(approx)
    return None

def draw(img, corners, imgpts):
    corner = tuple(corners[0].ravel())
    img = cv2.line(img, corner, tuple(imgpts[0].ravel()), BLUE, 5)
    img = cv2.line(img, corner, tuple(imgpts[1].ravel()), GREEN, 5)
    img = cv2.line(img, corner, tuple(imgpts[2].ravel()), RED, 5)
    return img

def generate_camera_matrix(image):
    h, w = image.shape[:2]
    # let it be full frame matrix
    sx, sy = (36, 24)
    # focus length
    f = 50
    fx = w * f / sx
    fy = h * f / sy
    cx = w / 2
    cy = h / 2
    mtx = np.zeros((3, 3), np.float32)
    mtx[0, 0] = fx # [ fx  0  cx ]
    mtx[0, 2] = cx # [  0 fy  cy ]
    mtx[1, 1] = fy # [  0  0   1 ]
    mtx[1, 2] = cy
    mtx[2, 2] = 1
    return mtx

def generate_distorsions():
    return np.zeros((1, 4), np.float32)

def get_object_points(corners):
    x1, y1 = corners[0][0]
    x2, y2 = corners[1][0]
    x3, y3 = corners[2][0]
    x4, y4 = corners[3][0]
    return np.float32([
        # hardbone
        [x2, y2, 0],
        [x1, y1, 0],
        [x3, y3, 0],
        [x4, y4, 0],
    ])

def generate_axis(a):
    axis = np.float32([[a,0,0], [0,a,0], [0,0,-a]]).reshape(-1,3)
    return axis

def get_corners_subpixels(raw_image, corners):
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER,
            30, 0.001)
    gray = cv2.cvtColor(raw_image, cv2.COLOR_BGR2GRAY)
    corners_subpxs = cv2.cornerSubPix(gray, corners,
            (11, 11), (-1, -1), criteria)
    return corners_subpxs

def estimate_pose(raw_image, table_corners):
    print('table_corners:\n', table_corners, '\n', '-' * 70)
    object_points = get_object_points(table_corners)
    print('object_points:\n', object_points, '\n', '-' * 70)
    corners_subpxs = get_corners_subpixels(raw_image, table_corners)
    print('corners_subpxs:\n', corners_subpxs, '\n', '-' * 70)
    camera_matrix = generate_camera_matrix(raw_image)
    print('camera_matrix:\n', camera_matrix, '\n', '-' * 70)
    distorsions = generate_distorsions()
    _, rotation_vec, translation_vec, inliers = cv2.solvePnPRansac(
            object_points, corners_subpxs, camera_matrix, distorsions,
            iterationsCount=500)
    print('rotation_vec:\n', rotation_vec, '\n', '-' * 70)
    print('translation_vec:\n', translation_vec, '\n', '-' * 70)
    return rotation_vec, translation_vec, camera_matrix, distorsions, \
            corners_subpxs

def create_canvas(image):
    return image.copy()

def get_projection_points(raw_image, table_corners):
    rvecs, tvecs, mcam, dist, corn2 = estimate_pose(raw_image, table_corners)
    size = round(raw_image.shape ...

1 answer


answered 2016-10-23 09:52:34 -0500

Tetragramm

You seem to have a fundamental misunderstanding of how solvePnP works. HERE is a tutorial that should explain things better. In short, both your imagePoints and your worldPoints come from the same place; they're actually the same numbers. That can't work.

Your image points are correct: the locations, in pixels, inside the image.

Your world points should be the actual corners of the table. So for a pretend table that is 1x2 meters, you have [0,0,0], [1,0,0], [0,2,0], [1,2,0]. And then you need to make the order of your image points and world points match. In other words, the point in image space that represents the corner you called [0,0,0] is always paired with the pixel coordinates of that corner.

Lastly, and least important, you don't have enough points to use RANSAC. Just use solvePnP with the SOLVEPNP_P3P flag.



Thanks for the answer! So how can I find the actual corners of the table if I have only the 2d image?

cornBuddy ( 2016-10-23 11:37:00 -0500 )

Well, you have to get out a tape-measure. Or if you know the width/height ratio of the table, you can just use unitless numbers. For example, if you know it's a square table, you can just use 1x1.

This is why most augmented reality stuff right now uses markers of some kind, like this: ARUCO

Tetragramm ( 2016-10-23 17:11:43 -0500 )

Well, I've got your point; you explained it pretty clearly. Is there an entirely different way to do the same task in OpenCV?

cornBuddy ( 2016-10-24 03:33:21 -0500 )

ARUCO or another marker system is the best/simplest way, pretty easy. If you have a stereo camera, you can measure distances and model the world that way, but that needs a lot of processing power. Or you can use some other way of knowing where the camera is, combined with motion, to model the world; that last option needs a very precise locating system and a lot of processing power.

Tetragramm ( 2016-10-24 17:18:45 -0500 )

I don't have a stereo camera. The whole task is to find a planar rectangle (a table, for example) from a web camera and draw a 3d cube on it, meaning that when I move the camera around the planar object, the cube's projection changes too. I tried to find the table using cascades, but I didn't train the network correctly, so I decided to find the planar rectangle using the algorithm above. Is there another way to do this task without any additional devices such as a stereo camera?

cornBuddy ( 2016-10-25 02:59:57 -0500 )

Not unless you know the aspect ratio (and preferably the exact size) of the rectangle. If you don't, you don't have enough information to find the orientation. Knowing things about the rectangle is basically the same as a marker system, just not as easy, and you can't tell which side you're on.

There are lots of different size and shape rectangles that can match what you see at different orientations. So you can't tell which is correct without some outside knowledge, whether it's the size, another camera, or a marker.

Tetragramm ( 2016-10-25 10:58:54 -0500 )
