
2x slower performance in C++ vs. Python for detectAndCompute with ORB features

asked 2017-10-20 16:39:10 -0600

shlady

Hi, I'm trying to write code for finding regions of an image with a moving object. Generally, my algorithm is:

  1. Find keypoints in two successive frames
  2. Match them
  3. Use RANSAC method to find a homography
  4. Find outliers

I first wrote this in Python as a prototype. However, the final board requires it to be in C++, so I started porting it over so we could test it on our processor. As a simple benchmark, I measured the average frame time across 400 frames of a large-ish resolution video. When I compared the two, C++ (only detectAndCompute) was coming in, at best, around 0.07 seconds per frame, or 14 FPS. Python, also using only detectAndCompute, was coming in at about 0.041 seconds per frame, or 24 FPS.

I read that the Python bindings just generate a wrapper from the headers, so I figure that if I knew how the wrapper calls the function, I could speed up my C++ code. Are there any suggestions on how to find that out, or other things that I am doing incorrectly that could be causing this?

Using OpenCV 3.3 on Ubuntu 16.04. Compiled with g++, and Python is 2.7.

Python:

import cv2
import numpy as np
import math
from matplotlib import pyplot as plt
from time import time
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('path', help='path to input video')
parser.add_argument('-s', '--save', default=0, help='save a video?')
args = parser.parse_args()

vid_path = args.path

orb = cv2.ORB_create()

img2 = cv2.imread(vid_path + '/00000001.jpg')
h, w, c = img2.shape

s = time()
for i in range(2,401):
    img1 = img2.copy()
    img2 = cv2.imread(vid_path + '/' + format(i, '08d') + '.jpg')

    kp1, des1 = orb.detectAndCompute(img1,None)
    kp2, des2 = orb.detectAndCompute(img2,None)

t2 = time()
print('Frame Time: \t' + str((t2-s) / 399))
cv2.destroyAllWindows()

C++:

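#include <opencv2/videoio.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/features2d.hpp>

#include <cstdlib>   // atof, atoi
#include <iomanip>   // std::setw, std::setfill
#include <iostream>
#include <sstream>   // std::ostringstream
#include <string>
#include <vector>
#include <time.h>    // clock, CLOCKS_PER_SEC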

using namespace cv;

int main(int argc, char const *argv[]) {

  if(argc != 5){
    std::cout << "Usage: ./orb [video_path] [distance_threshold] [max_iterations] [n_bins]" << std::endl;
    return -1;
  }

  // ----- parse command line arguments -----
  std::string vid_path    = argv[1];
  double DIST_THRESH      = atof(argv[2]);
  int MAX_ITER            = atoi(argv[3]);
  int N_BINS              = atoi(argv[4]);

  Ptr<ORB> orb = ORB::create();
  Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming");

  Mat img2 = imread(vid_path + "/00000001.jpg");
  Mat img1;

  int h = img2.rows;
  int w = img2.cols;

  clock_t start = clock();
  for (int i=2; i<401; i++) {
    img1 = img2.clone();
    std::ostringstream formatted_number;
    formatted_number << std::setw(8) << std::setfill('0') << i;
    img2 = imread(vid_path + "/" + formatted_number.str() + ".jpg");

    std::vector<KeyPoint> kp1, kp2;
    Mat des1, des2;
    std::vector<DMatch> matches;

    orb->detectAndCompute(img1, noArray(), kp1, des1);
    orb->detectAndCompute(img2, noArray(), kp2, des2);
    // matcher->match(des1, des2, matches);

  }
  clock_t end = clock();
  double f_time = ((double)(end - start)) / CLOCKS_PER_SEC / 399;
  std::cout << "Frame Time: \t" << f_time << " seconds" << std::endl;

}

Comments

time.time() measures "wall time", clock() measures "cpu" time.

try cv2.getTickCount() / cv2.getTickFrequency() for a fair comparison.

lastly, you should not include the imread() calls in your timing

berak (2017-10-20 17:27:41 -0600)
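To make the wall-time versus CPU-time distinction concrete, here is a small sketch in Python 3 (the question uses Python 2.7, where `time.perf_counter()` and `time.process_time()` are not available). The helper name `measure` is hypothetical. Sleeping is used as the workload because it consumes wall time but almost no CPU time, so the two clocks visibly disagree.

```python
import time

def measure(fn):
    """Run fn once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

# Sleeping advances the wall clock while the process stays idle.
wall, cpu = measure(lambda: time.sleep(0.05))
print("wall: %.3f s, cpu: %.3f s" % (wall, cpu))
```

The mismatch can also go the other way. On Linux, `clock()` reports CPU time for the whole process, summed over its threads, so if the library spreads work across several cores (an assumption about this OpenCV build), the CPU time per frame can exceed the real elapsed time and make the C++ version look slower than it is.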

1 answer


answered 2017-10-20 17:44:01 -0600

shlady

@berak Thanks, this worked, and now I'm getting the same speeds on both. I started a timer immediately before the detectAndCompute calls and added the elapsed time to a counter immediately after. The division by 399 was because I was comparing them on a 400-frame video, and 400 frames give 399 frame-to-frame comparisons. I did not understand the difference between clock (CPU) time and wall time, though, and it made all the difference. Why shouldn't the imread() calls be included? Is it just because they are not part of what I asked about, or is reading images an inherently variable process?
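The approach described above (start a timer right before the call, add the elapsed time to a counter right after) could look roughly like the following Python 3 sketch. `SectionTimer`, `load_frame`, and `detect` are hypothetical names; the last two are stand-ins that just sleep, in place of `cv2.imread` and `orb.detectAndCompute`, so the example runs without OpenCV or image files.

```python
import time

class SectionTimer:
    """Accumulate wall time spent inside `with timer:` blocks only."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.total += time.perf_counter() - self._start
        self.count += 1
        return False  # do not swallow exceptions

    def mean(self):
        return self.total / self.count if self.count else 0.0

def load_frame(i):
    # Hypothetical stand-in for cv2.imread: simulated disk read and decode.
    time.sleep(0.02)
    return i

def detect(frame):
    # Hypothetical stand-in for orb.detectAndCompute.
    time.sleep(0.005)

timer = SectionTimer()
loop_start = time.perf_counter()
for i in range(5):
    frame = load_frame(i)   # not timed
    with timer:             # only the detection step is timed
        detect(frame)
loop_total = time.perf_counter() - loop_start

print("mean detect time: %.4f s" % timer.mean())
print("whole loop:       %.4f s" % loop_total)
```

Timing only the section of interest keeps file reading and JPEG decoding, whose cost depends on the disk and cache rather than on ORB, out of the per-frame figure.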

