Ask Your Question
1

Uploading gpumat is too slow in jetson TX1

asked 2017-10-10 19:23:46 -0600

ossyaritoori gravatar image

Hi. I'm now working on jetson TX1 with cuda8.0. I compiled opencv3.1 with cuda and it looks successful.

However, when I tried some samples, I found cuda calclation pretty slow mostly because of uploading images.

For example, I tried surf feature extraction and matching from opencv sample cord. The result is like below:

upLoad = 39.9022

Device 0:  "NVIDIA Tegra X1"  3995Mb, sm_53, Driver/Runtime ver.8.0/8.0
FOUND 158 keypoints on first image
FOUND 137 keypoints on second image

Findcuda = 0.000123487  Extraction = 0.0952315
Matching = 0.00152424 Download = 0.00137919

This means uploading two images took about 40sec! Are there any solutions? Thank you.

I put my code here.

#include <iostream>

#include "opencv2/opencv_modules.hpp"

#ifdef HAVE_OPENCV_XFEATURES2D

#include "opencv2/core.hpp"
#include "opencv2/features2d.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/cudafeatures2d.hpp"
#include "opencv2/xfeatures2d/cuda.hpp"

using namespace std;
using namespace cv;
using namespace cv::cuda;

static void help()
{
    cout << "\nThis program demonstrates using SURF_CUDA features detector, descriptor extractor and BruteForceMatcher_CUDA" << endl;
    cout << "\nUsage:\n\tsurf_keypoint_matcher --left <image1> --right <image2>" << endl;
}

int main(int argc, char* argv[])
{
    if (argc != 5)
    {
        help();
        return -1;
    }

    GpuMat img1, img2;
    cv::Mat raw1,raw2;
    raw1 = imread(argv[2], IMREAD_GRAYSCALE);
    raw2= imread(argv[4], IMREAD_GRAYSCALE);
    int64 t0 = cv::getTickCount();
    for (int i = 1; i < argc; ++i)
    {
        if (string(argv[i]) == "--left")
        {
            img1.upload(raw1);
            CV_Assert(!img1.empty());
        }
        else if (string(argv[i]) == "--right")
        {
            img2.upload(raw2);
            CV_Assert(!img2.empty());
        }
        else if (string(argv[i]) == "--help")
        {
            help();
            return -1;
        }
    }

    int64 t1 = cv::getTickCount();
    cout <<  " upLoad = " << (t1-t0)/cv::getTickFrequency() << endl;

    cv::cuda::printShortCudaDeviceInfo(cv::cuda::getDevice());

    int64 t2 = cv::getTickCount();
    SURF_CUDA surf;

    // detecting keypoints & computing descriptors
    GpuMat keypoints1GPU, keypoints2GPU;
    GpuMat descriptors1GPU, descriptors2GPU;
    surf(img1, GpuMat(), keypoints1GPU, descriptors1GPU);
    surf(img2, GpuMat(), keypoints2GPU, descriptors2GPU);

    int64 t3 = cv::getTickCount();
    cout << "FOUND " << keypoints1GPU.cols << " keypoints on first image" << endl;
    cout << "FOUND " << keypoints2GPU.cols << " keypoints on second image" << endl;

        cout << " Findcuda = " << (t2-t1)/cv::getTickFrequency() << " Extraction = " << (t3-t2)/cv::getTickFrequency() << endl;

    // matching descriptors
    Ptr<cv::cuda::DescriptorMatcher> matcher = cv::cuda::DescriptorMatcher::createBFMatcher(surf.defaultNorm());
    vector<DMatch> matches;
    matcher->match(descriptors1GPU, descriptors2GPU, matches);
    int64 t4 = cv::getTickCount();

    // downloading results
    vector<KeyPoint> keypoints1, keypoints2;
    vector<float> descriptors1, descriptors2;
    surf.downloadKeypoints(keypoints1GPU, keypoints1);
    surf.downloadKeypoints(keypoints2GPU, keypoints2);
    surf.downloadDescriptors(descriptors1GPU, descriptors1);
    surf.downloadDescriptors(descriptors2GPU, descriptors2);
    int64 t5 = cv::getTickCount();


    cout <<  " Matching = " << (t4-t3)/cv::getTickFrequency() <<  " Download = " << (t5-t4)/cv::getTickFrequency() << endl;

    // drawing the results
    Mat img_matches;
    drawMatches(Mat(img1), keypoints1, Mat(img2), keypoints2, matches, img_matches);

    namedWindow("matches", 0);
    imshow("matches", img_matches);
    waitKey(0);

    return 0;
}

#else

int main()
{
    std::cerr << "OpenCV was built without xfeatures2d module" << std::endl;
    return 0;
}

#endif
edit retag flag offensive close merge delete

Comments

OpenCV does a GPU initialization the first time the GPU is actually called. SO please, do a GPU command first, before starting your timing. Initialization is now included because the first GPU call is is the upload function.

StevenPuttemans gravatar imageStevenPuttemans ( 2017-10-11 08:25:19 -0600 )edit
1

Hi Steven. I have tried to call

cv::cuda::printShortCudaDeviceInfo(cv::cuda::getDevice());

Function at first. But it still take the same time in the uploading process. Is it what you mean?

ossyaritoori gravatar imageossyaritoori ( 2017-10-11 14:00:23 -0600 )edit

Hmm that is weird indeed. Could you simply do the upload a 1000 times, time it and divide? I am curious if the overhead averages out or not.

StevenPuttemans gravatar imageStevenPuttemans ( 2017-10-12 04:12:35 -0600 )edit

1 answer

Sort by ยป oldest newest most voted
1

answered 2017-10-12 04:22:26 -0600

ossyaritoori gravatar image

Hi steven! May be I solved this problem. I called next function at first.

  cv::cuda::GpuMat test;
  test.create(1, 1, CV_8U);

Now I can afford GPU power !!

edit flag offensive delete link more

Comments

Well then probably the getDevice just probes your busses and not the card itself, not generating an initialization. Lets turn this into an answer!

StevenPuttemans gravatar imageStevenPuttemans ( 2017-10-12 04:24:55 -0600 )edit

Question Tools

1 follower

Stats

Asked: 2017-10-10 19:23:46 -0600

Seen: 1,167 times

Last updated: Oct 10 '17