
dtmoodie's profile - activity

2016-11-30 10:01:44 -0500 commented question Using thrust with cv::cuda::GpuMat

I've used GpuMatBeginItr<cv::Vec3b> on CV_8UC3 images without a problem, and similarly with CV_32FC3.

2016-06-03 13:58:46 -0500 answered a question How to save a video in mp4 format

If you have OpenCV compiled with GStreamer support you can try a GStreamer pipeline like the following (not tested, approximately what you would need to do; the frame rate and size here are placeholders for your own):

std::string pipeline = "appsrc ! videoconvert ! avenc_h264 ! mp4mux ! filesink location=test.mp4";
cv::VideoWriter cam(pipeline, 0, 30.0, cv::Size(640, 480)); // fourcc 0 tells OpenCV to treat the string as a pipeline
cam << image;

Quick explanation of what's going on. VideoWriter can be constructed with a GStreamer pipeline string that will use the GStreamer backend for encoding and muxing your file. The pipeline consists of the following elements:

appsrc <---- entry point of the OpenCV Mat into the GStreamer pipeline
videoconvert <---- convert the packed raw image into YUV for encoding
avenc_h264 <---- encode the video into h264
mp4mux <---- mux the encoded video into an mp4 container (matroskamux would produce an mkv)
filesink <---- save to disk

I can verify and update when I get to my desktop.

2016-06-03 13:44:25 -0500 commented question Parallel version of SIFT?

What is your end goal? Are you launching an application, processing a single image, and then closing the application? If you want to process batches of images, or several images in a row, then the GPU can be a good solution. Or do you mean you only want to process one image at a time, thus not in parallel, but multiple images per application launch?

2016-01-04 09:20:41 -0500 asked a question What is the reason behind completely hiding implementations in the API?

The coding style guide discusses using virtual interfaces for classes such that the interface is exposed but the implementation is hidden.

I understand the use of this in simplifying the API, but it can be cumbersome when I want to make modifications.

For example, the cv::cuda::pyrLKSparseOpticalFlow code. It has a virtual interface and then a hidden implementation which is created with createPyrLKSparseOpticalFlow.

Why not have an exposed implementation that is in a header that isn't necessarily included with the module level cudaOptFlow.hpp header? So we could have something like:

  • cudaOptFlow.hpp

  • cudaOptFlow/ (directory)

    -- pyrLKOptFlow.hpp <-- interface, included by cudaOptFlow.hpp

    -- pyrLKOptFlow_impl.hpp <-- implementation, not included by cudaOptFlow.hpp

By exposing the interface in this fashion, I can optionally access and inherit from the implementation. When including the top-level cudaOptFlow.hpp file, only the interface and the creation function are exposed, so code using just the interface should compile just as fast.

I can see why using the implementation could be problematic with the two first points of the coding style guide:

  • We want our API to stay stable when the implementation changes.

  • We want to preserve not only source-level compatibility, but binary-level compatibility as well.

But I believe using just the interface should achieve this, and I think the application developer should have the choice of using the implementation, and of dealing with updating things accordingly, if they want the extra control.
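The pattern under discussion can be sketched in plain C++ (all names here are illustrative stand-ins, not OpenCV's actual classes): an abstract interface plus a creation function is what the module header exposes, while the concrete class stays hidden.

```cpp
#include <memory>

// Abstract interface: this is all a user of the module-level header sees.
class OpticalFlow {
public:
    virtual ~OpticalFlow() = default;
    virtual int calc(int frame) = 0;  // toy stand-in for the real API
    static std::unique_ptr<OpticalFlow> create();
};

// Hidden implementation: normally defined in a .cpp, invisible to users.
class OpticalFlowImpl : public OpticalFlow {
public:
    int calc(int frame) override { return frame * 2; }  // placeholder work
};

std::unique_ptr<OpticalFlow> OpticalFlow::create() {
    return std::make_unique<OpticalFlowImpl>();
}
```

Under the split proposed above, OpticalFlowImpl would simply move into a second header so applications could opt in to inheriting from it directly.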

2015-10-08 04:01:45 -0500 received badge  Teacher (source)
2015-10-07 08:01:51 -0500 answered a question How can Caffe be interfaced using OpenCV

If you want to use the version of caffe that you built, you can use something like this to interface it with your code. Mainly, look at wrapInput() for wrapping a cv::cuda::GpuMat into a caffe::Blob, and at doProcess() for actually processing an image with caffe.

2015-10-06 10:08:50 -0500 asked a question Slow initial call in a batch of cuda sparse pyrLK optical flow operations.

I'm writing a program that registers a frame to a previous set of frames using optical flow to track key points. My keyframes are stored in a circular buffer fashion and then optical flow is called starting on the oldest frame in the buffer and moving towards the newer frames. I'm doing this on a Windows 7x64 computer with NVidia drivers 353.90 on a GTX Titan X.

Because of the architecture of the program, there may be a delay between batches of operations as new images are loaded, etc., i.e. the stream queue would look like:

opt flow (20 ms)
opt flow (1 ms)
opt flow (1 ms)
opt flow (1 ms)
opt flow (1 ms)
opt flow (20 ms)
opt flow (1 ms)
opt flow (1 ms)

I'm running all of this on a stream, however for the sake of measuring time, I'm calling stream.waitForCompletion(). Ideally when this is working correctly, I'll be able to take out all of the synchronization.

I'm also familiar with the fact that first launches should take longer as the driver compiles code. However I was under the impression that this would just be the first launch, not the first launch in each batch of launches. Is there any way to reduce that 20 ms first call to optical flow to something more reasonable?
Should I set up two streams, so that memory transfers are on one and optical flow is dedicated to the other?

[EDIT] I've tested whether it could be a WDDM driver queue issue similar to this, by manually flushing the queue with a cudaEventQuery on one of my events; however, this doesn't seem to do anything. If I remove synchronization, my second call to optical flow costs the 20 ms.
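One common mitigation (a guess, not a verified fix for the behavior above) is a throwaway warm-up call at the start of each run, so any one-time lazy initialization happens outside the timed region. The idea in plain C++, with a static standing in for whatever the driver sets up lazily:

```cpp
int init_count = 0;  // counts how many times the expensive one-time setup ran

// Stand-in for a kernel whose first invocation pays a one-time setup cost.
int optical_flow_step(int frame) {
    static const bool initialized = [] { ++init_count; return true; }();
    (void)initialized;   // setup runs exactly once, on the first call
    return frame + 1;    // placeholder "work"
}
```

Calling optical_flow_step once before the timed batch means every measured call hits the already-initialized fast path; whether this helps with the per-batch 20 ms spike described above depends on what the driver is actually re-doing between batches.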

2015-09-02 10:22:24 -0500 commented question Using thrust with cv::cuda::GpuMat

Sure I'll look into it.

2015-09-01 10:46:19 -0500 asked a question Using thrust with cv::cuda::GpuMat

I just struggled with this for a bit so I wanted to post it somewhere where it may be helpful for someone else.

This function can be used for creating a thrust iterator that correctly indexes a cv::cuda::GpuMat.

struct step_functor : public thrust::unary_function<int, int>
{
    int columns;
    int step;
    step_functor(int columns_, int step_) : columns(columns_), step(step_) {}
    __host__ __device__
    int operator()(int x) const
    {
        int row = x / columns;
        int idx = (row * step) + x % columns;
        return idx;
    }
};

template<typename T>
thrust::permutation_iterator<thrust::device_ptr<T>, thrust::transform_iterator<step_functor, thrust::counting_iterator<int>>> GpuMatBeginItr(cv::cuda::GpuMat mat)
{
    return thrust::make_permutation_iterator(thrust::device_pointer_cast(mat.ptr<T>(0)),
        thrust::make_transform_iterator(thrust::make_counting_iterator(0),
            step_functor(mat.cols, mat.step / sizeof(T))));
}

template<typename T>
thrust::permutation_iterator<thrust::device_ptr<T>, thrust::transform_iterator<step_functor, thrust::counting_iterator<int>>> GpuMatEndItr(cv::cuda::GpuMat mat)
{
    return thrust::make_permutation_iterator(thrust::device_pointer_cast(mat.ptr<T>(0)),
        thrust::make_transform_iterator(thrust::make_counting_iterator(mat.rows * mat.cols),
            step_functor(mat.cols, mat.step / sizeof(T))));
}

Thus performing thrust operations on rows / columns is as easy as:

cv::cuda::GpuMat d_test(h_test);

auto keyBegin = GpuMatBeginItr<int>(d_test.col(4));
auto keyEnd = GpuMatEndItr<int>(d_test.col(4));
auto valueBegin = GpuMatBeginItr<int>(d_test.col(5));

thrust::sort_by_key(keyBegin, keyEnd, valueBegin);
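The index arithmetic the functor performs can be checked on the CPU with no CUDA at all. A minimal sketch (step is counted in elements, as in the functor above): logical element x maps to row x / columns of a pitched buffer, skipping the padding at the end of each row.

```cpp
// Map a logical element index x into a pitched 2D buffer where each
// row occupies `step` elements but only `columns` of them hold data.
int pitched_index(int x, int columns, int step) {
    int row = x / columns;
    return (row * step) + x % columns;
}
```

For a 3-column matrix pitched to 4 elements per row, logical index 3 (the first element of the second row) lands at physical offset 4, one past the padding element.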
2015-08-08 11:03:12 -0500 commented question Opencv 3.0.0 build winder Windows 8 with Gstreamer1.0 support

Hello, I'm running into the same problem and it looks like no one bothered to make scripts for finding gstreamer. I'm currently working on editing OpenCVFindLibsVideo.cmake to find gstreamer properly on windows. I wanted to check if you came to the same conclusion and thus have already done this.

2015-05-01 13:28:59 -0500 received badge  Enthusiast
2015-04-29 16:08:21 -0500 commented question How do you debug C++ applications on Linux?

Thanks, these will both be very useful to me.

2015-04-29 16:05:39 -0500 commented answer calcOpticalFlowPyrLK and goodFeaturesToTrack doesn't work properly in Ros

You could consider using something like CMT to track the face once your cascade classifier has locked on.

2015-04-21 11:50:59 -0500 answered a question calcOpticalFlowPyrLK and goodFeaturesToTrack doesn't work properly in Ros

prevGrey's scope is within the imageDetect scope, thus it is destroyed on every iteration of the function. Either define prevGrey as static or make it a member variable that is updated at each iteration.
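The static-local fix can be illustrated with a toy function (no OpenCV needed; the names are illustrative): without `static`, the "previous" value would be re-created from scratch on every call, which is exactly what happens to prevGrey.

```cpp
// `prev` persists across calls, playing the role of a static prevGrey.
// Without `static`, prev would be reinitialized to 0 on every call and
// the previous frame would always be lost.
int diff_with_previous(int frame) {
    static int prev = 0;
    int d = frame - prev;
    prev = frame;  // update for the next iteration
    return d;
}
```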

2015-04-21 10:33:01 -0500 asked a question Qt OpenGL window, viewing pixel values and threading.

I've compiled OpenCV 3.0 from source with Qt and OpenGL.

I really like the ability to zoom in on a Qt-based window to view a pixel's value. I also really like the ability to view a GpuMat without downloading it, using a cv::WINDOW_OPENGL window.

Is there any way to access both of these features at the same time? I compiled with Qt's OpenGL module, so I thought it was using that backend.

I've also noticed that if I call cv::imshow on a cv::WINDOW_NORMAL window in a background thread it works correctly, but with a cv::WINDOW_OPENGL window it throws an exception in opengl.cpp (358) because gl::GenBuffers returns a null bufId. Is there any way to fix this? I don't have a lot of experience with threading OpenGL applications, so I'm not sure if that's a general limitation or something to do with how my OpenGL contexts are being accessed.

2015-04-08 15:55:08 -0500 asked a question How do you debug C++ applications on Linux?

I split my time between Windows development and Linux development. I'd like to know what types of debugging solutions are out there that work well for viewing cv::Mat data. I've grown rather dependent on Image Watch and I'd like to have similar functionality in Linux.

I made this python script that can dump an image and display it with a cv::imshow window, which is quite nice but there are some issues with it that I'm trying to sort out.

It's hard for me to believe that with so much Linux development of OpenCV that there isn't already a well known solution similar to Image Watch.

2015-02-02 14:44:00 -0500 commented question OpenCV conflict with QIMage in Ubuntu

Did you compile opencv against Qt? Do you have multiple versions of Qt installed? If so are you using the correct version that opencv compiled against?

2015-02-02 14:07:39 -0500 commented question opencv with gpu crash in debug mode

Do the CUDA examples work? Do the OpenCV CUDA examples work? Try different drivers; many times when you have a CUDA issue in such a simple section of code, it's because of a bad driver. You also didn't mention which GPU you have or the driver version.

2015-02-02 13:56:32 -0500 commented answer Unknown CUDA error : Gpu API call

If opencv is built without cuda, any cuda function call will throw an error clearly saying that it wasn't built with cuda.

2015-02-02 13:55:43 -0500 commented question Unknown CUDA error : Gpu API call

Do the cuda test programs work? Are you in debug mode? I had an odd issue with the ubuntu repository drivers that sounded similar to that. I fixed it by manually installing the newest drivers directly from nvidia, via their ".sh" installer.
Also sometimes I have errors with cuda functions in debug mode that don't appear in release mode.

2015-02-02 13:47:59 -0500 received badge  Editor (source)
2015-02-02 13:45:56 -0500 asked a question Non-continuous GpuMat after cv::cuda::transpose

OpenCV 3.0 x64, CUDA 6.5, Windows 7 x64

I have a continuous cv::cuda::GpuMat of homogeneous points organized with one point per column, i.e.:

X1, X2, X3 .....
Y1, Y2, Y3 .....
1,  1,  1  .....

I am trying to transpose this and manipulate it into this format:


To do this, I should be able to transpose the matrix, reshape it, and then transpose it again. Unfortunately, when I do this I seem to run into a bug with cv::cuda::transpose. After transposing the first matrix, I should have a "diff" matrix like this:

X1, Y1, 1,
X2, Y2, 1,

With floating point values, this should give a step size of 12 bytes and be continuous. Unfortunately the resultant matrix has a step size of 512 bytes and is non-continuous. Because of this, I cannot reshape the matrix.

Is this a bug with cv::cuda::transpose? I've tried the following:

diff = diff.reshape(1,1);
diff = diff.clone().reshape(1,1);
diff = diff.colRange(0,3).clone();
diff = diff.reshape(1,1);

No matter what, reshape throws the exception:

error: (-13) The matrix is not continuous, thus its number of rows can not be changed in function cv::cuda::GpuMat::reshape"

I've downloaded the matrix, and the downloaded copy is correct, with a step size of 12. Is this some kind of alignment requirement in CUDA? How can I reshape the matrix to what I need without downloading it to the CPU?
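What reshape is checking can be sketched in plain C++: a pitched allocation is continuous only when the step equals the payload width, which a 512-byte step can never be for a 12-byte row. (The 512-byte step would be consistent with a pitched, alignment-padded allocation on the device; that reading is an assumption, not something the error message states.)

```cpp
// A matrix is continuous when rows are packed back to back, i.e. the
// step (bytes per row, including any padding) equals cols * elemSize.
bool is_continuous(int cols, int elem_size, int step_bytes) {
    return step_bytes == cols * elem_size;
}
```

For three floats per row: is_continuous(3, 4, 12) holds, while the transposed result with a 512-byte step does not, so reshape refuses it.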

2015-02-02 11:54:48 -0500 answered a question stitching 2 images together in CUDA like hconcat?

You can allocate a matrix of the correct size and then copy the images into the new matrix manually:

cv::cuda::GpuMat NewImg(img1.rows, img1.cols + img2.cols, img1.type());
img1.copyTo(NewImg(cv::Rect(0, 0, img1.cols, img1.rows)));
img2.copyTo(NewImg(cv::Rect(img1.cols, 0, img2.cols, img2.rows)));
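The ROI placement is just offset bookkeeping: the second image starts at x = img1.cols, y = 0. The same layout can be sketched CPU-side with flat row-major buffers standing in for GpuMats (names are illustrative):

```cpp
#include <vector>

// Concatenate two row-major "images" of equal height side by side.
std::vector<int> hconcat_flat(const std::vector<int>& a, int aCols,
                              const std::vector<int>& b, int bCols, int rows) {
    std::vector<int> out(rows * (aCols + bCols));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < aCols; ++c)        // left block at x offset 0
            out[r * (aCols + bCols) + c] = a[r * aCols + c];
        for (int c = 0; c < bCols; ++c)        // right block at x offset aCols
            out[r * (aCols + bCols) + aCols + c] = b[r * bCols + c];
    }
    return out;
}
```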
2014-11-11 09:24:15 -0500 answered a question How to convert 2d into 3D image

I believe Q is either H1 or H2 from stereoRectifyUncalibrated depending on which frame imgDisparity8U is with respect to.

2014-11-11 08:53:40 -0500 asked a question cv::cuda::warpPerspective corrupting GPU memory?


I have two warping functions to transform an image between two planes.

One of them projects each point from plane 1 to plane 2 by calculating the mapping for each point and then using cv::cuda::remap. This version works fine with everything else.

I recently created a version that uses cv::cuda::warpPerspective by sampling the mapping between the two planes and calculating a homography. This appears to work at first, but after a few iterations I start having issues with other cuda calls.

In particular cv::cuda::GpuMat::upload throws:

cstr_=0x000000003373bb04 "an illegal memory access was encountered"

Now it is possible that these are operating in parallel, since the upload runs on a separate thread from the warpPerspective call. I'm currently exploring restructuring so that the upload doesn't occur in parallel. However, that doesn't explain why this works when I call cv::cuda::remap but not when I call cv::cuda::warpPerspective. Is this a bug in warpPerspective?

System: Windows 7 x64, OpenCV 3.0 alpha built from source, CUDA 6.5, Quadro K6000, driver 340.84

2014-11-03 08:57:24 -0500 commented question flann RadiusSearch

Check the shape of the cv::Mat you are passing into the tree constructor. I recently had the issue where the matrix after reshape was a single row. In your case I would have needed to use cv::Mat(scheme_pts).reshape(1,scheme_pts.size())

2014-08-23 04:42:21 -0500 received badge  Nice Question (source)
2014-08-22 14:15:45 -0500 received badge  Student (source)
2014-08-22 14:03:44 -0500 asked a question createsamples missing in 3.0

I just built OpenCV 3.0 from the master repository and I cannot find createsamples. I've checked the apps folder in the source tree and I only see traincascade, nothing related to createsamples.

Has this program been removed from OpenCV 3.0? If so, how are we supposed to generate a .vec file for traincascade?