Ask Your Question

sammoes's profile - activity

2016-12-04 21:44:34 -0500 answered a question SVM trainAuto v/s train

You don't need to call train first: trainAuto searches the SVM parameter space using cross-validation. It does not search the parameter space of your features (if you have any). Try stepping into trainAuto with a debugger to see what it does for you. The main caveat is that trainAuto can still return a local optimum; see the libsvm website for advice on a good approach.

2016-12-04 21:40:09 -0500 commented question How to read images from a camera buffer

See if you can build with gstreamer or add codecs for your OS.

2016-12-04 21:35:30 -0500 answered a question build without highgui or gstreamer

Check highgui.hpp: if it contains nothing you need, see whether CMake lets you skip building highgui (e.g. by setting BUILD_opencv_highgui=OFF). If it doesn't, you can let CMake prepare the build files and then exclude the highgui module when compiling OpenCV, but that will cause a lot of problems for you to solve; make sure not to build any of the example apps. Letting CMake exclude highgui is the best option.

2016-12-04 21:29:19 -0500 commented question opencv with gpu crash in debug mode

Why not move to OpenCV 3? Which CUDA version do you have? Is there only one CUDA installation? Is your driver up to date?

2016-12-04 20:50:34 -0500 commented question Why is the CUDA version slower than the OpenCL version?

I mistakenly merged the original channels instead of the output channels. Reusing the input channels for output saves a little initialization. I also noticed that my data lives in only one channel, which saves a little more. The CUDA version now times in at the low 4-second range. Still slower than OpenCL, unfortunately.

2016-12-04 19:28:12 -0500 received badge  Editor (source)
2016-12-04 19:21:58 -0500 commented question Why is the CUDA version slower than the OpenCL version?

Do you mean finding out which function call is the slowest?

cv::cuda::normalize(d_matarr[1], d_sdst, 0, 255, NORM_MINMAX, 0, noArray());// <- slowest call

this one stands out (~64 ms). (I stepped through it repeatedly using the VS2015 profiler.) It did occur to me that the CUDA filters DLL is 300 MB, but the lookup time should be marginal. What else could it be? I don't see much reason for this normalize call to be exceptionally slow on CUDA 8: it's an NVIDIA GPU and I tried all three memory options. The same goes for the other functions compared to OpenCL 1.2. I've attached a detailed timing image.

2016-12-03 07:11:36 -0500 asked a question Why is the CUDA version slower than the OpenCL version?

Hi, I have written a CUDA (8 on my machine) version of a program and compared it to an OpenCL (1.2) / T-API version. The former clocks in quite a bit slower, even when using Unified Memory (UM). Could someone advise, please? The normalize() function is multi-channel in the T-API, but underneath it probably isn't. I had expected Shared Virtual Memory (UM in CUDA) to be faster, but I can't test that on my PC because it is limited to OpenCL 1.2... I read somewhere that it can depend on the size or complexity of the filters, whether pixels are reread, etc., but that would be the same for the CL version, wouldn't it?

CUDA (5-6 sec.)

ma = HostMem::getAllocator(HostMem::PAGE_LOCKED);
prev_frame = GpuMat(read_frame);
for (int i = 0; i < 100; i++) {
    d_out = ImEnhance(prev_frame);
}

ma = HostMem::getAllocator(HostMem::WRITE_COMBINED);
prev_frame = GpuMat(read_frame);
for (int i = 0; i < 100; i++) {
    d_out = ImEnhance(prev_frame);
}

ma = HostMem::getAllocator(HostMem::SHARED);
prev_frame = GpuMat(read_frame);
for (int i = 0; i < 100; i++) {
    d_out = ImEnhance(prev_frame);
}


GpuMat ImEnhance(GpuMat frm) {
    GpuMat HSV;
    // cuda::GpuMat d_hdst, d_sdst, d_vdst;
    cuda::GpuMat d_matarr[3];
    cv::Ptr<cv::cuda::Filter> blur  = cv::cuda::createGaussianFilter(frm.type(), frm.type(), Size(3, 3), 9, 9);
    cv::Ptr<cv::cuda::Filter> blur2 = cv::cuda::createGaussianFilter(frm.type(), frm.type(), Size(9, 9), 1, 1);
    cv::cuda::cvtColor(frm, HSV, COLOR_BGR2HSV, 0);
    cuda::split(HSV, d_matarr);
    //cv::cuda::normalize(d_matarr[0], d_matarr[0], 0, 255, NORM_MINMAX, 0, noArray());
    //cv::cuda::normalize(d_matarr[1], d_matarr[1], 0, 255, NORM_MINMAX, 0, noArray()); // <- slowest call
    cv::cuda::normalize(d_matarr[2], d_matarr[2], 0, 255, NORM_MINMAX, 0, noArray()); // <- my data lives in the third channel only
    cv::cuda::merge(d_matarr, 3, HSV);
    cv::cuda::cvtColor(HSV, frm, COLOR_HSV2BGR, 0);
    blur->apply(frm, HSV);
    cv::cuda::addWeighted(frm, 1.5, HSV, -1.0, 0.0, frm, -1);
    frm.convertTo(frm, -1, 2, 0);
    blur2->apply(frm, frm);
    return frm;
}

OpenCL (3-4 sec)

UMat ImEnhance(UMat frm) {
    UMat HSV;
    UMat HSV2;
    UMat HSV3;
    cvtColor(frm, HSV, COLOR_BGR2HSV);
    normalize(HSV, HSV2, 0, 255, NORM_MINMAX);
    cvtColor(HSV2, HSV3, COLOR_HSV2BGR);
    UMat img2;
    GaussianBlur(HSV3, img2, Size(3, 3), 9, 9);
    addWeighted(HSV3, 1.5, img2, -1.0, 0.0, frm);
    UMat img3;
    frm.convertTo(img3, -1, 2, 0);
    GaussianBlur(img3, frm, Size(9, 9), 1, 1);
    return frm;
}

Note: the profiling image I posted is no longer accurate, because the code here on the forum changed from my initial question. This is the most optimized version I could get without changing OpenCV 3's source code. It must be the CPU-GPU data transfers that take up nearly all of the time spent. My test data were 2200x1600 images.

2016-02-27 05:52:58 -0500 asked a question 3.1 samples CUDA, compiling with CUDA

Hello, I have just started using OpenCV 3.1 (not the contrib branch) and haven't really kept up with OpenCV for a year or more. Now I have built and installed it with CUDA:

 Device 0: "GeForce GTX 950M"
 CUDA Driver Version / Runtime Version          7.50 / 7.50
 CUDA Capability Major/Minor version number:    5.0
 Total amount of global memory:                 4096 MBytes (4294836224 bytes)
 GPU Clock Speed:                               1.12 GHz
 Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
 Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
 Total amount of constant memory:               65536 bytes
 Total amount of shared memory per block:       49152 bytes
 Total number of registers available per block: 65536
 Warp size:                                     32
 Maximum number of threads per block:           1024
 Maximum sizes of each dimension of a block:    1024 x 1024 x 64
 Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
 Maximum memory pitch:                          2147483647 bytes
 Texture alignment:                             512 bytes
 Concurrent copy and execution:                 Yes with 1 copy engine(s)
 Run time limit on kernels:                     Yes
 Integrated GPU sharing Host Memory:            No
 Support host page-locked memory mapping:       Yes
 Concurrent kernel execution:                   Yes
 Alignment requirement for Surfaces:            Yes
 Device has ECC support enabled:                No
 Device is using TCC driver mode:               No
 Device supports Unified Addressing (UVA):      Yes
 Device PCI Bus ID / PCI location ID:           1 / 0
 Compute Mode:
     Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 7.50, CUDA Runtime Version = 7.50, NumDevs = 1

Unfortunately, many errors related to using the GPU occur:

Are there dependencies for the samples in /bin (e.g. the Caltech images)?

opencv_test_cudaarithm :

[----------] 8 tests from CUDA_ImgProc/Demosaicing
[ RUN      ] CUDA_ImgProc/Demosaicing.BayerBG2BGR/0
.../opencv/modules/cudaimgproc/test/test_color.cpp:2360: Failure
Value of: img.empty()
  Actual: true
Expected: false
Can't load input image


OpenCV version: 3.1.0
OpenCV VCS version: 3.1.0
Build type: release
Parallel framework: pthreads
CPU features: 
OpenCL is disabled
[==========] Running 11 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from detect/CalTech
[ RUN      ] detect/CalTech.HOG/0
.../opencv/modules/cudaobjdetect/test/test_objdetect.cpp:236: Failure
Value of: img.empty()
  Actual: true
 Expected: false

I have had to sudo cp libippicv.a /usr/local/lib/ in order to build a simple cuda_test.cpp with this Makefile:

CFLAGS = `pkg-config --cflags opencv`
LIBS = `pkg-config --libs opencv`
INC = -I/usr/local/cuda-7.5/targets/x86_64-linux/include

% : %.cpp
    g++ $(INC) $(CFLAGS) $(LIBS) -o $@ $<

But this still resulted in a large number of errors:

cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm]+0x258): undefined reference to `ncvDebugOutput(cv::String const&)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm]+0x300): undefined reference to `memSegCopyHelper(void*, NCVMemoryType, void const*, NCVMemoryType, unsigned long, CUstream_st*)'
/tmp/ccJkx6c5.o: In function `NCVVector<HaarFeature64>::copySolid(NCVVector<HaarFeature64>&, CUstream_st*, unsigned long) const':
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x3d): undefined reference to `cv::format(char const*, ...)'
cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x82): undefined reference to cv ... (more)