sammoes's profile - activity


2016-12-04 21:44:34 -0600	answered a question	SVM trainAuto v/s train You don't need to call train first. trainAuto searches the SVM parameter space using cross-validation. It doesn't search the parameter space (if any) of your features. Try stepping into trainAuto with a debugger to see what it does. I think the main problem is that trainAuto can still give you a local optimum. See the libsvm website for advice on a good approach.
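To make the "searches the parameter space using cross-validation" part concrete, here is a rough sketch of a log-spaced grid search in plain C++. It is not OpenCV's implementation. `logGrid`, `gridSearch`, `CvError` and `Best` are names made up for this illustration, and the cross-validation itself is left to a caller-supplied callback. The grid rule (minVal, minVal*step, minVal*step^2, ... while below maxVal) follows how I understand the `ParamGrid` documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Logarithmic grid: minVal, minVal*step, minVal*step^2, ... while the value is < maxVal.
// Returns an empty grid for arguments that would never terminate or never advance.
std::vector<double> logGrid(double minVal, double maxVal, double step) {
    std::vector<double> values;
    if (minVal <= 0.0 || step <= 1.0) return values;
    for (double v = minVal; v < maxVal; v *= step) values.push_back(v);
    return values;
}

// Hypothetical callback: returns the k-fold cross-validation error for one (C, gamma) pair.
using CvError = std::function<double(double C, double gamma)>;

struct Best {
    double C;
    double gamma;
    double error;
};

// Exhaustively evaluates every grid point and keeps the one with the lowest error.
// The result is only the best point ON THE GRID, not a global optimum.
Best gridSearch(const std::vector<double>& Cs,
                const std::vector<double>& gammas,
                const CvError& cvError) {
    Best best{0.0, 0.0, std::numeric_limits<double>::infinity()};
    for (double C : Cs) {
        for (double g : gammas) {
            const double e = cvError(C, g);
            if (e < best.error) best = {C, g, e};
        }
    }
    return best;
}
```

Calling `logGrid(0.1, 500.0, 5.0)` gives six candidate values for C (0.1 up to 312.5). Because the grid is that coarse, the winner can sit next to a better setting the search never tried, which is one reason the result should be treated as a starting point. Rerunning with a finer grid around the winner is a common follow-up.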
2016-12-04 21:40:09 -0600	commented question	How to read images from a camera buffer See if you can build with gstreamer or add codecs for your OS.
2016-12-04 21:35:30 -0600	answered a question	build without highgui or gstreamer Check highgui.hpp: if there is nothing in it you need, see whether CMake lets you skip building highgui. If it doesn't, let CMake prepare the build files and then exclude the highgui module when compiling OpenCV, but that will leave you with a lot of problems to solve; make sure not to build any example apps. It would be best to let CMake exclude highgui.
2016-12-04 21:29:19 -0600	commented question	opencv with gpu crash in debug mode Why not move to OpenCV 3? Which CUDA version do you have? Is only one CUDA version installed? Is the driver up to date?
2016-12-04 20:50:34 -0600	commented question	Why is the CUDA version slower than the OpenCL version? I mistakenly merged the original channels, not the output channels. Reusing the input channels for output saves a little initialization. I also see that my data only exists in one channel, which saves a little more. The CUDA version now clocks in in the low 4-second range. Still slower than OpenCL, unfortunately.
2016-12-04 19:28:12 -0600	received badge	● Editor
2016-12-04 19:21:58 -0600	commented question	Why is the CUDA version slower than the OpenCL version? Do you mean finding out which function call is the slowest? `cv::cuda::normalize(d_matarr[1], d_sdst, 0, 255, NORM_MINMAX, 0, noArray());// <- slowest call` This one stands out (~64 ms). (I stepped through it repeatedly using the VS2015 profiler.) It did occur to me that the CUDAFilters DLL is 300 MB, but the lookup time should be marginal. What else could it be? I don't see much reason for this normalize call to be exceptionally slow on CUDA 8; it's an NVIDIA GPU and I tried all three memory options. The same goes for the other functions compared to OpenCL 1.2. I have added a detailed timing image.
2016-12-03 07:11:36 -0600	asked a question	Why is the CUDA version slower than the OpenCL version?

Hi, I have written a CUDA (8 on my machine) version of a program and compared it to an OpenCL (1.2) / T-API version. The CUDA version clocks in quite a bit slower, even when using Unified Memory (UM). Could someone advise, please? The normalize() function is multi-channel in the T-API, but underneath it probably isn't. I had expected Shared Virtual Memory (UM in CUDA) to be faster, but I can't try it on my PC because it is limited to OpenCL 1.2. I read somewhere that it can depend on the size or complexity of the filters, whether pixels are reread, etc., but that would be the same for the CL version, wouldn't it?

CUDA (5-6 sec.)

    ma = HostMem::getAllocator(HostMem::PAGE_LOCKED);
    cv::Mat::setDefaultAllocator(ma);
    prev_frame = GpuMat(read_frame);
    for (int i = 0; i < 100; i++) { d_out = ImEnhance(prev_frame); } // time

    ma = HostMem::getAllocator(HostMem::WRITE_COMBINED);
    cv::Mat::setDefaultAllocator(ma);
    prev_frame = GpuMat(read_frame);
    for (int i = 0; i < 100; i++) { d_out = ImEnhance(prev_frame); } // time

    ma = HostMem::getAllocator(HostMem::SHARED);
    cv::Mat::setDefaultAllocator(ma);
    prev_frame = GpuMat(read_frame);
    for (int i = 0; i < 100; i++) { d_out = ImEnhance(prev_frame); } // time
    :
    GpuMat ImEnhance(GpuMat frm) {
        GpuMat HSV;
        // cuda::GpuMat d_hdst, d_sdst, d_vdst;
        cuda::GpuMat d_matarr[3];
        cv::Ptr<cv::cuda::Filter> blur = cv::cuda::createGaussianFilter(frm.type(), frm.type(), Size(3,3), 9, 9);
        cv::Ptr<cv::cuda::Filter> blur2 = cv::cuda::createGaussianFilter(frm.type(), frm.type(), Size(9,9), 1, 1);
        cv::cuda::cvtColor(frm, HSV, COLOR_BGR2HSV, 0);
        cuda::split(HSV, d_matarr);
        //cv::cuda::normalize(d_matarr[0], d_matarr[0], 0, 255, NORM_MINMAX, 0, noArray());
        //cv::cuda::normalize(d_matarr[1], d_matarr[1], 0, 255, NORM_MINMAX, 0, noArray()); // <- slowest call
        cv::cuda::normalize(d_matarr[2], d_matarr[2], 0, 255, NORM_MINMAX, 0, noArray()); // <- my data lives in the third channel only
        cv::cuda::merge(d_matarr, 3, HSV);
        cv::cuda::cvtColor(HSV, frm, COLOR_HSV2BGR, 0);
        blur->apply(frm, HSV);
        cv::cuda::addWeighted(frm, 1.5, HSV, -1.0, 0.0, frm, -1);
        frm.convertTo(frm, -1, 2, 0);
        blur2->apply(frm, frm);
        // d_hdst.release(); d_sdst.release(); d_vdst.release(); // declaration is commented out above
        HSV.release();
        return frm;
    }

OpenCL (3-4 sec.)

    UMat ImEnhance(UMat frm) {
        UMat HSV; UMat HSV2; UMat HSV3;
        cvtColor(frm, HSV, COLOR_BGR2HSV);
        normalize(HSV, HSV2, 0, 255, NORM_MINMAX);
        cvtColor(HSV2, HSV3, COLOR_HSV2BGR);
        UMat img2;
        GaussianBlur(HSV3, img2, Size(3, 3), 9, 9);
        addWeighted(HSV3, 1.5, img2, -1.0, 0.0, frm);
        img2.release();
        UMat img3;
        frm.convertTo(img3, -1, 2, 0);
        GaussianBlur(img3, frm, Size(9, 9), 1, 1);
        HSV.release(); HSV2.release(); HSV3.release(); img2.release(); img3.release();
        return frm;
    }

Note: the profile image I made is no longer correct. The code here on the forum has changed since my initial question. This code is the most optimized version I could get without changing OpenCV 3's source code. It must be the CPU-GPU data transfers that take up nearly all the time spent. My test data were 2200x1600 images,
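One thing that can skew a comparison like the "// time" loops above is the first call: on both CUDA and OpenCL it usually pays for one-off setup (context creation, kernel loading or compilation), so it helps to run a few untimed iterations before starting the clock. Below is a minimal sketch of such a harness in standard C++ only. `meanMillis` is a name made up for this illustration, and it assumes the callable blocks until its work has finished; if the GPU calls are asynchronous, the callable would have to wait for completion itself.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `warmup` times without timing, then `iters` times with timing.
// Returns the mean wall-clock time per timed call in milliseconds (0.0 if iters <= 0).
double meanMillis(const std::function<void()>& work, int warmup, int iters) {
    for (int i = 0; i < warmup; ++i) work();  // absorb one-off setup costs

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) work();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return iters > 0 ? elapsed.count() / iters : 0.0;
}
```

With the code from the question, the callable would be something like `[&] { d_out = ImEnhance(prev_frame); }`, timed once per allocator setting and once for the UMat version, so that all variants are measured the same way.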
2016-02-27 05:52:58 -0600	asked a question	3.1 samples CUDA, compiling with CUDA

Hello, I have just started using OpenCV 3.1 (not the contrib branch) and haven't really kept up with OpenCV for a year or more. Now I have built and installed it with CUDA:

    Device 0: "GeForce GTX 950M"
      CUDA Driver Version / Runtime Version 7.50 / 7.50
      CUDA Capability Major/Minor version number: 5.0
      Total amount of global memory: 4096 MBytes (4294836224 bytes)
      GPU Clock Speed: 1.12 GHz
      Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
      Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
      Total amount of constant memory: 65536 bytes
      Total amount of shared memory per block: 49152 bytes
      Total number of registers available per block: 65536
      Warp size: 32
      Maximum number of threads per block: 1024
      Maximum sizes of each dimension of a block: 1024 x 1024 x 64
      Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
      Maximum memory pitch: 2147483647 bytes
      Texture alignment: 512 bytes
      Concurrent copy and execution: Yes with 1 copy engine(s)
      Run time limit on kernels: Yes
      Integrated GPU sharing Host Memory: No
      Support host page-locked memory mapping: Yes
      Concurrent kernel execution: Yes
      Alignment requirement for Surfaces: Yes
      Device has ECC support enabled: No
      Device is using TCC driver mode: No
      Device supports Unified Addressing (UVA): Yes
      Device PCI Bus ID / PCI location ID: 1 / 0
      Compute Mode: Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.50, CUDA Runtime Version = 7.50, NumDevs = 1

Unfortunately, many errors related to using the GPU occur. Are there dependencies for the samples in /bin (i.e. Caltech images)?

opencv_test_cudaarithm:

    [----------] 8 tests from CUDA_ImgProc/Demosaicing
    [ RUN ] CUDA_ImgProc/Demosaicing.BayerBG2BGR/0
    .../opencv/modules/cudaimgproc/test/test_color.cpp:2360: Failure
    Value of: img.empty()
    Actual: true
    Expected: false
    Can't load input image
    :
    :

Similarly:

    ./opencv_test_cudaobjdetect
    CTEST_FULL_OUTPUT
    OpenCV version: 3.1.0
    OpenCV VCS version: 3.1.0
    Build type: release
    Parallel framework: pthreads
    CPU features:
    OpenCL is disabled
    [==========] Running 11 tests from 5 test cases.
    [----------] Global test environment set-up.
    [----------] 7 tests from detect/CalTech
    [ RUN ] detect/CalTech.HOG/0
    .../opencv/modules/cudaobjdetect/test/test_objdetect.cpp:236: Failure
    Value of: img.empty()
    Actual: true
    Expected: false
    :
    :

I have had to run `sudo cp libippicv.a /usr/local/lib/` in order to build a simple cuda_test.cpp with this makefile:

    CFLAGS = `pkg-config --cflags opencv`
    LIBS = `pkg-config --libs opencv`
    INC = -I/usr/local/cuda-7.5/targets/x86_64-linux/include
    % : %.cpp
    	g++ $(INC) $(CFLAGS) $(LIBS) -o $@ $<

But this still resulted in a large number of errors:

    :
    :
    cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm]+0x258): undefined reference to `ncvDebugOutput(cv::String const&)'
    cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI21HaarClassifierNode128E9copySolidERS1_P11CUstream_stm]+0x300): undefined reference to `memSegCopyHelper(void*, NCVMemoryType, void const*, NCVMemoryType, unsigned long, CUstream_st*)'
    /tmp/ccJkx6c5.o: In function `NCVVector<HaarFeature64>::copySolid(NCVVector<HaarFeature64>&, CUstream_st*, unsigned long) const':
    cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x3d): undefined reference to `cv::format(char const*, ...)'
    cascadeclassifier_nvidia_api.cpp:(.text._ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm[_ZNK9NCVVectorI13HaarFeature64E9copySolidERS1_P11CUstream_stm]+0x82): undefined reference to `cv ...