2019-12-27 07:50:22 -0600 | received badge | ● Popular Question |
2018-07-19 09:13:06 -0600 | received badge | ● Notable Question |
2016-08-08 05:36:40 -0600 | received badge | ● Popular Question |
2014-11-21 09:57:37 -0600 | received badge | ● Nice Answer |
2014-11-19 11:39:12 -0600 | commented question | gpu::convolve and gpu::filter2D vs cv::filter2D, opencv 2.4.9 Did you ever figure out how to make this work? Seems like a pretty huge issue. If nothing else, it needs better documentation. |
2014-09-13 03:29:12 -0600 | commented answer | resize and remap functions utterly wrong Wowza! Great reply. I hear you on the resize; it certainly can be ambiguous. But I ran through the logical options, and the opencv resize results still didn't make sense. It sounds like remap needs a flag to turn off the weighting table and get more accurate results. I just made my own version of the remap function, taking only ~30% more time on a 10000x10000 float image (using TBB and a 64-core machine). Another oddity: on the same run, I tried the "convertMaps" function, but the "performant" maps HURT performance by a factor of 4 (making it more than twice as slow as mine)! Not sure if/how opencv uses threads, but this suggests to me that the performance improvement(?) may not be worth it for the typical user. |
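[For reference, a minimal sketch of the convertMaps usage being compared here - converting CV_32FC1 float maps into the fixed-point CV_16SC2/CV_16UC1 pair that remap treats as the fast path. The identity mapping and grid size below are placeholder assumptions, not the poster's actual maps:]

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>

    int main() {
        // Float maps over a small grid (identity mapping as a placeholder).
        cv::Mat mapx(100, 100, CV_32FC1), mapy(100, 100, CV_32FC1);
        for (int y = 0; y < mapy.rows; ++y)
            for (int x = 0; x < mapx.cols; ++x) {
                mapx.at<float>(y, x) = (float)x;
                mapy.at<float>(y, x) = (float)y;
            }

        // convertMaps packs integer coordinates into map1 (CV_16SC2) and
        // interpolation-table indices into map2 (CV_16UC1), so remap can
        // skip recomputing bilinear weights for every pixel.
        cv::Mat map1, map2;
        cv::convertMaps(mapx, mapy, map1, map2, CV_16SC2);

        cv::Mat src(100, 100, CV_32FC1, cv::Scalar(0)), dst;
        cv::remap(src, dst, map1, map2, cv::INTER_LINEAR);
        return 0;
    }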
2014-09-13 03:14:57 -0600 | received badge | ● Supporter |
2014-09-12 20:48:33 -0600 | received badge | ● Student |
2014-09-12 17:42:34 -0600 | asked a question | resize and remap functions utterly wrong As far as I can tell, both the remap and resize functions are implemented incorrectly (at least with the default bilinear interpolation). Consider the following code and its output. The results for resize and remap should both be smooth in 1/3 intervals - I don't know what the heck opencv is doing. These results seem completely inaccurate to me. Please enlighten me! |
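[A minimal sketch of the kind of test this question describes, assuming a 1x3 float image upscaled 3x with cv::resize, plus the equivalent coordinate mapping fed to cv::remap, both with INTER_LINEAR. The sizes and values are illustrative, not the original code:]

    #include <cstdio>
    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>

    int main() {
        // 1x3 float image with three distinct values, upscaled 3x in width.
        float src_data[3] = {0.f, 1.f, 2.f};
        cv::Mat src(1, 3, CV_32FC1, src_data);

        // resize with bilinear interpolation.
        cv::Mat up;
        cv::resize(src, up, cv::Size(9, 1), 0, 0, cv::INTER_LINEAR);

        // The same mapping written out explicitly for remap:
        // dst(x) samples src at (x + 0.5)/3 - 0.5 (pixel-center convention).
        cv::Mat mapx(1, 9, CV_32FC1), mapy(1, 9, CV_32FC1);
        for (int x = 0; x < 9; ++x) {
            mapx.at<float>(0, x) = (x + 0.5f) / 3.f - 0.5f;
            mapy.at<float>(0, x) = 0.f;
        }
        cv::Mat remapped;
        cv::remap(src, remapped, mapx, mapy, cv::INTER_LINEAR);

        // With an exact bilinear implementation, both outputs would ramp
        // smoothly in 1/3 steps between the source values.
        for (int x = 0; x < 9; ++x)
            std::printf("%d: resize=%.4f remap=%.4f\n", x,
                        up.at<float>(0, x), remapped.at<float>(0, x));
        return 0;
    }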
2014-09-02 16:06:20 -0600 | received badge | ● Scholar |
2014-08-29 08:54:16 -0600 | commented answer | GPU Cuda initialization much slower with opencv libraries Thanks Steve, it says I need >50 points to mark my own answer as a solution, so I'll have to come back later and do it. Yeah the GUI stuff was the final straw that gave back all the performance I needed, woohoo! |
2014-08-29 05:06:11 -0600 | received badge | ● Teacher |
2014-08-29 02:11:46 -0600 | received badge | ● Self-Learner |
2014-08-28 16:17:38 -0600 | answered a question | GPU Cuda initialization much slower with opencv libraries Problem solved! For anyone wanting to know how to speed up opencv initialization, here ya go: (1) set CUDA_ARCH_BIN to your GPU's compute capability so the CUDA kernels are precompiled for it rather than JIT-compiled from PTX at startup; (2) build opencv as static libraries; (3) drop the GUI modules and dependencies from the build. These three changes took my start time from ~7.5 seconds down to ~0.7 seconds (almost the same as it is without opencv at all). Here are the cmake flags I changed to do the above - see the sketch below. Hope this helps someone out in the future - there is certainly little information out there about this. |
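[A sketch of the kind of cmake configuration these three changes amount to. The option names are OpenCV 2.4-era, and the exact flag set here is an assumption reconstructed from the comments below in this thread, not the poster's original list (the K20c is compute capability 3.5):]

    # Precompile CUDA kernels for the K20c (compute 3.5) only, generate no
    # PTX (so nothing is JIT-compiled at startup), build static libraries,
    # and drop the GUI dependencies.
    cmake -D CUDA_ARCH_BIN=3.5 -D CUDA_ARCH_PTX= \
          -D BUILD_SHARED_LIBS=OFF \
          -D WITH_GTK=OFF -D WITH_QT=OFF ..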
2014-08-28 12:48:58 -0600 | commented question | GPU Cuda initialization much slower with opencv libraries Another update - compiling with static libs shaved off another 3 seconds! I'm getting there... |
2014-08-28 09:14:24 -0600 | commented question | GPU Cuda initialization much slower with opencv libraries Another update - opencv 2.4.9 is slightly slower by about 0.1 seconds, so that didn't help. |
2014-08-27 12:37:11 -0600 | commented question | GPU Cuda initialization much slower with opencv libraries Reporting back, setting CUDA_ARCH_BIN dropped about 2 seconds off the initialization time, but I'm still looking at around a ~6 second startup lag. If opencv isn't compiling PTX now, what on earth is it doing? |
2014-08-26 12:05:35 -0600 | commented question | GPU Cuda initialization much slower with opencv libraries Another thought, the documentation makes me believe that the cuda code is precompiled by default for compute capabilities 1.1 and 1.3, and that perhaps if I add CUDA_ARCH_BIN=3.5 to the CMake defines it'll precompile the cuda kernels for my K20c. I'm trying it out and will report back if it helps. |
2014-08-26 10:18:31 -0600 | received badge | ● Editor |
2014-08-26 09:52:40 -0600 | commented question | GPU Cuda initialization much slower with opencv libraries Thanks for the reply Steven. Unfortunately, I don't have the luxury of that startup lag being acceptable. According to the opencv documentation, the lag could be JIT PTX compilation, and CUDA_DEVCODE_CACHE should be used to cache the compiled code for future use, but that feature does not seem to be working. Has anyone ever even tried this? Google fails me (or maybe I fail Google). |
2014-08-25 17:26:42 -0600 | asked a question | GPU Cuda initialization much slower with opencv libraries Hello all, Prereqs for posting, my environment: Linux x86_64, OpenCV 2.4.6.1, CUDA 5.0, Tesla Kepler K20c GPU. I've got a simple C++ application to benchmark cuda performance. It makes and times the following calls once each, in order: With just the cuda libraries linked, each call takes tens of milliseconds, except for the malloc, which takes about 0.25 seconds. Fine... no biggie, it's all part of the GPU startup cost. Here's the weird part - if I include libopencv_gpu.so and libopencv_core.so in the linker list (-lopencv_gpu -lopencv_core), without changing the code whatsoever, those timings go through the roof. The cudaSetDevice call takes ~2.5 seconds, and the malloc takes ~5 seconds. Calls after that seem to be just as fast, but a ~7.5 second startup cost is ridiculous considering it's only ~0.5 seconds without the opencv libraries. Another oddity: taking out libopencv_gpu and leaving just the core library still has an effect - the set-device call still takes ~2.5 seconds, and the malloc takes ~0.7 seconds. What gives? This affects more than my benchmark app, and it is repeatable. Does anyone have any insight into how opencv is destroying my startup performance? I tried setting CUDA_DEVCODE_CACHE to /tmp/devcode, thinking the lag was PTX compilation, but nothing was created in that directory - am I using it wrong? Any help would be great. Thanks! |
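[A minimal sketch of a timing harness along these lines - cudaSetDevice and the device malloc are the calls named above; the buffer size and the extra calls are assumptions:]

    #include <cstdio>
    #include <ctime>
    #include <cuda_runtime.h>

    // Wall-clock seconds from a monotonic clock.
    static double now() {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main() {
        void* buf = 0;
        double t;

        t = now();
        cudaSetDevice(0);
        std::printf("cudaSetDevice: %.3f s\n", now() - t);

        t = now();
        cudaMalloc(&buf, 64 << 20);   // 64 MB device allocation
        std::printf("cudaMalloc:    %.3f s\n", now() - t);

        t = now();
        cudaMemset(buf, 0, 64 << 20);
        cudaDeviceSynchronize();
        std::printf("cudaMemset:    %.3f s\n", now() - t);

        cudaFree(buf);
        return 0;
    }

[Building this once as plain CUDA (nvcc bench.cu -o bench) and once with -lopencv_gpu -lopencv_core added to the link line, with no source changes, is what exposes the startup-cost difference described in the question.]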