Guidance on OpenCV reduction-based algorithms
[ SHORT ] Are you aware of any particular algorithms that would benefit from the work-group reduce add/min/max functions? Maybe something SIMD-oriented that nevertheless requires moderate thread communication, or a map-reduce type of operation.
[ DETAIL ] I'm working on improving the work-group reduce functions in a Linux OpenCL stack for GPGPUs. The work-group functions are briefly described here [1]; in short, they enable collaborative add/min/max operations between threads in the same work-group, with every thread receiving the group's result. For instance, work_group_reduce_add with 4 local threads would map (1, 2, 3, 4, 5, 6, 7, 8) => {(10, 10, 10, 10), (26, 26, 26, 26)}, while work_group_reduce_min with 2 local threads would give {(1, 1), (3, 3), (5, 5), (7, 7)} for the same input. I'm looking at OpenCV as a source of real-world problems that would benefit from the work-group reduce functions.
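To make the two examples above concrete, here is a minimal host-side sketch in Python that emulates the semantics only; it is not OpenCL code and says nothing about how a real driver implements the reduction. The helper name `emulate_work_group_reduce` and its parameters are hypothetical, chosen purely for illustration, and the sketch assumes the input length is a multiple of the work-group size.

```python
from typing import Callable, List, Sequence


def emulate_work_group_reduce(
    values: Sequence[int],
    local_size: int,
    op: Callable[[Sequence[int]], int],
) -> List[List[int]]:
    """Emulate a work-group reduce on the host (hypothetical helper).

    The input is split into consecutive work-groups of `local_size` items.
    Each group is reduced with `op`, and the result is broadcast so that
    every work-item in the group sees the same value.
    """
    if local_size <= 0 or len(values) % local_size != 0:
        raise ValueError("len(values) must be a positive multiple of local_size")

    groups = []
    for start in range(0, len(values), local_size):
        group = values[start:start + local_size]
        reduced = op(group)                      # one value per work-group
        groups.append([reduced] * len(group))    # broadcast to each work-item
    return groups


data = [1, 2, 3, 4, 5, 6, 7, 8]

# Reduce-add with 4 work-items per group
print(emulate_work_group_reduce(data, 4, sum))
# -> [[10, 10, 10, 10], [26, 26, 26, 26]]

# Reduce-min with 2 work-items per group
print(emulate_work_group_reduce(data, 2, min))
# -> [[1, 1], [3, 3], [5, 5], [7, 7]]
```

Passing `max` as `op` gives the corresponding reduce-max behaviour under the same assumptions.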
Maybe you can open an issue.
But I'm looking for guidance on which algorithms would suit the work-group reduce operations... how can I turn that into an issue?
Sometimes issues are just questions.