Revision history

OpenCV reduction based operations

I'm working on improving the work-group reduce functions in a Linux OpenCL stack for GPGPUs. The work-group functions are briefly described in [1]. In short, they let the threads of a work-group cooperate on an add/min/max reduction, with every thread receiving the result. For instance, work_group_reduce_add with 4 local threads maps (1, 2, 3, 4, 5, 6, 7, 8) to {(10, 10, 10, 10), (26, 26, 26, 26)}, while work_group_reduce_min with 2 local threads maps the same input to {(1, 1), (3, 3), (5, 5), (7, 7)}.

Are you aware of any particular algorithms that would benefit from work-group reduce add/min/max? Maybe something SIMD-oriented that nonetheless requires moderate thread communication.

[1] https://software.intel.com/en-us/articles/using-opencl-20-work-group-functions

Guidance OpenCV reduction based algorithms

[ SHORT ] Are you aware of any particular algorithms that would benefit from work-group reduce add/min/max? Maybe something SIMD-oriented that nonetheless requires moderate thread communication, or a map-reduce type operation.

[ DETAIL ] I'm working on improving the work-group reduce functions in a Linux OpenCL stack for GPGPUs. The work-group functions are briefly described in [1]. In short, they let the threads of a work-group cooperate on an add/min/max reduction, with every thread receiving the result. For instance, work_group_reduce_add with 4 local threads maps (1, 2, 3, 4, 5, 6, 7, 8) to {(10, 10, 10, 10), (26, 26, 26, 26)}, while work_group_reduce_min with 2 local threads maps the same input to {(1, 1), (3, 3), (5, 5), (7, 7)}. I'm looking at OpenCV as a source of real-world problems that would benefit from the work-group reduce functions.
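
To make the intended semantics concrete, here is a rough host-side model in Python of what I mean by the two examples above. It is only a sketch: `simulate_work_group_reduce` is a hypothetical helper name, it assumes the global size is a multiple of the local size, and it models the result each thread sees, not how an OpenCL implementation actually computes it.

```python
from functools import reduce
import operator


def simulate_work_group_reduce(values, local_size, op):
    """Model a work-group reduce: split `values` into work-groups of
    `local_size` items, reduce each group with `op`, and give every
    work-item in the group the same reduced result."""
    if local_size <= 0 or len(values) % local_size != 0:
        # Assumption of this sketch: global size is a multiple of local size.
        raise ValueError("len(values) must be a positive multiple of local_size")

    results = []
    for start in range(0, len(values), local_size):
        group = values[start:start + local_size]
        reduced = reduce(op, group)
        # Every work-item in the group receives the reduced value.
        results.append((reduced,) * local_size)
    return results


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(simulate_work_group_reduce(data, 4, operator.add))
print(simulate_work_group_reduce(data, 2, min))
```

The first call models the add reduction with 4 local threads, the second the min reduction with 2 local threads.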

[1] https://software.intel.com/en-us/articles/using-opencl-20-work-group-functions