1 | initial version |
As berak pointed out, you need to keep your data on the GPU. Performing some operations on the CPU and some on the GPU, without asynchronous operations and a well-thought-out processing pipeline, will probably be slower than CPU-only processing, even with a top-of-the-range GPU.
If your GPU is the M1200, then you have only 80.2 GB/s of memory bandwidth and 1399 GFLOPS, which is pretty slow for a modern GPU. The limited bandwidth hurts even more if you pay the overhead of transferring between the host and the device for every operation. A mid-range card like the 1060 has over twice the bandwidth and nearly three times the processing power.
Check out this performance comparison to get an idea of which operations benefit most from GPU acceleration, and note that it excludes the costly transfers between the host and the device. According to this spreadsheet, if you had a better card (1060) and no transfer overhead, you might see a 2x speedup for the Gaussian filter operation.
2 | No.2 Revision |
As berak pointed out, you need to keep your data on the GPU. Performing some operations on the CPU and some on the GPU, without asynchronous operations and a well-thought-out processing pipeline, will probably be slower than CPU-only processing, even with a top-of-the-range GPU.
If your GPU is the M1200, then you have only 80.2 GB/s of memory bandwidth and 1399 GFLOPS, which is pretty slow for a modern GPU. The limited bandwidth hurts even more if you pay the overhead of transferring between the host and the device for every operation. A mid-range desktop card like the GTX 1060 has over twice the bandwidth and nearly three times the processing power.
Check out this performance comparison to get an idea of which operations benefit most from GPU acceleration, and note that it excludes the costly transfers between the host and the device. According to this spreadsheet, if you had a better card (1060) and no transfer overhead, you might see a 2x speedup for the Gaussian filter operation.
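To make the transfer-overhead argument concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name `pipeline_time_ms` and all timings (4 ms per CPU operation, 1.5 ms per one-way transfer, 5 operations) are made-up illustrative values, not measurements from any card. Only the 2x speedup figure is taken from the spreadsheet estimate above.

```python
def pipeline_time_ms(n_ops, cpu_op_ms, gpu_speedup, transfer_ms, keep_on_gpu):
    """Estimate the time to run n_ops filter operations on one image.

    transfer_ms is the assumed cost of ONE host<->device copy.
    All inputs are hypothetical; measure your own pipeline for real numbers.
    """
    gpu_op_ms = cpu_op_ms / gpu_speedup
    if keep_on_gpu:
        # One upload at the start, one download at the end.
        return 2 * transfer_ms + n_ops * gpu_op_ms
    # Upload and download around every single operation.
    return n_ops * (2 * transfer_ms + gpu_op_ms)


# Hypothetical numbers, chosen only to illustrate the effect.
n_ops, cpu_op_ms, gpu_speedup, transfer_ms = 5, 4.0, 2.0, 1.5

cpu_only = n_ops * cpu_op_ms
round_trip = pipeline_time_ms(n_ops, cpu_op_ms, gpu_speedup, transfer_ms, keep_on_gpu=False)
resident = pipeline_time_ms(n_ops, cpu_op_ms, gpu_speedup, transfer_ms, keep_on_gpu=True)

print(f"CPU only:             {cpu_only:.1f} ms")
print(f"GPU, copy every op:   {round_trip:.1f} ms")
print(f"GPU, data stays put:  {resident:.1f} ms")
```

With these assumed numbers, copying around every operation ends up slower than the CPU despite the 2x kernel speedup, while keeping the data on the device comes out ahead. The model ignores asynchronous streams, which could hide part of the transfer cost.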