Hi Adam, how did you ensure synchronisation between GPU and CPU in these measurements? The last plot would be consistent with the hypothesis that there is no sync, so that part of the GPU computation is still running when the copy timer starts and ends up being counted in your copy-to-host timings. Regards, Jakob
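
P.S. To make the hypothesis concrete, here is a minimal CPU-only sketch of the effect I have in mind. It is not your benchmark code and does not touch a GPU: a worker thread stands in for the device, `fake_kernel`, `measure` and `KERNEL_SECONDS` are hypothetical names I made up for illustration, and the "copy" is simply the first call that has to wait for the result. The assumption is that your framework launches GPU work asynchronously, as most do.

```python
import time
from concurrent.futures import ThreadPoolExecutor

KERNEL_SECONDS = 0.2  # assumed duration of the pretend GPU work


def fake_kernel():
    """Stand-in for GPU work; runs on the worker thread ("device")."""
    time.sleep(KERNEL_SECONDS)
    return 42


def measure(sync_before_copy):
    """Return (compute_time, copy_time) as seen by the host timer."""
    with ThreadPoolExecutor(max_workers=1) as device:
        t0 = time.perf_counter()
        future = device.submit(fake_kernel)  # asynchronous launch, returns at once
        if sync_before_copy:
            future.result()  # explicit sync: wait until the work has finished
        t1 = time.perf_counter()
        future.result()  # "copy to host": blocks until the result is available
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    for sync in (False, True):
        compute, copy = measure(sync)
        print(f"sync={sync}: compute={compute:.3f}s copy={copy:.3f}s")
```

Without the explicit wait, nearly all of the work shows up in the "copy" column. With it, the time moves back to the compute column and the copy becomes cheap. If your numbers behave like the first case, adding your framework's device synchronisation call before starting the copy timer should shift the time accordingly.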