Some operations, e.g. matrix multiplication or convolution, can run faster on custom accelerators. It would be good if this were supported in a general way, so that users could simply switch the device and benefit from the acceleration. Is there any plan for that?