### Big GPU matrix division

Hi there,

I tried to perform an element-wise division of two long single-column GPU matrices, and it fails with the following exception:

> Invalid Configuration Argument - This error means that the dimension of either the specified grid of blocks (dimGrid), or number of threads in a block (dimBlock), is incorrect. In such a case, the dimension is either zero or the dimension is larger than it should be. This error will only occur if you dynamically determine the dimensions.

After tracing it down to the source, I found

```
const dim3 grid(divUp(cols, block.x), divUp(rows, block.y));
```

and

```
const dim3 block(Policy::block_size_x, Policy::block_size_y);
```

Since there are 729,632 rows and 1 column in each of the GPU matrices, the computed grid size is 1 × 91,204 × 1 according to the policy

```
struct DefaultTransformPolicy
{
    enum {
        block_size_x = 32,
        block_size_y = 8,
        shift = 4
    };
};
```
```

which doesn't fit my case well, because 91,204 already exceeds the maximum grid y-dimension of 65,535.

I was wondering how this policy is chosen.
Is it possible to override it in my own code, without rebuilding the library?