
BIG gpu matrix division

Hi there,

I tried to perform per-element division of two long 1D GPU matrices, and it fails with the following exception:

Invalid Configuration Argument - This error means that the dimension of either the specified grid of blocks (dimGrid) or the number of threads in a block (dimBlock) is incorrect. In such a case, the dimension is either zero or larger than it should be. This error will only occur if you dynamically determine the dimensions.

After tracing down to the source I found

const dim3 grid(divUp(cols, block.x), divUp(rows, block.y));

and

const dim3 block(Policy::block_size_x, Policy::block_size_y);

Since there are 729,632 rows and 1 column in each of the GPU matrices, the resulting grid size is 1 by 91,204 by 1 according to the policy

struct DefaultTransformPolicy
{
    enum {
        block_size_x = 32,
        block_size_y = 8,
        shift = 4
    };
};

which does not fit my case, because 91,204 already exceeds the maximum grid y-dimension of 65,535.

I was wondering how this policy is decided. Is it possible to override it in my own code, without rebuilding the library?
