Hi,

I am facing a weird problem. I have two matrices, input and output. For each element i in row r of the input matrix, the kernel sums all elements of that row up to and including i and stores the sum at column i, row r of the output matrix. For small matrices this works without problems. But for a big matrix (604x454), the output matrix contains garbage values in every row except the first.

```
"__kernel void v2_integral_cols_sum(__global uchar *src,\n"
"        int rows, int cols, __global int *lm_sum,\n"
"        int pixels, int steps, int o_steps)\n"
"{\n"
"    int gid = get_global_id(0);\n"
"    if (gid >= pixels)\n"                 /* skip work-items past the last pixel */
"        return;\n"
"    else\n"
"    {\n"
"        int x = gid % steps;\n"           /* column index within the row */
"        int y = gid / steps;\n"           /* row index */
"        int sum = 0;\n"
"        for (int i = 0; i <= x; i++)\n"   /* inclusive sum over columns 0..x */
"        {\n"
"            sum = sum + src[y * steps + i];\n"
"        }\n"
"        lm_sum[y * o_steps + x] = sum;\n"
"    }\n"
"}\n"
```
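
For reference, the result the kernel is expected to produce can be sketched on the CPU and compared element by element against the GPU output. This is only a minimal sketch under assumptions: the function name `row_prefix_sum_ref` is hypothetical, `steps` and `o_steps` are taken to be row strides counted in elements, and only the first `cols` columns of each row are treated as meaningful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not part of the original code):
 * inclusive prefix sum along each row.
 * Assumption: steps and o_steps are row strides in elements,
 * and each is at least cols. */
static void row_prefix_sum_ref(const unsigned char *src, int rows, int cols,
                               int steps, int *out, int o_steps)
{
    assert(rows >= 0 && cols >= 0);
    assert(cols <= steps && cols <= o_steps);

    for (int y = 0; y < rows; y++) {
        int sum = 0;  /* running sum restarts at the beginning of every row */
        for (int x = 0; x < cols; x++) {
            sum += src[(size_t)y * steps + x];
            out[(size_t)y * o_steps + x] = sum;
        }
    }
}
```

Calling it with the same `src`, `rows`, `cols`, `steps` and `o_steps` that are passed to the kernel gives a buffer that can be checked against `lm_sum` after it has been read back from the device.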

Interestingly, if I add a printf statement after the output assignment, as in the block below, I get the correct output matrix. I cannot find any explanation for this weird behavior. Can anyone help me?

```
"lm_sum[y*o_steps + x]=sum;\n"
"if(gid==640)\n"
"{"
"printf(\"lm_sum [%d %d %d] %d \",x,y,gid,lm_sum[y * o_steps+ x]);\n"
"}\n"
```
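
To make clear which element that printf reports, the index mapping used in the kernel can be reproduced on the host. This is a small sketch only: the helper `gid_to_coord` and the `Coord` type are hypothetical, and the value 604 for `steps` is an assumption, since the real stride is not shown and may include padding.

```c
#include <assert.h>

/* Hypothetical helper type, introduced only for this illustration. */
typedef struct {
    int x;  /* column index */
    int y;  /* row index */
} Coord;

/* Same mapping as in the kernel: x = gid % steps, y = gid / steps. */
static Coord gid_to_coord(int gid, int steps)
{
    assert(steps > 0);
    Coord c = { gid % steps, gid / steps };
    return c;
}
```

With `steps` assumed to be 604, `gid_to_coord(640, 604)` yields x = 36 and y = 1, so the printf would be reporting an element of the second row, which is one of the rows that contains garbage without the printf.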