Read vector into CUDA shared memory

Posted by Ben on Stack Overflow
Published on 2010-05-31T07:21:33Z

I am new to CUDA and programming GPUs. I need each thread in my block to use a vector of length ndim. So I thought I might do something like this:

extern __shared__ float smem[];
...
if (threadIdx.x == 0) {
   for (int d=0; d<ndim; ++d) {
       smem[d] = vector[d];
   }
}
__syncthreads();
...

This works fine. However, it seems wasteful for a single thread to do all the loading, so I changed the code to

if (threadIdx.x < ndim) {
   smem[threadIdx.x] = vector[threadIdx.x];
}

__syncthreads();

which does not work. Why? It gives different results from the code above, even when ndim is much smaller than blockDim.x.
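
For comparison, here is a minimal, self-contained sketch of the cooperative load. It is not a diagnosis of the failure described above. It assumes that smem is declared as an array of float, that the kernel is launched with ndim * sizeof(float) bytes of dynamic shared memory, and it uses a block-stride loop so the copy stays correct even if ndim exceeds blockDim.x. The names load_and_sum and run_demo, the test data, and the launch sizes are hypothetical and do not come from the original code.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block copies `vector` (length ndim) into dynamic shared memory.
// Every thread then sums the shared copy so the host can check the load.
__global__ void load_and_sum(const float* vector, int ndim, float* out)
{
    extern __shared__ float smem[];  // size comes from the third launch parameter

    // Block-stride loop: thread t copies elements t, t + blockDim.x, ...
    // This covers ndim < blockDim.x and ndim > blockDim.x alike.
    for (int d = threadIdx.x; d < ndim; d += blockDim.x) {
        smem[d] = vector[d];
    }
    __syncthreads();  // reached by every thread in the block

    float sum = 0.0f;
    for (int d = 0; d < ndim; ++d) {
        sum += smem[d];
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Hypothetical host helper: returns the number of threads that computed a
// wrong sum, or -1 if any CUDA call failed.
int run_demo(int ndim, int threads, int blocks)
{
    std::vector<float> h_vec(ndim);
    float expected = 0.0f;
    for (int d = 0; d < ndim; ++d) {
        h_vec[d] = static_cast<float>(d + 1);  // small integers, exact in float
        expected += h_vec[d];
    }

    const size_t vec_bytes = ndim * sizeof(float);
    const size_t n_out = static_cast<size_t>(threads) * blocks;
    const size_t out_bytes = n_out * sizeof(float);
    std::vector<float> h_out(n_out, -1.0f);

    float* d_vec = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_vec, vec_bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, out_bytes) != cudaSuccess) { cudaFree(d_vec); return -1; }

    cudaError_t err = cudaMemcpy(d_vec, h_vec.data(), vec_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Third launch parameter: bytes of dynamic shared memory per block.
        load_and_sum<<<blocks, threads, vec_bytes>>>(d_vec, ndim, d_out);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out.data(), d_out, out_bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_vec);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (size_t i = 0; i < n_out; ++i) {
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    printf("ndim < blockDim.x: %d mismatches\n", run_demo(8, 128, 4));
    printf("ndim > blockDim.x: %d mismatches\n", run_demo(100, 32, 4));
    return 0;
}
```

When ndim never exceeds blockDim.x, the loop body runs at most once per thread, so it reduces to the if (threadIdx.x < ndim) form from the question.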

© Stack Overflow or respective owner