Read vector into CUDA shared memory
Posted by Ben on Stack Overflow
Published on 2010-05-31T07:21:33Z
Tags: cuda
I am new to CUDA and programming GPUs. I need each thread in my block to use a vector of length ndim, so I thought I might do something like this:
extern __shared__ float* smem[];
...
if (threadIdx.x == 0) {
    for (int d = 0; d < ndim; ++d) {
        smem[d] = vector[d];
    }
}
__syncthreads();
...
This works fine. However, it seems wasteful that a single thread should do all the loading, so I changed the code to:
if (threadIdx.x < ndim) {
    smem[threadIdx.x] = vector[threadIdx.x];
}
__syncthreads();
which does not work. Why? It gives different results from the first version, even when ndim << blockDim.x.
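For context, here is a minimal, self-contained sketch of the kind of kernel described above, with the cooperative load written as a strided loop (so it also covers ndim > blockDim.x) and with the dynamic shared-memory size passed as the third launch parameter. The kernel name, launch configuration, and data are illustrative, not from the question; note also that it declares the buffer as `float smem[]` (an array of floats) rather than `float* smem[]` (an array of pointers) as in the snippets above.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel: each block cooperatively copies `vector`
// (length ndim, in global memory) into dynamic shared memory.
__global__ void useVector(const float* vector, int ndim) {
    extern __shared__ float smem[];  // float, not float*

    // Strided cooperative load: works for any ndim vs. blockDim.x.
    for (int d = threadIdx.x; d < ndim; d += blockDim.x) {
        smem[d] = vector[d];
    }
    __syncthreads();  // all of smem[0..ndim-1] is now visible to the block

    // ... every thread may now read smem[0..ndim-1] ...
}

int main(void) {
    const int ndim = 8;
    float h_vec[ndim] = {0, 1, 2, 3, 4, 5, 6, 7};

    float* d_vec;
    cudaMalloc(&d_vec, ndim * sizeof(float));
    cudaMemcpy(d_vec, h_vec, ndim * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter: dynamic shared-memory size in bytes.
    useVector<<<1, 128, ndim * sizeof(float)>>>(d_vec, ndim);
    cudaDeviceSynchronize();

    cudaFree(d_vec);
    return 0;
}
```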