Read vector into CUDA shared memory
- by Ben
I am new to CUDA and programming GPUs. I need each thread in my block to use a vector of length ndim. So I thought I might do something like this:
extern __shared__ float smem[];
...
if (threadIdx.x == 0) {
    for (int d = 0; d < ndim; ++d) {
        smem[d] = vector[d];
    }
}
__syncthreads();
...
This works fine. However, it seems wasteful that…
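For context, here is a minimal sketch of how a snippet like the one above might sit inside a complete kernel. It assumes (this is not stated in the post) that vector is a pointer to device memory, that ndim is passed in as a kernel argument, and that the dynamic shared memory size is supplied as the third launch-configuration parameter.

// Sketch only; "vector" and "ndim" are assumed to be a device pointer
// and a kernel argument, matching the fragment in the question.
__global__ void useSharedVector(const float* vector, int ndim)
{
    // Dynamically sized shared memory; its size is set at launch time.
    extern __shared__ float smem[];

    // One thread copies the vector from global to shared memory.
    if (threadIdx.x == 0) {
        for (int d = 0; d < ndim; ++d) {
            smem[d] = vector[d];
        }
    }
    // Make the copy visible to every thread in the block before use.
    __syncthreads();

    // ... each thread can now read smem[0..ndim-1] ...
}

// Launch: the third <<< >>> argument is the dynamic shared memory size
// in bytes, which must cover the ndim floats used above.
// useSharedVector<<<blocks, threadsPerBlock, ndim * sizeof(float)>>>(d_vector, ndim);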