Why is an rvalue write to a shared memory array serialised?
Posted by CJM on Ask Ubuntu
Published on 2012-04-05T21:25:39Z
        
I'm using CUDA 4.0 on a GPU with compute capability 2.1. One of my device functions is the following:
    __device__ void test(int n, int* itemp) // itemp points to shared memory
    {
        const int tid = threadIdx.x;   // this thread's index within the block
        const int bdim = blockDim.x;   // stride of the loop below
        int i, j, k;
        bool flag = false;
        itemp[tid] = 0;
        for (i = tid; i < n; i += bdim)
        {
            // { code that produces some values of "flag" }
        }
        itemp[tid] = flag;             // the write reported as serialized
    }
Each thread checks some conditions and produces a 0/1 flag, then writes that flag to the tid-th location of a shared int array. The write statement "itemp[tid] = flag;" gets serialized, although "itemp[tid] = 0;" does not. This causes a large performance hit that should not be there, and I want to avoid it. Please help.
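For anyone who wants to profile this, here is a minimal sketch of the same pattern as a complete program. The elided condition is replaced by a hypothetical one (data[i] > threshold), and the names data, threshold, test_kernel and count_mismatches are illustrative assumptions, not part of the original code. On compute capability 2.x, shared memory has 32 banks of 32-bit words, so int writes indexed by tid from the threads of one warp land in distinct banks. One possibility worth ruling out when profiling, stated as an assumption rather than a diagnosis, is that the cost of computing flag is being attributed to the store that consumes it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the elided condition: flag becomes true if any
// element visited by this thread exceeds `threshold`.
__device__ void test(int n, const int* data, int threshold, int* itemp)
{
    const int tid = threadIdx.x;
    const int bdim = blockDim.x;
    bool flag = false;
    for (int i = tid; i < n; i += bdim)
        flag = flag || (data[i] > threshold);
    // Consecutive threads write consecutive 32-bit words of shared memory.
    itemp[tid] = flag;
}

__global__ void test_kernel(int n, const int* data, int threshold, int* out)
{
    extern __shared__ int itemp[];   // one int per thread, sized at launch
    test(n, data, threshold, itemp);
    __syncthreads();
    out[threadIdx.x] = itemp[threadIdx.x];
}

// Runs one block and returns how many per-thread flags differ from a CPU
// reference, or -1 if a CUDA call fails. Expects n > 0.
int count_mismatches(int n)
{
    const int threads = 128;
    const int threshold = 50;
    std::vector<int> h_data(n);
    for (int i = 0; i < n; ++i)
        h_data[i] = (i * 37) % 101;  // arbitrary illustrative input

    int* d_data = NULL;
    int* d_out = NULL;
    if (cudaMalloc((void**)&d_data, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&d_out, threads * sizeof(int)) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }
    cudaMemcpy(d_data, &h_data[0], n * sizeof(int), cudaMemcpyHostToDevice);

    test_kernel<<<1, threads, threads * sizeof(int)>>>(n, d_data, threshold, d_out);

    std::vector<int> h_out(threads);
    cudaError_t err = cudaMemcpy(&h_out[0], d_out, threads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int t = 0; t < threads; ++t) {
        int expected = 0;            // CPU reference using the same stride
        for (int i = t; i < n; i += threads)
            if (h_data[i] > threshold) expected = 1;
        if (h_out[t] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", count_mismatches(1000));
    return 0;
}
```

Building with something like "nvcc -arch=sm_21 sketch.cu" (the file name is arbitrary) and comparing the profiler output for this version against one where the final store writes a constant may help show whether the store itself, or the work feeding it, accounts for the time.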
© Ask Ubuntu or respective owner