Optimizing vector element swaps using CUDA

Posted by Orion Nebula on Stack Overflow
Published on 2010-05-20T11:07:41Z Indexed on 2010/05/20 11:10 UTC

Hi all,

Since I am new to CUDA, I need your kind help. I have a long vector, and for each group of 24 elements I need to do the following: in the first 12 elements, the even-numbered elements are multiplied by -1; in the second 12 elements, the odd-numbered elements are multiplied by -1. Then the following swap takes place:

Graph: because I don't yet have enough points, I couldn't post the image, so here is a link to it:

http://www.freeimagehosting.net/image.php?e4b88fb666.png
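For readers who cannot open the image, the operation can be sketched as a host-side reference. This is only a sketch under stated assumptions: "even" and "odd" are taken to mean 0-based positions within each group of 24, and the swap partners are read off the kernel code posted in this question rather than from the graph itself. The function name `flip_and_swap_reference` is hypothetical and is not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side reference for the operation described above.
// Assumes 0-based positions within each 24-element group, and swap partners
// taken from the kernel in this post (not from the linked graph).
void flip_and_swap_reference(std::vector<double>& v) {
    assert(v.size() % 24 == 0);  // the vector must consist of whole groups
    for (std::size_t base = 0; base < v.size(); base += 24) {
        // Step 1: sign flips. First 12 elements: negate even positions;
        // second 12 elements: negate odd positions.
        for (std::size_t k = 0; k < 24; ++k) {
            const bool first_half = k < 12;
            const bool even = (k % 2) == 0;
            if (first_half == even) {
                v[base + k] = -v[base + k];
            }
        }
        // Step 2: swaps. Positions 0..5 trade places with 18..23 and
        // positions 6..11 with 12..17; within each pair the even/odd
        // neighbours are crossed (k ^ 1), as tx_mod does in the kernel.
        for (std::size_t k = 0; k < 12; ++k) {
            const std::size_t partner = (k ^ 1) + (k < 6 ? 18 : 6);
            std::swap(v[base + k], v[base + partner]);
        }
    }
}
```

A reference like this can be run on a small input and compared element by element against whatever the kernel copies back from the device.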

I have written the kernel code below, and wonder if you could help me optimize it further to reduce divergence and shared-memory bank conflicts.

// subVector is a multiple of 24; Mds and Nds are shared memory
__shared__ double Mds[subVector];
__shared__ double Nds[subVector];

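int tx = threadIdx.x;
int tx_mod = tx ^ 0x0001;  // index of the other element in the same even/odd pair
int basex = __umul24(blockDim.x, blockIdx.x);

// load this block's slice of the vector into shared memory
Mds[tx] = M.elements[basex + tx];
__syncthreads();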

// flip the signs
if (tx < (tx/24)*24 + 12)
{
    // first 12 elements of the group: negate the even-indexed ones
    if ((tx & 0x0001) == 0)
        Mds[tx] = -Mds[tx];
}
else
if (tx < (tx/24)*24 + 24)
{
    // elements 12..23 of the group: negate the odd-indexed ones
    if ((tx & 0x0001) == 1)
        Mds[tx] = -Mds[tx];
}

__syncthreads();

if (tx < (tx/24)*24 + 6)
{
    // for the first 6 elements: swap with the last six in the 24-element group (see graph)
    Nds[tx] = Mds[tx_mod + 18];
    Mds[tx_mod + 18] = Mds[tx];
    Mds[tx] = Nds[tx];
}
else
if (tx < (tx/24)*24 + 12)
{
    // for the second 6 elements: swap with the next adjacent group of six (see graph)
    Nds[tx] = Mds[tx_mod + 6];
    Mds[tx_mod + 6] = Mds[tx];
    Mds[tx] = Nds[tx];
}

__syncthreads();
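One way to reason about the branches above is to view the sign flip and the swap as a single per-element mapping: each output position reads exactly one input position and applies one sign. The host-side sketch below emulates that view with one loop iteration per thread. It is an illustration under assumptions, not a tuned kernel: the helper names `source_index`, `flip_sign` and `flip_and_swap_gather` are hypothetical, and the index mapping is inferred from the code in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: position within a 24-element group whose value ends up
// at position k after the swaps in the posted kernel.
inline std::size_t source_index(std::size_t k) {
    const std::size_t pair = k ^ 1;  // other element of the even/odd pair
    if (k < 6)  return pair + 18;    // 0..5   read from 18..23
    if (k < 12) return pair + 6;     // 6..11  read from 12..17
    if (k < 18) return pair - 6;     // 12..17 read from 6..11
    return pair - 18;                // 18..23 read from 0..5
}

// Hypothetical helper: sign the flip step applies to the element at position k.
inline double flip_sign(std::size_t k) {
    const bool first_half = k < 12;
    const bool even = (k & 1) == 0;
    return (first_half == even) ? -1.0 : 1.0;
}

// Emulates one "thread" per loop iteration; the input must hold whole groups.
std::vector<double> flip_and_swap_gather(const std::vector<double>& in) {
    assert(in.size() % 24 == 0);
    std::vector<double> out(in.size());
    for (std::size_t tx = 0; tx < in.size(); ++tx) {
        const std::size_t base = (tx / 24) * 24;
        const std::size_t src = source_index(tx - base);
        out[tx] = flip_sign(src) * in[base + src];  // sign belongs to the source element
    }
    return out;
}
```

Written this way, every element does the same amount of work and writes only its own output slot, so the mapping can be checked on the host before deciding how to express it in the kernel.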

Thanks in advance.

© Stack Overflow or respective owner
