Optimize code performance when odd/even threads are doing different things in CUDA

Posted by Orion Nebula on Stack Overflow
Published on 2010-05-18T11:29:15Z Indexed on 2010/05/20 12:10 UTC

Hi all!

I have two large vectors and I am trying to do a kind of element-wise multiplication: each even-numbered element of the first vector is multiplied by the next odd-numbered element of the second vector, and each odd-numbered element of the first vector is multiplied by the preceding even-numbered element of the second vector.

Example:

vector 1 is V1(1) V1(2) V1(3) V1(4)

vector 2 is V2(1) V2(2) V2(3) V2(4)

V1(1) * V2(2)

V1(3) * V2(4)

V1(2) * V2(1)

V1(4) * V2(3)
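To make the pairing concrete, here is a minimal host-side reference, offered as a sketch rather than the poster's code. It assumes 0-based indexing (so V1(1) in the example above is index 0), an even vector length, and that each product is stored in the slot of the second-vector element it used, which is what the in-place kernel snippet in the post does. The function name pairwise_swap_multiply is hypothetical.

```cuda
#include <cstddef>
#include <vector>

// Host-side reference for the pairing, using 0-based indices.
// Element i of v1 is multiplied by element (i ^ 1) of v2, and the product is
// stored at position (i ^ 1). Assumption: both vectors have the same, even length.
// pairwise_swap_multiply is a hypothetical name used only for this sketch.
std::vector<float> pairwise_swap_multiply(const std::vector<float>& v1,
                                          const std::vector<float>& v2) {
    std::vector<float> out(v2.size());
    for (std::size_t i = 0; i + 1 < v1.size(); i += 2) {
        out[i + 1] = v1[i] * v2[i + 1];  // even index i uses the next element of v2
        out[i]     = v1[i + 1] * v2[i];  // odd index i + 1 uses the preceding element of v2
    }
    return out;
}
```

A reference like this is mainly useful for checking GPU results on small inputs before timing anything.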

I have written CUDA code to do this (Pds holds the elements of the first vector in shared memory, Nds those of the second vector):

// Instead of using % 2, I test the lowest bit to decide whether tx is odd or even (faster)

if ((tx & 0x0001) ==  0x0000)
    Nds[tx+1] = Pds[tx] * Nds[tx+1];
else
    Nds[tx-1] = Pds[tx] * Nds[tx-1];
__syncthreads();

Is there any way to further accelerate this code or avoid divergence?
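One possible way to drop the branch, sketched here under assumptions, is to compute the partner index as tx ^ 1: flipping the lowest bit maps an even tx to tx + 1 and an odd tx to tx - 1, which are the same slots the if/else selects. The tile size TILE, the kernel name pair_multiply, the host helper run_pair_multiply, and the global-memory loads and stores are all hypothetical; the post shows only the shared-memory update. Whether this is measurably faster is not something the post establishes, since a compiler may already turn such a short if/else into predicated instructions.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical tile size; the post does not say how large Pds and Nds are.
#define TILE 256

// Branch-free sketch of the update from the post.
// Assumptions: blockDim.x == TILE and the vector length is a multiple of TILE,
// so every thread has a partner inside its own block.
__global__ void pair_multiply(const float* v1, float* v2) {
    __shared__ float Pds[TILE];
    __shared__ float Nds[TILE];

    const int tx  = threadIdx.x;
    const int gid = blockIdx.x * blockDim.x + tx;

    Pds[tx] = v1[gid];
    Nds[tx] = v2[gid];
    __syncthreads();

    // tx ^ 1 is tx + 1 for even tx and tx - 1 for odd tx.
    const int partner = tx ^ 1;
    // Each thread reads and writes only its partner's slot of Nds, so no two
    // threads touch the same element.
    Nds[partner] = Pds[tx] * Nds[partner];
    __syncthreads();

    v2[gid] = Nds[tx];
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel, and copies the updated second vector back into h_v2.
// Returns false if any CUDA call reports an error.
bool run_pair_multiply(const float* h_v1, float* h_v2, int n) {
    float* d_v1 = nullptr;
    float* d_v2 = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    if (cudaMalloc(&d_v1, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_v2, bytes) != cudaSuccess) { cudaFree(d_v1); return false; }

    bool ok = cudaMemcpy(d_v1, h_v1, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_v2, h_v2, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        pair_multiply<<<n / TILE, TILE>>>(d_v1, d_v2);
        // The blocking copy back also surfaces any launch or execution error.
        ok = cudaMemcpy(h_v2, d_v2, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_v1);
    cudaFree(d_v2);
    return ok;
}
```

With this layout, element j of the second vector ends up holding v1[j ^ 1] * v2[j], matching the in-place behaviour of the original snippet.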

Thanks

© Stack Overflow or respective owner
