CUDA threads output different values
- by kar
Hi,
I wrote a CUDA program; the kernel function is given below. The device memory is
allocated with cudaMalloc(), and the initial value of *md is 10.
__global__ void add(int *md)
{
    int x, oper = 2;
    x = threadIdx.x;
    *md = *md * oper;
    if (x == 1)
    {
        *md = *md * 0;
    }
    if (x == 2)
    {
        *md = *md * 10;
    }
    if (x == 3)
    {
        *md = *md + 1;
    }
    if (x == 4)
    {
        *md = *md - 1;
    }
}
I executed the above code with two launch configurations:
add<<<1,5>>>(md) and add<<<1,4>>>(md)
For <<<1,5>>> the output is 19.
For <<<1,4>>> the output is 21.
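A minimal, self-contained host program that reproduces this setup might look like the sketch below. It is not the exact program from this post: the helper run_add, the CHECK macro and the host variable h_md are hypothetical names added for illustration. The kernel body is the one given above, and the launch is assumed to take the device pointer md, since the kernel parameter is int *.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original post): abort on any CUDA error.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Kernel as posted: every thread reads and writes the same int.
__global__ void add(int *md)
{
    int x, oper = 2;
    x = threadIdx.x;
    *md = *md * oper;
    if (x == 1) { *md = *md * 0; }
    if (x == 2) { *md = *md * 10; }
    if (x == 3) { *md = *md + 1; }
    if (x == 4) { *md = *md - 1; }
}

// Hypothetical helper: launch one block of `threads` threads on a single
// int initialised to `initial`, and return the value copied back to the host.
int run_add(int threads, int initial)
{
    int h_md = initial;
    int *md = nullptr;
    CHECK(cudaMalloc(&md, sizeof(int)));
    CHECK(cudaMemcpy(md, &h_md, sizeof(int), cudaMemcpyHostToDevice));
    add<<<1, threads>>>(md);   // pass the device pointer, not *md
    CHECK(cudaGetLastError()); // report launch errors
    // cudaMemcpy waits for the kernel to finish before copying.
    CHECK(cudaMemcpy(&h_md, md, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(md));
    return h_md;
}

int main()
{
    printf("<<<1,5>>> -> %d\n", run_add(5, 10));
    printf("<<<1,4>>> -> %d\n", run_add(4, 10));
    return 0;
}
```

The file can be compiled with nvcc. Each call to run_add starts from a fresh copy of the value 10, so the two launch configurations do not affect each other.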
1) Does cudaMalloc() allocate the memory in the device's main memory?
2) Why does it look as if only the last thread is executed in the above program?
Thank you