How to copy the memeory allocated in device function back to main memory

Posted by xhe8 on Stack Overflow See other posts from Stack Overflow or by xhe8
Published on 2012-10-24T04:45:32Z Indexed on 2012/10/24 5:01 UTC
Read the original article Hit count: 118

Filed under:

I have a CUDA program containing a host function and a device function Execute(). In the host function, I allocate a global memory output which will then be passed to the device function and used to store the address of the global memory allocated within the device function. I want to access the in-kernel allocated memory in the host function. The following is the code:

#include <stdio.h>
typedef struct                      
{
  int             * p;            
  int              num;            
} Structure_A;

\__global__ void Execute(Structure_A *output);

int main(){

    Structure_A *output;
    cudaMalloc((void***)&output,sizeof(Structure_A)*1);
    dim3 dimBlockExecute(1,1);
    dim3 dimGridExecute(1,1);
    Execute<<<dimGridExecute,dimBlockExecute>>>(output);
    Structure_A * output_cpu;
    int * p_cpu;
    cudaError_t err;

    output_cpu= (Structure_A*)malloc(1);
    err=cudaMemcpy(output_cpu,output,sizeof(Structure_A),cudaMemcpyDeviceToHost);    
    if( err != cudaSuccess)
    {
        printf("CUDA error a: %s\n", cudaGetErrorString(err));
        exit(-1);
    }
    p_cpu=(int *)malloc(1);
    err=cudaMemcpy(p_cpu,output_cpu[0].p,sizeof(int),cudaMemcpyDeviceToHost);    
    if( err != cudaSuccess)
    {
        printf("CUDA error b: %s\n", cudaGetErrorString(err));
        exit(-1);
    }   
    printf("output=(%d,%d)\n",output_cpu[0].num,p_cpu[0]);
    return 0;
}

\__global__ void Execute(Structure_A *output){

    int thid=threadIdx.x;

    output[thid].p= (int*)malloc(thid+1);

    output[thid].num=(thid+1);

    output[thid].p[0]=5;
} 

I can compile the program. But when I run it, I got a error showing that there is a invalid argument in the following memory copy function. "err=cudaMemcpy(p_cpu,output_cpu[0].p,sizeof(int),cudaMemcpyDeviceToHost);"

CUDA version is 4.2. CUDA card: Tesla C2075 OS: x86_64 GNU/Linux

© Stack Overflow or respective owner

Related posts about cuda