I am just starting to learn how to use CUDA. I am trying to run some simple example code:
float *ah, *bh, *ad, *bd;
ah = (float *)malloc(sizeof(float)*4);
bh = (float *)malloc(sizeof(float)*4);
cudaMalloc((void **) &ad, sizeof(float)*4);
cudaMalloc((void **) &bd, sizeof(float)*4);
... initialize ah ...
/* copy array on device */
cudaMemcpy(ad,ah,sizeof(float)*N,cudaMemcpyHostToDevice);
cudaMemcpy(bd,ad,sizeof(float)*N,cudaMemcpyDeviceToDevice);
cudaMemcpy(bh,bd,sizeof(float)*N,cudaMemcpyDeviceToHost);
When I run in emulation mode (nvcc -deviceemu) it runs fine (and actually copies the array).
But when I run it in regular mode, it runs w/o error, but never copies the data. It's as if the cudaMemcpy lines are just ignored.
What am I doing wrong?
Thank you very much,
Jason