trouble calculating offset index into 3D array
- by Derek
Hello,
I am writing a CUDA kernel to create a 3x3 covariance matrix for each location in the rows*cols main matrix. The resulting 3D matrix is rows*cols*9 in size, which I allocated with a single malloc, so I need to access it with a single flat index.
The 9 values of each 3x3 covariance matrix are set from the corresponding row r and column c of some other 2D arrays.
In other words, I need to calculate the index of each of the 9 elements of the 3x3 covariance matrix in the storage array, as well as the row and column offset into the 2D matrices that supply the input values.
I have tried to simplify it down to the following:
// I am calling this kernel with 1D blocks that are 512 columns x 1 row. TILE_WIDTH = 512
int bx = blockIdx.x;
int by = blockIdx.y;
int tx = threadIdx.x;
int ty = threadIdx.y;
int r = by + ty;
int c = bx*TILE_WIDTH + tx;
int offset = r*cols+c;
int ndx = r*cols*rows + c*cols;
if ((r < rows) && (c < cols)) { // this check is meant to skip threads that fall outside my original array; not sure if it is correct
d_cov[ndx + 0] = otherArray[offset];
d_cov[ndx + 1] = otherArray[offset];
d_cov[ndx + 2] = otherArray[offset];
d_cov[ndx + 3] = otherArray[offset];
d_cov[ndx + 4] = otherArray[offset];
d_cov[ndx + 5] = otherArray[offset];
d_cov[ndx + 6] = otherArray[offset];
d_cov[ndx + 7] = otherArray[offset];
d_cov[ndx + 8] = otherArray[offset];
}
When I check this array against the values calculated on the CPU, which loops over i = rows, j = cols, k = 1..9, the results do not match up.
In other words, d_cov[i*rows*cols + j*cols + k] != correctAnswer[i][j][k].
Can anyone give me any tips on how to solve this problem? Is it an indexing problem, or some other logic error?
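For reference, a flat array holding 9 contiguous values per (r, c) location is usually indexed as (r*cols + c)*9 + k. The sketch below is a minimal host-side illustration under that assumption (row-major rows x cols grid, the 9 values of each location stored together, k running 0..8). covIndex and postedIndex are hypothetical helper names, not part of the kernel above; postedIndex just reproduces the expression from the post so the two can be compared.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original kernel): flat index of element
// k (0..8) of the 3x3 matrix stored for location (r, c).
// Assumes a row-major rows x cols grid with 9 contiguous values per location.
// Under nvcc it could be marked __host__ __device__ so a kernel can call it.
inline std::size_t covIndex(std::size_t r, std::size_t c, std::size_t k,
                            std::size_t cols) {
    assert(k < 9);                  // only 9 values per location
    return (r * cols + c) * 9 + k;  // location offset times 9, plus element
}

// The index expression as written in the post, kept only for comparison.
inline std::size_t postedIndex(std::size_t r, std::size_t c, std::size_t k,
                               std::size_t rows, std::size_t cols) {
    return r * cols * rows + c * cols + k;
}
```

With rows = 3 and cols = 4, covIndex(2, 3, 8, 4) gives 107, the last slot of the 108-element array. The posted expression maps both (r=0, c=1, k=0) and (r=0, c=0, k=4) to index 4, so different locations overwrite each other, and for larger row counts r*cols*rows can exceed rows*cols*9 entirely.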