I've heard conflicting information from different sources, and I'm not really sure which one to believe. As such, I'll post what I understand and ask for corrections. Let's say I want to use a 2D matrix. There are three ways that I can do this (at least that I know of).
1:
int i;
char **matrix;
matrix = malloc(50 * sizeof(char *));
for(i = 0; i < 50; i++)
matrix[i] = malloc(50);
2:
int i;
int rowSize = 50;
int pointerSize = 50 * sizeof(char *);
int dataSize = 50 * 50;
char **matrix;
matrix = malloc(dataSize + pointerSize);
char *pData = matrix + pointerSize - rowSize;
for(i = 0; i < 50; i++) {
pData += rowSize;
matrix[i] = pData;
}
3:
//instead of accessing matrix[i][j] here, we would access matrix[i * 50 + j]
char *matrix = malloc(50 * 50);
In terms of memory usage, my understanding is that 3 is the most efficient, 2 is next, and 1 is least efficient, for the reasons below:
3: There is only one pointer and one allocation, and therefore, minimal overhead.
2: Once again, there is only one allocation, but there are now 51 pointers. This means there is 50 * sizeof(char *) more overhead.
1: There are 51 allocations and 51 pointers, causing the most overhead of all options.
In terms of performance, once again my understanding is that 3 is the most efficient, 2 is next, and 1 is least efficient. Reasons being:
3: Only one memory access is needed. We will have to do a multiplication and an addition as opposed to two additions (as in the case of a pointer to a pointer), but memory access is slow enough that this doesn't matter.
2: We need two memory accesses; once to get a char *, and then to the appropriate char. Only two additions are performed here (once to get to the correct char * pointer from the original memory location, and once to get to the correct char variable from wherever the char * points to), so multiplication (which is slower than addition) is not required. However, on modern CPUs, multiplication is faster than memory access, so this point is moot.
1: Same issues as 2, but now the memory isn't contiguous. This causes cache misses and extra page table lookups, making it the least efficient of the lot.
First and foremost: Is this correct? Second: Is there an option 4 that I am missing that would be even more efficient?