I wrote a traditional matrix multiplication in C++ and tried to measure its practical FLOP count and compare it with the theoretical one. As far as I know, the inner loop of the multiplication performs 2 operations (one multiply and one add), so the theoretical count for a simple multiplication is 2*n*n*n (2n^3). In practice, however, I get something like 4n^3 plus the 2 operations per iteration, i.e. 6n^3. Likewise, if I only increment one array element in the inner loop (a[i][j]++), the measured count is about 3n^3 and not n^3: again it is 2n^3 plus 1 operation per iteration, and not 1 operation * n^3.

However, when I use 1D arrays in three nested loops, as in the matrix multiplication, and compare the counts, the practical FLOP count is the same as (or near) the theoretical one and depends exactly on the number of operations in the inner loop. I could not find the reason for this behaviour. What is the reason in both cases?
I know that the theoretical FLOP count is not the same as the practical one because of other operations such as loads.
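To make the theoretical 2n^3 figure concrete, here is a minimal sketch of a naive multiplication that tallies its own floating-point operations. The function name, the use of std::vector and the software counter are assumptions for illustration; this is not the code that produced the measurements in this post.

```cpp
#include <cstddef>
#include <vector>

// Naive n x n matrix multiplication c = a * b on 2D data.
// Returns the number of floating-point operations performed,
// counting one multiply and one add per inner-loop iteration.
// (Name and counter are illustrative, not from the original code.)
long long multiply_and_count(const std::vector<std::vector<float>>& a,
                             const std::vector<std::vector<float>>& b,
                             std::vector<std::vector<float>>& c) {
    const std::size_t n = a.size();
    long long flop = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i][k] * b[k][j];  // 1 multiply + 1 add
                flop += 2;
            }
            c[i][j] = sum;
        }
    }
    return flop;
}
```

For n = 512 this software count is exactly 2*512^3, so any extra operations reported by a hardware counter are not floating-point operations written in the source.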
system specification:
Intel Core 2 Duo E4500, 3700g memory, 2M L2 cache, x64, Fedora 17
sample results:
Matrix matrix multiplication 512*512
Real_time: 1.718368 Proc_time: 1.227672 Total flpops: 807,107,072 MFLOPS: 657.429016
Real_time: 3.608078 Proc_time: 3.042272 Total flpops: 807,024,448 MFLOPS: 265.270355
theoretical flop: 2*512*512*512=268,435,456
practical flop: about 6*512^3 = 805,306,368 (measured: 807,107,072)
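The comparison between the measured and theoretical counts can be written as a small helper. This is only a sketch of the arithmetic; the function name is hypothetical.

```cpp
// Ratio of a measured FLOP count to the theoretical 2*n^3 of a naive
// n x n matrix multiplication. (Helper name is illustrative.)
double measured_to_theoretical(long long measured, long long n) {
    const long long theoretical = 2 * n * n * n;
    return static_cast<double>(measured) / static_cast<double>(theoretical);
}
```

Calling measured_to_theoretical(807107072, 512) gives a value slightly above 3, which is where the 6n^3 estimate comes from.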
Using 1-dimensional arrays (float d[size], with size = 512 or any other size), 2 loop version:
    for (int j = 0; j < size; ++j) {
        for (int k = 0; k < size; ++k) {
            // 4 floating-point additions per iteration
            d[k] = d[k] + e[k] + f[k] + g[k] + r;
        }
    }
Real_time: 0.002288 Proc_time: 0.002260 Total flpops: 1,048,578 MFLOPS: 464.027161
theoretical flop: 4n^2 = 4*512^2 = 1,048,576
practical flop: 4n^2 + overhead (other operations?) = 1,048,578
3 loop version:
Real_time: 1.282257 Proc_time: 1.155990 Total flpops: 536,872,000 MFLOPS: 464.426117
theoretical flop:4n^3 = 536,870,912
practical flop: 4n^3 + overhead (other operations?) = 4*512^3 + overhead = 536,872,000
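For reference, a minimal sketch of what the 3 loop version of the 1D test could look like, with a software counter for the 4 additions per iteration. The extra outer loop, the function name and the use of std::vector are assumptions; the exact 3 loop code is not shown above.

```cpp
#include <cstddef>
#include <vector>

// Assumed shape of the 3 loop 1D test: the same 4-addition statement,
// executed size * size * size times. Returns the counted additions.
// (Name, signature and counter are illustrative.)
long long run_1d_three_loops(std::vector<float>& d,
                             const std::vector<float>& e,
                             const std::vector<float>& f,
                             const std::vector<float>& g,
                             float r) {
    const std::size_t size = d.size();
    long long flop = 0;
    for (std::size_t i = 0; i < size; ++i) {
        for (std::size_t j = 0; j < size; ++j) {
            for (std::size_t k = 0; k < size; ++k) {
                d[k] = d[k] + e[k] + f[k] + g[k] + r;  // 4 additions
                flop += 4;
            }
        }
    }
    return flop;
}
```

With size = 512 the counter returns 4*512^3 = 536,870,912, which matches the theoretical figure above.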
thank you