theoretical and practical matrix multiplication FLOP

Posted by mjr on Stack Overflow
Published on 2012-12-16T10:49:09Z Indexed on 2012/12/16 11:04 UTC

I wrote a traditional matrix multiplication in C++ and tried to measure its practical flop count and compare it with the theoretical one. As far as I know, the inner loop of MM has 2 operations, so the theoretical flop count of simple MM is 2*n*n*n (2n^3). In practice, however, I get something like 4n^3 plus n^3 times the number of operations in the inner loop (which is 2), i.e. 6n^3. Likewise, if I only increment one array element, a[i][j]++, the practical flop count comes out as 3n^3 and not n^3: again it is 2n^3 + 1 operation * n^3, and not just 1 operation * n^3. By contrast, if I use 1D arrays in three nested loops, as in matrix multiplication, the practical flop count is the same as (or near) the theoretical one and depends exactly on the number of operations in the inner loop. I could not find the reason for this behaviour. What is the reason in both cases?
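For reference, the 2n^3 figure can be reproduced by counting operations by hand. The sketch below is not the code from the post (which is not shown); `matmul_count_flops` is a hypothetical helper that assumes the usual triple loop over `std::vector`-based 2D matrices and counts one multiply and one add per inner iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): multiplies two n x n
// matrices of ones with the classic triple loop and counts the
// floating-point operations by hand.
std::uint64_t matmul_count_flops(std::size_t n) {
    std::vector<std::vector<float>> a(n, std::vector<float>(n, 1.0f));
    std::vector<std::vector<float>> b(n, std::vector<float>(n, 1.0f));
    std::vector<std::vector<float>> c(n, std::vector<float>(n, 0.0f));
    std::uint64_t flops = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t k = 0; k < n; ++k) {
                c[i][j] += a[i][k] * b[k][j];  // one multiply + one add
                flops += 2;
            }
        }
    }
    // Sanity check: every entry of (ones * ones) equals n.
    assert(n == 0 || c[0][0] == static_cast<float>(n));
    return flops;
}
```

With n = 512 this hand count is 2*512^3 = 268,435,456, which is the theoretical value the measured counter readings are compared against.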

I know that the theoretical flop count is not the same as the practical one because of other operations such as loads.

System specification: Intel Core 2 Duo E4500, 3700g memory, L2 cache 2M, x64 Fedora 17

sample results:

Matrix-matrix multiplication, 512*512:

Real_time: 1.718368 Proc_time: 1.227672 Total flpops: 807,107,072 MFLOPS: 657.429016

Real_time: 3.608078 Proc_time: 3.042272 Total flpops: 807,024,448 MFLOPS: 265.270355

Theoretical flop: 2*512*512*512 = 268,435,456

Practical flop: roughly 6*512^3 (6*512^3 = 805,306,368; measured 807,107,072)
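To make the "about 6 per iteration" reading concrete, the measured totals can be divided by the number of innermost iterations. The helper below is only an illustrative sketch; `ops_per_iteration` is a hypothetical name, and it assumes every loop level runs exactly n times.

```cpp
#include <cstdint>

// Hypothetical helper: average number of counted operations per innermost
// iteration, assuming `depth` nested loops that each run n times.
double ops_per_iteration(std::uint64_t counted, std::uint64_t n, int depth) {
    double iterations = 1.0;
    for (int level = 0; level < depth; ++level) {
        iterations *= static_cast<double>(n);
    }
    return static_cast<double>(counted) / iterations;
}
```

For the first run above, `ops_per_iteration(807107072, 512, 3)` comes out at about 6.01, against 2 for the theoretical count of 268,435,456.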

Using 1-dimensional arrays, float d[size][size], size: 512 or any size

for (int j = 0; j < size; ++j) {
    for (int k = 0; k < size; ++k) {
        d[k]=d[k]+e[k]+f[k]+g[k]+r;
    }
}

Real_time: 0.002288 Proc_time: 0.002260 Total flpops: 1,048,578 MFLOPS: 464.027161

Theoretical flop: 4n^2 = 4*512^2 = 1,048,576

Practical flop: 4n^2 + overhead (other operations?) = 1,048,578
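The 4n^2 figure follows from four additions per inner iteration. As a rough sketch under stated assumptions, the hypothetical helper `count_1d_flops` below repeats the same statement on `std::vector<float>` arrays and counts the additions by hand. The outer loop levels are flattened into a single repetition loop, which is a simplification and not the loop nest from the post.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: runs the 1D update inside a loop nest of the given
// depth (each level running `size` times) and counts four additions per
// innermost iteration.
std::uint64_t count_1d_flops(std::size_t size, int depth) {
    std::vector<float> d(size, 0.0f), e(size, 1.0f), f(size, 1.0f), g(size, 1.0f);
    const float r = 1.0f;
    // All loop levels above the innermost one, collapsed into one count.
    std::uint64_t outer = 1;
    for (int level = 1; level < depth; ++level) {
        outer *= size;
    }
    std::uint64_t flops = 0;
    for (std::uint64_t rep = 0; rep < outer; ++rep) {
        for (std::size_t k = 0; k < size; ++k) {
            d[k] = d[k] + e[k] + f[k] + g[k] + r;  // four additions
            flops += 4;
        }
    }
    return flops;
}
```

With size = 512, a depth of 2 gives 1,048,576 and a depth of 3 gives 536,870,912, so in the 1D case the hand count and the measured count differ only by a small overhead.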

3 loop version:

Real_time: 1.282257 Proc_time: 1.155990 Total flpops: 536,872,000 MFLOPS: 464.426117

Theoretical flop: 4n^3 = 4*512^3 = 536,870,912

Practical flop: 4n^3 + overhead (other operations?) = 536,872,000

Thank you.

© Stack Overflow or respective owner
