The best way to predict performance without actually porting the code?
- by ardiyu07
I believe there are people who have had the same experience as
me: having to give an (estimated) performance report for
porting a program from sequential to parallel on some
designated multicore hardware, with very little time
given.
For instance, suppose a 10K LoC sequential program executes
on an Intel i7-3770K (not vectorized) in 100 ms. How long
would it take to run if one ports the code to a Tesla C2075
with NVIDIA CUDA, given that all kinds of parallelizing
optimization techniques were applied? (But you're only given
2-4 days to report the performance, and assume that you
don't know the algorithm at all. Or perhaps it'd be safer to
just assume that it's impossible to finish the port in that
time.)
Therefore, I'm wondering: what would most likely be the fastest
way to produce such a performance report? Is it safe to calculate
solely from the hardware's capabilities, such as peak GFLOPS and
memory bandwidth? Is there a mathematical way to
calculate it? If there is, please demonstrate your method with
the corresponding problem description and algorithm, and
also the target hardware's specifications.
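
To make the "calculate from peak GFLOPS and memory bandwidth" idea
concrete, here is a minimal sketch of the kind of bound I have in
mind (a roofline-style estimate). Everything in it is an assumption
for illustration: the function name roofline_time_s is made up, the
FLOP and byte counts are hypothetical numbers one would have to
measure or count for the real program, and the device figures are
placeholders to be replaced with the values from the vendor's data
sheet.

```python
def roofline_time_s(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Best-case runtime: the slower of pure compute and pure memory traffic."""
    compute_s = flops / peak_flops_per_s        # time if only arithmetic mattered
    memory_s = bytes_moved / peak_bytes_per_s   # time if only DRAM traffic mattered
    return max(compute_s, memory_s)


# Hypothetical workload counts (assumed, not measured from any real program).
work_flops = 2e9   # floating-point operations per run
work_bytes = 1e9   # bytes read from and written to device memory per run

# Placeholder device figures (assumed; substitute the data sheet values).
gpu_peak_flops = 1.0e12   # FLOP/s
gpu_peak_bw = 1.0e11      # bytes/s

bound_s = roofline_time_s(work_flops, work_bytes, gpu_peak_flops, gpu_peak_bw)

# Whichever term is larger tells you what limits the kernel.
if work_bytes / gpu_peak_bw > work_flops / gpu_peak_flops:
    limiter = "memory"
else:
    limiter = "compute"

print(f"best-case time: {bound_s * 1e3:.1f} ms ({limiter}-bound)")
```

Such a number would only be an optimistic lower bound on the run
time: it ignores host-to-device transfers, latency, divergence, and
any part of the program that cannot be parallelized, which is
exactly why I doubt it is safe on its own.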
Or perhaps a tool already exists that can (roughly)
estimate the performance of ported code?
(Please don't answer: 'killing yourself is the fastest way.')