Memory Bandwidth Performance for Modern Machines
- by porgarmingduod
I'm designing a real-time system that occasionally has to duplicate a large amount of memory. The memory consists of reasonably large regions, so I expect the copying performance will be fairly close to the maximum bandwidth the relevant components (CPU, RAM, motherboard) can deliver. This led me to wonder: what kind of raw memory bandwidth can modern commodity machines muster?
My aging Core2Duo gives me about 1.5 GB/s if I memcpy() with one thread (and, understandably, less if I memcpy() with both cores simultaneously). While 1.5 GB/s is a fair amount of data, the real-time application I'm working on will have something like 1/50th of a second per cycle, which works out to about 30 MB. Basically, almost nothing. And perhaps worst of all, as I add cores I can process a lot more data, but the duplication step doesn't get any faster.
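For context, the 1.5 GB/s figure comes from a simple timed memcpy() loop over large buffers. A minimal sketch of that kind of test looks something like the following (the buffer size, repetition count, and clock_gettime() timing are just what I'd reach for, nothing rigorous):

```c
/* Rough single-threaded memcpy() bandwidth test.  Buffer size (256 MB) and
 * the use of clock_gettime() are arbitrary choices for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    const size_t size = 256u * 1024u * 1024u;   /* 256 MB per buffer */
    char *src = malloc(size);
    char *dst = malloc(size);
    if (!src || !dst) return 1;
    memset(src, 1, size);                       /* touch pages so they're actually mapped */
    memset(dst, 0, size);

    struct timespec t0, t1;
    const int reps = 10;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; ++i)
        memcpy(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gb   = (double)size * reps / (1024.0 * 1024.0 * 1024.0);
    printf("%.2f GB/s\n", gb / secs);

    free(src);
    free(dst);
    return 0;
}
```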
But a low-end Core2Duo isn't exactly hot stuff these days. Are there any sites with information, such as actual benchmarks, on the raw memory bandwidth of current and near-future hardware?
Furthermore, for duplicating large amounts of data in memory, are there any shortcuts, or is memcpy() about as good as it gets? (An example of what I mean by a shortcut is sketched below.)
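To make that concrete: one thing I've seen suggested, but haven't benchmarked myself, is copying with SSE2 non-temporal stores so the destination doesn't churn through the cache. A sketch of that idea (assuming 16-byte-aligned buffers and a size that's a multiple of 16):

```c
/* Copy using SSE2 non-temporal stores, which write around the cache on the
 * destination side.  Assumes dst and src are 16-byte aligned and bytes is a
 * multiple of 16; whether this beats plain memcpy() is exactly the question. */
#include <emmintrin.h>
#include <stddef.h>

void copy_nontemporal(void *dst, const void *src, size_t bytes)
{
    __m128i       *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; ++i)
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
    _mm_sfence();   /* make the streaming stores globally visible */
}
```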
Given a bunch of cores with nothing to do but duplicate as much memory as possible in a short amount of time, what's the best I can do?
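By "a bunch of cores with nothing to do", I mean something as naive as splitting the region into one chunk per core and letting each thread memcpy() its piece, along these lines (thread count and chunking are placeholders, not a tuned design):

```c
/* Naive multi-core duplication: split the region into contiguous chunks and
 * memcpy() each chunk on its own thread.  NUM_THREADS is a placeholder. */
#include <pthread.h>
#include <string.h>
#include <stddef.h>

#define NUM_THREADS 4   /* placeholder: one per core */

struct chunk { char *dst; const char *src; size_t len; };

static void *copy_chunk(void *arg)
{
    struct chunk *c = arg;
    memcpy(c->dst, c->src, c->len);
    return NULL;
}

void parallel_copy(char *dst, const char *src, size_t bytes)
{
    pthread_t    threads[NUM_THREADS];
    struct chunk chunks[NUM_THREADS];
    size_t per = bytes / NUM_THREADS;

    for (int i = 0; i < NUM_THREADS; ++i) {
        chunks[i].dst = dst + (size_t)i * per;
        chunks[i].src = src + (size_t)i * per;
        /* last thread also takes any remainder */
        chunks[i].len = (i == NUM_THREADS - 1) ? bytes - (size_t)i * per : per;
        pthread_create(&threads[i], NULL, copy_chunk, &chunks[i]);
    }
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
}
```

My suspicion is that the threads just end up fighting over the same memory bus, which is really what I'm asking about.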