SPARC M7 Chip - 32 cores - Mind Blowing performance
- by Angelo-Oracle
The M7 Chip
Oracle just announced its Next Generation Processor at the HotChips HC26 conference.
As the Tech Lead in our Systems Division's Partner group, I had a front row seat to the extraordinary price performance advantage of Oracle current T5 and M6 based systems. Partner after partner tested these systems and were impressed with it performance. Just read some of the quotes to see what our partner has been saying about our hardware.
We just announced our next generation processor, the M7. This has 32 cores (up from 16-cores in T5 and 12-cores in M6). With 20 nm technology this is our most advanced processor. The processor has more cores than anything else in the industry today. After the Sun acquisition Oracle has released 5 processors in 4 years and this is the 6th.
The S4 core
The M7 is built using the foundation of the S4 core. This is the next generation core technology. Like its predecessor, the S4 has 8 dynamic threads. It increases the frequency while maintaining the Pipeline depth. Each core has its own fine grain power estimator that keeps the core within its power envelop in 250 nano-sec granularity. Each core also includes Software in Silicon features for Application Acceleration Support. Each core includes features to improve Application Data Integrity, with almost no performance loss. The core also allows using part of the Virtual Address to store meta-data. User-Level Synchronization Instructions are also part of the S4 core. Each core has 16 KB Instruction and 16 KB Data L1 cache.
The Core Clusters
The cores on the M7 chip are organized in sets of 4-core clusters. The core clusters share L2 cache. All four cores in the complex share 256 KB of 4 way set associative L2 Instruction Cache, with over 1/2 TB/s of throughput. Two cores share 256 KB of 8 way set associative L2 Data Cache, with over 1/2 TB/s of throughput. With this innovative Core Cluster architecture, the M7 doubles core execution bandwidth. to maximize per-thread performance.
The Chip
Each M7 chip has 8 sets of these core-clusters. The chip has 64 MB on-chip L3 cache. This L3 caches is shared among all the cores and is partitioned into 8 x 8 MB chunks. Each chunk is 8-way set associative cache. The aggregate bandwidth for the L3 cache on the chip is over 1.6TB/s.
Each chip has 4 DDR4 memory controllers and can support upto 16 DDR4 DIMMs, allowing for 2 TB of RAM/chip. The chip also includes 4 internal links of PCIe Gen3 I/O controllers.
Each chip has 7 coherence links, allowing for 8 of these chips to be connected together gluelessly. Also 32 of these chips can be connected in an SMP configuration. A potential system with 32 chips will have 1024 cores and 8192 threads and 64 TB of RAM.
Software in Silicon
The M7 chip has many built in Application Accelerators in Silicon. These features will be exposed to our Software partners using the SPARC Accelerator Program. The M7 has built-in logic to decompress data at the speed of memory access. This means that applications can directly work on compressed data in memory increasing the data access rates. The VA Masking feature allows the use of part of the virtual address to store meta-data.
Realtime Application Data Integrity
The Realtime Application Data Integrity feature helps applications safeguard against invalid, stale memory reference and buffer overflows. The first 4-bits if the Pointer can be used to store a version number and this version number is also maintained in the memory & cache lines. When a pointer accesses memory the hardware checks to make sure the two versions match. A SEGV signal is raised when there is a mismatch. This feature can be used by the Database, applications and the OS.
M7 Database In-Memory Query Accelerator
The M7 chip also includes a In-Silicon Query Engines. These accelerate tasks that work on In-Memory Columnar Vectors. Oracle In-Memory options stores data in Column Format. The M7 Query Engine can speed up In-Memory Format Conversion, Value and Range Comparisons and Set Membership lookups. This engine can work on Compressed data - this means not only are we accelerating the query performance but also increasing the memory bandwidth for queries.
SPARC Accelerated Program
At the Hotchips conference we also introduced the SPARC Accelerated Program to provide our partners and third part developers access to all the goodness of the M7's SPARC Application Acceleration features. Please get in touch with us if you are interested in knowing more about this program.