AMD just announced their Magny-Cours[1] 12-core processor.
The second-most interesting thing here is AMD’s retreat to multi-chip module (MCM) designs, as Jon Stokes describes. Intel made its first “multicore” chips by gluing two normal chips together, and AMD made fun of them for not actually “designing” anything. But right now AMD is behind in the processor wars, and to catch up it is taking the time-saving and cost-saving measure of … gluing two normal chips together. In the fullness of time, all statements become ironic.
The most interesting thing here is the directory-based cache coherence scheme. Quoting Jon Stokes again:
The solution that AMD has adopted with Istanbul and Magny-Cours involves setting aside 1MB of each chip’s 6MB cache to store a directory of the contents of the other chips’ caches, so that by consulting this local directory each chip can avoid broadcasting a significant number of traffic-increasing snoop requests to the other chips.
Directory-based coherence is nothing new, but I think it’s new to commodity workstation- and server-class processors.
Starting with this column, I’m going to explain what I’m talking about. I should aim these columns at an audience less specialized in architecture than I am, rather than assume readers know more than I do. I hope this will make my columns more accessible and also clarify my own knowledge of architecture. “To explain is to understand.”
Cache coherence is too big a topic for me to cover right now, so I’ll glide over it, and caches as well. There are two main schemes for cache coherence: snoopy and directory.
Snoopy coherence (or snoopy-bus) has all processors connected to a shared bus to memory. Whenever a processor reads from or writes to main memory, the other processors “snoop” on the bus traffic to make sure their own caches are up to date.
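To make the snooping idea concrete, here is a toy sketch in Python. It assumes a write-through, write-invalidate scheme, which is only one of several snoopy variants, and the names `Bus`, `SnoopyCache`, and `snoops_delivered` are made up for illustration. The point to notice is that every write is seen by every other cache, whether or not that cache holds the line.

```python
class Bus:
    """Shared bus in front of main memory (toy model)."""
    def __init__(self):
        self.memory = {}           # address -> value
        self.caches = []           # every cache attached to the bus
        self.snoops_delivered = 0  # how many caches had to look at a write

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value  # assumption: write-through to memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop(addr)  # every other cache sees the write
                self.snoops_delivered += 1


class SnoopyCache:
    """One processor's private cache."""
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.lines = {}            # address -> cached value
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_write(self, addr, value)
        self.lines[addr] = value

    def snoop(self, addr):
        # Another processor wrote this address: drop any stale copy.
        self.lines.pop(addr, None)


bus = Bus()
a, b, c = (SnoopyCache(n, bus) for n in "abc")
a.read(0x40)                                # a caches the line
b.write(0x40, 7)                            # a and c both snoop the write
print(a.read(0x40), bus.snoops_delivered)   # prints: 7 2
```

Cache `c` never touched address `0x40`, yet it still had to listen to the write. That wasted attention is the cost of broadcasting.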
In directory coherence, there is a single directory (centralized or distributed) that tracks the location and status of each line of memory. When a processor wants to read or write memory, it consults the directory first, then sends a (usually) targeted message to any processors that already have that line.
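A matching toy sketch of the directory approach, again in Python. It assumes a single centralized directory that also fronts memory, write-through, and invalidate-on-write; the names `Directory`, `DirCache`, and `messages_sent` are hypothetical, and this is not a model of AMD's actual protocol. The difference to notice is that a write only contacts the caches the directory lists as sharers.

```python
class Directory:
    """Tracks, per address, which caches hold a copy (toy model)."""
    def __init__(self):
        self.memory = {}        # address -> value
        self.sharers = {}       # address -> set of caches holding the line
        self.messages_sent = 0  # targeted invalidations sent so far

    def read(self, requester, addr):
        self.sharers.setdefault(addr, set()).add(requester)
        return self.memory.get(addr, 0)

    def write(self, requester, addr, value):
        # Only caches recorded as sharers get a message.
        for cache in self.sharers.get(addr, set()) - {requester}:
            cache.invalidate(addr)
            self.messages_sent += 1
        self.sharers[addr] = {requester}
        self.memory[addr] = value  # assumption: write-through to memory


class DirCache:
    """One processor's private cache, coordinated through the directory."""
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.lines = {}            # address -> cached value

    def read(self, addr):
        if addr not in self.lines:  # miss: ask the directory
            self.lines[addr] = self.directory.read(self, addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.directory.write(self, addr, value)
        self.lines[addr] = value

    def invalidate(self, addr):
        self.lines.pop(addr, None)


directory = Directory()
a, b, c, d = (DirCache(n, directory) for n in "abcd")
a.read(0x40)                                    # directory records a as a sharer
b.write(0x40, 7)                                # only a is told to invalidate
print(a.read(0x40), directory.messages_sent)    # prints: 7 1
```

With four caches attached, the write from `b` produced one invalidation rather than three, because `c` and `d` were never sharers.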
Directory coherence involves more messages than a snoopy bus, but almost all of them are targeted point-to-point messages rather than broadcasts. In these days of growing wire delay, broadcast buses are a bad idea (just ask Michael Taylor). From Jon Stokes’s article, it sounds like AMD was also concerned about the limited off-chip bandwidth. Broadcasting between sockets must be several times slower than broadcasting between cores.
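A back-of-the-envelope count shows the trade-off. The formulas below are my own simplification, not figures from AMD or from Jon Stokes's article: I assume a snoopy write is one bus transaction that every other cache must observe, and a directory write costs a request and a reply plus one invalidation and one acknowledgement per sharer. Real protocols differ in the details.

```python
def snoopy_write_cost(num_caches):
    """One broadcast, observed by every other cache on the bus."""
    return {"messages": 1, "deliveries": num_caches - 1}


def directory_write_cost(num_sharers):
    """Request + reply to the directory, plus invalidate + ack per sharer.

    Each message is point-to-point, so it is delivered exactly once.
    """
    messages = 2 + 2 * num_sharers
    return {"messages": messages, "deliveries": messages}


# Twelve cores, one other cache sharing the line being written:
print(snoopy_write_cost(12))     # prints: {'messages': 1, 'deliveries': 11}
print(directory_write_cost(1))   # prints: {'messages': 4, 'deliveries': 4}
```

Under these assumptions the directory sends more messages (4 versus 1), but far fewer caches are disturbed (4 deliveries versus 11), and the gap widens as the core count grows while the number of sharers stays small.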
[1] My high school French tells me that “Magny-Cours” is pronounced very much like “many core”. Nice branding, AMD.