
Thoughts on Magny-Cours

Monday, September 21st, 2009

AMD just announced their Magny-Cours[1] 12-core processor.

The second-most interesting thing here is AMD’s retreat to MCM-based (multi-chip module) designs, as Jon Stokes describes. Intel made their first “multicore” chips by gluing two normal chips together; AMD made fun of them for not actually “designing” anything. But right now, AMD is behind in the processor wars, and to catch up they’re taking the time-saving and cost-saving measure of … gluing two normal chips together. In the fullness of time all statements eventually become ironic.

The most interesting thing here is the directory-based cache coherence scheme. Quoting Jon Stokes again:

The solution that AMD has adopted with Istanbul and Magny-Cours involves setting aside 1MB of each chip’s 6MB cache to store a directory of the contents of the other chips’ caches, so that by consulting this local directory each chip can avoid broadcasting a significant number of traffic-increasing snoop requests to the other chips.

Directory-based coherence is nothing new, but I think it’s new to commodity workstation- and server-class processors.


Starting with this column, I’m going to explain what I’m talking about. I should aim these columns toward an audience less specialized in architecture than I am, rather than assume the audience knows more than I do. I hope this will make my columns more accessible, and also clarify my own knowledge of architecture. “To explain is to understand.”

Cache coherence is too big a topic for me to cover right now, so I’ll glide over it, and caches as well. There are two main schemes for cache coherence: snoopy and directory.

Snoopy coherence (or snoopy-bus coherence) has all processors connected to a shared bus to memory. Whenever a processor reads from or writes to main memory, the other processors “snoop” on the bus traffic to make sure their own caches are up to date.
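Here is a minimal sketch of the snoopy idea in Python. It is a toy model, not how any real chip works: the names `SnoopyBus` and `SnoopyCache` are made up for illustration, the caches are write-through to keep it short, and the only coherence action is "invalidate my copy when I see someone else write".

```python
class SnoopyBus:
    """Toy shared bus: every transaction is seen by every other cache."""

    def __init__(self):
        self.caches = []
        self.memory = {}   # addr -> value (main memory)
        self.snoops = 0    # how many times some cache had to look at the bus

    def broadcast(self, sender, kind, addr):
        for cache in self.caches:
            if cache is not sender:
                self.snoops += 1
                cache.snoop(kind, addr)


class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # addr -> value, valid copies only
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:              # miss: go out on the bus
            self.bus.broadcast(self, "read", addr)
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast(self, "write", addr)  # everyone hears about it
        self.lines[addr] = value
        self.bus.memory[addr] = value            # write-through (assumption)

    def snoop(self, kind, addr):
        if kind == "write":                      # someone else changed the line
            self.lines.pop(addr, None)           # drop our stale copy


bus = SnoopyBus()
caches = [SnoopyCache(bus) for _ in range(4)]
caches[0].write(0x40, 7)
print(caches[1].read(0x40), bus.snoops)
```

The thing to notice is `bus.snoops`: every miss and every write costs one snoop per other cache, whether or not that cache holds the line.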

In directory coherence, there is a single logical directory (physically centralized or distributed) that tracks the location and status of each line of memory. When a processor wants to read or write memory, it consults the directory first, then sends a (usually) targeted message to any processors that already have that line.
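A comparable toy sketch of the directory idea, again in Python. The names `Directory` and `DirCache` are hypothetical, the directory here is a single centralized object, and it tracks only a set of sharers per line (real protocols keep more state). This is not AMD's implementation, just the general shape of the scheme.

```python
class Directory:
    """Toy centralized directory: remembers which caches hold each line."""

    def __init__(self):
        self.caches = {}        # cache id -> cache object
        self.sharers = {}       # addr -> set of cache ids with a valid copy
        self.memory = {}        # addr -> value (main memory)
        self.lookups = 0        # requests that consulted the directory
        self.invalidations = 0  # targeted point-to-point invalidate messages

    def read(self, cid, addr):
        self.lookups += 1
        self.sharers.setdefault(addr, set()).add(cid)
        return self.memory.get(addr, 0)

    def write(self, cid, addr, value):
        self.lookups += 1
        # Only caches that actually hold the line get a message.
        for other in self.sharers.get(addr, set()) - {cid}:
            self.invalidations += 1
            self.caches[other].invalidate(addr)
        self.sharers[addr] = {cid}
        self.memory[addr] = value   # write-through (assumption, for brevity)


class DirCache:
    def __init__(self, cid, directory):
        self.cid = cid
        self.directory = directory
        self.lines = {}             # addr -> value, valid copies only
        directory.caches[cid] = self

    def read(self, addr):
        if addr not in self.lines:  # miss: ask the directory
            self.lines[addr] = self.directory.read(self.cid, addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.directory.write(self.cid, addr, value)
        self.lines[addr] = value

    def invalidate(self, addr):
        self.lines.pop(addr, None)


directory = Directory()
caches = [DirCache(i, directory) for i in range(4)]
caches[0].write(0x40, 7)
caches[1].read(0x40)
caches[0].write(0x40, 8)            # only cache 1 is told to invalidate
print(directory.invalidations)
```

Caches 2 and 3 never touch the line, so they never hear about it. That is the whole point of the directory.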

Directory coherence involves more messages than a snoopy bus does, but almost all of them are targeted point-to-point, rather than broadcast. In these days of growing wire delay, broadcast buses are a bad idea (just ask Michael Taylor). From Jon Stokes’s article, it sounds like AMD was also concerned about the limited off-chip bandwidth. Broadcasting between sockets must be several times slower than broadcasting between cores.
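To put rough numbers on "more messages, but targeted", here is a back-of-the-envelope count for a single write. The accounting is my own simplification, not taken from any real protocol: a snoopy write is one broadcast that every other cache must listen to, while a directory write is one request, one invalidate plus one acknowledgement per sharer, and one grant back to the writer.

```python
def snoopy_cost(num_caches):
    """One write on a broadcast bus: (messages sent, endpoints that must listen)."""
    return 1, num_caches - 1


def directory_cost(num_sharers):
    """One write via a directory: (messages sent, endpoints that must listen).

    Request + invalidations + acks + grant. Every message is point-to-point,
    so each one is delivered to exactly one endpoint.
    """
    messages = 1 + num_sharers + num_sharers + 1
    return messages, messages


for caches, sharers in [(4, 1), (12, 1), (12, 3)]:
    print(caches, sharers, snoopy_cost(caches), directory_cost(sharers))
```

Under these assumptions the directory always sends more messages than the bus, but the number of endpoints it disturbs depends on how many caches share the line, not on how many caches exist.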

  1. My high school French tells me that’s pronounced very much like “many core”. Nice branding, AMD.