<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Joe Auricchio &#187; Architecture</title>
	<atom:link href="http://joe.definitelynotsafe.com/category/architecture/feed" rel="self" type="application/rss+xml" />
	<link>http://joe.definitelynotsafe.com</link>
	<description>Missing the point since 1986</description>
	<lastBuildDate>Fri, 09 Dec 2011 05:54:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Thoughts on Magny-Cours</title>
		<link>http://joe.definitelynotsafe.com/thoughts-on-magny-cours</link>
		<comments>http://joe.definitelynotsafe.com/thoughts-on-magny-cours#comments</comments>
		<pubDate>Mon, 21 Sep 2009 23:15:23 +0000</pubDate>
		<dc:creator>Joe Auricchio</dc:creator>
				<category><![CDATA[Architecture]]></category>

		<guid isPermaLink="false">http://joe.definitelynotsafe.com/?p=370</guid>
		<description><![CDATA[AMD just announced their new top-of-the-line 12-core processor. It seems to use directory cache coherence, rather than the snoopy bus traditional in workstation- and server-class processors.]]></description>
			<content:encoded><![CDATA[<p>AMD just <a href="http://arstechnica.com/hardware/news/2009/09/amd-makes-tradeoffs-in-upcoming-12-core-server-cpu.ars">announced</a> their Magny-Cours<sup class='footnote'><a href='#fn-370-1' id='fnref-370-1'>1</a></sup> 12-core processor.</p>
<p>The second-most interesting thing here is AMD&#8217;s retreat to MCM-based designs, as Jon Stokes describes. Intel made their first &#8220;multicore&#8221; chips by gluing two normal chips together; AMD made fun of them for not actually &#8220;designing&#8221; anything. But right now, AMD is behind in the processor wars, and to catch up they&#8217;re taking the time-saving and cost-saving measure of &#8230; gluing two normal chips together. In the fullness of time all statements eventually become ironic.</p>
<p>The most interesting thing here is the directory-based cache coherence scheme. Quoting Jon Stokes again:</p>
<blockquote><p>The solution that AMD has adopted with Istanbul and Magny-Cours involves setting aside 1MB of each chip&#8217;s 6MB cache to store a directory of the contents of the other chips&#8217; caches, so that by consulting this local directory each chip can avoid broadcasting a significant number of traffic-increasing snoop requests to the other chips.</p></blockquote>
<p>Directory-based coherence is <a href="http://portal.acm.org/citation.cfm?id=325132">nothing</a> <a href="http://portal.acm.org/citation.cfm?doid=633625.52432">new</a>, but I think it&#8217;s new to commodity workstation- and server-class processors.</p>
<hr />
<p>Starting with this column, I&#8217;m going to explain what I&#8217;m talking about. I should aim these columns toward an audience <em>less</em> specialized than me in architecture, rather than assume the audience knows <em>more</em> than me. I hope this will make my columns more accessible, and also clarify my knowledge of architecture. &#8220;To explain is to understand&#8221;.</p>
<p>Cache coherence is too big a topic for me to cover right now, so I&#8217;ll glide over it, and caches as well. There are two main schemes for cache coherence: snoopy and directory.</p>
<p>Snoopy coherence (or snoopy-bus, <a href="http://www.schulzmuseum.org/images/snoopy-bus.jpg">pic unrelated</a>) has all processors connected to a shared bus to memory. Whenever a processor reads from or writes to main memory, the other processors &#8220;snoop&#8221; the bus traffic to make sure their own caches are up to date.</p>
<p>In directory coherence, there is a single directory (centralized or distributed) that tracks the location and status of each line of memory. When a processor wants to read or write memory, it consults the directory first, then sends a (usually) targeted message to any processors that already have that line.</p>
<p>Directory coherence involves more messages than snoopy bus, but almost all of them are targeted point-to-point, rather than broadcast. In these days of growing wire delay, broadcast buses are a bad idea (just ask <a href="http://groups.csail.mit.edu/cag/raw/">Michael Taylor</a>). From Jon Stokes&#8217;s article, it sounds like AMD was also concerned about the limited off-chip bandwidth. Broadcasting between sockets must be several times slower than broadcasting between cores.</p>
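<p>To make the &#8220;more messages, but targeted&#8221; trade-off concrete, here&#8217;s a toy sketch in Python. It is not AMD&#8217;s protocol: the <code>Directory</code> class, the per-operation message counts, and the 12-processor example are all illustrative assumptions, and real protocols track more states than &#8220;who has a copy&#8221;.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Toy directory: maps each memory line to the set of caches holding a copy.

    Hypothetical model for illustration only; message counts are assumptions.
    """
    sharers: dict = field(default_factory=dict)

    def read(self, cpu, line):
        """Add cpu as a sharer of line; return the number of messages sent."""
        self.sharers.setdefault(line, set()).add(cpu)
        return 2  # one request to the directory, one data reply

    def write(self, cpu, line):
        """Invalidate every other sharer point-to-point, then make cpu the only holder."""
        others = self.sharers.get(line, set()) - {cpu}
        self.sharers[line] = {cpu}
        # request + grant, plus one invalidate and one ack per other sharer
        return 2 + 2 * len(others)


def snoopy_deliveries(num_cpus):
    """On a snoopy bus, each broadcast is observed by every other processor."""
    return num_cpus - 1


def compare(num_cpus=12, line=0x40):
    """Two reads then one write to the same line.

    Returns (directory_messages, bus_broadcasts, snoop_deliveries).
    """
    d = Directory()
    directory_messages = d.read(0, line) + d.read(1, line) + d.write(2, line)
    bus_broadcasts = 3  # one broadcast per memory operation
    return directory_messages, bus_broadcasts, bus_broadcasts * snoopy_deliveries(num_cpus)
```

<p>Under these assumptions, <code>compare()</code> returns <code>(10, 3, 33)</code>: the directory sends ten messages where the bus sends three broadcasts, but those three broadcasts interrupt 33 caches in total. The directory&#8217;s cost grows with the number of actual sharers; the bus&#8217;s cost grows with the number of processors.</p>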
<div class='footnotes'>
<div class='footnotedivider'></div>
<ol>
<li id='fn-370-1'>My high school French tells me that&#8217;s pronounced very much like &#8220;many core&#8221;. Nice branding, AMD. <span class='footnotereverse'><a href='#fnref-370-1'>&#8617;</a></span></li>
</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://joe.definitelynotsafe.com/thoughts-on-magny-cours/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts on the Rumored IBM/Sun Buyout</title>
		<link>http://joe.definitelynotsafe.com/thoughts-on-the-rumored-ibmsun-buyout</link>
		<comments>http://joe.definitelynotsafe.com/thoughts-on-the-rumored-ibmsun-buyout#comments</comments>
		<pubDate>Wed, 25 Mar 2009 07:54:29 +0000</pubDate>
		<dc:creator>Joe Auricchio</dc:creator>
				<category><![CDATA[Architecture]]></category>

		<guid isPermaLink="false">http://joe.definitelynotsafe.com/?p=344</guid>
		<description><![CDATA[There are rumors floating around that IBM plans to buy Sun. A few friends and I discussed the effects on Twitter.
cwhitney: I kinda like the IBM + Sun idea. That actually works, although it would then be basically SunBM versus HP versus genericroSoft.
jauricchio:  I&#8217;m in favor of hitting ZFS, DTrace, and OSol with the [...]]]></description>
			<content:encoded><![CDATA[<p>There are <a href="http://slashdot.org/article.pl?sid=09/03/18/1213209">rumors floating around</a> that IBM plans to buy Sun. A few friends and I discussed the effects on Twitter.</p>
<p><b><a href="http://twitter.com/cwhitney">cwhitney</a></b>: I kinda like the IBM + Sun idea. That actually works, although it would then be basically SunBM versus HP versus genericroSoft.<br />
<b>jauricchio</b>:  I&#8217;m in favor of hitting ZFS, DTrace, and OSol with the GPLHammer. I&#8217;m not in favor of axing Rock and Niagara. You know it&#8217;s true.<br />
<b>cwhitney</b>: They would have a stable of old but mission critical ($$$) unix OSes too. GPLv3 Hammer is a no for me (you do want OS X ZFS?)<br />
<b>jauricchio</b>:  It&#8217;d be v2. THEY want it in Linux.<br />
<b>cwhitney</b>: Rock and Niagara may go away, but a future of Sun arch guys + PPC team + actual in-house fab = fun times.<br />
<b>jauricchio</b>: Not stoked about POWER5, 6. Niagara and Rock broke more notable architectural ground.<br />
<b>cwhitney</b>:  Most certainly but most all the non-embedded, non-x86 CPU arch work comes from those teams. IBM also still has fabs, unlike most<br />
<b><a href="http://twitter.com/djcapelis">djcapelis</a></b>: You mean unlike&#8230; AMD? :( Yeah it would be interesting to see them together. I would hope for people to jump between them more.<br />
<b>cwhitney</b>: I understand the $ reasons, but that AMD move was dumb. Losing vertical integration = bad.</p>
<p>I&#8217;m very pleased by Sun&#8217;s work in architecture. <a href="http://www.sun.com/processors/niagara/">Niagara</a> and <a href="http://blogs.sun.com/jonathan/entry/rock_arrived">Rock</a> are both bold experiments. At a time when most of the chip vendors were just starting to realize single-threaded scaling was going to get harder, Niagara threw away single-thread performance for radical parallelism. For the workloads Sun targets (network serving, mostly), that turned out to be a very, very good trade-off. These days <a href="http://www.sun.com/servers/coolthreads/t5440/">you can get a 4U</a> with 256 hardware threads and 512GB RAM. That&#8217;s a lot of threads. Matched with Sun&#8217;s reliably solid memory systems, that&#8217;s some pretty serious multi-thread performance.</p>
<p>Rock is something out of a research paper. Somebody finally built a hardware transactional memory system? Suddenly all those papers become relevant to the real world! I&#8217;ve got more to say on Rock, but that&#8217;s another column. Let&#8217;s just say it&#8217;s a Good Thing.</p>
<p>On the other hand, IBM&#8217;s architecture team hasn&#8217;t impressed me lately. The POWER6 looks like a solid chip, but it&#8217;s just more of the same: all the old tricks with bigger numbers.</p>
<ul>
<li>Two-way SMT is good.</li>
<li>The semi-shared L2 looks like a cute idea: if you have up to four threads working intensely on the same data, you can fit up to 8MB in their L2. Without semi-sharing, you&#8217;d only get the same speed for 4MB between two threads. That wider and larger sharing could squeeze some more parallel speedup out of code that can be parallelized but still contends for the same data. To put everything in the right places, you&#8217;d need a good scheduler that can see the coherence patterns.</li>
<li>The L3 is huge! 32MB? What is this, a <a href="http://en.wikipedia.org/wiki/PA-RISC?#PA-RISC_microprocessor_specifications">PA-8800 Mako</a>?</li>
<li>Clock speeds are ever higher. Anybody running lots of POWER chips doesn&#8217;t care about power and heat: they&#8217;ll just put a little more in the budget for their <i>new supercomputing center</i>. Because of who they sell to, IBM is in some ways immune to the general purpose computer power/heat crunch. The first wave of the crunch (laptops and desktops) only hit Intel, AMD, and IBM&#8217;s PowerPC. The second wave (datacenters) is hitting everyone but the POWER team. At least, that&#8217;s sure how it looks from where I&#8217;m standing.</li>
</ul>
<p>Pretty much the only things I find interesting in POWER6&#8217;s architecture are the semi-shared cache and the retreat to shallow in-order pipes. Even that latter was foreshadowed by Niagara and the <a href="http://en.wikipedia.org/wiki/Cell_(microprocessor)">Cell</a>/<a href="http://en.wikipedia.org/wiki/Xenon_(processor)">Xenon</a>.</p>
<p>Don&#8217;t get me wrong: I&#8217;m not trying to belittle the POWER6 in any way. It&#8217;s a great work of engineering. I&#8217;m just not impressed with it as <i>research</i>. In contrast, Sun&#8217;s doing research with every processor they make. If IBM does buy Sun, I really hope they let Sun&#8217;s architects and chip engineers keep doing their thing. The best future, as Chris said, is Sun&#8217;s creativity on IBM&#8217;s resources.</p>
]]></content:encoded>
			<wfw:commentRss>http://joe.definitelynotsafe.com/thoughts-on-the-rumored-ibmsun-buyout/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts on the Atom</title>
		<link>http://joe.definitelynotsafe.com/thoughts-on-the-atom</link>
		<comments>http://joe.definitelynotsafe.com/thoughts-on-the-atom#comments</comments>
		<pubDate>Tue, 20 May 2008 05:15:51 +0000</pubDate>
		<dc:creator>Joe Auricchio</dc:creator>
				<category><![CDATA[Architecture]]></category>

		<guid isPermaLink="false">http://joe.definitelynotsafe.com/?p=239</guid>
		<description><![CDATA[Intel's new Atom microarchitecture targets embedded systems, bringing two new innovations to embedded ISAs: SMT and x86 binary compatibility. SMT will bring better performance on multithreaded workloads, but the Atom's heavy front end may consume too much power on simple monothreaded code. I don't think x86 compatibility is a particularly desirable property of an embedded system, though: nobody cares about binary compatibility back to MS-DOS, and the x86 community may not be ready for the board support issues of the embedded world.]]></description>
			<content:encoded><![CDATA[<p>The RISC vs CISC war isn&#8217;t over, and the next battle will be for handheld devices. Intel&#8217;s new Atom microarchitecture looks like a very interesting competitor to ARM and PowerPC in the &#8220;embedded systems with muscle&#8221; space (roughly: smartphones and set-tops). Hannibal nicely sums up the issue in an article that&#8217;s made the rounds of Slashdot et al, so I&#8217;ll let him do the talking for a few moments.</p>
<p><a href="http://arstechnica.com/articles/paedia/risc-vs-cisc-mobile-era.ars">RISC vs CISC in the Mobile Era</a></p>
<p>I&#8217;m surprised at how strongly Intel is now embracing SMT. The Core lost HyperThreading for power and heat concerns a few years ago, and it stayed out of Core 2. But this year, Nehalem brings back SMT&#8230; and it&#8217;s in Atom too!</p>
<p>SMT in an in-order low-power chip is an interesting choice. Historically, SMT was about performance (<i>not</i> about perf per watt). In 2000, if you had a big honkin&#8217; superscalar, you probably didn&#8217;t care about power consumption much. Hannibal makes the very strong and clear point that because of Atom&#8217;s x86 legacy (the excess of transistors burned on predecode, length decode, and complex-op microcode hardware), it&#8217;s impossible to follow the ARM Cortex strategy of building a tiny core and stamping them out (see also Sun Niagara!). The front-end is so heavy that its power cost <i>has</i> to be shared by/amortized over a few threads.</p>
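<p>The amortization argument can be sketched with a back-of-the-envelope model. The numbers below are made up for illustration (they are not measurements of Atom or any real core), and the model assumes the front end&#8217;s power is a fixed per-core cost while each thread adds its own back-end cost, which is a simplification since SMT threads share back-end resources too.</p>

```python
def power_per_thread(front_end_mw, back_end_mw_per_thread, threads):
    """Average power charged to each thread on one core.

    Assumption: the front end (predecode, length decode, microcode) costs a
    fixed amount per core, and every active thread adds a fixed back-end cost.
    """
    return front_end_mw / threads + back_end_mw_per_thread


# Hypothetical figures, chosen only to show the shape of the trade-off.
FRONT_END_MW = 400
BACK_END_MW_PER_THREAD = 300

one_thread = power_per_thread(FRONT_END_MW, BACK_END_MW_PER_THREAD, 1)
two_threads = power_per_thread(FRONT_END_MW, BACK_END_MW_PER_THREAD, 2)
```

<p>With these invented figures, one thread pays 700&#160;mW while each of two threads pays 500&#160;mW. The heavier the front end is relative to the rest of the core, the more a second thread helps.</p>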
<p>I&#8217;d suspect, for comparable parts, Atom will outperform Cortex on multithreaded workloads (no surprise), Cortex will beat Atom for complex single threads, and Cortex will use much less <i>power</i> than Atom on easy single threaded code.</p>
<p>Finally, I&#8217;m still not convinced by Intel&#8217;s &#8220;x86 everywhere&#8221; strategy. This is the embedded space, where different system boards share nothing in common. In answering the question, &#8220;What does this device look like to my code?&#8221; the ISA is the <i>least</i> interesting thing to examine. The embedded community has to support many many wildly different systems, and they do a very good job of it. The x86 community has not had any experience like this, and I don&#8217;t think giving them the <i>option</i> to adapt to this new world is necessarily a productive thing to do.<sup class='footnote'><a href='#fn-239-1' id='fnref-239-1'>1</a></sup></p>
<p>Case in point: the Linux i386 branch is almost exclusively intended for &#8220;PCs&#8221;&#8230; even a diskless workstation like Scott&#8217;s little Cyrix is way out in the boonies of supported systems. But Linux also supports <a href="http://www.linuxdevices.com/articles/AT4313418436.html">dozens of fantastically varied embedded systems</a>: I count 59 ARM-based, 27 MIPS-based, 22 PPC-based, and 22 others including Super-H, SHARC, Blackfin, Tensilica, and FPGA soft-cores. There are only ten x86-based embedded systems. It is the embedded community that can most effectively accommodate new devices. All x86 could bring to the table is an arrogant assumption that things &#8220;ought to work like they do on PCs&#8221; and binary compatibility with software nobody cares about. If I&#8217;m building a set-top box, I don&#8217;t care if it can run Word &#8217;97. That&#8217;s just not a selling point I see for the Atom.</p>
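<p>For a sense of proportion, here are the counts above turned into shares. The tallies are the ones quoted in the paragraph; the dictionary keys and the one-decimal rounding are just choices made for this sketch.</p>

```python
# Embedded Linux systems by architecture, as counted in the paragraph above.
counts = {"ARM": 59, "MIPS": 27, "PPC": 22, "other": 22, "x86": 10}

total = sum(counts.values())

# Percentage share of each architecture, rounded to one decimal place.
shares = {arch: round(100 * n / total, 1) for arch, n in counts.items()}
```

<p>That works out to 140 systems in total, with x86 at about 7.1% of them and ARM at about 42.1%.</p>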
<div class='footnotes'>
<div class='footnotedivider'></div>
<ol>
<li id='fn-239-1'>Of course the PC world has many different devices, and Windows users have been dealing with driver problems as long as there have been PCs or drivers. But it&#8217;s one thing to have to track down the right driver for your old ISA sound card. It&#8217;s something completely different when your CPU talks to the sound chip over memory-mapped registers that go through a Spartan-3&#8217;s GPIO pins. <span class='footnotereverse'><a href='#fnref-239-1'>&#8617;</a></span></li>
</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://joe.definitelynotsafe.com/thoughts-on-the-atom/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

