OTTAWA — Monday afternoon at ISSCC in San Francisco, AMD disclosed technical details of its accelerated processing unit (APU) known as Llano for the first time. AMD provided me with a telephone briefing of their Llano disclosure at ISSCC (Paper 5.6 An x86-64 Core Implemented in 32nm SOI CMOS). My briefing was delivered by AMD Senior Fellow Sam Naffziger who leads processor design.
Naffziger co-authored the AMD paper, which highlights the X86 side of the design, although there is speculation that much of the challenge of the APU lies in the monolithic fusion of CPU and GPU on silicon.
The AMD PR team told me that the decision to present the X86 design first is not a reflection of the relative difficulty of one over the other. However, they did admit that integrating the GPU on the same silicon as the CPU required some interesting design elements that they will keep close to the vest for now.
Obviously, I can’t comment on that, but I have to agree with the marketing folks that so much has been said about GPU integration in the last few years that there was a clear need to keep the X86 core design from getting swamped in the news.
AMD believes it takes more than just scaling up the number of cores to improve performance. In fact, they think that approach will soon reach a limit. In AMD’s assessment, we are in the mature stage of multi-core design, but poised to begin the new era of “heterogeneous systems” offering “abundant data parallelism” and “power efficient GPUs.”
To summarize the Llano APU, it will contain four X86 cores, each with over 35 million transistors occupying just shy of 10 mm². Each core gets its own megabyte of L2 cache SRAM (which comes on top of the transistor count and silicon area quoted for each core). AMD targets operation above 3GHz and supply voltages of 0.8 to 1.3 V.
Although AMD’s ISSCC presentation today is not about the new Bulldozer or Bobcat architectures planned for next year, the Llano design offers some cool features.
AMD uses a legacy architecture for the device to lessen the risk of the transition to the 32nm process node. But ISSCC is about circuits, and that’s what AMD will present. Naffziger and the rest of the design team have wrapped a number of power reduction elements around their existing X86 core design.
Drawing on a solid history of energy efficient CPUs, AMD’s processor group presents three power management innovations in its talk this afternoon.
1. Core power gating
Gives Llano an envelope of 2.5 to 25W depending on the performance demand. Each core can be independently and completely disconnected from the power supply.
2. Digital on-die temperature measurements
On-die temperature measurements are not new, but AMD claims its digital approach improves the accuracy and repeatability of the die temperature map.
3. Power aware clock grid design
Taking a close look at the clock grid saved a lot of potentially wasted watts. AMD’s thorough approach, which aimed not only to meet clock specs across the die but also to improve clock efficiency, reduced the metal capacitance in the grid by more than 80% and cut the number of final clock buffers by more than half.
The before and after pictures of the clock metal grid in the AMD presentation are quite striking. And they should be. The new clock design brings big power-performance benefits if you consider that the clock tree can account for up to 30% of total processor power consumption.
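To see why grid capacitance matters so much, here is a rough sketch using the standard first-order CMOS switching power model, P = a·C·V²·f. This model is my illustration, not something taken from AMD’s paper, and the baseline capacitance below is a hypothetical placeholder, since AMD did not give me an absolute value. The supply range and the 3GHz figure are the targets quoted earlier.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order CMOS switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical baseline grid capacitance (farads); not an AMD figure.
BASELINE_GRID_CAP_F = 1e-9
FREQ_HZ = 3e9        # "above 3GHz" target, taken here as exactly 3 GHz
VDD_HIGH_V = 1.3     # top of the quoted supply range
VDD_LOW_V = 0.8      # bottom of the quoted supply range

# Switching power of the grid wiring before and after an 80% capacitance cut.
before_w = dynamic_power(BASELINE_GRID_CAP_F, VDD_HIGH_V, FREQ_HZ)
after_w = dynamic_power(BASELINE_GRID_CAP_F * (1 - 0.80), VDD_HIGH_V, FREQ_HZ)

# The fractional saving does not depend on the placeholder baseline.
grid_saving_fraction = 1 - after_w / before_w

# Power ratio between the two ends of the supply range at fixed C and f.
voltage_power_ratio = (
    dynamic_power(BASELINE_GRID_CAP_F, VDD_HIGH_V, FREQ_HZ)
    / dynamic_power(BASELINE_GRID_CAP_F, VDD_LOW_V, FREQ_HZ)
)

print(f"grid wire switching power saved: {grid_saving_fraction:.0%}")
print(f"power at 1.3 V relative to 0.8 V: {voltage_power_ratio:.2f}x")
```

In this model the wire-related switching power falls in direct proportion to the capacitance, while supply voltage enters as a square. The sketch covers only the grid metal, not the buffers or the loads they drive, so it says nothing about the total clock power AMD actually saved.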
While many will argue how competitive AMD currently is, it’s probably no surprise that just participating in the microprocessor business demands staying on the leading edge. But I thought it was still interesting that AMD was able to include three hot topics in their X86 paper even though their core architecture is not new.
Looking at the other sessions at ISSCC, the AMD presentation touches on three major technical fields, each of which had its own tutorial session at the conference. (See T5: Design of Energy-Efficient On-Chip Networks, T6: Design of Smart Sensors, and T8: Power Gating.) In fact, the power gating tutorial was given by Stephen Kosonocky from AMD.
Their design innovations are significant considering that Llano is not actually a new architecture for AMD like the upcoming Bulldozer or Bobcat. Instead, AMD uses a legacy X86 core to minimize the risk of moving to a new process technology, à la Intel’s tick-tock approach.
If you think advanced CPUs are just tiny bits (Llano rolls out on 32nm – more on that below), think again. The power gating transistors used to cut power to individual cores are a full meter wide.
Yes, I do mean a full 39 inches or more than three full feet. That’s the kind of W/L ratio that’s sure to get some poor reverse engineer chastised by a supervisor (at least for RE companies that include that type of information). “What’s this number with the seven zeroes? Did you fall asleep at your workstation again?”
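For anyone who wants to check the arithmetic behind that joke, here is a quick sketch. It assumes, purely for illustration, that the gate length equals the 32nm node name; the actual drawn gate length was not part of my briefing and is probably different.

```python
METERS_PER_INCH = 0.0254   # exact by definition
INCHES_PER_FOOT = 12

gate_width_m = 1.0                 # total power-gate width quoted by AMD
assumed_gate_length_m = 32e-9      # assumption: node name used as gate length

width_inches = gate_width_m / METERS_PER_INCH
width_feet = width_inches / INCHES_PER_FOOT
w_over_l = gate_width_m / assumed_gate_length_m

print(f"width: {width_inches:.1f} in, {width_feet:.2f} ft")
print(f"W/L ratio: {w_over_l:.3g}")
```

Under that assumption the ratio lands in the tens of millions, which is the kind of number that would make a supervisor look twice.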
The more serious aspect to the large power gating devices is that AMD switches the ground side of the supply. Knowing that N-channel transistors offer 20 to 30 percent higher drive currents per unit width than their P-channel CMOS counterparts, controlling the ground side of the supply means that NFET switches require up to 30% less width and therefore space on the die. AMD is quick to point out that competitors designing on bulk CMOS technology need to gate the positive side of the power supply and use larger PFETs.
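As a back-of-envelope check on those percentages, the sketch below assumes drive current scales linearly with transistor width, a simplification of my own rather than anything AMD stated. The function name is hypothetical. It asks how much narrower an NFET switch can be than a PFET switch delivering the same current.

```python
def relative_nfet_width(nfet_drive_advantage):
    """NFET width as a fraction of the PFET width needed for equal current.

    nfet_drive_advantage is the fractional extra drive current per unit
    width of the NFET over the PFET (0.30 means 30% higher).
    Assumes current is proportional to width.
    """
    return 1.0 / (1.0 + nfet_drive_advantage)


# Width saving at each end of the quoted 20 to 30 percent range.
width_savings = {
    advantage: 1.0 - relative_nfet_width(advantage)
    for advantage in (0.20, 0.30)
}

for advantage, saving in width_savings.items():
    print(f"NFET drive advantage {advantage:.0%}: about {saving:.1%} less width")
```

With this simple model, the saving works out to roughly 17 to 23 percent. The 30% figure matches the other way of saying the same thing: the PFET switch needs up to 30% more width than the NFET.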
By the way, this chip will not require the extra-thick top metal layer for power distribution that is commonly used on other technology platforms. That’s another advantage of gating the ground side. It allows the design to switch the core connections to the big ground conductors present in the package rather than relying on thick on-die copper for VDD.
To add a measure of objectivity, the ground gating advantages of SOI are becoming less important. As Intel pointed out in a recent technology analyst call, their PFETs are catching up fast. Embedded SiGe source / drains add an extra knob for process engineers trying to improve current drive by adding compressive strain to the P-channel. At 32nm, Intel was closer to the 20% end of the scale and PFET performance will continue to catch up to NFETs at 22nm.
While AMD’s transition to 32nm high-K metal gate (HKMG) processing is a major factor keeping power consumption in check, that’s not the highlight of today’s presentation. Llano will be the first chip in the consumer market to use the HKMG stack on an SOI substrate (actually the first 32nm SOI as well). Even so, AMD lags its competition at Intel in both the process node and the material innovation: Intel launched 32nm in the fourth quarter of 2009 and brought high-K and metal gates into the mainstream on its 45nm node, which hit the street two years earlier.
AMD has done a good job of tempering what I’m sure is a lot of internal excitement (loosely interpreted to include Global Foundries as well) over the launch of their own 32nm process considering the lead Intel maintains in this area. They know that too much emphasis here would put the follower’s spotlight on them since most people would not understand the additional effort necessary to roll out HKMG on the SOI platform compared to bulk silicon.
Some who understand the added complexity of SOI would probably still debate the business decision to pursue it over bulk technology. But AMD maintains its strong belief in the advantages of SOI, and they will highlight that again in the Naffziger talk this afternoon.
So when will Llano hit the market? AMD’s PR team promises that the chips will sample to its customers by June this year. Expect it in consumer goods some time in 2011.