Closer look inside AMD’s Llano APU at ISSCC

OTTAWA — Monday afternoon at ISSCC in San Francisco, AMD disclosed technical details of its accelerated processing unit (APU) known as Llano for the first time. AMD provided me with a telephone briefing of their Llano disclosure at ISSCC (Paper 5.6 An x86-64 Core Implemented in 32nm SOI CMOS). My briefing was delivered by AMD Senior Fellow Sam Naffziger who leads processor design.

Naffziger co-authored the AMD paper that highlights the X86 side of the design although there is speculation that much of the challenge of the APU lies in the monolithic fusion of CPU and GPU on silicon.

The AMD PR team told me that the decision to present the X86 design first is not a reflection of the relative difficulty of one over the other. However, they did admit that integrating the GPU on the same silicon as the CPU required some interesting design elements that they will keep close to the vest for now.

Obviously, I can’t comment on that, but I have to agree with the marketing folks that so much has been said about GPU integration in the last few years that there was a clear need to keep the X86 core design from getting swamped in the news.

AMD believes it takes more than just scaling up the number of cores to improve performance. In fact, they think that approach will soon reach a limit. In AMD’s assessment, we are in the mature stage of multi-core design, but poised to begin the new era of “heterogeneous systems” offering “abundant data parallelism” and “power efficient GPUs.”

To summarize the Llano APU, it will contain four X86 cores, each with over 35 million transistors occupying just shy of 10 mm². Each of the four cores gets its own megabyte of L2 cache SRAM (which adds to the quoted transistor count and silicon area for each core). AMD targets operation above 3 GHz and supply voltages of 0.8 to 1.3 V.

Although AMD’s ISSCC presentation today is not about the new Bulldozer or Bobcat architecture planned for next year, the Llano design offers some cool features.

AMD uses a legacy architecture for the device to lessen the risk of the transition to the 32nm process node. But ISSCC is about circuits, and that’s what AMD will present. Naffziger and the rest of the design team have wrapped a number of power reduction elements around their existing X86 core design.

Drawing on a solid history of energy efficient CPUs, AMD’s processor group presents three power management innovations in its talk this afternoon.

1. Core power gating

Core power gating gives Llano a power envelope of 2.5 to 25 W depending on the performance demand. Each core can be independently and completely disconnected from the power supply.

2. Digital on-die temperature measurements

On-die temperature measurements are not new, but AMD claims its digital approach improves the accuracy and repeatability of the die temperature map.

3. Power aware clock grid design

Taking a close look at the clock grid saved a lot of potentially wasted watts. This thorough approach to not just achieving clock specs across the die but also trying to improve the clock efficiency actually reduced the metal capacitance in the grid by more than 80% and reduced the number of final clock buffers by better than half.

The before and after pictures of the clock metal grid in the AMD presentation are quite striking. And they should be. There are big power-performance benefits of the new clock design if you consider that up to 30% of total processor power consumption can be consumed by the clock tree.
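For a rough feel of what that could mean at the chip level, here is a back-of-the-envelope sketch based on the standard dynamic power relation P = a·C·V²·f, where power scales linearly with switched capacitance at fixed voltage and frequency. The 30% clock share and 80% metal reduction are the figures quoted above; the share of clock capacitance that is grid metal (`metal_fraction`) is a hypothetical parameter chosen only for illustration, not a number from AMD.

```python
def clock_power_saving(clock_share, metal_fraction, metal_cap_reduction):
    """Estimate the fraction of total chip power saved by trimming clock-grid metal.

    Assumes dynamic power is proportional to switched capacitance
    (P = a * C * V^2 * f) with voltage and frequency held constant.
    """
    # Only the metal portion of the clock load shrinks.
    clock_saving = metal_fraction * metal_cap_reduction
    # Scale by the clock tree's share of total processor power.
    return clock_share * clock_saving


# 0.30 and 0.80 are the article's figures; metal_fraction=0.5 is a guess.
saving = clock_power_saving(clock_share=0.30,
                            metal_fraction=0.5,
                            metal_cap_reduction=0.80)
print(f"Estimated total power saved: {saving:.0%}")
```

With a different guess for `metal_fraction` the result moves proportionally, so treat the output as an order-of-magnitude illustration rather than a measured saving.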

While many will argue how competitive AMD currently is, it’s probably no surprise that just participating in the microprocessor business demands staying on the leading edge. But I thought it was still interesting that AMD was able to include three hot topics in their X86 paper even though their core architecture is not new.

Looking at the other sessions at ISSCC, the AMD presentation touches on three major technical fields, given that the conference offered tutorial sessions for each of these areas. (See T5: Design of Energy-Efficient On-Chip Networks, T6: Design of Smart Sensors, and T8: Power Gating.) In fact, the power gating tutorial was given by Stephen Kosonocky from AMD.

Their design innovations are significant considering that Llano is not actually a new architecture for AMD like the upcoming Bulldozer or Bobcat. Instead, AMD uses a legacy X86 core to minimize the risk of moving to a new process technology, à la Intel’s tick-tock approach.

If you think advanced CPUs are just tiny bits (Llano rolls out on 32nm – more on that below), think again. The power gating transistors used to cut power to individual cores are a full meter wide.

Yes, I do mean a full 39 inches or more than three full feet. That’s the kind of W/L ratio that’s sure to get some poor reverse engineer chastised by a supervisor (at least for RE companies that include that type of information). “What’s this number with the seven zeroes? Did you fall asleep at your workstation again?”
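The arithmetic behind that joke is easy to check. The sketch below assumes the gate length equals the 32 nm node name, which is a simplification (the real drawn gate length was not disclosed here), and treats the one-meter width as a single figure.

```python
METERS_PER_INCH = 0.0254  # exact, by definition of the inch


def width_to_length_ratio(width_m, length_m):
    """Return the W/L ratio of a transistor given width and length in meters."""
    return width_m / length_m


width_m = 1.0        # total gate width quoted in the article
length_m = 32e-9     # assumption: gate length taken as the node name

ratio = width_to_length_ratio(width_m, length_m)
inches = width_m / METERS_PER_INCH

print(f"W/L is about {ratio:,.0f}")
print(f"One meter is about {inches:.1f} inches")
```

Under that assumption the ratio lands in the tens of millions, which is the kind of number a reviewer would look at twice.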

The more serious aspect to the large power gating devices is that AMD switches the ground side of the supply. Knowing that N-channel transistors offer 20 to 30 percent higher drive currents per unit width than their P-channel CMOS counterparts, controlling the ground side of the supply means that NFET switches require up to 30% less width and therefore space on the die. AMD is quick to point out that competitors designing on bulk CMOS technology need to gate the positive side of the power supply and use larger PFETs.
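As a rough check on the sizing argument, the sketch below assumes that the required switch width scales inversely with drive current per unit width for the same total current. That first-order model is my assumption, not something AMD stated.

```python
def width_saving(drive_advantage):
    """Fractional width reduction for a device with (1 + drive_advantage)
    times the drive current per unit width, delivering the same total current.
    """
    return 1.0 - 1.0 / (1.0 + drive_advantage)


# 20% and 30% are the NFET-over-PFET drive advantages quoted in the article.
for advantage in (0.20, 0.30):
    print(f"{advantage:.0%} higher drive -> "
          f"{width_saving(advantage):.1%} less width")
```

Taken literally, this simple model gives a saving of roughly 17 to 23 percent, so the "up to 30%" figure is best read as a round number in the same ballpark.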

By the way, AMD will not require the extra-thick top metal for power distribution that is commonly used on other technology platforms. That’s another advantage of gating the ground side. It allows the design to switch the core connections to the big ground conductors present in the package rather than relying on thick on-die copper for VDD.

To add a measure of objectivity, the ground gating advantages of SOI are becoming less important. As Intel pointed out in a recent technology analyst call, their PFETs are catching up fast. Embedded SiGe source / drains add an extra knob for process engineers trying to improve current drive by adding compressive strain to the P-channel. At 32nm, Intel was closer to the 20% end of the scale and PFET performance will continue to catch up to NFETs at 22nm.

While AMD’s transition to 32nm high-K metal gate (HKMG) processing is a major factor keeping power consumption in check, that’s not the highlight of today’s presentation. Although Llano will be the first chip in the consumer market to use the HKMG stack on SOI substrate (actually the first 32nm SOI as well), AMD lags their competition at Intel in both the process node and the material innovation since Intel launched 32nm in the fourth quarter of 2009 and brought high-K and metal gates into the mainstream on their 45nm node that hit the street two years earlier.

AMD has done a good job of tempering what I’m sure is a lot of internal excitement (loosely interpreted to include Global Foundries as well) over the launch of their own 32nm process considering the lead Intel maintains in this area. They know that too much emphasis here would put the follower’s spotlight on them since most people would not understand the additional effort necessary to roll out HKMG on the SOI platform compared to bulk silicon.

Some who understand the added complexity of SOI would probably still debate the business decision to pursue it over bulk technology. But AMD maintains its strong belief in the advantages of SOI, and they will highlight that again in the Naffziger talk this afternoon.

So when will Llano hit the market? AMD’s PR team promises that the chips will sample to its customers by June this year. Expect it in consumer goods some time in 2011.

ISSCC: Intel has edge over AMD, for now

SAN FRANCISCO, Calif. — Intel Corp. has a significant, if temporary, edge over archrival Advanced Micro Devices based on news and papers emerging here Monday (Feb. 8) at the International Solid State Circuits Conference (ISSCC).

Intel described at ISSCC its first 32nm server processor to use six cores. Meanwhile, AMD discussed a new core it will use in Llano, its first processor to combine x86 and graphics cores.

Separately, Intel announced Monday its long-delayed Itanium 9300. It is Intel’s first Itanium chip to use the company’s QuickPath Interconnect letting OEMs link eight multicore processors with additional logic. To date, AMD has been limited to linking four chips in a symmetric multiprocessing system without the need for extra chips.

Intel’s Westmere EP is a 32nm server CPU using six dual-threaded cores linked to DDR3 memory. It leapfrogs AMD’s existing 45nm Istanbul server chip, launched in June, which uses six single-threaded cores and links to DDR2 memory.

Intel said it will roll out in 90 days an eight-core server chip, Nehalem EX, made in a 45nm process. AMD is expected to respond later this year with a 12-core CPU called Magny Cours. It will put two six-core die in a package that links to DDR3 memory.

Intel’s six-core Westmere packs 1.17 billion transistors, uses a 12 MByte shared L3 cache and supports low-voltage DDR3 memory. Intel’s ISSCC paper describes a new anti-resonance feature in Intel’s QuickPath Interconnect that lowers jitter on the chip.

Meanwhile, AMD is providing only a few details about Llano, a version of its 45nm x86 core upgraded for use in the company’s first processor to merge x86 and graphics cores. Llano will use four x86 and one graphics core, link to DDR3 memory, sample this year and ship in PCs in 2011.

Intel showed a working version of its first Westmere processor at last year’s ISSCC. That chip combines separate 45nm graphics and 32nm x86 cores in a single chip package and is shipping in systems now. Intel has said it will put graphics and x86 cores on a single die with a 2011 chip that uses its next-generation microarchitecture called SandyBridge.

AMD revealed the x86 core in Llano measures 9.69 mm² and uses 35 million transistors, excluding a 1-Mbyte L2 cache block. It will run at up to 3 GHz and operate across a 0.8 to 1.2 V range while dissipating up to 25 W. It is made in a 32nm silicon-on-insulator process.

In its paper, AMD detailed power saving techniques used in the core. They include use of a novel NFET power gating transistor and a clock grid optimized to reduce clock buffers and clock switching power. AMD did not talk about the graphics core used in Llano or any other details of the processor beyond its core.

The Llano x86 core is an anomaly of sorts. Most of AMD’s future CPUs will employ one of two new x86 cores, called Bobcat and Bulldozer. The company so far has not revealed many details about those cores expected to emerge in products starting in 2011.

The new AMD cores will compete with the SandyBridge 32nm microarchitecture Intel will likely reveal late this year. The new cores will set up the next round of leapfrog between the two archrivals’ products.

Separately at ISSCC, Intel also described new low power circuits used in its Westmere device that includes two x86 cores and one graphics core. The chip uses low voltage DDR3 links running at up to 1.3 GigaTransfers/second with new fast wake up circuits. It also applies new power management techniques to the graphics core that runs from 150 to 500 MHz and to analog circuits on the chip.

Intel will also present a handful of papers on research efforts exploring more aggressive multicore architectures.

One paper describes a 45nm device that uses 48 Pentium-class cores on a message-passing network. It marks a step forward from a chip the company discussed at previous ISSCC sessions using 80 cores that were essentially floating-point units, not full x86 cores.

A second paper presented Monday discusses an 8×8 mesh network on a chip that delivers 2.6 Terabits/s in throughput using a circuit switched technique. The approach aims to save power by setting up direct point-to-point links across a chip, eliminating buffers.

ISSCC: Expert picks winner for post-CMOS era

SAN FRANCISCO — Chip scaling is expected to continue for at least the next 15 years, according to one expert, who also predicted which technology may lead the post-CMOS era.

The winner in the post-CMOS era has not been declared yet, but graphene holds great promise, said James Meindl, director of the Joseph M. Pettit Microelectronics Research Center and Pettit Chair Professor of Microelectronics at the Georgia Institute of Technology in Atlanta, Georgia.

“We will continue to scale vigorously for the next 15 years,” he said during a keynote at the International Solid State Circuits Conference (ISSCC) here. “Beyond silicon microchip technology, revolutionary developments in nanoelectronics, perhaps centering on graphene, may evolve.”

For processors, silicon could scale to the 7.9-nm node, which is slated for 2024. Before or after that, graphene could enable future terascale computing, he said.

So why graphene over the other post-CMOS technology candidates, such as spintronics, molecular electronics, and others? Some claim graphene chips are 100 to 1,000 times faster than silicon. Graphene is the crystalline form of carbon that self-assembles into two-dimensional hexagonal arrays perfect for fabricating electronic devices.

Unfortunately, when conventional deposition techniques are used with carbon to grow sheets much larger than one inch, they tend to degenerate into irregular graphite structures. Graphene has higher carrier mobility than silicon, but has been hampered by the lack of a band gap, which has kept the on-off ratio of graphene transistors dismally low–usually less than 10 compared to hundreds for silicon.

Recently, several companies have made headlines in the area. IBM Research has demonstrated a 100-GHz transistor. Fabricated on new 2-inch graphene wafers and operating at room temperature, the RF graphene transistors are said to beat the speeds of all but the fastest GaAs transistors, paving the way to commercialization of high-speed, carbon-based electronics.

The next-generation of semiconductors could be based on carbon instead of silicon, according to Penn State researchers, who recently claimed to have perfected a method of fabricating pure sheets of carbon semiconductor–called graphene–on 100 millimeter (4-inch) wafers.

Here are six reasons why Meindl thinks graphene will drive the industry in the post-CMOS era:

1. Graphene has “a mechanical strength-to-weight ratio exceeding that of any known material.”

2. “Carrier mobility exceeds 200,000 cm²/Vs.”

3. “Carriers with zero effective mass that propagate as ‘Dirac fermions’ in a manner similar to photons with a velocity 300 times less than the speed of light without scattering for distances in the micrometer range.”

4. “The capacity to conduct current densities as large as one thousand times greater than copper without electromigration.”

5. “Record values of more than 5,000 W/mK for room temperature thermal conductivity.”

6. “The capability to serve as the source, channel and drain regions of a field effect transistor (FET) and as an interconnect.”
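To put the third point in concrete terms, the short sketch below converts "300 times less than the speed of light" into a velocity and estimates how long a carrier would take to cross one micrometer without scattering. The one-micrometer distance is an illustrative choice within the "micrometer range" quoted above, and the function names are made up for this example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by SI definition


def carrier_velocity(slowdown=300):
    """Carrier velocity if it is `slowdown` times below the speed of light."""
    return SPEED_OF_LIGHT_M_PER_S / slowdown


def transit_time_s(distance_m, velocity_m_per_s):
    """Time to cover a distance at constant velocity (no scattering)."""
    return distance_m / velocity_m_per_s


velocity = carrier_velocity()
time_s = transit_time_s(1e-6, velocity)  # illustrative 1 micrometer path

print(f"Carrier velocity: about {velocity:.2e} m/s")
print(f"Transit time over 1 micrometer: about {time_s * 1e12:.2f} ps")
```

That works out to roughly a million meters per second, or about a picosecond to cross a micrometer under these idealized assumptions.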