Chip geeks and semiconductor mavens from around the
world are converging on San Francisco this week to show off their
latest innovations at the IEEE's annual International Solid-State
Circuits Conference (ISSCC). This is one of the two annual events where a lot
of advances in chip design are first revealed to the world--the other
being the Hot Chips conference hosted in the summer. At this week's
event, Intel, IBM, Advanced Micro Devices, Sun Microsystems, and PA Semi are showing off future server microprocessors.
IBM's dual-core Power6 chip will be the first chip to be detailed at
ISSCC. The Power6 chip, which is expected to first appear in IBM's
System p AIX and Linux servers this year and in its System i
proprietary servers possibly in 2008, will be made using a 10-level 65
nanometer copper/SOI/low-k chip process that is being perfected right
now in the company's East Fishkill, New York, chip fabs. IBM is now
confirming that the Power6 chip will have a clock speed in excess of 5
GHz and will consume less than 100 watts. IBM is also confirming that
the chip has 700 million transistors and will have a die size of 341
square millimeters.
As previously reported, the Power6 chip includes VMX vector
co-processors for each core and a new decimal floating point unit that
does "money math" instead of the normal base 2 math done by processors.
This is the first time anyone has put a decimal FP unit into a
production chip, and given the commercial nature of the System p and
System i product lines, this is not surprising. (It would also not be
surprising to see the decimal FP unit as well as VMX co-processors
appear in future processors for IBM's System z family of mainframes.)
Each Power6 core will have its own dedicated 2 MB L2 cache, and the two
cores on each chip will share a 32 MB L3 cache.
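To see why decimal arithmetic matters for "money math," here is a minimal software sketch using Python's standard decimal module. It only illustrates the difference between base 2 and base 10 arithmetic; it says nothing about how the Power6 decimal unit is actually built or programmed.

```python
from decimal import Decimal

# Binary (base 2) floating point cannot represent 0.10 or 0.20 exactly,
# so adding currency amounts picks up a tiny representation error.
binary_total = 0.10 + 0.20
print(binary_total == 0.30)  # False

# Decimal (base 10) arithmetic keeps these fractions exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True
print(decimal_total)  # 0.30
```

Commercial software typically does this kind of decimal math in libraries; a hardware unit would move the same kind of arithmetic into the processor.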
Intel is providing a few more details on a research project it created
to push the limits of number-crunching on a single piece of silicon by
putting 80 RISC processor cores onto a chip. The company talked very
generally about this project, which it is now referring to as a "network
on a chip," back in September 2006 at Intel Developer Forum. This chip
consists of 80 tiles, as Intel is calling the floating point cores,
arranged in a 10 by 8 grid. These cores, which only do mathematical
calculations and which are not based on either the X64 or Itanium
instruction set architectures (but which probably are a subset of the
i960 RISC processor Intel created more than a decade ago), operate at 4
GHz. The chip includes routers that link the math units together so
they can share the results of calculations with each other, much as
nodes in massively parallel supercomputers do today to model weather,
chemical processes, physics experiments, and other kinds of complex
natural systems.
The Intel RISC chip includes fine-grained clock gating, which is a
technique that allows sets of transistors to be turned down to their
idle state if they are not required by current transactions.
Fine-grained clock gating is something that all chip makers will
eventually work into their designs because it radically reduces the
power consumption on the chip. The Intel chip can deliver 1.28
teraflops of aggregate peak floating point performance--about what you
can cram into a rack of X64 servers these days--and Intel says that
running on 1 volt of juice, it can deliver 1 teraflops of performance
and only dissipate 98 watts of heat. This is stunning. This chip is 275
square millimeters and has 100 million transistors--that's pretty big
for a chip with relatively few transistors, and particularly so given
that the chip was made using Intel's 65 nanometer chip process. The
chip also needs to have a memory controller and memory chips grafted
onto it--something that the company is working on right now.
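The performance figures quoted for the Intel research chip can be checked with some quick arithmetic. The sketch below assumes that the 1.28 teraflops peak corresponds to all 80 tiles running at the quoted 4 GHz; the article does not say how many operations each tile completes per cycle, so that number is derived here rather than quoted.

```python
# Figures quoted in the article for Intel's 80-tile research chip.
TILES = 80
CLOCK_HZ = 4e9              # 4 GHz
PEAK_FLOPS = 1.28e12        # 1.28 teraflops aggregate peak
LOW_VOLTAGE_FLOPS = 1.0e12  # 1 teraflops when running on 1 volt
LOW_VOLTAGE_WATTS = 98      # heat dissipated at that operating point

# Peak = tiles x clock x operations per tile per cycle, so the quoted
# numbers imply this many floating point operations per tile per cycle
# (assuming the peak figure is measured at 4 GHz).
flops_per_tile_per_cycle = PEAK_FLOPS / (TILES * CLOCK_HZ)

# Efficiency at the 1 volt operating point, in gigaflops per watt.
gflops_per_watt = LOW_VOLTAGE_FLOPS / LOW_VOLTAGE_WATTS / 1e9

print(f"{flops_per_tile_per_cycle:.1f} flops per tile per cycle")
print(f"{gflops_per_watt:.1f} gigaflops per watt")
```

Under that assumption, each tile works out to four floating point operations per clock cycle, and the chip delivers a little over 10 gigaflops per watt at the 1 volt setting.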
Intel is also expected to talk a bit about its dual-core "Merom" Core 2
Duo chip for laptops, which is due to be tweaked this year with faster
front side buses and higher clock speeds. The Merom chip Intel will
show off at ISSCC has 291 million transistors and has an area of 143
square millimeters. It has a shared 4 MB L2 cache for the two cores,
with clock speeds that range from 1 GHz to 3 GHz and a bus speed that
ranges from 666 MHz to 1333 MHz.
AMD will be sandwiched in between the Intel RISC and Merom chip
presentations, and will reveal some more details on its future
quad-core "Barcelona" Rev F Opteron processors. AMD put out some of the specs on the Barcelona chips a few weeks ago,
boasting about the true quad-core nature of its design, which includes
a revamped Opteron core with faster math processing and what it says
will be better support for virtualized environments.
But today at ISSCC, AMD wants to talk about the power-saving features
of the Opteron chips. The initial Opteron processors--the so-called C0
stepping in the industry lingo--did not have AMD's PowerNow power
management features, which were originally created for the laptop
variants of its Athlon processors. But in February 2004, PowerNow was
added to the CG stepping of the Opterons, which plug into the 940-pin
sockets AMD created for the Opterons. The PowerNow features allow the
voltage and clock speeds of the older Rev E and the current dual-core
Rev F Opteron processors to be stepped down in 200 MHz increments in
five stages (for a total reduction of 1 GHz off the top-end clock
speed), and then drop down to a base idle speed of 1 GHz. The average
Opteron processor running a heavy load might be rated at 95 watts, but
it only burns about 70 watts of juice typically and with PowerNow
features, that number can be dropped to around 32 watts in the idle
state. Measured against the 95-watt rating, that is a power reduction of
about 66 percent from maximum load to idle in the Opteron chip. This is a
tremendous reduction in power
consumption and heat dissipation for systems that have uneven
workloads--which is true of most computers.
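The stepping scheme and the wattages described above can be sketched in a few lines. This is a rough model under stated assumptions, not AMD's implementation: the function names are made up for illustration, and the 2600 MHz top-end clock is a hypothetical value, since the article does not tie the scheme to a particular part.

```python
def pstate_clocks_mhz(top_mhz, step_mhz=200, stages=5, idle_mhz=1000):
    """List the clock speeds from the top speed down through the stepped
    stages, followed by the base idle speed."""
    clocks = [top_mhz - step_mhz * i for i in range(stages + 1)]
    if clocks[-1] > idle_mhz:
        clocks.append(idle_mhz)  # final drop to the base idle speed
    return clocks


def reduction_percent(from_watts, to_watts):
    """Percentage drop in power going from one wattage to another."""
    return 100 * (from_watts - to_watts) / from_watts


# 2600 MHz is a hypothetical top-end clock chosen for illustration.
print(pstate_clocks_mhz(2600))
# [2600, 2400, 2200, 2000, 1800, 1600, 1000]

# Wattages quoted in the article: 95 rated, 70 typical, 32 idle.
print(round(reduction_percent(95, 32)))  # 66
print(round(reduction_percent(70, 32)))  # 54
```

The size of the saving depends on the baseline: idle draws about 66 percent less than the 95-watt rating, and about 54 percent less than the 70-watt typical load.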
While the Opteron processors were designed to have two and four cores
from the get-go, the PowerNow features were not designed to gate the
power consumption of each individual core on the chip, but rather the
chip as a whole. With the future quad-core Rev F Opterons, AMD is going
to be able to step down in 100 MHz increments with PowerNow--providing
more granularity in the performance and power consumption reduction of
a chip--and will also be able to gate each core on the chip
individually using PowerNow features.
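A toy model shows why per-core control helps on uneven workloads. Everything in it is an assumption made for illustration: the 2000 MHz top clock, the per-core load figures, and the rule that a core needs a clock proportional to its load. It is not a description of how PowerNow actually chooses clock speeds.

```python
def needed_clock_mhz(load_pct, top_mhz=2000, idle_mhz=1000, step_mhz=100):
    """Lowest stepped clock, from idle up to top speed, that covers a load
    expressed as a percentage of top speed (a simplifying assumption)."""
    target_mhz = top_mhz * load_pct / 100
    for clock in range(idle_mhz, top_mhz + 1, step_mhz):
        if clock >= target_mhz:
            return clock
    return top_mhz


# Hypothetical loads on four cores, as a percentage of full speed.
loads = [90, 20, 50, 10]

# Chip-wide control: every core runs at the clock the busiest core needs.
chip_wide = [needed_clock_mhz(max(loads))] * len(loads)

# Per-core control: each core runs at the clock its own load needs.
per_core = [needed_clock_mhz(load) for load in loads]

print(chip_wide)  # [1800, 1800, 1800, 1800]
print(per_core)   # [1800, 1000, 1000, 1000]
```

In this example three of the four cores can sit at the idle clock under per-core control, while chip-wide control holds all four at the speed the busiest core requires.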
The new Rev F design also splits up the other connectivity circuits on
the chip (called a northbridge) that link processor cores to the
on-chip main memory controller. So if a chip has a workload that
requires a lot of CPU but little memory, the memory subsystem can be
stepped down to an idle state. Or, conversely, if a workload is memory
or I/O intensive but is not doing a lot of calculation, then the CPU
cores can be idled. The memory controller accounts for around 10 watts
of the 95 watts in a standard Opteron part.
By adding these technologies to the Barcelona chip, AMD is confident
that it can increase performance (probably around 60 percent or more on
generic workloads) and still keep within the same thermal envelope.
That includes 2 MB of on-chip shared L3 cache and 1 MB of L2 cache per
core, by the way.
Sun will be on hand at ISSCC to talk about its second generation of
"Niagara" Sparc T1 processors. Sun went over its Sparc processor
roadmap at its securities analyst meeting a week ago, and said that it
was adding a two-way variant of the Niagara-2 chip called "Victoria
Falls." The Niagara-2 chip has eight cores, each with eight threads and
each with a floating point unit. The current first generation Niagara
chips have eight cores with four threads each and a shared floating
point unit for the entire chip. The Niagara-2 chip that Sun is
detailing at the show has 4 MB of L2 cache, one x8 PCI-Express slot,
two 10 Gigabit Ethernet ports, and 8 FB-DIMM memory slots driven by an
on-chip memory controller. The Niagara-2 chip has 500 million
transistors and a 342 square millimeter die size; it is implemented in
an 11-layer, 65 nanometer process by Sun's fab partner, Texas Instruments. Niagara-2 chips are expected to be in servers in the second half of 2007.
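As an aside, the transistor counts and die sizes quoted in this article can be turned into rough densities, which puts the remark about the Intel research chip being big for its transistor count into numbers. The figures are the ones given above; the comparison ignores differences in how much of each die is cache versus logic, so treat it as a back-of-the-envelope sketch.

```python
# (millions of transistors, die area in square millimeters), as quoted
# in the article for each chip.
chips = {
    "IBM Power6": (700, 341),
    "Intel 80-tile research chip": (100, 275),
    "Intel Merom": (291, 143),
    "Sun Niagara-2": (500, 342),
}

# Millions of transistors per square millimeter for each chip.
density = {name: count / area for name, (count, area) in chips.items()}

# Print from densest to least dense.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.2f} million transistors per square millimeter")
```

By this measure the Intel research chip comes in well below 1 million transistors per square millimeter, while the other three chips land between roughly 1.5 million and 2 million.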
Finally, upstart PowerPC chip cloner PA Semi will show off its
dual-core PA-6T processor, which runs at 2 GHz, has its own crossbar
interconnect implemented on chip for scaling out single system images,
and has 2 MB L2 caches on chip for the cores. The initial PA-6T
processor has a tiny 115 square millimeter die size and only consumes
25 watts on peak workloads; it runs at between 5 watts and 13 watts on
normal workloads. The chip uses clock gating techniques to hold down
power consumption while offering considerable performance, and will be
in production in the fourth
quarter of this year.