Chip manufacturers are creating multi-core processors with
cores that handle graphics, mathematics, floating point computation and general
purpose computing, says Akhtar Pasha.
The rapid pace of development in the world of multi-core CPUs (and, perhaps more
importantly, the arrival of massively parallel cores in Graphics Processing Units)
will turn PCs into supercomputers. Increasingly, individual cores are designed to
handle a discrete task, such as graphics, math processing, general purpose computing
or encryption/decryption, to increase the performance of the CPU as a whole. Other
system components are also being packaged onto the same piece of silicon, adding
muscle to the CPU.
Consider this: IBM is expected to reach eight cores with Power7 next year
(its Cell processor already has eight cores), and Intel will get to eight cores
with Nehalem Xeons this year. Sun Microsystems already has eight cores per chip
(with eight threads per core) in its Niagara family of SPARC T Series processors,
and will boost that to 16 cores and 32 threads with its Rock UltraSPARC-RK processors
due later this year. AMD, meanwhile, has quad-core Shanghai Opterons, is moving
to six cores with Istanbul Opterons, and will later put two six-core chips into
a single package with Magny-Cours Opterons.
We set out looking for answers as to how some cores of the CPU can handle
specific tasks such as security, graphics and math, but we found that cores alone
do not guarantee performance. To understand why, we need to go back to why multi-core
processors are being developed.
Karthik Ramarao, Director, Technology-Systems Practice, Sun Microsystems India,
said, “Processor manufacturing has improved significantly, from 90 nm to
45 nm to 32 nm, which has allowed chip vendors to fit more transistors into
the same silicon. That leaves more real estate available for adding components
closer to the processor, which increases the performance of the CPU in addition
to giving more cores per CPU.”
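As a rough illustration of Ramarao's point, assume (as a first-order idealization, not a vendor figure) that transistor density D scales with the inverse square of the feature size. Each shrink then multiplies the number of transistors that fit in the same area as follows. Real processes deviate from this ideal, so the numbers are indicative only.

```latex
\frac{D_{45}}{D_{90}} \approx \left(\frac{90}{45}\right)^{2} = 4,
\qquad
\frac{D_{32}}{D_{90}} \approx \left(\frac{90}{32}\right)^{2} \approx 7.9
```

In other words, under this idealization a 32 nm die of the same size could hold roughly eight times as many transistors as a 90 nm one, which is the budget vendors spend on extra cores and on-chip components.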
Vamsi Krishna, Senior Technical Manager, AMD India, added, “While new manufacturing
technologies allow chip vendors to put more cores on each CPU, many components,
such as the GPU (Graphics Processing Unit), memory controllers and networking,
are now being built closer to the CPU, which will reduce manufacturing cost.”
Adding system components to CPUs
According to Krishna, “If we look at each core of a multi-core
CPU, it is a heterogeneous core, which has schedulers that determine whether
a task is general integer work for the CPU or more compute-intensive
mathematics [math core processing] for units such as Floating Point Units (FPUs).
While CPUs favor sequential data streaming, the GPU does parallel data streaming,
which is useful for large computational datasets [math] and for graphics in video
and gaming applications. A CPU alone cannot give performance. The CPU is important,
but now there are other blocks that contribute to the experience a consumer gets
from a machine.”
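Krishna's contrast between sequential and parallel data streaming can be illustrated with a standard data-parallel kernel, SAXPY (y = a*x + y). This is a hedged sketch, not AMD code: `saxpy_sequential` walks the arrays one element at a time, the way a scalar core streams data, while `saxpy_data_parallel` splits them into independent chunks, each standing in for a group of GPU cores. Both function names and the `lanes` parameter are hypothetical. Because CPython threads share a global interpreter lock, the sketch shows the decomposition of the work rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_sequential(a, x, y):
    """Process elements one after another, like a single scalar core."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_data_parallel(a, x, y, lanes=4):
    """Split the arrays into independent chunks and process them concurrently.

    Each chunk stands in for a group of GPU "mini-cores". No element depends
    on any other, so the chunks can be computed in any order.
    """
    n = len(x)
    if n == 0:
        return []
    chunk = (n + lanes - 1) // lanes          # ceiling division
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]

    def work(span):
        lo, hi = span
        return [a * x[i] + y[i] for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=lanes) as pool:
        parts = list(pool.map(work, bounds))  # map() keeps chunk order
    return [v for part in parts for v in part]


x = [float(i) for i in range(8)]
y = [1.0] * 8
print(saxpy_sequential(2.0, x, y) == saxpy_data_parallel(2.0, x, y, lanes=3))  # True
```

The design point is the independence of elements: that property, not the particular threading mechanism, is what lets a GPU spread such work across many small cores.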
To this end, AMD is combining the functions of three chips (the GPU, the core-logic
chipset and the CPU) into a single ‘Fusion’ chip, code-named Swift.
The GPU on the Fusion chip will include multiple ‘mini-cores’ that
break down code from a program, such as a 3D game, to process data faster. The
result, according to AMD, is a better-balanced platform capable of running demanding
computing tasks faster. Krishna added, “We have added the ATI Avivo Video Converter
utility, which helps transcode HD video at a much faster rate than is possible with
the CPU alone. The moment you install a graphics card, the system identifies that
a graphics request has been placed and, instead of processing it with the CPU,
automatically transfers it to the GPU. This Fusion chip is expected to
come out in 2011.” AMD plans to offer Fusion in multiple variants, such
as a dual-core or quad-core CPU paired with a single GPU.
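The automatic hand-off Krishna describes amounts to a routing decision. The following is a hypothetical sketch, not AMD's driver logic: `dispatch`, the `GPU_FRIENDLY` set and the handler functions are illustrative names only, and in a real system this choice is made inside drivers and runtime libraries.

```python
from typing import Callable, Dict, Tuple

# Hypothetical request categories assumed to benefit from the GPU's parallel cores.
GPU_FRIENDLY = {"graphics", "video_transcode"}


def dispatch(kind: str, payload, handlers: Dict[str, Callable],
             gpu_present: bool) -> Tuple[str, object]:
    """Send GPU-friendly work to the GPU when one is installed; otherwise use the CPU."""
    unit = "gpu" if gpu_present and kind in GPU_FRIENDLY else "cpu"
    return unit, handlers[unit](payload)


# Stand-in handlers; a real system would call into drivers or compute libraries.
handlers = {
    "cpu": lambda job: f"cpu handled {job}",
    "gpu": lambda job: f"gpu handled {job}",
}

print(dispatch("video_transcode", 300, handlers, gpu_present=True))   # ('gpu', 'gpu handled 300')
print(dispatch("video_transcode", 300, handlers, gpu_present=False))  # ('cpu', 'cpu handled 300')
print(dispatch("integer", 42, handlers, gpu_present=True))            # ('cpu', 'cpu handled 42')
```

The fallback branch matters: without a graphics card the same request still runs, only on the CPU.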
Sun Microsystems uses chip multithreading (CMT). Having identified the speed
of data access from memory as the critical bottleneck, Sun designed its CMT
architecture so that applications are divided into active threads, and each
processor core in the UltraSPARC T2 can switch between up to eight threads on
each clock cycle. Even if a particular thread stalls while waiting for data
from memory, the core can switch immediately to another thread, and the pipeline
remains continuously active, doing useful work. The processor’s entire execution
pipeline is thus optimized to execute active threads as much as possible, rather
than being held up by any single thread waiting for data from memory. This masks
much of the negative effect of memory latency. Ramarao added, “The eight-core
Niagara II also handles encryption/decryption at the chip level, and we have
packaged other components such as 10GE, PCI controllers and memory controllers
closer to the CPU to reduce latency issues and increase performance.” He added,
however, that individual cores cannot simply be designated to execute a particular job.
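The latency-hiding effect of CMT can be seen in a toy model. This sketch is a simplification built on assumptions: it uses a switch-on-stall policy (keep issuing from one thread until it waits on memory, then move to the next ready thread), whereas the UltraSPARC T2 interleaves threads at a finer grain, and the workload figures (4 cycles of work, then a 20-cycle memory wait) are invented for illustration. `pipeline_utilization` is a hypothetical name.

```python
def pipeline_utilization(num_threads: int, compute_cycles: int,
                         mem_latency: int, total_cycles: int) -> float:
    """Fraction of cycles in which a single core issues useful work.

    Assumed model: each thread issues `compute_cycles` instructions, the last
    of which misses to memory, after which the thread waits `mem_latency`
    cycles. The core keeps issuing from the current thread until it stalls,
    then moves to the next ready thread in round-robin order.
    """
    ready_at = [0] * num_threads                  # first cycle each thread may issue again
    work_left = [compute_cycles] * num_threads    # instructions before the next memory miss
    current = 0
    busy = 0
    for cycle in range(total_cycles):
        # Look for a ready thread, starting with the current one.
        for offset in range(num_threads):
            t = (current + offset) % num_threads
            if ready_at[t] <= cycle:
                break
        else:
            continue                              # every thread is stalled: idle cycle
        current = t
        busy += 1
        work_left[t] -= 1
        if work_left[t] == 0:                     # memory miss: this thread waits
            ready_at[t] = cycle + 1 + mem_latency
            work_left[t] = compute_cycles
            current = (t + 1) % num_threads       # switch to the next thread
    return busy / total_cycles


for n in (1, 2, 4, 8):
    print(n, round(pipeline_utilization(n, 4, 20, 2400), 3))
# 1 0.167
# 2 0.333
# 4 0.667
# 8 1.0
```

With these assumed numbers, a single thread keeps the pipeline busy only one cycle in six; adding threads fills the stall cycles until, at six or more threads, there is always a ready thread to issue from.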
Sreenath Chary, Business Unit Executive, Cross Server-Sales, IBM India, said,
“The IBM Cell processor has eight cores, and each of these is capable of
doing specific tasks: it has a general purpose CPU, two FPUs (floating point
units) for complex floating-point work, and more. We have added a GPU at the
chip level because compute-heavy video conversion, such as the MPEG-2 used in
DVD players, requires a huge amount of multiplication, which a general purpose
CPU cannot do.” Chary added that the Cell processor, also known as the Cell
Broadband Engine, consists of a PowerPC Processing Element (PPE), which acts as
the primary processor and distributes tasks; an MFC (Memory Flow Controller),
which interfaces between the computing and memory units; and eight SPUs
(Synergistic Processing Units), which are the hardware cells, each with its own
memory. The Cell CPU is thus a combination of a Power processor with eight small
vector processors. All these units interconnect via an EIB (Element Interconnect
Bus) and communicate with peripheral devices or other CPUs via a FlexIO
interface. Each of the SPUs is capable of eight floating-point operations
per cycle at 4.4 GHz, giving the multi-core chip higher performance.
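Taking the figures quoted above at face value, the peak arithmetic works out as follows. This is a theoretical peak that assumes every SPU completes its full eight operations on every cycle and ignores the PPE; the eight-per-cycle figure is generally quoted for single-precision arithmetic, and sustained throughput in real code will be lower.

```latex
8~\frac{\text{FLOP}}{\text{cycle}} \times 4.4 \times 10^{9}~\frac{\text{cycles}}{\text{s}}
  = 35.2~\text{GFLOPS per SPU},
\qquad
8~\text{SPUs} \times 35.2~\text{GFLOPS} = 281.6~\text{GFLOPS}
```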
Over the next few years, the dramatic expansion of programmable, multi-core integrated
chips attached to the CPU will allow substantial enhancements in data manipulation
and presentation. Vendors also point out that the main bottleneck in system
performance is arguably the gap between CPU speed and memory speed. Memory speeds
have come nowhere close to keeping pace with processor speeds, and this is where
the industry needs to regroup and focus its efforts.
Read the original article: http://www.expresscomputeronline.com/20090511/technology01.shtml