
February 25, 2008

Tensilica configurable processors could make affordable petaflop and exaflop supercomputers

Lawrence Berkeley National Lab researchers are looking at configurable processor technology developed by Tensilica Inc. The company offers a set of tools that system developers can use to design both the SoC and the processor cores themselves. LBNL researchers estimate that a 10 petaflop peak system built with Tensilica technology would draw only 3 megawatts and cost just $75 million. It would not be a general-purpose system, but neither would it be a one-off machine for a single application (like Japan's MD-GRAPE machine, for example). For comparison, a 10 petaflop Opteron-based system was estimated to cost $1.8 billion and require 179 megawatts to operate; the corresponding Blue Gene/L system would cost $2.6 billion and draw 27 megawatts. Extrapolating the half-petaflop Barcelona-based Ranger supercomputer to 10 petaflops, it would require about 50 megawatts and cost $600 million (although it is widely assumed that Sun discounted the Ranger price significantly). A 10 petaflop Blue Gene/P system would draw 20 megawatts, with perhaps a similar cost to the Blue Gene/L.
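The cost and power figures above reduce to a simple per-petaflop comparison. The sketch below just restates the article's estimates (these are the quoted projections, not vendor pricing; the Blue Gene/P cost is assumed similar to Blue Gene/L, as the article suggests):

```python
def per_petaflop(cost_millions, power_mw, peak_pf=10):
    """Normalize a system estimate to cost ($M) and power (MW) per peak petaflop."""
    return cost_millions / peak_pf, power_mw / peak_pf

# Estimates quoted in the article for 10 petaflop peak systems:
systems = {
    "Tensilica (LBNL estimate)": (75, 3),
    "Opteron cluster": (1800, 179),
    "Blue Gene/L": (2600, 27),
    "Ranger extrapolation": (600, 50),
    "Blue Gene/P": (2600, 20),  # cost assumed similar to Blue Gene/L
}

for name, (cost, power) in systems.items():
    cost_pf, mw_pf = per_petaflop(cost, power)
    print(f"{name}: ${cost_pf:.0f}M and {mw_pf:.1f} MW per petaflop")
```

On these numbers the Tensilica design is roughly 24x cheaper per petaflop than the Opteron cluster and roughly 60x more power-efficient.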

- AMD Opteron: Commodity approach - lower efficiency for scientific applications offset by the cost efficiencies of the mass market
• Popular building block for HPC, from commodity clusters to the tightly-coupled XT3.
• Our AMD pricing is based on servers only, without interconnect
- Blue Gene/L: Uses a generic embedded processor core and customizes the System on Chip (SoC) services around it to improve power efficiency for scientific applications
• Power-efficient approach, with a high-concurrency implementation
• BG/L SoC includes logic for the interconnect network
- Tensilica: In addition to customizing the SoC, also customizes the CPU core for further power efficiency benefits while maintaining programmability
• Design includes custom chip, fabrication, raw hardware, and interconnect


10 petaflops of sustained (rather than peak) performance would cost 10-20 times more, but Moore's Law suggests that the same sustained performance would be available for the original price within about 5 years.

So by 2012-2013, a 100-200 petaflop peak performance supercomputer based on configurable processors could cost about $75 million, and an exaflop supercomputer would be in the $375-750 million range.
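The extrapolation above can be made explicit. A rough sketch, assuming (as the text does) that performance per dollar doubles roughly every 18-24 months:

```python
def projected_peak_pf(base_pf=10, years=5, doubling_years=1.5):
    """Peak petaflops purchasable for the same budget after `years`,
    assuming performance per dollar doubles every `doubling_years`."""
    return base_pf * 2 ** (years / doubling_years)

# A $75M, 10 petaflop system in 2008 projects to roughly 100 petaflops
# for the same price by 2013 at an 18-month doubling period, and to
# roughly 160 petaflops at a more aggressive 15-month period:
print(round(projected_peak_pf()))
print(round(projected_peak_pf(doubling_years=1.25)))
```

This is the arithmetic behind the 100-200 petaflop range: the spread comes from how fast the price-performance doubling actually runs.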



The development of affordable petaflop supercomputing power would help fulfill a couple of my computing predictions from 2006:

A 10 petaflop supercomputer by 2012-2013
Petaflop personal computers and wearable computing by 2016-2018

Personal petaflop machines seem likely to come about from better GPGPUs, FPGAs, and the mainstreaming of configurable components.
Another breakthrough is fitting four times as much memory into cheaper servers; more memory is needed for high-performance applications.

New memory controller allows four times as much memory to be placed into existing servers

MetaSDRAM is a drop-in solution that closes the gap between processor computing power, which doubles every 18 months, and DRAM capacity, which doubles only every 36 months. Until now, the industry has addressed this gap by adding higher-capacity, but not readily available and exponentially more expensive, DRAM to each dual in-line memory module (DIMM) on the motherboard.

The MetaSDRAM chipset, which sits between the memory controller and the DRAM, solves the memory capacity problem cost-effectively by enabling up to four times more mainstream DRAMs to be integrated into existing DIMMs without the need for any hardware or software changes. The chipset makes multiple DRAMs look like a single larger-capacity DRAM to the memory controller. The result is "stealth" high-capacity memory that circumvents the normal limitations set by the memory controller. This new technology effectively advances memory capacity by 2-4 years.
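The 18- versus 36-month doubling rates quoted above imply a steadily widening gap between processor capability and memory capacity; a minimal sketch of that arithmetic:

```python
def growth(years, doubling_months):
    """Relative capability growth after `years`, doubling every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

# Over 6 years, processor power (18-month doubling) grows 16x while
# DRAM capacity (36-month doubling) grows only 4x, leaving a 4x gap --
# the factor a chipset like MetaSDRAM recovers by quadrupling the
# DRAM visible to the memory controller per DIMM.
gap = growth(6, 18) / growth(6, 36)
print(gap)
```

In other words, a one-time 4x capacity boost buys back the gap that accumulates over several years of mismatched doubling rates.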


FURTHER READING
PowerPoint describing the Lawrence Berkeley National Lab plan to use customized chips for more efficient and powerful supercomputers

Research paper on the IBM Kittyhawk project to build a global-scale computer. IBM wants to use supercomputers to handle many kinds of large-scale applications more efficiently than clusters of commodity boxes.

A glimpse of how this might take shape was revealed in a recent IBM Research paper that described using the Blue Gene/P supercomputer as a hardware platform for the Internet. The authors of the paper point to Blue Gene's exceptional compute density, highly efficient use of power, and superior performance per dollar. Regarding the drawbacks of the current infrastructure of the Internet, the authors write:

At present, almost all of the companies operating at web-scale are using clusters of commodity computers, an approach that we postulate is akin to building a power plant from a collection of portable generators. That is, commodity computers were never designed to be efficient at scale, so while each server seems like a low-price part in isolation, the cluster in aggregate is expensive to purchase, power and cool in addition to being failure-prone.

The IBM'ers are certainly talking about a more general-purpose petascale application than the Berkeley researchers, but one aspect is the same: ditch the loosely coupled, commodity-based systems in favor of a tightly coupled, customized architecture that focuses on low power and high throughput. If this is truly the model that emerges for ultra-scale computing, then the whole industry is in for a wild ride.


2 comments:

Anonymous said...

Do you go to top500.org? They list the top 500 supercomputers in the world, and update the list every 6 months. They also have some great graphs showing performance growth over time.

bw said...

I believe that we will exceed the historic trend lines because of these new developments. Also, the top500 does not include specialized machines like Japan's MD-GRAPE. Exaflop machines made from GPGPUs and customized chips also may not officially count.