
February 25, 2008

Tensilica configurable processors could make affordable petaflop and exaflop supercomputers

Lawrence Berkeley National Lab researchers are looking at configurable processor technology developed by Tensilica Inc. The company offers a set of tools that system developers can use to design both the SoC and the processor cores themselves. LBNL researchers estimate that a 10 petaflop peak system built with Tensilica technology would draw only 3 megawatts and cost just $75 million. It would not be a general-purpose system, but neither would it be a one-off machine for a single application (like Japan's MD-GRAPE machine, for example). For comparison, a 10 petaflop Opteron-based system was estimated to cost $1.8 billion and require 179 megawatts to operate; the corresponding Blue Gene/L system would cost $2.6 billion and draw 27 megawatts. Extrapolating the half-petaflop Barcelona-based Ranger supercomputer to 10 petaflops, it would require about 50 megawatts and cost $600 million (although it is widely assumed that Sun discounted the Ranger price significantly). A 10 petaflop Blue Gene/P system would draw 20 megawatts, with perhaps a similar cost to the Blue Gene/L.
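The cost and power figures above reduce to a simple per-petaflop comparison. The sketch below just restates the article's estimates (these are the quoted projections, not vendor pricing; the Blue Gene/P cost is assumed similar to Blue Gene/L, as the article suggests):

```python
def per_petaflop(cost_millions, power_mw, peak_pf=10):
    """Normalize a system estimate to cost ($M) and power (MW) per peak petaflop."""
    return cost_millions / peak_pf, power_mw / peak_pf

# Estimates quoted in the article for 10 petaflop peak systems:
systems = {
    "Tensilica (LBNL estimate)": (75, 3),
    "Opteron cluster": (1800, 179),
    "Blue Gene/L": (2600, 27),
    "Ranger extrapolation": (600, 50),
    "Blue Gene/P": (2600, 20),  # cost assumed similar to Blue Gene/L
}

for name, (cost, power) in systems.items():
    cost_pf, mw_pf = per_petaflop(cost, power)
    print(f"{name}: ${cost_pf:.0f}M and {mw_pf:.1f} MW per petaflop")
```

On these numbers the Tensilica design is roughly 24x cheaper per petaflop than the Opteron cluster and roughly 60x more power-efficient.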

- AMD Opteron: Commodity approach - lower efficiency for scientific applications offset by the cost efficiencies of the mass market
• Popular building block for HPC, from commodity clusters to the tightly-coupled XT3.
• Our AMD pricing is based on servers only, without interconnect
- Blue Gene/L: Uses a generic embedded processor core and customizes the System on Chip (SoC) services around it to improve power efficiency for scientific applications
• Power-efficient approach, with a high-concurrency implementation
• BG/L SoC includes logic for the interconnect network
- Tensilica: In addition to customizing the SoC, also customizes the CPU core for further power efficiency benefits while maintaining programmability
• Design includes custom chip, fabrication, raw hardware, and interconnect


10 petaflops of sustained (rather than peak) performance would cost 10-20 times more, but Moore's Law suggests that the same sustained performance would be available for the original price within about 5 years.

So by 2012-2013, a 100-200 petaflop peak performance supercomputer based on configurable processors could cost about $75 million, and an exaflop supercomputer would be in the $375-750 million range.
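The extrapolation above can be made explicit. A rough sketch, assuming (as the text does) that performance per dollar doubles roughly every 18-24 months:

```python
def projected_peak_pf(base_pf=10, years=5, doubling_years=1.5):
    """Peak petaflops purchasable for the same budget after `years`,
    assuming performance per dollar doubles every `doubling_years`."""
    return base_pf * 2 ** (years / doubling_years)

# A $75M, 10 petaflop system in 2008 projects to roughly 100 petaflops
# for the same price by 2013 at an 18-month doubling period, and to
# roughly 160 petaflops at a more aggressive 15-month period:
print(round(projected_peak_pf()))
print(round(projected_peak_pf(doubling_years=1.25)))
```

This is the arithmetic behind the 100-200 petaflop range: the spread comes from how fast the price-performance doubling actually runs.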



The development of affordable petaflop supercomputing power would help fulfill a couple of my computing predictions from 2006:

A 10 petaflop supercomputer by 2012-2013
Petaflop personal computers and wearable computing by 2016-2018

Personal petaflop machines seem likely to come about from better GPGPUs, FPGAs, and the mainstreaming of configurable components.
Another breakthrough is fitting four times as much memory into cheaper servers; more memory is needed for high-performance applications.

New memory controller allows four times as much memory to be placed into existing servers

MetaSDRAM is a drop-in solution that closes the gap between processor computing power, which doubles every 18 months, and DRAM capacity, which doubles only every 36 months. Until now, the industry has addressed this gap by adding higher-capacity, but not readily available and exponentially more expensive, DRAM to each dual in-line memory module (DIMM) on the motherboard.

The MetaSDRAM chipset, which sits between the memory controller and the DRAM, solves the memory capacity problem cost-effectively by enabling up to four times more mainstream DRAMs to be integrated into existing DIMMs without the need for any hardware or software changes. The chipset makes multiple DRAMs look like a single larger-capacity DRAM to the memory controller. The result is "stealth" high-capacity memory that circumvents the normal limitations set by the memory controller. This new technology effectively advances memory capacity by 2-4 years.
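The 18- versus 36-month doubling rates quoted above imply a steadily widening gap between processor capability and memory capacity; a minimal sketch of that arithmetic:

```python
def growth(years, doubling_months):
    """Relative capability growth after `years`, doubling every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

# Over 6 years, processor power (18-month doubling) grows 16x while
# DRAM capacity (36-month doubling) grows only 4x, leaving a 4x gap --
# the factor a chipset like MetaSDRAM recovers by quadrupling the
# DRAM visible to the memory controller per DIMM.
gap = growth(6, 18) / growth(6, 36)
print(gap)
```

In other words, a one-time 4x capacity boost buys back the gap that accumulates over several years of mismatched doubling rates.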


FURTHER READING
PowerPoint describing the Lawrence Berkeley National Lab plan to use customized chips for more efficient and powerful supercomputers

Research paper on the IBM Kittyhawk project to build a global-scale computer. IBM wants to use supercomputers to handle many kinds of large-scale applications more efficiently than clusters of commodity boxes.

A glimpse of how this might take shape was revealed in a recent IBM Research paper that described using the Blue Gene/P supercomputer as a hardware platform for the Internet. The authors of the paper point to Blue Gene's exceptional compute density, highly efficient use of power, and superior performance per dollar. Regarding the drawbacks of the current infrastructure of the Internet, the authors write:

At present, almost all of the companies operating at web-scale are using clusters of commodity computers, an approach that we postulate is akin to building a power plant from a collection of portable generators. That is, commodity computers were never designed to be efficient at scale, so while each server seems like a low-price part in isolation, the cluster in aggregate is expensive to purchase, power and cool in addition to being failure-prone.

The IBM'ers are certainly talking about a more general-purpose petascale application than the Berkeley researchers, but one aspect is the same: ditch the loosely coupled, commodity-based systems in favor of a tightly coupled, customized architecture that focuses on low power and high throughput. If this is truly the model that emerges for ultra-scale computing, then the whole industry is in for a wild ride.


2 comments:

Anonymous said...

Do you go to top500.org? They list the top 500 supercomputers in the world, and update the list every 6 months. They also have some great graphs showing performance growth over time.

bw said...

I believe that we will exceed the historic trend lines because of these new developments. Also, the top500 does not include specialized machines like Japan's MD-GRAPE. Exaflop machines made from GPGPUs and customized chips also may not officially count.