
Berkeley Lab has signed a collaboration agreement with Tensilica®, Inc. to explore the use of Tensilica’s Xtensa processor cores as the basic building blocks in a massively parallel system design. Tensilica’s Xtensa processor is about 400 times more efficient in floating point operations per watt than a conventional server processor chip and is far smaller than a regular chip.
Here is an update on the Berkeley Lab Computational Research Division's development of supercomputers based on Tensilica configurable processors.
Nextbigfuture had previously covered a plan and research paper analysis to use Tensilica configurable processors to make petaflop and exaflop supercomputers that were far more affordable and energy efficient.
Wehner, Oliker and Shalf, along with researchers from UC Berkeley, are working with scientists from Colorado State University to build a prototype system in order to run a new global atmospheric model developed at Colorado State.
They conclude that a supercomputer using about 20 million embedded microprocessors would deliver the results and cost $75 million to construct. This “climate computer” would consume less than 4 megawatts of power and achieve a peak performance of 200 petaflops. They have shown that, for the exascale computing regime, it makes more sense to target machine design for specific applications [at this time]. It is currently impractical from a cost and power perspective to build general-purpose machines like today’s supercomputers.
Under the agreement with Tensilica, the team will use Tensilica’s Xtensa LX extensible processor cores as the basic building blocks in a massively parallel system design. Each processor will dissipate a few hundred milliwatts of power, yet deliver billions of floating point operations per second and be programmable using standard programming languages and tools. This equates to an order-of-magnitude improvement in floating point operations per watt, compared to conventional desktop and server processor chips. The small size and low power of these processors allow tight integration at the chip, board and rack level and scaling to millions of processors within a power budget of a few megawatts.
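As a rough sanity check on the figures quoted above, the per-processor numbers can be worked out from the system totals (about 20 million processors, 200 petaflops peak, less than 4 megawatts). This is only a back-of-the-envelope sketch: the variable names are illustrative, and the 4 megawatt figure is treated as an upper bound on total system power.

```python
# Back-of-the-envelope check of the "climate computer" figures quoted above.
# All inputs come from the text; the names are illustrative only.
PROCESSORS = 20e6      # about 20 million embedded processors
PEAK_FLOPS = 200e15    # 200 petaflops peak
POWER_WATTS = 4e6      # less than 4 megawatts (treated as an upper bound)

flops_per_processor = PEAK_FLOPS / PROCESSORS
watts_per_processor = POWER_WATTS / PROCESSORS
flops_per_watt = PEAK_FLOPS / POWER_WATTS

print(f"{flops_per_processor / 1e9:.0f} gigaflops per processor")
print(f"{watts_per_processor * 1e3:.0f} milliwatts per processor (upper bound)")
print(f"{flops_per_watt / 1e9:.0f} gigaflops per watt")
```

That works out to roughly 10 gigaflops and at most 200 milliwatts per processor, which is in line with "a few hundred milliwatts" and "billions of floating point operations per second".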
May 08, 2008
Berkeley Labs and Tensilica working on energy efficient supercomputers
Posted by bw at 5/08/2008
Labels: energy, exaflop, future, petaflop, supercomputer, united states
April 07, 2008
Onchip photonic communications for 2017 computer processors

This post covers problems that need a lot of computing power and the new architectures that will be needed to enable that speed.
Communication challenge in ultradense computing devices. Chips do not run at full speed because communication is not fast enough.
DARPA MoleApps aim: 10**15 devices/cm**3
Currently: 17 nm half-pitch, 3.5*10**11 /cm**2 demonstrated
Communication speed of 80 TB/s for full speed 2017 chips

FURTHER READING
Zettaflop workshop 2007
Computational challenge for systems biology and personalized medicine
Prospects for computing beyond CMOS
Programming techniques to harness Exaflops [and zettaflops]
Rethink hardware
– Parallelism is mainstream, but most cores are optimized for serial performance
– Need to design hardware for power and parallelism
Rethink software
– Massive parallelism
– Eliminate scaling bottlenecks: replication, synchronization
Rethink algorithms
– Massive parallelism and locality
– Counting Flops is the wrong measure
Enabling technology for Zettaflops
Optical communication and nanomemory.
Zettaflop architecture report
Systems software for zettaflop systems
Things like billions of threads.
Energy of Computing in 2005
The biggest barrier to exaflops and zettaflops is the heat/power problem. Transistors may be cheap, but the energy they dissipate is not.
• Heat/power is not all in switching hardware; most of it is wattage for communication and memory. And clock switching is increasingly wasteful.
• In the long term, application programmers can help just as much as hardware engineers, by being less sloppy with memory use and precision demands.
We need to have new tools for analyzing power used in software. Less precision is more energy efficient (use just enough precision).
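To make the "just enough precision" point concrete, here is a minimal sketch that picks 32-bit or 64-bit storage depending on the accuracy an application actually needs. It does not measure energy. It only shows that the narrower format halves the bytes that have to be stored and moved, which the points above identify as the main power cost. The helper names and the tolerance check are hypothetical, not taken from any of the linked reports.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def smallest_sufficient_width(values, tolerance):
    """Return 32 if float32 storage keeps every relative error within
    `tolerance`, otherwise 64. (Hypothetical helper for illustration.)"""
    for v in values:
        if v != 0 and abs(to_float32(v) - v) / abs(v) > tolerance:
            return 64
    return 32

samples = [0.1 * i for i in range(1, 1001)]
for tol in (1e-6, 1e-9):
    bits = smallest_sufficient_width(samples, tol)
    print(f"tolerance {tol:g}: {bits}-bit storage, "
          f"{bits // 8 * len(samples)} bytes for {len(samples)} values")
```

With a tolerance of one part in a million the data fits in 32-bit floats. Asking for one part in a billion forces 64-bit storage and twice the memory traffic.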
Posted by bw at 4/07/2008
Labels: communication, exaflop, future weapons, ibm, intel, petaflop, silicon photonics, supercomputer, zettaflop
Intel forecasts Moore's law to continue until 2029
Pat Gelsinger, head of the Digital Enterprise Division at Intel, says that Moore's Law will continue until 2029 with zettaflop supercomputers at that time. [link is to his Intel Developers Forum keynote address, 80 pages, From Petaflops to Milliwatts]
Pat expects that by 2017 it will be possible to create a complete genetic simulation of a cell, which would require an exaflop (10 to the 18th power floating-point operations per second).
I have covered Tensilica's configurable processors, which could be one of several approaches to accelerating or at least maintaining Moore's law computer performance progress to exaflops and beyond.
From a 38-page study detailing petaflop and exaflop scale computing challenges:
Some have expressed concerns that silicon will stop having performance improvement [from shrinking lithography stalling out] in as little as four years.
The Inquirer also has another quote from Pat Gelsinger on Moore's law from the same event. "I compare Moore's Law to driving down the road on a foggy night, how far can you see? Does the road stop after 100 metres? How far can you go?
"That's what it's been like with Moore's Law. We thought there were physical limits and we casually speak about going to 10 nanometres. We have work going on different transistor structures. Silicon has become scaffolding for the rest of the periodic table. We're putting these other structures into the materials. We see no end in sight and we've had 10 years of visibility for the last 30 years."
Intel's chips now and in the future
Tukwila chip
- Quad-core with 30 MB cache
- 2 billion transistors
- Multi-threading technology
- Intel QuickPath interconnect
- Dual integrated memory controllers
- Estimate 2 times performance of dual core Itanium 9100 series
- Mainframe-class RAS
Dunnington 6 cores
- 45nm high-k technology
- 1.9B transistors
- 16 MB L3 cache
- Caneland socket compatible
- Latest Intel virtualization technologies
- 2H’08
FURTHER READING
Press room for the Intel Spring 2008 developer's forum
Zettaflop architecture challenges
Frontiers of Extreme Computing 2007 workshop was held in Santa Cruz, CA October 21-25, 2007.
Zettaflop applications
Ab initio million-atom electronic structure simulations.
Communication challenge in ultradense computing devices
DARPA MoleApps aim: 10**15 devices/cm**3
17 nm half-pitch, 3.5*10**11 /cm**2 demonstrated
Communication speed of 80 TB/s for full speed 2017 chips

Posted by bw at 4/07/2008
Labels: artificial intelligence, exaflop, future, petaflop, petaflops, supercomputer, zettaflop
March 14, 2008
Artificial Intelligence? You're soaking in it.
This phrase was popularized by a television commercial campaign for Palmolive dish washing detergent. Madge, a manicurist, would comment on the dry, rough appearance of her client's skin as she worked on one hand while the other soaked in a bowl of light green liquid. The client would ask her advice; Madge would recommend Palmolive; the client would act surprised (after all, how could a dish washing detergent affect one's skin? Preposterous.). Then Madge would inform the client about the liquid in the bowl: "You're soaking in it," she'd say, in a very matter-of-fact tone. The shocked client would immediately remove her hand from the bowl, and Madge would guide it back down, assuring her that everything was fine: "Palmolive softens hands as you do dishes."
Program trading (using classic artificial intelligence techniques) is closing in on controlling half of all financial transactions in the world and 80% in the USA.
A third of all EU and US stock trades in 2006 were driven by automatic programs, or algorithms, according to Boston-based consulting firm Aite Group LLC. By 2010, that figure will reach 50 percent, according to Aite.
In 2006 at the London Stock Exchange, over 40% of all orders were entered by algo traders, with 60% predicted for 2007. American markets and equity markets generally have a higher proportion of algo trades than other markets, and estimates for 2008 range as high as an 80% proportion in some markets.
University endowments and corporate pension funds are distributed into hedge funds (20%) and stock, bond and commodity funds, which are mostly algorithmically controlled, particularly in US markets with 80% program trading.
University endowment investments are described in this pdf
Some people like to mock the idea of Artificial Intelligence and Artificial General Intelligence as "robot gods". The returns from program trading, which are generally superior to human-generated returns, are helping to provide money for the paychecks, pensions and department budgets of those who mock AI and mock the idea that better AI is coming or that AI will have more and more influence on society.
Reality and facts would just get in the way of Dale's worldview.
Ray Kurzweil is on the vanguard of using even more advanced AI to run his own hedge fund. It is part of the $30 billion/year invested in hardware and software for financial trading and in improving the power and capabilities of those AI systems. As if better AI won't be adopted in this financial intelligence arms race.
A breakthrough that could happen this year [October, 2008] is a supercomputer able to model what people believe could pass a form of the Turing test.
Google is using artificial intelligence techniques to provide better searches and to provide better matching of advertising with search results.
It's pretty clear from what [Google co-founders] Larry Page and Sergey Brin have said in interviews that Google sees search as essentially a basic form of artificial intelligence. A year ago, Google executives said the company had achieved just 5% of its complete vision of search. That means, in order to provide the best possible results, Google's search engine will eventually have to know what people are thinking, how to interpret language, even the way users' brains operate.
Google has lots of experts in artificial intelligence working on these problems, largely from an academic perspective. But from a business perspective, artificial intelligence's effects on search results or advertising would mean huge amounts of money.
Some of the most powerful AI will be trying to achieve the goal of anticipating what you want to buy when you want to buy it.
FURTHER READING: Many competing options to make computers millions of times more powerful than today.
Proper framing of the transhumanist debate
Promising new approach to molecular computing.
Brain simulation progress.
Tensilica configurable processors could make affordable petaflop and exaflop computers
New nanoscale metamaterial architecture for enabling an all optical computer.
More autonomous robots using better 3D freeze frame visual systems with LIDAR
The struggle over high risk high payoff research.
Quantum annealing can be millions of times faster than classical computers.
Predictions on artificial general intelligence.
Hardware for artificial intelligence.
Cognitive enhancement methods
Posted by bw at 3/14/2008
Labels: artificial intelligence, economy, exaflop, future, neurons, optical computing, quantum computer, singularity, supercomputer
February 25, 2008
Tensilica configurable processors could make affordable petaflop and exaflop supercomputers
Lawrence Berkeley National Lab researchers are looking at configurable processor technology developed by Tensilica Inc. The company offers a set of tools that system developers can employ to design both the SoC and the processor cores themselves. A real-world implementation of this technology. LBNL estimates that a 10 petaflop peak system built with Tensilica technology would draw only 3 megawatts and cost just $75 million. It's not a general-purpose system, but neither is it a one-off machine for a single application (like Japan's MD-GRAPE machine, for example). A 10 petaflop Opteron-based system was estimated to cost $1.8 billion and require 179 megawatts to operate; the corresponding Blue Gene/L system would cost $2.6 billion and draw 27 megawatts. Extrapolating the half petaflop Barcelona-based Ranger supercomputer to 10 petaflops, it would require about 50 megawatts and cost $600 million (although it's widely assumed that Sun discounted the Ranger price significantly). A 10 petaflop Blue Gene/P system would draw 20 megawatts, with perhaps a similar cost to the Blue Gene/L.
- AMD Opteron: Commodity Approach - Lower efficiency for scientific applications offset by cost efficiencies of mass market
• Popular building block for HPC, from commodity to tightly-coupled XT3.
• Our AMD pricing is based on servers only without interconnect
- BlueGene/L: Use generic embedded processor core and customize System on Chip (SoC) services around it to improve power efficiency for scientific applications
• Power efficient approach, with high concurrency implementation
• BG/L SoC includes logic for interconnect network
- Tensilica: In addition to customizing the SoC, also customizes the CPU core for further power efficiency benefits but maintains programmability
• Design includes custom chip, fabrication, raw hardware, and interconnect
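The 10 petaflop estimates quoted above are easier to compare as peak gigaflops per watt and cost per petaflop. The sketch below only rearranges the numbers from the text. The Blue Gene/P cost is left out because no firm figure is given, and the dictionary layout and function name are my own.

```python
# Estimates quoted in the text for a 10 petaflop (peak) system:
# (power in megawatts, cost in millions of dollars; None = no firm figure quoted)
PEAK_PETAFLOPS = 10
systems = {
    "Tensilica": (3, 75),
    "Opteron": (179, 1800),
    "Blue Gene/L": (27, 2600),
    "Ranger (extrapolated)": (50, 600),
    "Blue Gene/P": (20, None),
}

def gflops_per_watt(power_mw):
    """Peak gigaflops per watt for a 10 petaflop system drawing power_mw megawatts."""
    return (PEAK_PETAFLOPS * 1e15) / (power_mw * 1e6) / 1e9

for name, (power_mw, cost_musd) in systems.items():
    if cost_musd is None:
        cost = "n/a"
    else:
        cost = f"${cost_musd / PEAK_PETAFLOPS:,.1f}M per petaflop"
    print(f"{name:22s} {gflops_per_watt(power_mw):6.2f} GFLOPS/W  {cost}")
```

On these estimates the Tensilica design comes out at about 3.3 peak gigaflops per watt, roughly 60 times the Opteron figure.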
Ten petaflops of sustained (rather than peak) performance would cost 10-20 times more; with Moore's Law, that level of performance would be available for the same price in about 5 years.
So by 2012-2013, a 100-200 petaflop peak performance supercomputer based on configurable processors would cost $75 million, and an exaflop supercomputer would be in the $375-750 million range.
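The extrapolation above can be reproduced in a few lines. This is a sketch under two assumptions that are mine rather than from the Berkeley analysis: cost scales linearly with peak flops, and the 10-20 times gain arrives over exactly five years.

```python
import math

# Figures from the text: $75 million buys a 10 petaflop (peak) system today,
# and the same money is expected to buy 10-20 times more in about 5 years.
BASE_PETAFLOPS = 10
BUDGET_MUSD = 75
EXAFLOP_IN_PETAFLOPS = 1000

def future_petaflops(gain):
    """Peak petaflops the same budget buys after a `gain`-fold improvement."""
    return BASE_PETAFLOPS * gain

def exaflop_cost_musd(gain):
    """Cost of an exaflop in $ millions, ASSUMING cost scales linearly with peak flops."""
    return BUDGET_MUSD * EXAFLOP_IN_PETAFLOPS / future_petaflops(gain)

def implied_doubling_months(gain, years=5):
    """Doubling time in months implied by a `gain`-fold improvement over `years`."""
    return 12 * years / math.log2(gain)

for gain in (10, 20):
    print(f"{gain}x in 5 years: {future_petaflops(gain)} petaflops for ${BUDGET_MUSD}M, "
          f"exaflop for about ${exaflop_cost_musd(gain):.0f}M, "
          f"doubling every {implied_doubling_months(gain):.1f} months")
```

A 10 times gain in five years corresponds to a doubling roughly every 18 months, and the two cases bracket the $375-750 million range for an exaflop.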
The development of affordable petaflop-scale supercomputers would help fulfill a couple of my computing predictions from 2006:
10 petaflop supercomputer by 2012-2013
Petaflop personal computers and wearable computing 2016-2018
Personal petaflop machines seem likely to come about from better GPGPUs, FPGAs and mainstreaming several configurable components.
Another breakthrough allows four times as much memory in cheaper servers. More memory is needed for high performance applications.
New memory controller allows four times as much memory to be placed into existing servers
MetaSDRAM is a drop-in solution that closes the gap between processor computing power, which doubles every 18 months, and DRAM capacity, which doubles only every 36 months. Until now, the industry addressed this gap by adding higher capacity, but not readily available, and exponentially more expensive DRAM to each dual in-line memory module (DIMM) on the motherboard.
The MetaSDRAM chipset, which sits between the memory controller and the DRAM, solves the memory capacity problem cost effectively by enabling up to four times more mainstream DRAMs to be integrated into existing DIMMs without the need for any hardware or software changes. The chipset makes multiple DRAMs look like a larger capacity DRAM to the memory controller. The result is "stealth" high-capacity memory that circumvents the normal limitations set by the memory controller. This new technology has accelerated memory technology development by 2-4 years.
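The gap described above compounds quickly. Taking the two doubling times from the quote at face value (18 months for processor power, 36 months for DRAM capacity), a small sketch shows how far apart they drift. The function names are illustrative.

```python
def growth(years, doubling_months):
    """Growth factor after `years` for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)

def gap(years, cpu_doubling=18, dram_doubling=36):
    """How much faster processor power has grown than DRAM capacity after `years`."""
    return growth(years, cpu_doubling) / growth(years, dram_doubling)

for years in (3, 6, 9):
    print(f"after {years} years: processor x{growth(years, 18):.0f}, "
          f"DRAM x{growth(years, 36):.0f}, gap x{gap(years):.0f}")
```

With these rates the gap itself doubles every three years, so after six years processor power has outgrown DRAM capacity by a factor of four.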
FURTHER READING
Powerpoint describing the Berkeley National Lab plan for customized chips for more efficient and powerful supercomputers
Research paper on the IBM Kittyhawk project to build a global scale computer. IBM wants to use supercomputers to handle many kinds of large scale applications more efficiently than with clusters of boxes.
A glimpse of how this might take shape was revealed in a recent IBM Research paper that described using the Blue Gene/P supercomputer as a hardware platform for the Internet. The authors of the paper point to Blue Gene's exceptional compute density, highly efficient use of power, and superior performance per dollar. Regarding the drawbacks of the current infrastructure of the Internet, the authors write:
At present, almost all of the companies operating at web-scale are using clusters of commodity computers, an approach that we postulate is akin to building a power plant from a collection of portable generators. That is, commodity computers were never designed to be efficient at scale, so while each server seems like a low-price part in isolation, the cluster in aggregate is expensive to purchase, power and cool in addition to being failure-prone.
The IBM'ers are certainly talking about a more general-purpose petascale application than the Berkeley researchers, but one aspect is the same: ditch the loosely coupled, commodity-based systems in favor of a tightly coupled, customized architecture that focuses on low power and high throughput. If this is truly the model that emerges for ultra-scale computing, then the whole industry is in for a wild ride.
Posted by bw at 2/25/2008
Labels: computer memory, computers, exaflop, future, petaflop, predictions
Exaflop computer studies
$7.4 million in funding has been provided for researchers at Sandia and Oak Ridge National Laboratories to look at issues for exaflop computers. They are preparing for the challenges of developing an exascale computer at the new Institute for Advanced Architectures. One such challenge is power consumption. "An exaflop supercomputer might need 100 megawatts of power, which is a significant portion of a power plant," said Dosanjh. "We need to do some research to get that down. Otherwise no one will be able to power one."
Then there's the issue of reliability, which tends to decline as the parts count increases. Given that an exascale computer might have a million hundred-core processors, Dosanjh speculated that such a machine might run for 10 minutes before suffering a failure. To manage a machine with so many parts, new fault-tolerance schemes need to be developed.
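The article does not say how the 10-minute figure was derived. One simple way to read it is to assume parts fail independently at a constant rate, so the system's mean time between failures is the per-part figure divided by the part count. That model is my assumption, not Dosanjh's. Under it, the sketch below shows what per-processor reliability a 10-minute system interval implies.

```python
# ASSUMPTION: independent failures at a constant rate, so
# system MTBF = per-part MTBF / number of parts.
PARTS = 1_000_000            # "a million hundred-core processors"
SYSTEM_MTBF_MINUTES = 10     # "might run for 10 minutes before suffering a failure"
MINUTES_PER_YEAR = 60 * 24 * 365

def part_mtbf_years(parts, system_mtbf_minutes):
    """Per-part mean time between failures implied by the system-level figure."""
    return parts * system_mtbf_minutes / MINUTES_PER_YEAR

def system_mtbf_minutes(parts, part_mtbf_yrs):
    """System-level mean time between failures for a given per-part figure."""
    return part_mtbf_yrs * MINUTES_PER_YEAR / parts

implied = part_mtbf_years(PARTS, SYSTEM_MTBF_MINUTES)
print(f"implied per-processor MTBF: {implied:.1f} years")
print(f"system MTBF if parts last 100 years: {system_mtbf_minutes(PARTS, 100):.0f} minutes")
```

Even processors that each fail only once in about 19 years would, at this part count, produce a failure somewhere in the machine every 10 minutes, which is why new fault-tolerance schemes are needed.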
Data movement is also a critical concern, said Dosanjh. "The rate of memory access has not kept up with the ability of these processors to do floating point operations," he said.
And in addition to the hardware engineering challenges, programmers have to be educated to write code for such massively parallel systems. "As far as the industry is concerned, there needs to be an education effort as well to get people trained to write software at this scale," said Dosanjh.
Posted by bw at 2/25/2008
March 15, 2007
Network of PS3 could deliver petaflop and even exaflop computing
Scientists believe that 10,000 idle PS3s can deliver over a petaflop. This is four times more than IBM's BlueGene/L System, which cranks out 280.6 trillion calculations per second. If Sony could actually sell the PS3 with as much success as the PS2, then 100 million units could provide over an exaflop of computing power.
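The numbers above can be checked with simple arithmetic. This sketch takes "over a petaflop" as exactly one petaflop, so the per-console figure is a lower bound, and it ignores any networking overhead. The variable names are illustrative.

```python
PS3_COUNT = 10_000
PETAFLOP = 1e15               # "over a petaflop", taken as a lower bound
BLUEGENE_L_FLOPS = 280.6e12   # 280.6 trillion calculations per second
EXAFLOP = 1e18

per_console = PETAFLOP / PS3_COUNT           # flops each PS3 must contribute
ratio = PETAFLOP / BLUEGENE_L_FLOPS          # petaflop versus BlueGene/L
hundred_million = 100e6 * per_console        # 100 million consoles at that rate

print(f"{per_console / 1e9:.0f} gigaflops per PS3")
print(f"{ratio:.2f} times BlueGene/L")
print(f"{hundred_million / EXAFLOP:.0f} exaflops from 100 million units")
```

At about 100 gigaflops per console, 100 million units would be on the order of 10 exaflops, so "over an exaflop" leaves a wide margin for consoles that are busy or offline.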
The Cell roadmap calls for 45nm chips by 2010 that would be about 5 times more powerful, with teraflop performance.
The roadmap shows the already scheduled die shrink to 65 nm (just introduced March 12, 2007), making the Cell considerably cheaper to produce and reducing power consumption. Its die with 9 processors (1 PPE + 8 SPEs) is currently still 235 mm² in size and therefore at the level of IBM's top server chip POWER5+ with 243 mm² (for comparison: Intel Core2Duo - 143 mm²).
IBM's Dual-Cell Bladeserver hardware uses up to 315 Watts and we learn that Sony puts a 380 Watt PSU into their Playstation 3, a very comfortable power margin for the quiet-running performance product.
A new line of mid-class Cells is set to debut in 2008 with only 4 SPEs and a particular focus on low power consumption with cheap producibility in bulk silicon vs. the more complex SOI technology. Toshiba plans to scale it down to a single-SPU version for ultra-portable devices in 2010.
At the other end of the performance scale the renewed 5-year alliance will culminate in a teraflops processor. According to Cell architect Jim Kahle, the performance goal can be achieved by 2010 with a new 32 SPE Cell die.
Posted by bw at 3/15/2007

