May 31, 2010

China Has the Second- and Seventh-Fastest Supercomputers, Germany Has the Fifth, and the USA Has the Rest of the Top Ten

The June 2010 list of the top 500 supercomputers has been released.

1. Jaguar - Oak Ridge National Laboratory, Cray XT5, 1.76 petaflops
2. Nebulae - Shenzhen, China, Dawning TC3600 Blade, Nvidia/Intel mix, 1.27 petaflops
3. Roadrunner - DOE, IBM Powercell, 1.04 petaflops
4. Kraken - Tennessee, Cray XT5, 832 teraflops
5. Jugene - Germany, IBM Blue Gene, 825 teraflops

A Chinese system called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and Nvidia Tesla C2050 GPUs, is now the fastest in theoretical peak performance at 2.98 PFlop/s and No. 2 with a Linpack performance of 1.271 PFlop/s. This is the highest rank a Chinese system has ever achieved. There are now 2 Chinese systems in the TOP10 and 24 in the TOP500 overall.

China keeps increasing its number of systems, now at 24, and is tied with Germany (which is steadily declining) for the No. 4 spot, behind the USA, the UK and France.

The UK Register discusses the supercomputers.

Jaguar is an XT5 massively parallel cluster with a 3D torus interconnect that currently has six-core Opteron 8400 processors and uses Cray's SeaStar2+ interconnect. It has 224,162 cores to deliver a peak theoretical performance of 2.33 petaflops and delivers 1.76 petaflops of sustained performance on the Linpack Fortran matrix math benchmark. Jaguar could be upgraded with the twelve-core XT6 Opteron blades and the new "Gemini" interconnect, which Cray debuted last week in the XE6 super (formerly code-named "Baker"); such an upgrade would easily double performance.

Thus far, Oak Ridge has not divulged its plans, but is monkeying around with x64 clusters and Nvidia next-generation "Fermi" GPUs. It would be interesting to see what a next-generation "Cascades" super from Cray, using the "Aries" interconnect (a kicker to the just-announced Gemini), Intel Xeon processors (very likely "Sandy Bridge" Xeons with eight or more cores each), and Nvidia GPUs might do in terms of sustained performance. We'll have to wait a few years to see that, and it may be at Oak Ridge and it may not.

But for the moment, China's NSCS is enthusiastically adopting Dawning's TC3600 blade servers, equipped with Intel's six-core X5650 processors and Nvidia's C2050 GPUs. The exact configuration of the Nebulae machine at NSCS was not available at press time, but the TC3600 blade server is a 10U chassis that holds ten two-socket blades. The C2050s are PCI-Express GPU co-processors with 448 cores and 3 GB of their own GDDR5 memory, rated at 515 gigaflops doing double-precision floating point math and 1.03 teraflops doing single-precision. The Top 500 ranking for Nebulae does not provide a blade or GPU count, but the word on the street is that it has 4,700 nodes. What the Top 500 does say is that the machine has 120,640 cores in total for a peak theoretical performance of 2.98 petaflops and 1.27 petaflops sustained running the Linpack test. All of the nodes in the Dawning blade cluster are linked by quad data rate (40 Gb/sec) InfiniBand switches.
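As a rough plausibility check on the rumoured node count, the peak figure can be estimated from the parts named above. This is only a sketch under assumptions that are not in the article: one C2050 per two-socket blade, a 2.66 GHz clock for the X5650, and 4 double-precision flops per core per cycle. The real configuration may differ.

```python
# Back-of-envelope peak estimate for Nebulae.
# Values marked "assumption" are NOT from the article.
NODES = 4700            # rumoured node count quoted in the article
SOCKETS_PER_NODE = 2    # two-socket blades, per the article
CORES_PER_SOCKET = 6    # six-core X5650, per the article
CLOCK_GHZ = 2.66        # assumption: X5650 base clock
FLOPS_PER_CYCLE = 4     # assumption: double-precision flops per core per cycle
GPUS_PER_NODE = 1       # assumption: one C2050 per blade
GPU_DP_GFLOPS = 515.0   # C2050 double-precision rating, per the article

# Double-precision gigaflops contributed by the CPUs in one node.
cpu_gflops_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET * CLOCK_GHZ * FLOPS_PER_CYCLE

# Add the GPU contribution to get the per-node peak.
node_gflops = cpu_gflops_per_node + GPUS_PER_NODE * GPU_DP_GFLOPS

# 1 petaflops = 1,000,000 gigaflops.
peak_pflops = NODES * node_gflops / 1e6

print(f"Estimated peak: {peak_pflops:.2f} petaflops")
```

Under these assumptions the estimate lands at roughly 3.0 petaflops, in the same neighbourhood as the 2.98 petaflops the Top 500 lists, so a node count in the region of 4,700 is at least plausible.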

The first thing to notice about the Jaguar and Nebulae supers is the difference between peak and sustained performance. For the Cray Jaguar Opteron cluster, 75.5 per cent of the flops contained in the box end up doing real Linpack work, while on the Dawning Xeon-Tesla hybrid, only 42.6 per cent of the peak performance embodied in the CPUs and GPUs actually push Linpack math. So it would seem that the all-X64 machine has the edge, right? Wrong. Jaguar cost around $200m to build and burns around 7 megawatts of juice, while the Nebulae machine probably costs on the order of $50m (that's an El Reg estimate) and burns only 2.55 megawatts of juice.
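The efficiency percentages quoted above are simply sustained Linpack performance divided by theoretical peak. A minimal sketch of that arithmetic, using the rounded petaflops figures from this post (the dictionary and function names are just illustrative):

```python
# Peak and sustained Linpack figures in petaflops, as quoted in this post.
systems = {
    "Jaguar": {"peak_pflops": 2.33, "linpack_pflops": 1.76},
    "Nebulae": {"peak_pflops": 2.98, "linpack_pflops": 1.27},
}


def linpack_efficiency(peak_pflops, linpack_pflops):
    """Return sustained Linpack performance as a percentage of peak."""
    return 100.0 * linpack_pflops / peak_pflops


efficiency = {name: linpack_efficiency(**figures) for name, figures in systems.items()}

for name, pct in efficiency.items():
    print(f"{name}: {pct:.1f} per cent of peak")
```

With these inputs the result is about 75.5 per cent for Jaguar and 42.6 per cent for Nebulae, matching the figures in the text.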

When you do the math, as far as Linpack is concerned, Jaguar takes just under 4 watts to deliver a gigaflops at a cost of $114 per gigaflops for the iron, while Nebulae consumes 2 watts per gigaflops at a cost of $39 per gigaflops for the system.
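The power and cost ratios can be rechecked from the numbers already given: divide watts and dollars by sustained Linpack performance. The sketch below uses the rounded figures from this post, including the El Reg cost estimate for Nebulae, and expresses Linpack performance in gigaflops, which is the unit at which the quoted values of roughly 4, 114, 2 and 39 come out.

```python
# Sustained Linpack performance, power draw and cost, as quoted in this post.
# The Nebulae cost is the El Reg estimate, not a confirmed figure.
systems = {
    "Jaguar": {"linpack_pflops": 1.76, "power_mw": 7.0, "cost_musd": 200.0},
    "Nebulae": {"linpack_pflops": 1.27, "power_mw": 2.55, "cost_musd": 50.0},
}


def per_gigaflops(linpack_pflops, power_mw, cost_musd):
    """Return (watts per gigaflops, dollars per gigaflops) on Linpack."""
    gigaflops = linpack_pflops * 1e6  # 1 petaflops = 1,000,000 gigaflops
    watts = power_mw * 1e6            # megawatts to watts
    dollars = cost_musd * 1e6         # millions of dollars to dollars
    return watts / gigaflops, dollars / gigaflops


results = {name: per_gigaflops(**figures) for name, figures in systems.items()}

for name, (watts, dollars) in results.items():
    print(f"{name}: {watts:.2f} W per gigaflops, ${dollars:.0f} per gigaflops")
```

On these inputs Jaguar comes out at just under 4 watts and about $114 per gigaflops, and Nebulae at about 2 watts and $39 per gigaflops.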


59 pages of statistics with a lot of country and vendor breakdowns.
