Exaflop computer studies

Researchers at Sandia and Oak Ridge National Laboratories have received $7.4 million in funding to study the issues facing exaflop computers. At the new Institute for Advanced Architectures, they are preparing for the challenges of developing an exascale computer.

One such challenge is power consumption. “An exaflop supercomputer might need 100 megawatts of power, which is a significant portion of a power plant,” said Dosanjh. “We need to do some research to get that down. Otherwise no one will be able to power one.”
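To put the 100-megawatt figure in perspective, the short Python sketch below works out the energy efficiency a machine would need to deliver one exaflop (10^18 floating-point operations per second) within a given power budget. It is a back-of-the-envelope calculation, not a model from the study: the function names are hypothetical, and the estimate assumes the entire power budget goes to computation, ignoring cooling and other overhead.

```python
EXAFLOP = 1e18  # floating-point operations per second (FLOP/s)


def required_efficiency(flops: float, power_watts: float) -> float:
    """FLOP/s each watt must deliver to reach `flops` within `power_watts`.

    Hypothetical helper: assumes the whole power budget feeds the
    computation (no cooling or facility overhead).
    """
    return flops / power_watts


def to_gflops_per_watt(flops_per_watt: float) -> float:
    """Convert FLOP/s per watt to GFLOP/s per watt."""
    return flops_per_watt / 1e9


if __name__ == "__main__":
    budget_watts = 100e6  # the 100 MW figure quoted above
    eff = required_efficiency(EXAFLOP, budget_watts)
    print(f"{to_gflops_per_watt(eff):.0f} GFLOP/s per watt")  # prints "10 GFLOP/s per watt"
```

Because the required efficiency is simply performance divided by power, every halving of the power budget doubles the efficiency target, so any reduction below 100 MW translates directly into a proportionally harder engineering goal.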

Then there’s the issue of reliability, which tends to decline as the parts count increases. Given that an exascale computer might have a million processors with a hundred cores each, Dosanjh speculated that such a machine might run for just 10 minutes before suffering a failure. Managing a machine with so many parts will require new fault-tolerance schemes.
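The 10-minute estimate follows from how failure rates add up. A common simplifying model (an assumption here, not necessarily the one behind Dosanjh's figure) treats the machine as a series system of independent components with exponentially distributed lifetimes: the system fails when any component fails, so component failure rates sum, and the system's mean time between failures (MTBF) is the component MTBF divided by the number of components. The sketch below, with hypothetical function names, runs that arithmetic in reverse to show how reliable each of a million processors would have to be for the whole machine to fail only once every 10 minutes.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # ignoring leap years


def system_mtbf(component_mtbf: float, n_components: int) -> float:
    """System MTBF for n identical, independent components in series.

    With exponential lifetimes, failure rates add: the system rate is
    n / component_mtbf, so the system MTBF is component_mtbf / n.
    """
    return component_mtbf / n_components


def implied_component_mtbf(system_mtbf_value: float, n_components: int) -> float:
    """Per-component MTBF needed to achieve a given system MTBF."""
    return system_mtbf_value * n_components


if __name__ == "__main__":
    processors = 1_000_000  # "a million hundred-core processors"
    per_proc_minutes = implied_component_mtbf(10, processors)
    # Each processor fails on average only once every ~19 years...
    print(f"{per_proc_minutes / MINUTES_PER_YEAR:.1f} years per processor")
    # ...yet a million of them together fail roughly every 10 minutes.
    print(f"{system_mtbf(per_proc_minutes, processors):.1f} minutes system MTBF")
```

Under this model, doubling the processor count halves the system MTBF, which is why reliability tends to decline as the parts count grows.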

Data movement is also a critical concern, said Dosanjh. “The rate of memory access has not kept up with the ability of these processors to do floating point operations,” he said.

Beyond the hardware engineering challenges, programmers will need to be trained to write code for such massively parallel systems. “As far as the industry is concerned, there needs to be an education effort as well to get people trained to write software at this scale,” said Dosanjh.