Low-Power Chips to Model a Billion Neurons

IEEE Spectrum – A miniature, massively parallel computer, powered by a million ARM processors, could produce the best brain simulations yet.

The average human brain packs a hundred billion or so neurons—connected by a quadrillion (10^15) constantly changing synapses—into a space the size of a cantaloupe. It consumes a paltry 20 watts, much less than a typical incandescent lightbulb. But simulating this mess of wetware with traditional digital circuits would require a supercomputer that’s a good 1000 times as powerful as the best ones we have available today. And we’d need the output of an entire nuclear power plant to run it.
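
To get a feel for how daunting those numbers are, here is a quick back-of-envelope check in Python. The 10-hertz average firing rate is an illustrative assumption (the article itself states only the neuron, synapse, and power figures), but it yields the event rates we return to later:

```python
# Back-of-envelope scale of the human brain, using the figures above.
NEURONS = 1e11         # about a hundred billion neurons
SYNAPSES = 1e15        # a quadrillion synapses
BRAIN_POWER_W = 20     # total power budget, in watts

# Assume an average firing rate of ~10 Hz (an illustrative figure,
# not from the article) to estimate the synaptic event rate.
events_per_second = SYNAPSES * 10
print(f"Synaptic events per second: {events_per_second:.0e}")            # 1e+16

# If the entire 20 W budget went to those events, each one costs roughly:
print(f"Energy per event: {BRAIN_POWER_W / events_per_second:.0e} J")    # 2e-15 J
```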

SpiNNaker, for Spiking Neural Network Architecture, is a machine that will look a lot like a conventional parallel computer, but it boasts some significant changes to the way chips communicate. We expect it will let us model brain activity with speeds matching those of biological systems but with all the flexibility of a supercomputer.

Over the next year and a half, we will create SpiNNaker by connecting more than a million ARM processors, the same kind of basic, energy-efficient chips that ship in most of today’s mobile phones. When it’s finished, SpiNNaker will be able to simulate the behavior of 1 billion neurons. That’s just 1 percent as many as are in a human brain but more than 10 times as many as are in the brain of one of neuroscience’s most popular test subjects, the mouse. With any luck, the machine will help show how our brains do all the incredible things that they do, providing insights into brain diseases and ideas for how to treat them. It should also accelerate progress toward a promising new way of computing.

Stacked Deck: SpiNNaker’s machine architecture is divided into three fundamental layers. Each chip contains 18 cores that act like neurons, sending and receiving signals. All information on the connections’ delays and strengths is stored in a layer of synchronous dynamic RAM (SDRAM) on each chip, and all signals pass through a separate router layer.
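
In code terms, the stack described above might be sketched as follows. The class layout is purely illustrative; the core count and memory size are taken from the design detailed later in this article:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SpiNNakerChip:
    """One layer stack: 18 processing cores, a router, and on-chip SDRAM."""
    cores: List[str] = field(
        default_factory=lambda: [f"arm9_core_{i}" for i in range(18)])
    sdram_megabytes: int = 128          # holds synaptic delays and strengths
    router: dict = field(default_factory=dict)  # passes spikes between cores
                                                # and neighboring chips

chip = SpiNNakerChip()
print(len(chip.cores), chip.sdram_megabytes)    # 18 128
```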

Traditional CMOS chips were not invented with parallelism in mind, so it shouldn’t come as a big surprise that they have trouble mimicking mammalian brains, the best parallel machines on Earth. A few comparisons show why brain modeling is such a thorny problem. The logic gate in an integrated circuit is typically connected to just a few neighboring devices, but the neurons in the brain receive signals from thousands—sometimes even hundreds of thousands—of other neurons, some clear on the other side of the brain. Also, neurons are always at the ready, responding as soon as they receive a signal. Silicon chips, by contrast, rely on global clocks to advance computation in discrete time steps, an approach that consumes a lot of power. To top it all off, while the connections between CMOS-based processors are fixed, the synapses that link neurons are always in flux. Connections are constantly being forged or reinforced or phased out.
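
The event-driven style the brain uses can be captured in a toy sketch: work happens only when a spike arrives, and no global clock advances every unit in lockstep. This is an illustration of the principle, not SpiNNaker's actual firmware:

```python
import heapq

def run_event_driven(spikes, weight=0.6, threshold=1.0):
    """Toy event-driven model: nothing happens between spikes, so no
    global clock ticks every unit forward in discrete steps."""
    queue = list(spikes)          # (arrival_time, target_neuron) events
    heapq.heapify(queue)
    potential = {}                # membrane potential per neuron
    while queue:
        t, neuron = heapq.heappop(queue)          # process in time order
        potential[neuron] = potential.get(neuron, 0.0) + weight
        if potential[neuron] >= threshold:        # neuron fires and resets
            potential[neuron] = 0.0
            print(f"t={t}: neuron {neuron!r} fired")

run_event_driven([(0.1, "a"), (0.2, "b"), (0.3, "a")])   # "a" fires at t=0.3
```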

Given all these differences, it’s a wonder we can even begin to tackle the problem of simulating brain activity. But there have actually been some pretty impressive supercomputer models that have managed to reproduce neuron operation with great fidelity. The ongoing Blue Brain Project, led by Henry Markram at the École Polytechnique Fédérale de Lausanne, in Switzerland, is a prime example. The simulation, which began in 2005, now uses a 16 384-processor IBM BlueGene/P supercomputer and data collected from very detailed studies of brain tissue to simulate 10 000-neuron sections of the rat brain, each section no larger than the head of a pin.

Another team, led by Dharmendra Modha at IBM Almaden Research Center, in San Jose, Calif., works on supercomputer models of the cortex, the outer, information-processing layer of the brain, using simpler descriptions of individual neurons. In 2009, team members at IBM and Lawrence Livermore National Laboratory showed they could simulate the activity of 900 million neurons connected by 9 trillion synapses, more than are in a cat’s cortex. But as has been the case for all such models, the simulation was quite slow: the computer needed many minutes to model a second’s worth of brain activity.

One way to speed things up is by using custom-made analog circuits that directly mimic the operation of the brain. Such circuits—like the chips being developed by the BrainScaleS project at the Kirchhoff Institute for Physics, in Heidelberg, Germany—can run 10 000 times as fast as the corresponding parts of the brain. They’re also fabulously energy efficient. A digital logic circuit may need thousands of transistors to perform a multiplication, but analog circuits need only a few. When you break it down to the level of modeling the transmission of a single neural signal, these circuits consume about 0.001 percent as much energy as a supercomputer would need to perform the same task. Considering you’d need to perform that operation 10 quadrillion times a second, that translates into some significant energy savings. While a whole-brain model built using today’s digital technology could easily consume more than US $10 billion a year in electricity, the power bill for a similar-scale analog system would likely come to less than $1 million.
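
Those dollar figures can be roughly reproduced from the rates above. In the sketch below, the energy per modeled signal on a supercomputer and the electricity price are assumed values, chosen only to show how the estimates scale:

```python
EVENTS_PER_SECOND = 1e16   # "10 quadrillion times a second," whole brain
ANALOG_FRACTION = 1e-5     # analog: ~0.001 percent of the digital energy

# Assume roughly 1e-6 J per modeled signal on a supercomputer (an
# illustrative figure chosen to land near the article's dollar estimates).
DIGITAL_J_PER_EVENT = 1e-6
digital_watts = EVENTS_PER_SECOND * DIGITAL_J_PER_EVENT      # 1e10 W
analog_watts = digital_watts * ANALOG_FRACTION               # 1e5 W

PRICE_PER_KWH = 0.10       # assumed electricity price, in dollars
HOURS_PER_YEAR = 24 * 365
for name, watts in (("digital", digital_watts), ("analog", analog_watts)):
    cost = watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH
    print(f"{name}: {watts:.0e} W, about ${cost:,.0f} per year")
# digital: ~$8.8 billion per year; analog: ~$88,000 per year
```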

SpiNNaker

The basic idea behind SpiNNaker is pretty simple. The machine will consist of 57 600 custom-designed chips, each of which contains 18 low-power ARM9 processor cores. Such chips are, of course, eminently programmable. At the center of each chip, we place a specially designed router that receives and directs all the packets coming from the cores and forms links with neighboring chips. We stack 128 megabytes of synchronous dynamic RAM, or SDRAM, on top of each chip to hold the connectivity information for up to 16 million synaptic connections.
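
Those headline numbers multiply out neatly. Here is a quick check; the bytes-per-synapse figure is derived, not something stated above:

```python
CHIPS = 57_600
CORES_PER_CHIP = 18
SDRAM_BYTES = 128 * 2**20        # 128 MB of SDRAM per chip
SYNAPSES_PER_CHIP = 16_000_000   # up to 16 million connections per chip

print(f"Total cores: {CHIPS * CORES_PER_CHIP:,}")                   # 1,036,800
print(f"Bytes per synapse: {SDRAM_BYTES / SYNAPSES_PER_CHIP:.1f}")  # ~8.4
print(f"Total synapses: {CHIPS * SYNAPSES_PER_CHIP:.2e}")           # ~9.22e+11
```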

As with most other brain models, SpiNNaker’s operation is centered on the “spike”—an idealization of the electrical impulse sent out by firing neurons. The information needed to model a spike is tiny: You can condense it down to a single packet containing just 40 bits. But things get complicated when you set out to pass around as many of those packets as the brain does. To model even 1 percent of the human brain could involve wrangling 10 billion packets a second, each of which might need to be sent along to dozens of other chips containing hundreds of processors.
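
One plausible way to pack such an event into 40 bits is an 8-bit header plus a 32-bit key naming the neuron that fired. The split shown below is our illustrative assumption, since the text above fixes only the 40-bit total:

```python
def pack_spike(header: int, neuron_key: int) -> bytes:
    """Pack one spike event into 5 bytes (40 bits): an assumed 8-bit
    header for packet type/flags plus a 32-bit key naming the neuron."""
    assert 0 <= header < 2**8 and 0 <= neuron_key < 2**32
    return bytes([header]) + neuron_key.to_bytes(4, "big")

def unpack_spike(packet: bytes):
    return packet[0], int.from_bytes(packet[1:5], "big")

pkt = pack_spike(header=0x01, neuron_key=0x0001F4A3)
assert len(pkt) == 5                       # exactly 40 bits on the wire
assert unpack_spike(pkt) == (0x01, 0x0001F4A3)
```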

The basic operation of SpiNNaker involves mapping a problem onto the machine—setting up the connectivity graphs in the machine’s routing hardware—and then letting the model run with the spikes flying where and when they may.
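
Concretely, "setting up the connectivity graphs" amounts to filling router tables. Here is a minimal sketch of source-keyed multicast routing with key/mask matching; the table format is assumed for illustration, as the article says only that connectivity is configured in the routing hardware:

```python
# Each router entry: if (key & mask) == match, copy the packet to these
# destinations (on-chip cores or links toward neighboring chips).
ROUTING_TABLE = [
    # (match, mask, destinations)
    (0x00010000, 0xFFFF0000, ["link_north", "core_3"]),
    (0x00020000, 0xFFFF0000, ["link_east"]),
]

def route(neuron_key: int):
    """Return every destination whose masked key matches; a multicast
    spike packet is duplicated toward all of them."""
    dests = []
    for match, mask, targets in ROUTING_TABLE:
        if neuron_key & mask == match:
            dests.extend(targets)
    return dests

print(route(0x00010042))   # -> ['link_north', 'core_3']
```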

Building a digital computer in this way brings major advantages in flexibility. With SpiNNaker, there is effectively no difference between communicating with a nearby processor and one that’s many chips away. We can upload any neural network we’d like, and the exact way that processors are connected should have no bearing on how fast that neural network can be modeled. In a sense, the SpiNNaker machine could be considered a rewirable computer—an enormous version of the field-programmable gate array chip, or FPGA, specialized for neurons. With appropriate tweaking, it should be able to model any part of the brain we choose.

Our full 57 600-chip machine won’t be finished until the end of 2013, but we’ve already made some progress. Since we accepted delivery of the first SpiNNaker test chip in May 2011, we’ve built circuit boards containing four such chips, for a total of 72 processor cores. We’ve mounted this prototype system onto a simple wheeled robot and shown that the robot can perform real-time processing of basic visual information, like following the path of a white line of tape. It’s certainly not a difficult task for a modern computer, but it shows that SpiNNaker chips can be connected to form a real-time neural network and can interact with the world through sensors and actuators. We recently received the first 48-node boards, which will serve as the building blocks of the full-scale machine.

When complete, the full million-processor SpiNNaker machine will occupy 10 or so standard 19-inch racks and consume 50 to 100 kilowatts of power. That’s still about a hundred times as much as a comparable analog model would need, but then again it’s only about a hundredth the power you’d need for an equivalent supercomputer. We also have room to improve. To save money, our processors were built using a decade-old, 130-nanometer chip manufacturing process. If the project produces good results, we could move to a much smaller feature size for our integrated circuits and potentially drop power consumption by a factor of 10.
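
For a sense of scale, the comparisons in this paragraph work out roughly as follows; 75 kilowatts is simply the midpoint of our 50-to-100-kilowatt estimate:

```python
SPINNAKER_KW = 75                  # midpoint of the 50-100 kW estimate
analog_kw = SPINNAKER_KW / 100     # analog: ~1/100th of SpiNNaker's draw
super_kw = SPINNAKER_KW * 100      # supercomputer: ~100x SpiNNaker's draw
shrunk_kw = SPINNAKER_KW / 10      # possible 10x saving from a process shrink

print(f"analog ~{analog_kw:.2f} kW | SpiNNaker ~{SPINNAKER_KW} kW "
      f"(~{shrunk_kw:.1f} kW after a shrink) | supercomputer ~{super_kw:,} kW")
```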
