COMPUTING SCIENCE

# The 100-Billion-Body Problem

A full-scale computer simulation of the galaxy we call home must trace the motions of at least 10^11 stars and other objects over several billion years.

Isaac Newton earned his fame by solving a two-body problem: He showed how gravitational attraction binds the Earth and the Sun into elliptical orbits. His formulas worked equally well for the Earth and the Moon or the Earth and an apple, but Newton was stumped when he tried to apply his law of universal gravitation to a *three*-body problem, such as the combined Sun-Earth-Moon system.

By the middle of the 19th century, astronomers and mathematicians had devised tools capable of tackling an eight-body problem—calculating the orbital motions of the Sun and the seven known planets. Unlike Newton’s solution of the two-body problem, these methods gave only approximate results. Nevertheless, they were accurate enough to reveal a tiny discrepancy between theory and observation, which led to the discovery of an additional planet, Neptune.

The challenge now is to solve a 100-billion-body problem. Computational astronomers are preparing to simulate the motions of all the stars in a galaxy the size and shape of the Milky Way, tracing the stars’ trajectories over a period of several billion years. The team working on this computational extravaganza—a Dutch-Japanese collaboration led by Simon Portegies Zwart of the Leiden Observatory—has already completed a pilot study of 51 billion stars.

In all of these “*N*-body” models, the individual particles obey very simple laws of motion, yet a large ensemble of particles can dance a surprisingly intricate and lively ballet. Gravitation organizes astronomical bodies into a hierarchy of structures: planetary systems, clusters of stars, galaxies, lacy webs of galaxy clusters that stretch across the entire visible universe. *N*-body studies may help illuminate the mechanisms that create these forms. In the case of the Milky Way, the 51-billion-star simulation shows a featureless disk evolving into a majestic pinwheel with a central bar and several bright spiral arms. Still larger simulations should reveal finer details, just in time for comparison with new satellite observations expected in the next few years.

# A Few Bodies

Imagine a clump of stars in an empty corner of the universe, isolated from everything else. You know the mass of each star, along with its position and velocity at some initial time *t*=0. Can you predict the future positions and velocities of all the stars? This is the essence of an *N*-body problem.

For simplicity, each star is assumed to have all of its mass concentrated at a single point. Of course real stars are not dimensionless points, but they are tiny compared to the typical distances between them, so the approximation is reasonable. The stars interact through mutual gravitational attraction (and no other forces). According to Newton’s law, stars *i* and *j* attract each other with a force equal to *Gm*_{i}*m*_{j}/*r*^{2}, where *m*_{i} and *m*_{j} are the stellar masses, *r* is the distance separating them, and *G* is the gravitational constant.

If there are just two stars, this problem has a “closed-form” solution: a set of equations that define the bodies’ motion for all time. Plug in any value of *t* and the equations tell you where the stars are and where they’re going.

Unfortunately, this elegant analysis works only for *N*=2. Adding a third body changes the character of the problem entirely; there are no exact closed-form solutions except in a few special configurations. Newton remarked that struggling to understand the three-body dynamics of the Sun, the Earth, and the Moon was the only problem that ever gave him a headache.

Newton’s successors in the 18th and 19th centuries found ways to cope with few-body problems, successfully predicting the motions of planets, moons, comets, and other solar system objects. The basic idea was to start with a two-body solution, then add in the effects of other bodies as small perturbations that slightly alter the orbit. The result of this process is again a set of equations that give positions and velocities as functions of *t*, but the equations are more complicated, and they provide only approximate answers.

# Many Bodies

The perturbation methods developed for solar-system astronomy become unwieldy when the number of objects grows beyond a dozen or so. Clusters with millions of stars, and galaxies with billions, require a different strategy. The preferred approach today is step-by-step numerical simulation of the particle trajectories. Mathematically, this scheme is appealingly simple, replacing much algebra and calculus with mere arithmetic. On the other hand, the amount of arithmetic needed is prodigious—enough to strain the capacity of the largest supercomputers.

A step-by-step algorithm moves the particles in discrete jumps from one configuration to the next, as if in a sequence of movie frames. Each frame depends on the previous one, so they have to be computed in order. If you want to know what the system looks like after a million time steps, you need to work your way through all the intermediate states.

Here is a three-part recipe for computing a single step in the motion of a star inside a cluster. First measure the force *F* attracting the star to each of the other stars, using the gravitational equation *F*=*Gmm*/*r*^{2}. Summing up all these attractions—taking account of both their magnitudes and directions—yields the net force acting on the star.

Once you know the net force, you can calculate the star’s acceleration, or change in velocity, using another Newtonian equation, *F=ma*. Then, knowing the acceleration, you can update the star’s velocity.

Finally, the updated velocity allows you to calculate the star’s trajectory over some interval Δ*t*, and thereby establish its new position at time *t*+Δ*t*. The easiest algorithm for this “particle pushing” process assumes the velocity calculated at time *t* remains constant throughout the interval Δ*t*, so that the star moves in a straight line at constant speed. This crude extrapolation gives acceptable results only if Δ*t* is very brief. More sophisticated integration methods allow longer time steps at the cost of greater complexity.

At every time step, a new position and velocity must be calculated not just for one star but for all *N* stars in the system. This is the major computational bottleneck in *N*-body studies. Because each star responds to *N*–1 other stars, there are *N*(*N*–1)/2 pairs of stars for which the mutual attraction must be worked out. In the notation of computer science, the algorithm is said to have *O*(*N*^{2}) complexity; as *N* increases, the amount of work rises approximately as *N*^{2}. For a million stars, every time step requires roughly 10^{12} force computations.

The number of time steps can also present a computational challenge. Each step should be brief enough that the distance between pairs of stars does not change by more than a few percent. For stars that are many light-years apart, time steps of several thousand years give acceptable results, but close encounters and tightly bound binary stars can require steps of days or weeks—too frequent for a simulation that extends over billions of years.

# The Center Cannot Hold

It’s not entirely obvious how gravitation gives rise to elaborate, long-lived structures such as planetary systems and galaxies. Other complex forms in nature, such as molecules and crystals, depend on a balance of attractive and repulsive forces, but gravitation is purely attractive. The explanation is that galaxies and other gravitating systems are dynamic, made up of objects in motion. Instead of an equilibrium between attraction and repulsion, there is a continual interchange of kinetic and potential energy.

Suppose several thousand stars are scattered at random within some finite spherical volume, with all of the stars initially at rest. What happens next? A century ago the physicist Arthur Eddington wrote that stars in such clusters probably cannot “escape the fate of ultimately condensing into one confused mass.” *N*-body simulations show otherwise. There *is* an initial infall, but the stars do not all converge on their common center of mass. Instead, their paths are deflected slightly by interactions with near neighbors, so that collisions are avoided. Most of the stars enter elongated elliptical orbits, shuttling back and forth between the core and the periphery of the cluster. Furthermore, some stars *do* escape the grip of mutual attraction: Close encounters can whip a star out of the cluster at high velocity, never to return. The stars left behind give up some of their kinetic energy in this transaction, but the cluster as a whole also loses mass, so that the stars are less tightly bound together. The ultimate fate of the cluster may be the very opposite of condensation: evaporation, as all of the stars disperse.

The Milky Way has at least 150 globular clusters. The prevalence of close encounters and binary stars in these regions makes *N*-body simulation particularly difficult. Early attempts were limited to fewer than 100 stars. By 2010, a star-by-star simulation followed the evolution of a specific globular cluster over a lifetime of 11 billion years. That cluster, Palomar 14, is one of the smaller ones orbiting the Milky Way, with fewer than 5,000 stars. The bigger clusters have a million stars or more.

Whole galaxies, of course, are far larger still; the Milky Way has roughly 100 billion stars. Furthermore, stars are not the only components of a galaxy. Some 90 percent of the mass takes the form of “dark matter,” whose exact composition remains unknown. Thus a fully realistic galaxy simulation, including both the visible and the dark matter, should have substantially more than 100 billion particles. Tracing the paths of such an immense swarm of moving and interacting objects is a daunting prospect.

# From GRAPE to GPU

One way to speed up *N*-body simulations is simply to build faster computers. Early experiments relied on mainframes, workstations, and monolithic supercomputers, but the field was transformed in the 1990s by the arrival of a relatively small and inexpensive device known as GRAPE, specially designed for *N*-body simulations. Developed by Junichiro Makino and his colleagues at the University of Tokyo, GRAPE operates as a coprocessor attached to a conventional computer. The coprocessor handles all the pairwise force calculations—the part of the *N*-body algorithm with *O*(*N*^{2}) complexity—while the host computer completes the star-pushing part of the simulation.

“GRAPE” stands for Gravity Pipe. Circuits performing various aspects of the force calculation are arranged in sequence, with data flowing through the pipeline from one unit to the next. The final version of the GRAPE system, introduced in 2002, is built around a semiconductor chip with six such pipelines, each of which can compute 90 million particle-particle forces per second. As many as 2,048 chips can be combined in a single system. By some measures, GRAPE devices were the world’s fastest computers for a few years (though only for this one problem).

Today GRAPE has been surpassed by another kind of coprocessor, the graphics processing unit, or GPU. These devices evolved from hardware designed for video games, where the main requirement is to project a three-dimensional scene onto a display composed of several million pixels, refreshing the image quickly enough to create the illusion of continuous motion. The task can be decomposed into multiple “threads” that run almost independently. A GPU performs the computation with an array of many small processors, or “cores,” tuned for high-speed arithmetic. The individual cores are simpler than the central processing unit (CPU) of a typical computer, but with thousands of cores in a single GPU chip, their aggregate power is greater.

GPUs have spread far beyond video games. They are well adapted to many kinds of scientific computing, and they provide most of the computational horsepower in some of the newest supercomputers. Although GPUs are not explicitly designed for *N*-body problems, as the GRAPE is, they have economic and technological advantages: A market for millions of GPU chips provides funds for development and economies of scale in manufacturing.

The first experiments with *N*-body simulations on GPUs began more than 10 years ago. By 2009, Evghenii Gaburov, Stefan Harfst, and Portegies Zwart had created emulation software that allows programs written for the GRAPE to run on a GPU. The more recent Milky Way simulations by Portegies Zwart and his colleagues were designed from the outset to make the best possible use of supercomputers with thousands of GPUs.

# Stars in a Box

Solving a 100-billion-body problem calls for algorithmic ingenuity as well as computational muscle. With 10^{11} stars, an *O*(*N*^{2}) algorithm would make 10^{22} pairwise force calculations on each time step—a task that would keep a billion GRAPE machines busy. Somehow, the number of operations must be drastically reduced. There are several ways to approach this task, but they all involve assembling stars into clusters, then clusters of clusters, and so on.

The Sun’s immediate neighborhood in the Milky Way has a dozen stars within a radius of 10 light-years. An accurate description of their dynamics needs to consider all (12×11)/2=66 pairwise interactions. For more distant stars, however, the all-pairs requirement can be relaxed. Imagine another group of a dozen stars, 10,000 light-years away. The Sun feels the gravitational influence of those far-off stars, but they all pull in very nearly the same direction. Treating them as a single body with the combined mass of all 12 stars introduces only a tiny imprecision.

One way to organize a hierarchical *N*-body calculation is to recursively divide the space into subvolumes. Begin with a cube of edge length *L*, large enough to encompass the entire galaxy. Partition this box into eight subcubes, each with edge length *L*/2. Now each of these subcubes becomes a candidate for further partitioning into eight boxes of side length *L*/4. Within each box the process continues until the number of stars in the box is no greater than some threshold. This partitioning scheme, introduced in the 1980s by Josh Barnes and Piet Hut, is called a tree code. The tree is the branching structure formed by the hierarchy of boxes, with the single largest box at the root of the tree and the smallest boxes at the leaves.

In the tree code calculation, a star has direct, one-on-one interactions with nearby stars, but some longer-range interactions are between the star and an entire box of stars. The technique reduces the complexity of the computation from *O*(*N*^{2}) to *O*(*N* log *N*), a huge improvement. For 10^{11} stars, the number of force evaluations per time step falls from 10^{22 }to as few as 10^{13}. The trade-off between speed and accuracy can be fine-tuned by adjusting the crossover point between individual and aggregated force computations.

To observe the long-term evolution of a simulated galaxy, time steps must be reasonably long. Galactic motions tend to be stately and slow, allowing time steps of thousands of years, but occasional close encounters cause rapid changes in a star’s speed and direction. One way to suppress such events is to “soften” the gravitational attraction. Whereas the real force is proportional to 1/*r*^{2} (and therefore tends to infinity as *r* goes to 0), the softened force is proportional to 1/(*r*+ε)^{2}, where ε is a constant with dimensions of distance. The softening of the force law eliminates the troublesome divergence at *r*=0 yet has almost no effect when the stars are far apart.

# The Milky Way on a GPU

The Milky Way simulation by Portegies Zwart and his colleagues is based on tree-code software they call Bonsai. In addition to Portegies Zwart, the primary authors are Jeroen Bédorf and Gaburov, who were Ph.D. candidates at Leiden when the project began. Bédorf is now at the Centrum Wiskunde and Informatica in Amsterdam; Gaburov is at an Amsterdam research institute called SURFsara. Also participating in the Milky Way simulation were Michiko S. Fujii of the National Astronomical Observatory of Japan, Keigo Nitadori of the Japanese institute RIKEN, and Tomoaki Ishiyama of the University of Tsukuba in Japan.

In creating the Bonsai software, the first major challenge was to perform the force calculation entirely on the GPU; if parts of the procedure are done on the host CPU, most of the GPU speed advantage is dissipated in moving data back and forth. The second challenge was getting the software to run efficiently when the computation is distributed across multiple GPUs; here the threat to performance is slow communication between separate GPU modules. Such communication cannot be eliminated entirely because all stars interact with all others, but delays were minimized by carefully distributing the stars among the modules and by having the CPU handle communication while the GPUs were kept busy with other tasks.

The galactic simulation was carried out at the Swiss National Supercomputing Centre in Lugano, using a computer called Piz Daint (named for an Alpine peak). The computation ran on 4,096 nodes of this machine. Each node has an Intel CPU and a GPU called the NVIDIA Tesla K20X, which has 2,688 cores; thus the total count of GPU cores was more than 11 million. With these facilities, the Bonsai program was able to simulate a galaxy with 51 billion particles—a thousand times more than earlier studies, though still short of the star-for-star goal. The galaxy’s evolution was followed over six billion years, with time steps of 75,000 years.

The Bonsai program also had a brief test run on a larger machine, Titan, at Oak Ridge National Laboratory in Tennessee. The aim of this exercise was to measure performance when the simulation was scaled up to 242 billion particles, running on 18,600 GPU modules and almost 50 million GPU cores. The measurements showed that a complete eight-billion-year simulation would take about a week of runtime on Titan. That allocation of computer time has not yet been granted. On the other hand, the Bonsai project was one of five finalists for the 2014 Gordon Bell Prize, which recognizes progress in high-performance parallel computing.

# A Galactic Selfie

Trying to get a clear view of our home galaxy is like trying to see one’s own face. We can’t reach out 100,000 light-years and take a selfie. Only in the past decade have astronomers been able to ascertain even the basic architecture of the Milky Way. It turns out we live in a barred spiral. A thick shaft of luminous matter protrudes from the galaxy’s central bulge, with loosely wrapped streamers of stars trailing from the ends of the bar.

In a year or two, a more precise portrait will begin to emerge from data gathered by the *Gaia* spacecraft, which was launched into solar orbit by the European Space Agency in 2013. *Gaia* is mapping a billion stars—a 1 percent sample of the Milky Way—measuring their positions in three-dimensional space and recording velocities for a subset of 150 million objects.

*N*-body simulations provide a theoretical complement to this observational program. The *Gaia* catalog will reveal the spatial structure of the galactic disk in unprecedented detail; computational models can suggest how the observed features arose and what physical mechanisms maintain them. The Bonsai simulation begins with particles pre-allocated to the central bulge, the disk, and the dark-matter halo, but the bar and the spiral arms are not part of the initial conditions. Those structures evolve spontaneously from the gravitational dynamics. Interestingly, it takes more than three billion years for them to appear.

Do we really need 100 billion stars to simulate a galaxy? Portegies Zwart points out that spatial resolution is a crucial issue in these models. A coarse-grained view of the universe, like a blurry photograph, obscures identifying features. Meaningful comparisons with the Gaia data will require models with resolution of a few light-years, distinguishing individual stars.

Forty years ago the physicist Philip Anderson remarked, “More is different.” He was talking about condensed-matter physics, where a small cluster of atoms exhibits properties quite unlike those of a macroscopic lump of matter. Perhaps the same principle applies in astrophysics as well. The two-body problem is qualitatively different from the few-body problem, and it seems the 100-billion-body problem falls in yet another category. For the first time, computer technology can take us across the more-is-different threshold.

© Brian Hayes

# Bibliography

- Aarseth, Sverre J. 2003.
*Gravitational*N*-Body Simulations: Tools and Algorithms*. Cambridge: Cambridge University Press. - Barnes, Josh, and Piet Hut. 1986. A hierarchical
*O(N*log*N)*force calculation algorithm.*Nature*324:446–449. - Bédorf, Jeroen, Evghenii Gaburov, and Simon Portegies Zwart. 2012. A sparse octree gravitational
*N*-body code that runs entirely on the GPU processor.*Journal of Computational Physics*231:2825–2839. - Bédorf, Jeroen, Evghenii Gaburov, Michiko S. Fujii, Keigo Nitadori, Tomoaki Ishiyama, and Simon Portegies Zwart. 2014. 24.77 pflops on a gravitational tree-code to simulate the Milky Way galaxy with 18600 GPUs. In
*Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis*, pp. 54–65. Piscataway, NJ: IEEE Press. - Heggie, Douglas, and Piet Hut. 2003.
*The Gravitational Million-Body Problem: A Multidisciplinary Approach to Star Cluster Dynamics*. New York: Cambridge University Press.

EMAIL TO A FRIEND :