Imitation of Life
Can a computer program reproduce everything that happens inside a living cell?
Modes of Modeling
Building a computer model calls for many choices and compromises, not least in finding the appropriate level of detail. Take the case of carbohydrate metabolism, in which sugars such as glucose are broken down to yield water and carbon dioxide. At the most abstract level, this process becomes a single chemical equation:
C₆H₁₂O₆ + 6O₂ → 6CO₂ + 6H₂O,
which doesn’t reveal much about what’s actually happening inside the cell. A closer look would add dozens of intermediate steps. For example, the six-carbon glucose molecule is first split into two three-carbon pyruvate molecules, liberating energy that can be captured in the phosphate bonds of adenosine triphosphate (ATP).
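Even the one-line equation encodes some real bookkeeping: every carbon, hydrogen and oxygen atom on the left must reappear on the right. The short Python sketch below checks that balance. It assumes simple formulas without parentheses or charges, and the function names are hypothetical, chosen only for this illustration.

```python
import re
from collections import Counter

def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_counts(side):
    """Total atoms on one side of an equation, given (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total

# Overall equation for the complete breakdown of glucose.
reactants = [(1, "C6H12O6"), (6, "O2")]
products = [(6, "CO2"), (6, "H2O")]

left = side_counts(reactants)
right = side_counts(products)
print(dict(left))      # {'C': 6, 'H': 12, 'O': 18}
print(left == right)   # True: the equation is balanced
```

A check like this says nothing about how the cell gets from glucose to carbon dioxide, which is exactly the limitation of the most abstract level of description.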
Adding still more detail leads to a vast web of chemical reactions, as in the famous Metabolic Pathways poster devised by the late Donald E. Nicholson. And one needn’t stop there. In principle a simulation could follow every individual molecule—or every atom, for that matter—as it passes through the cellular machinery. The Goldilocks strategy seeks a middle path between bland abstraction and pointless verisimilitude.
The authors of the WholeCell project chose to implement different parts of their model with different levels of detail. Certain key macromolecules are represented as distinct and identifiable entities. Smaller molecules are treated as aggregated quantities; the program keeps track of their numbers but not of their identity as individuals.
The distinction between these two modes of representation can be seen clearly in the sector of the model dealing with protein synthesis. Ribosomes, the large molecular complexes where proteins are assembled, are represented as individuals; each ribosome has its own identity and history. Within the computer program, a separate block of memory is allocated to each ribosome. But the program has no representation for individual molecules of amino acids, the subunits that are linked together to form a protein. Instead the model merely keeps track of the quantity of each type of amino acid. There’s a variable for counting all the alanine molecules, another variable for the lysines, and so on.
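The two modes of representation can be illustrated with a toy Python sketch. It is not drawn from the WholeCell MATLAB code: the class, the function, the one-letter sequences and the pool sizes are all hypothetical, and elongation is drastically simplified to "take one amino acid from the pool and attach it." Each ribosome is a separate object with its own state and history, while the amino acids exist only as counts.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Ribosome:
    """An individually tracked ribosome with its own identity and history."""
    ribosome_id: int
    target_sequence: str          # one-letter amino-acid sequence of the protein being built
    position: int = 0             # number of residues added so far
    history: list = field(default_factory=list)  # residues this ribosome has incorporated

def elongate(ribosome: Ribosome, pool: Counter) -> bool:
    """Try to add the next residue to the growing chain; return True on success."""
    if ribosome.position >= len(ribosome.target_sequence):
        return False                        # protein already finished
    residue = ribosome.target_sequence[ribosome.position]
    if pool[residue] == 0:
        return False                        # stalled: no free molecule of this type
    pool[residue] -= 1                      # draw one anonymous molecule from the pooled count
    ribosome.position += 1
    ribosome.history.append(residue)
    return True

# Amino acids are pooled: one count per type, no individual identities.
# "A" = alanine, "K" = lysine, "G" = glycine; the numbers are made up.
amino_acid_pool = Counter({"A": 5, "K": 3, "G": 4})

# Ribosomes are individuals: each object occupies its own block of memory.
ribosomes = [Ribosome(1, "AKG"), Ribosome(2, "AAK")]

for _ in range(3):                          # three rounds of elongation
    for ribosome in ribosomes:
        elongate(ribosome, amino_acid_pool)

print(dict(amino_acid_pool))                # {'A': 2, 'K': 1, 'G': 3}
print([r.history for r in ribosomes])       # [['A', 'K', 'G'], ['A', 'A', 'K']]
```

After the loop you can ask which residues ribosome 2 added, but there is no way to ask which particular alanine molecule ended up where. That information was never stored, which is the trade-off the aggregated representation makes in exchange for economy.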
The WholeCell model is divided into 28 process modules, which correspond to major cellular activities such as replication of the genome, synthesis of protein and repair of damaged DNA. In addition, 16 data structures called state variables record the current status of various subsystems at every instant. The program begins by initializing the state variables to values appropriate to a “newborn” cell, just after cell division. Next, all 28 of the process modules are run for one second of simulated time. At the end of this interval the state variables are updated with the results of the calculations, and then the cycle repeats. The simulation continues until the cell completes its growth and divides. For M. genitalium this generation time is typically nine hours, or 32,400 seconds, which means roughly 32,000 repetitions of the simulation loop. The program’s running time is about the same as the generation time, so the simulation proceeds at roughly the pace of the real cell.
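The one-second cycle can be sketched as a simple loop. This is a minimal Python sketch under stated assumptions, not a translation of the WholeCell code: the function names, the two toy modules standing in for the 28 real ones, the state variables and the division rule are all hypothetical. The snapshot-then-apply update is one reading of the description that every module runs for the interval and the state variables are updated afterward. The toy rates are chosen so that division comes after nine hours, or 9 × 3,600 = 32,400 passes through the loop.

```python
import copy

DT = 1                                # seconds of simulated time per cycle, as in the text
GENERATION_TIME_S = 9 * 60 * 60       # nine hours = 32,400 one-second cycles

def run_cell_cycle(initial_state, process_modules, has_divided, max_steps=100_000):
    """Run all modules once per simulated second until the cell divides.

    Every module reads the same snapshot of the state variables taken at the
    start of the step; their changes are applied together at the end of it.
    """
    state = copy.deepcopy(initial_state)
    steps = 0
    while not has_divided(state) and steps < max_steps:
        snapshot = copy.deepcopy(state)
        changes = [module(snapshot, DT) for module in process_modules]
        for change in changes:                 # update the state variables
            for variable, delta in change.items():
                state[variable] += delta
        steps += 1
    return state, steps

# Two toy stand-ins for the real process modules (hypothetical rates).
def grow(state, dt):
    return {"mass": 1 * dt}                 # constant mass gain per second

def replicate_dna(state, dt):
    return {"dna_progress": 1 * dt}         # one unit of replication per second

# Hypothetical "newborn" state, in arbitrary units.
newborn = {"mass": GENERATION_TIME_S, "dna_progress": 0}

def divided(state):
    # Toy division rule: mass has doubled and the genome is fully copied.
    return (state["mass"] >= 2 * newborn["mass"]
            and state["dna_progress"] >= GENERATION_TIME_S)

final_state, cycles = run_cell_cycle(newborn, [grow, replicate_dna], divided)
print(cycles)                                # 32400
```

Taking a snapshot before each step keeps the result independent of the order in which the modules are listed, since no module sees another module's changes until the next second begins. A real model must also decide how to share scarce resources among modules that draw on the same pool, a problem this sketch ignores.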
The program is written in MATLAB. Source code is available on the project web page at http://wholecell.stanford.edu, along with a knowledge base of quantitative information that went into building the model.
Together with Covert, the principal authors of the software are Jonathan R. Karr and Jayodita C. Sanghvi, who are both graduate students in Covert’s group. The model is described in a report published last July in Cell; the authors, in addition to Karr, Sanghvi and Covert, are Derek N. Macklin, Miriam V. Gutschow, Jared M. Jacobs and Benjamin Bolival Jr. of Stanford and Nacyra Assad-Garcia and John I. Glass of the Venter Institute.