Imitation of Life
Can a computer program reproduce everything that happens inside a living cell?
Fitting an Elephant
The WholeCell model is based on data collected from 900 publications. Some 1,900 numerical values were extracted from these sources to become parameters of the model. This is an impressive compendium, which anchors the simulation in real data.
However, a slate of 1,900 parameters also raises a red flag. If each parameter represents a control knob that can be turned to adjust the model’s behavior, then by twiddling enough of the knobs, the output could be “fitted” to just about any desired result. When I asked Covert about this, he immediately cited John von Neumann’s quip, “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” But Covert went on to say that the 1,900 WholeCell parameters have not been used for knob-twiddling or trunk-wiggling. Almost all of the values were taken directly from experimental measurements. They constrain the model rather than adapt it to a preconceived outcome.
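The worry about knob-twiddling can be made concrete with a toy example that has nothing to do with the WholeCell code itself. The sketch below uses Lagrange interpolation, a standard technique chosen here only for illustration: a curve with as many free parameters as there are data points can be made to pass through any targets whatsoever. The function name and the sample numbers are invented for the demonstration.

```python
def lagrange_fit(xs, ys):
    """Return a polynomial model that passes exactly through every (x, y) pair.

    The model has one adjustable parameter per data point, so it can
    reproduce any set of target values. That is the "fit an elephant" worry.
    """
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model


# Arbitrary, made-up targets: the model matches them no matter what they are.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [5.0, -2.0, 7.0, 0.0]
model = lagrange_fit(xs, ys)
worst_error = max(abs(model(x) - y) for x, y in zip(xs, ys))
```

A perfect fit of this kind says nothing about whether the model is right, which is why it matters that the WholeCell parameters were measured rather than tuned.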
Yet that’s not quite the end of the story. The data come from many different experiments conducted by different workers over a period of decades. Quite a few parameters come from organisms other than M. genitalium, simply because not enough is known about mycoplasma physiology. Given these disparate sources, it’s not surprising that the measured parameters are not always consistent. For example, an inventory of cell contents (published by Morowitz 50 years ago) suggested that mycoplasmas have only trace amounts of the amino acid cysteine, whereas analysis of the genome showed a notably greater need for cysteine in mycoplasma proteins. Such inconsistencies must be reconciled if the simulation is to succeed.
Covert and his colleagues tackled this problem by formulating a system of constraints, then searching for parameter values that satisfy the constraints while deviating as little as possible from the measured values. Initially they tried formal optimization algorithms, but these methods failed to converge on a feasible solution. They therefore adopted a heuristic approach, starting from the parameters that are deemed most reliable. Some such reconciliation procedure will remain necessary until more complete and accurate biochemical data become available.
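To give a flavor of what "deviating as little as possible from the measured values" can mean, here is a minimal sketch for the simplest imaginable case: a single linear constraint and a reliability weight attached to each measurement. It is not the heuristic the WholeCell team adopted, which the article does not spell out; the function name, the weights and the numbers are all assumptions made for illustration. The adjustment is the closed-form weighted least-squares solution, so the least reliable values absorb most of the correction.

```python
def reconcile(measured, weights, coeffs, target):
    """Adjust measured values so that sum(coeffs[i] * x[i]) == target.

    Minimizes sum(weights[i] * (x[i] - measured[i])**2) subject to the
    single linear constraint. A larger weight marks a more trusted
    measurement, which therefore moves less.
    """
    residual = sum(a * m for a, m in zip(coeffs, measured)) - target
    denom = sum(a * a / w for a, w in zip(coeffs, weights))
    return [m - (a / w) * residual / denom
            for m, a, w in zip(measured, coeffs, weights)]


# Hypothetical example: three quantities that must add up to 66,
# although the measurements add up to only 60.
measured = [10.0, 20.0, 30.0]
weights = [100.0, 1.0, 1.0]   # the first measurement is trusted most
adjusted = reconcile(measured, weights, [1.0, 1.0, 1.0], 66.0)
```

In this made-up case the trusted first value barely moves, while the other two share almost all of the discrepancy. A real reconciliation would involve many coupled constraints, which is presumably where the formal methods ran into trouble.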
In the meantime, the simulations reported in the Cell paper do give physiologically plausible results. The duration of the cell cycle, the rate of growth in biomass and the concentrations of various metabolites are all reasonably close to values measured in real cells. Further support for the model’s robustness comes from a series of “knockout” experiments, in which single genes are deleted from the chromosome. After multiple model runs, a gene is classified as essential if losing it compromises viability. The simulation results agree with in vivo experiments on 79 percent of the genes.
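The bookkeeping behind a figure like "79 percent agreement" is simple enough to sketch. In the fragment below the gene labels, the run outcomes and the rule for calling a gene essential (fewer than half of the knockout runs yield a viable cell) are all hypothetical; the article says only that a gene is essential if losing it compromises viability.

```python
def is_essential(viable_runs, min_viable_fraction=0.5):
    """Classify a gene as essential if too few knockout runs stay viable.

    viable_runs is a list of booleans, one per simulated knockout run.
    The 0.5 threshold is an assumption made for this illustration.
    """
    viable_fraction = sum(viable_runs) / len(viable_runs)
    return viable_fraction < min_viable_fraction


def agreement(predicted, observed):
    """Fraction of genes on which simulation and experiment agree."""
    shared = predicted.keys() & observed.keys()
    matches = sum(predicted[g] == observed[g] for g in shared)
    return matches / len(shared)


# Made-up outcomes of simulated knockouts (True means the cell stayed viable).
runs = {
    "geneA": [False, False, False],
    "geneB": [True, True, False],
    "geneC": [True, True, True],
    "geneD": [False, True, False],
}
predicted = {gene: is_essential(outcomes) for gene, outcomes in runs.items()}

# Made-up experimental calls (True means essential in vivo).
observed = {"geneA": True, "geneB": False, "geneC": True, "geneD": True}
score = agreement(predicted, observed)
```

Here the simulation and the invented experiments disagree only on geneC, so the score is three out of four.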
Still another finding extends and explains known results. The mycoplasma cell cycle has an early phase of genome replication, in which the binding of enzymes initiates the process, and a later phase, in which the replication itself proceeds. Each of these phases varies in length, and yet their sum—the length of the overall cycle—shows comparatively little variation. Examination of the internal details of the model revealed the cause of this odd behavior. The nucleotides needed to synthesize the new chromosome are manufactured throughout the cell lifetime. If the early stage of replication is brief, the later stage is slowed by a shortage of nucleotides. If the early stage is prolonged, the stockpile of nucleotides is sufficient to support full-speed replication.
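The compensation between the two phases can be reproduced in a few lines with a deliberately crude model, which is my own toy and not the WholeCell simulation. Assume nucleotides are made at a constant rate throughout the cell's life, that replication needs a fixed total number of them, and that replication runs at a maximum speed while the stockpile lasts and at the synthesis rate once the stockpile is exhausted. All names and numbers are hypothetical.

```python
def replication_time(t_init, needed=1000.0, synth_rate=1.0, max_rate=5.0):
    """Duration of the replication phase in a toy nucleotide-pool model.

    t_init     : length of the initiation phase (the stockpile builds up)
    needed     : nucleotides required to copy the chromosome
    synth_rate : nucleotides produced per unit time, throughout the cycle
    max_rate   : fastest possible consumption during replication
    """
    pool = synth_rate * t_init
    # Time until the stockpile runs dry if replication runs at full speed.
    t_full = pool / (max_rate - synth_rate)
    used_at_full = max_rate * t_full
    if used_at_full >= needed:
        return needed / max_rate           # stockpile suffices: full speed
    # Otherwise the remainder is copied only as fast as nucleotides arrive.
    return t_full + (needed - used_at_full) / synth_rate


def cycle_time(t_init, **params):
    """Total cycle length: initiation plus replication."""
    return t_init + replication_time(t_init, **params)


short_start = cycle_time(100.0)   # brief initiation, starved replication
medium_start = cycle_time(400.0)  # longer initiation, less starvation
long_start = cycle_time(900.0)    # stockpile large enough for full speed
```

In the starved regime the total works out to needed / synth_rate regardless of how long initiation took, because the cell can finish only when enough nucleotides have been made. With the default numbers, initiation phases of 100 and 400 time units both give a cycle of 1,000, and only when initiation is so long that the stockpile covers the whole job (900 units here) does the cycle stretch, to 1,100.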