Ode to the Code

Brian Hayes

Reshuffling the Deck of Codons

As early as 1969, Cynthia Alff-Steinberger of the University of Geneva began trying to quantify the code's resilience to error by means of computer simulation. The basic idea was to randomly generate a series of codes that reshuffle the codon table but retain certain statistical properties, such as the number of codons associated with each amino acid. Then the error-resistance of the codes was evaluated by generating point mutations that caused amino acid substitutions. A code scored well if the erroneous amino acids were similar to the original ones. With the computing facilities available in the 1960s, Alff-Steinberger was able to test only 200 variant codes. She concluded that the natural code tolerates substitutions better than a typical random code.

A decade later J. Tze-Fei Wong of the University of Toronto approached the same question from another angle and reached a different conclusion. Instead of generating many random codes, he tried a hand-crafted solution, identifying the best substitution for each amino acid. Wong found that the substitutions generated by the natural code are less than half as close, on average, as the best ones possible. This result was taken as evidence that the code has not evolved to maximize error tolerance. But Wong did not attempt to find a complete, self-consistent code that would generate all the optimal substitutions.

Returning to studies of random codes, David Haig and Laurence D. Hurst of the University of Oxford generated 10,000 of them in 1991, keeping the same blocks of synonymous codons found in the natural code but permuting the amino acids assigned to them. The result depended strongly on what criterion was chosen to judge the similarity of amino acids. Using a measure called polar requirement, which indicates whether an amino acid is hydrophobic or hydrophilic, the natural code was a stellar performer, better than all but two of the 10,000 random permutations. But in other respects the biological code was only mediocre; 56 percent of the random codes did a better job of matching the electric charge of substituted amino acids.
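The kind of experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of Haig and Hurst's program: the function names are invented here, the polar requirement values are the commonly tabulated ones and should be checked against a primary source, and the score (mean squared difference over all single-base changes, skipping stop codons) is one plausible choice among several.

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
NATURAL = dict(zip(CODONS, AMINO))

# Polar requirement per amino acid (assumed values; verify before relying on them).
POLAR = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}

def error_cost(code):
    """Mean squared change in polar requirement over all single-base changes."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # changes to a stop codon are ignored in this sketch
                total += (POLAR[aa] - POLAR[mutant_aa]) ** 2
                count += 1
    return total / count

def shuffled_code(rng):
    """Keep the synonymous codon blocks, permute which amino acid owns each."""
    amino_acids = sorted(set(AMINO) - {"*"})
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    relabel = dict(zip(amino_acids, permuted))
    relabel["*"] = "*"  # stop codons stay where they are
    return {codon: relabel[aa] for codon, aa in NATURAL.items()}

def fraction_better(n_codes, seed=0):
    """Fraction of random codes with a lower cost than the natural code."""
    rng = random.Random(seed)
    natural_cost = error_cost(NATURAL)
    better = sum(error_cost(shuffled_code(rng)) < natural_cost
                 for _ in range(n_codes))
    return better / n_codes
```

Calling `fraction_better(10_000)` mimics the scale of the 1991 study. Swapping `POLAR` for a table of some other property, such as charge, shows how strongly the verdict depends on the chosen similarity measure.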

Focusing on the encouraging result with polar requirement, Hurst and Stephen J. Freeland (now at the University of Maryland, Baltimore County) later repeated the experiment with a sample size of 1 million random codes. Using the same evaluation rule as in the smaller simulation, they found that 114 of the million codes gave better substitutions than the natural code when evaluated with respect to polar requirement. Then they refined the model. In the earlier work, all mutations and all mistranslations were considered equally likely, but nature is known to have certain biases—some errors are more frequent than others. When the algorithm was adjusted to account for the biases, the natural code emerged superior to every random permutation with a single exception. They published their results under the title "The genetic code is one in a million."
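One way to picture the refinement is to give some errors more weight than others when averaging. The sketch below weights transitions (A↔G, C↔T) differently from transversions. The function names and the `transition_weight` parameter are hypothetical, the weight values are placeholders rather than the figures Freeland and Hurst used, and the polar requirement table is again an assumed, commonly tabulated one.

```python
import itertools

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
NATURAL = dict(zip(("".join(c) for c in itertools.product(BASES, repeat=3)), AMINO))

# Polar requirement per amino acid (assumed values; verify before relying on them).
POLAR = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}

TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def is_transition(old, new):
    """True for purine-purine or pyrimidine-pyrimidine changes."""
    return frozenset((old, new)) in TRANSITIONS

def weighted_cost(code, transition_weight=1.0):
    """Weighted mean squared change in polar requirement.

    transition_weight is an illustrative knob: values above 1 treat
    transitions as more frequent than transversions.
    """
    total = weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos, old in enumerate(codon):
            for new in BASES:
                if new == old:
                    continue
                mutant_aa = code[codon[:pos] + new + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # changes to a stop codon are ignored in this sketch
                w = transition_weight if is_transition(old, new) else 1.0
                total += w * (POLAR[aa] - POLAR[mutant_aa]) ** 2
                weight_sum += w
    return total / weight_sum
```

With `transition_weight=1.0` this reduces to the plain unweighted average. A fuller model would also weight the three codon positions separately, since mistranslation is not equally likely at each.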

But still there was the question of whether polar requirement is the right criterion for estimating the similarity of amino acids. Choosing the one factor that gives the best result and ignoring all others is not an experimental protocol that will convince skeptics. This issue was addressed in a further series of experiments by Freeland and Hurst in collaboration with Robin D. Knight and Laura F. Landweber of Princeton University. Rather than try to deduce nature's criteria for comparing amino acids, they inferred them from data on actual mutations. If two amino acids are often found occupying the same position in variant copies of the same protein, then it seems safe to conclude that the amino acids are physiologically compatible. Conversely, amino acids that are never found to occupy the same position would not be likely substitutions in a successful genetic code. There is a circularity to this formulation: The structure of the genetic code helps determine which substitutions are seen most often, and then the frequencies of substitutions serve to rank candidate genetic codes. Freeland and his colleagues argue that they can break the cycle by choosing an appropriate subset of the mutation data, including only proteins at substantial evolutionary distance, which should be separated by many mutations.

Using this bootstrap criterion, Freeland and his colleagues compared the biological code with another set of a million random variations. The natural code emerged as the uncontested champion. They wrote of the biological code: "[It] appears at or very close to a global optimum for error minimization: the best of all possible codes."
