COMPUTING SCIENCE
Ode to the Code
Brian Hayes
Reshuffling the Deck of Codons
As early as 1969, Cynthia Alff-Steinberger of the University of
Geneva began trying to quantify the code's resilience to error by
means of computer simulation. The basic idea was to randomly
generate a series of codes that reshuffle the codon table but retain
certain statistical properties, such as the number of codons
associated with each amino acid. Then the error-resistance of the
codes was evaluated by generating point mutations that caused amino
acid substitutions. A code scored well if the erroneous amino acids
were similar to the original ones. With the computing facilities
available in the 1960s, Alff-Steinberger was able to test only 200
variant codes. She concluded that the natural code tolerates
substitutions better than a typical random code.

A decade later J. Tze-Fei Wong of the University of Toronto
approached the same question from another angle—and reached a
different conclusion. Instead of generating many random codes, he
tried a hand-crafted solution, identifying the best substitution for
each amino acid. Wong found that the substitutions generated by the
natural code are less than half as close, on average, as the best
ones possible. This result was taken as evidence that the code has
not evolved to maximize error tolerance. But Wong did not attempt to
find a complete, self-consistent code that would generate all the
optimal substitutions.

Returning to studies of random codes, David Haig and Laurence D.
Hurst of the University of Oxford generated 10,000 of them in 1991,
keeping the same blocks of synonymous codons found in the natural
code but permuting the amino acids assigned to them. The result
depended strongly on what criterion was chosen to judge the
similarity of amino acids. Judged by a measure called polar requirement,
which indicates whether an amino acid is hydrophobic or hydrophilic,
the natural code was a stellar performer, better than all but two of
the 10,000 random permutations. But in other respects the biological
code was only mediocre; 56 percent of the random codes did a better
job of matching the electric charge of substituted amino acids.
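
The permutation experiment can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not a reconstruction of
the published program. The cost function (mean squared change in the
property over all single-base substitutions between sense codons) is an
assumed choice, since the scoring formula is not spelled out here. The
POLAR_REQ values are polar requirement figures as they are commonly
tabulated and should be checked against a primary source before being
relied on. The names code_cost, shuffled_code and fraction_better are
made up for this sketch.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
NATURAL = dict(zip(CODONS, AMINO))

# ASSUMPTION: polar requirement values as commonly tabulated; verify before use.
POLAR_REQ = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def code_cost(code, scale):
    """Mean squared change in `scale` over all single-base substitutions
    that turn one sense codon into another (stop codons are skipped)."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                aa2 = code[mutant]
                if aa2 == "*":
                    continue
                total += (scale[aa] - scale[aa2]) ** 2
                count += 1
    return total / count


def shuffled_code(code, rng):
    """Keep the blocks of synonymous codons and the stop codons, but
    permute which amino acid is assigned to each block."""
    aas = sorted(set(code.values()) - {"*"})
    perm = aas[:]
    rng.shuffle(perm)
    relabel = dict(zip(aas, perm))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in code.items()}


def fraction_better(n_codes, scale, seed=0):
    """Fraction of random codes with a lower cost than the natural code."""
    rng = random.Random(seed)
    natural = code_cost(NATURAL, scale)
    better = sum(
        code_cost(shuffled_code(NATURAL, rng), scale) < natural
        for _ in range(n_codes)
    )
    return better / n_codes
```

Calling fraction_better(10000, POLAR_REQ) mimics the size of the 1991
sample. Passing a different property table in place of POLAR_REQ shows
how strongly the verdict depends on the similarity criterion.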

Focusing on the encouraging result with polar requirement, Hurst and
Stephen J. Freeland (now at the University of Maryland, Baltimore
County) later repeated the experiment with a sample size of 1
million random codes. Using the same evaluation rule as in the
smaller simulation, they found that 114 of the million codes gave
better substitutions than the natural code when evaluated with
respect to polar requirement. Then they refined the model. In the
earlier work, all mutations and all mistranslations were considered
equally likely, but nature is known to have certain
biases—some errors are more frequent than others. When the
algorithm was adjusted to account for the biases, the natural code
emerged superior to every random permutation with a single
exception. They published their results under the title "The
genetic code is one in a million."
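
One way to picture the refinement is to give each kind of error its own
weight when averaging. The Python sketch below is hypothetical. It
assumes the bias in question is the familiar excess of transitions
(purine to purine, or pyrimidine to pyrimidine) over transversions, and
the default transition_weight of 2.0 is an arbitrary placeholder rather
than a published figure. The text above does not say which biases were
modeled or how strongly. The function names are invented, and scale is
any table mapping one-letter amino acid codes to a numeric property.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
NATURAL = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def is_transition(old, new):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return old != new and ((old in PURINES) == (new in PURINES))


def weighted_cost(code, scale, transition_weight=2.0):
    """Weighted mean squared change in `scale` over single-base errors.

    ASSUMPTION: transitions count `transition_weight` times as much as
    transversions; the default is a placeholder, not a measured bias.
    """
    total = weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos, old in enumerate(codon):
            for new in BASES:
                if new == old:
                    continue
                aa2 = code[codon[:pos] + new + codon[pos + 1:]]
                if aa2 == "*":
                    continue
                w = transition_weight if is_transition(old, new) else 1.0
                total += w * (scale[aa] - scale[aa2]) ** 2
                weight_sum += w
    return total / weight_sum
```

With transition_weight set to 1.0 every error counts equally, which
corresponds to the unweighted rule of the earlier simulations.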

But still there was the question of whether polar requirement is the
right criterion for estimating the similarity of amino acids.
Choosing the one factor that gives the best result and ignoring all
others is not an experimental protocol that will convince skeptics.
This issue was addressed in a further series of experiments by
Freeland and Hurst in collaboration with Robin D. Knight and Laura
F. Landweber of Princeton University. Rather than try to deduce
nature's criteria for comparing amino acids, they inferred them from
data on actual mutations. If two amino acids are often found
occupying the same position in variant copies of the same protein,
then it seems safe to conclude that the amino acids are
physiologically compatible. Conversely, amino acids that are never
found to occupy the same position would not be likely substitutions
in a successful genetic code. There is a circularity to this
formulation: The structure of the genetic code helps determine which
substitutions are seen most often, and then the frequencies of
substitutions serve to rank candidate genetic codes. Freeland and
his colleagues argue that they can break the cycle by choosing an
appropriate subset of the mutation data, including only proteins at
substantial evolutionary distance, which should be separated by many
mutations.

Using this bootstrap criterion, Freeland and his colleagues compared
the biological code with another set of a million random variations.
The natural code emerged as the uncontested champion. They wrote of
the biological code: "...it appears at or very close to a global
optimum for error minimization: the best of all possible codes."