The Invention of the Genetic Code

Brian Hayes

The Diamond Code

The first coding scheme inspired by the Watson-Crick structure came from an unexpected quarter. The author was not a biologist or a chemist but a physicist: George Gamow, the chief proponent of the Big Bang theory in cosmology.

In Gamow's initial proposal, which he called the diamond code, double-stranded DNA acted directly as a template for assembling amino acids into proteins. As Gamow saw it, the various combinations of bases along one of the grooves in the double helix could form distinctively shaped cavities into which the side chains of amino acids might fit. Each cavity would attract a specific amino acid; when all the amino acids were lined up in the correct order along the groove, an enzyme would come along to polymerize them.

Figure 1. George Gamow's diamond code

Each of Gamow's cavities was bounded by the bases at the four corners of a diamond. If the DNA helix is oriented vertically, the bases at the top and bottom corners of a diamond are on the same strand and are separated by a single intervening base; the left and right corners of the diamond are defined by that intervening base and by its complementary partner on the opposite strand.

Some years later, Crick wrote: "The importance of Gamow's work was that it was really an abstract theory of coding, and was not cluttered up by a lot of unnecessary chemical details...." Actually, Gamow's description of the diamond code had more chemical clutter than many of the later code proposals, but it was indeed the abstract parts of the scheme that made an impression and had a lasting influence. In particular, Gamow's treatment of the problem of mismatched alphabets is still the starting point for textbook accounts of the genetic code.

The alphabet problem is simply that there are 20 kinds of amino acids in proteins but only four kinds of nucleotide bases in DNA. Hence there cannot be any one-to-one mapping from bases to amino acids. Using two bases to represent each amino acid still comes up short, since there are only 16 doublets of bases. It therefore seems that the basic unit of information in the genetic code can be no smaller than a triplet of bases. But there are 64 triplets—more than three times the number needed. Explaining away this excess became a major preoccupation of coding theorists.
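The counting argument is easy to check by brute force. The following is a minimal Python sketch, not anything from the original coding literature; the names `BASES`, `AMINO_ACIDS`, `count_words` and `shortest_sufficient_length` are illustrative choices.

```python
from itertools import product

BASES = "ACGT"      # the four nucleotide bases in DNA
AMINO_ACIDS = 20    # kinds of amino acids found in proteins


def count_words(length):
    """Count the distinct base sequences of the given length by enumeration."""
    return sum(1 for _ in product(BASES, repeat=length))


def shortest_sufficient_length():
    """Smallest word length with at least as many words as amino acids."""
    length = 1
    while count_words(length) < AMINO_ACIDS:
        length += 1
    return length


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, count_words(n))   # prints 1 4, then 2 16, then 3 64
    print(shortest_sufficient_length())   # prints 3
```

Singlets and doublets fall short of 20, and triplets overshoot it by 44, which is the excess the coding theorists had to explain.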

Gamow's diamond code, viewed abstractly after sweeping away the chemical clutter, turns out to be a triplet code in disguise. Although the diamonds have four corners, the paired bases along the horizontal diagonal are complementary, and so only one of them carries any information; the other is entirely determined by the rules that link A with T and C with G. Thus each code word, or "codon," consists of three bases lined up along one strand. There are 64 possible codons, but not all of them are distinct. Gamow noted that most amino acid side chains are symmetrical, and he therefore postulated that the diamonds could be flipped end-for-end or flopped side-to-side without changing their meaning. For example, the triplet CAG becomes GAC when it is flipped end-for-end, and both of these codons must specify the same amino acid. Flopping CAG side-to-side changes the middle A into a complementary T, so that CTG and GTC are also members of the same family of equivalent codons. When all such symmetries are taken into account, how many distinct codons remain? Gamow counted them up and found the answer is 20, just the magic number he was looking for.
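Gamow's count can be reproduced by enumerating all 64 triplets and grouping them under the two symmetries described above. This is a sketch under the abstract reading given in the text: a flip reverses the triplet, and a flop replaces the middle base with its complement. The function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(codon):
    """Flip end-for-end: reverse the order of the three bases."""
    return codon[::-1]


def flop(codon):
    """Flop side-to-side: swap the middle base for its complement."""
    return codon[0] + COMPLEMENT[codon[1]] + codon[2]


def diamond_family(codon):
    """All triplets equivalent to `codon` under flipping and flopping."""
    return frozenset({codon, flip(codon), flop(codon), flop(flip(codon))})


def diamond_families():
    """The set of distinct families among all 64 triplets."""
    return {diamond_family("".join(t)) for t in product("ACGT", repeat=3)}


if __name__ == "__main__":
    print(sorted(diamond_family("CAG")))   # ['CAG', 'CTG', 'GAC', 'GTC']
    print(len(diamond_families()))         # 20
```

The 20 families are not all the same size. A triplet whose first and last bases match is unchanged by a flip, so its family has only two members; there are 8 such families, and the remaining 48 triplets fall into 12 families of four.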

The diamond code had another important property: It was an overlapping triplet code. Each nucleotide base (except perhaps at the ends of a strand) claimed simultaneous membership in three adjacent codons. For example, the base sequence GATTACA consists of five overlapping triplets: GAT, ATT, TTA, TAC and ACA. At the time, overlapping triplets seemed like a good idea. There was a stereochemical justification: The spacing between amino acids in a protein is similar to the spacing between bases in DNA, so that the two polymers mesh best when their subunits are matched one-to-one. The overlapping code also maximizes the density of information storage: Even though three bases are needed to specify any single amino acid, the overall ratio of bases to amino acids approaches 1:1. Finally, overlapping imposes constraints on the possible sequences of amino acids. Gamow thought the constraints might reveal the nature of the code; as it turned out, they were the downfall of his hypothesis.
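The overlapping reading of a sequence amounts to sliding a window of width three along the strand, one base at a time. The short Python sketch below illustrates this with the example from the text; `overlapping_triplets` and `bases_per_amino_acid` are illustrative names, and the ratio calculation assumes one amino acid per triplet.

```python
def overlapping_triplets(seq):
    """Every run of three consecutive bases, advancing one base at a time."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def bases_per_amino_acid(seq):
    """Ratio of bases to amino acids, assuming one amino acid per triplet."""
    return len(seq) / len(overlapping_triplets(seq))


if __name__ == "__main__":
    print(overlapping_triplets("GATTACA"))
    # ['GAT', 'ATT', 'TTA', 'TAC', 'ACA']
    print(bases_per_amino_acid("GATTACA"))   # 1.4
```

A strand of n bases yields n - 2 overlapping triplets, so the ratio n / (n - 2) approaches 1 as the strand grows. The window also shows where the constraints come from: adjacent triplets share two bases, so the choice of one codon restricts which codons can follow it.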
