The Invention of the Genetic Code
The RNA Tie Club
A physicist popping up to tell biologists how to solve their problems can't always count on a warm reception. Gamow was welcomed, though, perhaps in part because biology labs in those days were full of carpetbagging physicists. (Crick himself began his career with a physics degree.) Or maybe Gamow just charmed his way in; by all accounts he was an exceptionally amiable fellow. In any case he was soon spending a summer at the Marine Biological Laboratory and collaborating with distinguished molecular biologists. He also founded the RNA Tie Club, limited to 20 regular members (one for each amino acid) and four honorary members (one for each nucleotide base). The ties were wool, with an embroidered green-and-yellow helix. Such an organization might not prosper today—who wears neckties?—but at the time it had an important role in circulating ideas.
The respect accorded to Gamow largely took the form of careful criticism. Attention focused particularly on his overlapping triplets. In any code where the ratio of bases to amino acids is 1:1, there are only 4^N nucleotide sequences of length N, but there are 20^N amino acid sequences. It follows that many of the amino acid sequences cannot be encoded by any base sequence. This effect can be seen even in an amino acid sequence of length 2 (called a dipeptide). With 20 kinds of amino acids, there are 20^2 = 400 possible dipeptides, but two overlapping triplet codons comprise only four bases, so that there are only 4^4 = 256 combinations. Evidently at least 144 dipeptides cannot appear in proteins encoded by an overlapping code.
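The counting argument above can be checked in a few lines of Python. This is only an illustrative sketch: the names `max_encodable_fraction` and `unreachable_dipeptides` are invented for this example, and the function assumes a fully overlapping triplet code, in which a chain of n amino acids is read from n + 2 bases. The figures it produces are bounds, because distinct base sequences may encode the same peptide.

```python
# Counting argument for a fully overlapping triplet code (illustrative sketch).
AMINO_ACIDS = 20  # kinds of amino acids
BASES = 4         # kinds of nucleotide bases


def max_encodable_fraction(n_amino_acids: int) -> float:
    """Upper bound on the fraction of peptides of this length that
    a fully overlapping triplet code could possibly encode."""
    # The first codon uses three bases; each further codon adds one more.
    n_bases = n_amino_acids + 2
    base_sequences = BASES ** n_bases
    peptides = AMINO_ACIDS ** n_amino_acids
    return min(1.0, base_sequences / peptides)


dipeptides = AMINO_ACIDS ** 2      # 20^2 possible dipeptides
four_base_sequences = BASES ** 4   # 4^4 sequences of four bases
# Lower bound on the number of dipeptides that no base sequence can encode.
unreachable_dipeptides = dipeptides - four_base_sequences

for n in (1, 2, 3, 5, 10):
    print(n, max_encodable_fraction(n))
```

The bound shrinks rapidly as the chain grows, since each added amino acid multiplies the number of peptides by 20 but the number of base sequences by only 4.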
Even with the sparse protein sequence data available in the mid-1950s, Crick was able to show that the diamond code was ruled out by the experimental evidence. There were known patterns of amino acid repetitions that the diamond code could not produce.
Undaunted, Gamow proposed a "triangle code" that was also overlapping but had different constraints. In this code too the 64 possible triplet codons sorted themselves into 20 families. Later Gamow suggested yet another overlapping-triplet code with an even simpler description: Each codon is defined entirely by its base composition, ignoring the order of the bases within the codon. Thus ACT, ATC, CAT, CTA, TAC and TCA are all members of the same codon family and specify the same amino acid. Remarkably, the number of codon families in this scheme again turns out to be exactly 20. (It is just the number of combinations of four things taken three at a time, with repetition allowed.)
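The count of 20 families in the composition-only scheme can be verified by brute force. The sketch below is an illustration, not a reconstruction of Gamow's own working: it groups all 64 triplets by their sorted letters and compares the result with the standard formula for choosing three items from four when repeats are allowed. The variable names are invented for this example.

```python
from itertools import product
from math import comb

BASES = "ACGT"

# Group every ordered triplet by its base composition. Sorting the letters
# gives one canonical key per composition.
families: dict[str, list[str]] = {}
for triplet in product(BASES, repeat=3):
    codon = "".join(triplet)
    key = "".join(sorted(codon))
    families.setdefault(key, []).append(codon)

n_codons = sum(len(members) for members in families.values())
n_families = len(families)

# Choosing 3 items from 4 kinds with repetition allowed: C(4 + 3 - 1, 3).
n_multisets = comb(len(BASES) + 3 - 1, 3)

print(n_codons, n_families, n_multisets)
print(sorted(families["ACT"]))
```

The families are not all the same size: a codon with three identical bases stands alone, one with two identical bases has three orderings, and one with three different bases has six.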
Still more overlapping codes came from Gamow and his friends. Richard Feynman had a hand in working out one idea. Edward Teller proposed another—a fairly funky scheme in which each amino acid is specified by two bases in the DNA and by the previous amino acid.
But overlapping codes were coming to the end of their string. Patterns of mutations were one source of doubt. With an overlapping code, changing a single base in the DNA could alter three neighboring amino acids, but protein sequence data were starting to show instances of single amino acid replacements. Then came a definitive proof. Sydney Brenner analyzed all the known protein sequence fragments and found enough nearest-neighbor correlations to rule out every possible overlapping code.
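The mutation argument rests on a simple piece of bookkeeping: how many codons contain a given base. The following sketch makes that explicit under two assumed reading schemes, a fully overlapping one in which a new codon starts at every base and a non-overlapping one in which codons start at every third base. The function name `codons_touching` and the 12-base example are hypothetical, chosen only for illustration.

```python
def codons_touching(position: int, n_bases: int, overlapping: bool) -> list[int]:
    """Return the indices of the codons that include the base at `position`."""
    if not 0 <= position < n_bases:
        raise ValueError("position lies outside the sequence")
    if overlapping:
        # Codon k spans bases k, k+1, k+2; the last codon starts at n_bases - 3.
        first = max(0, position - 2)
        last = min(position, n_bases - 3)
        return list(range(first, last + 1))
    # Non-overlapping: codon k spans bases 3k to 3k+2.
    # A trailing partial codon is ignored.
    k = position // 3
    return [k] if k < n_bases // 3 else []


# A base in the interior of a 12-base sequence.
interior_overlapping = codons_touching(5, 12, overlapping=True)
interior_nonoverlapping = codons_touching(5, 12, overlapping=False)

print(interior_overlapping)
print(interior_nonoverlapping)
```

An interior base belongs to three codons under the overlapping scheme but to only one under the non-overlapping scheme, which is why single amino acid replacements counted as evidence against overlaps.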
In retrospect, the long fixation on overlapping codons seems unfortunate and misguided, but there were strong arguments favoring such schemes. Matching the dimensions of the protein to those of the template seemed important. So did coding efficiency. Natural selection was expected to maximize storage density and avoid any waste of information capacity. Engineers building the computers of the era certainly worked hard to pack in the bits, so why wouldn't nature do the same? No one could have guessed the awful truth—that nature is wildly profligate, that genomes are stuffed with gobs of "junk DNA," that storage efficiency just doesn't seem to be an issue except in a few ultracompact viruses.
Still another reason for favoring overlaps was to avoid the frame-shift problem. To understand the nature of this problem, it's best to turn to a very different kind of proposed code, one that I would like to nominate as the prettiest wrong idea in all of 20th-century science.