COMPUTING SCIENCE

# Third Base

# Cheaper by the Threesome

Under the skin, numbering systems are all alike. Numerals in various bases may well look different, but the numbers they represent are the same. In decimal notation, the numeral 19 is shorthand for this expression:

1 x 10^{1}+ 9 x 10^{0}.

Likewise the binary numeral 10011 is understood to mean:

1 x 2^{4}+ 0 x 2^{3}+ 0 x 2^{2}+ 1 x 2^{1}+ 1 x 2^{0},

which adds up to the same value. So does the ternary version, 201:

2 x 3^{2}+ 0 x 3^{1}+ 1 x 3^{0}.

The general formula for a numeral in any positional notation goes something like this:

... d_{3}r^{3}+d_{s}r^{s}+d_{1}r^{1}+d_{0}r^{0}....

Here *r* is the base, or *radix*, and the coefficients *d*_{i} are the digits of the number. Usually, *r* is a positive integer and the digits are integers in the range from 0 to *r*–1, but neither of these restrictions is strictly necessary. (You can build perfectly good numbers on a negative or an irrational base, and below we'll meet numbers with negative digits.)

To say that all bases represent the same numbers, however, is not to say that all numeric representations are equally good for all purposes. Base 10 is famously well suited to those of us who count on our fingers. Base 2 dominates computing technology because binary devices are simple and reliable, with just two stable states—on or off, full or empty. Computer circuitry also exploits a coincidence between binary arithmetic and binary logic: The same signal can represent either a numeric value (1 or 0) or a logical value (*true* or *false*).

The cultural preference for base 10 and the engineering advantages of base 2 have nothing to do with any intrinsic properties of the decimal and binary numbering systems. Base 3, on the other hand, does have a genuine mathematical distinction in its favor. By one plausible measure, it is the most efficient of all integer bases; it offers the most economical way of representing numbers.

How do you measure the cost of a numeric representation? If you simply count digits, then the biggest base will always win; for example, base 1,000,000 can represent any number between 0 and decimal 999,999 in a single digit. The trouble is, that single digit can be any of a million different symbols, all of which you must somehow recognize. At the opposite pole are unary, or base-1, numbers. The unary representation of decimal 1,000,000 needs only one type of symbol, but that symbol is repeated a million times. (Unary notation is in a category apart from other bases—it's not really a positional number system—but in the present context it serves as a useful limiting case.)

Among all possible ways of writing the numbers up to a million, neither base 1,000,000 nor base 1 seems ideal; as a matter of fact, you could hardly do worse than either of these choices. Minimizing the number of digits causes an explosion in the alphabet of symbols, and vice versa; when you squish down one factor, the other squirts out. Evidently we need to optimize some joint measure of a number's width (how many digits it has) and its depth (how many different symbols can occupy each digit position). An obvious strategy is to minimize the product of these two quantities. In other words, if *r* is the radix and *w* is the width in digits, we want to minimize *rw* while holding *r ^{w}* constant.

Curiously, this problem is easier to solve if *r* and *w* are treated as continuous rather than integer variables—that is, if we allow a fractional base and a fractional number of digits. Then it turns out (*see Figure 1*) that the optimum radix is *e*, the base of the natural logarithms, with a numerical value of about 2.718. Because 3 is the integer closest to *e*, it is almost always the most economical integer radix *(see Figure 2)*.

Consider again the task of representing all numbers from 0 through decimal 999,999. In base 10 this obviously requires a width of six digits, so that *rw* = 60. Binary does better: 20 binary digits suffice to cover the same range of numbers, for *rw* = 40. But ternary is better still: The ternary representation has a width of 13 digits, so that *rw* = 39. (If base *e* were a practical choice, the width would be 14 digits, yielding *rw*=38.056.)

EMAIL TO A FRIEND :

**Of Possible Interest**

**Computing Science**: Belles lettres Meets Big Data

**Technologue**: Quantum Randomness

**Technologue**: The Quest for Randomness