COMPUTING SCIENCE

# The Higher Arithmetic

How to count to a zillion without falling off the end of the number line

# What’s the Point?

Most computer arithmetic is done not with bignums or exact rationals but with numbers confined to a fixed allotment of space, such as 32 or 64 bits. The hardware operates on all the bits at once, so arithmetic can be very fast. But an implacable law governs all such fixed-size formats: If a number is represented by 32 bits, then it can take on at most 2^{32} possible values. You may be able to choose *which* 2^{32} values are included, but there’s no way to increase the size of the set.

For 32-bit numbers, one obvious mapping assigns the 2^{32} bit patterns to the integers from 0 through 4,294,967,295 (which is 2^{32}–1). The same range of integers could be shifted along the number line, or the values could be scaled to cover a smaller numeric range in finer increments (perhaps 0.00 up to 42,949,672.95) or spread out over a wider range more sparsely. Arithmetic done in this style is known as “fixed point,” since the position of the decimal point is the same in all numbers of a given class.

Fixed-point arithmetic was once the mainstay of numerical computing, and it still has a place in certain applications, such as high-speed signal processing. But the dominant format now is floating point, where the decimal point (or binary point) can be moved around to represent a wide range of magnitudes. The floating-point format is based on the same idea as scientific notation. Just as we can write a large number succinctly as 6.02×10^{23}, floating-point arithmetic stores a number in two parts: the significand (6.02 in this example) and the exponent (23).

Designing a floating-point format entails a compromise between range and precision. Every bit allocated to the significand doubles its precision; but the bit has to be taken from the exponent, and it therefore reduces the range by half. For 32-bit numbers the prevailing standard dictates a 24-bit significand and an 8-bit exponent; a few stray bits are lost to storing signs and marking special cases, leaving an effective range of 2^{–126} up to 2^{127}. In decimal notation the largest representable number is about 3×10^{38}. Standard 64-bit numbers allocate 53 bits to the significand and 11 to the exponent, allowing a range up to about 10^{308}.

The idea of floating-point arithmetic goes back to the beginning of the computer age, but it was widely adopted only in the 1980s. The key event was the drafting of a standard, approved by the Institute of Electrical and Electronic Engineers (IEEE) in 1985. This effort was led by William Kahan of the University of California, Berkeley, who remains a strong advocate of the technology.

Early critics of the floating-point approach worried about efficiency and complexity. In fixed-point arithmetic, many operations can be reduced to a single machine instruction, but floating-point calculations are more involved. First you have to extract the significands and exponents, then operate on these pieces separately, then do some rounding and adjusting, and finally reassemble the parts.

The answer to these concerns was to implement floating-point algorithms in hardware. Even before the IEEE standard was approved, Intel designed a floating-point coprocessor for early personal computers. Later generations incorporated a floating-point unit on the main processor chip. From the programmer’s point of view, floating-point arithmetic became part of the infrastructure.

**IN THIS SECTION**

EMAIL TO A FRIEND :

# Comments

Unfortunately numerical analysis is a very boring tedious subject and is usually avoided. One DOD application I worked on used trig to find the intersections of annuli on the surface of a sphere. The ...

posted by Monkey Boy

August 12, 2009 @ 5:18 AM

**Of Possible Interest**

**Feature Article**: Twisted Math and Beautiful Geometry

**Feature Article**: Simulating Star Formation on a Galactic Scale

**Perspective**: Searching for Great Adventures