COMPUTING SCIENCE

# Identity Crisis

Same Difference

Real numbers are creatures of mathematics, not computer science.
Although some programming languages offer a data type named
"real," the numbers of this type are quite
*un*real. They are "floating-point" numbers, which
only approximate a continuous distribution. The floating-point
format is much like scientific notation, in which a number is
represented by a mantissa and an exponent. Only a finite number of
bits are reserved for the mantissa and exponent, and so the numbers
are limited in both precision and range.

One big advantage of floating-point arithmetic is that you never have to wait forever. When floating-point values stand in for reals, questions about equality are always answered promptly. On the other hand, the answers may well be wrong.

In your favorite programming language, calculate the square root of 2 and then square the result. I've just tried this experiment with an old programmable calculator, which reports the answer as 1.999999999. Interpreted as an approximation to the real number 1.999..., this result is not an error. It's just as correct as 2.000000000, which is also an approximation. The problem is that the machine itself generally cannot recognize the equivalence of the two alternative answers. Suppose a program includes the conditional statement:

*if* (√2)^{2} = 2

*then* let there be light

*else* annihilate the universe

If this program happens to be running on my HP-41C, we're all in trouble.

There are alternatives to floating-point arithmetic that avoid these hazards. Symbolic-mathematics systems such as Maple and Mathematica get the right answer by eschewing numerical approximations; in effect, they define the square root of 2 as "the quantity that when squared is equal to 2." A few programming languages provide exact rational arithmetic. And there have been various schemes for calculating with approximations to real numbers that grow more digits on demand, potentially without limit. Nevertheless, most numerical computation is done with conventional floating-point numbers, and a whole subdiscipline of numerical analysis has grown up to teach people how to cope with the errors.

Programmers are sometimes advised not to compare floating-point values for exact equality, but rather to introduce a small quantity of fudge. Instead of computing the relation*x*=

*y*, they are told to base a decision on the expression |

*x*–

*y*| is less than ε, where the straight brackets denote the absolute-value operation, and ε is a small number that will cover up the imprecision of floating-point arithmetic. This notion of approximate equality is good enough for many purposes, but it has a high cost. Equality loses one of its most fundamental properties: It is no longer a transitive relation. In the presence of fudge, you can't count on the basic principle that if

*x*=

*y*and

*y*=

*z*, then

*x*=

*z*.

**IN THIS SECTION**

EMAIL TO A FRIEND :

**Of Possible Interest**

**Perspective**: Searching for Great Adventures

**Computing Science**: New Dilemmas for the Prisoner

**Feature Article**: Digital Forensics