COMPUTING SCIENCE
Identity Crisis
Brian Hayes
Same Difference
Real numbers are creatures of mathematics, not computer science.
Although some programming languages offer a data type named
"real," the numbers of this type are quite
unreal. They are "floating-point" numbers, which
only approximate a continuous distribution. The floating-point
format is much like scientific notation, in which a number is
represented by a mantissa and an exponent. Only a finite number of
bits are reserved for the mantissa and exponent, and so the numbers
are limited in both precision and range.
One big advantage of floating-point arithmetic is that you never
have to wait forever. When floating-point values stand in for reals,
questions about equality are always answered promptly. On the other
hand, the answers may well be wrong.
In your favorite programming language, calculate the square root of
2 and then square the result. I've just tried this experiment with
an old programmable calculator, which reports the answer as
1.999999999. Interpreted as an approximation to the real number
1.999..., this result is not an error. It's just as correct as
2.000000000, which is also an approximation. The problem is that the
machine itself generally cannot recognize the equivalence of the two
alternative answers. Suppose a program includes the conditional statement:
if (√2)2 = 2
then let there be light
else annihilate the universe
If this program happens to be running on my HP-41C, we're all in trouble.
There are alternatives to floating-point arithmetic that avoid these
hazards. Symbolic-mathematics systems such as Maple and Mathematica
get the right answer by eschewing numerical approximations; in
effect, they define the square root of 2 as "the quantity that
when squared is equal to 2." A few programming languages
provide exact rational arithmetic. And there have been various
schemes for calculating with approximations to real numbers that
grow more digits on demand, potentially without limit. Nevertheless,
most numerical computation is done with conventional floating-point
numbers, and a whole subdiscipline of numerical analysis has grown
up to teach people how to cope with the errors.
Programmers are sometimes advised not to compare floating-point values
for exact equality, but rather to introduce a small quantity of fudge.
Instead of computing the relation
x
=
y
, they are told to base a decision on the expression |
x
–
y
| is less than ε, where the straight brackets denote the
absolute-value operation, and ε is a small number that will
cover up the imprecision of floating-point arithmetic. This notion of
approximate equality is good enough for many purposes, but it has a high
cost. Equality loses one of its most fundamental properties: It is no
longer a transitive relation. In the presence of fudge, you can't count
on the basic principle that if
x
=
y
and
y
=
z
, then
x
=
z
.
» Post Comment