COMPUTING SCIENCE
Identity Crisis
Brian Hayes
Some Are Less Equal Than Others
Numbers aren't the only things that can be equal or unequal. Most
programming languages also have equality operators for other simple
data objects, such as alphabetic characters; thus a = a but a ≠
b. (Whether a = A is a matter up for debate.)
Sequences of characters (usually called strings) are also easy to
compare. Two strings are equal if they consist of the same
characters in the same sequence, which implies the strings also have
the same length. Hence an equality operator for strings simply
marches through the two strings in parallel, matching up the
characters one by one. Certain other data structures, such as
arrays, are handled in much the same way.
But one important kind of data structure can be problematic. The
most flexible way of organizing data elements is with links, or
pointers, from one item to another. For example, the symbols
a, b and c might be linked into the list
a → b → c → nil,
where nil is a special value that marks the end of a chain
of pointers. Comparing two such structures for equality is
straightforward: Just trace the two chains of pointers, and if both
reach nil at the same time without having encountered any
discrepancies along the way, they are identical.
The pointer-following algorithm works well enough in most cases, but
consider this structure:


An algorithm that attempts to trace the chain of pointers until
reaching nil will never terminate, and so structural
equality will never be decided. This problem can be solved—the
workaround is to lay down a trail of breadcrumbs as you go, and stop
following the pointers as soon as you recognize a site you've
already visited—but the technique is messy.
There's something else inside the computer that's remarkably hard to
test for equality: programs. Even in the simplest cases, where the
program is the computational equivalent of a mathematical function,
proving equality is a challenge. A function is a program that
accepts inputs (called the arguments of the function) and computes a
value, but does nothing else to alter the state of the computer. The
value returned by the function depends only on the arguments, so
that if you apply the function to the same arguments repeatedly, it
always returns the same value. For example, f(x) =
x 2 is a function of the single argument
x, and its returned value is the square of x.
A given function could be written as a computer program in many
different ways. At the most trivial level, f(x) =
x 2 might be replaced by f(y) =
y 2, where the only change is to the name of the
variable. Another alternative might be f(x) = x ?
x, or perhaps f(x) = exp(2
log(x)). It seems reasonable to say that two such functions
are identical if they return the same value when applied to the same
argument. But if that criterion were to serve as a test of function
equality, you would have to test all possible arguments within the
domain of the function. Even when the domain is not infinite, it is
often inconveniently large. The alternative to such an
"extensional" test of equality is an
"intensional" test, which tries to prove that the texts of
the two programs have the same meaning. Fabricating such proofs is
not impossible—optimizing compilers do it all the time when
they substitute a faster sequence of machine instructions for a
slower one—but it is hardly a straightforward task.
When you go beyond programs that model mathematical functions to those
that can modify the state of the machine, proving the equality of
programs is not just hard but undecidable. That is, there is no
algorithm that will always yield the right answer when asked whether two
arbitrary programs are equivalent. (For a thorough discussion of program
equivalence, see Richard Bird's book
Programs and Machines
.)
» Post Comment