Logo IMG
HOME > PAST ISSUE > Article Detail


Identity Crisis

Brian Hayes

Some Are Less Equal Than Others

Numbers aren't the only things that can be equal or unequal. Most programming languages also have equality operators for other simple data objects, such as alphabetic characters; thus a = a but a ≠ b. (Whether a = A is a matter up for debate.)

Sequences of characters (usually called strings) are also easy to compare. Two strings are equal if they consist of the same characters in the same sequence, which implies the strings also have the same length. Hence an equality operator for strings simply marches through the two strings in parallel, matching up the characters one by one. Certain other data structures, such as arrays, are handled in much the same way.

But one important kind of data structure can be problematic. The most flexible way of organizing data elements is with links, or pointers, from one item to another. For example, the symbols a, b and c might be linked into the list abcnil, where nil is a special value that marks the end of a chain of pointers. Comparing two such structures for equality is straightforward: Just trace the two chains of pointers, and if both reach nil at the same time without having encountered any discrepancies along the way, they are identical.

The pointer-following algorithm works well enough in most cases, but consider this structure:

Click to Enlarge Image

An algorithm that attempts to trace the chain of pointers until reaching nil will never terminate, and so structural equality will never be decided. This problem can be solved—the workaround is to lay down a trail of breadcrumbs as you go, and stop following the pointers as soon as you recognize a site you've already visited—but the technique is messy.

There's something else inside the computer that's remarkably hard to test for equality: programs. Even in the simplest cases, where the program is the computational equivalent of a mathematical function, proving equality is a challenge. A function is a program that accepts inputs (called the arguments of the function) and computes a value, but does nothing else to alter the state of the computer. The value returned by the function depends only on the arguments, so that if you apply the function to the same arguments repeatedly, it always returns the same value. For example, f(x) = x 2 is a function of the single argument x, and its returned value is the square of x.

A given function could be written as a computer program in many different ways. At the most trivial level, f(x) = x 2 might be replaced by f(y) = y 2, where the only change is to the name of the variable. Another alternative might be f(x) = x ? x, or perhaps f(x) = exp(2 log(x)). It seems reasonable to say that two such functions are identical if they return the same value when applied to the same argument. But if that criterion were to serve as a test of function equality, you would have to test all possible arguments within the domain of the function. Even when the domain is not infinite, it is often inconveniently large. The alternative to such an "extensional" test of equality is an "intensional" test, which tries to prove that the texts of the two programs have the same meaning. Fabricating such proofs is not impossible—optimizing compilers do it all the time when they substitute a faster sequence of machine instructions for a slower one—but it is hardly a straightforward task.

When you go beyond programs that model mathematical functions to those that can modify the state of the machine, proving the equality of programs is not just hard but undecidable. That is, there is no algorithm that will always yield the right answer when asked whether two arbitrary programs are equivalent. (For a thorough discussion of program equivalence, see Richard Bird's book Programs and Machines .)

comments powered by Disqus


Of Possible Interest

Computing Science: Belles lettres Meets Big Data

Technologue: Quantum Randomness

Technologue: The Quest for Randomness

Subscribe to American Scientist