SCIENCE OBSERVER
Neatness Counts
David Schneider

Budding programmers have long been taught the value of writing
computer code that is easy for anyone to follow, thus allowing
modifications, upgrades and bug fixes to be made easily. One common
technique is to intersperse comments to describe various elements of
the program. Another useful strategy is to assign names to
variables, methods and subroutines that reflect their function. For
example, instead of writing the somewhat cryptic "For I = 1 to
N," a responsible programmer might use "For file_index = 1
to number_of_files." Such changes don't, of course, affect how
the software works, but they are nevertheless considered important
because they make a sequence of computer instructions to some extent self-documenting.
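To make the naming contrast concrete, here is a minimal sketch in Python, a language chosen for this illustration rather than one taken from the example above. The two function names and the sample file sizes are hypothetical. The loops count from 0 rather than 1 because Python sequences are zero-indexed.

```python
def total_size_cryptic(a):
    # Terse names: correct, but the reader must guess what a, n, i and t mean.
    n = len(a)
    t = 0
    for i in range(n):
        t += a[i]
    return t


def total_size_of_files(file_sizes_in_bytes):
    # Descriptive names: the same loop, but each name states its purpose.
    number_of_files = len(file_sizes_in_bytes)
    total_bytes = 0
    for file_index in range(number_of_files):
        total_bytes += file_sizes_in_bytes[file_index]
    return total_bytes


if __name__ == "__main__":
    sample_sizes = [1200, 350, 4096]  # made-up file sizes for the demonstration
    # Both versions compute the same total; only readability differs.
    print(total_size_cryptic(sample_sizes) == total_size_of_files(sample_sizes))
```

The two functions carry out identical steps, which is the point: renaming changes what a human reader sees, not what the machine does.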

Recently, programmers have also been learning the value of making
their code less readable, a practice known as obfuscation.
While you might think that creating purposefully messy and
impenetrable computer programs was not something anyone would aspire
to, the ability to generate hard-to-follow code has become a hot
commodity. Many companies are now selling software designed
expressly to turn a series of neat and logical computer instructions
into reams of seeming gibberish.

The motivation for code obfuscation grows from a fundamental change
that has taken place in the way many programs are distributed.
Traditionally, software developers would write an application in
some high-level programming language, say, C++, and then
"compile" it, which is to say translate it into low-level
instructions that the processor on a particular machine can run.
Users would only be given the compiled version of the program, not
the source code. Although with special software it is possible to
decompile a program (transform the executable version back into
source code), much is lost in translation: in particular, the
embedded comments and helpful name assignments are not retained.
Hence programmers could rest assured, knowing that as long as they
didn't give out source code, outsiders couldn't unravel the
software's inner workings.

The problem for software developers these days is that many common
computer languages no longer compile into low-level, hard-to-read
machine code. Instead the source code is transformed into an
intermediate-level language, which is what gets distributed to the
end user, where it runs on a "virtual machine" created by
resident software. Java works this way, as do computer programs
written for Microsoft's .NET framework, which is built into the new
Vista operating system, released to the public in January.

Because the intermediate-level language fed into these virtual
machines preserves a great deal of information that was in the
original source code, decompilation becomes a serious problem for
those trying, say, to avoid exposing security vulnerabilities or to
prevent competitors from stealing parts of their code. The solution
is to add an obfuscation step, which muddles things enough that the
decompiled code becomes difficult for a person to understand or
reuse, although a computer is able to carry out the instructions and
produce exactly the same results as if no obfuscation had been attempted.
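A rough analogy can be sketched in Python, whose compiled code objects also keep the names a programmer chose. This is only an illustration under stated assumptions: Python is not Java or .NET, the loan-payment function is hypothetical, and the "obfuscated" version was renamed by hand rather than produced by any commercial tool. Real obfuscators may apply other transformations besides renaming.

```python
def monthly_payment(loan_amount, annual_rate, number_of_months):
    # Readable original: a standard fixed-rate loan payment formula.
    monthly_rate = annual_rate / 12
    growth = (1 + monthly_rate) ** number_of_months
    return loan_amount * monthly_rate * growth / (growth - 1)


def a(b, c, d):
    # Hand-obfuscated copy: identical arithmetic, meaningless names.
    e = c / 12
    f = (1 + e) ** d
    return b * e * f / (f - 1)


if __name__ == "__main__":
    # The compiled code object still carries every argument and local name,
    # which is the kind of information a decompiler can recover.
    print(monthly_payment.__code__.co_varnames)
    print(a.__code__.co_varnames)

    # The machine gets the same answer either way.
    print(monthly_payment(1000, 0.06, 12) == a(1000, 0.06, 12))
```

Inspecting `co_varnames` shows why renaming helps: the first function gives away its purpose through its names, while the second reveals only single letters, even though both compute the same value.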

Sebastian Holst of PreEmptive Solutions, an Ohio company that sells
obfuscation software, points out that although the problem
that obfuscation addresses has long been well known to programmers,
many other people involved in corporate information technology are
just now realizing that the use of Java and .NET can pose a security
risk. "The technologists know this—it's like caller ID
these days," says Holst, giving another example where formerly
private information is now open to examination, "but the
IT-risk people don't yet understand, and the coders don't mention it
because it makes for more work."

Even before obfuscation became a valuable service to the software
industry, some programmers enjoyed seeing how difficult they could
make their code appear. Indeed, for more than two decades hackers
have vied for top honors in that category by entering their best
efforts in the International Obfuscated C Code Contest, known as the
IOCCC for short. Landon Curt Noll and Larry Bassel, both then
programmers at National Semiconductor, created this curious
programming competition—the longest-running contest on the
Internet—in 1984. Entrants must submit complete C programs no
longer than 4,096 bytes that, according to IOCCC guidelines,
"show the importance of programming style, in an ironic
way." Noll, who currently works as a cryptographer and
security expert for NeoScale Systems of Milpitas, California,
explains: "What we are saying is that programs that work are
not good enough."

Acceptance of entries for the 19th IOCCC closed at the end of
February, although winners are not likely to be announced for a few
months yet. Past champions have included a flight simulator, a
program that plots the positions of the four Galilean moons of
Jupiter and a text-to-pig Latin translator, which has the added
attraction of looking like a pig. There was also the world's
shortest self-replicating C program, a file containing zero bytes of
code: When compiled and run, the program also outputs nothing. Clearly, the
contestants have a lot of fun formulating their entries. "We
have a lot of fun, too," says Noll. "That's why we
continue to do it."
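
For readers curious about what "self-replicating" means in practice, the idea can be sketched in Python rather than C, which is a substitution made for this illustration. A self-replicating program prints exactly its own source text. The helper names below are hypothetical, and the two-line program stored in QUINE is a well-known pattern, not a contest entry.

```python
import contextlib
import io

# A two-line program that prints its own source, stored here as text.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run_and_capture(source):
    # Execute the source text and collect whatever it prints.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


def is_self_replicating(source):
    # A program replicates itself if its output equals its own text.
    return run_and_capture(source) == source


if __name__ == "__main__":
    print(is_self_replicating(QUINE))
    # An empty program prints nothing, so in this sense it also
    # reproduces itself.
    print(is_self_replicating(""))
```

The empty string passes the same check as the two-line program, which mirrors the joke behind the zero-byte entry: nothing in, nothing out.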