Budding programmers have long been taught the value of writing
computer code that is easy for anyone to follow, so that
modifications, upgrades and bug fixes can be made with little effort. One
common technique is to intersperse comments to describe
various elements of the program. Another useful strategy is
to assign names to variables, methods and subroutines that
reflect their function. For example, instead of writing the
somewhat cryptic "For I = 1 to N," a responsible
programmer might use "For file_index = 1 to
number_of_files." Such changes don't, of course, affect how
the software works, but they are nevertheless considered
important because they make a sequence of computer
instructions to some extent self-documenting.
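To make the naming advice concrete, here is a minimal sketch in Python rather than the BASIC-style pseudocode quoted in the article. The function names f and total_size and the list of file sizes are hypothetical. Both loops add up the same list and return the same total; only the second tells the reader what it is doing.

```python
def f(a):
    # Cryptic version: the reader has to work out what a, t and i stand for.
    t = 0
    for i in range(len(a)):
        t = t + a[i]
    return t


def total_size(file_sizes):
    # Self-documenting version: the names state the intent.
    total_bytes = 0
    for file_index in range(len(file_sizes)):
        total_bytes = total_bytes + file_sizes[file_index]
    return total_bytes


if __name__ == "__main__":
    sizes = [120, 4096, 512]  # hypothetical file sizes in bytes
    print(f(sizes), total_size(sizes))  # prints: 4728 4728
```

The computer treats the two functions identically; the difference matters only to the person who has to maintain them.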
Recently, programmers have also been learning
the value of making their code less readable, a
practice known as obfuscation. While you might think that
creating purposefully messy and impenetrable computer
programs was not something anyone would aspire to, the
ability to generate hard-to-follow code has become a hot
commodity. Many companies are now selling software designed
expressly to turn a series of neat and logical computer
instructions into reams of seeming gibberish.
The motivation for code obfuscation grows from a fundamental change
that has taken place in the way many programs are distributed.
Traditionally, software developers would write an application in
some high-level programming language, say, C++, and then
"compile" it, which is to say translate it into
low-level instructions that the processor on a particular
machine can run. Users would only be given the compiled
version of the program, not the source code. Although with
special software it is possible to decompile a program
(transform the executable version back into source code),
much is lost in translation—in particular, the
embedded comments and helpful name assignments are not retained.
Hence programmers could rest assured, knowing that as long as
they didn't give out source code, outsiders couldn't unravel
the software's inner workings.
The problem for
software developers these days is that many popular programming
languages do not compile directly into low-level, hard-to-read
machine code. Instead, the source code is translated into an
intermediate-level language, which is what gets distributed to
the end user, where it runs on a "virtual machine"
created by resident software. Java works this way, as do
computer programs written for Microsoft's .NET framework,
which is built into Windows Vista, the operating system Microsoft
released to the public in January 2007.
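Java bytecode and .NET intermediate code are not shown here, but Python offers a convenient analogy: the standard CPython interpreter also compiles source code into bytecode that runs on a virtual machine. The minimal sketch below, built around a hypothetical function named average_file_size, shows that the compiled code object still carries the programmer's variable names, which is exactly the kind of information a decompiler can hand back.

```python
import dis


def average_file_size(file_sizes):
    number_of_files = len(file_sizes)
    total_bytes = sum(file_sizes)
    return total_bytes / number_of_files


if __name__ == "__main__":
    # The compiled code object records every parameter and local name...
    print(average_file_size.__code__.co_varnames)
    # prints: ('file_sizes', 'number_of_files', 'total_bytes')

    # ...and the bytecode disassembly refers to those names as well
    # (the exact instructions vary between Python versions).
    dis.dis(average_file_size)
```

Nothing here required the original source file; the names come straight out of the compiled form.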
Because the intermediate-level
language fed into these virtual machines preserves a great
deal of information that was in the original source code,
decompilation becomes a serious problem for those trying,
say, to avoid exposing security vulnerabilities or to
prevent competitors from stealing parts of their code. The
solution is to add an obfuscation step, which muddles the code
enough that the decompiled version becomes difficult for a
person to understand or reuse, even though a computer can still
carry out the instructions and produce exactly the same
results as if no obfuscation had been attempted.
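Obfuscators apply many kinds of transformations; one of the simplest is renaming identifiers. The following minimal sketch, written in Python with the standard ast module (ast.unparse needs Python 3.9 or later), renames a function's parameters and assigned variables to meaningless names such as _0 and _1, then checks that the scrambled function returns the same result as the original. The class RenameLocals, the helpers obfuscate and load, and the sample function are hypothetical illustrations, not the method of any particular commercial product. A real tool would also have to cope with keyword arguments, nested scopes and names that other modules rely on.

```python
import ast
import itertools


class RenameLocals(ast.NodeTransformer):
    """Hypothetical, deliberately naive renamer.

    Every parameter and assignment target gets a meaningless name; names
    never seen bound (such as the built-ins len and sum) are left alone.
    """

    def __init__(self):
        self.mapping = {}  # original name -> scrambled name
        self.fresh = (f"_{n}" for n in itertools.count())

    def _scramble(self, name):
        if name not in self.mapping:
            self.mapping[name] = next(self.fresh)
        return self.mapping[name]

    def visit_arg(self, node):
        node.arg = self._scramble(node.arg)
        return self.generic_visit(node)

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._scramble(node.id)
        return node


def obfuscate(source):
    tree = RenameLocals().visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


def load(source, name):
    # Execute the source in a fresh namespace and return the named function.
    namespace = {}
    exec(source, namespace)
    return namespace[name]


SOURCE = """
def average_file_size(file_sizes):
    number_of_files = len(file_sizes)
    total_bytes = sum(file_sizes)
    return total_bytes / number_of_files
"""

if __name__ == "__main__":
    scrambled = obfuscate(SOURCE)
    print(scrambled)
    sizes = [120, 4096, 512]
    original = load(SOURCE, "average_file_size")
    obfuscated = load(scrambled, "average_file_size")
    print(original(sizes) == obfuscated(sizes))  # prints: True
```

In the printed version, file_sizes, number_of_files and total_bytes have become _0, _1 and _2. The function name is left alone so that callers can still find it, while the self-documenting names inside it are gone.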
Sebastian Holst of PreEmptive Solutions, an Ohio company that
sells obfuscation software, points out that although
the problem that obfuscation addresses has long been well
known to programmers, many other people involved in
corporate information technology are just now realizing that
the use of Java and .NET can pose a security risk. "The
technologists know this—it's like caller ID these
days," says Holst, giving another example where formerly
private information is now open to examination, "but the
IT-risk people don't yet understand, and the coders don't
mention it because it makes for more work."
Even before obfuscation became a valuable service to the
software industry, some programmers enjoyed seeing how
difficult they could make their code appear. Indeed, for
more than two decades hackers have vied for top honors in
that category by entering their best efforts in the
International Obfuscated C Code Contest, known as the IOCCC
for short. Landon Curt Noll and Larry Bassel, both then
programmers at National Semiconductor, created this curious
programming competition—the longest-running contest on the
Internet—in 1984. Entrants must submit complete C programs
no longer than 4,096 bytes that, according to IOCCC
guidelines, "show the importance of programming style,
in an ironic way." Noll, who currently works as a
cryptographer and security expert for NeoScale Systems of
Milpitas, California, explains: "What we are saying is
that programs that work are not good enough."
Acceptance of entries for the 19th IOCCC closed at the end of
February, although winners are not likely to be announced for a
few months yet. Past champions have included a flight
simulator, a program that plots the positions of the four
Galilean moons of Jupiter and a text-to-Pig Latin
translator, which has the added attraction of looking like a
pig. There was also the world's shortest self-replicating C
program, a file containing zero bytes of code: When compiled
and run, it prints nothing, an exact copy of its own empty
source. Clearly, the
contestants have a lot of fun formulating their entries.
"We have a lot of fun, too," says Noll.
"That's why we continue to do it."
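The IOCCC winner was a C program, but the idea of a self-replicating program (often called a quine) can be sketched in Python. The string QUINE below holds a classic two-line Python quine, and the hypothetical helper run executes a program's text and captures what it prints, so the sketch can check that the output matches the source character for character. An empty Python program, like the zero-byte C entry, also reproduces itself by printing nothing.

```python
import contextlib
import io

# A classic two-line Python quine: when run, it prints its own source text.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run(program):
    """Execute a program's source text and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue()


if __name__ == "__main__":
    print(QUINE, end="")
    print(run(QUINE) == QUINE)  # prints: True
    print(run("") == "")        # an empty program also prints nothing: True
```

The trick is that the string s serves twice: once as a template and once, via %r, as the quoted data filled into that template.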