How Many Ways Can You Spell V1@gra?
Spam mutates, and the Internet community mounts an immune response
Antibodies to Spam
The spam we see today is shaped in many ways by our own efforts to combat it. The process is often likened to an arms race, with threats met by countermeasures, which then bring countercountermeasures, and so on. I prefer an immunological metaphor, where the contest is between a host organism and pathogens or parasites, and where both sides have to adapt and evolve in order to survive. In the case of bacteria and viruses, the vast majority never make it, but nature is profligate and can afford such high attrition; likewise spammers find it worth their while to send a million e-mails for a handful of responses.
Some organisms have "hard-wired" resistance to infection; they produce molecules—natural antibiotics—that inhibit the growth of certain bacteria. The mammalian immune system works differently; we are not born with specific defenses against Salmonella or measles. Instead, a random shuffling mechanism generates a vast array of defensive molecules, which have the potential to attack virtually anything they might encounter in the environment. Before going into action, however, the system must learn to distinguish friend from foe. This strategy has a cost: Because learning is a slow process, you may well get sick the first time you are exposed to an infectious agent. But the alternative of relying on a predetermined list of potential threats would be even more perilous, since any novel pathogen would meet no resistance at all.
The option of exploiting random variation is also available to the opposition. Indeed, the pathogens that pose the greatest danger of epidemic outbreaks are those that mutate rapidly and randomly, changing their outward appearance to evade immune-system surveillance.
It's easy to draw parallels between these biological concepts and the co-evolution of spam and antispam technologies. When the first unwanted bulk e-mails appeared, the recipients deleted them manually. As the volume increased (and along with it the level of irritation), savvy network users wrote simple programs to automate the deletion process. These early filtering programs, many of which were created with the Unix tool procmail, relied on static, hand-crafted rules to recognize spam. For example, a message might be rejected if the phrase "Free softwares!!" appeared in the subject line. The weakness of this system is that new rules are needed when the next spam advertises "Cheap softwares!!"
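The brittleness of static rules can be seen in a few lines of code. The sketch below is in Python, not actual procmail recipe syntax, and the rule list and the function name `is_spam_static` are hypothetical, chosen only for illustration. It rejects a message when a hand-written pattern matches the subject line.

```python
import re

# Hypothetical hand-crafted rules: each is a pattern tested against the subject.
STATIC_RULES = [re.compile(r"free softwares!!", re.IGNORECASE)]

def is_spam_static(subject, rules=STATIC_RULES):
    """Return True if any hand-written rule matches the subject line."""
    return any(rule.search(subject) for rule in rules)

caught = is_spam_static("Free softwares!!")   # matches the one rule we wrote
missed = is_spam_static("Cheap softwares!!")  # no rule yet, so it slips through
```

Catching the second message means adding another rule by hand, and so on for every later variation.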
The procmail approach to spam filtering corresponds to the biological strategy of synthesizing a separate antibiotic for each type of bacterial infection. A more versatile filter, analogous to the mammalian immune system, can learn to recognize virtually any category of message, based on whatever characteristics of the text happen to be most salient. These distinctive markers are the counterparts of antigenic sites, or epitopes, on the protein molecules that label a virus or bacterium as foreign. The adaptive spam filter doesn't work from a predefined list of suspect phrases but rather discovers the most telling signs by exposure to spam and legitimate e-mail.
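One way such learning might work, offered here as an assumption since the text does not name a specific algorithm, is to count how often each word appears in known spam and in known legitimate mail, then score new messages by summing per-word log-odds (a naive Bayes style calculation). The class `AdaptiveFilter`, the tokenizer, and the tiny training set are all hypothetical; a real filter would train on thousands of messages and would also weigh how common spam is overall.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase tokens, keeping a few symbols spammers favor."""
    return re.findall(r"[a-z0-9@!$']+", text.lower())

class AdaptiveFilter:
    def __init__(self):
        # For each class, the number of training messages containing each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Learn from one example; label is 'spam' or 'ham' (legitimate mail)."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def token_score(self, token):
        """Log-odds that a token signals spam, with add-one smoothing."""
        p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
        p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
        return math.log(p_spam / p_ham)

    def score(self, text):
        """Sum the log-odds of the distinct tokens; positive leans toward spam."""
        return sum(self.token_score(t) for t in set(tokenize(text)))

    def is_spam(self, text):
        return self.score(text) > 0

spam_filter = AdaptiveFilter()
spam_filter.train("Free softwares!! Buy now", "spam")
spam_filter.train("Cheap softwares!! Buy now", "spam")
spam_filter.train("Meeting notes for the software review", "ham")
spam_filter.train("Lunch on Friday?", "ham")
```

Nothing in this code lists suspect phrases in advance. Words such as "buy" and "now" acquire their weight only because they turned up in the spam examples and not in the legitimate ones.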
Whatever the mechanism of the filter, the spam writer can respond by varying the message. If e-mail containing the word "Viagra" is blocked, there are other ways of getting the idea across, including synonyms and circumlocutions ("sildenafil citrate," "impotence meds," "the little blue pill"). An adaptive filter will soon flag these terms as well, but by then the spammer can move on to other options. For some kinds of variation—such as obfuscatory misspelling along the lines of "V1@gra"—computational methods could automate the generation of random variants.
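A minimal sketch of how such automated misspelling might work appears below. The substitution table is an assumption made up for illustration, not a catalog of what spammers actually use, and the function names are hypothetical. Each letter is given a short list of look-alike characters; the number of distinct spellings is the product of the list lengths.

```python
import itertools
import math
import random

# Hypothetical look-alike substitutions; letters not listed stay unchanged.
SUBSTITUTIONS = {
    "a": ["a", "@", "4"],
    "i": ["i", "1", "!"],
    "g": ["g", "9"],
}

def options_for(word):
    """For each character in the word, list the characters that may replace it."""
    return [SUBSTITUTIONS.get(ch.lower(), [ch]) for ch in word]

def count_variants(word):
    """Number of distinct spellings: the product of the per-letter choices."""
    return math.prod(len(choices) for choices in options_for(word))

def all_variants(word):
    """Lazily generate every spelling the table allows."""
    return ("".join(combo) for combo in itertools.product(*options_for(word)))

def random_variant(word, rng=random):
    """Pick one spelling at random, as a spam generator might for each message."""
    return "".join(rng.choice(choices) for choices in options_for(word))

total = count_variants("Viagra")  # 1 * 3 * 3 * 2 * 1 * 3 = 54
```

Even this three-entry table yields 54 spellings of one six-letter word, "V1@gra" among them. The count multiplies with every substitution added, which is why a list of exact strings cannot keep pace.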