COMPUTING SCIENCE
How Many Ways Can You Spell V1@gra?
Spam mutates, and the Internet community mounts an immune response
Brian Hayes
Epidemic and Endemic Spam
What is the long-term outlook for the spam problem? The gloomy view argues that we are caught in a tragedy of the commons. Economics favor the spammer; we may as well scrap e-mail and move on to the next channel of communication. The rosy forecast sees filters improving; so little spam will leak through that sending the stuff will become unprofitable, and the whole enterprise will collapse. (Bill Gates predicted that the problem would be licked by 2006.) The middle path is coexistence. Neither e-mail nor spam is driven to extinction.
Filters surely will improve, and yet there are lots of reasons to think that accuracy has a limit—and it's not 100 percent. For one thing, even people can't achieve perfect accuracy in classifying mail. William Yerazunis of the Mitsubishi Electric Research Laboratories, an expert on text classification, tried the experiment with his own mail and scored 99.84 percent.
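To make that ceiling concrete, here is a small arithmetic sketch of what an accuracy of 99.84 percent implies for error counts. The mailbox size of 10,000 messages is an assumption chosen only for illustration; the accuracy figure is the one quoted above.

```python
from fractions import Fraction

# Accuracy figure quoted in the text: 99.84 percent.
accuracy = Fraction(9984, 10000)

# Hypothetical mailbox size, assumed purely for illustration.
messages = 10_000

# Expected number of misclassified messages at that accuracy.
errors = messages * (1 - accuracy)
print(errors)  # 16
```

Exact fractions are used here instead of floating point so that the result comes out as a whole number without rounding noise.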
Something else to keep in mind is that spammers could choose to improve quality rather than increase quantity. One conclusion I took away from my sodden experience of reading 10,000 spams was that if we can't have less spam, we really need better spam. And there's no reason why it all has to be so monotonous and unpalatable. Just because someone is selling a sleazy, counterfeit and probably illegal product doesn't mean the advertising has to be verbal and visual sludge. On the contrary, it's the worst products that need the best marketing (think of cigarettes). I suppose this is a way of saying that the end of spam is not death but transfiguration.
Finally, one premise of the entire anti-spam industry seems to me highly questionable: the assumption that every spammer's ultimate goal is to slither through the spam filter. As a text-classification system, a filter acts not to block a certain class of mail but rather to sort messages into two categories, the inbox and the spam bin. Most of us look upon the spam bin as nothing more than a dung heap that has to be mucked out every now and then, but someone is finding information of value and interest there, or else spam would already have withered away. Seen from this point of view, a reliable filter serves the interests of the spammer as well as those of the recipient.
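The idea of a filter as a sorter, not a barrier, can be sketched in a few lines of Python. This is a minimal illustration and not any particular product's method: it assumes a toy naive Bayes word model with add-one smoothing, and the function names (`train`, `classify`, `sort_mail`) and the four training messages are all invented for the example. The point to notice is that every message ends up in one of the two bins; nothing is discarded.

```python
import math
from collections import Counter

LABELS = ("spam", "inbox")

def train(examples):
    """Count words per label from (text, label) pairs."""
    counts = {label: Counter() for label in LABELS}
    totals = Counter()
    for text, label in examples:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Return the label with the higher naive Bayes log score."""
    vocab = set(counts["spam"]) | set(counts["inbox"])
    scores = {}
    for label in LABELS:
        n_words = sum(counts[label].values())
        # Log prior: fraction of training messages carrying this label.
        score = math.log(totals[label] / sum(totals.values()))
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def sort_mail(messages, counts, totals):
    """Place every message in exactly one bin; nothing is blocked."""
    bins = {"inbox": [], "spam": []}
    for text in messages:
        bins[classify(text, counts, totals)].append(text)
    return bins

# Invented training data, for illustration only.
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "inbox"),
    ("lunch on monday", "inbox"),
]
counts, totals = train(training)
bins = sort_mail(["buy cheap pills", "monday meeting"], counts, totals)
print(bins)
```

With this toy training set, the first message lands in the spam bin and the second in the inbox, and the spam bin remains available to any reader who chooses to look there.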
Diseases tend to evolve from an epidemic to an endemic state. For the first population exposed, the infection is dire and deadly; later, everyone gets a little sick but survives. It's not really in the pathogen's interest to kill the host; and although the host might well like to exterminate the disease, that seldom happens. The future of spam may be a low-grade fever.
© Brian Hayes
Bibliography
Cockerham, Rob. 2004. There are 600,426,974,379,824,381,952 ways to spell Viagra. http://www.cockeyed.com/lessons/viagra/viagra.html
Gordillo, José, and Eduardo Conde. 2007. An HMM for detecting spam mail. Expert Systems with Applications: An International Journal 33(3):667-682.
Graham, Paul. 2002. A plan for spam. http://www.paulgraham.com/spam.html
Graham, Paul. 2003. Better Bayesian filtering. http://www.paulgraham.com/better.html
Graham-Cumming, John. 2006. Does Bayesian poisoning exist? Spam Bulletin, February 2006. http://www.virusbtn.com/spambulletin/archive/2006/02/index
Hayes, Brian. 2003. Computing science: Spam, spam, spam, lovely spam. American Scientist 91:200-204.
Lee, Seunghak, Iryoung Jeong and Seungjin Choi. 2007. Dynamically weighted hidden Markov model for spam deobfuscation. In Proceedings of the 2007 International Joint Conference on Artificial Intelligence, IJCAI07. http://www.ijcai.org/papers07/Papers/IJCAI07-406.pdf
Lowd, Daniel, and Christopher Meek. 2005. Good word attacks on statistical spam filters. In Second Conference on Email and Anti-Spam, CEAS 2005. http://www.ceas.cc/2005/
Pantel, Patrick, and Dekang Lin. 1998. SpamCop: A spam classification and organization program. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, pp. 95-98. http://www.isi.edu/~pantel/Content/publications.htm
Pu, Calton, and Steve Webb. 2006. Observed trends in spam construction techniques: a case study of spam evolution. In Third Conference on Email and Anti-Spam, CEAS 2006. http://www.ceas.cc/index-2006.html
Sahami, Mehran, Susan Dumais, David Heckerman and Eric Horvitz. 1998. A Bayesian approach to filtering junk e-mail. In AAAI-98 Workshop on Learning for Text Categorization. http://robotics.stanford.edu/users/sahami/papers-dir/spam.ps
Zdziarski, Jonathan A. 2005. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. San Francisco: No Starch Press.