Andrew Odlyzko

The Laws of the Web: Patterns in the Ecology of Information. Bernardo A. Huberman. x + 105 pp. The MIT Press, 2001. $24.95.

The Internet Galaxy: Reflections on the Internet, Business, and Society. Manuel Castells. xii + 292 pp. Oxford University Press, 2001. $25.

We live in a world full of patterns. Some are unavoidable, forced on us by mathematics. For example, any group of six people will include one of the following: at least three people who all know each other or at least three who are all unacquainted. Other patterns come from probability and are overwhelmingly likely, but not unavoidable. A good illustration is the fact that in a typical group of 30 people, the chances that some pair will have the same birthday are close to 70 percent.
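Both kinds of pattern can be checked by direct computation. The following is a minimal sketch, not anything taken from the books under review: it assumes 365 equally likely birthdays (no leap years, no seasonal effects), and the function names are invented for illustration. The first function multiplies out the probability that all birthdays in a group are distinct; the second tries every possible pattern of acquaintance among a small group and reports whether a trio of mutual acquaintances or mutual strangers always appears.

```python
from itertools import combinations, product


def birthday_collision_probability(group_size: int, days: int = 365) -> float:
    """Probability that at least two people in the group share a birthday.

    Assumes every day of the year is equally likely (an idealization).
    """
    p_all_distinct = 1.0
    for k in range(group_size):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def trio_is_unavoidable(people: int) -> bool:
    """Exhaustively check every acquaintance pattern among `people` persons.

    Returns True only if each pattern contains three people who all know
    each other or three who are all strangers to each other.
    """
    pairs = list(combinations(range(people), 2))
    trios = list(combinations(range(people), 3))
    for pattern in product((False, True), repeat=len(pairs)):
        knows = dict(zip(pairs, pattern))
        has_trio = any(
            knows[(a, b)] == knows[(a, c)] == knows[(b, c)]
            for a, b, c in trios
        )
        if not has_trio:
            return False  # found a group with no such trio
    return True


if __name__ == "__main__":
    print(f"P(shared birthday, 30 people) = {birthday_collision_probability(30):.3f}")
    print("Trio unavoidable among 6 people:", trio_is_unavoidable(6))
    print("Trio unavoidable among 5 people:", trio_is_unavoidable(5))
```

With six people there are only 15 pairs, hence 32,768 acquaintance patterns, so the exhaustive check finishes almost instantly. Running it for five people shows that six is the smallest group size for which the pattern is forced.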

Still other patterns are just as pervasive but less well understood—and often less precise: Benford's Law states that in most large collections of numbers (such as lists of credit card charges or populations of cities), the initial digit will be 1 about 30 percent of the time. There are convincing quantitative arguments as to why this should be so, with deviations from the prediction diminishing as the collection of numbers increases in size. Other patterns are even less precise, serving only as rough guides, such as Pareto's Principle (the 80-20 rule), which states that 20 percent of the people get 80 percent of the income, 20 percent of the students in a class take up 80 percent of the teacher's time, and so on. Zipf's Law says that in large collections of data (such as all the words in a newspaper or a novel, or the populations of the cities in a given country), the second most popular item will occur about half as often as the most popular one, and the third most popular item about a third as often as the most popular one.
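The figures quoted for Benford's Law and Zipf's Law follow from two short formulas. The sketch below uses the standard textbook statements of both laws, which is an assumption on my part rather than a formulation taken from Huberman's book: the chance that the leading digit is d is taken to be log10(1 + 1/d), and the frequency of the item of rank r, relative to the most popular item, is taken to be 1/r. The function names and the `exponent` parameter are illustrative only.

```python
import math


def benford_first_digit_probability(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


def zipf_relative_frequency(rank: int, exponent: float = 1.0) -> float:
    """Frequency of the item at `rank`, relative to the top-ranked item.

    exponent=1.0 gives the classic form of Zipf's Law; real data sets
    often fit a somewhat different exponent.
    """
    if rank < 1:
        raise ValueError("rank must be at least 1")
    return 1.0 / rank ** exponent


if __name__ == "__main__":
    for d in range(1, 10):
        print(f"leading digit {d}: {benford_first_digit_probability(d):.3f}")
    for r in (1, 2, 3):
        print(f"rank {r}: {zipf_relative_frequency(r):.3f} of the top item")
```

The nine Benford probabilities sum to 1, and the value for a leading 1 is the "about 30 percent" cited above. Both laws describe tendencies in large collections; actual data will match these numbers only approximately.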

Given the prevalence of patterns throughout our world, it's not surprising that the World Wide Web should also exhibit statistical regularities. Bernardo Huberman, currently a fellow at Hewlett-Packard Laboratories, has been a pioneer in finding them. The Laws of the Web is a slim volume based on his published research, which was mostly done in collaboration with others during the decade or so he spent at Xerox's Palo Alto Research Center. Written for a nontechnical audience, the book omits the statistical analyses and mathematical models of the original papers, aiming just to convey the flavor of the discoveries.

There are indeed patterns in the Web itself, and in how we use it. For example, a few sites have large numbers of pages, and most sites have very few, with the statistical distribution following a universal law. The sites we frequent are similarly distributed, with a few sites visited extremely often and others rarely. Patterns in how people surf a single site can be used to advantage by site owners.

Although Huberman is successful in conveying a sense of the patterns found on the Web, he has a tendency to jump to conclusions and make exaggerated claims. Finding a pattern in the Internet or Internet usage and a model that produces that pattern does not necessarily mean that the pattern came about because the Internet fits the model, as Walter Willinger and colleagues point out in "Scaling phenomena in the Internet: Critically examining criticality," which appeared recently in Proceedings of the National Academy of Sciences (99[suppl. 1]:2573–2580, February 19, 2002). Very often a more detailed investigation will reveal deeper patterns that put initial conclusions in a different light. For example, in chapter 8, "Markets and the Web," Huberman discusses work he did with Lada Adamic that found a striking example of the "rich get richer" phenomenon: A very small fraction of Web sites receive a disproportionately large share of traffic. This appears to dash hopes that the Internet will lead to less concentration than regular commerce. Adamic and Huberman developed a model that produces this type of behavior. However, in another recent PNAS article, "Winners don't take all: Characterizing the competition for links on the web" (99[8]:5207–5211, April 16, 2002), David M. Pennock and others paint a somewhat different picture: Although the global distribution of links among all Web pages is indeed highly skewed, some categories, such as pages at universities, show much less biased distributions. Pennock and his coauthors propose a more complicated model that accounts for both the category-specific and the global distributions that have been observed. There will undoubtedly be further research in this area.

For another example of the need to treat the book's conclusions with caution, consider the "social dilemmas" Huberman discusses in chapter 6. He and Eytan Adar investigated the behavior of users of the Gnutella system for sharing music files, observing that "upwards of 70 percent" of users were "free riders," enjoying the service without contributing to its content. Huberman suggests that Gnutella might collapse even without any action by the music industry, as the "tragedy of the commons" takes over. Yet this possibility seems very remote to me. The Pareto 80-20 rule is widely applicable, and it is not rare to have fewer than 30 percent of users contribute significantly to the maintenance of a system. For example, just a tiny fraction of 1 percent of users of Linux contribute to the software. Thus it is risky to suggest that Gnutella could collapse on account of its free riders. Thought-provoking as The Laws of the Web is, it is by no means definitive.

The Internet Galaxy: Reflections on the Internet, Business, and Society is based on a series of lectures given at the University of Oxford by Manuel Castells, a distinguished sociologist at the University of California, Berkeley. This book is much shorter, more digestible and more current than Castells's landmark three-volume work, The Information Age: Economy, Society, and Culture.

Castells presents a fair and balanced view of how society is adapting to the unprecedented opportunities offered by the Internet. He avoids both the exaggerated hype common just a couple of years ago and the extremely negative views, now beginning to emerge, that deny any special role for the Internet. The book covers issues of privacy, liberty, the "New Economy" (and whether there is such a thing), virtual communities, the digital divide and the culture of the Internet.

If The Internet Galaxy has any important deficiency, it is a lack of appreciation of telecommunication networks other than the Internet. Castells points out that the Web has grown from 16 million users at the end of 1995 to more than 400 million in early 2001. This is an explosive growth rate. Yet during that same period, the number of telephone land lines in the world grew from about 700 million to slightly more than 1 billion. (Since land lines are often used by several people, the number of new telephone users far exceeded the number of new Internet users.) Even more spectacularly, the number of mobile phone subscribers in this period grew from about 90 million to about 950 million. Thus most of the networking that has taken place over the last few years has been of the traditional voice telephony variety. In the United States today, when offered the choice, most people vote with their pocketbooks for extremely narrowband wireless phones over comparably priced digital subscriber line (DSL) or cable modem links. It is advisable to take this into account when evaluating the impact and future prospects of the Internet.

The Internet Galaxy provides a good overview of the sociology of information, and the chapters have very useful reference lists. (One could only wish that more of the works cited were available online.) But it is likely best read with The Information Age at hand to provide more detail when needed. Still, fans of Castells's magnum opus are likely to find the newer book a welcome survey and update on his thinking over the last few years.

Andrew M. Odlyzko, Digital Technology Center and Department of Mathematics, University of Minnesota

