Graph Theory in Practice: Part II
The Small World of Large Graphs
The various gigantic graphs that have lately attracted notice share other properties besides sheer size. In particular:
They tend to be sparse. The graphs have relatively few edges, considering their vast numbers of vertices. In a graph with n vertices, the maximum number of edges is n(n-1)/2, or roughly n2/2. (I consider here only "simple" graphs, as opposed to multigraphs, where more than one edge can join a pair of vertices.) In large real-world graphs, the number of edges is generally closer to n than to n2/2. For example, the Hollywood graph has 13 million edges connecting its 225,000 vertices. That sounds like a lot, but it falls far short of the 25 billion edges in a "complete graph," or "clique," where an edge joins every pair of vertices.
They tend to be clustered. In the World Wide Web, two pages that are linked to the same page have an elevated probability of including links to one another. Likewise among friends, if two people both know you, there's a higher-than-normal chance they also know each other. Thus the edges of the graph are not distributed uniformly but tend to form clumps or knots.
They tend to have a small diameter. The diameter of a graph is the longest shortest path across it, or in other words the length of the most direct route between the most distant vertices. Diameter is finite only for connected graphs—those that are all in one piece. A connected graph must have at least n-1 edges, and its largest possible diameter is n-1. At the opposite extreme, a complete graph, with n2/2 edges, has a diameter of 1, since you can get from any vertex to any other in a single step. Graphs nearer to the minimum than the maximum number of edges might be expected to have a large diameter. Clustering could increase the diameter further still, since edges used up in creating local clumps leave fewer edges available for long-distance connections. Nevertheless, the diameter of the Web and other big graphs seems to hover around the logarithm of n, which is much smaller than n itself.
Graphs with the three properties of sparseness, clustering and small diameter have been termed "small-world" graphs, after the familiar cocktail-party experience of making a new acquaintance in a distant city and discovering you have a friend in a common. The name was introduced by Duncan J. Watts and Steven H. Strogatz of Cornell University, whose 1998 paper in Nature discussed the Hollywood graph and several other examples. Watts, who has since moved on to the Santa Fe Institute and MIT, describes the graphs more fully in a recent book titled Small Worlds.