Graph Theory in Practice: Part II
Sparseness, clustering and small diameter are not the only distinctive properties of large real-world graphs. Another trait that has attracted notice is the degree sequence—the number of vertices with each possible number of edges from 0 to n-1.
A lattice has a simple degree sequence. All the vertices have the same number of edges, and so a plot of the degree sequence consists of a single sharp spike. Any randomness in the graph broadens this peak. In the limiting case of an Erdos-Rényi graph, the degree sequence has a Poisson distribution, which falls off exponentially away from the peak value. Because of this exponential decline, the probability of finding a vertex with k edges becomes negligibly small for large k.
There is evidence that real graphs such as the Web behave differently. The distribution of degrees is described by a power law rather than an exponential. That is, the number of vertices of degree k is given not by e-k (an exponential) but by k-γ (a power law, where the power γ is a positive constant). The power-law distribution falls off more gradually than an exponential, allowing for vertices of very large degree.
Several research groups have independently discovered this property of the degree sequence. A group that includes Kleinberg and several workers from the IBM Almaden Research Center found evidence of a power law in the Web, and so did Adamic and Bernardo A. Huberman of the Xerox Palo Alto Research Center. Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos observed power-law relationships in the hardware substrate of the Internet.
Still another group that recognized a power law in the statistics of the Web consists of Albert-László Barabási of Notre Dame University and his colleagues Réka Albert and Hawoong Jeong. Barabási, Albert and Jeong set out to estimate the diameter of the Web (or, strictly speaking, the characteristic path length), and found that two randomly chosen pages are about 19 mouse-clicks apart. In the course of this analysis they tabulated the degree sequence, observing that the probability that a page has links to k other pages is approximately k-2.45; the probability that k pages point to a given page is k-2.1.
To explain the power-law degree sequence, Barabási, Albert and Jeong have proposed a new random graph model. They argue that other models fail to take into account two important attributes of the Web. In the first place, the Web is continually sprouting new pages, but most models are static: Although edges can be added or rearranged, the number of vertices never changes. Second, both the Erdos-Rényi and the Watts-Strogatz processes assume uniform probabilities when creating new edges, but this is not very realistic either. Barabási and his coworkers note that Web pages that already have many links are more likely to acquire still more links.
Like the Erdos-Rényi model, the Barabási model starts with n vertices and no edges, but the evolution is different. At every step the graph adds a single new vertex and m edges; all the new edges link the new vertex to some of the vertices already present. The probability that a given vertex will receive a new edge is proportional to the share of the total set of edges that the vertex already owns; hence the well-connected become still better-connected. After t steps, the graph has n+t vertices and mXt edges. Growing according to these rules, the graph attains a statistical steady state: The shape of the distribution of node degrees does not change over time. The distribution is described by a power law with an exponent of 3; in other words, the probability of finding a vertex with k edges is proportional to k-3.
Barabási and his colleagues have tested their model on several large graphs, including those discussed by Watts and Strogatz—the Hollywood graph, an electric power grid and the neural network of C. elegans. They find that both of the novel features of the model are essential to its success; eliminating either growth in the vertex set or preferential attachment of edges impairs the model's performance.
The straightforward way that the growth of a Barabási graph mimics the growth of the Web gives the model strong intuitive appeal, and yet the correspondence of theory and observation is not quite as close as one might hope. As noted above, the actual γ exponents for the Web are about 2.45 for outward links and 2.1 for inward links, significantly different from the model's prediction of 3. For some other graphs, such as the C. elegans network, the discrepancy is even greater. Various adjustments could tune the model for a better match, but they inevitably sacrifice some of its simplicity.