Graph Theory in Practice: Part I
Reach Out and Touch Everyone
A good example of a really big graph comes from telephone billing records. Joan Feigenbaum of the AT&T Shannon Laboratories in Florham Park, New Jersey, heads a group working with a graph known as the call graph. The vertices are telephone numbers, and the edges are calls made from one number to another. A specific call graph recently analyzed by James M. Abello, P. M. Pardalos and M. G. C. Resende of AT&T has 53,767,087 vertices and more than 170 million edges.
The call graph is actually a directed multigraph—directed because the two ends of a call can be distinguished as originator and receiver, a multigraph because a pair of telephones can exchange more than one call in a day. For ease of analysis, these aspects of the graph are sometimes ignored: Sets of multiple edges are collapsed into a single edge, and the graph is treated as if it were undirected. (The graph also has some 255 self-loops, which I find rather puzzling. I seldom call myself, and it's never long-distance.)
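The simplification described above can be sketched in a few lines of Python. This is only an illustration, not the AT&T group's code: the function name `simplify`, the sample records, and the choice to count self-loops and set them aside rather than keep them are all assumptions made for the example.

```python
def simplify(calls):
    """Collapse a directed multigraph into an undirected simple graph.

    `calls` is an iterable of (originator, receiver) pairs.
    Returns the set of undirected edges and the number of self-loops seen.
    """
    edges = set()
    self_loops = 0
    for originator, receiver in calls:
        if originator == receiver:
            # Assumption: self-loops are counted and set aside.
            self_loops += 1
            continue
        # A frozenset ignores direction, and the enclosing set
        # merges repeated calls between the same pair of numbers.
        edges.add(frozenset((originator, receiver)))
    return edges, self_loops


# Hypothetical sample records: A and B exchange three calls, C calls itself.
calls = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "C"), ("B", "D")]
edges, self_loops = simplify(calls)
```

Here the three calls between A and B become a single undirected edge, and the call from C to itself is tallied separately.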
The first challenge in studying the call graph is that you can't swallow it whole. Even though the analysis was done on a computer with six gigabytes of main memory, the full graph would not fit. Under these conditions most algorithms are ruinously inefficient, because pieces of the graph have to be repeatedly shuttled between memory and disk storage. The call graph has therefore become a test-bed for algorithms designed to run quickly on data held in external storage.
What did Abello, Pardalos and Resende learn about the call graph? It is not a connected graph but has 3.7 million separate components, most of them tiny; three-fourths of the components are pairs of telephones that called only each other. Yet the call graph also has one giant connected component, with 44,989,297 vertices, or more than 80 percent of the total. The emergence of a giant component is characteristic of Erdős-Rényi random graphs, but the pattern of connections in the call graph is surely not random. Some models that might describe it will be taken up in Part II of this article, to appear in the March–April issue.
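For readers who want to see how components are counted, here is a minimal sketch using a union-find (disjoint-set) structure. It is a standard textbook technique, not necessarily the method used in the AT&T analysis, and the name `component_sizes` and the toy graph are invented for the example. One feature that suits a graph too large for memory: only the vertex table is held in memory, and the edges are read once, in any order, so they could in principle be streamed from disk.

```python
from collections import Counter


def component_sizes(vertices, edges):
    """Return the sizes of all connected components, largest first."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    counts = Counter(find(v) for v in vertices)
    return sorted(counts.values(), reverse=True)


# Toy graph: a chain of four telephones, one isolated pair, one loner.
vertices = range(1, 8)
edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
sizes = component_sizes(vertices, edges)
giant_fraction = sizes[0] / sum(sizes)
```

Applied to the figures quoted above, the same ratio is 44,989,297 divided by 53,767,087, or roughly 83.7 percent.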
Abello and his colleagues went hunting within the call graph for structures called cliques, or complete graphs. They are graphs in which every vertex is joined by an edge to every other vertex. Identifying the largest such structure—the maxclique—is computationally difficult even in a graph of moderate size. In the call graph, the only feasible strategy is a probabilistic search that finds large cliques without proving them maximal. Abello et al. found cliques of size 30, which are almost surely the largest. Remarkably, there are more than 14,000 of these 30-member cliques. Each clique represents a distinct group of 30 individuals in which everyone talked with everyone else at least once in the course of a day.
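The flavor of a probabilistic clique search can be conveyed by a simple randomized greedy procedure with restarts. This is a sketch of the general idea only; it should not be read as the algorithm Abello et al. used, and the function names, the number of restarts and the toy graph are assumptions made for illustration. Like any such search, it returns a clique that is large but carries no proof that nothing larger exists.

```python
import random


def greedy_clique(adj, rng):
    """Grow one clique by repeatedly picking a random compatible vertex."""
    candidates = sorted(adj)
    clique = []
    while candidates:
        v = rng.choice(candidates)
        clique.append(v)
        # Keep only vertices adjacent to everything chosen so far.
        candidates = [u for u in candidates if u in adj[v]]
    return clique


def best_clique(adj, tries=200, seed=0):
    """Run the greedy search many times and keep the largest clique found."""
    rng = random.Random(seed)
    best = []
    for _ in range(tries):
        found = greedy_clique(adj, rng)
        if len(found) > len(best):
            best = found
    return best


def is_clique(adj, members):
    """Check that every pair of members is joined by an edge."""
    return all(u in adj[v] for v in members for u in members if u != v)


# Toy graph: a complete graph on vertices 0..3 with a tail 3-4-5 attached.
edge_list = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {v: set() for v in range(6)}
for a, b in edge_list:
    adj[a].add(b)
    adj[b].add(a)

clique = best_clique(adj)
```

On a graph this small the search finds the four-vertex clique almost immediately; on a graph with tens of millions of vertices the same idea would need far more care about memory and about where to start looking.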