MY AMERICAN SCIENTIST
SEARCH

COMPUTING SCIENCE

# Connecting the Dots

Can the tools of graph theory and social-network studies unravel the next big plot?

Clique Here

Only the NSA knows whether it can actually spot such patterns, but a somewhat simpler problem can serve as a proxy in estimating the difficulty of the task. The proxy problem is that of finding a large clique within the call graph. This process was studied in the late 1990s by James Abello, then at AT&T Bell Laboratories, and his colleagues.

Finding the largest clique in a graph is a classic hard problem. The brute-force method simply examines every possible subset of vertices and checks to see if all of them are connected. The number of subsets grows so rapidly that this algorithm runs out of oomph even for a graph with 50 vertices; it would be unthinkable for 50 million. The only practical alternatives are approximate and probabilistic methods, which usually converge on a good solution but can't promise to find the best one.

The graph used for Abello's experiments included a single day's records; it had 53,767,087 vertices (corresponding to telephone numbers) and more than 170 million edges (representing calls). The algorithm started with a small clique and tried to build a bigger one. In a first stage, the program repeatedly searched for a new vertex connected with all those already in the set. When no more vertices of this kind could be found, the program switched to another strategy, looking for opportunities to remove one vertex from the clique in exchange for adding two others. Grinding through the day's database took about five hours on a machine with four processors and four gigabytes of memory.

The largest cliques found had 30 vertices. Consider what this means: In a group of 30 telephone numbers, each one either called or was called by all of the other 29 numbers in the course of a single day, for a total of at least 435 calls. That's quite a busy calling circle! But there wasn't just one such clique: Abello's group found more than 14,000 distinct cliques of size 30.

The curious result of this experiment suggests several conclusions, all of them tentative. First, the computational power needed for analyzing call graphs appears to be readily available—though it's not yet on every desktop. Abello's experiments were able to chew through one day's worth of data; today's hardware could doubtless manage much larger bites.

Second, it looks like finding instances of a given pattern within the call graph is not the problem. The problem is defining a pattern selective enough to identify a target group without also branding 14,000 others as possible terrorists. The algorithms must somehow distinguish a few dozen people intent on mayhem from other groups of the same size and structure who are planning a family reunion, canvassing the neighborhood for a lost cat, running for city council or war-dialing to win free concert tickets from a radio station.

No matter what methods are brought to bear on the problem, the intelligence agencies face a formidable task: To survey a huge population (potentially all six billion of us) looking for a tiny subgroup (those planning violence). It's analogous to screening for a rare disease. Even if the test is right 99 percent of the time, almost all the positive results will necessarily be false positives.

Abello (who is now with DIMACS, the Center for Computer Science and Discrete Mathematics at Rutgers University, and with Ask.com) thinks that the task of tracing terrorists through call graphs would be difficult, but he also notes that additional information can be extracted from the graphs, apart from the simple pattern matching I have described here. For example, cliques and other clusters have interesting dynamics; some persist from day to day but others vanish. Information like this might help to distinguish one kind of group from another. Abello also mentions extensive recent work on identifying self-organized communities in other contexts, from chat-room participants to eBay customers.