COMPUTING SCIENCE
Connecting the Dots
Can the tools of graph theory and social-network studies unravel the next big plot?
Brian Hayes
Who Calls Whom
The NSA is the U.S. espionage service with responsibility for
cryptography and "signals intelligence." Although its
budget and staffing are secret, it is often said to be the largest
of the U.S. intelligence agencies and also, incidentally, the
largest employer of mathematicians in the United States and perhaps
in the world. And it is assumed to possess prodigious computing resources.
Exploration of the call graph belongs to the branch of signals
intelligence known as traffic analysis. In a battlefield situation,
you might intercept an enemy's radio transmissions but be unable to
read their encrypted content. Nevertheless, just counting the
messages can yield valuable information. A flurry of activity might
signal an impending troop movement; sudden radio silence could be
even more ominous. If you can identify the source and the intended
recipient of each message—in effect, constructing a call
graph—you can learn even more, since lines of communication
often reveal something about the organization of a military force.
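To make the idea concrete, here is a minimal sketch of traffic analysis on a toy intercept log. The station names and the log itself are invented for illustration; the only inputs are who sent a message to whom, never the content.

```python
from collections import Counter, defaultdict

# Hypothetical intercept log: (sender, recipient) pairs only.
# The encrypted message bodies are assumed to be unreadable.
intercepts = [
    ("HQ", "UNIT_A"),
    ("HQ", "UNIT_B"),
    ("HQ", "UNIT_A"),
    ("UNIT_A", "HQ"),
    ("UNIT_B", "UNIT_C"),
]

def build_call_graph(messages):
    """Return a directed, weighted graph: edges[sender][recipient] = message count."""
    edges = defaultdict(Counter)
    for sender, recipient in messages:
        edges[sender][recipient] += 1
    return edges

def traffic_volume(messages):
    """Count the messages each station takes part in, as sender or recipient."""
    volume = Counter()
    for sender, recipient in messages:
        volume[sender] += 1
        volume[recipient] += 1
    return volume

graph = build_call_graph(intercepts)
volume = traffic_volume(intercepts)

# The station involved in the most traffic is a plausible command node.
busiest = volume.most_common(1)[0][0]
print(busiest, dict(graph[busiest]))
```

Even in this tiny example the station with the most traffic stands out, which is the sense in which lines of communication hint at an organization's structure.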
The search for meaningful patterns in telephone records could rely
on similar principles, but the problem is much harder. In the
military situation, messages between enemy units are readily
identified as such. In the telephone database, calls among a few
dozen conspirators would all too easily get lost in the background
noise of other conversations.
The records in the call database are collected not for the sake of
national security but for mundane commercial purposes. In order to
send you an itemized bill at the end of the month, a phone company
needs to keep track of every call completed, with the originating
and receiving phone numbers and the starting and ending times. The
largest companies handle roughly 250 million toll calls a day, and
so a month's worth of data amounts to several billion call records.
AT&T reports that its database of retained records is
approaching two trillion calls and more than 300 terabytes of data.
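The figures above invite a back-of-the-envelope check. The sketch below models a call record with just the four fields named in the text and works through the arithmetic. The field names, the sample phone numbers, the 30-day month and the decimal definition of a terabyte are all assumptions made for illustration, not details of any phone company's actual format.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CallRecord:
    """One billing record; field names are hypothetical."""
    originating_number: str
    receiving_number: str
    start: datetime
    end: datetime

    @property
    def duration_seconds(self) -> float:
        return (self.end - self.start).total_seconds()

# Made-up example record.
example = CallRecord(
    "555-0100",
    "555-0199",
    datetime(2006, 5, 1, 9, 0, 0),
    datetime(2006, 5, 1, 9, 3, 30),
)

# Monthly volume for a carrier handling 250 million toll calls a day.
CALLS_PER_DAY = 250_000_000
DAYS_PER_MONTH = 30  # assumption: a 30-day month
calls_per_month = CALLS_PER_DAY * DAYS_PER_MONTH

# Average storage per record implied by the AT&T figures.
TOTAL_RECORDS = 2 * 10**12   # "approaching two trillion calls"
TOTAL_BYTES = 300 * 10**12   # "more than 300 terabytes", decimal TB assumed
bytes_per_record = TOTAL_BYTES / TOTAL_RECORDS

print(example.duration_seconds)
print(f"{calls_per_month:,} calls per month")
print(f"about {bytes_per_record:.0f} bytes per record")
```

Under these assumptions a month comes to 7.5 billion records, and the retained database averages on the order of 150 bytes per call, which is consistent with a record holding little more than two phone numbers and two timestamps.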
Apart from billing, the call graph has other uses within the phone
company—some of which are not too different from what the NSA
may be doing, and almost as secretive. Historical calling patterns
can be used to detect fraud, and some patterns are also of interest
in marketing. For example, a company that offers a discounted rate
within a "calling circle" can use information from the
call graph to estimate the costs and benefits of the program.
In principle, the same kind of traffic data found in telephone
call-detail records could also be compiled for other communications
channels. For instance, Federal Express and other courier services
keep digitized records of their deliveries, which could readily be
transformed into a database of senders and receivers. Curiously, the
most digital medium of all—the Internet—does not provide
for routine retention of who-speaks-to-whom data; there's no direct
need for it, since customers do not pay by the message. However,
there is no technological barrier to collecting detailed statistics
on e-mail messages and other kinds of Internet traffic. A
"packet sniffer" installed on the network backbone would
simply need to scan the headers of messages and record the
"to" and "from" addresses. (It's even possible that
equipment reportedly installed by the NSA at certain Internet
switching centers could have this purpose.)
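As a rough illustration of how little is needed to extract who-speaks-to-whom data from e-mail, the sketch below reads only the headers of a single message and emits sender-recipient pairs. It uses Python's standard email package; the sample message and the function name extract_edges are invented, and a real sniffer would first have to reassemble messages from network packets, a step omitted here.

```python
from email.parser import HeaderParser
from email.utils import getaddresses, parseaddr

# Hypothetical raw message as it might appear in transit.
RAW_MESSAGE = """\
From: Alice <alice@example.org>
To: Bob <bob@example.net>, carol@example.com
Subject: lunch

The body is never examined.
"""

def extract_edges(raw):
    """Return (sender, recipient) address pairs from the message headers."""
    # HeaderParser stops parsing at the end of the headers.
    headers = HeaderParser().parsestr(raw)
    _, sender = parseaddr(headers.get("From", ""))
    recipients = [addr for _, addr in getaddresses(headers.get_all("To", []))]
    return [(sender, recipient) for recipient in recipients]

edges = extract_edges(RAW_MESSAGE)
for sender, recipient in edges:
    print(sender, "->", recipient)
```

Each pair is one edge of an e-mail analogue of the call graph; accumulating such pairs over time would yield the same kind of traffic data found in telephone call-detail records.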