How much information does it take to single out one person among billions?
Anonymous But Well Known
In a recent critique of Sweeney’s re-identification work, Daniel C. Barth-Jones of Columbia University points out that a combination of attributes can’t be proved unique without a “perfect population register,” which lists the corresponding attributes of every person in the population. A perfect register is seldom available. Voter rolls are not even close to complete because not everyone registers to vote. In the absence of a perfect register, an identification is a matter of probabilities: an assertion that coincidence is unlikely but not impossible.
The same argument applies to other identifying traits. I can’t be certain that my fingerprints or my DNA are unique because I can’t compare them with everyone else’s. Nevertheless, such biometric markers are used routinely in contexts where misidentification would have the gravest consequences. Of course, the probability of uniqueness for fingerprints is thought to be very high, certainly higher than the 60 percent calculated for a combination of gender, zip code, and birth date. One hopes that no one will be sent to jail on the basis of a match to those three facts.
The standard of proof is quite different when the aim is preserving privacy rather than convicting an accused criminal. If you promise confidentiality to the subjects of a medical experiment, even a tentative identification represents a breach of trust.
Last year Sweeney and two colleagues published a follow-up study based on documents from the Personal Genome Project, where people voluntarily post their own genomic data for public access, annotated with whatever personal information they choose to disclose. Among 579 files that included gender, zip code, and birth date, Sweeney’s group was able to match 130 to unique entries in voter lists; the Genome Project administrators confirmed that at least 121 of those names were correct.
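The matching step can be pictured with a small sketch. This is only an illustration of the general idea, not the procedure Sweeney’s group actually used: the field names, the toy records, and the `link_unique` helper are all hypothetical. A profile counts as matched only when exactly one entry in the voter list shares its gender, zip code, and birth date.

```python
from collections import defaultdict

# Hypothetical field names for the three attributes discussed in the text.
QUASI_IDENTIFIERS = ("gender", "zip", "birth_date")


def key_of(record):
    """Return the (gender, zip, birth_date) tuple for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_unique(profiles, voters):
    """Map profile ids to voter names, keeping only one-to-one matches."""
    by_key = defaultdict(list)
    for voter in voters:
        by_key[key_of(voter)].append(voter)

    matches = {}
    for profile in profiles:
        candidates = by_key.get(key_of(profile), [])
        if len(candidates) == 1:  # ambiguous or missing keys are skipped
            matches[profile["profile_id"]] = candidates[0]["name"]
    return matches


# Invented toy data, for illustration only.
voters = [
    {"name": "A. Example", "gender": "F", "zip": "02138", "birth_date": "1970-01-15"},
    {"name": "B. Sample", "gender": "M", "zip": "02139", "birth_date": "1982-06-30"},
    {"name": "C. Sample", "gender": "M", "zip": "02139", "birth_date": "1982-06-30"},
]
profiles = [
    {"profile_id": "p1", "gender": "F", "zip": "02138", "birth_date": "1970-01-15"},
    {"profile_id": "p2", "gender": "M", "zip": "02139", "birth_date": "1982-06-30"},
    {"profile_id": "p3", "gender": "F", "zip": "94110", "birth_date": "1990-03-02"},
]

matches = link_unique(profiles, voters)
print(matches)  # {'p1': 'A. Example'}

# Rates implied by the figures reported above.
matched_share = 130 / 579    # share of files matched to a unique voter entry
confirmed_share = 121 / 130  # share of those matches confirmed as correct
print(f"{matched_share:.1%} matched, {confirmed_share:.1%} confirmed")
# 22.5% matched, 93.1% confirmed
```

In the toy data, profile p2 goes unmatched because two voters share its attributes, and p3 goes unmatched because no voter does. Note also that a match that is unique within the voter list is not guaranteed to be unique within the whole population, which is the gap Barth-Jones’s critique points to.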
In some contexts, matching unique data to a conventional identifier such as a name and address is beside the point. An Internet advertiser, for example, can make excellent use of a profile that reveals your interests and activities, even though the data are not linked to you by name. Indeed, the advertiser may prefer such “anonymous” data because there are fewer legal constraints on its collection and use.