How much information does it take to single out one person among billions?
Suppose you fill out a survey online, with the assurance that your answers will remain anonymous. The questionnaire doesn’t record your name and address, but it does ask for some demographic information: your date of birth, your zip code, and your gender. What are the chances you could be identified from those three facts alone? You can answer this question for yourself at the website http://aboutmyinfo.org, which was set up by Latanya Sweeney of Harvard University. In my case, the site reports that I am probably the only male born on December 10, 1949, living in zip code 02144. Thus three items of not-very-intimate information—gender, zip, birth date—reveal enough to pick me out of a crowd.
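A back-of-the-envelope calculation suggests why three such facts can be enough. The sketch below counts the distinct combinations of gender, birth date, and zip code and compares that number with the size of the population. All the figures in it are rough assumptions chosen for illustration (two recorded genders, 80 years of birth dates, about 40,000 zip codes, about 330 million U.S. residents, about 8 billion people worldwide); none of them comes from the survey site, and real populations are far from evenly spread across the combinations.

```python
import math

def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one member of a population."""
    return math.log2(population)

def expected_per_cell(population: int, *category_sizes: int) -> float:
    """Average number of people per combination of traits,
    assuming (unrealistically) that people are spread evenly."""
    cells = math.prod(category_sizes)
    return population / cells

# Rough, illustrative figures: assumptions, not measured data.
WORLD_POPULATION = 8_000_000_000
US_POPULATION = 330_000_000
GENDERS = 2               # as recorded on a typical form
BIRTH_DATES = 365 * 80    # about 80 years of possible birthdays
ZIP_CODES = 40_000        # order-of-magnitude count of U.S. zip codes

world_bits = bits_to_identify(WORLD_POPULATION)
trait_bits = math.log2(GENDERS * BIRTH_DATES * ZIP_CODES)
avg = expected_per_cell(US_POPULATION, GENDERS, BIRTH_DATES, ZIP_CODES)

print(f"Bits to pick one person out of the world: {world_bits:.1f}")
print(f"Bits supplied by gender + birth date + zip: {trait_bits:.1f}")
print(f"Average U.S. residents per combination: {avg:.2f}")
```

Under these assumptions there are several times more combinations than people, so the average combination holds less than one person. That is consistent with the observation that most individuals sit alone in their cell, although a crowded zip code or a popular birth year can still leave some people with company.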
Ideas about identity, privacy, and anonymity are changing fast in this era of big data and social networks. At the deepest level, identity is all about the sense of self—the answer to the question “Who am I?” Each of us also has a biological identity (manifested in fingerprints, facial features, DNA sequences) and a legal identity (name, Social Security number, signature, and so on). Now we also have a data identity, defined by various combinations of traits that distinguish us from the rest of humanity. If you ask me to identify myself, I will not answer “M, 02144, 12/10/49”; and yet, by the combinatorics of uniqueness, I am that person as much as I am “Brian Hayes.” Maybe more so: Dozens of people share my name.
In the online world we have still more identities, most of them unknown even to ourselves. For example, I am my web browser history. The list of URLs I have visited in the past week or the past month is surely unique to me, just as my fingerprints are. I could even be identified by the list of fonts available to my web browser—and a few companies make use of such facts to track individuals as they wander from site to site across the web.
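To make the font example concrete, here is a minimal sketch of how a list of installed fonts might be boiled down to a single identifier. It is a hypothetical illustration, not a description of what any particular tracking company does: the function name `font_fingerprint` and the sample font lists are invented for the example, and the approach shown (normalize the names, sort them, hash the result) is just one plausible choice.

```python
import hashlib

def font_fingerprint(fonts):
    """Reduce a collection of font names to a short, stable identifier.

    Hypothetical sketch: names are lowercased, de-duplicated, and sorted
    so that the same set of fonts always yields the same hash, no matter
    what order the browser reports them in.
    """
    canonical = "\n".join(sorted({name.strip().lower() for name in fonts}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Invented sample data: two reports from the same machine, one from another.
browser_a = ["Arial", "Helvetica", "Garamond", "Comic Sans MS"]
browser_b = ["Helvetica", "Comic Sans MS", "Arial", "Garamond"]
browser_c = ["Arial", "Helvetica", "Garamond"]

print(font_fingerprint(browser_a) == font_fingerprint(browser_b))  # True
print(font_fingerprint(browser_a) == font_fingerprint(browser_c))  # False
```

The fingerprint carries no name or address, yet two visits that report the same set of fonts produce the same value, which is all a tracker needs in order to link them.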