How much information does it take to single out one person among billions?
33 Bits of Information
Fifteen years ago, when the public Internet was still young, a Silicon Valley executive dismissed concerns about privacy in online life. “You have zero privacy anyway,” he said. “Get over it.” The remark was jarring at the time, but it seems that many of us have gotten over it—or else given in to it.
For a major segment of the population, the urgent concern is not privacy but sharing: We tweet, we link in, we update our status. Although these communications are meant for a select audience, most people understand that everything they post on a social network is also visible to the operators of that network, and perhaps to others. It’s a bargain they make willingly: A fifth of humanity is on Facebook. But no one willingly submits to font sniffing and other surreptitious profiling schemes.
Plugging such privacy leaks is hard. The root of the problem is that each of us really is unique, not only in deep matters of body and mind but even in our most trivial attributes, such as the cruft we’ve squirreled away over the years in dusty corners of a computer disk. In a world where every tiny idiosyncrasy can be cataloged and filed away in milliseconds, it’s all too easy to compile a unique fingerprint. Just 33 bits of information is enough to single out any one person from the world population of 7.1 billion.
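The 33-bit figure follows from simple counting: n bits can label 2^n distinct individuals, so the question is which power of 2 first exceeds the population. The short sketch below checks that arithmetic using the 7.1 billion figure quoted above; the function name is an illustrative choice, not something taken from the article.

```python
import math

WORLD_POPULATION = 7_100_000_000  # figure quoted in the text


def bits_to_identify(population: int) -> int:
    """Smallest whole number of bits that gives every person a distinct label."""
    if population < 1:
        raise ValueError("population must be at least 1")
    # (population - 1).bit_length() is an exact integer form of ceil(log2(population)).
    return (population - 1).bit_length()


exact_bits = math.log2(WORLD_POPULATION)          # fractional bits, a little under 33
whole_bits = bits_to_identify(WORLD_POPULATION)   # rounded up to whole bits

print(f"log2({WORLD_POPULATION:,}) = {exact_bits:.2f}")
print(f"whole bits needed: {whole_bits}")
print(f"2**32 = {2**32:,} labels (too few)")
print(f"2**33 = {2**33:,} labels (enough)")
```

Thirty-two bits give about 4.3 billion labels, which falls short; thirty-three give about 8.6 billion, which covers everyone with room to spare.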
In some contexts, thoughtful attention to counting those bits has helped to draw a curtain of discretion over personal data. The HIPAA regulations for medical data are one example, and the Census Bureau has similar policies: population breakdowns by race and sex are not released for the smallest geographic divisions, and various kinds of random noise are added to some tabulations. The study of such measures, which asks how best to protect individual identity without impairing the research value of the statistics, has grown into a thriving minidiscipline called differential privacy.
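To make the idea of adding random noise concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook construction in the differential privacy literature. It is offered only as an illustration: the text does not say which noise methods the Census Bureau or anyone else actually uses, and the count, the `epsilon` value, and the function name are all assumptions made up for the example.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a count with Laplace noise of scale 1/epsilon added.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity of a counting query is taken to be 1 here.
    Smaller epsilon means more noise and stronger protection.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two independent Exponential(epsilon) draws
    # follows a Laplace distribution with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


rng = random.Random(0)   # fixed seed so repeated runs give the same draws
true_count = 12          # hypothetical tabulation for a small area
released = [noisy_count(true_count, 0.5, rng) for _ in range(5)]
print([round(value, 1) for value in released])
```

Any single released value may be off by a few units, which blurs the contribution of one individual, while averages over many cells or large populations stay close to the truth. That is the trade-off between protecting identity and preserving research value described above.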
Perhaps some variant of the same approach can be made to work for everyday life online. Website designers would still get enough information about the browser environment to present information effectively, but they wouldn’t get 33 bits.
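One way to picture such a budget is to add up the bits each reported attribute gives away. An attribute value shared by a fraction p of the population reveals -log2(p) bits, and if the attributes are treated as independent the bits simply add. The sketch below compares a detailed browser profile with a coarsened one. Every share in it is invented for illustration, not measured, and the independence assumption overstates the total, because real attributes are correlated.

```python
import math

BITS_TO_IDENTIFY = 33  # threshold from the text


def surprisal_bits(share: float) -> float:
    """Bits revealed by an attribute value held by `share` of the population."""
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return -math.log2(share)


def total_bits(profile: dict[str, float]) -> float:
    """Sum of bits over all attributes, assuming they are independent."""
    return sum(surprisal_bits(share) for share in profile.values())


# Hypothetical shares, made up for this example only.
detailed_profile = {
    "exact font list": 1 / 100_000,
    "exact browser version": 1 / 500,
    "time zone": 1 / 24,
    "exact screen size": 1 / 50,
}
coarse_profile = {
    "broad font category": 1 / 4,
    "browser family": 1 / 4,
    "screen size bucket": 1 / 3,
}

for name, profile in [("detailed", detailed_profile), ("coarse", coarse_profile)]:
    bits = total_bits(profile)
    verdict = "enough to single out one person" if bits >= BITS_TO_IDENTIFY else "not enough"
    print(f"{name}: {bits:.1f} bits ({verdict})")
```

With these made-up numbers the detailed profile crosses the 33-bit line and the coarse one stays far below it, while still telling a site roughly what kind of browser and screen it is serving.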
- Acar, G., et al. 2013. FPDetective: Dusting the web for fingerprinters. In Proceedings of the 20th ACM Conference on Computer and Communications Security, pp. 1129–1140.
- Barth-Jones, D. C. 2012 preprint. The “re-identification” of governor William Weld’s medical information: A critical re-examination of health data identification risks and privacy protections, then and now. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2076397.
- Eckersley, P. 2010. How unique is your web browser? In Proceedings of the 10th Privacy Enhancing Technologies Symposium, pp. 1–17.
- Golle, P. 2006. Revisiting the uniqueness of simple demographics in the U.S. population. In Proceedings of the Fifth ACM Workshop on Privacy in Electronic Society, pp. 77–80.
- Nikiforakis, N., et al. 2013. Cookieless monster: Exploring the ecosystem of web-based device fingerprinting. In Proceedings of the 2013 IEEE Symposium on Security and Privacy, pp. 541–555.
- Olejnik, L., C. Castelluccia, and A. Janc. 2013. On the uniqueness of web browsing history patterns. Annals of Telecommunications doi:10.1007/s12243-013-0392-5.
- Sweeney, L. 2000 preprint. Simple demographics often identify people uniquely. Data privacy working paper 3, Carnegie Mellon University. http://dataprivacylab.org/projects/identifiability/paper1.pdf.
- Sweeney, L., A. Abu, and J. Winn. 2013 preprint. Identifying participants in the Personal Genome Project by name. http://privacytools.seas.harvard.edu/publications/identifying-participants-personal-genome-project-name.