Advertisers would give just about anything to be able to lurk over your shoulder as you browse the internet. They want to know what sites you visit, how you get to them, how long you spend on them, and where you go next—along with as much personal information about you as they can get.
Of course, they don’t have to be in the room to figure any of that out. Dozens of trackers embedded in nearly every website collect information about how you interact with the page, and cookies stored in your browser tell advertisers how often you’ve visited the site before. But the holy grail is the ability to string all this information together into a profile for each individual user: a complete picture of each person on the internet, beyond just scattered data points.
Companies that compile user profiles generally do so pseudonymously: They may know a lot of demographic details about you, but they don’t usually connect your behavior to your individual identity. But a group of researchers at Stanford and Princeton developed a system that can connect your profile to your name and identity, just by examining your browsing history.
Here’s how the de-anonymization system works: The researchers figured that a person is more likely to click a link that was shared on social media by a friend—or a friend of a friend—than any other random link on the internet. (Their model controls for the baseline popularity of each website.) With that in mind, and the details of an anonymous person’s browser history in hand, the researchers can compute the probability that any one Twitter user created that browsing history. People’s basic tendency to follow links they come across on Twitter unmasks them—and it usually takes less than a minute.
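To make the idea concrete, here is a minimal sketch of that kind of scoring. It is not the researchers’ actual code. It assumes a deliberately simplified model in which a person is `boost` times more likely to visit a link that appeared in their own Twitter feed than the link’s baseline popularity would predict. The function names, the `boost` value, and the toy data are all hypothetical.

```python
import math

def score_candidate(history, feed, popularity, boost=10.0):
    """Relative log-likelihood that the owner of `feed` produced `history`.

    Assumed model (an illustration, not the paper's exact formula): each
    visit lands on link i with probability proportional to popularity[i],
    multiplied by `boost` when link i appeared in the candidate's feed.
    Terms that are identical for every candidate are dropped.
    """
    hits = sum(1 for link in history if link in feed)
    # Total baseline popularity of the feed. This is the control for
    # popularity: a feed full of widely visited links explains less.
    feed_mass = sum(popularity.get(link, 0.0) for link in feed)
    normalizer = 1.0 + (boost - 1.0) * feed_mass
    return hits * math.log(boost) - len(history) * math.log(normalizer)

def rank_candidates(history, feeds, popularity, top_k=15, boost=10.0):
    """Return up to `top_k` candidate users, most likely first."""
    scores = {
        user: score_candidate(history, feed, popularity, boost)
        for user, feed in feeds.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data: baseline popularity of four links (sums to 1).
popularity = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
feeds = {
    "alice": {"a", "b"},
    "bob": {"c", "d"},
    "carol": {"a", "b", "c", "d"},
}
history = ["c", "d", "d"]
ranking = rank_candidates(history, feeds, popularity)
print(ranking)
```

In this toy example, every link in the history also appears in carol’s feed, but her feed contains everything, so the match says little about her. Bob’s narrower feed explains the same history better, and he ranks first.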
For testing, the researchers recruited volunteers to download a Google Chrome extension that extracted their browsing history. Since Twitter uses a proprietary URL shortener, t.co, it was easy to tell which sites were arrived at via the social network. The study pulled as many as 100 recently visited t.co links from each user and ran them through the de-anonymization system. Within seconds, the program spat back the top 15 results from all possible Twitter users, in order of confidence. Volunteers were asked which profile was theirs, if it appeared at all, and had the option to sign into Twitter to prove their identity. The algorithm picked the right profile 72 percent of the time; 81 percent of the time, the right profile was in the top 15.
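The link-gathering step might look something like the sketch below. It assumes the browsing history is available as a list of `(timestamp, url)` pairs, which is a made-up format for illustration; the researchers’ extension may have worked quite differently.

```python
from urllib.parse import urlsplit

def recent_tco_links(history, limit=100):
    """Return up to `limit` of the most recently visited t.co URLs.

    `history` is a list of (timestamp, url) pairs in any order
    (a hypothetical format chosen for this example).
    """
    # Keep only visits whose host is exactly Twitter's shortener.
    tco = [(ts, url) for ts, url in history if urlsplit(url).hostname == "t.co"]
    # Newest first.
    tco.sort(key=lambda item: item[0], reverse=True)
    return [url for _, url in tco[:limit]]

sample_history = [
    (1, "https://t.co/abc"),
    (2, "https://example.com/x"),
    (3, "http://t.co/def"),
    (4, "https://not-t.co/zzz"),
]
links = recent_tco_links(sample_history)
print(links)
```

Matching on the exact hostname, rather than searching the URL for the text “t.co”, keeps lookalike domains out of the sample.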
For this technique to work in the real world, where people don’t readily volunteer their browsing history for science, a snooper would need to access their target’s digital trail another way. From advertisers to internet service providers to spy agencies, many groups have access to at least a part of your browsing history.
Internet service providers like Comcast and Verizon can access many details about where their customers go on the internet—except when customers visit websites that use HTTPS, a protocol that encrypts traffic sent to and from the website. Service providers—or someone snooping on an open coffee-shop wi-fi network—can’t see details about visits to URLs that begin with https://. Even so, people can still be identified by the unencrypted HTTP sites they visit: The researchers were able to unmask nearly a third of the volunteers in the experiment using just their HTTP traffic.
And a powerful nation-state actor would have an even easier time accessing people’s browsing histories. The National Security Agency’s “upstream” collection programs, which scoop up enormous amounts of data as it passes through critical pieces of the internet’s infrastructure, could piece together someone’s history without any trouble at all. (Of course, there are probably other ways that the NSA could figure out who you are without resorting to these researchers’ de-anonymizing methods.)
Ultimately, if you want to use Twitter under your own name, there’s little you can do to thwart this de-anonymization technique. “Our deanonymization attack didn’t use any easily-fixed flaw in the Twitter service,” said Ansh Shukla, a graduate student at Stanford and one of the paper’s authors. “Users behaving normally revealed everything we need to know. As such, the research strongly implies that open social networks, detailed logging, and privacy are at odds; you can simultaneously have only two.”
Browser features like Safari’s private browsing or Chrome’s incognito mode—with its sneaky-looking fedora-and-glasses branding—aren’t real defenses against de-anonymization. Once “incognito” or “private” windows are closed, they delete the trail of history left on the browser itself, but they don’t prevent trackers, internet service providers, or certainly spy agencies from eavesdropping on traffic.
Using Tor, on the other hand—a program that anonymizes internet browsing by bouncing traffic randomly across a network of servers—would probably deter all but the most dogged spies. “We speculate that this attack can only be carried out against Tor users by well-resourced organizations on high-value targets,” Shukla wrote. “Think cyber-espionage, government intelligence, and the like.”
But for the average user, who might not be familiar with advanced privacy-preserving techniques, or who might be more interested in following interesting people on Twitter than keeping their identity safe from marketers or their internet service provider, the veil of online anonymity is thin. And as Jessica Su, one of the paper’s authors and a Stanford Ph.D. candidate, pointed out, even a person who refrains from tweeting publicly in order to remain anonymous can be unmasked.
“The conventional wisdom is that you should be careful what you share,” Su said. “But here, we show that you can even be de-anonymized if you just browse and follow people, without actually sharing anything.”