This week Facebook and MySpace had to deal with the consequences of obscure bugs that leaked personal subscriber information to advertisers. The Wall Street Journal reported that because Facebook and MySpace put user handles in URLs on their sites, these user handles, which can very often be traced back to a user identity, leaked to advertisers via the referer headers sent by browser software.
Reaction on one technology blog reminded me of Intel's missteps. Marshall Kirkpatrick, on ReadWriteWeb, called the Journal's article "a jaw dropping move of bizarreness", going on to explain that passing referrer information was "just how the Internet works" and accusing the Journal of "anti-technology fear-mongering".
When a web browser requests a file from a website, it sends a bunch of extra information via HTTP headers. The request itself names the file being fetched, which might be a web page, an image, or a script file. Other headers give the name of the browser software and the languages and character sets it supports. The Referer header (yes, that's how it's spelled; blame the RFC for the misspelling) reports the address of the page that requested or linked to the file. If the request is made to an advertiser's site, the Referer URL tells the advertiser which page the user is looking at. When that page has an address that includes private information, the private stuff can leak.
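To make the mechanism concrete, here's a rough Python sketch of the headers a browser attaches when a page pulls in an image or script from another site. The URLs, the user ID, and the browser name are all made up for illustration, and real browsers send more headers than this; the point is only that the Referer carries the full page address, including any user handle embedded in it.

```python
from urllib.parse import parse_qs, urlsplit


def build_subresource_headers(page_url: str, resource_url: str) -> dict:
    """Approximate the headers a browser sends when the page at page_url
    loads a subresource (image, script, stylesheet) from resource_url."""
    host = urlsplit(resource_url).netloc
    return {
        "Host": host,                         # the site serving the file
        "User-Agent": "ExampleBrowser/1.0",   # hypothetical browser name
        "Accept-Language": "en-US,en;q=0.5",  # languages the browser supports
        "Accept-Charset": "utf-8",            # character sets it supports
        # The address of the page the user is looking at is handed to
        # the third-party host, query string and all.
        "Referer": page_url,
    }


def leaked_params(headers: dict) -> dict:
    """Return the query parameters a third party can read from the Referer."""
    referer = headers.get("Referer")
    if referer is None:
        return {}
    return parse_qs(urlsplit(referer).query)
```

With a hypothetical profile page `http://social.example.com/profile.php?id=123456789` loading an ad from `http://ads.example.net/banner.gif`, `leaked_params(headers)` returns `{'id': ['123456789']}`: the ad server learns the user ID without anyone intending to send it.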
The controversy spurred me to take a look at some library websites to see what sort of data they might leak using referer headers. I used the very handy Firefox add-on called "Live HTTP Headers". I was astounded to see that a well-known book database website seemed to be reporting the books I was browsing to Bit.ly, the URL shortening service! In another header, Bit.ly was also getting an identifying cookie. I went to another website, and found the exact same thing. This set off some alarm bells.
The requests came from the bit.ly Preview add-on I had installed, which loads a CSS stylesheet from Bit.ly. That stylesheet request has the side effect of causing Firefox to send Bit.ly the address of each and every web page I visit in a referer header.
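For anyone who wants to check their own captures, here's a minimal Python sketch of the test I was doing by eye: parse a captured request and flag it when it goes to a host other than the page named in its Referer. The sample capture below is hypothetical (the hosts, path, and cookie value are invented), not the actual Bit.ly traffic I saw.

```python
from urllib.parse import urlsplit


def parse_request(raw: str):
    """Split a captured HTTP request into its request line and a header dict
    (header names lowercased, since HTTP header names are case-insensitive)."""
    lines = raw.strip().splitlines()
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


def third_party_leak(raw: str):
    """Return what a third party learns if the request goes to a host other
    than the one in its Referer; return None for same-site or no-Referer requests."""
    _, headers = parse_request(raw)
    host = headers.get("host", "")
    referer = headers.get("referer")
    if referer and urlsplit(referer).netloc != host:
        return {"host": host, "referer": referer, "cookie": headers.get("cookie")}
    return None


# Hypothetical capture: a stylesheet fetched from a shortener while browsing a book page.
SAMPLE = """GET /s/preview.css HTTP/1.1
Host: shortener.example
User-Agent: Mozilla/5.0 (hypothetical)
Referer: http://books.example.org/work/12345
Cookie: _user=abc123
"""
```

Run on `SAMPLE`, `third_party_leak` reports both the book page address and the identifying cookie, which together are enough to tie a browsing history to a returning visitor.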
My last post described how URL shortening services can be abused for evil, but my point was that these abuses were a burden for the services, not that the services were abusive themselves. In fact, Bit.ly has probably done more than any other shortening service to combat abuse, and the Preview add-on is part of that anti-abuse effort. With Preview installed, users can safely check what's behind any of the short URLs they encounter by hovering over the link in question.
The privacy leak in bit.ly Preview is almost certainly an unintentional product of sloppy coding and deficient testing rather than an effort to spy on the 100,000 users who have installed the add-on. Nonetheless, it's a horrific privacy leak. There are other add-ons that intentionally leak private information, but typically they disclose their activity as a natural part of the add-on's functionality. GetGlue, which I've written about, is one example. Even Bit.ly Preview cannot help but leak some information when it's doing what it's supposed to do (expanding and previewing shortened URLs).
I'm sure that Bit.ly will fix this bug quickly; their support was amazingly fast when I reported another issue. But a larger question remains. How do we make sure that the services we use every day aren't leaking our info all over the place? The most widely deployed services (Google, Amazon, Facebook, and the like) all deserve a higher level of scrutiny because of the quantity of data at their fingertips. All the privacy policies in the world aren't worth a dime if websites can't be held accountable for the effects of sloppy coding. It's high time for popular sites to submit to strict third-party privacy auditing, and for web users to demand it. It doesn't matter whether any advertisers actually used the personal information that Facebook sent them; what matters is whether users can trust Facebook.
It's also time for the internet technology community to recognize that referer headers are as dangerous to privacy as they are to spelling. They should be abolished. Browser software should stop sending them. The referer header was originally devised to help dispersed server admins fix and control broken links. Today, the referer header is used for "analytics", which is a polite word for "spying". The collection of referer headers helps web sites to "improve their service", but you could say the same of informants and totalitarian governments.
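What would "stop sending them" look like in browser code? Here's a small Python sketch of the choice a browser makes for each outgoing request. The policy names mirror values defined in the W3C Referrer Policy specification ("no-referrer", "origin", "unsafe-url"), but this function is a simplified illustration, not how any particular browser implements them.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def referer_for(page_url: str, policy: str) -> Optional[str]:
    """Return the Referer value to send under a given policy,
    or None if the header should be omitted entirely."""
    parts = urlsplit(page_url)
    if policy == "no-referrer":
        return None  # abolish the header: nothing about the page is sent
    if policy == "origin":
        # Send only the site, dropping the path and query that carry private data.
        return f"{parts.scheme}://{parts.netloc}/"
    if policy == "unsafe-url":
        # The full page address (minus any #fragment), which is what leaks handles.
        return urlunsplit(parts._replace(fragment=""))
    raise ValueError(f"unknown policy: {policy}")
```

For a hypothetical page `http://social.example.com/profile.php?id=123456789`, "origin" sends just `http://social.example.com/`, while "no-referrer" sends nothing at all, which is the outcome I'm arguing for.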
The pipe is rusty; that's why it leaks. We need to fix it.