Wednesday, June 10, 2015

Protect Reader Privacy with Referrer Meta Tags

Back when the web was new, it was fun to watch a website monitor and see the hits come in. The IP address told you the location of the user, and if you turned on the referer header display, you could see what the user had been reading just before. There was a group of scientists in Poland who'd be on my site regularly; I reported the latest news on nitride semiconductors, and my site was free. Every day around the same time, one of the Poles would check my site, and I could tell he had a bunch of sites he'd look at in order. My site came right after a Russian web site devoted to photographs of unclothed women.

The original idea behind the HTTP referer header (yes, that's how the header is spelled) was that webmasters like me needed it to help other webmasters fix hyperlinks. Or at least that was the rationalization. The real reason for sending the referer was to feed webmaster narcissism. We wanted to know who was linking to our site, because those links were our pats on the back. They told us about other sites that liked us. That was fun. (Still true today!)

The fact that my nitride semiconductor website ranked up there with naked Russian women amused me; reader privacy issues didn't bother me because the Polish scientist's habits were safe with me.


Twenty years later, the referer header seems like a complete privacy disaster. Modern web sites use resources from all over the web, and a referer header, including the complete URL of the referring web page, is sent with every request for those resources. The referer header can send your complete web browsing log to websites that you didn't know existed.

Privacy leakage via the referer header plagues even websites that ostensibly believe in protecting user privacy, such as those produced by or serving libraries. For example, a request to the WorldCat page for What you can expect when you're expecting results in the transmission of referer headers containing the user's request to the following hosts:
  • http://ajax.googleapis.com
  • http://www.google.com (with tracking cookies)
  • http://s7.addthis.com (with tracking cookies)
  • http://recommender.bibtip.de
None of the resources requested from these third parties actually need to know what page the user is viewing, but WorldCat causes that information to be sent anyway. In principle, this could allow advertising networks to begin marketing diapers to carefully targeted WorldCat users. (I've written about AddThis and how they sell data about you to advertising networks.)
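To make the leak concrete, here is a minimal Python sketch of the Referer header each of the four hosts listed above would receive when the page and its resources are all served over plain http and no referrer policy is set. The WorldCat URL below is a hypothetical placeholder, not the real address of that page, and the function name is made up for illustration.

```python
from typing import Dict, List

# Hypothetical URL standing in for the WorldCat page discussed above.
PAGE_URL = "http://www.worldcat.org/search?q=what+you+can+expect+when+youre+expecting"

# Third-party hosts that the page loads resources from (from the list above).
THIRD_PARTY_HOSTS: List[str] = [
    "http://ajax.googleapis.com",
    "http://www.google.com",
    "http://s7.addthis.com",
    "http://recommender.bibtip.de",
]


def referers_without_policy(page_url: str, hosts: List[str]) -> Dict[str, str]:
    """Return the Referer value each host receives when no policy is set.

    Simplification: for an http page loading http resources, the browser
    sends the complete page URL (minus any #fragment) with every request.
    """
    full_url = page_url.split("#", 1)[0]  # fragments are never sent
    return {host: full_url for host in hosts}


if __name__ == "__main__":
    for host, referer in referers_without_policy(PAGE_URL, THIRD_PARTY_HOSTS).items():
        print(f"{host} <- Referer: {referer}")
```

Every host gets the same thing: the full URL, search terms included.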

It turns out there's an easy way to plug this privacy leak in HTML5. It's called the referrer meta tag. (Yes, that's also spelled correctly.)

The referrer meta tag is put in the head section of an HTML5 web page. It allows the web page to control the referer headers sent by the user's browser. It looks like this:

<meta name="referrer" content="origin" />

If this one line were used on WorldCat, only the fact that the user is looking at a WorldCat page would be sent to Google, AddThis, and BibTip. This is reasonable: library patrons typically don't expect their visits to a library to be private, but they do expect what they read there to be private.
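The reduction performed by the "origin" policy can be sketched in a few lines of Python. This is an approximation that assumes ordinary http and https URLs: the browser keeps the scheme, host name, and any non-standard port, and drops the path and query string that reveal what the reader is looking at. The function name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_referer(page_url: str) -> str:
    """Reduce a page URL to the Referer value sent under the "origin" policy.

    Keeps scheme, host, and any non-default port; drops path, query,
    fragment, and user credentials. (IPv6 literal hosts are not handled.)
    """
    parts = urlsplit(page_url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lowercased
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    return f"{scheme}://{netloc}/"
```

For example, `origin_referer("http://www.worldcat.org/search?q=expecting")` yields `"http://www.worldcat.org/"`: the third party learns which site you were on, but not what you searched for.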

Because use of third-party resources is often necessary, most library websites leak a lot of private information through referer headers. The meta referrer policy is a simple way to stop it. You may well ask why this isn't already standard practice. I think it's mostly lack of awareness. Until very recently, I had no idea that this worked so well. That's because it's taken a long time for browser vendors to add support. Chrome and Safari have supported the referrer meta tag for more than two years, but Firefox only added it in January of 2015. Internet Explorer will support it with the Windows 10 release this summer. Privacy will still leak for users with older browser software, but this problem will gradually go away.
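If you want to check whether your own pages declare a policy, here is a minimal sketch using only Python's standard `html.parser`. It approximates what a browser does rather than reproducing the browser algorithm: it looks at every meta tag, not just those in the head section, lowercases the content value, and lets the last matching tag win. The names `ReferrerMetaFinder` and `find_referrer_policy` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class ReferrerMetaFinder(HTMLParser):
    """Collect the content of <meta name="referrer" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.policy: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        # Self-closing tags (<meta ... />) are routed here too by the
        # default handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = attributes.get("content")
        if name == "referrer" and content is not None:
            # Simplification: the last matching tag wins.
            self.policy = content.strip().lower()


def find_referrer_policy(html: str) -> Optional[str]:
    """Return the declared referrer policy, or None if there is none."""
    finder = ReferrerMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.policy
```

A page with no referrer meta tag returns `None`, which means the browser falls back to its default behavior.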

There are four options for the meta referrer tag in addition to the "origin" policy, which sends only the origin of the referring page: its scheme, host name, and port.

For the strictest privacy, use

<meta name="referrer" content="no-referrer" />

If you use this setting, other websites won't know you're linking to them, which can be a disadvantage in some situations. If the web page links to resources that still use the archaic "referer authentication", they'll break.
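Here is why "referer authentication" breaks. The sketch below is a hypothetical server-side check of the kind such resources rely on: access is granted only if the Referer header names an approved site. The allowed host and the function name are made up for illustration, and the sketch assumes the header arrives under the exact key "Referer" (real HTTP header names are case-insensitive).

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Hypothetical list of sites whose links are allowed to reach a resource.
ALLOWED_REFERRING_HOSTS = {"library.example.org"}


def referer_authorized(headers: Mapping[str, str]) -> bool:
    """Grant access only if the Referer header names an allowed host.

    This is the fragile "referer authentication" pattern: it denies access
    whenever the browser omits the header, e.g. under "no-referrer".
    """
    referer: Optional[str] = headers.get("Referer")
    if not referer:
        return False  # no header at all: the check fails
    return urlsplit(referer).hostname in ALLOWED_REFERRING_HOSTS
```

A host-based check like this one still passes under the "origin" policy, since the host name survives; only "no-referrer" removes it entirely.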

The prevailing default policy for most browsers is equivalent to

<meta name="referrer" content="no-referrer-when-downgrade" />

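"Downgrade" here refers to requests for http resources from https pages. Under this policy no referer is sent for those requests, while the full URL is sent for everything else.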
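The rule can be sketched as a small Python function. It is a simplification that treats only "an https page requesting an http URL" as a downgrade and ignores details such as stripping user credentials from the URL; the function name is hypothetical, and `None` stands for "no Referer header sent".

```python
from typing import Optional
from urllib.parse import urlsplit


def no_referrer_when_downgrade(page_url: str, request_url: str) -> Optional[str]:
    """Referer value under "no-referrer-when-downgrade" (None = header omitted)."""
    page_scheme = urlsplit(page_url).scheme.lower()
    request_scheme = urlsplit(request_url).scheme.lower()
    if page_scheme == "https" and request_scheme == "http":
        return None  # downgrade: send nothing
    return page_url.split("#", 1)[0]  # otherwise the full URL, minus fragment
```

Note the second branch: an https page requesting an https resource from another site still sends the full URL, which is why this default does little for privacy on secure sites.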

If you need the full referer within your own website but don't want other sites to see more than your origin, you can use

<meta name="referrer" content="origin-when-cross-origin" />
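The deciding question for "origin-when-cross-origin" is whether the request goes to the same origin, meaning the same scheme, host, and port. A minimal Python sketch, assuming http and https URLs only; both function names are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_tuple(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """(scheme, host, port) with the scheme's default port filled in."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(scheme)


def origin_when_cross_origin(page_url: str, request_url: str) -> str:
    """Referer value under the "origin-when-cross-origin" policy."""
    page_origin = origin_tuple(page_url)
    if page_origin == origin_tuple(request_url):
        return page_url.split("#", 1)[0]  # same origin: full URL
    scheme, host, port = page_origin
    suffix = "" if port == DEFAULT_PORTS.get(scheme) else f":{port}"
    return f"{scheme}://{host}{suffix}/"  # cross-origin: origin only
```

A switch from https to http on the same host counts as cross-origin here, because the scheme differs.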

Finally, if you want the user's browser to send the full referrer, no matter what, and experience the thrills of privacy brinksmanship, you can set

<meta name="referrer" content="unsafe-url" />
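To compare the five policies side by side, here is a summary sketch that follows the descriptions above. It is not a browser implementation: it ignores credential stripping and other edge cases, and the catalog and tracker URLs are hypothetical. `None` means no Referer header is sent.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(scheme)


def referer_for(policy: str, page_url: str, request_url: str) -> Optional[str]:
    """Referer sent from page_url to request_url under a given policy."""
    full = page_url.split("#", 1)[0]
    scheme, host, port = _origin(page_url)
    suffix = "" if port == DEFAULT_PORTS.get(scheme) else f":{port}"
    origin_only = f"{scheme}://{host}{suffix}/"
    downgrade = scheme == "https" and urlsplit(request_url).scheme.lower() == "http"

    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full
    if policy == "origin":
        return origin_only
    if policy == "origin-when-cross-origin":
        return full if _origin(request_url) == (scheme, host, port) else origin_only
    if policy == "unsafe-url":
        return full  # full URL, even to http resources
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    page = "https://catalog.example.org/record/123?q=expecting"
    request = "http://tracker.example.net/pixel.gif"
    for policy in ("no-referrer", "no-referrer-when-downgrade", "origin",
                   "origin-when-cross-origin", "unsafe-url"):
        print(f"{policy:28} -> {referer_for(policy, page, request)}")
```

For an https catalog page loading an http tracking pixel, only "unsafe-url" hands over the search terms.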

Widespread deployment of the referrer meta tag would be a big boost for reader privacy all over the web. It's easy to implement, has little downside, and is widely deployable. So let's get started!


6 comments:

  1. You mention the case of the Polish scientist visiting a set of sites each day in order, but I'm having trouble understanding how you were able to figure out the order. I thought that a referer includes only one URL, and only that of the page on which a user followed a link to yours. I thought that if someone opened a browser bookmark or typed a URL by hand, it wouldn't send a referer link at all.

    Was the referer URL a webpage created by the scientist with a list of his favorite links, provided in numbered order under a heading such as "Morning routine"?

    1. That's a great question. Modern web browser software does not send the last-page address in the referrer header. My guess is that the scientist had a set of bookmarks that he would cycle through. You could look through the code for NCSA Mosaic to see what it did 20 years ago. It could also have been a caching proxy server leaking. The RFC for HTTP added this sentence in 1995: "The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard." Presumably this was in reaction to client code that made this mistake.

  2. The referer is not sent for HTTPS connections, per RFC 2616, so there's yet another good reason for just using HTTPS.

    From 15.1.3:
    "Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol."

    1. This is the no-referrer-when-downgrade policy. It's a bit non-intuitive, but per RFC 2616, the referrer is still sent in secure requests. So, for example, if my web page embeds a CC button image using the https url, the address of my page is sent along with the request for the button image.

    2. Damn! You're right. Thanks for sussing that out.

      In other news, it's very weird that the "no-referrer" option still only appears in the Editor's Draft at http://w3c.github.io/webappsec/specs/referrer-policy/ ; the latest version of the public draft (from July 2014) at http://www.w3.org/TR/referrer-policy/ doesn't have a "no-referrer" option ("none" sounds like it should do the same thing, and it appears to on Firefox, but does not on Chrome... argh).

    3. Thanks for noticing that. I've added a note.

