When I was part of a committee working on the OpenURL standard, we had a brief discussion about the maximum URL length that would work over the internet. A few years before that, some systems on the internet barfed if a URL was longer than 512 characters, but most everything worked up to 2,000 characters, and we anticipated that that limit would soon go away. So here we are in 2009, and Internet Explorer is just about the only thing that still has a length limit as low as 2,083 characters. Along comes Twitter, with a 140-character limit on an entire message, and all of a sudden the URLs we've been making have become TOO LONG! Just as fast, URL shortening services sprang up to make the problem go away.
The discussion on my last post (on CrossRef and OpenURL) got me interested in the semantics of redirection, and that got me thinking about the shortening services, which have become monster redirection engines. When we say something about a URI that is resolved by a redirector, what, exactly, are we talking about?
Just as the semantics of 301 and 302 have been determined by their use in search engines, the 303 has been co-opted by the standards-setters of the semantic web, and they may well succeed in determining the semantics of the 303. As described in a W3C Technical Recommendation, the 303 is to be used
... to give an indication that the requested resource is not a regular Web document. Web architecture tells you that for a thing resource (URI) it is inappropriate to return a 200 because there is, in fact, no suitable representation for those resources.
In other words, the 303 is supposed to indicate that the Thing identified by the URI (URL) is something whose existence is NOT on the web. Tim Berners-Lee wrote a lengthy note about this that I found quite enjoyable, though at the end I had no idea what it was advocating. The discussion that led to the W3C Recommendation has apparently been extremely controversial, and has been given the odd designation "httpRange-14". The whole thing reminds me of reading the existentialists Sartre and Camus in high school; they sounded so much more understandable in French!
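To make the 303 pattern concrete, here is a minimal sketch using Python's standard-library http.server. It assumes a hypothetical URI layout in which /id/<name> names a Thing that is not on the web and /doc/<name> is an ordinary web document about it; that path convention is made up for illustration and does not come from the Recommendation. A request for the Thing gets a 303 See Other pointing at the document, and only the document gets a 200.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ThingHandler(BaseHTTPRequestHandler):
    """Hypothetical layout: /id/<name> names a Thing, /doc/<name> describes it."""

    def do_GET(self):
        if self.path.startswith("/id/"):
            # The Thing itself has no web representation, so don't return 200:
            # answer 303 See Other and point at a document about the Thing.
            name = self.path[len("/id/"):]
            self.send_response(303)
            self.send_header("Location", "/doc/" + name)
            self.end_headers()
        elif self.path.startswith("/doc/"):
            # An ordinary web document, so a 200 with a body is appropriate.
            body = ("A document about " + self.path[len("/doc/"):]).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def status_and_location(port, path):
    """GET path without following redirects; return (status, Location or None)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), ThingHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(status_and_location(port, "/id/eiffel-tower"))   # (303, '/doc/eiffel-tower')
    print(status_and_location(port, "/doc/eiffel-tower"))  # (200, None)
    server.shutdown()
    server.server_close()
```

http.client never follows redirects on its own, which is why it's used here: it lets you see the 303 itself rather than the document it points to.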
As discussed in Danny Sullivan's article, most of the URL shorteners use 301 (permanent) redirects, which is usually what users want. An indexing agent or a semantic web agent should just look through these redirectors and use the target resource URL in its index. The DOI "gateway" redirector at dx.doi.org discussed in my previous post uses a 302 (temporary) redirect. Unless DOIs are handled specially by a search engine, this means that the "link credit" (a.k.a. Google juice) for a dx.doi.org link will accrue to the dx.doi.org URL rather than to the target URL. This seems appropriate. Although I indicated in that post that, under Linked Data rules, the dx.doi.org link identifies whatever the returned web page indicates, from the point of view of search engines that URI identifies an abstraction of the resource it redirects to. PURL, a redirection service similar in conception, also uses 302 redirects.
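Here is a rough Python sketch of that indexing heuristic. The web is replaced by a hypothetical dictionary mapping each URL to a status code and Location header, and all of the URLs, including the DOI 10.1000/example, are made up for illustration. The agent looks through 301 redirects to the target, but stops at a 302 and credits the redirecting URL itself. Real search engines apply their own, unpublished rules, so treat this as a model of the idea rather than of any particular engine.

```python
from typing import Dict, Optional, Tuple

# Hypothetical offline stand-in for the web: URL -> (status code, Location header).
Responses = Dict[str, Tuple[int, Optional[str]]]


def credited_url(url: str, responses: Responses, max_hops: int = 10) -> str:
    """Return the URL that should receive "link credit" for a link to url.

    301 (permanent): look through the redirector and continue with the target.
    302 (temporary), 200, or anything else: stop and credit the current URL.
    """
    for _ in range(max_hops):
        status, location = responses[url]
        if status == 301 and location is not None:
            url = location
        else:
            return url
    raise RuntimeError(f"gave up after {max_hops} redirects")


EXAMPLE: Responses = {
    # A shortened link (301) to a DOI gateway link (302) to a publisher page (200).
    "http://short.example/abc": (301, "http://dx.doi.org/10.1000/example"),
    "http://dx.doi.org/10.1000/example": (302, "http://publisher.example/article"),
    "http://publisher.example/article": (200, None),
}

if __name__ == "__main__":
    print(credited_url("http://short.example/abc", EXAMPLE))
    # http://dx.doi.org/10.1000/example
```

Changing the 302 in EXAMPLE to a 301 would move the credit all the way through to the publisher page, which is the 301/302 difference in a nutshell.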
I was curious about the length limits of the popular URL shorteners. Using a link to this blog, padded with characters ignored by Blogger.com, I shortened a bunch of long URLs. Here are 4 shortened 256-character links to this blog:
- http://is.gd/1ro7Q clips the URL at 2000 characters.
- tr.im fails to shorten the URL.
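A length probe like this can be sketched in Python as two small helpers. The post doesn't say exactly how the padding was done; this sketch assumes a hypothetical query parameter named pad that the target server ignores, and a hypothetical blog URL. Fetching the expanded URL from each shortener (for example by reading the Location header of the short link) is left out; describe_expansion just compares what came back with what was submitted.

```python
from urllib.parse import urlsplit


def padded_url(base: str, length: int, pad_char: str = "x") -> str:
    """Pad base to exactly `length` characters with a hypothetical ignored 'pad' parameter."""
    sep = "&" if urlsplit(base).query else "?"
    prefix = base + sep + "pad="
    if len(prefix) > length:
        raise ValueError("base URL is already longer than the requested length")
    return prefix + pad_char * (length - len(prefix))


def describe_expansion(submitted: str, expanded: str) -> str:
    """Compare the URL a short link expands to with the URL that was submitted."""
    if expanded == submitted:
        return "intact"
    if submitted.startswith(expanded):
        return f"clipped at {len(expanded)} characters"
    return "altered"


if __name__ == "__main__":
    base = "http://blog.example/2009/04/some-post.html"  # hypothetical blog URL
    for n in (256, 2000, 2500):
        print(n, len(padded_url(base, n)))
    long_url = padded_url(base, 2500)
    # Simulate a shortener that keeps only the first 2000 characters.
    print(describe_expansion(long_url, long_url[:2000]))  # clipped at 2000 characters
```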
Next, I wanted to see if I could make a redirection loop. Most of the shortening services decline to shorten a shortened URL, but they're quite willing to shorten a URL from the PURL service. Also, I couldn't find any way to use the shortening services to fix a link that had rotted after I shortened it. Shortening a PURL instead of the target URL could therefore be useful as link-rot insurance, if the 302 redirect is not an issue. So here's a PURL, http://purl.oclc.org/NET/backatcha, that redirects to http://bit.ly/aE0od, which redirects back to http://purl.oclc.org/NET/backatcha, and so on. Don't click these expecting an endless loop; your browser should detect the loop pretty fast.
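Clients catch a loop like this by refusing to follow redirects indefinitely. Here is a minimal Python sketch of the idea. The redirect table is a hypothetical offline stand-in for the Location headers, filled in with the two URLs above, and the 20-hop limit is an arbitrary choice for illustration, not any particular browser's setting. The function remembers every URL it has visited and stops as soon as one repeats, with the hop cap as a backstop for very long chains.

```python
from typing import Dict, List


class RedirectLoop(Exception):
    """Raised when following redirects revisits a URL or runs too long."""


def follow_redirects(url: str, redirects: Dict[str, str], max_hops: int = 20) -> List[str]:
    """Follow redirects from url and return the chain of URLs visited.

    `redirects` maps a URL to its Location target; a URL with no entry is final.
    """
    chain = [url]
    seen = {url}
    while url in redirects:
        url = redirects[url]
        if url in seen:
            raise RedirectLoop("loop: " + " -> ".join(chain + [url]))
        chain.append(url)
        seen.add(url)
        if len(chain) - 1 > max_hops:
            raise RedirectLoop(f"more than {max_hops} redirects")
    return chain


BACKATCHA = {
    "http://purl.oclc.org/NET/backatcha": "http://bit.ly/aE0od",
    "http://bit.ly/aE0od": "http://purl.oclc.org/NET/backatcha",
}

if __name__ == "__main__":
    try:
        follow_redirects("http://purl.oclc.org/NET/backatcha", BACKATCHA)
    except RedirectLoop as err:
        print(err)
```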
A recent article about how bit.ly is using its data stream to develop new services got me thinking again about how a shortening redirector might be useful in Linked Data. I've written several times that Linked Data lacks the strong attribution and provenance infrastructure needed for many potential applications. Could shortened URIs be used as Linked Data predicates to store and retrieve attribution and provenance information, along with the actual predicate? And will I need another HTTP status code to do it?