Wednesday, September 16, 2009

The Redirector Chain Mashup Design Pattern

The most exciting things to learn are the ones that you already knew, but didn't know that you knew. An example for me was when I learned about design patterns in the context of Java programming. In an instant, I had both a framework for thinking about solutions I had figured out on my own, and a mechanism for discovering solutions ready to be reused. You've surely encountered design patterns in one form or another, though perhaps you haven't thought of them as such. For example, the inverted pyramid and the Five Paragraph Essay are design patterns for writing. In software development there are many more design patterns than there are in writing, which is why the concept is so useful.

A post by Owen Stephens about managing link persistence using OpenURL got me thinking about design patterns used for the composition of web services. Most developers of web services tend to think in terms of container-oriented solutions to composed services, or to use another term, mash-ups. The container can be either server side or client-side and may be composing data, software services, or neither. In a post more than two years old, Alex Barnett enumerated 5 different design patterns for mash-ups. Missing from that list is what we might call the redirection chain pattern, of which Owen Stephens' proposed service for the Telstar Project would be an example.

In a redirection chain, a user traverses a link to one or more servers that redirect the user to a different address. At each step in the chain, services can be performed or functions added. In Owen's post, the service being contemplated is link maintenance. In most cases, the user will be redirected to a target URL embedded in the link. If for some reason the URL no longer works, a new URL can be provided to the user. For example, suppose a website goes out of business and the expired domain name is taken by a pornography site. The redirector provides a single point of maintenance for the link. Website publishers routinely use this type of redirector to enable them to move content around without breaking links. In the world of scholarly publishing, Crossref has provided an invaluable service of this type that enables DOI-based e-journal links to continue to work in the face of publisher mergers, acquisitions, migrations and bankruptcy. PURL.org provides a similar service aimed at archives in libraries. Using NISO OpenURL links would provide a standardized way to add metadata to the link, and would allow an easy way to mash up, using the redirector design pattern, the URL maintenance redirector with a link-to-full-text redirector service used by the library.
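
To make the link maintenance idea concrete, here is a minimal sketch in Python. It is not Owen's design or Telstar's code; the replacement table, the example URLs and the function names are all made up for illustration. The redirector sends the user to the embedded target unless that target is known to be dead, in which case it substitutes the maintained replacement.

```python
# Hypothetical maintenance table: known-dead targets and their replacements.
REPLACEMENTS = {
    "http://defunct.example.com/article/42":
        "http://archive.example.org/defunct/article/42",
}


def resolve(target_url, replacements=REPLACEMENTS):
    """Return the URL the user should actually be sent to."""
    # Unknown targets pass through unchanged; known-dead ones are swapped.
    return replacements.get(target_url, target_url)


def redirect_response(target_url):
    """Build a minimal (status, headers) pair for an HTTP redirect."""
    return 302, {"Location": resolve(target_url)}
```

Fixing a broken link then means adding one entry to the table, rather than editing every page that carries the link.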

In the years that I've worked on linking technology, I've come across a considerable diversity of redirector-based services, but I don't think I've ever seen a list of things they are used for. So here goes:
  1. Session initiation. If a website needs to maintain state, visitors may need to first acquire a session. This is very often done by greeting a new visitor with a redirect that carries a cookie, or if that doesn't work, a session token in the link.
  2. Referrer tracking. Many web services need to keep track of the source of visitors, for example, in the context of an affiliate marketing program such as Amazon Associates. (Take a look at the link to Amazon Associates if you want an example of an affiliate link)
  3. Customized resolution.
    • In the world of libraries, this is referred to as solving the "appropriate copy problem", and almost all research libraries today make use of specialized redirectors that handle links conforming to the OpenURL standard mentioned above. Libraries subscribe to many electronic resources, and library patrons that want a particular article need to be directed to the one resource among many that the library has subscribed to.
    • An internet business may want to deliver different resources depending on where in the world the user is. A good example of this is the GeoDirection service provided by GeoBytes. A business with a global presence might need to do this to comply with local laws.
    • Language customization. Websites are often maintained in multiple languages. A language redirector might inspect Accept-Language headers and redirect the user to a language-appropriate service.

  4. User authentication. Almost all modern single-sign-on user authentication systems employ redirection in some form. For example, in the Shibboleth system, an unauthenticated user is redirected to a "where are you from" (WAYF) service that then redirects to an authentication form, which then adds an authentication token to the URL and redirects the user again back to the original target resource.
  5. URL shortening. I've previously written about tr.im and bit.ly and the challenges for the URL shortener businesses. In brief, the advent of Twitter has created a need for short URLs.
  6. Usage tracking. Although I became aware of tr.im and bit.ly from their use as short URLs, the reason I continue to use them is that they provide a handy way to see if anyone is clicking on the links. Similarly, libraries that have deployed OpenURL link servers are finding that the usage logs they generate provide invaluable information about the usage of digital collections.
  7. User tracking. This is really the same as the previous use, except with a different focus. Advertising networks often serve ads through redirectors and try to deliver the most relevant ads depending on what they know about the user.
  8. Agent-based metadata delivery. This is an application that has been exploited less often for good than for deception. You may have heard of "cloaking", which is the practice of providing keyword-filled pages to search engines so that they don't find out that a website is just a bunch of advertisements. A similar practice is recommended by the W3C in the context of Semantic Web metadata for "things".
  9. Link enhancement. Often, a redirector will be configured to change the format of a link or to add information into a link. An example of a service that does this is OCLC's LibraryLookup service. It not only translates a simple ISBN-based link into something a library catalog can understand, it also adds alternate ISBNs into the link.
  10. Link Framing. Some redirectors put a frame around the linked content. This can be done to provide the user a path back to the referrer or to present more services (and advertising) to the user.
Looking over this list, we can see that there are 3 types of things that redirectors can do.
  1. Dynamic routing.
  2. Data collection.
  3. Link enhancement.
Many redirectors perform more than one of these functions.
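
A single redirector that does all three might look something like the sketch below. Everything specific in it is an assumption for illustration: the two site URLs, the `ref` parameter name and the in-memory log are invented, and the Accept-Language parsing is deliberately simplified. It routes on the Accept-Language header, records the click, and adds the referrer to the outgoing link.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical language-specific destinations.
SITES = {"en": "http://example.com/en/", "fr": "http://example.com/fr/"}
click_log = []  # stand-in for a real usage log


def preferred_language(accept_language, supported, default="en"):
    """Pick the supported language with the highest q-value in the header."""
    choices = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0  # a language listed without a q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        choices.append((q, tag.strip().lower()))
    # sorted() is stable, so equal q-values keep their header order.
    for q, tag in sorted(choices, key=lambda c: -c[0]):
        primary = tag.split("-")[0]  # "fr-CH" is served by the "fr" site
        if q > 0 and primary in supported:
            return primary
    return default


def redirect(accept_language, referrer):
    """Return the Location for one request, logging the click on the way."""
    lang = preferred_language(accept_language, SITES)        # dynamic routing
    click_log.append({"referrer": referrer, "lang": lang})   # data collection
    scheme, netloc, path, query, fragment = urlsplit(SITES[lang])
    params = parse_qsl(query) + [("ref", referrer)]          # link enhancement
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))
```

A request with the header "fr-CH, fr;q=0.9, en;q=0.8" and the referrer "newsletter" would be sent to the French site with `ref=newsletter` appended, and one entry would land in the log.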

In a redirector chain mashup, one redirector points to a second, which points to a third, etc. These mashups are often composed on an ad hoc basis, or even inadvertently. For example, if you think that an OpenURL link is too horribly long and ugly to put in an email, you might consider shortening it with a shortening redirector. On clicking the link, the recipient of your email visits the library link server, which might repoint the link to a redirecting proxy server to check if the user is locally authenticated. The proxy server may in turn redirect the user to doi.org's global redirector, which will next point the user to a publisher's linking hub, which finally redirects the user to the full text content server. The distributed and minimally coordinated development and deployment allowed by the redirector chain is perhaps its greatest advantage.
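
The chain in that example can be simulated without touching the network. In the sketch below, the hostnames and paths are placeholders I made up, and the dictionary simply stands in for what each server would put in its Location header. The point is that no redirector needs to know about any hop except the next one.

```python
# Hypothetical chain: each key redirects to its value.
# The final URL has no entry, so it is the content itself.
CHAIN = {
    "http://short.example/abc": "http://linkserver.library.example/openurl",
    "http://linkserver.library.example/openurl": "http://proxy.library.example/login",
    "http://proxy.library.example/login": "http://doi-resolver.example/10.1000/182",
    "http://doi-resolver.example/10.1000/182": "http://hub.publisher.example/link/182",
    "http://hub.publisher.example/link/182": "http://content.publisher.example/fulltext/182",
}


def follow(url, redirects, max_hops=10):
    """Return every URL visited, starting with the link the user clicked."""
    hops = [url]
    while url in redirects:
        if len(hops) > max_hops:
            # Browsers give up in a similar way rather than loop forever.
            raise RuntimeError("too many redirects")
        url = redirects[url]
        hops.append(url)
    return hops
```

Calling `follow("http://short.example/abc", CHAIN)` walks through the shortener, the link server, the proxy, the DOI resolver and the publisher hub before arriving at the content server.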

The serial invocation of redirectors is the design pattern's main weakness. The processing times of the composed services add together, and every hop adds another round trip of network latency. There is also an impact on reliability, as failure of any redirector component will result in a failure of the link. Similarly, the throughput of the composed service will be equal to that of the component with the lowest throughput. In most cases the technical performance and reliability of redirector services are not a big issue compared to institutional issues. In Owen Stephens' example, building and deploying a redirector is not hard, but assuring that the institution providing the redirector will be willing to continue to do so for the life of the deployed links is probably above his pay grade.
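
The arithmetic is easy to sketch. The figures below are invented purely for illustration, not measurements of any real service, and the availability calculation assumes the hops fail independently of one another. Latency adds up across hops, availability multiplies, and throughput is set by the slowest hop.

```python
from math import prod

# Invented figures for a five-hop chain; none of these are measurements.
SAMPLE_CHAIN = [
    {"name": "shortener",     "processing_ms": 5,  "round_trip_ms": 60, "availability": 0.999,  "max_requests_per_s": 500},
    {"name": "link server",   "processing_ms": 40, "round_trip_ms": 30, "availability": 0.995,  "max_requests_per_s": 50},
    {"name": "proxy",         "processing_ms": 20, "round_trip_ms": 30, "availability": 0.99,   "max_requests_per_s": 200},
    {"name": "DOI resolver",  "processing_ms": 10, "round_trip_ms": 90, "availability": 0.9999, "max_requests_per_s": 1000},
    {"name": "publisher hub", "processing_ms": 30, "round_trip_ms": 80, "availability": 0.998,  "max_requests_per_s": 300},
]


def chain_metrics(hops):
    """Return (total latency in ms, end-to-end availability, throughput)."""
    # Each hop costs its own processing time plus one network round trip.
    latency_ms = sum(h["processing_ms"] + h["round_trip_ms"] for h in hops)
    # The link works only if every hop works (independent failures assumed).
    availability = prod(h["availability"] for h in hops)
    # The chain can go no faster than its slowest component.
    throughput = min(h["max_requests_per_s"] for h in hops)
    return latency_ms, availability, throughput
```

With these made-up numbers, the chain is less available than its least reliable hop, even though every individual hop looks respectable on its own.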

Using a standard format for redirection URLs could make it easier to swap one redirection service for another, partly addressing the institutional commitment issue. The idea of using the OpenURL standard for the URL persistence application seems promising, but as someone who served on the NISO committee that standardized OpenURL, I must admit that the existing standard falls a bit short of what would be needed for an application such as Telstar's. Still, there's not much competition. Proposals such as ARK, which is focused more on archives, might be worth a look. Although the developer of urlShort has called for a shortener standard, the shortening application has rather different constraints.

What sort of things might a broader redirector standard include? Here's my very short list:
  1. A standard query parameter for target URLs. The most common format seems to be [baseurl]?url=[url] .
  2. A recommendation on what to do with that standard query parameter and any others that might be sent.
  3. Something about preventing loops.
Doesn't seem too hard.
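
Here is a rough sketch of what items 1 and 3 could look like in Python, using the [baseurl]?url=[url] convention. No such standard exists; the function names, the depth limit and the example hosts are my own assumptions. Links are built by URL-encoding the target into the `url` parameter, and a loop check peels the nested redirectors apart and refuses a link in which the same redirector appears twice.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_link(base_url, target_url):
    """Wrap target_url in a redirector link of the form base?url=target."""
    return base_url + "?" + urlencode({"url": target_url})


def unwrap(link, max_depth=10):
    """Peel nested redirector links apart.

    Returns (list of redirector base URLs, final target). Raises ValueError
    if one redirector appears twice or the nesting is too deep.
    """
    seen = []
    current = link
    for _ in range(max_depth):
        parts = urlsplit(current)
        targets = parse_qs(parts.query).get("url")
        if not targets:
            return seen, current  # no url parameter: this is the real target
        base = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if base in seen:
            raise ValueError(f"redirect loop through {base}")
        seen.append(base)
        current = targets[0]
    raise ValueError("too many nested redirectors")
```

This only catches loops that are visible in the link itself, and it would mistake any final target that happens to use a `url` parameter of its own for another redirector, which is one reason item 2 on the list matters.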

I wonder if "developing a standard" could itself be considered a design pattern?

1 comment:

  1. Re #5, I would say there's been a need for short urls for quite a while. Unfortunately for them, it's another example of fail on the part of print publications (newspapers, magazines) and print marketers. Not that they didn't have some hairbrained ideas along the way. Remember the cuecat?