In thinking about how to bring semantic technologies to bear on OpenURL and reference linking, it occurred to me that "just in time" and "just in case" are useful concepts for thinking about linking technologies. Semantic technologies in general, and Linked Data in particular, seem to have focused on just-in-case, identifier-oriented linking. Library linking systems based on OpenURL, in contrast, have focused on just-in-time, description-oriented linking. Of course, this distinction is an oversimplification, but let me explain a bit what I mean.
Let's first step back and take a look at how links are made. Links are directional; they have a start and an end (a target). The start of a link always has an intention or purpose; the target is the completion of that purpose. For example, look at the link I have put on the word "grad school" above. My intention there was to let you, the reader, know something about my graduate school career, without needing to insert that digressional information in the narrative. (Actually my purpose was to illustrate the previous sentence, but let's call that a meta-purpose.) My choice of URL was "http://ee.stanford.edu/", but I might have chosen some very different URL. When I choose a specific URL, I "bind" that URL to my intention.
"Zemanta" plug-in to help me. Zemanta scans the text of my article for words and concepts that it has links for, and offers them to me as choices to apply to my article. Zemanta has done the work of finding links for a huge number of words and concepts, just in case a user comes along with a linking intention to match. In this case, the link suggested by Zemanta matches my intention (to provide background for readers unfamiliar with OpenURL). The URL becomes bound to the word during the article posting process.
At the end of this article, there's a list of related articles, along with a link that says "more fresh articles". I don't know what URLs Zemanta will supply when you click on it, but it's an example of a just-in-time link. A computer scientist would call this "late binding". My intention is abstract: I want you to be able to find articles like this one.
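For readers who think in code, here is a minimal sketch of the two kinds of binding. It is only an illustration: the function names, the `monday` and `friday` lists, and the example.org URLs are made up, and nothing here reflects how Zemanta actually works.

```python
# Just in case: the URL was bound to the phrase before any reader arrived.
PREBOUND = {"grad school": "http://ee.stanford.edu/"}


def just_in_case(phrase):
    """Early binding: look up a target that was chosen ahead of time."""
    return PREBOUND.get(phrase)


def just_in_time(intent, available_now):
    """Late binding: pick a target from whatever exists at click time.

    `available_now` is a list of (url, topics) pairs, newest first.
    """
    for url, topics in available_now:
        if intent in topics:
            return url
    return None


# Hypothetical snapshots of what a related-articles service knows about.
monday = [("http://example.org/articles/1", {"linking", "openurl"})]
friday = [("http://example.org/articles/2", {"linking"})] + monday

print(just_in_case("grad school"))      # same answer every day
print(just_in_time("linking", monday))  # depends on the day of the click
print(just_in_time("linking", friday))
```

The early-bound link always returns the URL I chose; the late-bound one carries only the intent, so its target can change between Monday and Friday.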
Similar facilities are in operation in scholarly publishing, but the processes have a lot more moving parts.
Consider the citation list of a scientific publication. The links expressed by these lists are expressions of the author's intent: perhaps to support an assertion in the article, to acknowledge previous work, or to provide clarification or background. The cited item is described by metadata formatted so that humans can read and understand the description and go to a library to find the item. Here's an example:
D. C. Tsui, H. L. Störmer and A. C. Gossard, Phys. Rev. Lett. 48, 1559 (1982).
CrossRef, the description could then be matched against CrossRef's huge database of article descriptions. If a match is found, the cited item description is bound to an article identifier, the DOI. For my example article, the DOI is
10.1103/PhysRevLett.48.1559
The DOI provides a layer of indirection that's not found in Zemanta linking. While CrossRef binds the citation to an identifier, the identifier link,
http://dx.doi.org/10.1103/PhysRevLett.48.1559, is not bound to the target URL,
http://prl.aps.org/abstract/PRL/v48/i22/p1559_1
until the user clicks the link. This scheme holds out hope that should the article move to a different URL, the connection to the citation can be maintained and the link will still work.
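The indirection can be sketched in a few lines. This is a toy model, not the real DOI resolver: the `registry` dictionary stands in for the lookup that happens when the link is clicked, and the "moved" URL on journals.example.org is invented for the illustration.

```python
DOI_PREFIX = "http://dx.doi.org/"
DOI = "10.1103/PhysRevLett.48.1559"


def identifier_link(doi):
    """The link the citation is bound to; it never needs to change."""
    return DOI_PREFIX + doi


def resolve(link, registry):
    """Bind the identifier link to a target URL at click time."""
    if not link.startswith(DOI_PREFIX):
        raise ValueError("not an identifier link: " + link)
    return registry[link[len(DOI_PREFIX):]]


# Toy stand-in for the resolver's table of current locations.
registry = {DOI: "http://prl.aps.org/abstract/PRL/v48/i22/p1559_1"}

link = identifier_link(DOI)
before = resolve(link, registry)

# The article moves (hypothetical new URL); only the registry is updated.
registry[DOI] = "http://journals.example.org/prl/48/1559"
after = resolve(link, registry)

print(link)
print(before)
print(after)
```

The citation keeps pointing at the same identifier link throughout; only the last hop is rebound.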
If the user is associated with a library using an OpenURL link server, another type of match can be made. OpenURL link servers use knowledgebases which describe the set of electronic resources made available by the library. When the user clicks on an OpenURL link, the description contained in the link is matched against the knowledgebase, and the user is sent to the best-matching library resource. It's only at the very last moment that the intent of the link is bound to a target.
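Here is a rough sketch of that flow, using the example citation from above. The key names (`url_ver`, `rft_val_fmt`, `rft.jtitle` and so on) follow the OpenURL 1.0 key/encoded-value format for journals; everything else is an assumption made for illustration: the resolver address, the knowledgebase entries and their coverage years, and the very naive matching rule. Real link servers use far richer knowledgebases and heuristics.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

RESOLVER_BASE = "http://resolver.example.edu/openurl"  # hypothetical link server


def make_openurl(base, description):
    """Squeeze a description of the cited item into a URL."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    params.update({"rft." + key: value for key, value in description.items()})
    return base + "?" + urlencode(params)


def parse_description(openurl):
    """Recover the description of the cited item from the link."""
    query = parse_qs(urlsplit(openurl).query)
    return {k[len("rft."):]: v[0] for k, v in query.items() if k.startswith("rft.")}


# Made-up knowledgebase: which years of which journals this library can reach.
KNOWLEDGEBASE = [
    {"jtitle": "Phys. Rev. Lett.", "from": 1995, "to": 2010,
     "target": "http://library.example.edu/aggregator/prl"},
    {"jtitle": "Phys. Rev. Lett.", "from": 1958, "to": 2010,
     "target": "http://library.example.edu/archive/prl"},
]


def resolve(openurl, knowledgebase):
    """Bind the link to a target only when it is clicked."""
    item = parse_description(openurl)
    year = int(item["date"])
    for entry in knowledgebase:
        same_journal = entry["jtitle"].lower() == item.get("jtitle", "").lower()
        if same_journal and entry["from"] <= year <= entry["to"]:
            return entry["target"]
    return None


citation = {"jtitle": "Phys. Rev. Lett.", "volume": "48", "spage": "1559",
            "date": "1982", "aulast": "Tsui"}
openurl = make_openurl(RESOLVER_BASE, citation)
print(openurl)
print(resolve(openurl, KNOWLEDGEBASE))
```

Note that the link itself names no target at all. A different library, with a different knowledgebase, would send the same click somewhere else.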
NISO standardization process for OpenURL spent a great deal of time in making the framework extensible, but the extension mechanisms have not seen the use that was hoped for.
The level of abstraction of NISO OpenURL is often cited as a reason it has not been adopted outside its original application domain. It should also be clear that many applications that might have used OpenURL have instead turned to Semantic Web and Linked Data technologies. (Zemanta is an example of a linking application built with semantic technologies.) If OpenURL and CrossRef could be made friendly to these technologies, the investments made in these systems might also find application in more general circumstances.
I began looking at the possibilities for OpenURL Linked Data last summer, when, at the Semantic Technologies 2009 conference, Google engineers expressed great interest in consuming OpenURL data exposed via RDFa in HTML, which had just been finalized as a W3C Technical Recommendation. I excitedly began to work out what was needed. (Tony Hammond, another member of the NISO standardization committee, had taken a crack at the same thing.)
BIBO, an ontology for bibliographic data developed by Bruce D'Arcus and Frédérick Giasson, and decided it would not be fun. There's nothing terribly wrong with BIBO.
One of the nagging difficulties was that OpenURL-RDF required the use of "blank nodes", because of its philosophy of transporting descriptions of items which might not have URIs to identify them. When I recently described this difficulty to the OpenURL Listserv, Herbert van de Sompel, the "irresistible force" behind OpenURL a decade ago, responded with very interesting notes about "thing-described-by.org", how it resembled "by-reference" OpenURL, and how this could be used in a Linked Data friendly link resolver. Thing-Described-by is a little service that makes it easy to mint a URI, attach an RDF description to it, and make it available for harvest as Linked Data.
In the broadest picture, linking is a process of matching the intent of a link with a target. To accomplish that, we can't get around the fact that we're matching one description with another. A link resolver needs to accomplish this match in less than a second using a description squeezed into a URL, so it must rely on heuristics, pre-matched identifiers, and restricted content domains. If link descriptions were pre-published as Linked Data as in thing-described-by.org, linking providers would have time to increase accuracy by consulting more types of information and provide broader coverage. By avoiding the necessity of converting and squeezing the description into a URL, link publishers could conceivably reduce costs while providing for richer links. Let's call it "Linked Description Data".
KBART) of providing more timely, accurate and granular target descriptions. If they ever start to view the knowledgebase vendors as bottlenecks, the Linked Description Data approach may prove appealing.
Computers don't learn "just-in-time" or "just-in-case" the way humans do. But the matching at the core of making links can be an expensive process, taking time proportional to the square of the number of items (N²). Identifiers make the process vastly more efficient (N log N). This expense can be front-loaded (just-in-case) or saved till the last moment (just-in-time), but opening the descriptions being matched for "when-there's-time" processing could result in dramatic advances in linking systems as a whole.
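To make the cost difference concrete, here is a small sketch that links the same citations two ways. The records, the "10.9999/demo" identifiers and the example.org URLs are all made up, and the exact comparison of journal, volume and page is a stand-in for the fuzzy description matching that real systems need, which is precisely what makes the pairwise approach hard to avoid without identifiers.

```python
from bisect import bisect_left


def match_by_description(citations, records):
    """Compare every citation with every record: N * N comparisons."""
    comparisons = 0
    links = {}
    for i, cite in enumerate(citations):
        for rec in records:
            comparisons += 1
            if (cite["journal"], cite["volume"], cite["page"]) == (
                rec["journal"], rec["volume"], rec["page"]
            ):
                links[i] = rec["url"]
    return links, comparisons


def match_by_identifier(citations, records):
    """Sort once by identifier, then binary-search each citation:
    on the order of N * log(N) steps."""
    ordered = sorted(records, key=lambda rec: rec["doi"])
    keys = [rec["doi"] for rec in ordered]
    links = {}
    for i, cite in enumerate(citations):
        pos = bisect_left(keys, cite["doi"])
        if pos < len(keys) and keys[pos] == cite["doi"]:
            links[i] = ordered[pos]["url"]
    return links


# Synthetic data: 50 made-up records, cited in reverse order.
records = [
    {"doi": "10.9999/demo.%d" % n, "journal": "J. Demo", "volume": str(n),
     "page": str(100 + n), "url": "http://example.org/%d" % n}
    for n in range(50)
]
citations = list(reversed(records))

slow_links, comparisons = match_by_description(citations, records)
fast_links = match_by_identifier(citations, records)
print(comparisons)               # 50 * 50 pairwise comparisons
print(slow_links == fast_links)  # both approaches find the same links
```

With 50 items the difference is invisible; with tens of millions of article descriptions it decides whether the matching can happen at click time at all.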