Thursday, July 2, 2009

Linked Data Heresy? Under the Hood at AdaptiveBlue

Have you ever watched a web server log? Thirteen years ago, I was starting up a scientific e-journal, and it was very gratifying to watch the monitor and see the traffic coming in from all over the world. Occasionally I would turn on the referrer log to see where people were coming from. One time, I was surprised to see that somebody in Poland was coming to my e-journal site from a Russian web site with "xxx" in the URL. Curious about what sort of site might be linking to my e-journal, I checked it out and found it to be about blond, naked women. I wasn't sure what this said about my e-journal. Perhaps the Polish scientists found the e-journal and the xxx site equally stimulating? Perhaps their boss had just walked into the room, and they needed a work-oriented site to cover their other browsing?

My perspective on the privacy of my internet browsing changed that day. I've become mildly paranoid about things that might spy on me. I am very selective about the Facebook apps that I load, for example, but I don't bother to flush my browsing history or block web bugs or things like that. I enjoyed finding out "what Google knows about me" (post it to Facebook and tag your friends to do the same!). I really worry about Firefox extensions (or "Add-ons"), because I know how extremely powerful and intrusive they can be. Even so, the 3 or 4 Add-ons I install in Firefox are the main reason I don't use Safari, despite its integration advantages. I'm not surprised that IE and Safari have declined to support practical extension mechanisms; they're sort of scary. On the other hand, Firefox Add-ons have presented very few spyware-related problems, due in part to the fact that they are mostly written in JavaScript and delivered as source. It's relatively easy to open an Add-on and inspect its code, so if an Add-on does something other than what it says it does, someone is likely to discover the truth sooner or later.

A really interesting Firefox Add-on called "Glue" is being offered by a venture-funded company called AdaptiveBlue (no relation whatsoever to my company, Gluejar, Inc.). Glue watches you browse the internet, and when it sees you on one of a set of sites that it knows about, it reports the pages you're on to AdaptiveBlue, enabling them to construct a "Social Network of Things", where the Things might be Books, Music, Products, Wine, Companies, etc.


Overall there are over 300 sites that the Glue Add-on does something with. A lot goes on in Glue, and I didn't take the time to sort everything out. For example, when you go to a topic page in Wikipedia, a book page in WorldCat, or a stock page in Yahoo Finance, the URL that you visited is reported to AdaptiveBlue. Usually, the Add-on then slides down a Glue header which tells you what the Glue Social Network thinks about the Thing you are looking at. Personally, I find this very distracting, and I don't plan to continue using Glue, but I can imagine that many people will appreciate the consistent interface it presents to the social network and other services. Other sites handled by Glue include LibraryThing, Epicurious, Last.fm, ESPN, theStreet, ToysRUs, Expedia, GameSpy, Metacritic, WineLibrary, Flixster, Connotea, Flickr, Technorati, Walmart and eBay, just to name a few. It was very difficult to find the official list of sites that Glue works with on the GetGlue web site; I wish the AdaptiveBlue people were more upfront about exactly what they do on these sites. Nonetheless, the Add-on appears to do what it says it does. I would also like to see users given more control over the sorts of things that are reported to AdaptiveBlue; I'm much more relaxed about sharing my Wine and Sports browsing than I am about my Wikipedia and Stocks browsing. And I really don't want to share my Russian XXX site browsing!
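To make the mechanism concrete, here is a rough Python sketch of the kind of URL matching an add-on like Glue has to do, together with the per-category opt-out wished for above. Everything in it is an assumption for illustration: the URL patterns, the category names, and the `blocked` setting are hypothetical, and this is not Glue's actual code.

```python
import re
from typing import Optional

# HYPOTHETICAL patterns: each regex recognizes one kind of "Thing" page.
# These are illustrative guesses, not Glue's real site rules.
SITE_PATTERNS = [
    (re.compile(r"^https?://en\.wikipedia\.org/wiki/[^:]+$"), "topics"),
    (re.compile(r"^https?://(www\.)?worldcat\.org/oclc/\d+"), "books"),
    (re.compile(r"^https?://finance\.yahoo\.com/q\?s=[A-Z.^]+"), "stocks"),
]


def classify(url: str) -> Optional[str]:
    """Return the Thing category for a recognized page, or None otherwise."""
    for pattern, category in SITE_PATTERNS:
        if pattern.match(url):
            return category
    return None


def should_report(url: str, blocked: set[str]) -> bool:
    """Report only recognized pages whose category the user hasn't blocked.

    `blocked` is a hypothetical per-user opt-out setting, e.g. {"stocks"}.
    """
    category = classify(url)
    return category is not None and category not in blocked


if __name__ == "__main__":
    visits = [
        "http://en.wikipedia.org/wiki/Cryptonomicon",
        "http://finance.yahoo.com/q?s=GOOG",
        "http://example.com/some/other/page",
    ]
    for url in visits:
        print(url, "->", should_report(url, blocked={"stocks"}))
```

With `blocked={"stocks"}`, a Yahoo Finance quote page is not reported, a Wikipedia topic page still is, and pages on unrecognized sites are never reported at all.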

It's interesting to compare Glue to the OpenURL linking services that have been almost universally adopted in libraries. (I developed one of the first OpenURL link servers, which is now owned by OCLC, Inc.) Like Glue, OpenURL link servers present users with relevant information and links to services surrounding "things", which are typically journal articles or books. One library that I worked with even used a social network to connect users to other users who had viewed the same item, just like Glue. There was even a Firefox Add-on that routed "thing" links to link servers. The link server vendor community worked closely with publishers to enable OpenURL linking; although AdaptiveBlue promotes its "SmartLinks", I doubt that many of the sites Glue is aware of understand what AdaptiveBlue is doing.
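For readers who haven't seen one, here is a minimal Python sketch of how an OpenURL 1.0 (Z39.88-2004) link for a book can be put together in key/encoded-value (KEV) form, and how a link server can read the description back out. The resolver base URL and the helper function names are hypothetical; a real link server is configured per library, and real links usually carry more fields.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# HYPOTHETICAL link-server base URL; each library configures its own.
RESOLVER_BASE = "http://resolver.example.edu/openurl"


def book_openurl(title: str, aulast: str, aufirst: str, isbn: str = "") -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link that describes a book."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("url_ctx_fmt", "info:ofi/fmt:kev:mtx:ctx"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.genre", "book"),
        ("rft.btitle", title),
        ("rft.aulast", aulast),
        ("rft.aufirst", aufirst),
    ]
    if isbn:
        params.append(("rft.isbn", isbn))
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return RESOLVER_BASE + "?" + urlencode(params)


def referent_fields(openurl: str) -> dict:
    """Recover the rft.* description a link server would work from."""
    query = parse_qs(urlsplit(openurl).query)
    return {key[len("rft."):]: values[0]
            for key, values in query.items() if key.startswith("rft.")}


if __name__ == "__main__":
    link = book_openurl("Cryptonomicon", "Stephenson", "Neal")
    print(link)
    print(referent_fields(link))
```

Nothing in such a link names the book itself; it carries only a description (title, author, optionally an ISBN) for the link server to interpret.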

Glue makes heavy use of Amazon Web Services, including the product information web service, the SimpleDB service and the S3 (Simple Storage Service). It's smart these days to outsource scalability and concentrate on your application's functions. Glue also makes nice use of the Dojo and MochiKit JavaScript toolkits. In browsing the code, I noticed that many of the problems it addresses are exactly the same ones we encountered developing Linkbaton 9 years ago, and the solutions look quite similar (in other words, I think the developers have done a pretty good job!), except that the tools available today are far more advanced than what we had to work with then.

Given that AdaptiveBlue makes a big deal about the Semantic-ness of its technology, I was surprised to find out how it identifies "Things". The canonical way to identify a Thing on the semantic web is to give it a URI, and then attach properties to it. When I spoke with AdaptiveBlue founder and CEO Alex Iskold at the Semantic Technology Conference, he told me that they use only title and author strings to define book Things. In fact, they bundle these strings into keys (such as books/cryptonomicon/neal_stephenson), then use the keys as if they identified a book, when in the real world it's more complicated: one title and author can cover many editions and printings, and one book can be described with differently written strings. So the "Things" in the Glue "Social Network of Things" are entities that do not correspond to books, but rather to descriptions of books. Interestingly, this is exactly the approach taken in OpenURLs, which are really descriptive metadata packages, not entity URIs.
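A small Python sketch shows why description-derived keys behave this way. Alex only said that title and author strings are bundled into keys; the normalization rules below (accent stripping, lowercasing, underscores between words) are assumptions for illustration, not AdaptiveBlue's code.

```python
import re
import unicodedata


def slug(text: str) -> str:
    """Strip accents, lowercase, and join alphanumeric runs with underscores."""
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    return "_".join(re.findall(r"[a-z0-9]+", ascii_text.lower()))


def book_key(title: str, author: str) -> str:
    """Build a description-derived key like 'books/cryptonomicon/neal_stephenson'.

    The normalization rules here are ASSUMPTIONS for illustration only.
    """
    return f"books/{slug(title)}/{slug(author)}"


if __name__ == "__main__":
    # Two physical editions plus a catalog-style record of the same work.
    records = [
        ("hardcover", "Cryptonomicon", "Neal Stephenson"),
        ("paperback", "Cryptonomicon", "Neal Stephenson"),
        ("catalog record", "Cryptonomicon", "Stephenson, Neal"),
    ]
    for label, title, author in records:
        print(f"{label:>14} -> {book_key(title, author)}")
```

Run as a script, the hardcover and paperback collapse onto the same key, while the catalog-style "Last, First" author string yields a different key for the same book: the key tracks the description, not the thing.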

The first of Tim Berners-Lee's "Four Rules" for Linked Data is "Use URIs as names for things". Both Glue and OpenURL, which were designed separately as practical solutions for linking to things, seem to break this rule. Instead they build URIs using descriptions of the things, and don't bother naming the things themselves. Maybe Tim BL's first rule is wrong!

4 comments:

  1. I'm currently working on a project at the Open University (http://www.open.ac.uk/telstar) which is looking at how we integrate references (generally bibliographic) into our teaching and learning environment - mainly online.

    One of the major questions for us is how we should link from a reference to an electronic copy of the resource (if available). I've worked with OpenURL since very early on, and my first instinct is to say that if we have the structured bibliographic data for a reference, then we can form the OpenURL and push the resolution to our resolver. This gets around the problem (as OpenURL was designed to do) of the library changing its subscriptions over time without having to go back to the course material.

    However, I've also reflected that OpenURLs seem to be at odds with a linked data approach - where it seems we should be providing a link to the thing directly.

    Added to this is the question of using DOIs (which is actually the preferred method of linking to e-journal articles by our team of Teaching and Learning Librarians). Does the DOI identify the resource, or the description of that resource? Certainly an example like http://dx.doi.org/10.1144/0016-76492006-123 which then offers two possible sources for the actual article tells me that this URI is a link to the description.

    I'm interested in whether the project can comment on this problem, and if you have any pointers that might help me make sense of how OpenURLs (and other similar mechanisms) might fit into a linked data world then let me know!

  2. Good post Eric!

    So we actually have URIs: http://getglue.com/books/$object_id is the URI in our system. In a little bit you are going to be able to fetch a full RDF description of it via linked data.

    We are emitting shorter keys for now just to keep things compact.

    Alex

  3. I'm with Alex. Great post! Thanks!

  4. Alex,

    It's great that you have URIs; they're important from an SEO perspective. But the deep question is what do the URIs identify? It won't be the book, but rather a description of one or more books. The distinction may be a bit talmudic, and it's very much the practical thing to do, but these sorts of URIs are iron nails for the 4-rules linked data architecture. But it's not your problem, I think.

    Eric

