Sunday, July 31, 2011

Library Data Beyond the Like Button

"Aren't you supposed to be working on your new business? That ungluing ebooks thing? Instead you keep writing about library data, whatever that is. What's going on?"

No, really, it all fits together in the end. But to explain, I need to take you beyond the "Like Button".

Earlier this month, I attended a lecture at the New York Public Library. The topic was Linked Open Data, and the speaker was Jon Voss, who's been applying this technology to historical maps. It was striking to see how many people from many institutions turned out, and how enthusiastically Jon's talk was received. The interest in Linked Data was similarly high at the American Library Association Meeting in New Orleans, where my session (presented with Ross Singer of Talis) was only one of several Linked Data sessions that packed meeting rooms and forced attendees to listen from hallways.

I think it's important to convert this level of interest into action. The question is, what can be done now to get closer to the vision of ubiquitous interoperable data? My last three posts have explored what libraries might do to better position their presence in search engines and in social networks using schema.org vocabulary and Open Graph Protocol. In these applications, library data enables users to do very specific things on the web: find a library page in a search engine or "Like" a library page on Facebook. But there's so much more that could be done with the data.

I think that library data should be handled as if it were made of gold, not of diamond.

Perhaps the most amazing property of gold is its malleability. Gold can be pounded into a sheet so thin that it's transparent to light. An ounce of gold can be made into leaf that will cover 25 square meters.

There is a natural tendency to treat library data as a gem that needs skillful cutting and polishing. The resulting jewel will be so valuable that users will beat down library websites to get at the gems. Yeah.

The reality is that library data is much more valuable as a thin layer that covers huge swaths of material. When data is spread thinly, it has a better chance of connecting with data from other libraries and with other sorts of institutions: museums, archives, businesses, and communities. By contrast, deep data, the sort that focuses on a specific problem space, is unlikely to cross domains or applications without a lot of custom programming and data tweaking.

Here's the example that's driven my interest in opening up library linked data: At Gluejar, we're building a website that will ask people to go beyond "liking" books. We believe that books are so important to people that they will want to give them to the world; to do that we'll need to raise money. If lots of people join together around a book, it will be easy to raise the money we need, just as public radio stations find enough supporters to make the radio free to everyone.

We don't want our website to be a book discovery website, or a social network of readers, or a library catalog; other sites do that just fine. What we need is for users to click "support this book" buttons on all sorts of websites, including library catalogs. And our software needs to pull just a bit of data off of a webpage to allow us to figure out which book the user wants to support. It doesn't sound so difficult. But we can only support two or three different interfaces to that data. If library websites all put a little more structured data in their HTML, we could do some amazing things. But they don't, and we have to settle for "sort of works most of the time".
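To make "pull just a bit of data off of a webpage" concrete, here is a minimal sketch in Python of what that could look like if a catalog page carried Open Graph Protocol meta tags. It is only an illustration, not Gluejar's actual code: the sample page, its values, and the names `OpenGraphParser` and `extract_book_metadata` are all made up for this example, and it assumes the page uses the `og:` and `book:` properties from the Open Graph vocabulary.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> and "book:..." pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if content is not None and prop.startswith(("og:", "book:")):
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_book_metadata(html):
    """Return a dict of Open Graph properties found in the HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Hypothetical catalog page; the title and ISBN are placeholder values.
SAMPLE_PAGE = """
<html><head>
<title>Example Library Catalog</title>
<meta property="og:type" content="book" />
<meta property="og:title" content="Example Book" />
<meta property="book:isbn" content="9780000000002" />
</head><body>...</body></html>
"""

metadata = extract_book_metadata(SAMPLE_PAGE)
print(metadata.get("og:title"), metadata.get("book:isbn"))
```

With an identifier like an ISBN in hand, a "support this book" button would know which book the user means. Without the tags, the same job turns into per-site screen scraping, which is where "sort of works most of the time" comes from.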

Real books get used in all sorts of ways. People annotate them, they suggest them to friends, they give them away, they quote them, and they cite them. People make "TBR" piles next to their beds. Sometimes, they even read and remember them as long as they live. The ability to do these same things on the web would be pure gold.


  1. As a potential consumer, what kind of structured data is your favorite for embedding in library web pages?

    My problem is that there are too many 'standards', none of which seems obviously great to me in terms of both power and convenience for both producer and consumer. Tell me what you prefer, and I'll try to add it to Blacklight, though (if it's not there already).

  2. @bibwild I find the processing model for microdata to be a whole lot easier to implement than RDFa. I also think microdata is quite a bit easier to publish. And if you go with schema.org vocabularies, there is the SEO opportunity for search engines to provide more context to your stuff in search results.

    Microdata intentionally sacrificed some of the extensibility that is built into RDFa. The extensibility of RDFa is definitely useful in the context of the Web, but it also adds to the complexity of producing and consuming it. Facebook's Open Graph Protocol is a pretty useful "profile" of RDFa for making your stuff show up nicely in that universe.

    I think the reality is that the Web is a heterogeneous information space, and depending on what you are trying to do you will want to use different "standards". Starting out with schema.org is probably a good way to begin publishing or consuming, though.