Sunday, June 27, 2010

Global Warming of Linked Data in Libraries

Libraries are unusual social institutions in many respects; perhaps the most bizarre is their reverence for metadata and its evangelism. What other institution considers the production, protection and promulgation of metadata to be part of its public purpose?

The W3C's Linked Data activity shares this unusual mission. For the past decade, W3C has been developing a technology stack and methodology designed to support the publication and reuse of metadata; adoption of these technologies has been slow and steady, but the impact of this work has fallen short of its stated ambitions.

I've been at the American Library Association's Annual Meeting this weekend. Given the common purpose of libraries and Linked Data, you would think that Linked Data would be a hot topic of discussion. The weather here has been much hotter than Linked Data, which I would describe as "globally warming". I've attended two sessions covering Linked Data, each attended by between 50 and 100 delegates. These followed a day-long, sold-out preconference. John Phipps, one of the leaders in the effort to make library metadata compatible with the semantic web, remarked to me that these meetings would not have been possible even a year ago. Still, this attendance reflects only a tiny fraction of the metadata workers at the conference; Linked Data has quite a way to go. It was only a few months ago that the W3C formed a Library Linked Data Incubator Group.

On Friday morning, there was an "un-conference" organized by Corey Harper from NYU and Karen Coyle, a well-known consultant. I participated in a subgroup looking at use cases for library Linked Data. It took a while for us to get around to use cases, though, as participants reported that usage was occurring but that they weren't sure what it was for. Reports from OCLC (VIAF) and the Library of Congress both indicated significant usage but little feedback. The VIVO project was described as one with a solid use case (giving faculty members a public web presence), but no one from VIVO was in attendance.

On Sunday morning, at a meeting of the Association for Library Collections and Technical Services (ALCTS), Rebecca Guenther of the Library of Congress discussed a service that enables both humans and machines to programmatically access authority data at the Library of Congress. Perhaps the most significant thing about the service is not what it does but who is doing it. The Library of Congress provides leadership for the world of library cataloguing; what LC does is often slavishly imitated in libraries throughout the US and the rest of the world. The service started out as a research project but is now officially supported.

Sara Russell-Gonzalez of the University of Florida then presented the VIVO project, which has won a big chunk of funding from the National Center for Research Resources, a branch of NIH. The goal of VIVO is to build an "interdisciplinary national network enabling collaboration and discovery between scientists across all disciplines." VIVO started at Cornell and has garnered strong institutional support there, as evidenced by an impressive web site. If VIVO is able to gain similar support nationally and internationally, it could become an important component of an international research infrastructure. This is a big "if". I asked if VIVO had figured out how to handle cases where researchers change institutional affiliations; the answer was "No". My question was intentionally difficult; Ian Davis has written cogently about the difficulties RDF has in treating time-dependent relationships. It turns out that there are political issues as well. Cornell has had to deal with a case where an academic department wanted to expunge affiliation data for a researcher who left under cloudy circumstances.

At the un-conference, I urged my breakout group to consider linked data as a way to expose library resources outside of the library world as well as a model for use inside libraries. It's striking to me that libraries seem so focused on efforts such as RDA, which aim to move library data models into Semantic Web-compatible formats. What they aren't doing is making library data easily available in models understandable outside the library.

The two most significant applications of Linked Data technologies so far are Google's Rich Snippets and Facebook's Open Graph Protocol (whose user interface, the "Like" button, is perhaps the semantic web's most elegant and intuitive). Why aren't libraries paying more attention to making their OPAC results compatible with these applications by embedding RDFa annotations in their web-facing systems? It seems to me that the entire point of metadata in libraries is to make collections accessible. How better to do this than to weave this metadata into people's lives via Facebook and Google? Doing this will require the dumbing-down of library metadata and some hard swallowing, but it's access, not metadata quality, that's core to the reason that libraries exist.
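To make the idea concrete, here is a minimal sketch of what "embedding annotations" in an OPAC result page might look like. It is an illustration under stated assumptions, not a description of any real catalog system: the record dictionary, its field names, the function name and the example.org URL are all hypothetical, and the property names (og:title, og:type, og:url, og:image) and the "book" type are taken from my reading of the Open Graph Protocol documentation, which should be checked before relying on them.

```python
from html import escape


def opac_record_to_og_tags(record: dict) -> str:
    """Render a simplified (hypothetical) OPAC record as Open Graph <meta> tags.

    The record is assumed to be a plain dict with "title" and "url" keys and
    an optional "cover_image" key. Real catalog records are far richer; the
    point of the sketch is how little of that richness is carried over.
    """
    # Map Open Graph property names to the simplified record fields.
    properties = {
        "og:title": record["title"],
        "og:type": "book",  # assumed Open Graph object type for a book
        "og:url": record["url"],
    }
    # The cover image is optional in this sketch.
    if record.get("cover_image"):
        properties["og:image"] = record["cover_image"]

    # Escape values so quotes and ampersands cannot break the attribute.
    lines = [
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in properties.items()
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record; the URL is a placeholder, not a real catalog.
    example = {
        "title": "Naked pictures of famous people",
        "url": "https://catalog.example.org/record/12345",
    }
    print(opac_record_to_og_tags(example))
```

The resulting tags would go in the head of the record's web page. Note how much is thrown away: a title, a type and a link survive, and that may be all an outside consumer wants.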



  1. Could you elaborate on what you mean by dumbing down of library metadata and hard swallowing? Could you provide some specific examples?

  2. For example, here is the library view of the current Daily Show host:

    000 00000cz a2200000n 45 0
    001 oca04708549
    005 20070228120224.0
    008 980417n| acannaab |a aaa c
    010 |ano 98079562
    040 |aIAhCCS|cIAhCCS|dDLC
    100 1 |aStewart, Jon,|d1962-
    400 1 |aLeibowitz, Jonathan Stewart,|d1962-
    670 |aElmo palooza, 1998:|bcredits (Jon Stewart)
    670 |aInternet movie database, Apr. 16, 1998|b(Jon Stewart, b. Jonathan Stewart Leibowitz, Nov. 28, 1962; actor)
    670 |aNaked pictures of famous people, c1998:|bCIP t.p. (Jon Stewart)
    999 |a9079

    Here's how anyone outside a library would present that metadata: "Jon Stewart"

  3. VIVO's funding is from the National Center for Research Resources, a branch of NIH.

  4. Thanks for spotting the omission, Chris.