Tuesday, December 22, 2015

xISBN: RIP


When I joined OCLC in 2006 (via acquisition), one thing I was excited about was the opportunity to make innovative uses of OCLC's vast bibliographic database. And there was an existence proof that this could be done: a neat little API that had been prototyped in OCLC's Office of Research, xISBN.

xISBN was an example of a microservice: it offered a small piece of functionality and it did it very fast. Throw it an ISBN, and it would give you back a set of related ISBNs. Ten years ago, microservices and mashups were all the rage. So I was delighted when my team was given the job of "productizing" the xISBN service, moving it out of research and into the marketplace.

Last week, I was sorry to hear about the imminent shutdown of xISBN. But it got me thinking about the limitations of services like xISBN and why no tears need be shed over its passing.

The main function of xISBN was to say "Here's a group of books that are sort of the same as the book you're asking about." That summary instantly tells you why xISBN had to die: any time a computer tells you something "sort of", it's a latent bug. Where you draw the line between something that's the same and something that's different is a matter of opinion, and it depends on the use you want to make of the distinction. For example, if you ask for A Study in Scarlet, you might be interested in a version in Chinese, or you might want a paperback version, or you might want Sherlock Holmes compilations that include A Study in Scarlet. For each question you want a slightly different answer. If you are a developer needing answers to these questions, you would combine xISBN with other information services to get what you need.

Today we have better ways to approach this sort of problem. Serious developers don't want a microservice; they want rich "Linked Data". In 2015, most of us can afford our own data-crunching big-data stores in the cloud, and we don't need to trust algorithms we can't control. OCLC has been publishing rather nice Linked Data for this purpose. So, if you want all the editions of Cory Doctorow's Homeland, you can "follow your nose" and get all the data you need.

  1. First you look up the ISBN at http://www.worldcat.org/isbn/9780765333698
  2. which leads you to http://www.worldcat.org/oclc/795174333.jsonld (containing a few more ISBNs)
  3. from there you can follow the associated "work" record: http://experiment.worldcat.org/entity/work/data/1172568223
  4. which yields a bunch more ISBNs.
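
The steps above can be sketched in Python. This is only a rough sketch under stated assumptions, not OCLC's documented API: it starts from the OCLC number reached in step 2, it assumes ISBNs sit under keys ending in "isbn", and it finds work and record links by URL shape rather than by any particular property name. The function names and the `fetch` parameter are made up for illustration; `fetch` is whatever you supply to turn a URL into parsed JSON-LD. As noted in the comments below, the work record may only point at other OCLC records, so the sketch follows those too.

```python
import re
from typing import Any, Callable, Iterator, Optional, Set, Tuple

# URL shapes taken from the examples in this post (an assumption, not a spec).
OCLC_URI = re.compile(r"^https?://(?:www\.|experiment\.)?worldcat\.org/oclc/(\d+)")
WORK_URI = re.compile(
    r"^https?://(?:www\.|experiment\.)?worldcat\.org/entity/work/(?:id|data)/(\d+)"
)


def walk(node: Any, key: Optional[str] = None) -> Iterator[Tuple[Optional[str], Any]]:
    """Yield (key, value) for every scalar in a parsed JSON document."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from walk(v, k)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, key)
    else:
        yield key, node


def isbns_in(doc: Any) -> Set[str]:
    """Collect values stored under any key ending in 'isbn' (assumed naming)."""
    return {str(v) for k, v in walk(doc) if k and k.lower().endswith("isbn")}


def links_in(doc: Any, pattern: "re.Pattern[str]") -> Set[str]:
    """Collect the numeric ids of all string values matching a URL pattern."""
    ids = set()
    for _, value in walk(doc):
        if isinstance(value, str):
            match = pattern.match(value)
            if match:
                ids.add(match.group(1))
    return ids


def related_isbns(oclc_number: str, fetch: Callable[[str], Any]) -> Set[str]:
    """Follow record -> work -> sibling records, gathering ISBNs on the way."""
    record = fetch(f"http://www.worldcat.org/oclc/{oclc_number}.jsonld")
    found = isbns_in(record)
    for work_id in links_in(record, WORK_URI):
        work = fetch(f"http://experiment.worldcat.org/entity/work/data/{work_id}")
        found |= isbns_in(work)
        # The work record may list only other OCLC records; visit each one.
        for other in links_in(work, OCLC_URI) - {oclc_number}:
            sibling = fetch(f"http://experiment.worldcat.org/oclc/{other}.jsonld")
            found |= isbns_in(sibling)
    return found
```

Passing `fetch` in keeps the walk independent of how, or whether, the URLs respond: you can hand it a small HTTP wrapper, a cache, or a dictionary of saved records. The result is a raw set, and deciding which of those ISBNs count as "the same book" is still up to your application.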

It's a lot messier than xISBN, but that's mostly because the real world is messy. Every application requires a different sort of cleaning up, and it's not all that hard.

If cleaning up the mess seems too intimidating, and you just want lightweight ISBN hints from a convenient microservice, there's always "thingISBN". ThingISBN is a data exhaust stream from the LibraryThing catalog. To be sustainable, microservices like xISBN need to be exhaust streams. The big cost to any data service is maintaining the data, so unless maintaining that data is in the engine block of your website, the added cost won't be worth it. But if you're doing it anyway, dressing the data up as a useful service costs you almost nothing and benefits the environment for everyone. Let's hope that OCLC's Linked Data services are of this sort.

In thinking about how I could make the data exhaust from Unglue.it more ecological, I realized that a microservice connecting ISBNs to free ebook files might be useful. So with a day of work, I added the "Free eBooks by ISBN" endpoint to the Unglue.it API.

xISBN, you lived a good micro-life. Thanks.

4 comments:

  1. Following up on the conversation from twitter -- I'll agree, letting go of the xISBN service isn't that big of a deal. However, all the services built around the xId services -- those will really be missed. OCLC included a number of very handy APIs -- the xOCLC and xId services, which would allow you to take an OCLC number and in one step find the items that relate to it. When doing record merges -- this was a really, really handy resource, as the data that you need doesn't always show up in the 019, 035, etc. It also made going from the OCLC number to the Work Id a trivial process. You can do this via WorldCat.org -- but the problem is that there seems to be an occasional lag when data is merged and updated. More than a couple of times I've found data in the linked data elements pointing to a work id that no longer exists. This shouldn't happen, but it seems to.

    I realize that the problem OCLC had with the xId services is that these weren't generated off of the production data in real time. This data was always stale, and getting the systems updated wasn't something that was done easily. But as I think about OCLC and their desire to be one of the switching points for libraries in the linked data work -- I see the deprecation of these types of tools as a bad thing. I'm a firm believer that it should be easy to resolve data and infer its meaning without needing an MLS or intimate knowledge of the individual systems that are being referenced. Using the xId service -- I could take an obscure identifier (the OCLC number) and get an identifier that was meaningful. Today, I have to know that I go to worldcat.org, ask for the json page, know that OCLC is embedding data in a linked data tagged block, know the tag that they are using to point to the work (which isn't self-describing) and then go get my data. If the goal is to make this data only accessible to people in libraries -- then shutting these services off is a great idea. If it's to make it easier for people outside of libraries to take our obscure "control numbers" and make some sense of their meaning -- this didn't do anyone any favors.

    But that is just my opinion.

    Replies
    1. It's certainly a shame that OCLC is surrendering here. I've been thinking about ways to pick up the baton and make lightweight data services more sustainable. You'd think that xID services would be core components of the identifier infrastructure they're layered on top of, but it just doesn't happen. Perhaps the missing component is a "Github for data"; I've been toying with various ideas around that for longer than Github has existed. Maybe 2016 will see an awakening of the ideas that seemed so hopeful a decade ago.

  2. When I follow your directions I don't find ANY ISBNs in the associated work record.

    Replies
    1. Yes, in fact the work record points at other OCLC numbers, which in turn have ISBNs. So for example, the first related record in the work points to http://experiment.worldcat.org/oclc/835898008.jsonld which gives you the ISBN. As Terry says, it's a hassle if all you want is ISBNs.

