Wednesday, June 19, 2013

Book Metadata Under a Bushel

Full story at the Verge
They don't allow witnesses, spectators or journalists to carry cell phones or Kindles or iPads into the Federal Courthouse in New York. But books are OK. So every publishing executive at the iBookstore antitrust trial carries a book with them instead. For example, The Verge spotted Penguin's David Shanks sporting Robert B. Parker's Wonderland. The press takes a picture, and the next day the book, which just so happens to be an exciting new release, gets its cover onto the front page of the business section, not to mention Go To Hellman.

This opportunistic book publicity reminded me of the biblical parable:
No man, when he hath lighted a candle, putteth it in a secret place, neither under a bushel, but on a candlestick, that they which come in may see the light. Nor doth a scroll seller speak its name so no man canst hear. Nay, he shouteth from high mountain tops the holy numbers of the scroll.
- Luke 11:33 (more or less).
So you would think that book publishers would also be spreading metadata for their books far and wide, and would make it as easy as possible for developers to propagate the word. But the tyranny of "the way we've always done things" still holds sway in that world. And so, the HarperCollins OpenBook API and the BookSmash developer competition, which I ranted about in my last post, need to be understood as the positive steps they are. They are opportunities for publishers and developers to engage in ways that aren't chiseled in stone.

For my part, I've been engaging with some very helpful people at HarperCollins. Together, we found some documentation issues that had me unsure about the resources being offered to challenge participants.

First of all, the entire text of each of the 196 books listed in the resources spreadsheet is being made available. This is very cool. Also, 20% samples of all EPUB books in the HarperCollins catalog are available through the standard API.

Hints:
  • If you're participating in the challenge, you need to use a different endpoint than the one offered by the API demo tool to get un-truncated text. Yes, you copy the URL the demo tool gives you (host name "diner") and replace its endpoint with the one reported in the text on the demo tool (host name "api").
  • If you want to use the catalog API to get ISBNs to use in the content API, note that only books/ISBNs with Sub_Format='EPUB' have preview content associated with them.
  • The API does request throttling in a funny way. If you make too many requests in a short period of time, the API tells you "Developer Inactive". That result seems to get stuck in a server-side cache.
  • The HC people seem eager to improve the API, so don't hesitate to report issues in their forums. If you've ever developed an API, you know that you have to whack at it a bit to get things right.
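The first three hints can be sketched in a few lines of Python. This is only a rough illustration, not the API's documented client: the example.com URLs, the "ISBN" field name, and the helper names are all made up for the example, and only the "diner"/"api" host names, the Sub_Format='EPUB' test, and the "Developer Inactive" message come from what I saw. The fetch function is passed in, so the sketch makes no claim about how you actually issue the request.

```python
import time
from urllib.parse import urlsplit, urlunsplit


def use_challenge_endpoint(demo_url, challenge_endpoint):
    """Keep the demo tool's query string, but point it at the other endpoint."""
    demo = urlsplit(demo_url)
    target = urlsplit(challenge_endpoint)
    return urlunsplit((target.scheme, target.netloc, target.path, demo.query, ""))


def epub_isbns(catalog_records, isbn_field="ISBN"):
    """Return ISBNs of catalog records that should have preview content.

    "ISBN" is an assumed field name; Sub_Format is the one from the hint.
    """
    return [r[isbn_field] for r in catalog_records if r.get("Sub_Format") == "EPUB"]


def fetch_with_backoff(fetch, url, retries=3, first_delay=1.0, sleep=time.sleep):
    """Call fetch(url); if the body says "Developer Inactive", wait and retry.

    The wait doubles after each throttled response. Since the throttled
    result seems to be cached server-side, short delays may not be enough.
    """
    delay = first_delay
    for attempt in range(retries + 1):
        body = fetch(url)
        if "Developer Inactive" not in body:
            return body
        if attempt < retries:
            sleep(delay)
            delay *= 2
    raise RuntimeError("still throttled after %d retries" % retries)
```

In practice you would pass in whatever function you use to fetch a URL and return the response text. Pacing your requests so you never see "Developer Inactive" in the first place is probably the better plan.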
If you play with this API a bit, it'll be pretty obvious to you that "building an API" is not the way things have always been done in the book industry. Here's how things are done: Publishers cause ONIX XML files that describe their books to come into existence. These files are shipped to "trading partners". Publishers do this, more or less, because way back when, Amazon forced them to do it that way instead of the horrible old ways they used to do things.
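For developers who haven't met ONIX, here's a rough sketch of what pulling an ISBN and title out of one of those files looks like. It assumes ONIX 2.1-style reference tag names (ProductIdentifier, ProductIDType with code 15 for ISBN-13, IDValue, Title, TitleText) and no XML namespace. The sample record is invented for illustration, and real files are much larger and may use short tags or a namespace.

```python
import xml.etree.ElementTree as ET

# Invented sample record; real ONIX files carry far more detail per product.
SAMPLE_ONIX = """<ONIXMessage>
  <Product>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <Title>
      <TitleType>01</TitleType>
      <TitleText>An Invented Title</TitleText>
    </Title>
  </Product>
</ONIXMessage>"""


def isbn13_and_title(onix_xml):
    """Return a list of (ISBN-13, title) pairs, one per Product element."""
    root = ET.fromstring(onix_xml)
    results = []
    for product in root.iter("Product"):
        isbn = None
        # A product can carry several identifiers; type 15 is the ISBN-13.
        for ident in product.findall("ProductIdentifier"):
            if ident.findtext("ProductIDType") == "15":
                isbn = ident.findtext("IDValue")
        title = product.findtext("Title/TitleText")
        results.append((isbn, title))
    return results


print(isbn13_and_title(SAMPLE_ONIX))
```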

So the HarperCollins API, and others like it, are significant not because they'll be useful in their current form, but because big publishers have realized that getting bossed around by Amazon might not be the smartest thing to do, and that maybe having more direct relationships with developers would be a good idea.