Wednesday, July 14, 2010

What IS an eBook, Anyway?

One of my secret pleasures at American Library Association meetings is going to Standards sessions. Now before you think I have a completely hopeless case of nerdiness, let me explain myself.

There's never just one Standards session at ALA; there are at least two and often three or more. I'm not sure why, but I think it's because librarians feel that standards are Important, and because there are so many Standards in the library world that people forget which ones were the subject of a Standards session at the last meeting. It's not that librarians are interested in Standards, it's just that they have lots of data problems that might magically go away, if only there were a Standard. Or not.

Because there are so many sessions on Standards, each one tends to be sparsely attended. That's why I like them. You can go and sit in a room with some really smart and influential people (the panelists), ask them bizarre Standards questions, have some other really smart audience member join in the discussion, and feel like you're a member of some hidden clique of powerful numerologists.

One of the things that has the Standards people concerned this year is the way ISBNs are being applied to ebooks. At the session I went to, Brian Green, the Executive Director of the International ISBN Agency, was giving his standard ISBN Standards update. Brian has been doing this long enough that he expects and parries my pestering questions with aplomb.

So here's this year's burning question: How many ISBNs should be issued when ebooks are published in different formats? Should the ebook have the same ISBN as the print book? If a different ISBN, should different file formats get separate ISBNs?

And here's the burning answer from ISBN International: each ebook file format for a book should get its own ISBN:
Do different formats of an electronic or digital publication (e.g., .pdf, .html) need separate ISBNs?

Different formats of an electronic or digital publication are regarded as different editions and therefore need different ISBNs in each instance when they are made separately available
And here's the language of the Standard itself (ISO 2108:2005), adopted through the international standards process in 2005:
Each different format of an electronic publication (e.g. ".lit", ".pdf", ".html", ".pdb") that is published and made separately available shall be given a separate ISBN.
So forgive me for having been confused in March, when I read that the “E-book ISBN Mess Needs Sorting Out,” Say UK Publishers. Why are the publishers still talking about this, more than ten years after the question was raised and thoroughly discussed? Why are we having panels at ALA to learn about this? Has the numeracy of the world's book industry been entirely depleted during ISBN's switch to 13 digits???

ISBN stands alone in the world of identifiers because of its widespread pre-internet adoption and success. Even the Internet Engineering Task Force set aside some URI space for it back in the days before "HTTP" became a religious invocation. But most people outside the book industry have had no idea of what it really identifies; they usually think it identifies a book or perhaps a book version.

If the book industry had a Facebook profile, it would list its relationship with ISBN as "it's complicated". Consider ISBN 978-1593967574. It is a "Year 5 Harry Potter Bust" manufactured by Diamond Comics. It has no author, pages or even words; it is not a book in any sense. Yet it is well-behaved in the ISBN world, because it is (or was) an item distributed by the world's book supply chain to bookstores and ultimately consumers.
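The number for that bust is also well-behaved in the purely arithmetic sense. Here is a minimal sketch of the ISBN-13 check-digit rule (weights alternate 1 and 3 across the first twelve digits, and the thirteenth digit brings the total to a multiple of 10). The function names are my own, chosen for illustration, and are not part of any ISBN agency tooling:

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first twelve digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Positions 1, 3, 5, ... get weight 1; positions 2, 4, 6, ... get weight 3.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check length, digits, and check digit; hyphens and spaces are ignored."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-1593967574"))  # True
```

Passing this check says nothing about what the number identifies; it only catches transcription errors.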

In the print world, it is more or less understood that a paperback has a different ISBN from the hardcover, which has a different ISBN from the library-bound version, and may have a different set of ISBNs when issued in a different country. At the deepest level, the ISBN is just a solution to a problem: "How does an item get tracked through the book supply chain?"

If you see a book on the shelves of a bookstore, you can be pretty sure that it got there through the "supply chain". Book publishers don't sell books to book stores, they mostly sell to distributors such as Ingram and Baker & Taylor. Bookstores use ISBNs to order books, and the distributors use the ISBN to report sales back to the publishers. When books don't sell, they get shipped back to warehouses, which track them using...ISBN.

When there's a question about whether a different ISBN should or should not be issued, the overriding principle is "a product needs a separate identifier if the supply chain needs to separately identify it." This clarity about the function of an ISBN is what has resulted in its overwhelming success. When people try to use the ISBN for other things, it's less successful. Supplemental services such as xISBN (which I helped put into production at OCLC), thingISBN, and emerging identifiers such as ISTC are useful for filling in the gaps between what ISBN really is and what people would like it to be.

Let's look at ebooks through the prism of the supply chain. If a book is issued in print, PDF and EPUB formats, it's important to the publisher to know how many of each are sold, thus the separate ISBNs. Similarly, if different DRM wrapping is used by two different channels, in many cases the publisher will need to track sales or manage the product separately. Although in many cases the DRM could be tracked by retailer, and thus wouldn't need a separate ISBN, the ISBN Standard says to give it a different ISBN. As Green has written previously,
Where publishers are selling e-books exclusively from their own websites or through another single channel and do not wish to have them listed in books in print databases then [...] publishers may not wish to bother with ISBNs. However, publishers should beware of taking a short-term view that makes them reliant on a single channel.
Unfortunately some publishers have obstinately refused to give separate ISBNs to ebooks in different formats. The US division of Random House is perhaps the most prominent example. There are excellent arguments for the "single ISBN" approach, but the worst possible situation for the emerging supply chain is for each publisher to use their own inconsistent rules for applying ISBN to ebooks. However strong the argument is for "single ISBN", its inconsistent application negates the advantages and threatens the ISBN system as a whole.

The ultimate problem with ISBN and ebooks is that ebooks are sufficiently adaptable that they expose ambiguities and limitations of the ISBN identification architecture. For example, suppose you're in the business of selling customized digital coursebooks. You allow professors to choose 10 chapters from 100 available. That means there are exactly 17,310,309,456,440 different ebooks that you could sell. That's about 9,000 times more books than can be identified by all the ISBNs in the galaxy. But you don't need to give them ISBNs, because you sell direct and the ebooks never touch the supply chain. The chapters themselves may need to be tracked so you can pay author royalties, but you need only 100 ISBNs to do that.
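For anyone who wants to check the arithmetic, here is a quick sketch. It assumes that the order of the chapters doesn't matter (so the count is a combination, not a permutation) and that the ISBN-13 space is the 978 and 979 prefixes with nine free digits each, the thirteenth digit being a checksum:

```python
from math import comb  # available in Python 3.8 and later

# Ways to choose 10 chapters out of 100 when order is ignored.
coursebooks = comb(100, 10)

# Assumption: two prefixes (978, 979), nine free digits each.
isbn_space = 2 * 10**9

print(f"{coursebooks:,}")         # 17,310,309,456,440
print(coursebooks // isbn_space)  # 8655
```

If chapter order did matter, the number of possible coursebooks would be larger still by a factor of 10 factorial.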

How about if a retailer changes (or eliminates) the DRM wrapping an ebook? Do the ISBNs of the ebooks on a consumer's ebook reader magically change? (Transubstantiation is one of my favorite words!) The answer is no, and that's because the supply chain is not involved.

Are there enough ISBNs for the ebooks that could be sold? The EPUB format is actually an archive file format that uses a dialect of XHTML for its insides, so you might imagine that any website or portion thereof can be packaged as an ebook. In fact, BookGlutton has a tool that (sort of) does this. As of May 2009, over 100 million websites operated, so you can easily imagine that ebooks could use up all available ISBNs almost overnight.

The "supply chain" for ebooks is rapidly mutating. The adoption of an "agency model" is an example of a change that has put new demands on ISBN; "agency" requires a retailer to identify an item's publisher before the moment of sale so that the correct sales tax can be applied. The agency model shift won't be the last or biggest change to the ebook supply chain, either. As one example, I've previously written about ebook pay-per-view and demand-driven acquisition. Another huge change would occur if a substantial advertising revenue stream for ebooks, such as Apple's iAd system, emerges. Advertising would put new demands on reporting systems and thus on the ISBNs that enable them.

What is an ebook anyway? Ten years ago, a committee of the American Association of Publishers came up with this not-so-useful definition:
An ebook is a literary work in the form of a digital object consisting of one or more standard unique identifiers, metadata, and a monographic body of content, intended to be published and accessed electronically.
I'll bet you never realized that blog posts were really ebooks!

The truth is that we really have no idea what an ebook is or what it will become. There are certainly e-things that correspond to print books, and these are easy to recognize as ebooks. But don't be surprised if there comes a flood of things to read on our connected devices that are too long to be called "articles" or "posts". For these, "eBook" may be the best label we can come up with.

Unless of course they get shackled by a supply chain.


  1. Thanks for this.

    You will obviously understand the reason for my question: is there a universal identifier for the _work_ itself? I just want to refer to, say, Mark Twain's Huckleberry Finn, regardless of its edition... (I guess that is the FRBR Work)? Clearly, ISBN does not do it...



  2. ISTC is intended to be a work identifier:

    If you want a URI for a work that is out there NOW, you could do worse than to use the OCLC Work ID, which is exposed through the xISBN service. For example, the Huck Finn OCLC Work ID is owi21660, which can be put into a URI like this: The API for this is part of "xOCLCnum":

  3. In the workshop FRBR and Identifiers at ELAG 2010 that I attended (see, ISTC was classified at the Expression level, not the Work level. For this we had ISWC, ISAN, OWI.

    In my own blog post I treat ebooks a bit differently. Depending on the nature of the content it can be a Manifestation or an Expression in the existing FRBR model. But it can be much more complicated than that.

    I must say, it was only at ELAG that I saw the true, commercial, nature of the ISBN, as you clearly describe here.

  4. I fear that applying FRBR to ebooks might turn out to be like making layer cake out of pudding. But Lukas Koster's post does a good job of stirring up that pot. Also, I usually get work, expression and manifestation confused. That must be a manifestation of working too hard on my expression!

  5. The ISBN enabled machines to handle all the informational and transactional processes that involve the physical book via one unambiguous identifier for the physical object. It bridged the physical format and the electronic processes.

    When what you are transacting is an electronic object itself, the compelling need for that unambiguous identifier lessens dramatically. Elements in the transaction--even simply the name of the file--can convey file format, DRM data, price, duration, etc.

  6. Although xISBN has a certain post facto value, it seems it would be sensible for publishers/distributors to make connections between the various ebook ISBNs that are derived from the same text. My understanding is that epub is used as a base format and the other formats are derived from that digital file.

    And couldn't we apply URI hash extensions to ISBNs? Something like

    uri:isbn:382649382x#pdf (or #epub, or etc.)
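
A minimal sketch of how software might read such an identifier, assuming the hypothetical `#format` suffix proposed in this comment (it is not part of the ISBN standard, and the function name `split_format` is made up for illustration):

```python
from typing import Optional, Tuple


def split_format(identifier: str) -> Tuple[str, Optional[str]]:
    """Split an identifier into its base part and an optional format suffix."""
    base, sep, fragment = identifier.partition("#")
    # No '#' or an empty suffix means no format was specified.
    file_format = fragment.lower() if sep and fragment else None
    return base, file_format


print(split_format("uri:isbn:382649382x#pdf"))  # ('uri:isbn:382649382x', 'pdf')
print(split_format("uri:isbn:382649382x"))      # ('uri:isbn:382649382x', None)
```

Under this scheme every format of a title would share one base identifier, so grouping formats together becomes a string comparison on the part before the `#`.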

  7. So let me get this straight: publishers need separate ISBNs to track sales per format...but publishers aren't interested in applying separate ISBNs. Am I the only one who sees a contradiction there? And why should there be any necessity to track a title based on DRM? Does a hardcover need a separate ISBN for the dust jacket?

    The fact of the matter is everyone is trying like mad to make an identifier work for something it was never intended to be applied to, then coming up with reasons for why it needs to be done after the fact.

    As a publisher, if I wish to micromanage my ebook sales, I will do what any retailer does with products that come in multiple "formats": assign an in-house SKU for each item I want to track. The problem is that the industry has become so fixated on using ISBNs as identifiers they've developed tunnel vision.