Thursday, June 2, 2011

EPUB Really IS a Container

"It's OK for libraries to put things in their EPUB books." That's what Bill Kasdorf, a member of the EPUB Working Group, told me last week at the IDPF Digital Book 2011 Meeting. He checked with EPUB Revision Co-Editor Markus Gylling to make sure. I had been curious if libraries could put all their cataloging information inside an EPUB file instead of siloing it in their catalog system.

It may seem an odd question if you don't know a few things about EPUB. EPUB is a standard format for ebooks. It's used by Apple, Barnes and Noble, Kobo, Overdrive and many others not named Amazon. EPUB is near the end of a revision process that will result in EPUB 3.0.

The EPUB specs define a lot more than just a file format. Both EPUB 2 and EPUB 3 define a container format (in EPUB 3 it's called the EPUB Open Container Format (OCF) 3.0) and then go on to define a number of file formats for files that go inside this container. These files are the resources (texts, graphics, etc.) that make up the ebook.

OCF uses the ubiquitous ZIP format to wrap up all a book's resource files into a neat, transportable package. That's pretty much standard these days; Java ".jar" and ".war" files use the same mechanism. As a consequence, you can use any unzip utility to look inside an EPUB file and manipulate its contents.
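
To make the "any unzip utility" point concrete, here is a minimal Python sketch that lists what is inside an EPUB container using only the standard zipfile module. The helper names (list_epub_entries, build_sample_epub) and the OEBPS/chapter1.xhtml path are my own illustrative assumptions, and the sample archive is a stand-in for a real book, not a valid EPUB.

```python
import io
import zipfile


def list_epub_entries(epub_file):
    """Return the names of all files stored inside an EPUB (ZIP) container."""
    with zipfile.ZipFile(epub_file) as container:
        return container.namelist()


def build_sample_epub():
    """Build a tiny EPUB-like archive in memory (illustrative, not a valid book)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as container:
        # OCF expects the mimetype file to come first and to be stored uncompressed.
        container.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        # Placeholder contents; a real container.xml points at the package document.
        container.writestr("META-INF/container.xml", "<container/>")
        # Hypothetical content file; the OEBPS folder name is only a convention.
        container.writestr("OEBPS/chapter1.xhtml", "<html/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_epub_entries(build_sample_epub()):
        print(name)
```

To try it on a real, DRM-free book, pass a file path to list_epub_entries instead of the in-memory sample.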

There's even a reserved name for a file to contain book level metadata in OCF: META-INF/metadata.xml, as well as another file for rights information, META-INF/rights.xml. Another file, META-INF/signatures.xml can be used to prove who made parts of the file and determine whether anyone has mucked with them. When Gluejar issues Creative Commons editions of newly relicensed works, we'll use the rights.xml file to make sure the CC declaration is explicit.
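
If you're curious whether a particular book uses any of these reserved files, a rough check might look like the following. The function name find_reserved_files is hypothetical, the list covers only the three names mentioned above (the OCF spec reserves others as well), and the sample archive is made up for the demonstration.

```python
import io
import zipfile

# Reserved names mentioned in the post; the OCF spec reserves others as well.
RESERVED_META_INF = (
    "META-INF/metadata.xml",
    "META-INF/rights.xml",
    "META-INF/signatures.xml",
)


def find_reserved_files(epub_file):
    """Map each reserved META-INF name to True if the container includes it."""
    with zipfile.ZipFile(epub_file) as container:
        present = set(container.namelist())
    return {name: name in present for name in RESERVED_META_INF}


if __name__ == "__main__":
    # Made-up sample container that carries only a rights.xml placeholder.
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as container:
        container.writestr("mimetype", "application/epub+zip")
        container.writestr("META-INF/rights.xml", "<rights/>")
    sample.seek(0)

    for name, found in find_reserved_files(sample).items():
        print(name, "present" if found else "absent")
```

This only checks for the presence of the files; it says nothing about what format their contents take.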

The new EPUB revision is coming fast. Last Monday, Bill McCoy, Executive Director of the International Digital Publishing Forum (IDPF), announced the release of the full EPUB 3 proposed specification. My guess is that when we look back on this event 10 years hence, we'll recognize this as the moment EPUB began to revolutionize the world of information, and with it, the book industry.

Although Amazon still uses the aging MOBI format on its Kindle devices, it seems only a matter of time before the infrastructure accumulating behind EPUB pushes them into the embrace of the IDPF. Already, most of the content flowing into the Amazon system is being produced in EPUB and converted to MOBI. Don't expect this shift to happen soon, though; in his IDPF presentation, Joshua Tallent of eBook Architects described rumors that this would happen soon as "bunk". But it will happen sometime.

EPUB 3 comes with lots of goodies. The revision adds several modules of sorely needed capability. It includes MathML, SVG and JavaScript over a substrate of HTML5 and CSS2.1. While MathML and SVG are essential for education and technical markets, JavaScript has been somewhat controversial because of the difficulty of making sure things work securely and without network connections. Most of the reading systems inherit JavaScript capability from the WebKit rendering engine they're based on, so a lot of JavaScript functionality will work in ebook readers regardless.

(left) Autography Founder and Author T. J. Waters
All this capability will remain latent unless people find compelling uses for it. I'm not worried. As the BookExpo itself got started, I met two different companies who were manipulating ebook files to solve the same problem: how can an author sign a book when the book is digital? Both companies, Autography and InScribed Media, create personalized experiences that leave artifacts of an author-consumer interaction inside ebook container files. Both of these companies have compelling solutions; they differ in their business models. Autography is structured as an author-focused bookstore; InScribed is developing partnerships with existing bookstores.

InScribed Media Founder and Author Alivia Tagliaferri
To some extent, InScribed and Autography are forced to be a bit convoluted in the way they deliver their product because they need to live inside DRM green zones; users don't have access to the files inside books without cracking the DRM (which is rather easy, by the way!). It's unfortunate, because personalization of ebooks could be a good way to encourage responsible use. I certainly don't want that picture of me torrenting around the world!

Libraries face a similar dilemma. The insides of an EPUB file could be greatly enriched by libraries, which have every motivation to enhance discovery both of the book and the information inside of it. But DRM gives the publisher and its delivery agents the exclusive ability to build context inside ebook containers. Libraries and readers are locked out. I think that for DRM systems to survive they will need to accommodate a more diverse set of user manipulations; author signatures are just the tip of the iceberg.

Coming soon, I'll report on EPUB 3 metadata.


  1. Thought-provoking as usual, Eric.
    There is an industry now built around enhanced book metadata for libraries (OCLC, Serials Solutions, Bowker, etc.). This would be a fantastic opportunity for publishers to reclaim some of this ground if not for the fact that (imho) .epub is not the best delivery vehicle for libraries in the longer term. Libraries need to focus not on objects, but rather on access, which argues for online delivery, where metadata association is perhaps easier for both libraries and publishers (or at least aggregators).

  2. I'm trying to wrap my head around the idea of an ebook as a container of files (or as you have already put it, books as software objects).

    Using music as a comparison, it's as if when one buys "a song" one could get the mp3, the lyric sheet, the album cover, *and* the music video all in one package.

    I'm curious what the boundaries of the EPUB 3.0 format are. Could a music group release a song using EPUB? Because EPUB incorporates HTML5, could we envision it as a "play anywhere App"?

    I'm not saying we should do these things... I'm just curious if we could.

  3. Mita, could, but not should! EPUB would be a bad format for sheet music. Also, the "supply chain" has imposed some arbitrary size restrictions on EPUB files that the video would break.

  4. I completely agree with the notion of ebook as a container. I dug up a rant I wrote about it on a former blog.

    The boundaries of said container are arbitrary and will need to be negotiated between creators and human readers.

    I'm very much looking forward to your post on metadata within EPUB.
    As to the notion of keeping the metadata with the object it references, it's really the best hope we have in terms of keeping context and allowing for authenticity in digital preservation. See Michael L. Nelson's PhD thesis, Buckets: Smart Objects for Digital Preservation; he calls it smart objects/dumb archives.

