Friday, October 14, 2016

Maybe IDPF and W3C should *compete* in eBook Standards

A controversy has been brewing in the world of eBook standards. The International Digital Publishing Forum (IDPF) and the World Wide Web Consortium (W3C) have proposed to combine. At first glance, this seems a sensible thing to do; IDPF's EPUB work leans heavily on W3C's HTML5 standard, and IDPF has been over-achieving with limited infrastructure and resources.

Not everyone I've talked to thinks the combination is a good idea. In the publishing world, there is fear that the giants of the internet who dominate the W3C will not be responsive to the idiosyncratic needs of more traditional publishing businesses. On the other side, there is fear that the work of IDPF and Readium on "Lightweight Content Protection" (a.k.a. Digital Rights Management) will be another step towards "locking down the web" (see the controversy about "Encrypted Media Extensions").

What's more, a peek into the HTML5 development process reveals a complicated history. The HTML5 that we have today derives from a group of developers (the WHATWG) who got sick of the W3C's processes and dependencies and broke away from W3C. Politics above my pay grade occurred and the breakaway effort was folded back into W3C as a "Community Group". So now we have two slightly different versions of HTML, the "standard" HTML5 and WHATWG's HTML "Living Standard". That's also why HTML5 omitted much of W3C's Semantic Web development work such as RDFa.

Amazon (not a member of either IDPF or W3C) is the elephant in the room. They take advantage of IDPF's work in a backhanded way. Instead of supporting the EPUB standard in their Kindle devices, they use proprietary formats under their exclusive control. But they accept EPUB files in their content ingest process and thus extract huge benefit from EPUB standardization. This puts the advancement of EPUB in a difficult position. New features added to EPUB have no effect on the majority of ebook users because Amazon just converts everything to a proprietary format.

Last month, the W3C published its vision for eBook standards, in the form of an innocuously titled "Portable Web Publications Use Cases and Requirements". For whatever reason, this got rather limited notice or comment, considering that it could be the basis for the entire digital book industry. Incredibly, the word "ebook" appears not once in the entire document. "EPUB" appears just once, in the phrase "This document is also available in this non-normative format: ePub". But read the document, and it's clear that "Portable Web Publication" is intended to be the new standard for ebooks. For example, the PWP (can we just pronounce that "puup"?) "must provide the possibility to switch to a paginated view". The PWP (say it, "puup") needs a "default reading order", i.e. a table of contents. And of course the PWP has to support digital rights management: "A PWP should allow for access control and write protections of the resource." Under the oblique requirement that "The distribution of PWPs should conform to the standard processes and expectations of commercial publishing channels." we discover that this means "Alice acquires a PWP through a subscription service and downloads it. When, later on, she decides to unsubscribe from the service, this PWP becomes unavailable to her." So make no mistake, PWP is meant to be EPUB 4 (or maybe ePub4, to use the non-normative capitalization).

There's a lot of unalloyed good stuff there, too. The issues of making web publications work well offline (an essential ingredient for archiving them) are technical, difficult and subtle, and W3C's document does a good job of flushing them out. There's a good start (albeit limited) on archiving issues for web publications. But nowhere in the statement of "use cases and requirements" is there a use case for low cost PWP production or for efficient conversion from other formats, despite the statement that PWPs "should be able to make use of all facilities offered by the [Open Web Platform]".

The proposed merger of IDPF and W3C raises the question: who gets to decide what "the ebook" will become? It's an important question, and the answer eventually has to be open rather than proprietary. If a combined IDPF and W3C can get the support of Amazon in open standards development, then everyone will benefit. But if not, a divergence is inevitable. The publishing industry needs to sustain its business; for that, it needs an open standard for content optimized to feed supply chains like Amazon's. I'm not sure that's quite what W3C has in mind.

I think ebooks are more important than just the commercial book publishing industry. The world needs ways to deliver portable content that don't run through the Amazon tollgates. For that we need innovation that's as unconstrained and disruptive as the rest of the internet. The proposed combination of IDPF and W3C needs to be examined for its effects on innovation and competition.

Philip K. Dick's Mr. Robot is one of the stories in Imagination Stories of Science and Fantasy, January 1953. It is available as an ebook from Project Gutenberg and from GITenberg.

My guess is that Amazon is not going to participate in open ebook standards development. That means that two different standards development efforts are needed. Publishers need a content markup format that plays well with whatever Amazon comes up with. But there also needs to be a way for the industry to innovate and compete with Amazon on ebook UI and features. That's a very different development project, and it needs a group more like WHATWG to nurture it. Maybe the W3C can fold that sort of innovation into its unruly stable of standards efforts.

I worry that by combining with IDPF, the W3C work on portable content will be chained to the supply-chain needs of today's publishing industry, and no one will take up the banner of open innovation for ebooks. But it's also possible that the combined resources of IDPF and W3C will catalyze the development of open alternatives for the ebook of tomorrow.

Is that too much to hope?


