Tuesday, May 30, 2017

Readium's New Licensed Content Protection May Result in Better Reader Privacy

Libraries offering ebook lending are between a rock and a hard place. They know in their heart of hearts that digital rights management (DRM) software is evil, but not allowing users to borrow the ebooks they want to read is not exactly the height of virtue. Saintly companies like Amazon will be happy to fill the gaps if libraries can't lend ebooks. The fundamental problem is that "borrowing" is a fiction, a conceptual construct, when applied to the ones and zeroes of a digital book. An ebook loan is really a short-term license. Under today's copyright law, a reader must have a license to read an ebook, and ebook rights-holders don't trust users to adhere to short-term licenses without some sort of software to enforce the license.

Unless the rock becomes a marshmallow, libraries that want to improve the ebook lending experience are hoping to make the hard place a bit softer. The most common DRM system used in libraries is run by Adobe. Adobe Content Server (ACS) is used by Overdrive, Proquest, EBSCO and Bibliotheca's Cloud Library. Adobe Content Server is a hard place for libraries in two ways. First, a payment must be made to Adobe for every lending transaction processed through ACS. Second, use of ACS affects reader privacy. When ACS first came out, Adobe got to know the identity of every borrower. Adobe says this about these records:
"Adobe keeps internet protocol (IP) address logs related to Adobe ID sign-ins for 90 days"
I wish they also said they destroyed these logs. Their privacy policy says:
"Your personal information and files are stored on Adobe’s servers and the servers of companies we hire to provide services to us. Your personal information may be transferred across national borders because we have servers located worldwide and the companies we hire to help us run our business are located in different countries around the world."
... and generally says that readers should trust Adobe not to betray them.

Thanks in part to demand from libraries and the companies that serve them, Adobe changed ACS so that borrower identities could be de-identified by intermediaries such as Overdrive. So instead of relying on Adobe's sometimes lax privacy protections, libraries could rely on vendors more responsive to library concerns. But still, the underlying DRM technology was designed to trust Adobe, and to distrust readers. Its centralized architecture requires everyone to trust participants closer to the center. A reader's privacy requires trust of the library or bookstore, which in turn has to trust a vendor, which in turn has to trust Adobe.

This state of affairs has been the motivation for the Readium Foundation's new DRM technology, called Readium Licensed Content Protection (LCP). LCP's developers claim that it offers libraries a low-cost way to improve the library ebook lending experience while providing readers with the privacy assurances they expect from libraries. In addition, Readium describes LCP as Open Source... except for a few lines of code. To understand LCP, and to see if it delivers on the developers' claims, I took a close look at the recently released spec. The short description of what I found is that it can do what it claims to do... but everything depends on the implementation. Also, DRM may be a Hofstadter-Moebius loop.

Now for the longer description:

Every DRM system uses encryption and secrets. Centralized DRM systems such as ACS keep a centralized secret, and use that secret to generate, distribute and control keys that lock and unlock content. LCP takes a somewhat different approach. It uses two secrets to lock and unlock content, a user secret and an ecosystem secret. An "ecosystem" is all the libraries, booksellers, and reading system vendors who agree to interoperate. Any software that knows the ecosystem secret can combine it with a user's secret to unlock content that has been locked for a user. This way multiple content providers in an ecosystem can independently lock content for a user; there's no requirement for a central key server.
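To make the two-secret idea concrete, here is a toy sketch in Python. It is not the algorithm in the LCP spec: the HMAC-based key derivation, the XOR key wrapping, the secret value, and all function names are assumptions chosen so the example runs with only the standard library. It shows just the property described above, that any party holding the ecosystem secret can lock or unlock a content key for a user without consulting a central key server.

```python
import hashlib
import hmac
import secrets

# Hypothetical value; a real ecosystem secret would be distributed only to
# approved implementations and never appear in source code like this.
ECOSYSTEM_SECRET = b"hypothetical-ecosystem-secret"


def derive_user_key(user_secret: str, ecosystem_secret: bytes) -> bytes:
    """Combine the user's secret with the ecosystem secret into a 32-byte key.

    Assumption: HMAC-SHA256 stands in for whatever transform a real
    ecosystem would use.
    """
    user_hash = hashlib.sha256(user_secret.encode("utf-8")).digest()
    return hmac.new(ecosystem_secret, user_hash, hashlib.sha256).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (toy stand-in for real key wrapping)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def lock_content_key(content_key: bytes, user_secret: str,
                     ecosystem_secret: bytes) -> bytes:
    """A content provider wraps a 32-byte content key for one user."""
    return xor_bytes(content_key, derive_user_key(user_secret, ecosystem_secret))


def unlock_content_key(wrapped_key: bytes, user_secret: str,
                       ecosystem_secret: bytes) -> bytes:
    """A reading app recovers the content key from the wrapped key."""
    return xor_bytes(wrapped_key, derive_user_key(user_secret, ecosystem_secret))


# Two providers lock different books for the same user, independently.
library_book_key = secrets.token_bytes(32)
bookstore_book_key = secrets.token_bytes(32)
wrapped_by_library = lock_content_key(library_book_key, "my passphrase", ECOSYSTEM_SECRET)
wrapped_by_bookstore = lock_content_key(bookstore_book_key, "my passphrase", ECOSYSTEM_SECRET)
```

A reading app that knows the ecosystem secret recovers both content keys from the same user secret. With the wrong user secret, or without the ecosystem secret, it derives a different key and the unwrapped bytes are useless.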

The LCP DRM system has some interesting usability and privacy features. If you want to read on several devices, you just need to remember your encryption secret, and you can move files from one device to another. If you want to share an ebook with a family member or close friend, that's ok too, as long as you're comfortable sharing your encryption secret. If you want to read anonymously, you can have a trusted friend borrow the book on your behalf. But to get publisher buy-in for these usability features, the system has to have a way for content providers to limit oversharing. Content providers don't want you to just post the file and the password on a pirate file-sharing service. So ecosystem software applications are required to "phone home" with a device identifier and license identifier when they are connected to the internet.
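One way a content provider might use those phone-home reports to limit oversharing is to count the distinct devices seen for each license. The sketch below is purely illustrative: the class name, the method name, and the device limit are hypothetical and are not taken from the LCP spec. Note that it records only the two identifiers mentioned above, nothing about the reader or what page they are on.

```python
from collections import defaultdict

# Hypothetical limit; a real provider would choose its own policy.
MAX_DEVICES_PER_LICENSE = 6


class LicenseStatusRegistry:
    """Tracks which device identifiers have reported in for each license."""

    def __init__(self, max_devices: int = MAX_DEVICES_PER_LICENSE) -> None:
        self.max_devices = max_devices
        self._devices_by_license: dict[str, set[str]] = defaultdict(set)

    def phone_home(self, license_id: str, device_id: str) -> bool:
        """Record a report and return True if the license is within its limit.

        A device that has already been seen for this license is always
        accepted; a new device is accepted only while there is room.
        """
        devices = self._devices_by_license[license_id]
        if device_id in devices:
            return True
        if len(devices) >= self.max_devices:
            return False  # looks like oversharing; the provider decides what to do
        devices.add(device_id)
        return True


registry = LicenseStatusRegistry(max_devices=2)
first = registry.phone_home("license-1", "phone")       # new device, accepted
second = registry.phone_home("license-1", "tablet")     # new device, accepted
repeat = registry.phone_home("license-1", "phone")      # known device, accepted
third = registry.phone_home("license-1", "stranger")    # over the limit, rejected
```

A small limit like this still lets a reader use several devices or share with a family member, while a file and password posted publicly would exhaust it quickly.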

As you might imagine, the LCP phone-home information could have an impact on reader privacy, depending on the implementation. So for example, if you borrow a book from the library, and your reader app contacts the library to say you've opened the book, your privacy is minimally impacted since the library already knows you borrowed the book. But if the phone-home transaction is unencrypted, or if it contains too much information, then your employer might be able to find out about the union-organizer book you're reading. If the libraries or booksellers can aggregate all their phone-home logs, then your detailed reading profile could be compiled and exploited. Or if users are not permitted to select their own encryption secret, it might be much harder to read a book anonymously. (Note: my suggested changes for improving these parts of the spec were accepted by the spec's authors.) But if everything is implemented with a view to reader privacy, LCP should offer much better reader privacy than possible with existing systems.

There's some bad news, however. Because the ecosystem secret has to be protected, the openness of the reader software is not quite what it seems. The code will need to be obfuscated before distribution, and the secret will only be available to developers and to distribution channels that are willing and able to "harden" their software. If you want to fork the software to add a feature, your build will not be able to unlock ecosystem content until the ecosystem overlords deign to approve your changes. So don't expect reader software with lots of plugins and options. Don't expect a JavaScript web-reader.

The code obfuscation raises another issue: it will be difficult to audit reader software to make sure it doesn't harbor spyware, even if the source code is open (except for the ecosystem secret). You still have to trust the app provider, your library and the people who sell you books. But it's hard to get far without trusting somebody, so this isn't a new problem, and when was the last time anyone audited library software? And because the ecosystem overlords distribute the ecosystem secrets to trusted developers, the topology of trust and accountability is very different from Adobe's centralized system.

If you didn't like that bad news, that cloud may have a silver lining, or maybe a lead lining, depending on your perspective. If LCP becomes widely used, the ecosystem secret will inevitably leak, and an anti-ecosystem could form. There will be a Calibre plugin to strip encryption. There will be grayware that does everything that the ecosystem software isn't permitted to do. And it might even be sort-of legal to use. Library ebook lending might flourish. Or collapse. Because in the end, ebook lending requires trust to flow in both directions; while it's not perfect, LCP is a baby step in the direction of mutual trust between readers and content providers.

In Stanley Kubrick's 2001: A Space Odyssey, the computer HAL 9000 goes insane. The reason:
HAL's crisis was caused by a programming contradiction: he was constructed for "the accurate processing of information without distortion or concealment", yet his orders, directly from Dr. Heywood Floyd at the National Council on Astronautics, required him to keep the discovery of the Monolith TMA-1 a secret for reasons of national security. This contradiction created a "Hofstadter-Moebius loop", reducing HAL to paranoia. 
Readium LCP software is sort of like HAL 9000. It's charged with opening up information to readers, with expanding minds everywhere, transporting them to worlds of new knowledge and imagination, yet it must work to keep a secret and prevent users from doing things that copyright owners don't want them to do. Let's hope that the P in LCP doesn't stand for "Paranoia".