Tuesday, June 25, 2013

Magic Rights Management for eBooks

The Fraunhofer Institute in Germany is apparently marketing some "new" technology they're calling SiDiM, which embeds digital data into texts by changing words in the text. They're telling publishers that it will fight piracy by making it easy to track files uploaded to torrents and file lockers back to their reprehensible sources.

It sounds kinda dumb, doesn't it?

The internet has over-reacted, of course; perhaps everyone is hypersensitive because of the revelations about the NSA and its data collection practices. Nick Harkaway jumped the shark a bit and called it "surveillance". The idea of changing words in books is easy to ridicule and deserves to die, but let's please take a deep breath.

There are lots of ways to put information in an ebook file, and license information is no different. For example, I've been advocating that Creative Commons licensed books should embed a digitally signed license so that the license can be relied upon. When you buy an ebook, an embedded license could protect you from accusations of infringement. Digital signatures can also tell you that a book's content hasn't been tampered with.
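
To make the idea concrete, here is a rough sketch (not any vendor's or Creative Commons' actual scheme) of a license record that carries a hash of the book's content and is itself signed, so neither can be quietly altered. It sticks to Python's standard library, so an HMAC with a shared key stands in for a real public-key signature; the key, field names, and function names are all made up for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical key. A real scheme would sign with the licensor's private key
# (a public-key signature), not a shared-secret HMAC like this stand-in.
SIGNING_KEY = b"example-key-not-for-real-use"

def make_license(content: bytes, license_url: str) -> dict:
    """Build a license record bound to the content, plus a signature over it."""
    record = {
        "license": license_url,
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}

def verify_license(content: bytes, signed: dict) -> bool:
    """True only if the record is unaltered and still matches the content."""
    payload = json.dumps(signed["record"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["signature"]):
        return False  # the license record itself was changed
    actual = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(actual, signed["record"]["content_sha256"])
```

Changing a single byte of the book, or editing the license field after signing, makes `verify_license` return False, which is the "can be relied upon" property in miniature.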

When you buy a Harry Potter ebook direct from Pottermore, your identifying information gets digitally stamped into the ebook. According to The Digital Reader, Pottermore uses watermarking technology from Booxtream, and I've been evaluating this technology myself for an Unglue.it project. So far, I'm impressed.

The rationale behind Pottermore's watermarking is that it prevents people from sharing the book beyond what their license allows. If the book gets on a public filesharing site, it can be traced back to the purchaser, and consequences could ensue.

Booxtream claims to be using 9 different watermarking techniques to make the embedded data hard to remove. For example, Booxtream adds digital codes to the names of the content files inside the EPUB, and adds data into image files. Although it's straightforward to strip some of the embedded info, Booxtream needs only to make it uncertain that stripping has been complete to retain some deterrence value.
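
The deterrence argument can be illustrated with a toy, which is emphatically not Booxtream's method: hide the same purchaser code in two places in a zip archive, run a naive stripper that only knows about one of them, and see what is left. The code derivation, the secret, and all function names below are hypothetical, and a real EPUB would also need its manifest updated when files are renamed.

```python
import hashlib
import io
import re
import zipfile

def purchaser_code(purchaser_id: str, secret: str = "example-secret") -> str:
    """Derive a short opaque code from a purchaser id (hypothetical scheme)."""
    return hashlib.sha256(f"{secret}:{purchaser_id}".encode()).hexdigest()[:10]

def watermark(files: dict[str, bytes], code: str) -> bytes:
    """Zip the files, hiding the code in each file name and in a comment."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            stem, dot, ext = name.rpartition(".")
            zf.writestr(f"{stem}-{code}{dot}{ext}",
                        data + f"<!-- {code} -->".encode())
    return buf.getvalue()

def strip_comments(archive: bytes) -> bytes:
    """A naive stripper: removes XML comments but leaves file names alone."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
         zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            cleaned = re.sub(rb"<!--.*?-->", b"", src.read(info), flags=re.S)
            dst.writestr(info.filename, cleaned)
    return out.getvalue()

def find_traces(archive: bytes, code: str) -> list[str]:
    """List every place the code still shows up in the archive."""
    traces = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if code in info.filename:
                traces.append(f"name:{info.filename}")
            if code.encode() in zf.read(info):
                traces.append(f"body:{info.filename}")
    return traces
```

After `strip_comments`, `find_traces` still reports the file name. Someone stripping a file with nine techniques in play can never be sure they got them all.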

For the user, the bottom line is that nothing the purchaser does or would want to do is impeded by the Booxtream watermarking. Nothing visible to the user is altered except for an ex libris page that tells the user that the copy has been personally licensed to him or her; it's customizable by the vendor.

A close analogy to ebook watermarking is the bullet serialization that's been proposed as an alternative to gun control. If every bullet were traceable to a purchaser, investigation of weapons-related crime would be reduced to finding the bullet and looking it up in a database. Law-abiding gun owners shouldn't notice the difference. Or maybe it would be a de facto ban on ammunition. YMMV.

The argument against digital watermarking is that there will always be ways to remove the embedded data, no matter how clever you are at hiding it. Someone will make a one-click watermark stripper, and the value in watermarking will be diluted. Yet almost two years after Pottermore launched their digitally watermarked ebooks, it's quite hard to find watermark stripping tools. Why would anyone bother? There's nothing that the vast majority of ebook purchasers want to do that's impeded by the watermarking. Contrast that with the ease of finding tools to strip PDF watermarks, which are annoying.

You might wonder why SiDiM would be selling their technology with such a scary-dumb sounding marketing pitch. It's because publishers are the customers. I've been talking to a lot of publishers, and they're very clear that they want DRM. Or at least they THINK they want DRM. What they really want is magic. They want their ebooks to come with a magic bullet that stops piracy and over-sharing dead in its tracks. They don't understand the technology behind DRM, but many of them swallow the story that comes with it: that nobody would pay for digital files if they can get them for free from piracy sites.

The truth is that if there's magic in the kind of DRM that comes with Adobe, Apple and Kindle, it's of the variety that Voldemort would use. If there's magic in the watermarking techniques used by Pottermore, it's of the Dumbledore variety. If there's magic in SiDiM, it's like Neville Longbottom's Switching Spell that put ears on a cactus.

I'm here to tell you that magic is real. There's real magic in the stories that authors tell. There's real magic in communities and in relationships between people, between authors and readers. There's real magic in libraries. It's that real magic that will stop piracy and help authors earn a good living in the digital future.

Dumbledore's fictional magic can help make the real magic manifest, and that's what we should work towards.

Wednesday, June 19, 2013

Book Metadata Under a Bushel

Full story at the Verge
They don't allow witnesses, spectators or journalists to carry cell phones or Kindles or iPads into the Federal Courthouse in New York. But books are OK. So every publishing executive at the iBookStore antitrust trial carries a book with them instead. For example, The Verge spotted Penguin's David Shanks sporting Robert B. Parker's Wonderland. The press takes a picture, and the next day the book, which just so happens to be an exciting new release, gets its cover onto the front page of the business section, not to mention Go To Hellman.

This opportunistic book publicity reminded me of the biblical parable:
No man, when he hath lighted a candle, putteth it in a secret place, neither under a bushel, but on a candlestick, that they which come in may see the light. Nor doth a scroll seller speak its name so no man canst hear. Nay, he shouteth from high mountain tops the holy numbers of the scroll.
- Luke 11:33 (more or less).
So you would think that book publishers would also be spreading metadata for their books far and wide, and would make it as easy as possible for developers to propagate the word. But the tyranny of "the way we've always done things" still holds sway in that world. And so, the HarperCollins OpenBook API and the BookSmash developer competition, which I ranted about in my last post, need to be understood as the positive steps they are. They are opportunities for publishers and developers to engage in ways that aren't chiseled in stone.

For my part, I've been engaging with some very helpful people at HarperCollins. Together, we found some documentation issues that had me unsure about the resources being offered to challenge participants.

First of all, the entire text of the 196 books listed in the resources spreadsheet is being made available. This is very cool. Also, 20% samples of all EPUB books in the HarperCollins catalog are available through the standard API.

Hints:
  • If you're participating in the challenge, you need to use a different endpoint than the one offered by the API demo tool to get un-truncated text. Yes, you copy the url it gives you (host name "diner") and replace the endpoint url with one reported in the text on the demo tool (host name "api").
  • If you want to use the catalog API to get ISBNs to use in the content API, note that only books/ISBNs with Sub_Format='EPUB' have preview content associated with them.
  • The API does request throttling in a funny way. If you make too many requests in a short period of time, the API tells you "Developer Inactive". That result seems to get stuck in a server-side cache.
  • The HC people seem eager to improve the API, so don't hesitate to report issues in their forums. If you've ever developed an API, you know that you have to whack at it a bit to get things right.
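
The first three hints can be sketched in code. This is only an illustration under loud assumptions: the full host names aren't given above, so any you pass in are placeholders; "ISBN" as a record key is a guess (only Sub_Format comes from the hints); and since the "Developer Inactive" result seems to stick in a server-side cache, waiting and retrying may or may not actually clear it. The `fetch` argument is whatever function you already use to get a response body as text.

```python
import time
from urllib.parse import urlsplit, urlunsplit

def swap_host(url: str, new_host: str) -> str:
    """Return the same URL with only the host name replaced."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=new_host))

def epub_isbns(records: list[dict]) -> list[str]:
    """Keep only the ISBNs of catalog records whose Sub_Format is 'EPUB'.

    The 'ISBN' key is an assumption about the record layout.
    """
    return [r["ISBN"] for r in records if r.get("Sub_Format") == "EPUB"]

def fetch_with_backoff(fetch, url: str, retries: int = 4,
                       delay: float = 1.0, sleep=time.sleep) -> str:
    """Call fetch(url); if the body looks throttled, wait longer and retry."""
    for attempt in range(retries):
        body = fetch(url)
        if "Developer Inactive" not in body:
            return body
        sleep(delay * 2 ** attempt)  # 1s, 2s, 4s, ... between attempts
    raise RuntimeError("still throttled after retries")
```

Passing `sleep` in as a parameter is just a convenience so the retry logic can be exercised without really waiting.
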
If you play with this API a bit, it'll be pretty obvious to you that "building an API" is not the way things have always been done in the book industry. Here's how things are done: Publishers cause ONIX XML files that describe their books to come into existence. These files are shipped to "trading partners". The reason, more or less, that the publishers do this is because way back when, Amazon forced them to do it that way instead of the horrible old ways they used to do things.
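
For developers who haven't met ONIX, here is a minimal sketch of what reading one of those files involves. The sample record is invented (placeholder ISBN and title) and uses a few ONIX 2.1 reference tag names; real feeds are far larger and may add DTD declarations, namespaces, or short tags, none of which this handles.

```python
import xml.etree.ElementTree as ET

# Invented sample, trimmed to two fields; not taken from any real feed.
SAMPLE = """<ONIXMessage>
  <Product>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <Title>
      <TitleType>01</TitleType>
      <TitleText>An Invented Title</TitleText>
    </Title>
  </Product>
</ONIXMessage>"""

def products(onix_xml: str) -> list[dict]:
    """Pull the ISBN-13 and title out of each Product record."""
    root = ET.fromstring(onix_xml)
    found = []
    for product in root.iter("Product"):
        isbn13 = None
        for ident in product.iter("ProductIdentifier"):
            if ident.findtext("ProductIDType") == "15":  # code 15 = ISBN-13
                isbn13 = ident.findtext("IDValue")
        found.append({
            "isbn13": isbn13,
            "title": product.findtext("Title/TitleText"),
        })
    return found
```

Compare that with a single HTTP request against an API and it's easy to see why developers would rather have the lego bricks.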

So the reason that the HarperCollins API, and others like it, are significant, is not because they'll be useful in their current form. It's because big publishers have realized that getting bossed around by Amazon might not be the smartest thing to do, and maybe having more direct relationships with developers would be a good idea.

Saturday, June 8, 2013

Publishing Hackathon a BookSmashing Success

Thursday, HarperCollins announced its BookSmash Programming Challenge. The book industry is nothing if not trend-driven, and after the success of the Publishing Hackathon, the BookSmash announcement qualifies "hacking" as a book industry trend.

The Hackathon turned out to be more significant than I expected. We should never underestimate the power of juxtaposing people with non-overlapping ignorance. I had the chance to talk to some of the other hacking teams last week, and they feel they learned a whole lot about the publishing industry. I also talked with Rick Joyce, one of the drivers of the event and Chief Marketing Officer at Perseus Books Group. He wrote me that his two revelations were "1) the importance of putting 'shareable data' (i.e. metadata) into a form developers want to work with (i.e. API's) vs. the feeds [publishers] traditionally supply to [their] trading partners. The world of developers are not going to incorporate you into their brilliant new lego creations if you don't give them lego-bricks to build with. And 2) this whole Open Innovation model is pretty mind expanding."

The organizers of the Publishing Hackathon got a lot of things right. The space was wonderful, the food was publisher-quality, and the publicity was excellent. (I admit that even the hype-laden website blurb that I criticized did its job well.) The variety of sponsors lent an open and collaborative atmosphere to the event. Even libraries were represented. It was a good decision to set a theme of "book discovery" for the event; this helped focus the participants and created a set of discussions that are likely to continue. Having the final presentations on the floor at BEA was brilliant. The party afterwards was fantastic.

The projects that were created at the hackathon won't solve the book discovery problem. The winning project, Evoke, won because it's both plausible and totally out of left field. But it's likely the knowledge gained by hackers and publishers during the process will advance the state of the art.

As with anything new, there are a number of things that could be improved on in future hackathons. Here's my list:
  1. Everyone is a VIP. During the presentations, three rows of chairs in the front were set aside for "VIPs". No one sat in them. Next time, make the hackers the VIPs.
  2. More prizes, more fun prizes. The gift economy of hacking and the cash economy of startups both need nurturing and cross-pollinating. Having one cash prize of $10,000 is less motivating than 5 $1000 prizes, and how do you split it if you have a big team? A prize consisting of dinner at a nice restaurant or some theater tickets might be a stronger motivation for participation.
  3. Hacker Judges. None of the 10 judges for the 2013 Publishing Hackathon actually do any hacking. Only 3 of the 10 qualify as technologists. None of them are designers. (As far as I know.) If you want to send a message that design, technology, and code are important to publishing, then build that dialogue into the judging process as well.
Now about BookSmash.

At first, I was seriously underwhelmed by the BookSmash challenge. It seemed to be a way for HarperCollins to prop up the sad, desolate ghost towns that are the OpenBook API and the OpenBook Content API. (The OpenBook API was launched in April of 2012 with the support of Mashery; the forums had attracted exactly one developer in the last year.)

But perhaps I judged prematurely. The competition website claims that a number of authors, including Peter Drucker, Eloisa James and C. S. Lewis, will be "making their full works available via the BookSmash Challenge version of the OpenBook API." This could be really exciting, but as far as I can tell, it seems to be a bit of an exaggeration. I checked James' Desperate Duchesses; the content API returns the first 20% of the work. I tried Prince Caspian and got this result:

epubFetch unable to display this book

Sorry, we have not loaded this book into the system as yet.
We are loading books on a regular schedule, so please check back.

Still, I can imagine some interesting things that might be done with this data.

Update June 18: Have been working with the friendly people at HarperCollins to iron out documentation issues that had been blocking my access. I'll put some hints in a new post.

/START RANT/ In 2013, Metadata APIs like Harper's are NOT enough. The metadata is not very good, and there's not enough of it. Why would a sane developer go to HarperCollins for product metadata when she could go to Amazon or Google Books and not have it limited to HarperCollins products, not have it limited by HarperCollins Terms and Conditions (which forbid any commercial use), AND have the selling price included, too??? Also Prince Caspian Movie Tie-in Edition (digest) is NOT the title of a book! If you want interesting things to happen with your metadata, let developers download THE WHOLE DATASET! That's how you get the data to Amazon and BN, and that's how you should get it to developers! /END RANT/

It's like Rick Joyce said. If you want people to build cool things, you have to give them lots of cool bricks.

Tuesday, June 4, 2013

Four Corners of the Sky: The End

This is the last of my Big Library Read diary posts.

The big reveal is that Sam and Jack's crazy mom killed their father the Judge. But wait!!

I'm still not sure about the Queen. How many of them are there? Are they all fake? Where did the jewels come from, and how did Jack find them in the first place? Were they really buried on the farm? Were they fake too? Did Ruthie know about them? Why on earth would Jack have hidden a jewel in the Spirit of St. Louis in Lambert airport? Why was the King of the Sky a good place to stash the queen for 20 years? Why do the feds care about this, again? How did Raffie's mom get in on this? Do banks anywhere ask you three riddles before they give you your deposit? Will Brad and Melissa live happily ever after? Georgia and Trevor? No one misses the million dollars?

So the book was reasonably fun. I have a suspicion that I missed a lot of the allusions and layers. Would not have read it but for Big Library Read. I think it was a good choice for the program because it's easy to talk about.

The Goodreads data shows some gradual uptake over the course of the free availability.

I think that one's affinity for a book or for an author depends a lot on whether you know and love the characters in the book. Or if you don't know the characters, perhaps they're sufficiently exotic that they fascinate you all by themselves. It helps if the book is well written and cleverly constructed, but that's not enough to make you love the characters. Love strikes like lightning and disrupts like a tornado, destroying one house while leaving the next unscathed. That's why the "book discovery problem" is so hard, and why a friend's recommendation is so much more useful than any algorithm could be.

Monday, June 3, 2013

Four Corners of the Sky: Chapters 42-48

This is the 8th installment of my Big Library Read diary. I'm through with Part 3 and on to Part 4.

Parts one-four have been titled North, South, East, West, presumably the title's four corners. North is centered in Emerald, North Carolina; South takes us west to St. Louis; East is Miami; and West seems to be Key West. Part 5 being "Home" is not much of a spoiler.

So what was "North" about Part 1? Is this about Oz's good witch? A game of Bridge? Why not North, West, South, East? Did you know that in Chinese, the order is always East-South-West-North (Dong Nan Xi Bei, 东南西北), which also means "all directions"?

The Anemoi, or four winds, had personalities that were cold, hot, unlucky and mild. Maybe. Winter, spring, summer, fall?

I guess everything will be revealed at the end.

I'm much happier with Ruthie as Annie's mother than some girl friend from Barbados, but I'm guessing this is a misdirection from the author.

Annie seems to have fallen hook, line and sinker for the scam, whatever it is, and I bet Sgt. Dan is in on it.

GoodeReader has an article about Big Library Read.

Next Entry

Sunday, June 2, 2013

Four Corners of the Sky: Chapters 39-41

This is installment 7 of my Big Library Read diary, covering chapters 39-41, through the end of Part 2.

I had been reading Four Corners on my train commute into Manhattan for Book Expo America. But on Friday Overdrive pushed an iPad update that somehow left my Overdrive app in a half-updated state. So that slowed me down a bit. I need to get it in gear; I have only 5 days left on my checkout!

Maybe it's the influence of BEA, but there seemed to be a LOT of alcohol consumed in these 3 chapters. So I have a few things to say about that today.

Sgt. Dan Hart drinks ALMOND LIQUEUR????? Is that an allusion to a movie I didn't see? Because if Brad was drinking almond liqueur I would think nothing of it, but Dan Hart? What kind of love interest (I got that right, didn't I?) drinks almond liqueur? And after a bottle of Cuervo, too. Annie drinks pitchers of mojitos after saying she doesn't drink. Also, Brad's bottle of beer.

Speaking of mojitos, congratulations to the Evoke team, which won the Publishing Hackathon book discovery competition with their website that makes emotional connection of characters in books more salient. The mojitos at the afterparty were just AMAZING; they set this reading diary back about 10 chapters' worth.

So Melissa Skippings is a vodka martini girl. Good for her.

Zemanta is recommending blog posts about Nutella Martinis. So they win the competition for drink discovery via books.

Next Diary Entry