Go To Hellman

Tuesday, October 29, 2013

eBook Heaven

Do you believe in heaven?

Well, why not? There are many ways to think about heaven. Most people will admit that there's something of us that lives on after we die, even if it's just the memories that we leave in others or the impact of our lives on the material world. And whatever that something is, it doesn't need food to eat or even air to breathe. It certainly doesn't require a paycheck. Truth and beauty and wisdom, those qualities don't really die with our bodies, do they? If the bit that we leave behind has of its essence some truth or beauty or wisdom, doesn't that sound like heaven?

If you have a favorite library, you know what book heaven is like. Words can live on long after their creators have turned to dust. Libraries work each and every day to bring all the truth, beauty and wisdom in their collections to their communities, both present and future. They cooperate with each other, so that even if your library is missing the book you need, another will fill the void. The rules governing our society have recognized how important all this is, and allow us all to benefit from the labors of those whose existence has faded to memories.

I believe in ebook heaven. In ebook heaven, there are no royalties to pay to Herman Melville or William Shakespeare or Dante Alighieri. There's even a slushpile in ebook heaven, where the weight of the world presses a diamond or two from unpublished graphene sheets.

The ebook heaven I believe in – some call it Open Access.

There's ebook hell, too, and that's where libraries live today. In ebook hell, books don't live forever; they disappear after a year. Or they're snatched into the kindles of eternal damnation by digital rights demons, lawyers and engineers. Every read must be monetized to feed some hungry monstrosity, and truth and beauty and wisdom are memories like the smells of leather bindings and musty paper.

Or maybe it's ebook purgatory. Dante imagined purgatory as a mountain that souls must climb before being admitted into paradise. In each circle around the mountain, the deadly sins that have stained the souls – envy, greed, lust, etc. – are purged by suffering, sanctified by fire and purified by agony. At last, the remains enter into the Garden of Eden, where everything has returned to its original perfection.

As we look to the future of ebooks, all we can see today is a long circle of purgatory. Our copyright theology posits that we must track millions, perhaps hundreds of millions, of creators and their deaths into the far future so that we may say if a work has passed into ebook heaven. In more circles around Purgatorio mountain, we must track national boundaries, regional rights, governing laws, inheritance claims, contract disputes, international conventions, and perhaps even patent rights.

But still, I believe in ebook heaven.

Libraries still endeavor to create little circles of ebook paradise. Within the bubble of a library, ebooks can be free to read. Digital archivists see that some books really do outlast us. New generations of minds encounter all sorts of new knowledge and enlightenment.

Libraries still work with each other to connect their bubbles and make their ebook paradises bigger. We need to enlarge those heavens, book by book, year by year, library by library. And we can't restrict ebook paradise to academia, any more than a belief system can restrict spiritual paradise to its priesthood. We need to find ways to expand the boundaries of availability for every book, to build bridges between today's best sellers and the far future of the public domain.

eBook heaven is worth working towards, together.

Do you believe in it?

Monday, October 7, 2013

NYLSLR: The eBook Copyright Page is Broken

Somehow it slipped my mind that my article "The eBook Copyright Page is Broken" was published in the New York Law School Law Review in April. And I am still not a lawyer! Here's the meat of the article:

The traditional copyright statement is thoroughly and fundamentally broken. Consider the simplest possible case of a single copyright holder:
© Eric S. Hellman, 2013. All Rights Reserved.

This is broken in the following ways:

Since there are currently no copyright formalities, the copyright symbol means nothing. The work is subject to copyright with or without the copyright symbol.

The work may also not be subject to copyright, for example, if Eric S. Hellman is a government employee, a robot, or a non-creative compiler of factual information. In these cases there is no copyright even if there is a copyright symbol present. There is no legal duty for a publisher to put a copyright symbol only on a copyrightable work. How is the ebook user supposed to know the true copyright status of a digital work?

“Eric S. Hellman” is an uncommon name. But suppose the author is named “John Smith.” What use, then, is the copyright statement? It does not specify which Eric S. Hellman or which John Smith is the author.

The asserted name of the copyright holder can’t be relied on because text in a digital file can be altered without a trace. It’s simple to take a digital copy of Merchants of Culture and change its asserted copyright holder to “John Smith,” then redistribute it. This is a negligible problem in the print world.

The asserted date of publication may be unrelated to the date of the underlying copyright. For purposes of copyright (for example, when a work is produced as a work-for-hire), re-publication of a book does not change the copyright expiration date of the underlying text.

There is no specification of the work being copyrighted. In print there’s not much ambiguity, but digital books are composite objects (text and graphics are always separate entities in a digital book file) and are frequently distributed in pieces. Some ebooks even have front matter distributed as a pdf file completely separate from the chapters. In other cases, an ebook may be displayed on a website that has a separate set of copyright statements.

If the digital book is legally on your ebook reader, then, somehow, the rights holder has granted you some rights, perhaps under the terms of an explicit license or with the license implicit in its availability on a website. Either way, “all rights” have not been reserved. Licenses are not needed for printed books, but they may be needed for ebooks.

In February, I wrote about ebook front matter and back matter and there's more work to be done in this vein.

The last footnote deserves some glossing. In it, I assert that the ccREL submission for marking Creative Commons status of web pages is currently in conflict with the EPUB 3 standard for ebooks. While that's technically true, it's a bit misleading. A better way to say it is that developments in HTML5 and EPUB3 have made ccREL's approach archaic. The metadata machinery in EPUB3 and HTML5 is fully up to the task of expressing and applying Creative Commons licenses. What's lacking is consensus around which of the available mechanisms to use. Since the RDFa vs. Microdata in HTML5 controversy has not yet fully shaken out, you can't really follow ccREL as written, so we'll need to have some patience.
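One of the mechanisms already available in HTML5, and therefore in EPUB 3's XHTML content documents, is the `rel="license"` link type that ccREL also builds on. As a minimal sketch, assuming a license is marked with a plain `rel="license"` link rather than with RDFa or Microdata properties, a reading system could collect license URLs using Python's standard `html.parser`. The `LicenseLinkFinder` class, `find_licenses` function and sample markup are hypothetical illustrations, not part of any standard.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a> or <link> elements whose rel includes "license".

    Hypothetical helper: it ignores RDFa and Microdata license statements.
    """

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of link types; compare case-insensitively.
        rel_types = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_types and href:
            self.licenses.append(href)


def find_licenses(markup):
    finder = LicenseLinkFinder()
    finder.feed(markup)
    finder.close()
    return finder.licenses


# Hypothetical content document marking its license twice.
sample = """<html><head>
<link rel="license" href="https://creativecommons.org/licenses/by-nc-nd/3.0/"/>
</head><body>
<p>This work is licensed under a
<a rel="license" href="https://creativecommons.org/licenses/by-nc-nd/3.0/">Creative Commons license</a>.</p>
</body></html>"""

print(find_licenses(sample))
```

Whether to rely on a simple link like this or on richer RDFa or Microdata statements is exactly the choice that still lacks consensus.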

Wednesday, October 2, 2013

Internet Plumbing: Mixed Redirect Chains

There's a lot of plumbing that underlies a website like unglue.it. Some of the more complicated plumbing involves the connections and links to other sites. Unglue.it has connection plumbing for Google, Goodreads, Twitter, Amazon, LibraryThing, Readmill, Internet Archive, Facebook, MailChimp and Gravatar; we've worked on some more that have yet to see daylight.

The rest of this post is about plumbing surprises. If you're not interested in website plumbing, feel free to go watch a cat video.

I've written more than you want to read about redirection, a rather important bit of website plumbing. HTTP redirects enable things like link shortening (e.g. bit.ly), long term link maintenance (e.g. crossref.org and purl.org), and just-in-time linking (e.g. OpenURL). If you redirect to another redirector, you have what's known as a redirect chain.

You can easily imagine the kind of mischief that can go on with redirects, the redirect loop being the most obvious. The plumbing in your web software has to know how to avoid getting stuck in redirect loops or endless redirect chains, and for the most part it does.
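To make the loop and chain-length problem concrete, here is a minimal Python sketch of redirect-following plumbing. It assumes a hypothetical `lookup` function that returns the `Location` header for a URL (or `None` when the server answers with content instead of a redirect), so it can be exercised without touching the network; the table uses the bit.ly chain from the example trace that follows. The hop limit of 10 is an arbitrary choice, not a standard.

```python
from urllib.parse import urljoin


class RedirectError(Exception):
    """Raised when a redirect chain loops or grows too long."""


def follow_redirects(url, lookup, max_hops=10):
    """Return every URL visited, ending with the one that served content.

    lookup(url) is a hypothetical callable: it returns the Location header of
    a redirect response for url, or None when url is the end of the chain.
    """
    chain = [url]
    seen = {url}
    while True:
        location = lookup(url)
        if location is None:
            return chain
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in seen:
            raise RedirectError(f"redirect loop at {url}")
        if len(chain) > max_hops:
            raise RedirectError(f"more than {max_hops} redirects")
        seen.add(url)
        chain.append(url)


# The redirects from the trace in this post, as a lookup table.
redirects = {
    "https://bit.ly/19Ncaz7": "http://unglue.it/download_ebook/986/",
    "http://unglue.it/download_ebook/986/":
        "https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub",
    "https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub":
        "https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub",
}

for hop in follow_redirects("https://bit.ly/19Ncaz7", redirects.get):
    print(hop)
```

Tracking every URL already visited catches loops of any length, while the hop cap stops chains that never repeat but never end either.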

Security issues can also arise with redirects, especially with mixed redirect chains. A mixed redirect chain is one that includes both secure (HTTPS) and non-secure (HTTP) links. Here's an example trace for a shortened ebook download link on the unglue.it website (it's the latest unglued ebook, Feeding the City, about the amazing human "plumbing" that delivers lunches to workers in Mumbai). You can try it yourself: https://bit.ly/19Ncaz7

The first thing your web browser does is set up a secure connection to bit.ly. While doing this, it checks bit.ly's X.509 certificate with the OCSP responder of bit.ly's certificate authority, DigiCert. OCSP stands for "Online Certificate Status Protocol", and the result is that you can be reasonably sure that your connection is to bit.ly and that no one but maybe the NSA can snoop on your communication with bit.ly. In particular, no one can see what link you ask to be resolved, and no one but you can see bit.ly's answer.

ask bit.ly:

(verify bit.ly at http://ocsp.digicert.com/)
https://bit.ly/19Ncaz7
GET /19Ncaz7 HTTP/1.1
Host: bit.ly

bit.ly's answer:

HTTP/1.1 301 Moved
Location: http://unglue.it/download_ebook/986/

In this example, bit.ly is redirecting to a non-secure URL, making the redirect mixed. Anyone between you and the destination can see what you're asking for if you follow the redirect. If you're in a Starbucks using wifi, Starbucks could conceivably send you a book about coffee instead. So the secure rigamarole you went through with bit.ly seems a bit wasted. But at least no one can see your bit.ly cookie and find out all the shortened links you've followed.

ask unglue.it:

http://unglue.it/download_ebook/986/
GET /download_ebook/986/ HTTP/1.1
Host: unglue.it

unglue.it's answer:

HTTP/1.1 302 FOUND
Location: https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub

Unglue.it still wants your ebook download to be secure, so it sends you to a secure archive where the file can be found. Ebooks increasingly contain JavaScript, and you really don't want to give bad guys the opportunity to insert malicious scripts into your ebook, even if most of today's reading platforms won't execute the scripts.

Since it's a different website, your web software needs to verify archive.org with their OCSP responder, GoDaddy.

ask archive.org:

(verify archive.org at http://ocsp.godaddy.com/)
https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub
GET /download/Feeding_the_City/9781909254039_Feeding_the_City.epub HTTP/1.1
Host: archive.org

archive.org's answer:

HTTP/1.1 302 Moved Temporarily
Location: https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub

The Internet Archive operates jillions of servers, and to save itself the trouble of rebuilding its index whenever it moves a file, it uses a redirector to get you to the server where your ebook is living today. It's yet another server, so you have to check its certificate, too:

ask ia801008.us.archive.org:

(verify ia801008.us.archive.org at http://ocsp.godaddy.com/)
https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub
GET /4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub HTTP/1.1
Host: ia801008.us.archive.org

ia801008.us.archive.org's answer:

HTTP/1.1 200 OK

And so we get our ebook. Since it comes on a secure connection, we can be sure it's the one that Internet Archive meant to give us. Since there was an insecure link in the redirect chain, we can't also be sure that it's the one that bit.ly meant to send us to.
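The reasoning in the last two sentences can be written down as a small check over the URLs in a chain. This is a sketch, not how any particular browser does it: it looks only at URL schemes and assumes every https hop had its certificate verified, as in the trace above. The `assess_chain` name is hypothetical.

```python
from urllib.parse import urlsplit


def assess_chain(chain):
    """Summarize the security of a redirect chain, given as a list of URLs."""
    schemes = [urlsplit(url).scheme.lower() for url in chain]
    return {
        # Both secure and non-secure links appear somewhere in the chain.
        "mixed": "https" in schemes and "http" in schemes,
        # The content itself arrived over a verified connection.
        "final_hop_secure": schemes[-1] == "https",
        # Every hop was secure, so no on-path attacker could rewrite a redirect.
        "end_to_end_secure": all(s == "https" for s in schemes),
    }


chain = [
    "https://bit.ly/19Ncaz7",
    "http://unglue.it/download_ebook/986/",
    "https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub",
    "https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub",
]
print(assess_chain(chain))
# {'mixed': True, 'final_hop_secure': True, 'end_to_end_secure': False}
```

The gap between `final_hop_secure` and `end_to_end_secure` is exactly the difference between trusting Internet Archive's file and trusting that it's the file bit.ly meant.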

You can see that there are a lot of steps in this chain. At every step of the way, your web plumbing needs to decide whether it's OK to send things like cookies or referers along with the request. For example, it should never send cookies received from a secure site to the insecure version of the same site. If a mixed redirect chain delivers you a JavaScript file, you shouldn't mark a web page as secure, even if the web page is securely delivered and uses only https links to retrieve its scripts.
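Here is a minimal sketch of two of those per-hop decisions, assuming a deliberately simplified cookie jar: cookies set with the Secure attribute travel only over https, and the Referer is dropped when a request goes from a secure page to a non-secure URL, as RFC 2616 section 15.1.3 recommends. The `Cookie` class, `outgoing_headers` function, and the cookie and referer values are hypothetical, not any library's API; real cookie matching also considers domains, paths and expiry, which this ignores.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Cookie:
    name: str
    value: str
    host: str     # host that set the cookie (exact match only, for simplicity)
    secure: bool  # True if the cookie was set with the Secure attribute


def outgoing_headers(next_url, jar, referer=None):
    """Headers the plumbing should attach when it requests next_url."""
    target = urlsplit(next_url)
    headers = {}
    # RFC 2616 sec. 15.1.3: don't leak a secure page's URL into a plain-http request.
    if referer and not (urlsplit(referer).scheme == "https" and target.scheme == "http"):
        headers["Referer"] = referer
    # Secure cookies must never travel over a non-secure connection.
    sendable = [
        c for c in jar
        if c.host == target.hostname and (target.scheme == "https" or not c.secure)
    ]
    if sendable:
        headers["Cookie"] = "; ".join(f"{c.name}={c.value}" for c in sendable)
    return headers


jar = [Cookie("sessionid", "abc123", "unglue.it", secure=True)]
print(outgoing_headers("http://unglue.it/download_ebook/986/", jar,
                       referer="https://example.org/reading-list"))
# {}
print(outgoing_headers("https://unglue.it/download_ebook/986/", jar,
                       referer="https://example.org/reading-list"))
# {'Referer': 'https://example.org/reading-list', 'Cookie': 'sessionid=abc123'}
```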

A great example of how to implement this plumbing is the Requests module for Python. (Also a great example of clear, readable source code and documentation!)

An example of a buggy implementation of this plumbing is the open-uri code in Ruby. From the source code:

# This test is intended to forbid a redirection from http://... to
# file:///etc/passwd, file:///dev/zero, etc. CVE-2011-1521
# https to http redirect is also forbidden intentionally.
# It avoids sending secure cookie or referer by non-secure HTTP protocol.
# (RFC 2109 4.3.1, RFC 2965 3.3, RFC 2616 15.1.3)
# However this is ad hoc. It should be extensible/configurable.

At least this code errs on the side of security. If you use Ruby to try downloading something via a mixed redirect chain, open-uri will raise an exception labeled "redirection forbidden". Perhaps it would be more accurate to label this a "too dicey for Ruby" exception.
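For comparison, here is a Python paraphrase of the rule that comment describes: a redirect is allowed if it keeps the same scheme, or goes from http or ftp to http, ftp or https; everything else, including https to http and http to file, is refused. This is my reading of the check written as a sketch, not a copy of the Ruby implementation; the `redirectable` name is hypothetical here.

```python
from urllib.parse import urlsplit


def redirectable(from_url, to_url):
    """Paraphrase of the scheme rule described in the open-uri comment."""
    src = urlsplit(from_url).scheme.lower()
    dst = urlsplit(to_url).scheme.lower()
    if src == dst:
        return True
    # Moves among http/ftp and upgrades to https are allowed; https -> http
    # and anything -> file fall through to False.
    return src in ("http", "ftp") and dst in ("http", "ftp", "https")


print(redirectable("https://bit.ly/19Ncaz7", "http://unglue.it/download_ebook/986/"))  # False
print(redirectable("http://unglue.it/download_ebook/986/",
                   "https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub"))  # True
print(redirectable("http://example.org/", "file:///etc/passwd"))  # False
```

The very first hop of the Feeding the City chain fails this test, which is why open-uri gives up before it ever reaches unglue.it.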

You might argue that mixed redirect chains should not be allowed. Or at least that https-to-http redirects should be forbidden. There are two main faults with this:

When links span multiple sites, there's no practical way to ensure that your links don't get mixed. Even if they're not mixed now, that could change in the future.
If you forbid https to http redirects, you're preventing sites from migrating to a more secure stance. A secure bit.ly would be impossible.

I tripped over the Ruby issue when implementing a connection to a partner that has built its site with Rails. They couldn't download some of our ebooks. Working together, we figured out what was wrong and implemented a work-around.

That's what us plumbers do for kicks.

Notes:

One thing you CAN'T do is fix an expired certificate by redirecting https to http: the browser checks the certificate before it ever sees the redirect. To fix broken security, you need to fix the security.

Thursday, September 19, 2013

Booksmash's Lust-O-Meter Shows How Innovation Happens

When HarperCollins decided to sponsor a hacking competition called BookSmash, they probably expected the participants to be a rag-tag collection of smart students, hungry young startups, and underemployed misfit coders. It's very unlikely that they expected Nobel Prize winners or seasoned tech entrepreneurs to show up. But, as I pointed out in June, they had made some interesting and fun resources available as part of the competition: 196 full-text books from some popular authors. I'll let you in on a secret: despite what you may hear elsewhere, it's fun, more than anything else, that drives innovation.

The results of the competition were unveiled yesterday. I was already familiar with some of the teams: I met the BookCities, Coverlist and LibraryAtlas teams at Publishing Hackathon. ReadUp, from the great folks at ReadSocial, is a neat idea definitely worth checking out. But Text Textures was the submission that popped out at me. The Text Textures team is Mira and Frank Wilczek, a father-daughter team. Frank is a Nobel Prize-winning physicist; Mira is an ethical-coding serial tech entrepreneur (Lyric Semiconductor and Red Panda Security; her new project is BookGobble).

Text Textures starts out by imagining how fun it would be if you could just skip to the "juicy parts" of a book. It turns out that with access to the full text of a book, a pretty simple combination of weighted word counts supplemented with pacing heuristics allows a text analysis engine to measure things like lustiness (hence the "Lust-O-Meter"), affection, violence and occult themes. By graphing each of these attributes versus page number, it's easy to see where the "juicy bits" of a book are. But that's not where the fun ends. You can density-plot one attribute versus another. And so we find out that "the lustiest scenes in For A Few Demons More appear to have almost no affection". You can compare multiple books and use the measures to decide what sort of book to read next.
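The Text Textures code isn't published here, so the following is only a guess at the flavor of a weighted word count, not the team's method. A hypothetical lexicon maps words to weights, and each page's score is its weighted count divided by its length, so plotting score against page number would show where a theme concentrates. The pacing heuristics are left out, and the words and weights are invented for illustration.

```python
import re

# Hypothetical lexicon for one attribute; a real one would be much larger and tuned.
VIOLENCE_WEIGHTS = {"sword": 2.0, "blood": 3.0, "fight": 1.5, "scream": 1.0}

WORD = re.compile(r"[a-z']+")


def page_scores(pages, weights):
    """Weighted word count per page, normalized by the page's word count."""
    scores = []
    for text in pages:
        words = WORD.findall(text.lower())
        total = sum(weights.get(w, 0.0) for w in words)
        scores.append(total / len(words) if words else 0.0)
    return scores


pages = [
    "She poured the tea and opened the window.",
    "He drew his sword; there was blood on the stairs and a scream below.",
]
for number, score in enumerate(page_scores(pages, VIOLENCE_WEIGHTS), start=1):
    print(number, round(score, 3))
```

Normalizing by page length keeps a long, mostly quiet page from outscoring a short, intense one; running one lexicon per attribute gives the multiple curves that can then be plotted against each other.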

I asked Mira about the genesis of Text Textures. She responded:

I've always been neural-net-curious. So when I found myself with a nice nest egg and some free time, I took the opportunity to round out my education. My dad (Frank) has conveniently also been curious about neural nets -- although he was more intrigued by the analogy to human cognition -- so we decided to work through Hinton's Machine Learning lectures on Coursera together. We've been doing fun technical projects together for as long as I can remember. When I was seven, we built a foot-stomping robot using Lego MindStorms. When I was sixteen, we used genetic algorithms to solve N queens.

As we went through the Hinton course, we started to think about real-world problems it might be interesting to tackle using some of those mathematical tools. Eventually we started playing with tracking characters through Sherlock Holmes .... then finding the action scenes where those characters appear ... then looking at other ways to classify scenes ... and thus the underlying idea of Text Textures was born.

The Lust-O-Meter in Text Textures is a fun toy. Which is to say that I would like to be able to play with it myself. I would build a snark-o-meter. I'm not sure if a "Skip to Good Bits" button is something people want in the reader applications, and even if they wanted it they might not admit it. But eBooks don't have inherent page numbers, so new ways to navigate ebooks would be really useful. It's rather a shame that today's prevailing ebook environment of walled-garden DRM-encumbered marketplaces is hostile to innovations such as Text Textures. Even libraries are prohibited from doing textual analysis of most of the ebooks they buy. And because lustiness data, for example, is not protectable by copyright, rightsholders such as HarperCollins typically deploy restrictive terms of use on anyone they allow to access the full text of their works. It's not enough to open up just a crack for a hacking competition.

Everyone should be able to have fun with their books.

Note: you can vote for Text Textures or any of the other BookSmash submissions until September 27 at 5:00pm EDT by going here.
Update: @skyberrys notes that the Illuminate entry also has roots in #pubhack. I note that it's yet another contribution to the book world by a physicist!

Friday, September 6, 2013

I Hired a Book

I have an article up at Library Journal about startups that have been getting hired for reading ecosystem jobs over the past three years. The startups that I profile are Goodreads, Wattpad, Readmill, SIPX and Zolabooks.

This view of hiring companies for jobs comes from Clayton Christensen's concept of Milkshake Marketing, explained here.

Christensen describes the case of a fast food restaurant that wanted to improve sales of its milkshakes, but really didn't understand the job that consumers were hiring it for. Customer observations revealed that almost half of the milkshakes sold were to early morning customers who had long boring commutes; the milkshake was being hired to relieve boredom and postpone hunger.

Libraries too need to think about the jobs their users are hiring them to do. Sometimes it's just to relieve boredom or escape bad weather.

(Image: "silo schematics" by Jerry Yeti)

Over Labor Day weekend, I hired a book for a very specific job. I went to an actual bookstore and bought a book to occupy myself during a cross-country flight from LAX to EWR, with a connection I was sure to miss in DFW. I chose Hugh Howey's Wool. Given the book's publishing history, you might think it perverse that I bought the print version. Wool was, for a long while, an ebook-only self-published series; Howey did a precedent-setting deal with Simon & Schuster for the print rights only. But consider the job I was hiring it to do: why would I buy an ebook that I couldn't read during the long waits on the runways? (My son had dibs on the window seat.)

Having spent the outbound trip coding new features for unglue.it, I needed a mental break, and Wool enveloped me in a completely absorbing self-contained world that did the job I hired it for quite nicely. Wool tells the story of the inhabitants of an isolated post-apocalyptic silo, built and supplied with technology from the year 2012. "IT" is the villain.

There was one continual distraction for me. My engineer's brain couldn't stop calculating the dimensions of the silo. The narrative made the silo seem large; after all, it's the whole world for its inhabitants. But of necessity, it has to be compact. But how compact? Towards the end of the fifth part, I got some measurements: the bottom 8 floors are flooded, a depth of "70 to 80 feet". So at most 10 feet per level. That's pretty cramped!
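Spelled out, the estimate divides the quoted flood depth by the 8 flooded floors:

```latex
\frac{70\ \text{ft}}{8\ \text{levels}} = 8.75\ \text{ft/level}
\qquad\text{to}\qquad
\frac{80\ \text{ft}}{8\ \text{levels}} = 10\ \text{ft/level}
```

So each level is somewhere between about 9 and 10 feet tall, floor to floor.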

Wool seems so relevant to our current environment, especially today's revelations about the extent of the NSA's decryption effort. Books have a way of doing jobs other than what you hired them for, just like the best employees. Just like libraries. Think about it.

Update 9/9: I purchased the DRM-free ebook package for Shift, the second book in the series. So far it reveals that the silo tech is supposed to be from 2050 and that the top level is 10 meters, not feet.
