Thursday, March 27, 2014

The Asterisk behind NYPL and Bookish

Joe Regal grew up in a family that moved around. Granger, Indiana. Lewiston, New York. Towanda, Pennsylvania. In every town there was a library, which young Joe would seek out as a haven of virtual stability. Regal remembers that in Fairfield, Connecticut, he picked up Breakfast of Champions because the cover looked like a cereal box. He opened it and was thrilled to discover, right there on page 5, the "drawing" of an anus/asterisk. And the text of Kurt Vonnegut's novel was even more subversive than the drawing.
"Vonnegut was one of those writers who made me feel less alone.  He also made me understand that it was OK to break the rules, because often the rules were insane.  That message - captured even in the asterisk/anus drawing, though of course more deeply, richly, and powerfully in the actual writing! - meant so much to me at 13, it's hard to convey or even fully remember the totality of it.  The freedom, the sense that you could explore without fear of punishment or retribution - that's a lot of what the library meant to me as a kid.  It's easy for us to forget as adults that a book can literally save your life. Or even on a more prosaic level, if there was literally no cost to taking out a book, I could take out anything without worrying whether it was right for me. I could browse, read a bit, take it out, get bored, return it."
As an adult, Joe Regal translated his passion for books into a successful career as a literary agent. He believed so deeply in Audrey Niffenegger's The Time Traveler's Wife that he ignored countless rejections until he found a publisher for it. ("I do not publish science-fiction." was the complete text of one rejection.)

As an agent, Regal could see first-hand what ebooks and Amazon were doing to the ability of authors, publishers and bookstores to sustain their livelihoods. He thought about what a seller of ebooks could and should be. There should be space for curation and community. Authors should be able to connect with readers. As he talked with others about his ideas, the concept of a new kind of website for ebooks began to take shape. (I got to know Regal and his family around this time.)

A few years later, Zola Books is a reality. Initially funded by friends of Regal (including Niffenegger), Zola has recently closed a $5.1 million seed round. The round includes a variety of authors and prominent individual investors, led by Charles Dolan, founder of Cablevision and HBO. Even considering the funding, Zola's ambition is breathtaking. They've built a commerce platform, a social platform like Goodreads, an HTML epub reader with proprietary DRM (not yet launched), and partner curation tools that make it (stretching a bit) a sort of TripAdvisor for books. Not to mention a solid catalog of ebooks.

A recommendation engine has been a major item on the Zola development roadmap from the beginning. It's not easy technology, so when the recommendation engine built by Bookish became available (along with the Bookish website), Zola, newly funded and in a hurry, snapped it up at a bargain-basement price, a fraction of its development cost.

The Bookish recommendation engine uses "fingerprints" of books in its algorithm. In other words, it works more like Pandora than like Netflix. The fingerprints are neither just metadata nor just text analysis; they combine elements of both with human-powered analysis.
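The Pandora-versus-Netflix comparison is roughly the distinction between content-based recommendation, which compares properties of the books themselves, and collaborative filtering, which compares patterns of use. Here is a minimal sketch of the content-based idea, assuming a "fingerprint" is just a small vector of weighted features. The feature names and weights are invented for illustration and say nothing about how Bookish actually builds or weights its fingerprints.

```python
import math

# Hypothetical fingerprints: invented feature weights per book.
# Bookish's real fingerprint features are not public.
FINGERPRINTS = {
    "Breakfast of Champions": {"satire": 0.9, "absurdist": 0.8, "sci_fi": 0.4},
    "Long Dark Tea-Time of the Soul": {"absurdist": 0.8, "sci_fi": 0.6, "satire": 0.5},
    "Hunger Games": {"dystopia": 0.9, "sci_fi": 0.6, "satire": 0.2},
    "The Time Traveler's Wife": {"romance": 0.9, "sci_fi": 0.5},
}


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, k=2):
    """Rank the other books by fingerprint similarity to `title`."""
    target = FINGERPRINTS[title]
    scores = [(cosine(target, fp), other)
              for other, fp in FINGERPRINTS.items() if other != title]
    scores.sort(reverse=True)
    return [other for _, other in scores[:k]]
```

With these made-up numbers, `recommend("Breakfast of Champions")` puts Long Dark Tea-Time of the Soul ahead of Hunger Games. Nothing about other readers enters the calculation; a Netflix-style system would instead look at which books the same people borrowed or bought.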

[Image: recommendations for Breakfast of Champions]
On Monday, New York Public Library announced that it had integrated the Bookish-powered recommendation engine into its BiblioCommons-powered web catalog, fulfilling Regal's dream of giving back to the libraries he loved growing up and opening up unexpected books like Breakfast of Champions to new generations of readers. The recommendations are live on the NYPL website, so you can decide for yourself whether they're any good. I found them intriguing, at least.

Apparently NYPL has been looking to add a recommendation feature to its website for a few years. It tracked Bookish along with other potential partners to determine the best option, and had the benefit of seeing some advance demos before "Bookish Recommends" launched online. NYPL was impressed by Bookish's "big data back-end" and by the fact that it was not driven by sales; the number of titles it covered at the outset was also impressive. NYPL will be assessing performance over the first year to ensure that the recommendations are valuable to readers.

According to Patrick Kennedy, Co-founder and President at BiblioCommons,
"The background to this story is the interest a number of libraries have shared with us in broadening their role as a source of book recommendations in their communities. The initiative will allow for better visibility and sharing of librarian recommendations and reviews, the integration of other third-party recommendations databases such as LibraryThing and NoveList. Our goal is [to] provide a neutral platform that allows libraries to integrate the sources of their choice. In all cases the integration API is made available by the third parties to BiblioCommons with the understanding that any library on the BiblioCommons platform may license the content."
Zola is hoping to make the Bookish API widely available to libraries and is considering a variety of licensing models. As Kennedy points out, there are recommendation services already available to libraries. The LibraryThing service (marketed by Bowker) is based on activity in the LibraryThing social network and is incredibly deep; the NoveList service from EBSCO takes a more traditional readers' advisory approach. The Bookish recommendation engine may not be based on sales the way Amazon's is, but if it doesn't help Zola sell ebooks, it will die. Can the mission of a library be advanced by a tool whose ultimate purpose is to sell books? Or does it depend on the sort of bookseller behind the tool?

This conflict is probably why booksellers and libraries haven't shared as much book information infrastructure as you might expect. A library wants different things from a recommendation system than a bookseller does. Libraries need to steer patrons toward the less-used books in their collections, while booksellers need to present customers with the books they are most likely to buy. Which might ALWAYS be 50 Shades or Hunger Games.
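The difference in goals can be made concrete by ranking the same candidate list under two objectives. This is a hypothetical sketch: the relevance scores, purchase probabilities, checkout counts, and penalty weight are all invented, and neither ranking rule is taken from any real system.

```python
# Invented candidates: relevance to the reader, an assumed probability of
# purchase, and the library's own checkout count for the title.
CANDIDATES = [
    {"title": "Hunger Games", "relevance": 0.70, "p_buy": 0.30, "checkouts": 950},
    {"title": "Cat's Cradle", "relevance": 0.85, "p_buy": 0.08, "checkouts": 120},
    {"title": "Mother Night", "relevance": 0.80, "p_buy": 0.04, "checkouts": 15},
]


def bookseller_rank(books):
    """Order by expected sale: relevance times purchase probability."""
    ranked = sorted(books, key=lambda x: x["relevance"] * x["p_buy"], reverse=True)
    return [x["title"] for x in ranked]


def library_rank(books, penalty=0.001):
    """Order by relevance minus a penalty for heavy circulation, nudging
    patrons toward under-used parts of the collection."""
    ranked = sorted(books,
                    key=lambda x: x["relevance"] - penalty * x["checkouts"],
                    reverse=True)
    return [x["title"] for x in ranked]
```

With these numbers the bookseller ordering puts Hunger Games first, while the library ordering puts the rarely borrowed Mother Night first and Hunger Games last.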

But bookselling and libraries are both changing rapidly. With the big-box bookstore dying before their eyes, publishers are scrambling to find ways to continue putting books in front of readers. One possibility is that libraries will respond to this need and evolve a closer connection to commerce, and that booksellers will figure out how to tighten their connections to communities and their libraries. The alternative is that libraries and ebookstores grow apart to serve very different populations and needs – Amazon Prime and library subprime, if you will.

My guess is that libraries sharing infrastructure with booksellers will become the norm rather than the exception it is now. Monday's announcement by NYPL and Zola is more than a website usability widget; it's about a vision of what libraries and booksellers can become. Zola has sent a love letter to the library world.


  1. Bookish started out as a joint venture of Penguin, Hachette, and Simon & Schuster, and spent a vast amount of money developing the site.
  2. Competition between LibraryThing and Bookish might well lead to some changes. Bookish uses some content from LibraryThing, such as reviews, on its website. When Bookish launched, LibraryThing founder Tim Spalding wrote:
    "Besides reviews, Bookish has access to some other LibraryThing data, including edition disambiguation and recommendations. A glance at their recommendations, however, will show you that they're not using them 'cold,' but as some sort of factor."
  3. I wrote about BiblioCommons when they came out of stealth a few years ago. They've won the business of some very high-profile public libraries, NYPL and Seattle Public Library included. They have the big benefit of starting from scratch with current web technology, and as a result have been innovating quickly.
  4. I took a look at how the integration was done. The Bookish API is a straightforward REST/JSON API with access keys. ISBN-based queries such as <token>&apiKey=<key>&isbn13s=9780670024902

    return JSON like:
      "basic": {
        "isbn13": "9780671742515",
        "bookUrl": "<token>",
        "imageUrl": "",
        "title": "Long Dark Tea-Time of the Soul",
        "subtitle": "",
        "authors": ["Douglas Adams"]
      }

    The library-side integration done by BiblioCommons is Ajax-style and JavaScript-based: a script in the browser calls the API, pulls out the ISBNs, and sends them back to BiblioCommons, which checks the catalog for the recommended ISBNs. A list of holdings is sent back to the browser for rendering. It looks like BiblioCommons itself does not call the Bookish API, which could make it easier to integrate other recommender APIs.
  5. Another interesting recommender system in the library world is bX from Ex Libris. It's a usage-based system focused on article links rather than books. Currently, bX will return book recommendations based on articles, but doesn't provide recommendations based on books.
  6. Don't confuse, the company acquired by Zola Books with, the company acquired by Overdrive
  7. Not that we haven't had this problem at, but why does NYPL list Robert Egan as the author of the ebook version of Breakfast of Champions? (Update: Answer from Amy Geduldig at NYPL- "The catalog entry here refers to the play Breakfast of Champions by Robert Egan, which is based on the novel by Vonnegut, but in and of itself is a different work, which is why Egan is listed as the author. ")
  8. All the book links in this post point at the NYPL BiblioCommons catalog, so you can try out Bookish Recommends for yourself.
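The integration flow described in footnote 4 can be sketched in a few functions. This is an illustration under stated assumptions, not BiblioCommons' or Bookish's code: the endpoint URL is a placeholder (the real one isn't shown in the footnote), the response is assumed to be a JSON array of objects each carrying a "basic" record like the excerpt, and the catalog lookup and holding records are hypothetical stand-ins for BiblioCommons' server-side check.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint: the real Bookish URL is not given in the post.
BOOKISH_ENDPOINT = "https://api.example.com/recommendations"  # hypothetical


def build_query(api_key, isbn13):
    """Browser step 1: build an ISBN-based query using the apiKey and
    isbn13s parameters seen in the example query."""
    return BOOKISH_ENDPOINT + "?" + urlencode({"apiKey": api_key, "isbn13s": isbn13})


def extract_isbns(response_text):
    """Browser step 2: pull recommended ISBN-13s out of the response.
    Assumes a JSON array of objects, each with a "basic" record."""
    records = json.loads(response_text)
    return [r["basic"]["isbn13"] for r in records
            if "isbn13" in r.get("basic", {})]


def holdings_for(isbns, catalog):
    """Server step: keep only recommendations the library actually holds.
    `catalog` is a hypothetical dict mapping ISBN-13 to a holding record."""
    return [catalog[isbn] for isbn in isbns if isbn in catalog]


if __name__ == "__main__":
    response = json.dumps([
        {"basic": {"isbn13": "9780671742515",
                   "title": "Long Dark Tea-Time of the Soul",
                   "authors": ["Douglas Adams"]}},
        {"basic": {"isbn13": "9780000000000", "title": "Not in this catalog"}},
    ])
    catalog = {"9780671742515": {"title": "Long Dark Tea-Time of the Soul",
                                 "copies": 3}}  # hypothetical holding record
    print(build_query("KEY", "9780670024902"))
    print(holdings_for(extract_isbns(response), catalog))
```

Because the catalog check only needs a list of ISBNs, the same server step could sit behind a different recommender API without changes.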

Saturday, March 22, 2014

eBook ILL is silly. The reason why will bore you.

When we try to think about digital things as if they are still the real things they used to be, we can lose touch with the parts of reality that are important. It's silly.

If you're not from the library world, let me explain what ebook ILL is and why it's not silly per se. ILL stands for Inter-Library Loan. In the print world, libraries have finite collections, and they depend on other libraries to make sure that even if they don't have a book a user needs, another library will step in and fill the gap. For the user, this means that a small library can provide books from a huge virtual collection. A book might take a few days to arrive.

There are significant costs involved in ILL. Most lending libraries charge the borrowing library a fee to cover the expense of packaging the book and sending it. The fee might be 10 or 20 dollars, and it might be waived for closely cooperating libraries. Since libraries both lend and borrow, the fees they collect and the fees they pay tend to even out, though many libraries run a significant surplus, a reward for smart acquisition policies in the past.

Library lending cooperatives have figured out that the combination of Amazon and modern warehouse logistics has partly upset the economics of ILL; a library can often buy a used copy of a needed book on Amazon for less than the ILL transaction costs (especially with Amazon Prime!). But ILL is still an important part of the library ecosystem.

For digital content, the buy vs. borrow equation shifts back a bit. In principle, there's no shipping cost and modern databases can retrieve a digital item in milliseconds. But if a library can do digital ILL, what is to prevent libraries from sharing a resource so widely that only one library in the world needs to buy the item?

The solution that e-journal publishers typically use is "print-and-ship." In other words, a library is allowed to send articles from a subscribed journal only if it prints them out first. The transaction is thus identical to what it was back in the dark ages of ink and paper and Xerox machines. For publishers, the friction of print-and-ship discourages libraries from canceling subscriptions; besides, the big-deal model of bundling many subscriptions into one has been much more advantageous for publishers than the document-delivery model that ILL competes with. (Also, when they first went digital, journal publishers were poorly equipped to do article-by-article e-commerce.)

Printing article PDFs and mailing them is a stretch, but mapping this model onto ebooks is an even bigger one. The book ecosystem has never included libraries making copies of in-print books. And why should library A ever acquire a book if the copy owned by library B works just as well? Since most ebooks never really go "out of print," inter-library loan of ebooks would compete directly with publisher sales.

To see why it still makes sense for publishers to allow ebook ILL, consider what it is competing against: "patron-driven acquisition" (PDA). The core idea behind PDA is that a library doesn't buy an ebook until a patron shows up who wants to use it. For many books that libraries would once have bought up front, this means that they don't buy the book at all, and for the rest, there might be no purchase until many years after publication.

It's often better for the publisher to encourage "just-in-case" acquisition, because the resulting revenue can be put to work immediately to publish more books. For books with low demand, inter-library loan encourages just-in-case acquisition by increasing the likelihood that somewhere, sometime, someone will need the library's copy of even the most obscure book.

eBook licensing with ILL has economic characteristics very similar to those of licensing to a library with many users. The larger the library, the more demand can be aggregated, and thus books can remain economically viable even at very low levels of user demand. In the limit of large user bases, ILL looks very much like the Open Access collective funding model demonstrated by Knowledge Unlatched.
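A back-of-the-envelope way to see why aggregation matters: if each potential reader independently needs a given title in a year with some small probability p, the chance that at least one of N readers needs it is 1 - (1 - p)^N. The independence assumption and the numbers below are illustrative only, not measured demand.

```python
import math


def p_any_request(p_per_reader, readers):
    """Chance that at least one of `readers` independent readers requests a
    title, if each does so with probability `p_per_reader`:
    1 - (1 - p) ** N, computed via log1p/expm1 to stay accurate for tiny p."""
    if p_per_reader >= 1:
        return 1.0
    return -math.expm1(readers * math.log1p(-p_per_reader))


if __name__ == "__main__":
    # Illustrative demand of 1 in 100,000 readers per year.
    for readers in (10_000, 1_000_000):
        print(readers, f"{p_any_request(1e-5, readers):.5f}")
```

With that illustrative rate, a library serving 10,000 readers has roughly a 1-in-10 chance of anyone asking for the book, while a lending network pooling a million readers is nearly certain to see a request, which is what keeps "just-in-case" purchases of obscure titles worthwhile.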

"Just-in-case" acquisition has benefits for libraries, too. Coupled with an effective archiving strategy, the library can make sure a resource doesn't disappear if a publisher has to withdraw an ebook title. Or perhaps the publisher goes out of business, or decides to change their business model.

But ebook ILL is still silly. Admittedly, the one-user-at-a-time licensing model has proven to be a useful conceit for selling ebooks. People are used to paying for a copy of a book, so it seems natural to buy a copy of an ebook. But stretching that model to inter-library lending turns the conceit into an outright lie. Just because one library has bought an ebook copy doesn't mean that they should be able to lend it instantly to anyone in the world.

Clinging to the pretend-it's-print conceit when developing licensing models for "just-in-case" acquisition results in harmful misunderstandings for both publishers and libraries. Publishers focus on sales substitution, and libraries misunderstand what they're paying for. Pricing and terms for such licenses will better serve both libraries and publishers if the license is seen for what it really is rather than something it's pretending to be. There's no sense in locking in the negative attributes of the old when developing the new.

So maybe we need a new acronym. How about "Interacting Libraries License"?

ILL is dead, long live ILL.

Saturday, March 1, 2014

The DMCA Takedown of a Feynman Lectures eBook Converter

The Feynman Lectures on Physics was one of my favorite textbooks in college. It wasn't the assigned textbook; it was recommended reading. I think the reason it doesn't work as a textbook is that every chapter is so deep that students would get sucked so far into each topic that they would never finish the course. It's the sort of book that transforms your life and your way of thinking about the physical world. When I started, The Feynman Lectures was one of the first books I investigated for ungluing.

My friends at Caltech informed me that the rights situation with the Feynman Lectures was exceedingly complicated, and it would be a cold day in hell before the Feynman Lectures would be free to the world in digital form. It seems that Caltech and the book publishing world had made an awful hash of the rights, with print rights owned by Pearson and audiovisual rights owned by competing publisher Perseus. Heroic efforts by Caltech lawyer Adam Cochrane and some dedicated physicists and educators untangled the rights, leading to a revised edition available through Perseus imprint Basic Books.

And last year, a miracle happened. An authorized free digital version of the lectures appeared on the web! There is sanity in the world! The Feynman Lectures had been unglued!

Vikram Verma, a software developer in Singapore, wanted to be able to read the lectures on his Kindle. Although PDF versions can be purchased at $40 per volume, no versions are yet available in Kindle or EPUB formats. Since the digital format used by the Kindle is just a simplified version of HTML, transforming the web pages into an ebook file is purely mechanical. So Verma wrote a script to do the transformation; it took only 136 lines of Ruby code, and he published the script as a repository on Github.
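To illustrate how mechanical the packaging step is: an EPUB file is a zip archive holding XHTML pages plus a few small XML files that list them. Below is a minimal sketch in Python (not Verma's Ruby script, which targeted the Kindle) that packages already-prepared XHTML fragments into an EPUB 3 file; it fetches nothing from any website. The placeholder identifier and fixed modification date are assumptions, the chapter bodies must already be well-formed XHTML, and a real converter would also need to handle images, styles, and math.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def xhtml_page(title, body):
    """Wrap an already well-formed XHTML fragment in a minimal page."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<html xmlns="http://www.w3.org/1999/xhtml" '
            'xmlns:epub="http://www.idpf.org/2007/ops">'
            f"<head><title>{escape(title)}</title></head>"
            f"<body>{body}</body></html>")


def build_epub(path, book_title, chapters, language="en",
               identifier="urn:uuid:00000000-0000-0000-0000-000000000000"):
    """Package (chapter_title, xhtml_fragment) pairs into an EPUB 3 file.
    The default identifier is a placeholder, not a real book ID."""
    names = [f"chap{i:02d}.xhtml" for i in range(1, len(chapters) + 1)]
    items = "".join(f'<item id="c{i}" href="{n}" media-type="application/xhtml+xml"/>'
                    for i, n in enumerate(names, 1))
    spine = "".join(f'<itemref idref="c{i}"/>' for i in range(1, len(names) + 1))
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{identifier}</dc:identifier>
    <dc:title>{escape(book_title)}</dc:title>
    <dc:language>{language}</dc:language>
    <meta property="dcterms:modified">2014-03-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    {items}
  </manifest>
  <spine>{spine}</spine>
</package>
"""
    toc = "".join(f'<li><a href="{n}">{escape(t)}</a></li>'
                  for n, (t, _) in zip(names, chapters))
    nav = xhtml_page("Contents", f'<nav epub:type="toc"><ol>{toc}</ol></nav>')

    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # EPUB requires "mimetype" to be the first entry, stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/nav.xhtml", nav)
        for name, (title, body) in zip(names, chapters):
            zf.writestr(f"OEBPS/{name}", xhtml_page(title, body))
```

For example, `build_epub("lectures.epub", "My Notes", [("Chapter 1", "<p>Hello</p>")])` writes a one-chapter book. Everything beyond the zip layout is string templating, which is why such a converter can be so short.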

Despite the fact that nothing remotely belonging to Perseus or Caltech had been published in Verma's repository, it seems that Perseus and/or Caltech were not happy that people could use Verma's code to easily make ebook files from the website. So they hauled out the favorite weapon of copyright trolls everywhere: a DMCA takedown.

I am not a lawyer, but I think that this use of a DMCA takedown was improper and possibly illegal. I'm pretty certain that using Verma's script for personal purposes would be protected fair use in the United States, under the Betamax decision. There are no terms of use at the Feynman Lectures website for Verma's script to violate; there wasn't even a robots exclusion. So even a legal theory that Verma's code was inducing others to violate website terms falls flat on its face. But alas, there's no penalty for abusive DMCA takedowns, so Perseus' main downside is having to read annoying blog posts like this one. And Perseus does need to look out for their authors' rights; they probably aren't in a position to assess what some Ruby code does.

Luckily, Github has a policy of publishing every DMCA takedown notice it receives, which is how I found out about Perseus' action and Verma's counter-notice. Perseus had 10 days to respond to the counter-notice, and since they failed to do so, Github has re-opened the repository.

In the meantime, the Feynman Lectures website has taken some steps to break Verma's script. For example, instead of a link to (my favorite chapter), the table of contents now has a link to javascript:Goto(2,18). This will take Verma about 10 minutes to work around. In addition, the website now has a robots exclusion (except for Googlebot).
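Robots exclusion is a voluntary convention: a polite crawler reads the site's robots.txt and skips what it is asked to skip, but nothing technically prevents a fetch. Here is a minimal sketch using Python's standard `urllib.robotparser`, with a hypothetical robots.txt of the "everyone but Googlebot" kind; the site's actual file may differ, and the user-agent string and URL are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, all others nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.org/vol_2/ch18.html"  # made-up URL
    print(allowed("Googlebot", url), allowed("feynman-ebook-script", url))
```

Under this hypothetical file, Googlebot is allowed and any other agent is asked to stay out; honoring that answer is up to the script's author.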

Michael Gottlieb, the editor of The Feynman Lectures on Physics New Millennium Edition added this issue to the repo:
The online edition of The Feynman Lectures Website posted at and is free-to-read online. However, it is under copyright. The copyright notice can be found on every page: it is in the footer that your script strips out! The online edition of FLP can not be downloaded, copied or transferred for any purpose (other than reading online) without the written consent of the copyright holders (The California Institute of Technology, Michael A. Gottlieb, and Rudolf Pfeiffer), or their licensees (Basic Books). Every one of you is violating my copyright by running the script. Furthermore Github is committing contributory infringement by hosting your activities on their website. A lot of hard work and money and time went into making the online edition of FLP. It is a gift to the world - one that I personally put a great deal of effort into, and I feel you are abusing it. We posted it to benefit the many bright young people around the world who previously had no access to FLP for economic or other reasons. It isn't there to provide a source of personal copies for a bunch of programmers who can easily afford to buy the books and ebooks!! Let me tell you something: Rudi Pfeiffer and I, who have worked on FLP as unpaid volunteers for about a decade, make no money from the sale of the printed books. We earn something only on the electronic editions (though, of course, not the HTML edition you are raping, to which we give anyone access for free!), and we are planning to make MOBI editions of FLP - we are working on one right now. By publishing the script you are essentially taking bread out of my mouth and Rudi's, a retired guy, and a schoolteacher. Proud of yourselves? That's all I have to say personally. Github has received DMCA takedown notices and if this script doesn't come down pretty soon they (and very possibly you) might be hearing from some lawyers. As of Monday, this matter is in the hands of Perseus's Domestic Rights Department and Caltech's Office of The General Counsel. 
Michael A. Gottlieb
Editor, The Feynman Lectures on Physics New Millennium Edition

(Note: Gottlieb's description of the website copyright notice is inaccurate: it says nothing about "downloaded, copied or transferred for any purpose")

This is kind of sad. Here Caltech did the right and noble thing and made the Feynman Lectures free as a website. That they can make money from the work via sales of print and other versions is great. But having done that, trying to control what people do with the free digital version (other than sell it) is a hopeless endeavor, and they should just stop.

I was wrong. The Feynman Lectures hasn't been unglued.

Update, March 3: Verma made a one-line change to the script to un-break it. But it's not a polite script, so don't all go and run it. Better to ask Caltech to use the script to make EPUBs and MOBIs for sale; I would certainly pay for my DRM-free copy!

Update, March 4: Gottlieb e-mailed me to say that Perseus didn't respond to the counter-notice because Github's email notice went to a spam filter, and that more takedowns would be coming. He seemed to think that I am one of the developers and warned that I have put myself "in a precarious legal position". To be clear, I am not involved in the development or publication of the script. I hope its existence is not used as a pretext to take down or lock down the FLP website. Also, high-quality epub and mobi are on the way!

Update, March 7: Verma e-mailed me to say he is voluntarily taking down his repo:
I'm taking down my copy of the repository on Monday morning, in worry its continued availability will lead Caltech to discontinue free online access to FLP. You're each welcome to adopt maintainership if you prefer, though I would rather if you did not.
Techdirt has a post and commentary.

Update, March 10: Verma's repo is now history, but forks of it remain in 15 places, including, bizarrely, Gottlieb's own Github page.