Tuesday, January 26, 2010

Deconstructing the Attributor Book Piracy Study

It was a lot of fun to have a post slashdotted and read by 100 times more people than any of my previous posts. However, the fact that it made fun of the way a study on piracy was "spun" overshadowed some serious points.
  1. The study in question, from Attributor, in fact has some substance beneath the spin.
  2. The number of books people get from libraries is comparable to the number they get from bookstores.
First, the Attributor study. Looking past the silly projections of book industry lost sales, there is some useful information there. In the study, illicit copies of 913 books were found on four one-click download sites which display download counts. The reported download counts for these 913 books totaled 3.2 million over 90 days. On average, each book was downloaded 3,500 times total, or 39 times per day.
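The per-book averages follow directly from the reported totals. As a quick check, here is the arithmetic in Python, using only the figures quoted above (the variable names are mine, not Attributor's):

```python
# Figures quoted from the Attributor study.
TOTAL_DOWNLOADS = 3_200_000  # reported downloads across all monitored books
BOOKS = 913                  # number of titles in the sample
DAYS = 90                    # length of the observation window

# Average downloads per title over the whole window, and per day.
per_book = TOTAL_DOWNLOADS / BOOKS
per_book_per_day = per_book / DAYS

print(f"{per_book:,.0f} downloads per book over {DAYS} days")
print(f"{per_book_per_day:.0f} downloads per book per day")
```

This gives roughly 3,500 downloads per book and about 39 per day, matching the numbers above. These are means, so a handful of heavily downloaded titles could account for much of the total.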

These numbers are in rough agreement with a much smaller sample I collected for myself. I looked at only 10 books in a single category on a single site, and observed that download totals ranged from 0 to 5,000, with an average of 578.

I followed up with some questions for Rich Pearson, the General Manager at Attributor. Most interesting to me, and probably of concern to publishers, was this: of 913 titles, chosen from Amazon lists to get a broad distribution rather than to focus on popularity, Attributor was able to find illicit copies of 90% of the books they looked for. I had expected that the most popular titles would be easy to find, but this breadth surprised me.

The extrapolations made to translate these download numbers into economic impact, while understandable from a marketing point of view, lack rigor in several ways.

The first assumption made by the study is that the "downloads" reported by download sites represent potential readers of books. Attributor does not have any special relationship with the download sites that would give it access to download statistics; it simply relies on the download counters published by the sites. Assuming that the counts are not simply manufactured (and I've seen download sites that do just this), it's unlikely that the download sites bother to distinguish between robot activity and human activity. Robot activity is important because many of the download sites do not offer search themselves. This is a tactic that allows them to be unaware of, and thus not liable for, the content that they host. The sites depend on other sites linking to them to drive traffic; they monetize the traffic by throttling downloads and offering "premium" subscriptions to people who want to remove the throttles.

The second assumption made in the Attributor study lies in the way they extrapolate numbers reported by 4 sites to the 25 sites that were monitored. What they did was to weight the sites based on the distribution of the 52,000+ takedown notices that Attributor has sent out since launching their monitoring service in July of '09. One problem with this is that while the scope of takedowns was limited to Attributor customers, the 913 monitored books were limited to non-customers. If Attributor's customers were focused on textbooks, for example, this would result in a bias towards sites used to host illicit textbooks. The other problem is "lamp post" bias: the extrapolation is based on what Attributor can find, so sites hidden from Attributor would result in undercounting.

Third, the sampling extrapolation to cover the entire industry has significant uncertainty. I worry most about price bias. Attributor reported 1,132 downloads of the book Freakonomics, which seems insignificant against sales of 2.5 million copies. Freakonomics, ranked #77 at Amazon, can be bought new for $9.35. By contrast, Architect’s Drawings, which was downloaded over 10,000 times and ranks #547,491 at Amazon, is a $60 hardcover. (The high download count is likely due to use as a textbook.) Thus the "lost sales" for Architect’s Drawings will be hugely overweighted in the extrapolation.
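To see how much price changes the picture, here is a rough sketch of the comparison. It assumes the extrapolation values each download as one lost sale at the new-copy price, which is my reading of the method rather than something Attributor has spelled out, and it treats "over 10,000" as exactly 10,000, so the result is a lower bound.

```python
# Download counts and prices quoted in this post.
# Assumption (mine): each download is valued as one lost sale at list price.
books = {
    "Freakonomics": {"downloads": 1_132, "price": 9.35},
    "Architect's Drawings": {"downloads": 10_000, "price": 60.00},  # "over 10,000"
}


def naive_lost_sales(book):
    """Dollar value of downloads if every download were a lost sale."""
    return book["downloads"] * book["price"]


losses = {title: naive_lost_sales(book) for title, book in books.items()}

# Compare the two titles by raw downloads and by price-weighted "losses".
download_ratio = (
    books["Architect's Drawings"]["downloads"] / books["Freakonomics"]["downloads"]
)
dollar_ratio = losses["Architect's Drawings"] / losses["Freakonomics"]

for title, loss in losses.items():
    print(f"{title}: ${loss:,.2f}")
print(f"Download ratio: {download_ratio:.1f}x")
print(f"Dollar ratio:   {dollar_ratio:.1f}x")
```

Under these assumptions the obscure $60 hardcover counts for roughly 57 times the "lost sales" of the bestseller, even though it was downloaded only about 9 times as often.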

Finally, while the Attributor study does note that the actual effect of downloads on sales is speculative, there is a significant question as to whether the people downloading the book copies could have purchased the books even if they wanted to. It is evident from the chatter around the book downloads that many of the downloaders speak Spanish, French, Portuguese, Arabic, and Indonesian. According to Pearson, Attributor scans sites in many languages linking to illicit books, including Chinese, Japanese, Korean, French, Spanish, German, Italian, Czech, Polish, Russian, Portuguese and Austrian. The impact of piracy is likely to be very different in export markets. Not that this is shocking news to anyone.

Attributor is currently targeting its service (which starts at around $10,000) at large publishers, though it is working with a reseller called Author Guard to service smaller publishers. It points to its superior ability to find and take down illicit content as its comparative advantage. (I've previously surveyed other companies in this space; somehow I managed to overlook Attributor.)

Although many publishers will look at the numbers and conclude that services like Attributor are not worth the expense, I think the real danger from piracy is collective rather than individual, and thus the response should be collective rather than individual. The book publishing industry will only suffer if piracy becomes so widespread that it gains cultural acceptance. To prevent this from happening, book publishers need to lead the culture with both carrot and stick. Legal, almost-free access to books must be provided, as occurs today in libraries, while illicit content should be taken down in as efficient a manner as possible. This means that services such as Attributor's should be deployed on behalf of the entire industry, perhaps through a consortium, and not just by individual publishers. Finally, the book industry needs to figure out how to use ebooks to effectively address the needs of the developing world, or else huge markets will be forever out of reach.

The book publishing industry is entering some scary times and needs to decide who its friends are. I don't know whether technology companies with scary marketing will prove to be reliable friends or not. Amazon and Apple might end up being saviours, but publishers shouldn't expect them to be buddies. I'm pretty sure of one thing, though. Libraries are definitely not the enemy.


  1. I love that you used the term "illicit books". You make me feel like a kid again, hiding my books from my parents, stashing them in boxes and under pillows. There were times when I would read books from the adult section of the library, devouring stuff which even then I knew was age-inappropriate; reading and re-reading things I couldn't yet understand.

    You also terrify me. Will we someday have to pay for every piece of information we read? Will authors jealously guard against wikipedia entries which reference their precious research, because it isn't being paid for by the end user? This is the information age, right now; will it become the age we locked information in discrete little cells, requiring cash bribes to liberate?

  2. I like that you present a critical approach to the study. They identified 1,600 illegal downloads for Angels and Demons, which is an absolutely irrelevant amount. Any medium-sized bookstore could've sold that in the weeks following the book's publication date.

  3. As IPFI stated in their RIN 2010, 3 out of 4 downloaders then buy the music that they downloaded. (Their quote was actually "1 in 4 downloaders never buy music".) This means that "sampling" would appear to be similar to your radio play for free versus buy to own anecdote, with only the technology altering and not the inherent 85% honesty of the population. (Curiously, this leaves a 15% balance, which just happens to be the defaulters amount on Mastercard, Visa and, prior to the GFC, housing loans.)

