Friday, October 8, 2010

Consumer Demand for Pirated eBooks Stopped Growing in 2010

Online piracy of ebooks has been a persistent worry for book publishers who look at the successes and failures of other media that have moved to digital forms. A surprising number and variety of ebooks are easily available on file sharing websites and peer-to-peer networks that use BitTorrent and similar protocols. The possibility that this availability will cut into sales of licensed ebooks and even print books is a scary one for an industry that has had many decades of relative stability. At Digital Book World in January, Brian Napack, President of Macmillan, "delivered a passionate call to arms for publishers to fight piracy in the ebook space or risk permanent damage to the underpinnings of publishing as a commercial enterprise".

Adding to the ebook piracy hysteria have been studies of the prevalence of ebook piracy produced by Attributor, a company that sells anti-piracy services. I've previously written critically about Attributor's report that purported to find evidence that "Online Book Piracy Costs U.S. Publishers Nearly $3 Billion".

In their most recent report, Attributor has taken a rather clever approach to the measurement of ebook piracy. Instead of trying to track downloads, Attributor has begun to use Google Trends to gain an understanding of consumer demand for ebooks. Although there are many potential difficulties in using Google for this purpose, Google Trends is a powerful and useful tool for gaining insight into the things that web users around the world are looking for.

Attributor presents their data along with an alarming narrative of growing and pervasive ebook piracy, and points to the iPad as a contributing factor to an increase in demand for pirated ebooks. After playing around with Google Trends for a while, I've come to the conclusion that Attributor has narrowly selected data to fit their narrative; taken as a whole, Google Trends data broadly supports a rather different narrative: that the growth of consumer interest in pirated ebooks slowed significantly in 2009 and stopped in early 2010.

To understand how Google Trends informs the debate about the prevalence of ebook piracy, it helps to understand what activity is being measured. Google Trends measures the frequency with which search terms are used. A consumer looking for a free copy of a particular work will typically search on the book title, adding terms like "free" or "download" or "pdf" to locate downloadable files. A more sophisticated strategy, one that is quickly learned, is to add the name of a preferred download site. If the user prefers peer-to-peer networks, the word "torrent" can be added to locate "seed" files for the item. The file sharing sites most commonly used for this purpose are currently RapidShare, Megaupload, 4shared, and Hotfile. To use Google Trends to measure the demand for a pirated ebook, you give it keywords that reproduce these searches. For example, demand for Stephenie Meyer's book Breaking Dawn can be assessed with a query such as this one.
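
As a rough illustration of the search patterns just described, here is a minimal Python sketch that builds the phrases a reader might type for a given title. The function name and the two modifier lists are my own illustrative choices, not anything Google Trends or Attributor defines, and the sketch says nothing about how Google Trends itself combines or compares terms.

```python
# Hypothetical helper: builds plain search phrases of the form "<title> <modifier>".
# The modifier lists come from the terms and sites named in the paragraph above.
GENERIC_MODIFIERS = ["free", "download", "pdf", "torrent"]
SITE_MODIFIERS = ["rapidshare", "megaupload", "4shared", "hotfile"]


def piracy_search_phrases(title, modifiers=None):
    """Return one lowercase search phrase per modifier for the given book title."""
    if modifiers is None:
        modifiers = GENERIC_MODIFIERS + SITE_MODIFIERS
    # Normalize case and collapse repeated whitespace in the title.
    normalized = " ".join(title.lower().split())
    return [f"{normalized} {modifier}" for modifier in modifiers]


phrases = piracy_search_phrases("Breaking Dawn")
for phrase in phrases:
    print(phrase)
```

Swapping the title for the generic word "ebook" gives phrases of the kind used for the overall measurement discussed next.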

To assess the overall state of ebook piracy, I used data from this query. Note that since the search is for ebooks generically, there's no telling for sure that the ebooks being searched for are really pirated; for the purposes of this study, I assumed that none of the ebooks being searched for are legally available on these sites. Calling them "pirated books" may be inaccurate, but I'll use that term anyway.

Some features of the data are immediately apparent. First of all, searches for pirated ebooks have increased a great deal over the past 5 years. It's worth noting, however, that the most intense interest measured by Google occurs in India, the Philippines, Indonesia, Vietnam, Malaysia, Singapore, and eastern Europe. Less than half the search volume comes from the US. It's also easy to see seasonal peaks that obscure the shorter term trends. The peak periods for pirate ebook seeking are the December holidays and the beginning of September, presumably because of the start of school.

To eliminate seasonal variations, I computed the year over prior year growth of pirate ebook search activity. The resulting plot is quite smooth. After a few years of 100% per year growth, 2008 showed a clear slowing of growth. This slowing of growth continued up to the beginning of 2010, and then flat-lined. Since February of 2010, the growth of interest in pirated ebooks has stopped completely.
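
For readers who want to try the year over prior year calculation themselves, here is a minimal Python sketch. It assumes the search index has been exported as a list of monthly values (so the comparison period is 12), and the sample numbers are made up to show the effect; they are not the Google Trends data discussed here.

```python
def year_over_year_growth(series, period=12):
    """Fractional growth of each point relative to the point one period earlier.

    With monthly data and period=12, each value is compared with the same
    month of the prior year, so a repeating seasonal pattern cancels out.
    Returns None for a point whose prior-year value is zero.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    growth = []
    for i in range(period, len(series)):
        prior = series[i - period]
        growth.append(None if prior == 0 else series[i] / prior - 1.0)
    return growth


# Made-up illustration: a seasonal shape with September and December peaks,
# scaled so that the overall level doubles each year for three years.
seasonal = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.5, 1.0, 1.0, 1.4]
monthly_index = [s * (2 ** year) for year in range(3) for s in seasonal]

growth = year_over_year_growth(monthly_index)
print(growth)
```

In this toy series every growth value comes out at 1.0, that is, 100% per year, even though the raw series jumps around every September and December. A flat-lining of real growth would show up as these values falling toward zero.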

It should be noted that this stabilization has occurred during a period of strong sales of ebook reader devices, including Kindle, Nook, and the iPad. Indeed, the unveiling of the iPad was coincident with the stabilization of demand for pirate ebooks.

It's hard to know for sure what's happening, but one interpretation of these patterns is that a broad increase in consumer-friendly availability of properly licensed ebooks over the last 2 years has squelched the growth of demand for ebooks from illicit sources. In that light, the remaining demand can be interpreted as a sign of poor availability for appropriately priced ebooks on college campuses and in developing countries.

While this data has to be seen as an encouraging sign for the book publishing industry, it's too soon to know if it will last. It's entirely possible that too-high prices, cumbersome DRM, or new technologies could reinvigorate the demand for illicitly shared ebook files. For the moment at least, the book publishing industry can exhale.


  1. A factor that needs to be considered regarding the data is that ebook-published authors--those with the independent ebook publishers--regularly scan the internet to see if their books are being offered somewhere free. How much of a factor that might be is likely indeterminable, but it still, I think, needs to be kept in mind.

  2. Elizabeth, as you point out, there are reasons other than piracy that someone might do a search that would get included in this data set. It's hard to say how much.

    I've used AdWords to put numbers on the search volume. The total current search volume on the terms I've included is about 12,000 per day around the world. I have not been able to get anywhere close to the 1.5-3 million searches per day asserted by Attributor. Their numbers would imply that ebooks account for 20-40% of all searches for torrents and shared files, which would be a delusional estimate.

  3. Might be worth crossing this data with information on international availability of books -- since most publishing contracts parcel out international rights, and since the electronic distributors generally respect the publishers' regional availability demands, it may be that the areas where these searches are prevalent are also areas where the legal ebook is last to be made available.

  4. nonnihil- Literary agents would be in the best position to do that sort of analysis; if there are any reading this who would like help doing it, they should feel free to contact me.

  5. I wrote on Twitter my reaction to the statistical searches. I don't think they hold up. Google Trends isn't data about something. It's a sort of flickering shadow data casts on the wall. The shadow is fuzzy and the fire keeps flickering. The fundamentals of the situation--more devices, easy pirating--are so very simple and strong that I at least would need to see a hundred graphs pointing downward to believe ebook piracy was not growing.

    Also, I wish that you weren't drawn so strongly to "blaming the victim." This is a familiar trope from piracy battles of the past—that people who sell intangibles are to blame for the fact people are stealing from them. Thus we get "too high prices" or the unavailability of "appropriately priced" ebooks as a potential cause for future piracy. While higher prices certainly spur piracy, the injection of "too" and "appropriately" bothers me. Who are you to say what's an appropriate price for my creative work? Is the appropriate price merely the price at which I can convince more people to buy it than steal it?

    The finger should be pointed and the analysis focused on the real causes--people's ever-present desire to get things without paying for them, and technologies and legal structures that make this easy to do. I propose no solution for that. Technologies that make it harder to do make the product much less attractive, and laws sufficient to stop it would be dangerous laws indeed. But let's stop blaming the victim.

  6. Tim- Observation of shadows can be excellent science if you know how they get cast. Thus an astronomer can discover Neptune by observing the orbit of Uranus. The difficulty with using Google Trends is that our understanding of its inner workings is incomplete. An example of this is something you pointed out - the treatment of plurals. It appears that the Google Trend for a plural is a proper subset of the trend for the singular, i.e. "ebook" includes "ebooks", but I've not found any documentation that spells this out.

    If you are interested in automotive break-ins, you might want to consider the effects of: 1. stronger penalties for people who break into cars; 2. better car locks; 3. urging people to keep valuables in their cars out of sight. I don't think it's "blaming the victim" if I suggest that #3 is actually the most effective measure. Book publishing is a business, and book publishers should focus on anti-piracy measures that yield the best returns.

    I'm not sure how I might have analyzed this article's data in terms of "people's ever-present desire to get things without paying for them". "the growth of immorality stopped in 2010"? "The last remaining bad person figured out how to avoid paying for ebooks in March"? "Tea-party movement shames people away from stealing ebooks"? "Pirates stop using Google, start using Bing"? "Google Trends losing handle on ever increasing ebook piracy?"

    Also, the Google Trend for "library catalog" clearly shows the reinvigoration of consumer interest in library catalogs caused by LibraryThing for Libraries!

  7. I think we agree more than we disagree on Google Trends. But I don't think it's just about the inner workings. It's also about the fact that it's at least one step away from what you actually want to know. You want to know how much people are doing something so you look for evidence of the intent to do it. You can't look for the intent, so you look at some shadow that intent casts--in this case the possibility that their intent manifests in a Google search of a particular shape.

    The data is almost epidemiological in its ability to be interpreted in different ways. Are people getting autism more, or is it being reported more? Does the fact people are searching for "ebook" in conjunction with certain words less mean they are searching for pirated ebooks less, or does it mean that, as the opportunities grow and the novelty wears off, people find the word "ebook" is a lame way to search for what they really want? What do you make of random, weird swings that seem correlated to nothing? Well, I don't think the data can bear much interpretive weight.

    Your final paragraphs speak to a desire to avoid moral questions and focus on the business aspect. But your language is replete with the normative ethics of the anti-copyright movement, and its assumptions about what's effective and what's not. You don't suggest that publishers slash prices, increase availability and discontinue DRM to stop piracy, you merely assume this is the best course of action. The data simply doesn't speak to the issue.

    I might be more sympathetic if this argument didn't have a long and chequered history. Years ago, frankly, I bought it. But arguments get tested, and this one has failed. The music industry was told for years that people would buy their music if it was available online. They were told to let people put it on YouTube so people wouldn't have to pirate it to find out if they like it. They were told to remove DRM. They were told to stop suing people. They've done it all and nothing's worked. Legal downloading is a tiny fraction of illegal downloading, and the music industry survives off old people who don't have the habit, and less pirateable things like concerts, t-shirts and licensing songs out to TV and movies. Books are fortunate in having a lot of old people in their customer base. But they don't have the same potential for secondary revenue.

    The key question on publishers' minds should be this: How do we avoid becoming the music industry? One answer is that they should roll over and play dead even faster than the music industry. Another would be to resist it more effectively than the music industry.

    That resistance has, I think, started, with authors increasingly speaking out against book piracy. I think it's having an effect. Authors are sympathetic characters in a way that big rock stars and label executives just aren't. Another strategy has been to price things with an eye to making money, not to satisfy the public's wrongheaded notion that books cost what they cost because paper and glue are expensive substances. There's more they could do. And if there isn't, well, maybe they should wring out every dollar they can for as long as they can. Dying industries are often profitable ones.

    Anyway, thanks for the reply.

  8. Tim, On behalf of epidemiologists everywhere, I take offense at your use of the word "epidemiological" as a pejorative. It's simply not true that epidemiologists interpret their data any way they feel like.

    As a scientist myself, I believe in causality. If I make a measurement, there's a reason somewhere for the data that results from the measurement. In the present case, I've given an explanation for the data that I think is plausible; in his comment, nonnihil has suggested a way to test this explanation with some further analysis.

    There's an interpretation of quantum mechanics that says that the only reality is what you measure; the wave function is just a mathematical construct that works to explain the results of the measurement, and anything you can't measure doesn't matter, so you shouldn't worry if it exists or not. Piracy of ebooks certainly exists, but there's absolutely no evidence that iPad, Kindle, Kobo and Nook have resulted in an increase in piracy. Meanwhile, eBook revenues are going through the roof.

    The hypothesis that older people are to blame for the surge of non-pirated ebook activity is a testable one, even in our shadowy Google Trends laboratory. We could correlate piracy interest with the demographic profiles for given books, for example. It would not surprise me if Napster created a "lost generation" that will never pay for digital content, but until you look at the data, you'll never know.

    I'm not sure what "the normative ethics of the anti-copyright movement" are; my guess is that I don't subscribe to them, assuming that they are based on "information wants to be free".

    Were you aware that in the past year, the music industry had the exact same revenue, adjusted for inflation, that it had in 1983? There's a lot more going on in the music business than piracy; it's easy to think you know more than you really do about it.