Extracting numbers with the tools Attributor used is rather involved, and it has taken me a while to examine the available data carefully. After doing this work, I've decided that when Attributor wrote "can be estimated at 1.5-3 million", they left out the word "blindly". As far as I can tell, Attributor is recklessly inflating the magnitude of ebook piracy; using the very same traffic measurement tools, I estimate the true figure to be about 10% of the number they claim.
The Attributor numbers come from data generated by Google's AdWords service. AdWords is designed to help advertisers select advertising keywords and to manage budgets. For example, AdWords will tell you that the keyword "PDF" is used in approximately 101 million searches per month, worldwide, or 3.32 million searches per day. "PDF" is a keyword that a searcher might use in the course of a search for a pirated ebook, so you could reasonably assume that some percentage of these searches involve a consumer looking for a book they can avoid paying for. The trouble with this assumption is that most searches that include "PDF" have nothing to do with ebooks.
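As a quick check on the monthly-to-daily conversion quoted above, here is a minimal sketch. It assumes an average month of 365/12 days, which is my assumption and not something AdWords documents; the function and variable names are illustrative only.

```python
# Assumption: a "month" is an average calendar month of 365/12 days.
DAYS_PER_MONTH = 365 / 12


def searches_per_day(monthly_searches: float) -> float:
    """Convert a monthly search volume into an average daily volume."""
    return monthly_searches / DAYS_PER_MONTH


# Monthly worldwide volume for the keyword "PDF", as quoted in the text.
pdf_monthly = 101_000_000
pdf_daily = searches_per_day(pdf_monthly)

print(f"{pdf_daily / 1e6:.2f} million searches per day")  # 3.32 million searches per day
```

Under that assumption the result matches the 3.32 million per day figure given above.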
Another AdWords tool designed to assist Google advertisers is the keyword suggestion tool. In practice, you use this tool to refine keywords. Here is a table of the top refined searches for "PDF"; every one of them is about file conversion, not ebooks:
|Keyword|Percent of "pdf" searches|
|---|---|
|doc to pdf|6.03%|
|pdf to swf|3.30%|
|pdf to xls|2.70%|
|pdf to word|2.21%|
|pdf to rtf|2.21%|
Rapidshare is a "file-locker" site, so its name might be expected to appear in search terms for illegally distributed files. A review of AdWords' suggested refinements for the term "rapidshare", however, reveals that searcher interest in ebooks is negligible compared with interest in movies, TV, music and games. Of 743 suggested keywords, only one, accounting for 0.24% of "rapidshare" queries, or about 4,000 per day, is clearly related to ebooks:
|Keyword|Percent of "rapidshare" searches|
|---|---|
|download from rapidshare|6.03%|
|rapidshare free download|2.70%|
|free rapidshare downloader|2.70%|
|free rapidshare download|2.70%|
|rapidshare download free|2.70%|
|free download rapidshare|2.70%|
|rapidshare free downloader|2.70%|
|download rapidshare free|2.70%|
|free rapidshare downloads|2.70%|
|download free rapidshare|2.70%|
|search on rapidshare|1.80%|
|rapidshare windows 7|1.21%|
|windows 7 rapidshare|0.81%|
|rapidshare file download|0.44%|
|download rapidshare files|0.36%|
|rapidshare files download|0.36%|
|rapidshare windows xp|0.36%|
|rapidshare premium accounts|0.30%|
|xbox 360 rapidshare|0.30%|
|rapidshare premium account|0.24%|
|premium account rapidshare|0.24%|
|rapidshare account premium|0.24%|
|premium rapidshare account|0.24%|
|rapidshare engine search|0.24%|
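The percentages in these tables can be turned back into daily volumes. As a rough sketch, the figures quoted for the one ebook-related keyword (0.24% of "rapidshare" queries, about 4,000 per day) imply a total daily volume for "rapidshare". That total is back-calculated here, not a number reported by AdWords, and because "about 4,000" is itself rounded the result is only approximate. The function name is hypothetical.

```python
def implied_total(daily_for_keyword: float, share_percent: float) -> float:
    """Back out the total daily volume from one keyword's volume and its share."""
    return daily_for_keyword / (share_percent / 100)


# Figures quoted in the text for the one clearly ebook-related refinement.
ebook_daily = 4_000         # "about 4,000 per day" (rounded in the source)
ebook_share_percent = 0.24  # share of all "rapidshare" queries

# Assumption: both figures refer to the same worldwide daily base.
rapidshare_daily = implied_total(ebook_daily, ebook_share_percent)

print(f"Implied 'rapidshare' searches per day: about {rapidshare_daily:,.0f}")
```

Given the rounding in the inputs, this is best read as "roughly 1.7 million searches per day", of which the ebook-related share is a tiny sliver.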
Although direct interest in ebook torrents is so small that AdWords can barely measure it (~1500 searches per day), torrent search sites can give us another way to estimate the magnitude of interest in pirated ebooks. According to "KickassTorrents", the torrents active recently had this composition:
All in all, I estimate that about 210,000 of the searches made on Google per day represent possible interest in pirated ebooks. About 30,000 of these come from the US. The "real" number for all countries could be as high as 300,000 or as low as 100,000. The figure of 1.5-3 million reported by Attributor is not within the range of plausibility.
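To make the comparison with Attributor's claim concrete, here is a small sketch that expresses the estimates above as percentages of the claimed range. It uses only numbers quoted in this post and assumes both sets of figures count daily searches; the variable names are mine.

```python
def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Daily-search estimates from this post.
estimate = 210_000
estimate_low, estimate_high = 100_000, 300_000
us_estimate = 30_000

# Attributor's claimed range (assumed here to also be per day).
attributor_low, attributor_high = 1_500_000, 3_000_000

# Central estimate against both ends of the claimed range.
print(f"vs. 1.5 million: {pct(estimate, attributor_low):.0f}%")   # 14%
print(f"vs. 3 million:   {pct(estimate, attributor_high):.0f}%")  # 7%

# Even the highest estimate against the lowest claim falls well short.
best_case = pct(estimate_high, attributor_low)
print(f"highest estimate vs. lowest claim: {best_case:.0f}%")     # 20%

# Share of the estimated searches that come from the US.
us_share = pct(us_estimate, estimate)
print(f"US share of estimate: {us_share:.0f}%")                   # 14%
```

The central estimate lands between 7% and 14% of the claimed range, which is where the "about 10%" figure comes from.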
One difficulty with using Google AdWords to gain insight into piracy is that it measures only a "shadow cast by piracy", as a commenter on my previous post put it. Nonetheless, AdWords sheds considerable light on patterns of demand. For example, the tools show clearly that it's common for people to search for movies and TV shows and acquire them extralegally. They also indicate that most of the demand for pirated ebooks, about 82%, comes from outside the US, UK and Canada. Publishers should plan antipiracy strategies accordingly, based on data that can be confirmed independently.
Update: I have a followup post.