Friday, February 19, 2010

Notes from the Google Books Fairness Hearing

The Fairness Hearing was even more interesting than I expected; every time a speaker started droning on about something we'd all heard ten times before, Judge Chin would interrupt with a snippy or pointed comment. Judge Chin definitely runs a no-nonsense courtroom.

ResourceShelf has a nice roundup of the news reporting from the fairness hearing; the best summaries are from Norman Oder at Library Journal: Part One and Part Two.

Here are some of my observations.

How Many Books?

In Dan Clancy's declaration (PDF, 149 KB) in support of the settlement, there are some interesting numbers (which actually come from Google's Jon Orwant).
  • Google pays approximately $2.5 million per year to license metadata from 21 commercial databases of information about books.
  • Google has gathered 3.27 billion records about Books, and analyzed them to identify more than 174 million unique works.
These numbers seemed to cause a great deal of confusion at the hearing. Several speakers opposed to the settlement combined the 174 million figure with the information from the Declaration of Tiffaney Allen, Settlement Administrator for Rust Consulting (PDF, 2.1 MB), which states:
As of February 8, 2010, Rust Consulting has received 1,846 completed hard copy claim forms, and 42,604 claim forms were completed using the settlement website. The total number of Books claimed by those 44,450 claimants is 1,125,339. [...]

Of the 1,107,620 Books claimed online, 619,531 are classified as out-of-print (not Commercially Available) and 488,089 are classified as in-print (Commercially Available).
Some objectors subtracted 1 million claimed books from 174 million unique works to get the eye-opening number of 173 million unclaimed works supposedly being exploited by Google. This is silly math, and the use of silly math is a good indicator of speakers not doing their homework.
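
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The figures are the ones quoted above from the two declarations; the variable names are mine, and rounding Google's count to an even 174 million is an assumption made for illustration.

```python
# Figures quoted from the Rust Consulting declaration (as of February 8, 2010).
hard_copy_forms = 1_846
online_forms = 42_604
total_claimants = 44_450
total_books_claimed = 1_125_339
online_books_claimed = 1_107_620
online_out_of_print = 619_531
online_in_print = 488_089

# Figure from the Clancy declaration, rounded to 174 million (assumption).
# It counts records Google has identified, not books covered by the settlement.
google_unique_records = 174_000_000

# The declaration's own totals are internally consistent.
assert hard_copy_forms + online_forms == total_claimants
assert online_out_of_print + online_in_print == online_books_claimed

# Books claimed on hard copy forms are whatever was not claimed online.
hard_copy_books_claimed = total_books_claimed - online_books_claimed

# The objectors' subtraction: it mixes two different kinds of count.
naive_unclaimed = google_unique_records - total_books_claimed

print(f"Books claimed on hard copy forms: {hard_copy_books_claimed:,}")
print(f"Objectors' 'unclaimed' figure:    {naive_unclaimed:,}")
```

The subtraction itself is trivial; the problem is that the two inputs do not measure the same thing, so the result says nothing about how many books in the settlement are unclaimed.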

It's known that one of the bibliographic databases licensed by Google is OCLC's Worldcat; it's probably not a coincidence that Worldcat currently contains 174,618,797 bibliographic records. There's a big difference between a bibliographic record and a book subject to the settlement. Later in the day, Daralyn Durie, an attorney representing Google, tried to clarify what the numbers meant. (updated February 22 with text from the transcript)
  • 174 million is NOT the number of books in the settlement. 
  • Google estimates that there are 42 million different books in US libraries. 
  • 20% of these are in the public domain.
  • About half of those left are written in foreign languages.
  • Of the 42 million, fewer than 10 million works are affected by the settlement in any way. 
  • Of these, about 5 million are out-of-print books implicated by the settlement. 
These numbers are in line with reality. Michael Cairns, a veteran of the book data supply chain business, has published his own estimates of the number of orphan works which more or less square with these numbers.
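
Durie's breakdown can be followed step by step in a short Python sketch. This is my own reading of her numbers, not anything presented at the hearing: I treat "20%" and "about half" as exact fractions, which is an assumption, and the variable names are hypothetical.

```python
# Google's estimate of different books in US libraries, per Durie.
books_in_us_libraries = 42_000_000

# 20% are in the public domain (treated as an exact fraction; assumption).
public_domain = books_in_us_libraries * 20 // 100
in_copyright = books_in_us_libraries - public_domain

# "About half of those left" are in foreign languages (treated as exactly half).
foreign_language = in_copyright // 2
remaining_after_stated_cuts = in_copyright - foreign_language

# Durie's stated ceilings; how the remaining books narrow to these
# figures is not itemized in my notes.
affected_by_settlement_max = 10_000_000
out_of_print_implicated = 5_000_000

print(f"Public domain:               {public_domain:,}")
print(f"In copyright:                {in_copyright:,}")
print(f"Left after language cut:     {remaining_after_stated_cuts:,}")
print(f"Affected (stated ceiling):   {affected_by_settlement_max:,}")
print(f"Out-of-print and implicated: {out_of_print_implicated:,}")
```

The two itemized cuts bring 42 million down to roughly 16.8 million; the "less than 10 million" figure is smaller still, so it evidently reflects further exclusions beyond the two spelled out in the list.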

So what are the other 160 million works? They're duplicates (different editions of the same work), works that aren't books, and works published in countries excluded from the agreement and not registered with the US Copyright Office.

Update, February 20: Jon Orwant was kind enough to send me some clarifications.
The only correction I'd make is that it actually *is* a coincidence that OCLC cites 174M records and we cite 174M books. 

One thing to add to your "silly math" bit is that the 174M number also includes public domain books (hence not part of the settlement), and (this is the part that everyone messes up, and was ambiguous in Dan's declaration) 174M is a count of *manifestations*, not *works*.  Hamlet is one work but hundreds of manifestations.  The actual number of works is closer to 120M, but I haven't checked our most recent analysis.

Phrase of the Day: "Identical Factual Predicate"

It became clear at the hearing that Judge Chin's decision would turn on a determination of whether the settlement and the complaint it is meant to resolve have "identical factual predicates." I'll do my best to explain why.

A significant hurdle that the parties (i.e., Google, the Authors, and the Publishers) have to overcome is that the settlement is truly innovative and forward looking, and seeks to bind absent class members to business models that would not otherwise be allowed under copyright law. In their brief justifying the use of a class action, the parties cite a 1986 Supreme Court decision nicknamed "Firefighters", Local Number 93, Int’l Assoc. of Firefighters v. City of Cleveland. In this case, in which the petitioner tried to overturn a consent decree designed to redress past racial discrimination using ongoing obligations, the Court clarified that a judicial decree may go beyond the bounds of an original complaint.

In their filings, objectors countered with the “identical factual predicate” doctrine. This doctrine arises from a case known as "Super Spuds" in which it was held that a class action settlement could not go beyond the complaint of the original lawsuit. Judge Chin seemed interested in the apparent conflict and even asked Amazon's lawyer, famed copyright attorney David Nimmer, for his views on how to reconcile the precedents.

Nonetheless, attorneys from both sides wanted to argue whether the settlement satisfied the "identical factual predicate" test. Michael Boni, attorney for the Authors Guild, appeared to be digging himself deep into a hole when Judge Chin asked him "Isn't it true that this case started out about snippets?" Boni argued that the case was really about the fears that publishers had about the scanning that Google was doing, and who knew what else? I thought to myself that publishers seem to fear much about the future of their industry, and following Boni's line of reasoning, the settlement could have included air rights because authors and publishers feared that the sky was falling.

Daralyn Durie's subsequent argument went a long way toward recovering the ground lost by Boni. Of all the hot-shot lawyers making arguments at the hearing, Durie was by far the most impressive. She persuasively argued that since the original complaint included Google's distribution of scan files to the libraries that contributed books for scanning, the settlement's provisions for selling access to scan files indeed constituted an identical factual predicate.

Judge Chin's eventual decision will turn on his evaluation of the "factual predicates".

What, Exactly, is Copyright's "Head"?

By the end of the hearing, I was sick and tired of hearing the phrase "turning copyright on its head". Even Bruce Keller, attorney for the Publishers' Association, was eager to use the phrase in its negative form. Have you ever tried repeating a word over and over again, so that its sound becomes grotesquely detached from its meaning? That's my feeling about the copyright-head phrase. It's meant to express that copyright usually means that copying requires the rightsholder's permission, whereas the settlement would allow Google to make copies unless the rightsholder refuses permission.

On repetition, I began to ask myself: What part of copyright is the head? Are there brains in copyright? Is copyright blind? Does copyright have legs? Is there an invisible hand of copyright? When you eviscerate copyright, do copyright intestines spill out onto the floor?

Judge Chin Wants to Fix It

I got the impression that Judge Chin would like to approve a settlement. At least twice he asked objectors how they would "fix" the settlement to remove their objections. He asked EFF's Cindy Cohn how to fix the privacy problems she called attention to, and he sounded unhappy when EPIC's Marc Rotenberg told him that privacy problems with the settlement couldn't be cured. He asked Irene Pakuscher (representing the Federal Republic of Germany) if the settlement could be fixed to satisfy Germany's concerns about treaty compliance and effective representation. He also wanted to explore with more than one questioner Hadrian Katz' suggestion that all problems would go away if the settlement shifted from being opt-out to being opt-in.

State Laws Aren't Relevant

In an article last year, I suggested that Judge Chin might be tempted to use state unclaimed property laws as an alternate way to unravel the Orphan Works mess. Looks like I was wrong: he expressed open skepticism at the argument of Norman Marden, representing the Commonwealth of Pennsylvania, that the settlement should be rejected because of incompatibility with state laws.

Blind People had the Best View

The National Federation of the Blind made sure to have a very visible presence at the hearing to emphasize the benefits of the settlement for the reading disabled. It worked: photographs of blind people made the New York Times.

Spectators for the hearing filled two courtrooms. For the morning, I was in the overflow room, which featured a video screen too small for the room and a distorted sound system. The view of the courtroom was fixed, and omitted any view of Judge Chin. Ironically, the seats closest to the video screen were filled with people who couldn't see it. Let's hope that's not emblematic of the case.

