Monday, November 18, 2013

Google Books and Black-Box Copyright Jurisprudence

Last week, eight years after the first lawsuit was filed to stop the Google Books Project, Judge Denny Chin finally ruled on the core merits of the case. The decision is being widely hailed on one side as a "tremendous victory for fair use" and on the other side as a "fundamental challenge to copyright". But these are short-term perspectives. I think that the long-term impact of the decision may turn on the acceptance of Chin's approach to technology's transformation of copyright, which I would characterize as Black-Box Jurisprudence.

In my view, the core holdings about fair use were never in much doubt. The argument saying that indexing or lexical analysis or data-mining of books always requires the permission of a rights holder was never very defensible, or even seriously argued. A holding that display of snippets was not fair use would have made scholarly writing in the digital age impossible; a decision the other way on snippets would have been swimming up a judicial stream. But fair use is always a weighing of factors, and the untold story in the Google Books case is about the factors that didn't get weighed.

The reason that Google got sued in the first place was less about "what Google did" than about "how Google did it". Google made huge numbers of copies of books without permission of the rights holders. Judge Chin's ruling said, effectively, that all those copies were incidental to the fair use:
[I]f there is no liability for copyright infringement on the libraries' part, there can be no liability on Google's part.
In the end, it didn't matter how Google did what it did. In Judge Chin's analysis, copyright is concerned only with the ends, not the means. Copyright seems not to be concerned with what happens inside the black box.

Chin is not alone in this approach. His opinion follows Judge Baer's ruling in the HathiTrust case, which featured a ringing endorsement of the Library's fair use:
I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants' [Mass Digitization Project] and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts that at the same time effectuates the ideals espoused by the [Americans with Disabilities Act].
But for me, the surprise in Baer's opinion was his transformation of the Arriba Soft case into a broad license for infringement. In that case, display of thumbnail images by a search engine was held to be fair use, and the copying of the images in the course of producing thumbnails was held to be necessary for the protected use. Judge Baer wrote that the fact that the images were on websites available for anyone anywhere to download was not relevant to the analysis, which he then applied to Google's scanning and OCR of physical books:
Although Plaintiffs assert that the decisions in Perfect 10 and Arriba Soft are distinguishable because in those cases the works were already available on the internet, Aug. 6, 2012 Tr. 19:2–4, I fail to see why that is a difference that makes a difference. As with Plaintiffs’ attempt to bar the availability of fair use as a defense at all, this argument relies heavily on the incorrect assumption that the scale of Defendants’ copying automatically renders it unlawful.
Baer thus equates Google's million-dollar scanning operation with Arriba Soft's one line of code, because both sit inside a fair-use black box.

The Black Box approach to copyright can cut both ways. In his dissenting opinion in the Aereo case, Chin wrote that it didn't matter that Aereo had engineered a way to use completely legal technical means to stream television signals over the internet:
In my view, by transmitting (or retransmitting) copyrighted programming to the public without authorization, Aereo is engaging in copyright infringement in clear violation of the Copyright Act. [...] The system employs thousands of individual dime-sized antennas, but there is no technologically sound reason to use a multitude of tiny individual antennas rather than one central antenna; indeed, the system is a Rube Goldberg-like contrivance, over-engineered in an attempt to avoid the reach of the Copyright Act and to take advantage of a perceived loophole in the law.
In the Aereo case, Chin argued that since the end result of Aereo's engineering was a system with copyright-infringing intent, the under-the-hood details of Aereo's system were not compelling. (Read James Grimmelmann for more on this case and copyright arbitrage in general.)

So when presented with cases where copyright law and technology collide, Chin has more or less adopted a consistent approach that isn't inherently pro-copyright or pro-fair-use.

If Chin's ruling had focused on the infringing means (i.e. massive copying) rather than on the fair-use ends in the Google Books case, Google could have gone back to the drawing board to devise a non-infringing means to accomplish the same ends. It would have been more expensive (à la Aereo), but the plain fact is that ten engineers can run technical circles around a thousand lawyers. In the end, Google would have lost the battle but would be far ahead in the war.

As the case now stands, while Google has a free hand to go back and improve and expand its scanning operations, it is still constrained in what it can deliver. For example, since Chin's decision cites the lack of advertising on snippet result pages in his fair use analysis, Google can't put advertising there without risking another $100 million lawsuit. Another innovator in the space can't go and do things differently without worrying about another judge's fair-use analysis.

The advantage of a black-box legal approach is its practicality. Judges don't have to understand the intricacies of technology in order to decide legal questions. Technical processes are opaque for business reasons, too. But perhaps more importantly, a black-box approach to copyright law means that engineers can't use clever hacks to get around copyright.

The danger of the black box is that it pretends that technology doesn't matter, that code isn't law. Copyright law is rooted in technology, that of the printing press, and turning it into an abstraction that can also govern digital media while ignoring what goes on behind the curtain is a dubious project. A complex enterprise like Google Books is a long journey from inception to delivery. Imagine if highway safety were addressed by regulating total travel times. Does it make sense to regulate a new technology like airplane travel in the same way?

Perhaps there ought to be a fifth factor in fair use analyses of systems more complex than a printing press. In addition to the usual four factors, judges could also weigh whether the steps involved in accomplishing a fair use would stand under their own four-factor analysis. In the Google Books case, the analysis could have incorporated a weighing of the scanning operation by itself. Similarly, Aereo's meticulous adherence to legal means could weigh in favor of a fair-use determination.

My worry is that in other situations, perhaps with technologies we haven't imagined yet, the black-box legal approach will end up with very wrong technical results. And then we'll be stuck, waiting for Congress to fix things. Look at what's happening as digital surveillance collides with crypto-security. There, the courts have uniformly refused to look inside the black box of the NSA, and the results may end up being disastrous.

(Gary Price has a thorough opinion round-up at Infodocket.)

