Tuesday, October 13, 2009

The Revolution Will Be Digitized (By Cheap Book Scanners)

It's always a good sign when you meet a literary character at a conference. Last June, I wrote about meeting a Bilbo Baggins at the Semantic Technology Conference; on Friday I met a character out of a Neal Stephenson novel at D is for Digitize.

D is for Digitize was a small conference organized by James Grimmelmann of NYU Law School. It brought together legal luminaries with people from publishing, business, academia, advocacy, technology, and the press. It had been organized to coincide with the scheduled Fairness Hearing for the Google Book Search Settlement. As it turned out, the Fairness Hearing was postponed, to be replaced by a brief "status conference". The effect of the postponement on the conference was beneficial: with the Google settlement officially on the shelf, the participants were able to have real discussions on the future of book digitization without getting too bogged down in legal argument.

That future was brought into very clear focus by the two digital cameras in Daniel Reetz's do-it-yourself book scanner. Reetz's presentation and demonstration blew away everyone in the room. Like Stephenson's Waterhouse characters in Cryptonomicon and the Baroque Cycle, Reetz is a tinkerer and a liberator of information. He spent some time in Russia and became accustomed to the conveniences of digital books in a society that doesn't pay much attention to copyright laws. On his return home to North Dakota, he was shocked at the high price of textbooks and the low price of digital cameras. He resolved to build himself a book scanner and went dumpster diving for materials, then posted instructions online for how to make the scanner.

In May, he was awarded the Grand Prize (a laser cutter) in the Epilog Challenge, a competition sponsored by the manufacturer of a laser cutter to promote "open design" manufacturing. The laser cutter has enabled Reetz to refine his scanner design to use precision-cut plywood. His first third-generation scanner, which folds up neatly for portability, was finished just in time for him to bring it to the conference. (He had fun getting it through airport security!)

Compared to robotic scanners such as the one manufactured by Kirtas, the DIY Book Scanner is strikingly simple. It is built with rubber bands, drawer sliders, white LEDs and two commercial off-the-shelf digital cameras. Some Russian friends of Reetz's have figured out how to hook into the cameras' firmware so that scan acquisition can be triggered by pressing a single button. Open source software is used to do image management and post-processing. An operator turns the pages, and average throughput is about a thousand pages per hour. The total cost of the scanner parts is under $300, including cameras. Reetz has posted more pictures of his new scanner here.

Reetz is not the only one building cheap scanners based on his design. A small but vital community is growing around the open-source design. Although book publishers might unthinkingly assume that this group is primarily interested in book piracy, they would be wrong. Several people just want to read books they've purchased in print on their iPhones or Kindles. An engineering student in Arizona is reading-disabled and must digitize his textbooks to be able to read them. One Indonesian man built a scanner with donated cameras because his town's property records had been damaged in a flood. More than one book aficionado has turned to scanning in response to a too-many-books spousal ultimatum.

For other perspectives on Reetz's presentation, see Harry Lewis' post at Blown to Bits and Robin Sloan's post at The Millions.

In my article on the impact of the Americans with Disabilities Act on selling non-accessible books, I speculated that as the cost of digitizing books drops, society's expectations for the bookselling industry would change. Now that I've come face-to-face with a cheap book digitizer, I realize that much will be transformed. For example, let's assume that an effective book digitizer can be built and deployed for $500. (Even if DIY turns out not to be the way this happens, commercial manufacturers such as ATIZ are likely to be able to meet similar price points.) Then the cost of putting a book scanner in 20,000 libraries would be $10,000,000. If these libraries digitized an average of even one book per day, they could digitize 10,000,000 books in two years. Since 10 books per day should be well within the capabilities of an inexpensive digitizer, the libraries should have no technical difficulties with digitizing 4 million books per month.
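For anyone who wants to check these back-of-the-envelope numbers, here is a rough sketch in Python. The scanner price, the number of libraries, the 10,000,000-book target and the thousand-pages-per-hour throughput come from the figures above. The working-day counts (250 per year, 20 per month) and the 300-page average book are assumptions added here for illustration; they are simply the values under which the "two years" and "4 million books per month" figures work out, and the function names are made up for this example.

```python
# Figures taken from the post
SCANNER_COST_USD = 500
LIBRARIES = 20_000
TARGET_BOOKS = 10_000_000
PAGES_PER_HOUR = 1_000  # operator throughput quoted for the DIY scanner

# Assumptions of this sketch (not stated in the post)
WORKING_DAYS_PER_YEAR = 250
WORKING_DAYS_PER_MONTH = 20
PAGES_PER_BOOK = 300


def deployment_cost(libraries, unit_cost):
    """Total cost in dollars of one scanner per library."""
    return libraries * unit_cost


def years_to_digitize(target_books, libraries, books_per_day):
    """Years needed to reach target_books, counting working days only."""
    books_per_year = libraries * books_per_day * WORKING_DAYS_PER_YEAR
    return target_books / books_per_year


def books_per_month(libraries, books_per_day):
    """Books digitized per month across all libraries."""
    return libraries * books_per_day * WORKING_DAYS_PER_MONTH


def scanning_hours_per_day(books_per_day):
    """Operator hours per day at one library, under the assumed book length."""
    return books_per_day * PAGES_PER_BOOK / PAGES_PER_HOUR


if __name__ == "__main__":
    print("Deployment cost ($):", deployment_cost(LIBRARIES, SCANNER_COST_USD))
    print("Years at 1 book/day:", years_to_digitize(TARGET_BOOKS, LIBRARIES, 1))
    print("Books/month at 10 books/day:", books_per_month(LIBRARIES, 10))
    print("Operator hours/day at 10 books/day:", scanning_hours_per_day(10))
```

Under these assumptions, ten books a day would take an operator roughly three hours at each library. Counting calendar days instead of working days would only make the totals larger.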

If libraries acquired the capability of digitizing millions of books per month, then Google's erstwhile monopoly on digitized out-of-print books could evaporate quickly in an appropriate legal environment. Rightsholders who have been angry at Google for working with libraries on digitization should think ahead to a future in which their works can be ripped, mixed, and burned by cheap book digitizers in millions of homes and offices. The world will be different.

In Stephenson's Cryptonomicon, Randy Waterhouse develops a data haven in a Pacific island country to evade crude laws governing cryptography. I hope that Daniel Reetz doesn't have to retreat to a digitization haven country to be able to bring the sensible benefits of book digitization to people who need them.


1 comment:

  1. Thanks for the details, Eric. These kinds of inexpensive hardware implementations eliminate a big barrier to mass-digitization. The last major expense would then seem to be the labor to drive the machine(s) at each library. Was there any discussion of the creation of open source software to drive the workflow of scans into usable digital artifacts? (Taking necessary diversions into good metadata capture and OCR correction, I imagine...)