Tuesday, June 30, 2009

Open Database License is Released and My Brow is Sweating

Engineers and technologists generally resent needing to know anything about the law, because most often the lawyers are telling them they can't do something for some inane reason. For their part, many lawyers are surprisingly interested in technical matters, but even the most technically informed lawyers resent having to acknowledge that technology often trumps the law into irrelevance.

Today's announcement by the Open Knowledge Foundation of the release of version 1.0 of the Open Database License (ODbL) will create resentment in both professions: information technologists will need to understand some of its complications, and lawyers will need to understand some technological limits of the license. In this post, I will try to articulate what some of the hard bits are.

The goal of the ODbL is to provide a means by which databases can be made widely available on a share-alike basis. Suppose, for example, that you spent a lot of time assembling a cooperative of volunteers to compile a database of conferences and their hashtags. If you then made it available under the ODC Public Domain Dedication and License (PDDL), a commercial company could copy the database and begin competing with your cooperative without being obliged to contribute its additions and corrections to your effort. Under a share-alike arrangement, the company would be obligated to make its derivative work available under the same terms as the original work. So-called "copyleft" licenses with share-alike provisions have proven to be very useful in software as the legal basis for Open Source development projects.

The difficulty with applying copyleft licenses to databases is that the open source licenses that implement them are fundamentally rooted in copyright, which cannot easily be applied to databases; hence the need for the work of the Open Knowledge Foundation. Usually, databases are collections of facts, and you can't use copyright to protect facts. However, it gets more complicated than that. In the US, it's also not possible to copyright collections of facts, whereas in Europe such collections can in fact be copyrighted under the "Sweat of the Brow" doctrine.

So in Europe, copyright protection (and thus licenses including GPL and Creative Commons) can be asserted on entire databases, but that protection is invalid in the United States. What the ODbL does to address that issue is to invoke contract law to paper over the gaps created by international non-uniformity of copyright for databases. The catch is that contract law, and thus the share-alike provisions it carries in the ODbL, can only be enforced if the licensee agrees to the license. That's the thing that causes such user-experience monstrosities as click-through licenses and the like. So pay attention, engineers and technologists (Linked Data people, I'm talking to you!): if the provisions of the ODbL are what you want, you'll also need to implement some sort of equivalent to the click-through license. If you expect to involve machines in the distribution of data, you'll need to figure out how to ensure that a human is somewhere in the chain so they can consent to a license, or at least you'll need to socialize the expectation of a license.
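To make the "human somewhere in the chain" idea concrete, here is a minimal Python sketch of a consent gate for machine distribution. Everything in it is my own illustration, not anything the ODbL prescribes: the names ConsentLog, fetch_database, and LICENSE_ID are hypothetical, and whether a logged acceptance like this would actually satisfy contract law is a question for the lawyers.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Label used only for this illustration; not an official identifier.
LICENSE_ID = "ODbL-1.0"


@dataclass
class ConsentLog:
    """Remembers which person agreed to which license, and when."""
    accepted: dict = field(default_factory=dict)

    def accept(self, person: str, license_id: str) -> None:
        # Called once, when a human clicks through (or otherwise agrees).
        self.accepted[(person, license_id)] = datetime.now(timezone.utc)

    def has_accepted(self, person: str, license_id: str) -> bool:
        return (person, license_id) in self.accepted


def fetch_database(person: str, log: ConsentLog, records: list) -> list:
    """Hand over the data only if this person has agreed to the license."""
    if not log.has_accepted(person, LICENSE_ID):
        raise PermissionError(f"{person} has not agreed to {LICENSE_ID}")
    return list(records)
```

The point of the sketch is that consent is recorded once per person and per license, and automated clients acting on that person's behalf are checked against the log rather than asked to click anything.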

Pay attention too, you legal eagles. Be aware that mechanisms of the click-through ilk can be prohibitively expensive if implemented without thought about the full system design. The most valuable databases may have hundreds of millions of records and can be sliced and diced all sorts of ways, so you want to avoid doing much on a record-by-record basis. Also be aware that databases can be fluid. Legitimate uses will mix and link multiple databases together, and the interlinks will be a fusion that should not be judged a derived work of either of the source databases. Records will get sent all over and recombined without anyone being able to tell that they came from a database covered by ODbL.
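One way to avoid record-by-record work is to state the license once per export rather than once per record. The following Python sketch assumes a JSON envelope of my own invention (the field names "license", "source", "count", and "records" are hypothetical, as are the function names); it is not a format defined by the ODbL.

```python
import json


def export_with_license(records: list, license_id: str, source: str) -> str:
    """Wrap a whole batch of records in one envelope that carries the license notice."""
    envelope = {
        "license": license_id,   # stated once for the entire batch
        "source": source,        # where the batch came from
        "count": len(records),
        "records": records,      # individual records carry no license fields
    }
    return json.dumps(envelope)


def read_license(payload: str) -> str:
    """Recover the license notice from an exported batch."""
    return json.loads(payload)["license"]
```

The cost of the notice is then constant per export, no matter how many records are in it. The sketch does nothing about the fluidity problem: once records are pulled out of the envelope and recombined, the notice no longer travels with them.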

Any organization considering the use of ODbL should study the criticisms of ODbL posted by the Science Commons people. My own view is that there are lots of different types of databases with different characteristics of size, application, and maintenance effort. ODbL provides an important new option for those situations where neither PDDL nor a conventional proprietary license will maximize benefits to the stakeholders in the database. But most of all, technologists and engineers need to consider the requirements for successful open licensing early on in the development of database distribution infrastructure.

More on Linked Data Business models to come.