Monday, November 30, 2009

ZIP vs. PAF: Has Database Copyright Enabled Postcode Data Business?

Have you ever noticed that there is no such field as "Legal Science"? That's because the scientific method is hard to apply to the development of laws. Just imagine applying an experimental law to one population while giving a placebo to a control population. Occasionally, though, a circumstance appears where we can look for the effect of some particular bit of jurisprudence. Today's example is the database copyright. In the UK and other European countries, there is a special type of copyright (lawyers call it sui generis) that applies to databases. In the US, there has been no copyright for databases as such since 1991, even when they are the product of substantial investment.

In the US, databases can only be protected by copyright if they are expressions of human creativity. This is intended to be a fairly low bar. If the selection of data, for example, reflects the judgement of a person, then the database can be protected by copyright. What isn't protected is the mindless labor, or "sweat of the brow" effort, that someone has expended to accumulate the data. The 1991 Supreme Court decision that established this rule, Feist Publications v. Rural Telephone Service, was unanimous, and its opinion was written by Justice Sandra Day O'Connor. In retrospect, the opinion seems prescient, as if the Court had anticipated a day when sweating brows would be banished by scraping computers and global networks of information.

Rob Styles has a post on his blog that got me reading and thinking about these database copyrights. His key point is a suggestion that distributed Linked Data will disrupt database intellectual property rights as profoundly as P2P distribution networks have disrupted the music and entertainment media businesses.

Like all great blog posts, Styles' is at the same time obviously true and obviously wrong; in other words, thought-provoking. First, the obviously true part. When technology makes it trivial to reaggregate data that is readily available in a dispersed state, businesses that rely on exclusive access to the aggregate become untenable. The example discussed by Styles is the Royal Mail's Postcode Address File (PAF). It turns out that in the UK, the Royal Mail has made a modest business of selling access to this file, which lists every address in the country that receives mail together with geographical coordinates. This arrangement has recently been in the news because of Ernest Marples Postcodes Ltd., a small company that attempted to provide free API access to postcode data but was shut down by a threat of legal action from the Royal Mail. Apparently the Royal Mail won't let a website use the postcode data without paying a £3750 license fee. It also offers per-click licenses that cost about 2p per click. To all appearances, the Royal Mail supports a healthy ecosystem of postcode data users: it lists 217 "solutions providers" on its web site.

Styles' point is that the facts contained in the postcode file are in the public domain, and with Semantic Web technology, a large fraction of these facts could be made available as Linked Data without infringing the Royal Mail's copyrights. Once the data has entered the cloud, it would seem impractical for the Royal Mail to further assert its copyright. My posts on copyright salami attempted (unsuccessfully, I think) to construct a similar evasion for books; Rob's suggested postcode copyright evasion is clean because the "slices" are truly in the public domain, rather than simply being fairly used, as in my scenario.

How does the US differ in the availability of postcode data? In the US, the data file that corresponds most closely to the Royal Mail's PAF is the USPS Topological Integrated Geographic Encoding and Reference/ZIP + 4® File (TIGER/ZIP+4). In the US, not only is there no database right, but works of the government are considered to be in the public domain. In general, government agencies are only allowed to charge for things like TIGER/ZIP+4 to cover distribution costs. Thus, it's not so surprising that the USPS doesn't even list a price for the TIGER/ZIP+4 file. I called to ask, and found out that a full dump of the file costs $700. USPS does not offer updates; I was told that the release file is updated "every 2-3 years". The USPS, unlike the Royal Mail, seems uninterested in helping anyone use its data.

Since the USPS doesn't put any license conditions on the file, companies are free to resell it in almost any way they wish, resulting in a wide variety of services. For example, ZipInfo.com will sell you a license to its version of the ZIP+4 file, suitable for use on a website, for $1998, updated quarterly. This is about one third of the price of the similar offering from the Royal Mail. Zip-codes.com has a similar product for $2000, including updates. On the low end, "Zip code guy" says he'll send you a file for free (the data's a bit old) if you link to his map site. On the high end, companies like Maponics provide the data merged with mapping information, analysis and other data sets.
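The "about one third" comparison depends on converting pounds to dollars. Here is a minimal Python sketch of that arithmetic, assuming an exchange rate of 1.65 US dollars per pound (roughly the late-2009 rate; this figure is an assumption, not something from the post). The prices are the ones quoted above. Purely for illustration, it also works out how many clicks at about 2p each would add up to the £3750 flat website license.

```python
# Assumption: approximate late-2009 exchange rate, USD per GBP (not from the post).
USD_PER_GBP = 1.65

ROYAL_MAIL_WEB_LICENSE_GBP = 3750   # Royal Mail website license fee quoted above
ROYAL_MAIL_PER_CLICK_PENCE = 2      # "about 2p per click"
ZIPINFO_WEB_LICENSE_USD = 1998      # ZipInfo.com website license
ZIPCODES_COM_USD = 2000             # Zip-codes.com product, updates included


def price_ratio(us_price_usd, uk_price_gbp, usd_per_gbp=USD_PER_GBP):
    """Ratio of a US price to a UK price, after converting the UK price to USD."""
    return us_price_usd / (uk_price_gbp * usd_per_gbp)


def break_even_clicks(flat_fee_gbp, per_click_pence):
    """Number of clicks at which per-click fees equal the flat license fee.

    Works in whole pence to avoid floating-point rounding.
    """
    return (flat_fee_gbp * 100) // per_click_pence


if __name__ == "__main__":
    print(f"ZipInfo vs Royal Mail:      {price_ratio(ZIPINFO_WEB_LICENSE_USD, ROYAL_MAIL_WEB_LICENSE_GBP):.2f}")
    print(f"Zip-codes.com vs Royal Mail: {price_ratio(ZIPCODES_COM_USD, ROYAL_MAIL_WEB_LICENSE_GBP):.2f}")
    print(f"Break-even clicks: {break_even_clicks(ROYAL_MAIL_WEB_LICENSE_GBP, ROYAL_MAIL_PER_CLICK_PENCE)}")
```

At the assumed rate, both US offerings come out at roughly 0.32 of the Royal Mail price, consistent with "about one third"; a different exchange rate would shift that figure somewhat.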

The purpose of copyright has historically been "for the Encouragement of Learning", according to the Statute of Anne, and "To promote the Progress of Science and useful Arts", according to the US Constitution. The different regimes in the UK and the US now present us with an experiment on the efficacy of database copyrights that has been running for about 18 years. In which country, the UK or the US, have the "useful Arts" surrounding postcode databases flourished more?

After a bit of study, I've concluded that in the case of postcodes, database copyright has so far been more or less irrelevant to the development of the postcode data business. And even though the two governmental organizations have completely different orientations towards providing their data, the end result (what you can easily buy and what it costs) is not all that different between the countries. Although it's been argued that the shutdown of ErnestMarples.com and the higher cost of data in the UK are results of database copyright, there is clearly more at play.

In theory, one way that copyright promotes commerce is by providing a default license to cover standard use of protected material. In fact, very few database providers rely solely on copyright to govern usage terms. In both the US and the UK, the "good" postcode databases are only available with a license agreement attached. These licenses preserve the business models of postcode data merchants; it's not clear that ErnestMarples.com was complying with license agreements even if it wasn't infringing a database copyright.

Since UK database copyrights have no effect in the US, we might imagine setting up a Royal Mail postcode business in the US to exploit the absence of copyright. Would we be able to do something that we couldn't do in the UK? Well, not really. We'd probably still want to get a license from the Royal Mail, because £3750 is not a lot of money. It would cost us more to ask a lawyer whether we'd run into any problems. And at least in theory, the Royal Mail would have the freshest data. This is why I think Styles' post is "obviously wrong": the distributed availability of data won't have a big effect on the core business of the Royal Mail or of any other database business. It would have exactly the same effect as the absence of copyright protection in the US has had on the UK postcode market. In other words, nil.

My main worry about licensing from the Royal Mail would be in the area of allowed uses; I don't really trust an organization with the words "royal" and "mail" in its name to understand and fairly price all the crazy smashed-up uses I might invent. Database copyrights give producers like the Royal Mail the ability to arbitrarily disallow new uses. Since it's hard to prove that any given fact has been obtained free of database copyright, the threat of an infringement lawsuit by the Royal Mail could even stifle non-infringing postcode applications.

What I don't see in the postcode data "experiment" is evidence that database copyright has had any great benefit for "the useful arts" in the UK compared to the US. If that's true, then why bother having a special copyright for databases?

As data lives more and more on the web, and becomes enhanced, entailed, and enmeshed, it makes less and less sense to draw arbitrary lines around blocks of data with copyright of autonomic aggregations. Although we need innovative licensing tools to build sustainable business models for data production, maintenance, and reuse in a global data network, we don't really need the database copyright.


6 comments:

  1. Have you read Boyle's _The Public Domain_ yet? He has examples of European database copyright stifling potential markets, plus some killer humor toward the end on the subject.

  2. Hi,

    You say "the distributed availability of data won't have a big effect on the core business of the Royal Mail or any other database business."

    I agree - the key issue is provenance and freshness of the data, and therefore trust - issues that apply much less to media obtained via P2P networks; that old episode of Fawlty Towers is still funny years after the event.

    What will cause problems for the Royal Mail is not the availability of distributed data, but the *expectation* of the *availability* of data. Initiatives such as data.gov.uk are changing our expectations about availability, and Linked Data and the Semantic Web are changing our expectations about ease of integration and therefore reuse. In this context the Royal Mail position is increasingly untenable.

    Cheers, Tom.

  3. Eric, good thought-provoking post, but I have just a few small clarifications. The database right is distinct from copyright and has different terms (15 years, I think). It's not copyright for databases. Also, all European countries have it, and it's a WIPO accord, so the US is supposed to implement something like it too (see http://www.cptech.org/ip/cpt-dbcom.html). Also, the postcode system is copyrightable in its own right regardless of any database right. The Post Office invented the scheme, and all postcodes are created by them and assigned on request. This is quite similar to the situation with the Dewey Decimal system, which OCLC controls the copyright on.

  4. Under US copyright law, at least, the UK postcode system is NOT copyrightable, exactly because it is a "system". Systems are protectable by patent rights. Dewey Decimal system would be copyrightable on the basis of the human creativity needed to define any subject heading (see the Delta Dental ref in the comments on Rob's post); the numbering system itself can't be copyrighted.

  5. I must have read Chapter 9 of The Public Domain in another consciousness. It presents a much broader version of the same discussion. Here's a sample:

    Even before the directive, most European countries already gave greater protection than the United States to compilations of fact. The directive raised the level still higher. The theory was that this would help build European market share. Of course, the opposite is also possible. Setting intellectual property rights too high can actually stunt innovation. In practice, as the Commission’s report observes, “the ratio of European/U.S. database production, which was nearly 1:2 in 1996, has become 1:3 in 2004.” Europe had started with higher protection and a smaller market.
