Friday, June 12, 2009

Linked Data vs. Google Fusion Tables

Wouldn't it be cool if you had an idea for a collection of data and there was a way you could set up the database and then invite people to contribute data to the collection, visualize the collection, re-use the collection, and link it to other collections of data?

I think it's really cool, and it's also the idea behind Linked Open Data, an initiative of the W3C's Semantic Web Activity. The Linked Data people have amassed an impressive array of datasets available in the RDF/XML format, and by using a foundation of URIs as global identifiers, they've enabled these datasets to be linked together. I've been reading a really good explanation of how to publish linked data. But you know what? I have never actually made any data collections available via Linked Open Data. It seems cool, but I'm not sure I would ever be able to get anyone to help me build a data set using Linked Open Data. Linked Open Data seems designed for machines, and there seems to be very little infrastructure that could help me collaborate with other people with common interests in building data sets.
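To make the "URIs as global identifiers" idea a bit more concrete, here is a minimal sketch in plain Python. It is not how any real Linked Data toolkit works; the example.org URIs and the `describe` helper are made up for illustration. Two separately maintained datasets can be combined with no coordination other than agreeing on the URI for the thing they describe.

```python
# Hypothetical URI prefixes; example.org is a placeholder, not a real dataset.
CONF = "http://example.org/conference/"
VOCAB = "http://example.org/vocab/"

# Each dataset is a set of (subject, predicate, object) triples,
# maintained independently of the other.
hashtags = {
    (CONF + "agile2009", VOCAB + "hashtag", "#agile2009"),
}
venues = {
    (CONF + "agile2009", VOCAB + "city", "Chicago"),
}


def describe(subject, *datasets):
    """Merge the datasets and collect everything said about one subject URI."""
    merged = set().union(*datasets)
    return {pred: obj for subj, pred, obj in merged if subj == subject}


# Because both datasets use the same URI for the conference,
# merging them is just a set union.
info = describe(CONF + "agile2009", hashtags, venues)
```

The linking step is nothing more than a set union. All of the hard work is in getting people to agree on the identifier in the first place.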

This morning, I set up an online database for Twitter conference hashtags, using the data I collected for my last posting about conference hashtags. I used a "pre-alpha" service from Google Labs called "Google Fusion Tables". If you have a Gmail account, you can view the table yourself, export the data, and visualize it in various ways. If you email me, I'll authorize you to add records yourself. It would be nice if I could make the table visible to people without Gmail accounts, but I assume that's what they mean by "pre-alpha".

Pre-alpha is a good description. I found two bugs in a half hour of working with Google Fusion Tables (for example, some cells had problems getting edits saved), but I got e-mail from the developers acknowledging the problems within an hour of reporting them, so I wouldn't be surprised if the bugs get squashed very rapidly. From the Linked Open Data point of view, Fusion Tables is very disappointing, as it doesn't seem to be aware (from the outside) of semantic web technologies.

My experience has been that technology is never the real problem, and that building social practice around the technology is always the key to making a technology successful. The Fusion Tables team appears to have looked at the social practice aspect very carefully. Every row in the database, and every cell in every row can be annotated with a conversation and attached to an authorship. These features seem to me to be fundamental requirements for building a collaborative database, and they're annoyingly hard to do using so-called semantic technologies.

The visualizations available hint at some possibilities for Fusion Tables. For example, records can be visualized on a map using geographical location. It's easy to imagine how visualizations and data-typing could be the carrot that gets data set creators to adopt globally known predicates. An ISBN data type could trigger joins to book-related data, for example. Or perhaps a zoology-oriented dataset could be joined via genus and species to organism-oriented visualizations. Fusion Tables doesn't expose any URI identifiers, but it's hard to say what's going on inside it. In stark contrast, Linked Data sites tend to really hit you in the face with URIs, and it's really hard to explain to people why URIs belong in their databases.
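Here is a rough sketch of what I mean by a data type triggering a join. It says nothing about what Fusion Tables actually does internally; the two tables, their column names, and the `join_on_isbn` helper are invented for illustration. The point is that once a column is known to hold ISBNs, the values can be normalised to ISBN-13, and rows from unrelated tables line up even when their owners typed the numbers differently.

```python
def to_isbn13(raw: str) -> str:
    """Normalise an ISBN-10 or ISBN-13 string to 13 bare digits."""
    digits = raw.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13:
        return digits
    if len(digits) != 10:
        raise ValueError(f"not an ISBN: {raw!r}")
    # ISBN-10 to ISBN-13: prefix 978, drop the old check digit,
    # then recompute the check digit with alternating weights 1 and 3.
    core = "978" + digits[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def join_on_isbn(left, right):
    """Inner-join two lists of row dicts on their normalised 'isbn' column."""
    index = {to_isbn13(row["isbn"]): row for row in right}
    joined = []
    for row in left:
        key = to_isbn13(row["isbn"])
        if key in index:
            joined.append({**index[key], **row, "isbn": key})
    return joined


# Two made-up tables kept by different people, with differently formatted ISBNs.
my_books = [{"isbn": "0-306-40615-2", "shelf": "office"}]
reviews = [{"isbn": "978-0-306-40615-7", "rating": 4}]

joined = join_on_isbn(my_books, reviews)
```

Neither table owner had to know anything about URIs or predicates; declaring "this column is an ISBN" was enough to make the join possible.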

A lot of the posts I've seen on Fusion Tables seem to miss its focus on collaboration, and the types of social practice that it might be able to support. Just as Wikipedia has created a new social practice around encyclopedia development and maintenance, there is a possibility that Fusion Tables may be able to engender a new and powerful social practice around the collaborative maintenance of record-oriented databases. If that happens, companies in the database development business could find themselves going the way of Encyclopedia Britannica and World Book in the not-so-far-off future.


  1. The bugs I reported seem to be fixed. Bravo Google!

  2. Interesting stuff.

    #agile2009 is the tag for the Agile Alliance conference, August 24th to 28th in Chicago.

  3. Eric,

    The LOD cloud is a collection of data sets where each data item is endowed with an HTTP URI. By implication this means that RDF/XML is but one of the many metadata description formats a user agent can request when de-referencing these URIs.

    I am desperately trying to kill the RDF/XML (one representation format) and RDF Model conflation issue :-)

    Re. Fusion Tables, once Google decides to derive and generate HTTP URIs from their Fusion Table GUIDs we will end up with another major contribution to the burgeoning LOD Cloud.

  4. Kingsley,

    Let's hope applications like Fusion Tables develop in the direction of generating URIs for Linked Data. Is it too much to hope that MS Excel would add similar support?