Thursday, June 25, 2009

The Bilbo Baggins of the Semantic Web

It's not at every conference that you encounter a Tolkien character. So when it happened to me at last week's Semantic Technology Conference, I knew that the conference was something special.

On Wednesday, at the end of the plenary, I noticed that the guy sitting in front of me had circled an abstract that had intrigued me. When I had read the abstract in the morning, it occurred to me that the talk could be really good or really bad. I asked if he knew what it was about; he said "not really", and we agreed that it might be prudent to sit in the back in case it turned out to be dreadfully dull. Here is the abstract:
The "A-tree" - A Conceptual Bridge between Conventional Databases and Semantic Technology
Harry Ellis
Babel-Ease Ltd.

In this talk Harry will introduce and demonstrate an embryonic data storage and distribution product that has the potential to form a universal bridge between semantic applications and all types of stored data wherever located. The key innovation is that all types of both application and metadata are expressed as independent assertions (enhanced RDF triples) where the subject is regarded as parent within a single tree structure (A-tree). Lineage through this parental path carries all inheritance and other mandatory relationships as determined by the predicate which is also defined within the same tree.

After coffee, I made my way to the appointed room, passing a woman who was studying the same abstract on the conference schedule board. I told her it looked interesting to me, and she thought my back-row plan sounded sensible. Which plan soon revealed itself to be flawed, as we found that the room was almost standing-room only. I squeezed myself into one of the last remaining seats, and resigned myself to possibly being both bored and squashed for the duration.

I was not bored. Harry Ellis turned out to be a semi-retired veteran of the entity modeling wars. He worked for the British Army for 12 years in the field of "battlespace information management" and developed a language used for the semantic modeling of "dynamic information across a complex enterprise". I seriously do NOT want to know what that's a euphemism for, but I assume it has something to do with the forges of Mt. Doom. Harry has been working 10 hours a week on this "self-funded research" from his home, "Little Twitchen" in Hobbiton Devon for the last 5 years.

Harry's talk oozed soundness. I found myself agreeing with just about everything that he said, and based on the audience reaction, I was not alone. You can go visit his website (don't neglect the sitemap) and get a pretty good feel for what he's doing, but the website's not been updated in a while, and he says a new website, with software implementing his vision, will be available soon. An extra session was scheduled at SemTech 2009 for Harry to show his creation in action, but alas, I was not able to attend.

Harry has looked at the existing semantic web infrastructure and has found a number of problems. They are (my summary, don't blame Harry):
  • a profusion of ontologies that don't talk to each other
  • difficulty resolving an entity when there may be many duplicates
  • difficulty in handling information that changes rapidly
  • difficulty in tracking the provenance of information
His solutions to these problems are very well thought out.
  • he suggests that all semantic web entity classes should build on a single global class, just as all Java Objects inherit from java.lang.Object, and that there should be clearly defined mechanisms based on properties to derive new classes from previously defined classes. This is what he calls the "A-tree". (When you hear about making entities from trees, I challenge you not to think of Treebeard.)
  • he suggests that the units of information should be self-contained "assertions" which include provenance and context information, rather than RDF triples or graphs. They should be "indivisible and immutable versions which are semantically complete and have their own provenance".
  • he proposes a publish and subscribe mechanism to make sure that current information is distributed to agents that need it.
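
To make the three ideas above a little more concrete, here is a toy sketch in Python. It is my own illustration, not Harry's software or data model: the names `Assertion`, `ATree`, `derive`, `lineage`, the root class "Thing" and the "derives" predicate are all assumptions invented for the example. It shows a single-rooted class tree, immutable assertions that carry their own provenance, and a bare-bones publish and subscribe hook.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)  # frozen: an assertion cannot be changed once made
class Assertion:
    """A self-contained statement with provenance (hypothetical shape)."""
    subject: str           # treated as the parent in the tree
    predicate: str
    obj: str
    source: str            # provenance: who made the assertion
    asserted_at: datetime  # provenance: when it was made


class ATree:
    """Toy single-rooted tree; every class descends from one root."""

    ROOT = "Thing"  # assumed name for the single global class

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {self.ROOT: None}
        self._log: List[Assertion] = []
        self._subscribers: Dict[str, List[Callable[[Assertion], None]]] = {}

    def derive(self, name: str, parent: str, source: str) -> Assertion:
        """Derive a new class from an existing one and publish that fact."""
        if parent not in self._parent:
            raise KeyError(f"unknown parent class: {parent}")
        if name in self._parent:
            raise ValueError(f"class already defined: {name}")
        self._parent[name] = parent
        now = datetime.now(timezone.utc)
        return self.publish(Assertion(parent, "derives", name, source, now))

    def lineage(self, name: str) -> List[str]:
        """Walk the parental path from a class up to the root."""
        path: List[str] = []
        node: Optional[str] = name
        while node is not None:
            path.append(node)
            node = self._parent[node]
        return path

    def subscribe(self, subject: str, callback: Callable[[Assertion], None]) -> None:
        """Register interest in new assertions about a subject."""
        self._subscribers.setdefault(subject, []).append(callback)

    def publish(self, assertion: Assertion) -> Assertion:
        """Record an assertion and push it to that subject's subscribers."""
        self._log.append(assertion)
        for callback in self._subscribers.get(assertion.subject, []):
            callback(assertion)
        return assertion
```

Deriving "Person" from "Thing" and then "Author" from "Person" makes `lineage("Author")` walk back through "Person" to the root, and anyone subscribed to "Person" hears about the new subclass as soon as it is asserted. A real system would obviously need much more (versioning, context, distribution across machines), none of which is attempted here.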

The aspect of the A-tree proposal that will be received with the most skepticism will be its one-ontology-to-rule-them-all orientation. People working on the Semantic Web are fond of their ontologies, and the thought of possibly needing to revise them all to join with the A-tree is hard to swallow. I don't know for sure whether that is in fact the case, but I can imagine ways that that necessity might be avoided.

What strikes me is that everything about the A-tree proposal seems so well worked out and sensible. I was also impressed at how Harry has tried to imagine an ecosystem in which the creation of ontologies could be accessible outside the priesthood of ontologists. As I've argued here, I believe that the Semantic Web must be defined as the establishment of a social construct and practice, and that the tools for participation in the Semantic Web need to be accessible to a wide range of users.

My guess is that Harry's work will be most useful to organizations that want to participate in the Semantic Web but are serious about ensuring that the information they receive and supply is always current and can be traced back to its source, whether to judge its reliability, to build website traffic, or just to satisfy lawyers that disclaimers of liability can be reliably attached to the "grains of information" that they hope will find fertile soil to grow new understanding. If I were working to help the New York Times enter the data cloud, this is the sort of infrastructure I would want.

Now if only I can figure out which conferences Gandalf attends...


