Thursday, May 28, 2009

Part 3: Reification Considered Harmful

About two years ago, we had some landscaping done on our yard. Part of the work was to replace the crumbling walkways. I suggested making some of the walkways curved, to create more esthetic shapes for the garden beds in front of our house. The landscape designer we were working with suggested that I reconsider, because it is well known in the landscape design world that people never, ever, follow a curved path. Studies have been made using hidden cameras showing that people always walk in straight paths, no matter what the landscape design tries to coax them into doing. The usual result of a curved pathway is the creation of a footworn path that makes the curved path straight. As you can see, we took our designer's advice.

This is the third part in a series of posts on reification. In Part 1, I tried to explain what reification is; in my second post I gave some examples of how to do reification in RDFa. In a philosophical interlude on truth on the internet, I made it pretty clear why I think it's really important to include and retain sourcing and provenance information whenever you try to collect information from the internet. For this Part 3, I promised to discuss the pros and cons of reification. I lied. RDF Reification has been nothing but disastrous for the semantic web. The problem is that RDF tries to lead implementers along a strangely curved path if they want to do the "right" thing and keep track of sourcing and provenance of the knowledge loaded into a triple-store. I have a strong suspicion that no one, anywhere, ever in the history of RDF, has made significant use of the reification machinery. I have asked a fair number of semantic web implementers and none of them has ever used reification.
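To make the curvature of that path concrete, here is a minimal sketch of what the RDF reification vocabulary asks for, using plain Python tuples rather than any RDF library. The `ex:` names, the `_:stmt1` blank node label and the `reify` helper are assumptions made up for illustration; only `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` come from the RDF vocabulary itself.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(triple, statement_node):
    """Return the four triples that describe `triple` as an rdf:Statement.

    `reify` is a hypothetical helper, not part of any library.
    """
    s, p, o = triple
    return [
        (statement_node, RDF + "type", RDF + "Statement"),
        (statement_node, RDF + "subject", s),
        (statement_node, RDF + "predicate", p),
        (statement_node, RDF + "object", o),
    ]

# Made-up example data: one claim and where it came from.
claim = ("ex:walkway", "ex:shape", "ex:Straight")

graph = reify(claim, "_:stmt1")
# The provenance hangs off the statement node, not off the claim itself.
graph.append(("_:stmt1", "ex:source", "ex:landscapeDesigner"))

for t in graph:
    print(t)
```

One claim plus one note about its source costs five triples, and the graph still does not contain the claim itself: describing a statement is not the same as asserting it.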

Semantic Web implementers certainly don't ignore the imperatives of sourcing and provenance, but instead of using reification, they make the equivalent of straight, worn dirt paths. Typically they won't use pure triple stores, instead treating triples as first-class data objects that can be joined to separate tables holding provenance information; or else they build knowledge models that make the provenance and source explicit, as do the models for reviews that Google is supporting in RDFa.
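A rough sketch of the first kind of dirt path, assuming a relational store where every asserted triple gets its own row id and provenance lives in a separate table keyed on that id. The table names, column names and helper functions are all invented for illustration and are not taken from any particular triple-store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triples (
    id INTEGER PRIMARY KEY,
    subject TEXT, predicate TEXT, object TEXT
);
CREATE TABLE provenance (
    triple_id INTEGER REFERENCES triples(id),
    source TEXT,
    retrieved TEXT
);
""")

def assert_triple(conn, s, p, o, source, retrieved):
    """Store one asserted triple together with where and when it was found."""
    cur = conn.execute(
        "INSERT INTO triples (subject, predicate, object) VALUES (?, ?, ?)",
        (s, p, o),
    )
    conn.execute(
        "INSERT INTO provenance VALUES (?, ?, ?)",
        (cur.lastrowid, source, retrieved),
    )
    return cur.lastrowid

def triples_from(conn, source):
    """Return every triple that was asserted by the given source."""
    rows = conn.execute(
        """SELECT t.subject, t.predicate, t.object
           FROM triples t JOIN provenance p ON p.triple_id = t.id
           WHERE p.source = ? ORDER BY t.id""",
        (source,),
    )
    return rows.fetchall()

# Two made-up sources assert the same triple; each assertion keeps its own row.
assert_triple(conn, "ex:walkway", "ex:shape", "ex:Straight",
              "http://example.org/a", "2009-05-28")
assert_triple(conn, "ex:walkway", "ex:shape", "ex:Straight",
              "http://example.org/b", "2009-05-28")

print(triples_from(conn, "http://example.org/a"))
```

The join does the work that reification was meant to do, and dropping everything from a source that turns out to be unreliable is a single query on the provenance table.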

Alternatively, Semantic Web implementers may choose to ignore the retention of provenance and sourcing and treat their RDF triple-store as a pristine, never-changing collection of truth. For many applications, this works quite well. It rapidly becomes unworkable once many sources of information have to be merged. RDF works great for the collection, transmission and processing of unchanging, unpolluted, uncontroversial knowledge; on this blog, I will from now on refer to this sort of information as UnKnowledge.

To my mind, there is a deeper problem with reification, and that relates to what an RDF triple really means. My view is that an RDF triple means absolutely nothing, and that it is only the action of asserting a triple that has meaning. The deep problem with reification is that it's hard to do, and thus nobody does it. It also forces implementers to think too much about semantics, and thinking too much about semantics is always a bad thing. Too often you end up dizzy like a dog chasing its tail.

The RDF working group has produced an entire document trying to clarify what the semantics of RDF are. Here is an example paragraph to study:
The semantic extension described here requires the reified triple that the reification describes - I(_:xxx) in the above example - to be a particular token or instance of a triple in a (real or notional) RDF document, rather than an 'abstract' triple considered as a grammatical form. There could be several such entities which have the same subject, predicate and object properties. Although a graph is defined as a set of triples, several such tokens with the same triple structure might occur in different documents. Thus, it would be meaningful to claim that the blank node in the second graph above does not refer to the triple in the first graph, but to some other triple with the same structure. This particular interpretation of reification was chosen on the basis of use cases where properties such as dates of composition or provenance information have been applied to the reified triple, which are meaningful only when thought of as referring to a particular instance or token of a triple.
I've read that sentence over and over again; I've finally concluded that it is an example of steganography. Here is how I have decoded it:
the semantic extension described HEre requires the reified tripLe that the reification describes - i(_:xxx) in the above example - to be a Particular token or InstAnce of a triple in a (real or notional) rdf docuMent, rAther than an 'abstract' triPle consideRed as a grammatIcal form. there could be Several such entities which have the same subject, predicate and Object properties. although a graph is defiNed as a set Of tRiples, several such tokens wIth the same triple structure might occur i different documents. thus, it would be meNAningful to Claim thAt the blank node in the second Graph abovE does not refer to the triPLE in the first grAph, but to Som other tERiplE with the Same struCtUrE. THIS Particular interpretatiOn Of Reification was choSen On the basis of Use cases where properties such as dates of composition or provenance information have been appLied to the reified triple, whicH are mEaningfuL only when thought of as referring to a Particular instance or token of a triple.
I'll try to suggest some ways that we might rescue RDF and the Semantic Web in a future post.


  1. Man... Finally some RDF-related humor.

  2. ...also, sorry that you had to wait 4 years for some appreciation.