Saturday, September 5, 2009

RDF Properties on Magic Shelves

Book authors and politicians who go on talk shows, whether it's the Daily Show, Charlie Rose, Fresh Air, Oprah, Letterman, whatever, seem to preface almost every answer with the phrase "That's a really good question, (Jon|Teri|Stephen|Conan)". The guest never says why it's a good question because the real meaning of that phrase is "Thanks for letting me hit one out of the ballpark." Talk shows have so little in common with baseball games or even tennis matches. On the rare occasion when a guest doesn't adhere to form, the video goes viral.

I've been promising to come back to my discussion of Martha Yee's questions on putting bibliographic data on the semantic web. Karen Coyle has managed to discuss all of them at least a little bit, so I'm picking and choosing just the ones that interest me. In this post, I want to talk about Martha's question #11:
Can a property have a property in RDF?
The rest of my post is divided into two parts. In the first, I will answer the question; in the second, I will discuss some of the reasons that it's a really good question.

Yes, a property can have a property in RDF. The W3C Recommendation entitled RDF Semantics states: "RDF does not impose any logical restrictions on the domains and ranges of properties; in particular, a property may be applied to itself." So not only can a property have a property in RDF, it can even use itself as a property!

OK, that's done with. Not only is the answer yes, but it's yes almost to the point of absurdity. Why would you ever want a property to be applied to itself? How can a hasColor property have a hasColor property? If you read and enjoyed Gödel, Escher, Bach, you're probably thinking that the only use for such a construct is to define a self-referential demonstration of Gödel's Incompleteness Theorem. But there actually are uses for properties which can be applied to themselves. For example, if you want to use RDF properties to define a schema, you probably want to have a "documentation" property, and certainly the documentation property should have its own documentation.
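Here is a minimal sketch of that self-documenting property, written in Python with triples modeled as plain tuples rather than with a real RDF library. The `ex:` names are made up for illustration and stand in for full URIs.

```python
# Triples are (subject, predicate, object) tuples.
# All "ex:" names are hypothetical, invented for this sketch.
triples = {
    ("ex:hasColor", "ex:documentation", "The color of a thing."),
    # The documentation property is used to describe itself.
    ("ex:documentation", "ex:documentation", "Human-readable notes about a term."),
}


def properties_of(term, graph):
    """Return the (predicate, object) pairs asserted about a term."""
    return {(p, o) for (s, p, o) in graph if s == term}


# Predicates that appear as the subject of a triple using themselves.
self_applied = {p for (s, p, o) in triples if s == p}
```

Nothing special happens when a property shows up in the subject position. It is just another URI, which is why the self-application is legal.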

If you're starting to feel queasy about properties having properties, then you're starting to understand why Yee question 11 is a good one. Just when you think you understand the RDF model as being blobby entities connected by arcs, you find out that the arcs can have arcs. Our next question to consider is whether properties that have properties accomplish what someone with a library metadata background intends them to accomplish, and, even if they do, whether that is the right way to accomplish it.

In my previous post on the Yee questions, I pointed out that ontology development is a sort of programming. One of the most confusing concepts that beginning programmers have to burn into their brains is the difference between a class and a class instance. In the library world, there are some very similar concepts that have been folded up into a neat hierarchy in the FRBR model. Librarians are familiar with expressions of works that can be instantiated in multiple manifestations, each of which can be instantiated in multiple items. Each layer of this model is an example of the class/instance relationship that is so important for programmers to understand. This sort of thinking needs to be applied to our property-of-a-property question. Are we trying to apply a property to an instance of a property, or do we want to apply properties to property "classes"?

Here we need to start looking at examples, or else we will get hopelessly lost in abstraction-land. Martha's first example is a model where the dateOfPublication is a property of a publishedBy relationship. In this case, what we really want is a property instance from the class of publishedBy properties that we modify with a dateOfPublication property. Remember, there is a URI associated with the property piece of any RDF triple. If we were to simply hang a dateOfPublication on a globally defined publishedBy, we would be making that modification for every item in our database that uses the publishedBy property. That's not what we want. Instead, for each publishedBy relation we want to assert, we need to create a new property, with a new URI, related to publishedBy using the RDF Schema property subPropertyOf.
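To make the one-property-per-assertion idea concrete, here is a rough Python sketch using tuples for triples. The URI scheme (`ex:publishedBy_1` and so on), the helper name, and the sample books and dates are all assumptions made up for illustration. Only `rdfs:subPropertyOf` comes from RDF Schema.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

triples = set()


def assert_published(book, publisher, date, n):
    """Assert one publishedBy relation with its own date (hypothetical helper)."""
    # Mint a fresh property for this single assertion.
    prop = f"ex:publishedBy_{n}"
    # Tie the new property back to the global one.
    triples.add((prop, SUB_PROPERTY_OF, "ex:publishedBy"))
    # Hang the date on the new property, not on ex:publishedBy itself.
    triples.add((prop, "ex:dateOfPublication", date))
    # Use the new property to connect the book and the publisher.
    triples.add((book, prop, publisher))
    return prop


p1 = assert_published("ex:book1", "ex:AcmePress", "1999", 1)
p2 = assert_published("ex:book2", "ex:AcmePress", "2005", 2)
```

Each assertion costs three triples and one new property URI, and the global `ex:publishedBy` never acquires a date of its own.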

Let's look at Martha's other example. She wants to attach a type to her variantTitle property to denote spine title, key title, etc. In this case, what we want to do is create global properties that retain variantTitleness while making the meaning of the metadata more specific. Ideally, we would create all our variant title properties ahead of time in our schema or ontology. As new cataloguing data entered our knowledgebase, our RDF reasoning machine would use that schema to infer that spineTitle is a variantTitle so that a search on variantTitle would automatically pick up the spineTitles.
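The inference step described above can be sketched in a few lines of Python. This is a hand-rolled stand-in for what an RDFS reasoner does with `rdfs:subPropertyOf`, not a real reasoning engine, and the `ex:` names and sample titles are invented for the example.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

# Schema: defined once, ahead of time.
schema = {
    ("ex:spineTitle", SUB_PROPERTY_OF, "ex:variantTitle"),
    ("ex:keyTitle", SUB_PROPERTY_OF, "ex:variantTitle"),
}

# Cataloguing data: uses only the specific properties.
data = {
    ("ex:book1", "ex:spineTitle", "A Spine Title"),
    ("ex:book1", "ex:title", "A Full Title"),
}


def infer_superproperties(data, schema):
    """If (s, sub, o) holds and sub is a subPropertyOf sup, add (s, sup, o)."""
    supers = {}
    for sub, p, sup in schema:
        if p == SUB_PROPERTY_OF:
            supers.setdefault(sub, set()).add(sup)
    inferred = set(data)
    changed = True
    # Repeat until nothing new appears, so chains of subproperties are followed.
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for sup in supers.get(p, ()):
                if (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


graph = infer_superproperties(data, schema)
variant_titles = {o for (s, p, o) in graph
                  if s == "ex:book1" and p == "ex:variantTitle"}
```

A search on `ex:variantTitle` now finds the spine title even though nobody ever catalogued it under that property.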

Is making new properties by adding a property to a subproperty the right way to do things? In the second example, I would say yes. The new properties composed from other properties make the model more powerful, and allow the data expression to be simpler. In the first example, where a new property is composed for every assertion, I would say no. A better approach might be to make the publication event a subject entity with properties including dateOfPublication, publishedBy, publishedWhat, etc. The resulting model is simpler, flatter, and more clearly separates the model from the data.
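For comparison, the publication-event alternative might look something like this in the same tuples-as-triples Python style. The class name `ex:PublicationEvent`, the node `ex:pubEvent1`, and the sample values are assumptions for illustration.

```python
# The event is an ordinary node; every property comes from a fixed vocabulary.
triples = {
    ("ex:pubEvent1", "rdf:type", "ex:PublicationEvent"),
    ("ex:pubEvent1", "ex:publishedWhat", "ex:book1"),
    ("ex:pubEvent1", "ex:publishedBy", "ex:AcmePress"),
    ("ex:pubEvent1", "ex:dateOfPublication", "1999"),
}


def describe(node, graph):
    """Collect the properties of one node into a dict (one value per property)."""
    return {p: o for (s, p, o) in graph if s == node}


event = describe("ex:pubEvent1", triples)
predicates_used = {p for (s, p, o) in triples}
```

A second publication event would add a new node but no new properties, so the vocabulary stays the same size however much data arrives.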

We can contrast the RDF approach of allowing new properties to be created and modified by other properties with that of MARC. MARC makes you put data in fields and subfields and subfields with modifiers, but the effect is sort of like having lots of dividers on lots of shelves on a bookcase: there's one place for each and every bit of data, unless there's no place. RDF is more like a magic shelf that allows things to be in several places at once and can expand to hold any number of things you want to put there.

"Thanks for having me, Martha, it's been a real pleasure."

2 comments:

  1. Thank you so much, Eric, for this explanation; I’m feeling a little dizzy (smile), but I will keep ruminating and perhaps it will clear up over time. In the meantime, I have a new and more complicated (I think?) example for you. How would you model the following?

    In my model, work is a class and concept is a class, but the subject relationship between the work and the concept is a property. I felt I had to do this because a work can have a subject relationship with a person (a book about Shakespeare) or another work (a book about Hamlet) (person and work are all classes) as well as with a concept or an object. However, a work can have a subject relationship with more than one concept or object and it would be helpful if we could specify what the relationship is between the two or more concepts or objects the work is about. An example would be a book on the effect of water pollution on fish. You could just create a subject relationship between water pollution (concept) and the work and another subject relationship between fish (object) and the work, but the subject access you provide your users would be richer if you could say that the relationship between water pollution and fish was an ‘effect on’ relationship. Am I right in thinking that we need the relationship property ‘has-subject’ as in ‘work A has-subject water pollution’ and ‘work A has-subject fish’ and then that relationship property needs to have the property ‘subject-to-subject-relationship-effect-on?’ Can you see a simpler way to model this in RDF? Or are we pushing the limits of RDF and demonstrating that perhaps it is not the best vehicle for our data?

    By the way, I take your point about distinguishing between a class and a class instance. There are lots of books that are about the effect of one thing on another. Because this pattern exists, would it not be wise to create a relationship property in the model to accommodate it?

  2. Thanks for the comment, Martha!

    I think the answer is that there are a number of ways to model the effect-of-water-pollution-on-fish subject relationship in an RDF model. I would lean towards saying 'work A has-subject (water pollution has-effect-on fish)' where the part in parentheses is reified either explicitly or implicitly as part of an ontology describing effects.

    It's actually a good example of something that's awkward in a classical flat taxonomy; LCSH becomes horrific for computers when it tries to do this.
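    One way to read 'work A has-subject (water pollution has-effect-on fish)' with explicit reification is sketched below in Python, using tuples for triples. The `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms are the standard RDF reification vocabulary; the `ex:` names and the `unreify` helper are made up for illustration.

```python
triples = {
    # The inner statement, reified as a node of its own.
    ("ex:stmt1", "rdf:type", "rdf:Statement"),
    ("ex:stmt1", "rdf:subject", "ex:waterPollution"),
    ("ex:stmt1", "rdf:predicate", "ex:hasEffectOn"),
    ("ex:stmt1", "rdf:object", "ex:fish"),
    # The work is about the statement as a whole.
    ("ex:workA", "ex:hasSubject", "ex:stmt1"),
}


def unreify(node, graph):
    """Return the (s, p, o) a reified statement node stands for, or None."""
    parts = {p: o for (s, p, o) in graph if s == node}
    if parts.get("rdf:type") != "rdf:Statement":
        return None
    return (parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])


subjects_of_work_a = [unreify(o, triples) for (s, p, o) in triples
                      if s == "ex:workA" and p == "ex:hasSubject"]
```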
