Wednesday, April 29, 2009

RDF and Twitter: Compare and Contrast

As I wrote previously, RDF was developed with the idea that it would be the backbone of something called the "semantic web", which was supposed to be different from the world-wide web in that machines would be able to transmit and "understand" information from a global network. In contrast, Twitter was developed with the idea that people would need to document their sad pathetic lives in 140-character chunks. On this date, however, the Twitterverse seems to be an intelligent global network that can transmit and understand almost anything, and the RDF-based semantic web seems to still be convinced of a need for agents to transmit dribbles of sad, pathetic knowledge in an endless stream of subject-object-predicate triples.

It's interesting to compare the core data models for RDF and for Twitter. In RDF, the fundamental particles are, as I've said, subject-object-predicate triples. To recast that last sentence into the RDF model, we would proceed as follows:
 Assertion:
subject: RDF
object: subject-object-predicate triples
predicate: has fundamental particles of type
That's probably too self-referential for most people to wrap their heads around, so instead I'll change the example:
 Assertion:
subject: The United States
object: Barack Obama
predicate: has a president named
I usually have trouble remembering which is the predicate and which is the object. If you think about it, however, you can express the same particle of knowledge in ways that swap the roles of predicate and object, or even subject and predicate. For example:
 Assertion:
subject: Barack Obama
object: President of the United States
predicate: has the office of
In your copious spare time, you can work out the other 4 permutations.
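
For readers who think better in code, here is a minimal Python sketch of the two assertions above. The `Assertion` type and the `as_sentence` helper are names I made up for illustration; they are not part of any RDF library. The field order follows the listing in this post, whereas RDF itself is usually written subject-predicate-object.

```python
from collections import namedtuple

# Hypothetical record type; fields are listed in the same order as in the post.
Assertion = namedtuple("Assertion", ["subject", "object", "predicate"])

a1 = Assertion(
    subject="The United States",
    object="Barack Obama",
    predicate="has a president named",
)
a2 = Assertion(
    subject="Barack Obama",
    object="President of the United States",
    predicate="has the office of",
)


def as_sentence(assertion):
    """Read a triple back as an English sentence: subject, predicate, object."""
    return f"{assertion.subject} {assertion.predicate} {assertion.object}"


for assertion in (a1, a2):
    print(as_sentence(assertion))
```

Both triples carry roughly the same particle of knowledge, even though the strings sit in different slots.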

Now let's look at Twitter. The particle of information in Twitter, the tweet, seems also to be a triple:
 Tweet:
tweeter: gluejar
message: going to bed now!
time: Wed, 29 Apr 2009 06:58:01 +0000
The tweeter in turn has associated with it sets of followed users and followers as well as profile information. There's a lot to talk about here, and in a previous post I pointed out that Twitter message content is becoming richer and more linguistically complex. But the point I'd like to make for now is that Twitter's point of view is that it doesn't care so much about what the message is saying as who is saying it and when it was said. The more we look at the RDF examples above, the more the subject-object-predicate representation of knowledge seems limiting. The assertion may be true or false depending on when it was said; assertions removed from the context of who is making the assertion are for the most part useless because machines have no way to know whether to trust the assertion.
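
The who/what/when view of a tweet can be sketched in Python as follows. This is only an illustration under my own assumptions: the `Tweet` type and the `attributed` helper are hypothetical names, not Twitter's API. The timestamp is parsed with the standard library's `email.utils.parsedate_to_datetime`, which handles this RFC 2822 style of date.

```python
from collections import namedtuple
from email.utils import parsedate_to_datetime

# Hypothetical record type mirroring the three fields listed in the post.
Tweet = namedtuple("Tweet", ["tweeter", "message", "time"])

tweet = Tweet(
    tweeter="gluejar",
    message="going to bed now!",
    # Turn the date string into a timezone-aware datetime.
    time=parsedate_to_datetime("Wed, 29 Apr 2009 06:58:01 +0000"),
)


def attributed(t):
    """Render the message together with who said it and when."""
    return f"{t.tweeter} said {t.message!r} at {t.time.isoformat()}"


print(attributed(tweet))
```

The message string is opaque here; the only structured parts are the who and the when, which is exactly the context the bare assertions above are missing.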

Friend-of-the-blog Jeff Young asserts that the OpenURL data model can be thought of as answering 6 questions: Who, What, Where, When, Why and How. Whatever success Twitter has achieved can be thought of as an argument that the most important of these are the Who, What and When.

Sanity Alert! The following may be mind-blowing to certain susceptible individuals: the data model that Twitter REALLY uses to propagate tweets is RSS and Atom. These formats are descended from what was originally called "Meta Content Format" which became "RDF Site Summary" (Yes, the very same RDF!) which became "Really Simple Syndication" or maybe something else, I'm not sure for sure. Here's how Twitter REALLY feeds into the semantic web:
  tweet:
title: gluejar: going to bed now!
description: gluejar: going to bed now!
pubDate: Wed, 29 Apr 2009 06:58:01 +0000
guid: http://twitter.com/gluejar/statuses/1649740567
link: http://twitter.com/gluejar/statuses/1649740567
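
As a rough sketch, the fields above can be serialized as an RSS-style item with Python's standard `xml.etree.ElementTree`. The element names and values come from the listing; wrapping them in an `<item>` element follows RSS 2.0 convention, and the `build_item` helper is a name I invented. Twitter's actual feed may well have carried more than this.

```python
import xml.etree.ElementTree as ET

# Field names and values copied from the listing above.
FIELDS = [
    ("title", "gluejar: going to bed now!"),
    ("description", "gluejar: going to bed now!"),
    ("pubDate", "Wed, 29 Apr 2009 06:58:01 +0000"),
    ("guid", "http://twitter.com/gluejar/statuses/1649740567"),
    ("link", "http://twitter.com/gluejar/statuses/1649740567"),
]


def build_item(fields):
    """Build an <item> element with one child element per (tag, text) pair."""
    item = ET.Element("item")
    for tag, text in fields:
        ET.SubElement(item, tag).text = text
    return item


item = build_item(FIELDS)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Note that the tweeter is not a field of its own; it only survives as a prefix in the title and as part of the URL.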

Exercise for the reader: how does this look in Atom?

Does anyone but me think that there's something weird going on here?

4 comments:

  1. Apples and oranges. And fighter jets and goldfish. "...it doesn't care so much about what the message is saying as who is saying it and when it was said." So what is the point of comparing and contrasting two information technologies with such different goals? That's like saying that XML is better than relational databases because it's better at representing the relationship of inline markup to prose sentences, or that relational databases are better than XML because they scale so well when you want to track data that fits well into normalized tables.

    The things that Twitter serialization formats do well (track who is saying something, a simple string to represent what they said, and when they said it) are trivial in RDF--many have done it in RDF (http://bit.ly/bR3lbs)--and the things that RDF does well have nothing to do with the goals of Twitter. So, again, what is the point of comparing and contrasting two information technologies with such different goals?

  2. bobducharme: given that the goals are so different, isn't it of interest that the data models are similar? Isn't it interesting to see how the ecosystems surrounding Twitter and the semantic web have evolved so differently?

    Now that Twitter has announced annotations, we'll see increasing overlap of function and capability of the two; they are still very different beasts, but maybe they'll begin to compete for sustenance.

  3. Twitter is a messaging service and a schema. RDF is a format for data representation. They can't compete, because they operate at completely different levels.

    In fact, Twitter may use RDF.

    And a tweet is obviously not a triple, but a collection of them:

    Assertion:
    subject: tweet
    object: gluejar: going to bed now!
    predicate: content
    Assertion:
    subject: tweet
    object: Wed, 29 Apr 2009 06:58:01 +0000
    predicate: date
    Etc.

  4. One could imagine using Twitter as a collaborative ontology workbench where everyone can propose RDF triples and others can confirm (retweet), refute, or make different proposals or comments (answer), etc. Does anybody know someone who has tried that?
