Friday, April 17, 2009

I have seen the Semantic Web and it tweets "Temba, his arms wide!".

We actually know quite a lot about how human languages develop; if you've not read "The Language Instinct" or something in that direction, then you really need to make some time for it. One thing that is known is that if you want to develop a human language, the main thing you need is a bunch of children. If you put a bunch of adults together who don't speak a common language, they start communicating using pidgin, fragments of languages mixed together with very simple grammars: "Me Tarzan, you Jane" sorts of things that never develop into true languages with complexity, expressive power, and the ability to produce Shakespeares or Tolstoys. But if you add children to the mix, something completely different happens. Their brains seem to be wired to invent the complexity missing from the adults' pidgins, and the result is a creole language with all the complexity and expressiveness of any other human language.

I've recently decided that I need to understand what's going on with Twitter. If you've not tried it, don't worry too much: I've been there, and I can assure you that it's every bit as dumb an idea as it sounds, as dumb as putting together a bunch of adults who don't speak any language in common and expecting something useful to come out. But despite the dumbness of the Twitter idea, what is happening there is really interesting. Yes, there are lots of adults and corporate entities making "Me Tarzan, you Jane" noises, but a lot of people manage to approach Twitter with a child-like attitude, and that is producing more complexity and expressiveness.

Twitter asks for status messages of quite a short length: 140 characters. You would think this is a severe limit on what you can do with it, but it turns out to be a great blessing, because it forces people to seek creative ways to build linguistic complexity and expressive power into their tweets. The result is the emergence of a new human language. In addition to an entirely new vocabulary imported from texting (OMG, ROTFL and the like), there are three grammatical constructs that are widely used to build the expressiveness of the 140-character tweet:
1. The "@username" construct. Used to address and reference another user.
2. The "#topic" construct, or hashtag. Used to tie in your tweet with a wider conversation.
3. The embedded hyperlink. Used to point to something on the web.
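The three constructs are regular enough to pick out mechanically. As a rough sketch, here is a small Python function that pulls mentions, hashtags, and links out of a tweet. The regular expressions are simplified assumptions for illustration, not Twitter's actual parsing rules, and `parse_tweet` is a hypothetical name.

```python
import re

# Simplified patterns (an assumption for illustration, not Twitter's own rules).
MENTION = re.compile(r"@(\w+)")       # @username
HASHTAG = re.compile(r"#(\w+)")       # #topic
LINK = re.compile(r"https?://\S+")    # embedded hyperlink

MAX_TWEET_LENGTH = 140  # the length limit discussed above


def parse_tweet(text):
    """Split a tweet into its mentions, hashtags, and links."""
    if len(text) > MAX_TWEET_LENGTH:
        raise ValueError(
            f"tweet is {len(text)} characters; limit is {MAX_TWEET_LENGTH}"
        )
    # Extract links first, then blank them out so that a '#fragment'
    # inside a URL is not mistaken for a hashtag.
    links = LINK.findall(text)
    rest = LINK.sub(" ", text)
    return {
        "mentions": MENTION.findall(rest),
        "hashtags": HASHTAG.findall(rest),
        "links": links,
    }


print(parse_tweet("@picard Darmok and Jalad at Tanagra #startrek http://example.com/darmok"))
```

For the sample tweet this yields the mention `picard`, the hashtag `startrek`, and the single link. A real tokenizer would also need to deal with email addresses and trailing punctuation after URLs.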

All three of these pieces of grammatical machinery are worth further discussion (I'm not sure I'll get a chance to write about them for a week or so, due to vacation!), but I need to tie all this into Star Trek and the Semantic Web, so I'll focus on the hyperlink part for now.

In the episode "Darmok", the USS Enterprise-D is on a mission to establish communications between the Federation and the Tamarians after several previous attempts had failed. The difficulty was that the output of the Federation's universal translators was a stream of words that didn't make sense; a typical message was "Darmok and Jalad at Tanagra". As the plot unfolds, Captain Picard and a Tamarian, Dathon, are beamed down to a planet, and after a violent interlude Picard realizes that the Tamarian language is composed entirely of references to episodes in the culture's oral history. "Darmok and Jalad at Tanagra", for example, is meant to express "Let's cooperate to face a common enemy".

Believe it or not, there is an analog to the Tamarian language that is being touted as the next great internet revolution. Serious people are promoting the idea that this "Semantic Web" will represent the emergence of a new kind of intelligent data network of great power. The core of the Semantic Web is a data model called RDF, which stands for Resource Description Framework. RDF is a beautiful thing. It's based on the idea that all knowledge can be represented as a collection of data triples, each of which is an assertion: SUBJECT-PREDICATE-OBJECT. It also goes a step further by saying that subjects, predicates, and objects can all be represented by URIs (Uniform Resource Identifiers), which are essentially hyperlinks, or references to other things (objects may also be literal values such as strings or numbers).
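To make the subject-predicate-object shape concrete, here is a minimal Python sketch of a few triples whose parts are all URIs, plus a wildcard pattern match of the kind RDF query languages provide. The `example.org` resources and the `wasAt` predicate are hypothetical; the FOAF `knows` URI is a real vocabulary term. A real application would more likely use an RDF library such as rdflib and query with SPARQL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # a URI
    predicate: str  # a URI
    object: str     # a URI here (RDF also allows literal values)


# Hypothetical resources; only the FOAF "knows" predicate is a real vocabulary term.
EX = "http://example.org/tamarian/"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

graph = {
    Triple(EX + "Darmok", FOAF_KNOWS, EX + "Jalad"),
    Triple(EX + "Darmok", EX + "wasAt", EX + "Tanagra"),
    Triple(EX + "Jalad", EX + "wasAt", EX + "Tanagra"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    )


# Who was at Tanagra?
for t in match(graph, p=EX + "wasAt", o=EX + "Tanagra"):
    print(t.subject)
```

The query finds Darmok and Jalad purely by comparing URIs. Nothing in the graph says what `wasAt` or Tanagra actually means; the meaning lives wherever those references point.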

The problem with RDF is the same problem that Picard's universal translator had with Tamarian. Both Tamarian and RDF can seem to be nothing but references.

So what does Darmok have to do with Twitter? Well, it was only through intense (and ultimately fatal) interaction and imitation between Picard and Dathon that the two were able to connect the references to the concepts behind them. Children create creole languages only through intense imitation and play in a group. And the Semantic Web will only happen in the presence of the intense community messaging and back-and-forth networking provided by an environment like Twitter.

Temba, his arms wide!
