Wednesday, July 29, 2009

The Illusion of Internet Identity

You've certainly heard of Arthur C. Clarke's Third Law, "Any sufficiently advanced technology is indistinguishable from magic", which says more about magic and our perceptions of the world than it does about technology. When technology does something that is not natural to us, we of course perceive it to be supernatural. But what happens when technology approximates something so natural to us that we don't even perceive that there's anything remarkable? Then we attribute powers to the technology that just don't exist. Just as we can perceive emotions in a stuffed teddy bear, it is only with difficulty that we avoid anthropomorphizing technologies. Do you have a cute name for your car? Do you refer to your GPS as "Lola"? If you have not done so, try the web version of ELIZA, and see if you can avoid thinking of ELIZA as a real person. It's very hard for us to understand how complicated the act of carrying on a real conversation is, precisely because we do it all the time. Even a profoundly retarded technology will be imbued with magic if its function is sufficiently mundane.

I've been reading a paper by Patrick Hayes and Harry Halpin and a presentation by Pat Hayes, both with the unfortunate title "In Defense of Ambiguity". The paper provides a wonderful review of the theory of identity. I've been living very happily, doing productive work in the world of identifiers without ever knowing that identity needed to have a theory behind it. In retrospect, I've managed to do this by staying away from the difficult bits.

After reading Hayes and Halpin, I've come to realize what a miracle human communication is. The fact that I can meet someone with whom I share no languages and that we can exchange our own names and establish names for things may seem simple, but it's something that machines cannot do. For example, I may gesture at myself and say "Eric" to establish my identity. If I then gesture at a banana and say "banana", it's very likely that my counterpart will understand that I have not established an identity for the banana, but rather I have given a name for the kind of fruit. This is possible because people have brains that are similarly wired: our brains are wired to recognize individual people but not individual bananas (though sometimes our knowledge models diverge). Our computers, on the other hand, have no fruit wiring or individual-person wiring, so establishment of identity is very hard for them.

The difficulty of teaching computers to identify things has not stopped us from using them to build elaborate identity systems. Hayes and Halpin observe that internet identity can only be established by description, and description is inherently ambiguous. Attempts to make real-world-object identifiers global or to add description actually make the situation worse, by increasing ambiguity. In our daily lives, ambiguity in our communications is mostly not a problem. When I say the word "rose" a listener will almost never be confused between the flower and the verb. I can say "rows of rosebushes" and only rarely will people hear "rose of rosebushes". Our brains are so good at using context to resolve ambiguity that we don't realize how hard it is for computers to do the same thing.

That the situation becomes worse with added description was a bit hard for me to absorb, because at first it seems that the better you define something, the less ambiguous your statements about it become. But it's not true for computers. Suppose my internet identity description added a physical description of me, for example the fact that I have blonde hair. That might help to identify me under certain circumstances, but then when my hair turns gray, it makes my identification more tenuous. You could say that I had blonde hair on a particular date, but then you'd need to add a physical model for hair color to your internet identity system. In actual fact, the added description might help a human to identify me, but it hurts a computer's efforts to establish my identity.

The Hayes and Halpin paper was written in the context of the "http-range" semantic web controversy that I touched upon in my post on the semantics of redirection. They argued that the HTTP protocol is not the right place to put establishments of identity, and that the description model is better suited to do that. As I understand it, the Hayes-Halpin view did not prevail with the W3C TAG; "ambiguity" was not a great concept for people to rally around, I guess.

The internet identity systems I've worked with revolved around identifying things in libraries: books, serials, and articles. For the most part, these sorts of objects do not present deep identification quandaries, and so I've not noticed my ignorance of identity theory. For example, most people imagine that computers use ISBNs to identify books (or as I imagined in my last post, that computers use ISBNs to identify items in bookstores). Most often this illusion does not get us into any trouble, just as the illusion that teddy bears have feelings is mostly harmless. Computers are wired to deal with records in data files, and they use ISBNs (with frequent success) to identify and match records in data files. That's all. The rest is just software trickery.
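To make the "records, not books" point concrete, here is a minimal sketch in Python of what that matching tends to look like. The record layout, the sample titles, and the function names normalize_isbn and match_records are all made up for illustration; the only real-world piece is the standard ISBN-13 check digit arithmetic used to put ISBN-10s and ISBN-13s into one form.

```python
def normalize_isbn(raw):
    """Strip hyphens and spaces; convert an ISBN-10 to its 978-prefixed ISBN-13."""
    digits = raw.replace("-", "").replace(" ", "").upper()
    if len(digits) == 10:
        # Drop the ISBN-10 check character, prefix 978, recompute the check digit.
        core = "978" + digits[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        check = (10 - total % 10) % 10
        return core + str(check)
    return digits


def match_records(catalog, feed):
    """Pair up records from two files whose normalized ISBN strings are equal."""
    index = {normalize_isbn(rec["isbn"]): rec for rec in catalog}
    pairs = []
    for rec in feed:
        key = normalize_isbn(rec["isbn"])
        if key in index:
            pairs.append((index[key], rec))
    return pairs


# Hypothetical records from two different data files.
catalog = [{"isbn": "0-306-40615-2", "title": "Record from file A"}]
feed = [{"isbn": "978-0-306-40615-7", "title": "Record from file B"}]

matches = match_records(catalog, feed)
```

All the program ever compares is two strings of digits. Whether the two matched records are "about the same book" is an interpretation we supply, not something the computer has established.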

It's interesting to note that the ISBN was not developed by the library community. It was developed by a statistics professor named Gordon Foster for the British Publishers Association. Librarians lived without identifier systems for many years and were content with the library equivalents of an address or locator system. It's as if librarians have intuitively known something that the architects of the semantic web have only recently struggled with: that we can aspire to build description systems and access systems, but building a system that can provide identity is more difficult than it looks to a human.


  1. Eric, I am SO GLAD you've got the time to do these thoughtful posts. I'm trying to read all of the articles and books you mention -- just finished Innovator's Dilemma and now have this article to read. If I have comments after reading I will post them. (Could we start a reading club? Interesting tech reading with discussion?)

  2. Great post! I've poked around those issues for quite a while at
    See e.g.,