Tuesday, August 11, 2009

Shibboleth, Google Book Search, and the Hello Kitty Diary

I don't think I can sell privacy. In fact, it's hard to think of any technology that has succeeded in the marketplace because of its privacy attributes. Swiss banks don't count as technology; strong encryption succeeded in the market for its security attributes rather than its privacy attributes. (The same is true of locks on houses: it's true that they provide privacy, but people use them against thieves, not against snoops. OK, maybe curtains, but if there's a trade-off between privacy and style in curtain technology, style usually wins.) Even the Hello Kitty Electronic Password Diary does not appear to be a big commercial success.

In my post on privacy and Google Book Search, I alluded to technological solutions libraries could use to enhance patron privacy while also protecting against unauthorized access. I thought it would be useful to elaborate on this comment with some details. In general, there is no reason that privacy and security objectives cannot both be met in a properly engineered solution, other than the fact that it's hard to find someone willing to pay for the properly engineered solution. For example, I mentioned Shibboleth as a possible way to provide both security and privacy. Shibboleth is an open-source single-sign-on authentication system developed as part of the Internet2 project. It uses strong cryptographic techniques to delegate trust over a network, and in so doing, allows for significantly enhanced privacy.

Think about the situation where a company has licensed some content to a university. The licensor wants to make sure that only persons associated with the university are allowed to access the content. It doesn't need to know who the user is; it only needs to know that the user is properly entitled. The Shibboleth system allows the institutional user to sign in to an authentication point once using their institutional credentials, after which any licensed resource can check with the central authentication point that the user is accredited by virtue of institutional affiliation. Shibboleth also allows users and institutions to disclose attributes to providers of their choosing. Attributes might include the user's name, preferred language, subject areas of interest, subgroup membership, etc. Security is preserved because the institution still knows the identity of the users, and is enhanced because the Shibboleth system is designed to be much harder to defeat than competing solutions.
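To make the idea of "entitled but unidentified" concrete, here is a toy sketch in Python. It is not Shibboleth and not its API: real deployments exchange SAML assertions protected by public-key signatures, whereas this sketch assumes a single shared secret and a keyed hash to keep things short. The names SHARED_KEY, issue_assertion, and is_entitled, and the attribute value "member", are all made up for illustration.

```python
import hashlib
import hmac
import json

# Assumption: one shared secret stands in for the real trust metadata
# that an institution and a content provider would exchange.
SHARED_KEY = b"demo-key-known-to-institution-and-provider"


def issue_assertion(user_record, release):
    """Institution side: sign only the attributes chosen for release."""
    claims = {name: user_record[name] for name in release}
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return payload, signature


def is_entitled(payload, signature):
    """Provider side: verify the signature, then look at affiliation only."""
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False
    return json.loads(payload).get("affiliation") == "member"


user = {"name": "Pat Reader", "affiliation": "member", "language": "en"}
payload, signature = issue_assertion(user, release=["affiliation"])
print(is_entitled(payload, signature))  # the provider never sees the name
```

The point of the sketch is the release list: the institution decides which attributes leave the building, and the provider can still trust what it receives.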

As far as I understand, Shibboleth would only slightly enhance privacy in the specific scenario created by Google Book Search, where users have to be tracked as to how much of individual books they have viewed. However, a system could be built that distributes information over a network. Here's how it would work:
  1. When the user is authenticated by the institution, a session id would be sent to Google. The session id tracks the user, but only the institution knows the identity of the user.
  2. When the user views a page in a book, Google sends a message to the institution to increment a named counter associated with the user. The name of the counter identifies a book, but only Google knows which book is associated with the counter.
  3. When the user asks to view another page, Google asks the institution for the page count associated with the book and the user, and grants access accordingly.
Such a system works to enhance privacy by storing separately the identity of the person reading the book and the identity of the book. Only if Google and the institution agree to exchange information can the reading history of an identified patron be revealed. This results in much stronger privacy even than we have in the print world. A government request for a patron's GBS reading habits would have to be made to two separate entities, probably in two different jurisdictions.
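The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design that Google or any library has implemented: the class names Institution and BookService, the random opaque counter names, and the page_limit parameter are all hypothetical, and the two parties are plain in-process objects rather than services talking over a network.

```python
import secrets
from collections import defaultdict


class Institution:
    """Knows who the user is, but sees only opaque counter names."""

    def __init__(self):
        self._sessions = {}                # session id -> user identity
        self._counters = defaultdict(int)  # (user, counter name) -> pages viewed

    def authenticate(self, user):
        # Step 1: hand out a session id that reveals nothing about the user.
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = user
        return session_id

    def increment(self, session_id, counter_name):
        user = self._sessions[session_id]  # raises KeyError for unknown sessions
        self._counters[(user, counter_name)] += 1

    def page_count(self, session_id, counter_name):
        user = self._sessions[session_id]
        return self._counters[(user, counter_name)]


class BookService:
    """Knows which book each counter stands for, but never the user's identity."""

    def __init__(self, institution, page_limit):
        self._institution = institution
        self._page_limit = page_limit
        self._counter_names = {}  # book -> opaque counter name

    def _counter_for(self, book):
        if book not in self._counter_names:
            self._counter_names[book] = secrets.token_hex(8)
        return self._counter_names[book]

    def view_page(self, session_id, book):
        name = self._counter_for(book)
        # Step 3: ask the institution for the count before granting access.
        if self._institution.page_count(session_id, name) >= self._page_limit:
            return False
        # Step 2: record the page view under the opaque counter name.
        self._institution.increment(session_id, name)
        return True


library = Institution()
service = BookService(library, page_limit=2)
session = library.authenticate("patron-42")
print([service.view_page(session, "Some Book") for _ in range(3)])
# prints [True, True, False]
```

Neither object alone holds both the patron's identity and the book title, which is the whole trick: the reading history can only be reconstructed by joining the two tables.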

What is the likelihood that such a system can be created and adopted? On this score I am very skeptical. Who would pay for the enhanced privacy afforded by such a system? The success of a variety of Web 2.0 services seems to indicate that users are almost eager to give up privacy to gain the ability to communicate. As Randal Picker has discussed in a recent paper, consumers have significant incentives to give up their privacy to online advertising networks because doing so amounts to advertising by the consumer that results in a more efficient market. The history of Shibboleth can be used as an indicator of market behavior. Although it can provide enhanced privacy and strong security, these advantages have not been able to outweigh its implementation and usability costs compared to competing technologies, and Shibboleth has not been widely adopted. When Peter Brantley raised the specific question of using Shibboleth for Google Book Search, Google's Dan Clancy commented that "Some institutions use Shiboleth and we will support this although most institutions prefer IP authentication". Google is known for putting a very high priority on usability, which is an area of significant weakness for Shibboleth.

On second thought, maybe the Swiss banks are onto something. Maybe the best target market for ultimate privacy is ultra rich people. Sergey and Larry, Warren and Bill, might I sell you a bit of privacy?


  1. On Watermarking: I've not mentioned the provision discussed on the EPIC page, but library-generated session information as discussed in my post could also be used to generate the watermark required for printing from post settlement GBS institutional subscriptions.

  2. Hi Eric

    Could this not be achieved simply by Google Book Search using eduPersonTargetedID? This is a persistent identifier used by Shibboleth to allow systems to recognise a returning user, but it is not considered to be personally identifiable information, so there would be less of a problem with Google storing this information.


  3. Nicole, yes, eduPersonTargetedID is exactly the mechanism that Google could use to implement GBS services, including personal saved histories, counts of pages read and things like that, while allowing the person to remain unidentified by Google. What I wrote was too strong; Shibboleth would indeed provide some privacy enhancement. However, when there is a persistent identifier, all of a user's activity may get linked together, thus compromising the user's privacy. This risk increases when a provider offers many services, some of which are inherently traceable to a person (think email) and can be linked to the other services via IP address and persistent cookies.

    Thanks for the comment (useful comments are the great joy of blogging!)

  4. Eric,

    You suggest an interesting research project: how many people actually care about privacy for its own sake? An initial hypothesis could be--not many. Most who aren't famous or otherwise rich wouldn't seem to have very many reasons to care. A lack of overall demand for privacy would seem to be a good argument to protect only valuable data, rather than personal identity. Each of us will have at least one URI, and will have to come to terms with persistent, global identifiers. An excellent question to anticipate here is, what will we learn about the necessary boundaries of privacy once global, persistent URIs for each person (place and thing) are the norm?