Friday, May 18, 2018

The Shocking Truth About RA21: It's Made of People!

[Image: Useful Utilities logo from 2004]
When librarian (and programmer) Chris Zagar wrote a modest URL-rewriting program almost 20 years ago, he expected the little IP authentication utility would be useful to libraries for a few years and would be quickly obsoleted by more sophisticated and powerful access technologies like Shibboleth. He started selling his program to other libraries for a pittance, naming this business "Useful Utilities", fully expecting that it would not disrupt his chosen profession of librarianship.

He was wrong. IP address authentication and EZProxy, now owned and managed by OCLC, are still the access management mainstays for libraries in the age of the internet. IP authentication allows for seamless access to licensed resources on a campus, while EZProxy allows off-campus users to log in just once to get similar access. Meanwhile, Shibboleth, OpenAthens and similar solutions remain feature-rich systems with clunky UIs and little mainstream adoption outside big rich publishers, big rich universities and the UK, even as more distributed identity technologies such as OAuth and OpenID have become ubiquitous thanks to Google, Facebook, Twitter, etc.

[Image: illustration from My Book House, Vol. I: In the Nursery, p. 197]
So how long will the little engines that could keep chugging? Not long, if the folks at RA21 have their way. Here are some reasons why the EZProxy/IP authentication stack needs replacement:

  1. IP authentication imposes significant administrative burdens on both libraries and publishers. On the library side, EZProxy servers need a configuration file that knows about every publisher supplying the library. It contains details about the publisher's website that the publisher itself is often unaware of! On the publisher side, every customer's IP address range must be accounted for and updated whenever changes occur. Fortunately, this administrative burden scales with the size of the publisher and the library, so small publishers and small institutions can (and do) implement IP authentication with minimal cost. (For example, I wrote a Django module that does it.)
     
  2. IP addresses are losing their grounding in physical locations. As IP address space fills up, institutions increasingly assign users dynamic IP addresses on local, non-public networks. Cloud access points and VPN tunnels are now common. This has led publishers to blame IP address authentication for unauthorized use of licensed resources, such as Sci-Hub's. IP address authentication will most likely get leakier and leakier.
     
  3. Men (make that Monsters) in the middle are dangerous, and the web is becoming less tolerant of them. EZProxy acts as a "Man (or rather, Monitor) in the Middle", intercepting web traffic and inserting content (rewritten links) into the stream. This is what spies and hackers do, and unfortunately the threat environment has become increasingly hostile. In response, publishers that care about user privacy and security have implemented website encryption (HTTPS) so that users can be sure the content they see is the content they were sent.

    In this environment, EZProxy represents an increasingly attractive target for hackers. A compromised EZProxy server could be a potent attack vector into the systems of every user of a library's resources. We've been lucky that (as far as is known) EZProxy is not widely used as a platform for system compromise, probably because other targets are softer.

    Looking into the future, it's important to note that new web browser APIs, such as service workers, require secure channels. As publishers begin to make use of these APIs, it's likely that EZProxy's rewriting will irreparably break new features.
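To make items 1 and 3 concrete, here is a minimal Python sketch of both halves of the stack. The publisher side keeps a table of each customer's registered IP ranges and checks every request against it. The proxy side rewrites links to configured publisher sites so they route back through the library's server. Everything here is hypothetical: the institution names, the documentation-only address ranges, the proxy hostname ezproxy.example.edu, the PROXIED_DOMAINS stand-in for a proxy configuration file, and the hostname-folding scheme, which is only loosely modelled on "proxy by hostname" and is not EZProxy's actual code.

```python
import ipaddress
import re
from urllib.parse import urlsplit, urlunsplit

# --- Publisher side: IP recognition ---------------------------------------
# Hypothetical customer table. Each licensed institution is known only by
# the address ranges it has registered (all ranges are documentation blocks).
CUSTOMER_RANGES = {
    "Example University": ["192.0.2.0/24", "198.51.100.0/25"],
    "Example Public Library": ["203.0.113.16/28"],
}


def recognize_institution(client_ip, customer_ranges=CUSTOMER_RANGES):
    """Return the institution whose registered range contains client_ip, else None."""
    addr = ipaddress.ip_address(client_ip)
    for institution, ranges in customer_ranges.items():
        if any(addr in ipaddress.ip_network(cidr) for cidr in ranges):
            return institution
    return None


# --- Library side: a proxy that rewrites links ------------------------------
# Hypothetical proxy host, plus a stand-in for the proxy's configuration file:
# only publisher domains listed here get rewritten.
PROXY_HOST = "ezproxy.example.edu"
PROXIED_DOMAINS = {"example.com"}


def is_proxied(host):
    """True if host is a configured publisher domain or one of its subdomains."""
    return any(host == d or host.endswith("." + d) for d in PROXIED_DOMAINS)


def proxy_url(url, proxy_host=PROXY_HOST):
    """Fold the publisher hostname into a subdomain of the proxy host.

    Ports and credentials in the original URL are ignored in this sketch.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not is_proxied(host):
        return url  # unconfigured sites pass through untouched
    folded = host.replace(".", "-") + "." + proxy_host
    return urlunsplit((parts.scheme, folded, parts.path, parts.query, parts.fragment))


HREF = re.compile(r'href="(https?://[^"]+)"')


def rewrite_links(html, proxy_host=PROXY_HOST):
    """Rewrite every absolute href in a page: the 'monitor in the middle' step."""
    return HREF.sub(lambda m: 'href="%s"' % proxy_url(m.group(1), proxy_host), html)


print(recognize_institution("192.0.2.45"))
# prints: Example University
print(rewrite_links('<a href="https://www.example.com/a">A</a>'))
# prints: <a href="https://www-example-com.ezproxy.example.edu/a">A</a>
```

Both tables have to be maintained by hand: every new campus range means an edit for every publisher, and every new publisher domain means another proxy configuration entry. The rewrite step also only works because the proxy can read and alter the page in transit, which is exactly what HTTPS-era browser features are designed to prevent.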

So RA21 is an effort to replace IP authentication with something better. Unfortunately, the discussions around RA21 have been muddled because it's being approached as if RA21 is a product design, complete with use cases, technology pilots, and abstract specifications. But really, RA21 isn't a technology, or a product. It's a relationship that's being negotiated.

What does it mean that RA21 is a relationship? At its core, the authentication function is an expression of trust between publishers, libraries and users. Publishers need to trust libraries to "authenticate" the users for whom the content is licensed. Libraries need to trust users not to use the content in violation of the licenses. For example, users are trusted to keep their passwords secret. Publishers also have obligations in the relationship, but the trust expressed by IP authentication flows entirely in one direction.

I believe that IP authentication and EZProxy have hung around so long because they have accurately represented the bilateral, asymmetric relationships of trust between users, libraries, and publishers. Shibboleth and its kin imperfectly insert faceless "Federations" into this relationship while introducing considerable cost and inconvenience.

What's happening is that publishers are losing trust in libraries' ability to secure IP addresses. This is straining and changing the relationship between libraries and publishers. The erosion of trust is justified, if perhaps ill-informed. RA21 will succeed only if it creates and embodies a new trust relationship between libraries, publishers, and their users. Where RA21 fails, solutions from Google/Twitter/Facebook will succeed. Or, heaven help us, Snapchat.

Whatever RA21 turns out to be, it will add capability to the user authentication environment. IP authentication won't go away quickly; in fact, the shortest path to RA21 adoption is to slide it in as a layer on top of EZProxy's IP authentication. But capability can be good or bad for parties in a relationship. An RA21 beholden to publishers alone will inevitably be used to their advantage. For libraries concerned with privacy, the scariest prospect is that publishers could require personal information as a condition for access. Libraries don't trust that publishers won't violate user privacy, nor should they, considering how most of their websites are rife with advertising trackers.
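The "layer on top" path can be sketched from the publisher's side. In this minimal Python sketch, IP recognition is tried first, exactly as today, and a federated sign-in is consulted only as a fallback. It assumes a hypothetical license table and a sign-in that delivers a single affiliation value, modelled loosely on eduPerson's scoped affiliation (for example member@example.edu); the names grant_access, LICENSED_RANGES and LICENSED_SCOPES are illustrative, not part of any RA21 specification.

```python
import ipaddress

# Hypothetical license data for one customer institution.
LICENSED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
LICENSED_SCOPES = {"example.edu"}


def grant_access(client_ip, assertion=None):
    """Return how a request was authorized ("ip" or "federated"), or None.

    IP recognition is checked first; the federated assertion is only
    consulted when the address is not recognized.
    """
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in LICENSED_RANGES):
        return "ip"
    if assertion:
        # Only an institution-scoped affiliation such as "member@example.edu"
        # is read; no name, email, or persistent user ID is consulted.
        _, _, scope = assertion.get("affiliation", "").partition("@")
        if scope in LICENSED_SCOPES:
            return "federated"
    return None


print(grant_access("192.0.2.10"))
# prints: ip
print(grant_access("203.0.113.5", {"affiliation": "member@example.edu"}))
# prints: federated
```

Nothing in the sketch would stop a publisher from also demanding a name or email address before returning "federated".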

It needn't be that way. RA21 can succeed by aligning its mission with that of libraries and earning their trust. It can start by equalizing representation on its steering committee between libraries and publishers (currently there are 3 libraries, 9 publishers, and 5 other organizations represented; all three of the co-chairs represent STEM publishers). The current representation of libraries omits large swaths of libraries that need licensed resources. MIT, with its huge Class A IP address block, has little in common with my public library, the local hospital, or our community colleges. RA21 has no representation of Asia, Africa, or South America, even on the so-called "outreach" committee. The infrastructure that RA21 ushers in could exert a great deal of power; it will need to do so wisely for all to benefit.

Thanks to Lisa Hinchliffe and Andromeda Yelton for very helpful background.

Would you let your kids see an RA21 movie?
_______________

Update 5/17/2019: A year later, the situation is about the same

4 comments:

  1. Hi Eric

    This is a good summary of many of the key points about RA21. I particularly like the point already picked out by others: "RA21 will succeed only if creates and embodies a new trust relationship between libraries, publishers, and their users". This is why these discussions are so important - a new trust relationship can only be built through dialogue and a better understanding of the issues by everyone.

    Some comments:

    1. "Shibboleth, OpenAthens and similar solutions remain feature-rich systems with clunky UIs". It's not correct to group together all the SAML products and services you linked to in this way. I can't speak for all of them but Shibboleth has no native UI, while OpenAthens has an intuitive UI built on 20 years' experience of serving librarians - see https://youtu.be/Joibg_Mnkpc?t=1049

    2. "little mainstream adoption outside big rich publishers, big rich universities and the UK". There are SAML federations all over the world serving large and small academic and research institutions alike; as well as the UK, those in the Netherlands, Germany, France and Japan are particularly active.

    Outside of those sectors, OpenAthens has brought this technology to other types of organisation: the UK National Health Service and the Department of Veterans Affairs are using the same access routes, as are departments of the US Army, Navy & Air Force and aerospace, chemical and pharmaceutical companies around the world.

    3. These points are not intended to be critical; much of this stuff is not easy to detect as almost by definition, SAML is middleware - if you can see it, it probably means something is broken!

    And I don't mean 'broken' only in a technical sense. One of RA21's starting points was the variable terminology and language on publisher websites ("Login via Shibboleth", "Select your Federation" etc). That was a collective implementation failure which the RA21 project is encouraging publishers to rethink.

    As part of those changes, we have been trying to help build a better ecosystem by encouraging publishers to drop their "Login via OpenAthens" links for some time now, because it's no longer required.

    Thanks for an excellent contribution to the discussion.

    Phil

    1. Thanks for these comments, Phil.

      1. I'd love to hear your views about "why we're still using IP Authentication".

      2. "mainstream adoption" ≠ "adoption in many niches".

      3. I feel your pain.

      Also, given our agreement around the importance of user trust, I wonder if you can explain why reading the OpenAthens privacy notice subjects the user to 17 different advertising trackers, none of which are disclosed in said privacy notice!

  2. Hi Eric

    1. IP *recognition* is still being used because there is a mixed economy of tools. That won't change in the short-term.

    2. I'm not sure everyone would agree with your characterisation of the UK National Health Service as "niche"(!)

    3.1 The OpenAthens privacy policy makes clear references to cookies and other analytical tools (https://openathens.org/privacy/).
    3.2 The site I'm posting this comment on uses tracking cookies and I can't find a link to a privacy policy.
    3.3 RA21 is not trying to change the way the internet works.

    1. Thanks for your comments.

      1. Good point to call it "recognition" or "filtering".

      2. Unless I've not understood Brexit, the UK National Health Service is also not outside "UK".

      3.1. ...so we're all good.
      3.2 I should stop being mean to commenters. But to be fair, my blog makes no claims, legal or not, about taking your personal information seriously. It's on Google! But there is a blog license agreement.
      3.3 Certainly RA21 is trying to change the way libraries work. Just because the internet has become one big surveillance machine doesn't mean libraries must join in.
