Monday, October 19, 2020

We should regulate virality

It turns out that virality on internet platforms is a social hazard! 

Living in the age of the Covid pandemic, we see around us what happens when we let things grow exponentially. The reason that the novel coronavirus has changed our lives is not that it's often lethal - it's that it found a way to jump from one infected person to several others on average, leading to exponential growth. We are infected with virus without regard to the lethality of the virus, but only its reproduction rate.

For years, websites have been built to optimize virality of content. What we see on Facebook or Twitter is not shown to us for its relevance to our lives, its education value, or even its entertainment value. It shown to us because it maximizes our "engagement" - our tendency to interact and spread it. The more we interact with a website, the more money it makes, and so a generation of minds has been employed in the pursuit of more engagement. Sometimes it's cat videos that delight us, but more often these days it's content that enrages and divides us.

Our dissatisfaction with what the internet has become has led calls to regulate the giants of the internet. A lot of the political discourse has focused on "section 20"  a part of US law that gives interactive platforms such as Facebook a set of rules that result in legal immunity for content posted by users. As might be expected, many of the proposals for reform have sounded attractive, but the details are typically unworkable in the real world, and often would have effects opposite of what is intended. 

I'd like to argue that the only workable approaches to regulating internet platforms should target their virality. Our society has no problem with regulations that force restaurant, food preparation facilities, and even barbershops to prevent the spread of disease, and no one ever complains that the regulations affect "good" bacteria too. These regulations are a component of our society's immune system, and they are necessary for its healthy functioning.

never going to give you covid
Add caption

You might think that platform virality is too technical to be amenable to regulation, but it's not. That's because of the statistical characteristics of exponential growth. My study of free ebook usage has made me aware of the pervasiveness of exponential statistics on the internet. Sometime labeled the 80-20 rule, the Pareto principle, or log-normal statistics, it's the natural result of processes that grow at a rate proportional to their size. As a result, it's possible to regulate virality of platforms because only a very small amount of content is viral enough dominate the platform. Regulate that tiny amount of super-viral content, and you create incentive to moderate the virality of platforms. The beauty of doing this is that a huge majority of content is untouched by regulation.

How might this work? Imagine a law that removed a platform's immunity for content that it shows to a million people (or maybe 10 million - I've not sure what the cutoff should be). This makes sense, too; if a platform promotes illegal content in such a way that a million people see it, the platform shouldn't get immunity just because "algorithms"! It also makes it practical for platforms to curate the content for harmlessness- it won't kill off the cat videos! The Facebooks and Twitters of the world will complain, but they'll be able to add antibodies and T-cells to their platforms, and the platforms will be healthier for it. Smaller sites will be free to innovate, without too much worry, but to get funding they'll need to have plans for virality limits.

So we really do have a choice; healthy platforms with diverse content, or cesspools of viral content. Doesn't seem like such a hard decision!

Sunday, September 6, 2020

Notes on work-from-home teams

I've been working from home full-time for over eleven years - at least partly work-from-home for 20 years. I've managed work-from-home teams, and worked with quite a few others on joint projects. So when some colleagues were sharing their work-from-home experiences, I piped up with some thoughts. When I was asked recently to repeat them, I realized it might be useful to make a list for the blog. Old-style.


  1. In-person time is super-valuable. It builds a foundation for the digital interactions we're all stuck with for a while.
  2. Engineers in particular are prone to under-communicate, so a manager has to pro-actively push people to communicate more than they would on their own ...
  3. ... and create a safe environment that promotes asking for help.
  4.  Most remote workers need an extra helping of encouragement and positive reinforcement...
  5. ... doubly so for people prone to self-doubt or imposter syndrome.
  6. Worker depression is the hardest thing for a work-from-home team to manage.
  7. Trust is the most important attribute for work-from-home teams, and it has to be mutual in any type of relationship.

I think most of these are self-explanatory. In the near-term current environment, the first point is not so helpful for teams that haven't banked some in-person time; non-work activities, remote meal-sharing and happy hours are imperfect substitutes for the real thing.

The point about worker depression is worth emphasizing. It's a real hazard, often without easy mitigations. For me, daily exercise and intentional social interaction are the most effective medicine, but everyone is different. A work-from-home team needs time, space, and often support to figure out what works.

Tuesday, December 3, 2019

Your Identity, Your Library

Today, your identity on the Internet is essentially owned by the big email providers and social networks. Google, Yahoo, Facebook, Twitter - chances are you use one of these services to conveniently log into other services as YOU. You don't need to remember a new password for each service, and the service providers don't have to verify your "identity". What you gain in convenience, you lose in privacy, and that's turned out really well, hasn't it?

The "flow" you use to take advantage of this single sign-in is a "dance" that takes you from website to website and back to the site you're logging into. A similar dance occurs to secure access to resources licensed on you behalf by libraries, institutions, corporations, etc.. I wrote a bunch of articles about "RA21" (now rebranded as the vaguely NSFW "SeamlessAccess"), an effort spearheaded by STM publishers to improve the user experience of that dance. (It can be complicated and confusing because there are lots of potential dance partners!)

Henri Matisse, La danse (first version) 1909

These dance partners style themselves as "identity providers". That label makes me uncomfortable. Identity can't be something that can be stripped from you by on the whim of a megacorporation. Instead, internet identity should be woven from a web of relationships. These can be formed digitally or face-to-face, global or local, business or personal.

You'd have thunk that the whole identity-on-the-internet thing would have improved in the 13 years since that login dance was first rolled out. And you'd be almost right, because a new architecture for internet identity is now on the horizon. Made possible by many of the same technologies that are securing the internet and inflating the blockchain bubble, massively distributed and even "self-sovereign identity" are becoming real-ish.

These technologies will inevitably be applied to the access authorization problem. Access via distributed identity replaces the website-to-website dance with the presentation of some sort of signed credential. A service provider verifies the signature against the signer's public key. It's like showing a passport that can't be forged. A tricky bit is that the credential also needs to be checked against a list of revoked credentials. This would have been cumbersome even ten years ago, but distributed databases are now a mature technology, versions of which underpin the internet itself.

Interlinked with the concept of distributed identity is the notion that users of the web should be able to securely control their data, and that decisions about what a web site gets to know about you should not be delegated to advertising networks.

Unfortunately, we're not quite ready for distributed identity, in the sense that implementation for today's web would require users to install plugin software, which has its own set of usability, privacy and security issues. The ideal situation would be for some sort of standardized distributed identity and secure data management capability to be installed in browser software - Chrome, Firefox, Safari, etc.

There's a lot of work going on to make this happen.
  • ID2020 has put out an identity manifesto that starts with the declaration that "The ability to prove one’s identity is a fundamental and universal human right."
  • Tim Berners-Lee is leading the Solid Project, which let's you "move freely between services, reuse data across apps, connect with anyone, and select what you share precisely".
  • The W3C Verifiable Claims Working Group has published Technical Recommendations for "Verifiable Credential Use Cases and a "Verifiable Credential Data Model". They observe that "from educational records to payment account access, the next generation of web applications will authorize entities to perform actions based on rich sets of credentials issued by trusted parties."
  • The Sovrin Network is a "new standard for digital identity – designed to bring the trust, personal control, and ease-of-use of analog IDs – like driver’s licenses and ID cards – to the Internet."
  • Kaliya Young, Doc Searls and Phil Windley have been convening the Internet Identity Workshop twice a year since 2005 to create a community centered around internet identity. A glance at prior year proceedings gives a flavor of how much is happening in the field
The common thread here is that users, not unaccountable third parties, should be able to manage their identity on the internet, while at the same time creating a global chain of trust.

It seems to me that there's a last-mile problem with all these schemes. If identity is really a universal human right, how do we create a chain of trust that can include every human? That problem becomes a lot easier to solve if there were some sort of organization with a physical presence in communities all over, trusted by the community and by other organizations. A sort of institution experienced in managing information access and privacy, and devoted to the needs of all sorts of users.

In other words, what if "libraries" existed?

The federated authentications systems used by libraries today - Shibboleth, Athens, and related systems use a dance similar to what you do with Google or Facebook. It's a big step that moves your internet identity away from "surveillance capitalists" towards community institutions. But you still don't have control over what data your institution give away, as you will in the next-generation internet identity systems I describe here. (RA21 is no different from Shib or Athens in this respect.)

What might libraries do to prepare for the age of distributed identity? The first step is not about technology, it's about mission. I believe libraries should start to think of themselves as internet relationship providers for their communities. When I get access to a resource though my library, I won't be "logging in",  I'll be asserting a relationship with a library community, and the library will be standing behind me. Joining an identity federation is a good next step for libraries. But the library community needs to advocate for user identity as a basic human right and prepare their systems to support a future where no dancing is required.

Update 12/5/2019: revised last two paragraphs to be less mystifying.

Friday, July 26, 2019

Four-Leaf Clovers

It seems a friend of mine collects four-leaf clovers.

When I was a kid, I loved looking for four-leaf clovers in the lawn.  It was the same sort of relaxing concentration and observation you use to find a piece of a jigsaw puzzle. But one day, I found a clover plant in front of the garage that had multiple four-leaf clovers. Looking carefully, I found that not only were there four leaf clovers, but there were FIVE-LEAF-CLOVERS. I had hit the jackpot. And even a six-leaf clover!!!! I swear to all of God's integers that I even found a SEVEN leaf clover. I saved that seven leaf clover in my box of treasures for years, until I just had seven crumbling leafs of a clover.

I never looked for a four-leaf clover again.

Now, whenever I remember that clover plant (and that garage), I think of the toxins that must have caused the polyfoliate abomination.

Please don't let my story stop you looking for four-leaf clovers! Happy summer!

Thursday, May 30, 2019

Responding to Critical Reviews

The first scientific paper I published was submitted to Physical Review B, the world's leading scientific journal in condensed matter physics. Mailing in the manuscript felt like sending my soul into a black hole, except not even Hawking radiation would came back. A seemingly favorable review returned a miraculous two months later:
"I found this paper interesting, and I think it probably eventually it should be published - but only after Section II is revamped and section III clarified."
I made a few minor revisions and added some computations that had been left out of the first version, then confidently resubmitted the paper. But another two months later, I received the second review. The referee hadn't appreciated that I had deflected the review's description of "fundamental logic flaws and careless errors" that made my paper "extremely confusing". The reviewer went on to say "I do not think the authors' new variational calculation is correct" and suggested that my approach was completely wrong.
A ridiculously long equation

My thesis advisor suggested that I go and talk to Bob Laughlin in the Physics department about how to deal with the stubborn referee. I had been collaborating with Bob and one of his students on a related project, and he had become a surrogate advisor for my theoretical endeavors. During that time, Bob had acquired a reputation among my fellow students for asking merciless questions at oral exams; many of us were scared of him.

Bob's lesson on how to deal with a difficult referee turned out to be one of the most useful things I learned in grad school. Referees, he told me, come in 2 varieties, complete idiots, and not-complete-idiots. (Yes, Bob was merciless.) If your referee is a complete idiot, all you can do is ask for a different referee. If your referee has the least bit of sense, then you have to take the attitude that either the referee is somewhat correct, and you think YES-SIR MISTER REFEREE SIR! (Bob had been in the Army) and do whatever the referee says to do, or you take the point of view that you have explained something so poorly that the referee, who is an excellent representative of your target audience, had no hope of understanding it. Either way, there was a lot of work to do. We decided that this referee was not an idiot, and I needed to go back to the drawing board and re-do my calculation, figuring out how to be clearer and more correct in my exposition.

A third review came back with the lovely phrase "The significance of the calculation of section II, which is neither fish nor fowl, remains unclear." Using Bob's not-idiot rule, I recognized that my explanation was still unclear and I worked even harder to improve the paper.

My third revised version was accepted and published. Bob later won the Nobel Prize. I'm here writing blog posts for you about RA21.

RA21 received 120 mostly critical reviews from a cross-section of referees, not a single one of whom is the least bit an idiot. Roughly half the issues fell into the badly-explained category, while the other half fell in the "fundamental flaws and careless errors" category. RA21 needs to go back to the chalkboard and rethink even their starting assumptions before they can move forward with this much-needed effort.