Monday, December 31, 2018

On the Surveillance Techno-state

I used to run my own mail server. But then came the spammers. And dictionary attacks. All sorts of other nasty things. I finally gave up and turned to Gmail to maintain my online identities. Recently, one of my web servers has been attacked by a bot from a Russian IP address, which will eventually force me to deploy sophisticated bot detection. I'll probably have to turn to Google's reCAPTCHA service, which watches users to check that they're not robots.

Isn't this how governments and nations formed? You don't need a police force if there aren't any criminals. You don't need an army until there's a threat from somewhere else. But because of threats near and far, we turn to civil governments for protection. The same happens on the web. Web services may thrive and grow because of economies of scale, but just as often it's because only the powerful can stand up to storms. Facebook and Google become more powerful, even as civil government power seems to wane.

When a company or institution is successful by virtue of its power, it needs governance, lest that power go astray. History is filled with examples of power gone sour, so it's fun to draw parallels. Wikipedia, for example, seems to be governed like the Roman Catholic Church, with a hierarchical priesthood, canon law, and sacred texts. Twitter seems to be a failed state with a weak government populated by rival factions demonstrating against the other factions. Apple is some sort of Buddhist monastery.

This year it became apparent to me that Facebook is becoming the internet version of a totalitarian state. It's become so ... needy. Especially the app. It's constantly inventing new ways to hoard my attention. It won't let me follow links to the internet. It wants to track me at all times. It asks me to send messages to my friends. It wants to remind me what I did 5 years ago and to celebrate how long I've been "friends" with friends. My social life is dominated by Facebook to the extent that I can't delete my account.

That's no different from the years before, I suppose, but what we saw this year is that Facebook's governance is unthinking. They've built a machine that optimizes everything for engagement and it's been so successful that they don't know how to re-optimize it for humanity. They can't figure out how to avoid being a tool of oppression and propaganda. Their response to criticism is to fill everyone's feed with messages about how they're making things better. It's terrifying, but it could be so much worse.

I get the impression that Amazon is governed by an optimization for efficiency.

How is Google governed? There has never existed a more totalitarian entity, in terms of how much it knows about every aspect of our lives. Does it have a governing philosophy? What does it optimize for?

In a lot of countries, it seems that the civil governments are becoming a threat to our online lives. Will we turn to Wikipedia, Apple, or Google for protection? Or will we turn to civil governments to protect us from Twitter, Amazon and Facebook? Will democracy ever govern the Internet?

Happy 2019!

Thursday, December 27, 2018

Towards Impact-based OA Funding

Earlier this month, I was invited to a meeting sponsored by the Mellon Foundation about aggregating usage data for open-access (OA) ebooks, with a focus on scholarly monographs. The "problem" is that open licenses permit these ebooks to be liberated from hosting platforms and obtained in a variety of ways. A scholar might find the ebook via a search engine, on social media or on the publisher's web site; or perhaps in an index like the Directory of Open Access Books (DOAB), or in an aggregator service like JSTOR. The ebook file might be hosted by the publisher, by OAPEN, or on Internet Archive, Dropbox, or GitHub; libraries might host files on institutional repositories; and scholars might distribute them by email or via ResearchGate or discipline-oriented sites such as Humanities Commons.

I haven't come to the "problem" yet. Open access publishers need ways to measure their impact. Since the whole point of removing toll-access barriers is to increase access to information, open access publishers look to their usage logs for validation of their efforts and mission. Unit sales and profits do not align very well with the goals of open-access publishing, but in the absence of sales revenue, download statistics and other measures of impact can be used to advocate for funding from institutions, from donors, and from libraries. Without evidence of impact, financial support for open access would be based more on faith than on data. (Not that there's anything inherently wrong with that.)

What is to be done? The "monograph usage" meeting was structured around a "provocation": that somehow a non-profit "Data Trust" would be formed to collect data from all the providers of open-access monographs, then channel it back to publishers and other stakeholders in privacy-preserving, value-affirming reports. There was broad support for this concept among the participants, but significant disagreements about the details of how a "Data Trust" might work, be governed, and be sustained.

Why would anyone trust a "Data Trust"? Who, exactly, would be paying to sustain a "Data Trust"? What is the product that the "Data Trust" will be providing to the folks paying to sustain it? Would a standardized usage data protocol stifle innovation in ebook distribution? We had so many questions, and there were so few answers.

I had trouble sleeping after the first day of the meeting. At 4 AM, my long-dormant physics brain, forged in countless all-nighters of problem sets in college, took over. It proposed a gedanken experiment:
What if there was open-access monograph usage data that everyone really trusted? How might it be used?
The answer is given away in the title of this post, but let's step back for a moment to provide some context.

For a long time, scholarly publishing was mostly funded by libraries that built great literature collections on behalf of their users - mostly scholars. This system incentivized the production of expensive must-have journals that expanded and multiplied so as to eat up all available funding from libraries. Monographs were economically squeezed in this process. Monographs, and the academic presses that published them, survived by becoming expensive, drastically reducing access for scholars.

With the advent of electronic publishing, it became feasible to flip the scholarly publishing model. Instead of charging libraries for access, access could be free for everyone, while authors paid a flat publication fee per article or monograph. In the journal world, the emergence of this system has erased access barriers. The publication fee system hasn't worked so well for monographs, however. The publication charge (much larger than an article charge) is often out of reach for many scholars, shutting them out of the open-access publishing process.

What if there was a funding channel for monographs that allocated support based on a measurement of impact, such as might be generated from data aggregated by a trusted "Data Trust"? (I'll call it the "OA Impact Trust", because I'd like to imagine that "impact" rather than a usage proxy such as "downloads" is what we care about.)

Here's how it might work:

  1. Libraries and institutions register with the OA Impact Trust, providing it with a way to identify usage and impact relevant to the library or institution.
  2. Aggregators and publishers deposit monograph metadata and usage/impact streams with the Trust.
  3. The Trust provides COUNTER reports (suitably adapted) for relevant OA monograph usage/impact to libraries and institutions. This allows them to compare OA and non-OA ebook usage side-by-side.
  4. Libraries and institutions allocate some funding to OA monographs.
  5. The Trust passes funding to monograph publishers and participating distributors.
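
To make steps 4 and 5 concrete, here is a minimal sketch of one possible allocation rule. It assumes the Trust simply splits each library's OA budget among publishers in proportion to the usage it measured for that library. The function name, the data shapes, and the proportional rule itself are all hypothetical choices for illustration; nothing like this was specified at the meeting.

```python
from collections import defaultdict


def allocate_funding(budgets, usage):
    """Split each library's OA budget across publishers by measured usage.

    budgets: {library: amount the library allocates to OA monographs}
    usage:   {(library, publisher): usage/impact count measured by the Trust}

    Both structures and the proportional rule are assumptions made for
    this sketch, not part of any real "Data Trust" design.
    """
    payouts = defaultdict(float)
    for library, budget in budgets.items():
        # Usage relevant to this library only (step 1 makes this identifiable).
        relevant = {
            publisher: count
            for (lib, publisher), count in usage.items()
            if lib == library and count > 0
        }
        total = sum(relevant.values())
        if total == 0:
            continue  # no measured usage, so this budget stays unallocated
        for publisher, count in relevant.items():
            payouts[publisher] += budget * count / total
    return dict(payouts)


if __name__ == "__main__":
    budgets = {"Library A": 1000.0, "Library B": 500.0}
    usage = {
        ("Library A", "Press X"): 30,
        ("Library A", "Press Y"): 10,
        ("Library B", "Press Y"): 5,
    }
    for publisher, amount in sorted(allocate_funding(budgets, usage).items()):
        print(f"{publisher}: {amount:.2f}")
```

A raw proportional split like this is the easiest thing to game, which is exactly why the choice of what counts as "usage" matters so much.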

The incentives built into such a system promote distribution and access. Publishers are encouraged to publish monographs that actually get used. Authors are encouraged to write in ways that promote reading and scholarship. Publishers are also encouraged to include their backlists in the system, and not just the dead ones, but the ones that scholars continue to use. Measured impact for OA publication rises, and libraries observe that more and more, their dollars are channeled to the material that their communities need.

Of course there are all sorts of problems with this gedanken OA funding scheme. If COUNTER statistics generate revenue, they will need to be secured against the inevitable gaming of the system and fraud. The system will have to make judgements about what sort of usage is valuable, and how to weigh the value of a work that goes viral against the value of a work used intensely by a very small community. Boundaries will need to be drawn. The machinery driving such a system will not be free, but it can be governed by the community of funders.

Do you think such a system can work? Do you think such a system would be fair, or at least fairer than other systems? Would it be Good, or would it be Evil?

  1. Details have been swept under a rug the size of Afghanistan. But this rug won't fly anywhere unless there's willingness to pay for a rug.
  2. The white paper draft which was the "provocation" for the meeting is posted here.
  3. I've been thinking about this for a while.