Friday, March 21, 2025

AI bots are destroying Open Access

There's a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet. And the technologists defending against this broad-based attack are doing everything they can to preserve their outlets while trying to remain true to the mission of providing the digital lifeblood of science and culture to the world.

Yes, many of these beloved institutions are under financial pressures in the current political environment, but politics swings back and forth. The AI armies are only growing more aggressive, more rapacious, more deceitful and ever more numerous.

I'm talking about the voracious hunger of AI companies for good data to train Large Language Models (LLMs). These are the trillion-parameter sets of statistical weights that power things like Claude, ChatGPT and hundreds of systems you've never heard of. Good training data has lots of text, lots of metadata, is reliable and unbiased. It's unsullied by Search Engine Optimization (SEO) practitioners. It doesn't constantly interrupt the narrative flow to try to get you to buy stuff. It's multilingual, subject specific, and written by experts. In other words, it's like a library.

At last week's Code4lib conference hosted by Princeton University Library, technologists from across the library world gathered to share information about library systems, how to make them better, how to manage them, and how to keep them running. The hot topic, the thing everyone wanted to talk about, was how to deal with bots from the dark side.

robot head emoji with eyes of sauron

Bots on the internet are nothing new, but a sea change has occurred over the past year. For the past 25 years, anyone running a web server knew that the bulk of traffic was one sort of bot or another. There was googlebot, which was quite polite, and everyone learned to feed it - otherwise no one would ever find the delicious treats we were trying to give away. There were lots of search engine crawlers working to develop this or that service. You'd get "script kiddies" trying thousands of prepackaged exploits. A server secured and patched by a reasonably competent technologist would have no difficulty ignoring these.

The old style bots were rarely a problem. They respected robot exclusions and "nofollow" warnings. The warning helped bots avoid volatile resources and infinite parameter spaces. Even when they ignored exclusions they seemed to be careful about it. They declared their identity in "user-agent" headers. They limited the request rate and number of simultaneous requests to any particular server. Occasionally there would be a malicious bot like a card-tester or a registration spammer. You'd often have to block these based on IP address. It was part of the landscape, not the dominant feature.
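The polite behavior described above (honor robot exclusions, declare an identity, limit the request rate) can be sketched in a few lines. This is only an illustration using Python's standard `urllib.robotparser`; the bot name, the robots.txt content, and the `PoliteScheduler` class are hypothetical, not taken from any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity and robots.txt content, for illustration only.
USER_AGENT = "ExamplePoliteBot/1.0 (+https://example.org/bot-info)"
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 5
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteScheduler:
    """Decides whether and when a URL on a single host may be fetched."""

    def __init__(self, parser: RobotFileParser, user_agent: str,
                 default_delay: float = 1.0) -> None:
        self.parser = parser
        self.user_agent = user_agent
        # Use the site's Crawl-delay if it declares one, else a default.
        delay = parser.crawl_delay(user_agent)
        self.delay = float(delay) if delay is not None else default_delay
        self.next_allowed = 0.0

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request may be sent."""
        return max(0.0, self.next_allowed - now)

    def record_request(self, now: float) -> None:
        """Note that a request was sent at time `now`."""
        self.next_allowed = now + self.delay
```

A crawler built this way skips the infinite parameter space under `/search` and sends at most one request every five seconds to this host, which is roughly what the old bots did without being asked twice.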

The current generation of bots is mindless. They use as many connections as you have room for. If you add capacity, they just ramp up their requests. They use randomly generated user-agent strings. They come from large blocks of IP addresses. They get trapped in endless hallways. I observed one bot asking for 200,000 nofollow redirect links pointing at OneDrive, Google Drive and Dropbox (which of course didn't work, but OneDrive decided to stop serving our Canadian human users). They use up server resources - one speaker at Code4lib described a bug where software they were running was using 32-bit integers for session identifiers, and it ran out!

The good guys are trying their best. They're sharing block lists and bot signatures. Many libraries are routinely blocking entire countries (nobody in China could possibly want books!) just to be able to serve a trickle of local requests. They are using commercial services such as Cloudflare to outsource their bot-blocking and captchas, without knowing for sure what these services are blocking, how they're doing it, or whether user privacy and accessibility are being flushed down the toilet. But nothing seems to offer anything but temporary relief. Not that there's anything bad about temporary relief, but we know the bots just intensify their attack on other content stores.

direct.mit.edu  Verifying you are human. This may take a few seconds. direct.mit.edu needs to verify the security of your connection before proceeding. Verification is taking longer than expected. Check your internet connection and refresh the page if the issue persists.
The view of MIT Press's Open-Access site from the Wayback Machine.

The surge of AI bots has hit Open Access sites particularly hard, as their mission conflicts with the need to block bots. Consider that Internet Archive can no longer save snapshots of one of the best open-access publishers, MIT Press, because of Cloudflare blocking (see above). Who knows how many books will be lost this way? Or consider that the bots took down OAPEN, the world's most important repository of Scholarly OA books, for a day or two. That's 34,000 books that AI "checked out" for two days. Or recent outages at Project Gutenberg, which serves 2 million dynamic pages and a half million downloads per day. That's hundreds of thousands of downloads blocked! The link checker at doab-check.ebookfoundation.org (a project I worked on for OAPEN) is now showing 1,534 books that are unreachable due to "too many requests". That's 1,534 books that AI has stolen from us! And it's getting worse.

Thousands of developer hours are being spent on defense against the dark bots and those hours are lost to us forever. We'll never see the wonderful projects and features they would have come up with in that time.

The thing that gets me REALLY mad is how unnecessary this carnage is. Project Gutenberg makes all its content available with one click on a file in its feeds directory. OAPEN makes all its books available via an API. There's no need to make a million requests to get this stuff!! Who (or what) is programming these idiot scraping bots? Have they never heard of a sitemap??? Are they summer interns using ChatGPT to write all their code? Who gave them infinite memory, CPUs and bandwidth to run these monstrosities? (Don't answer.)
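For the record, reading a sitemap is not hard. Here is a minimal sketch, assuming a standard sitemaps.org `urlset` document; the sample URLs and the `urls_changed_since` helper are made up for illustration and have nothing to do with how Project Gutenberg or OAPEN actually publish their feeds.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/books/1</loc><lastmod>2025-03-01</lastmod></url>
  <url><loc>https://example.org/books/2</loc><lastmod>2025-03-15</lastmod></url>
</urlset>
"""


def urls_changed_since(sitemap_xml: str, since: str) -> list[str]:
    """Return page URLs whose lastmod date is on or after `since`.

    `since` is a YYYY-MM-DD string. Entries without a lastmod are
    included, since there is no way to tell whether they changed.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="",
                               namespaces=SITEMAP_NS).strip()
        # Compare only the date part; lastmod may carry a time as well.
        if loc and (not lastmod or lastmod[:10] >= since):
            changed.append(loc)
    return changed
```

One request for the sitemap tells a crawler which pages exist and which ones changed since its last visit, so there is no reason to rediscover them by following every link on the site.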

We are headed for a world in which all good information is locked up behind secure registration barriers and paywalls, and it won't be to make money, it will be for survival. Captchas will only be solvable by advanced AIs and only the wealthy will be able to use internet libraries.

Or maybe we can find ways to destroy the bad bots from within. I'm thinking a billion rickrolls?

Notes:

  1. I've found that I can no longer offer more than 2 facets of faceted search. Another problematic feature is "did you mean" links. AI bots try to follow every link you offer even if there are a billion different ones.
  2. Two projects, iocaine and nepenthes are enabling the construction of "tarpits" for bots. These are automated infinite mazes that bots get stuck in, perhaps keeping the bots occupied and not bothering anyone else. I'm skeptical.
  3. Here is an implementation of the Cloudflare Turnstile service (supposedly free) that was mentioned favorably at the conference.
  4. It's not just open access, it's also Open Source.
  5. Cloudflare has announced an "AI honeypot". Should be interesting.
  6. One way for Open Access sites to encourage good bot behavior is to provide carrots to good robots. For this reason, it would be good to add Common Crawl to greenlists: https://commoncrawl.org/ccbot
  7. Ian Mulvany (BMJ) concurs
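
The greenlist idea in note 6 could look something like the sketch below. Everything in it is an assumption made for illustration: the rate limits, the `GREENLIST_TOKENS` table, and matching on the "CCBot" product token. A user-agent string is trivially forged, so a real deployment would want to pair this with some kind of source-address verification.

```python
# Hypothetical policy: requests per minute allowed for each class of client.
LIMITS = {"greenlisted": 120, "default": 20}

# Assumption: good robots are recognized by the product token that starts
# their user-agent string, e.g. "CCBot" for Common Crawl.
GREENLIST_TOKENS = {"ccbot"}


def product_token(user_agent: str) -> str:
    """Return the lowercased product name before the first '/' or space."""
    first = user_agent.strip().split(" ", 1)[0]
    return first.split("/", 1)[0].lower()


def requests_per_minute(user_agent: str) -> int:
    """Pick a rate limit, giving greenlisted robots the bigger carrot."""
    if product_token(user_agent) in GREENLIST_TOKENS:
        return LIMITS["greenlisted"]
    return LIMITS["default"]
```

With this table a request announcing itself as `CCBot/2.0` gets the higher limit, while a browser-style `Mozilla/5.0 ...` string falls through to the default.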


Tuesday, February 11, 2025

Strava Verse

strava route that looks like an elephant
The internet gives us new ways to express ourselves. One of the more strenuously esoteric forms of artistic expression is Strava art, in which people do runs that, when mapped, draw pictures. None of my Strava art was particularly good, but my running club friends in Stockholm regularly run "elefanten". I spent a year attempting "Found Strava Art", where you just run a new route and give the run a name based on what it looks like. I ran a lot of flowers and space ships, but meh. Last year I named each run with a line of a song that came up on my iPod. Too obscure.

This year I decided to serialize poems with my Strava runs. I didn't have a plan, but I started with Jabberwocky. It seemed appropriate to comment using nonsense words, because, Jabberwocky. I ended up with this:

’Twas brillig, and the slithy toves did gyre and gimble in the wabe
I love running with my slithy toves!
All mimsy were the borogoves, and the mome raths outgrabe.
My right knee was a grobble mimsy today, but mome what a rath!  
Beware the Jabberwock, my son!
Also, the Jabberrun can be hard on the knees.
The jaws that bite, the claws that catch!
ERC hosted run had quiche to bite and George to catch.

He took his vorpal sword in hand
New York Sirens game. Women with vorpal sticks. Slain by the Charge 3-2.
Beware the Jubjub bird, and shun the frumious Bandersnatch!
Definitely well salted and frumious out there today.
Long time the manxome foe he sought
But quick the manxless chill he caught
So rested he by the Tumtum tree
Covered with snow in filagree
And stood a while in thought.
Though clabbercing in a profunctional dot!

And, as in uffish thought he stood
Trolloping thru the Brookdale wood.
The Jabberwock, with eyes of flame
Cheld and hord, a glistering name…
Came whiffling through the tulgey wood
And caught the two burblygums because he could.
And burbled as it came!
So late the Jabberrun slept
For Eight Muyibles passed as though aflame
O'er Curbles and Nonces the pluffy sheep leapt.

One, two! One, two! And through and through
Three four! Three four! Sankofa’s coffee’s fit to pour.
The vorpal blade went snicker-snack!
The Icebeest of Hoth kept blobbering back.
He went galumphing back.
He left it dead, and with its head
... the Garmind sprang to life

And hast thou slain the Jabberwock?
The ice, the snow, it's hard as rock.
Come to my arms, my beamish boy!
Think of my knees! Oy oy oy oy.
O frabjous day! Callooh! Callay!”
O jousbarf night! The fluss! The fright!
He chortled in his joy.
(And padoodled the rest of the way!)

‘Twas brillig and the slithy toves
Did not, had not, could not loave.
Did gyre and gimble in the wabe
“Dunno.” said the wormly autoclave
All mimsy were the borogoves,
Again and again, beloo and aboave
And the mome raths outgrabe.
The end. Ooh ooh Babe!

Terrible, right? But it has its moments.

I've started a new one. I fear it will get more topical.

Tuesday, November 12, 2024

Thank you, New York City

A smiling Eric, next to a sign for "TCS New York City Marathon", the Verrazzano-Narrows Bridge against a pink morning sky in the background.
fresh off the bus
It was 11:15AM in the pink D corral of the fifth wave, and surrounding me were runners of all shapes and sizes, from around the world, all of us waiting for our race to start in 15 minutes. We had waited through the morning (five hours for me) as our faster friends drifted away excitedly and cannons sounded the starts of earlier waves. There was a determined silence as each of us thought ahead to our 2024 New York City Marathon.

A few meters to my right I saw a woman wearing a large pink button proclaiming her status as a "Birthday Girl". Her shirt had the name "HEATHER" across the front. I shouted "HAPPY BIRTHDAY HEATHER!", and she turned to look at me, a bit startled. I walked over and we chatted a bit. She was from the UK, and was running New York to celebrate turning 50. I told her she was going to have fun, and that the crowd would be calling to her the whole way. "Really?" she said. "Hey, this is New York", I reassured her. "You don't have to know someone 10 years before you can talk to them on a first name basis!"

Then, over to the side of the corral, I saw another woman, wearing a BIRTHDAY GIRL shirt. "Heather, you must go over and wish her happy birthday!" Heather hesitated, but I said "Aw come on!" and led her through the crowd to the other birthday girl. The two marathon twins hugged, and everything felt right with the world. I looked around and the crowd seemed a bit anxious waiting. I shouted "Hey everyone! We have two birthday girls running with us! Let's sing Happy Birthday!"

And so I led a happy chorus of more than a thousand runners in a joyful rendition of "Happy Birthday". Miraculous. My whole day was like that. From start to end, the crowd was shouting my name. They got riled up when I acknowledged them, sometimes chanting "ERIC, ERIC, ERIC" as I gave them high fives. 

I had decided to run the 2024 New York City Marathon about ten months earlier. A friend heard me talk about running and suggested that I get a fundraising entry through the charity he was involved with. At that point I had just run my 11th Half Marathon but never a marathon. A marathon seemed an unnecessary stretch for me and my creaky legs. But I decided in an instant. Two days later I told a running friend, Janell, and a few others about my decision. I knew I couldn't back out after that.

Eric is running, wearing a "Team Amref" singlet, an orange "81 flies on" cap, and blue Fleet Feet "Running changes everything" compression sleeves.
still looking good at mile 9
The first 10 miles of the race flew by as I ran at a pace that was faster than I expected (I was doing a 3:1 run:walk). Axel and Karen were there rooting for me at mile 9 with my Fleet Feet friends and then again around mile 12. The crowd on 1st Avenue at mile 16 made me forget that I had never raced that far. More running friends were waiting at mile 18 where it really helped. At mile 21 my 3:1 cycle became 2:1, and at mile 23 it was 1:1. On Fifth Avenue it seemed like everyone I knew was there cheering me on. The bearded prophet with "The End is Near" on a sign could have been a hallucination. Coming out of the Bronx I had switched to my running playlist, and in the Park I started "singing" the lyrics out loud: "It's the End of the World as We Know It!". I wasn't feeling that fine and I switched to 100% brisk walk.

Re-entering the park for the last half mile, I was determined to finish it running. BIG MISTAKE! I cramped up immediately and could barely stagger on. But after a few minutes, my legs consented to a sloooow walk and finally relented on a brisk finish. Then a second miracle occurred. I knew I had friends who were volunteering at the finish line, but to see and hug them all was a blessing I had not expected. And to get the medal from my friend Janell! 

Back of the medal with braille text "TCS New York City Marathon"

Thank you to everyone who donated to my fundraiser for Amref Health Africa. Thank you to Karen and Axel for getting me home with my cramping legs. Thank you to the coaches, runners and PTs who helped me get through the training. Thank you to all the spectators and to the volunteers who got me from the start to the finish, and thank you to the zombies that trudged with me for the long long long walk out of the park.


Strava: All my friends are in New York

Thursday, October 10, 2024

I Fondled Salvador Dalí's Earrings

 Content Warning: AI

My Uncle Henry was a Professor of Chemistry at NYU. He lived, for the most part, in his sister-in-law Barbara's 7-story townhouse on East 67th Street in Manhattan. He acted as the caretaker of this mansion when Barbara went off living her socialite life in Paris or wherever. My family would stay in the townhouse whenever we came to New York to visit my favorite uncle.

This is how my parents ended up being at a fancy party attended by Salvador Dalí. It seems that Barbara had commissioned a portrait of herself, and the occasion of the party was the painting's unveiling. I was there too; I was a few months old. The great painter was amused to see a baby at this party and the baby was extremely amused at this strange looking adult. More accurately, I was captivated by his shiny earrings and reached out to play with them as though they were a mobile hanging in my crib. Or so I have been told. So many times.

A surrealist figure resembling Salvador Dalí, dressed in an eccentric outfit with a curled mustache and large, ornate earrings. A baby is playfully tugging on the ornate earrings
Dalí and Eric as hallucinated by DALL-E

My dad was presented to Dalí as a brilliant young engineer, which he was. Dad was born in Gary, Indiana, but moved to Sweden with his family when he was 7 years old. (That's a whole 'nother story!) After graduation from the Royal Institute of Technology in Stockholm, he decided to take a job with Goodyear Aerospace in Akron, Ohio, because that way he didn't have to serve in the Swedish Army and give up his American citizenship. He worked on semiconductor devices before anyone had ever heard of semiconductors.

Maybe brilliant engineers were exotic creatures in that fancy New York City party circuit, because Salvador Dalí buttonholed my dad. He wanted my dad to invent something for him. The conversation went something like this (imagine me sitting in Dalí's lap, not paying attention to the conversation at all):

Dalí: "Tell me, young man, do you invent things?"

Dad: "As a matter of fact, I'm working on what they call a buffered amp..."

Dalí: "Never mind that, I have an idea I want you to work on..."

Dad: "Yes?"

Dalí: "I want you to invent a paint gun..."

Dad: "That doesn't sound too hard..."

Dalí: "... that will paint what I see in my mind."

Dad: "??"

Dalí: "I paint, but the paintings are never what I want."

Dad: "That's not how..."

Dalí: "I want to press a button and have the paint go in the right place."

Dad: "Well maybe someday..."

Dalí: "You start working on it, let me know how it goes"

Eric: "Waaaaaaaaa!"

Apparently, the paint gun was a bit of an obsession with Dalí. He created a technique called "bulletism" that involved using an antique gun (an "arquebus") to shoot vials of paint at a canvas. A couple of months after the fancy party, he appeared on the Ed Sullivan show firing a paint gun at a canvas! 

Sixty-four years later, we sort of know how to build Dalí's mind reading paint-gun. We have technologies that let us see the brain think (functional brain imaging combined with deep learning), and technologies that can make pictures from human thoughts (when expressed as LLM prompts). It's now easy to imagine a device that uses your brain to control an AI image generator (see the image above!). Such a device could take advantage of the brain's plasticity to give Dalís of the future the power to make images from activity that exists only in their brains.

People are arguing about whether AI can make art. There's even a copyright case in which the US copyright office is saying, effectively, that you can't copyright what you tell an AI to create.

It seems clear to me, at least, that AI, wielded as a tool, can make art, in the same way that a Stradivarius, wielded by a musician, can make art, or that a camera, wielded by a photographer, can make art, or that a computer program, wielded by a poet, can make art.

Salvador Dalí was just ahead of his time. 

Notes:

  1. While OpenAI's "DALL-E" is supposed to be a combination of "Dalí" and "WALL-E", I've not been able to find any mention of Dalí's interest in brain-computer interfaces!
  2. I couldn't find an image of the painting "Portrait of Bobo Rockefeller" on the web; a study for the painting is in the Dalí Museum in Spain. Dalí had a policy of not allowing his subjects to see their portrait before it was unveiled, and my understanding is that Barbara was never really fond of the painting. It had a prominent place in her living room though.
  3. Researchers have studied the use of brain-scanning techniques to develop brain-computer interfaces for uses such as the development of speech prostheses that convert brain activity into intelligible speech. 
  4. Openwater is combining infrared and acoustic imaging to see brain activity for neurological diagnosis. But they can see the potential for mind reading with the help of deep learning pattern recognition. Founder Mary Lou Jepsen says “I think the mind-reading scenarios are farther out, but the reason I'm talking about them early is because they do have profound ethical and legal implications.”
Comments. I encourage comment on the Fediverse or on Bluesky. I've turned off commenting here.

Reminder: I'm earning my way into the NYC Marathon by raising money for Amref Health Africa. 

Wednesday, August 7, 2024

Running away from home

(I'm blogging my journey to the 2024 New York Marathon.)

For a long time, it's been a goal of mine to live and work someplace where the language is something other than English. I've studied French in school and I've studied a bit of Mandarin and Japanese. And Swedish. But I'd never had the opportunity to live in another language, to get comfortable enough to have casual conversations and say the things I want to say.

Two years ago (2022) my Aunt Siv planned an 80th birthday celebration for herself, inviting the whole family to join her for a party in Lappland (northern Sweden). Coming out of two long pandemic years, we were eager to go and travel. There was still a lot of uncertainty about Covid, and with the invasion of Ukraine adding to the feeling that the trip might or might not happen, we booked refundable tickets for a vacation in Sweden. 

Swedish was my first language! My parents both grew up in Sweden, but met and married in Ohio. My mom's teenaged sister Siv came over to help my mom with the baby (me) so there was a lot of Swedish in the house. When I started going to nursery school I quickly learned English, and began refusing to speak Swedish. By the time I got to kindergarten, I had completely forgotten all of my Swedish language. But traces remained. After college I decided I should learn Swedish and I took a class in Stockholm. Learning Swedish was completely different from learning French in school, because I could hear in my head if it was right. After one day of class, I could speak 2 sentences of perfect Swedish. I confidently went into a shop, used my 2 perfect sentences, and got into deep trouble because I had no clue what the answers meant. I had a good accent without much trying. This has been very helpful, because when Swedes hear a foreigner try to speak Swedish, they immediately switch to English, making it rather difficult for the foreigner to learn. Not me. Swedish people are amazed that I seem to be able to speak good Swedish.

I wanted to improve my Swedish, so I wanted a little longer in Sweden than the rest of the family, and our planning took its final shape when my wife said "Eric, you should just stay! For years you've been saying you want to live somewhere in another language, and now the internet lets you work from wherever you want!" So all of a sudden I was going to spend four weeks in Stockholm on my own without much of a plan. I was scared. How would I meet people? Sure, I could sit in my AirBnB and work as a digital nomad, but what would be the point?

Running was one of the answers. There was a half-marathon to run, RUNmaröloppet, that would take me out to an island in Stockholm's archipelago. I had identified a running club, Mikkeller Running Club Stockholm, that seemed sociable, as they meet at a bar on Tuesdays and have beers afterward. Both of these turned out to be awesome. And so I started running away from home.

Running with a group is universal and local at the same time. No matter where you run you can have the same conversations with whoever's running next to you. "Are you training for a race?" "My legs are so stiff." "I'm recovering from an IT-band strain." "My name is Eric, have we run together before?" But every route you run is different in its own beautiful way, and the group helps newcomers (and often the regulars!) to avoid getting lost. By the end of the run, the group has shared an indelible experience and there aren't strangers anymore.

RUNmaröloppet was a blast. You have to take a boat to the island. The course is quite technical in places and is also the most beautiful race I've ever run. I did it again this year, and finished 5th in my age group, despite a lingering knee injury that forced me to use walk-run again. Full disclosure: I also finished DFL (Dead F-in Last) out of 282 runners, and was never so happy with a finish.

Mikkeller Running Club Stockholm meets every Tuesday on the lively urban island of Södermalm. Good people, good beer, 5K, 7K and longer routes. The 5K is at a "cozy" pace and welcomes runners of all paces. (Linguistic note: back home we call it "sexy" pace. Maybe this has deep sociological meaning. Or maybe it's the conversion from km to mi.) 


In Stockholm I discovered this thing called ParkRun. These people have taken "running away from home" to extremes. ParkRun started somewhere in England and has spread around the world like a pandemic. They have special t-shirts to commemorate milestones such as a runner's 100th ParkRun. I've now run the ParkRun in Stockholm's Haga Park 6 times. It's a timed 5K run. At every run there are people from all over the world - last week I met a couple from Sheffield who had hopped off their cruise ship and taken a taxi to the ParkRun so they could add Sweden to their list of ParkRun countries. Some of them even try to run ParkRun places starting with every letter of the alphabet! I love how crazy runners can be.


My Stockholm 2022 sojourn was topped off by a 10K race around Södermalm called "Midnattsloppet".  Midnattsloppet is sort of a night-time EuroPop Bay-to-Breakers. 22,000 runners in the 10K, another 17K in the 5K. There was a musical act every kilometer to fire up the runners but only two water stations on that pretty warm night. At the top of the first big hill, there was a choir of ~20 blonde women singing “Waterloo” which I thought a poor choice given the pre-ABBA history of Waterloo. The faster waves of runners got “We are the Champions”. At the start, runners were prompted to sing a song which apparently is the anthem of the Hammarby Football Club, written by a guy who must have been the guitarist for a Swedish Spinal Tap. Apparently he caused a scandal by wearing a "69" T-shirt on Swedish television and sadly died at a young age. On Midnattsloppet night you can walk into any bar in Stockholm in a shirt dripping with sweat and the bouncer will say "Good Jobb!". (I verified this.)

I now have a pair of ruby red New Balance 1080 version 12s. (NOT v13!) My running gait is such that there's a flat wear spot where my feet click together. There's no place like home. There's no place like home.




