Suppose you were a publisher and you wanted to get the goods on a pirate who was downloading your subscription content and giving it away for free. One thing you could try is to trick the pirate into downloading fake content loaded with spy software and encoded information about how the downloading was being done. Then you could verify when the fake content was turning up on the pirate's website.
This is not an original idea. Back in the glory days of Napster, record companies would try to fill the site with bad files which somehow became infested with malware. Peer-to-peer networks evolved trust mechanisms to foil bad-file strategies.
I had hoped that the emergence of Sci-Hub as an efficient, though unlawful, distributor of scientific articles would not provoke scientific publishers to do things that could tarnish the reputation of their journals. I had hoped that publishers would get their acts together and implement secure websites so that they could be sure that articles were getting to their real subscribers. Silly me.
In series of tweets, Rik Smith-Unna noted with dismay that the Wiley Online Library was using "fake DOIs" as "trap URLs", URLs in links invisible to human users. A poorly written web spider or crawler would try to follow the link, triggering a revocation of the user's access privileges. (For not obeying the website's terms of service, I suppose.)
Don't know why I'm being coy - it's Wiley. Thousands of fake DOIs buried around their site. RE https://t.co/YORhwzOZPE— ⓪ Rik Smith-Unna (@blahah404) May 27, 2016
Whops, not sure if @goetheuni or just me is banned from Wiley because I accessed a fake DOI last week. https://t.co/G6MTLQEyuK— Bastian Greshake (@gedankenstuecke) June 7, 2016
Gabriel J. Gardner of Cal State Long Beach has reported his library's receipt of a scary email from Wiley stating:.@Protohedgehog @gedankenstuecke @blahah404 Yes. Isn't that illegal? Any comments @WileyOpenAccess ? pic.twitter.com/S4P22S5B0g— Sujai Kumar (@sujaik) June 7, 2016
Wiley has been investigating activity that uses compromised user credentials from institutions to access proxy servers like EZProxy (or, in some cases, other types of proxy) to then access IP-authenticated content from the Wiley Online Library (and other material). We have identified a compromised proxy at your institution as evidenced by the log file below.
We will need to restrict your institution’s proxy access to Wiley Online Library if we do not receive confirmation that this has been remedied within the next 24 hours.
I've been seeing these trap urls in scholarly journals for almost 20 years now. Two years ago they reappeared in ACS journals. They're rarely well thought out, and from talking with publishers who have tried them, they don't work as intended. The Wiley trap URLs exhibit several mistakes in implementation.
- Spider trap URLs are are useful for detecting bots that ignore robot exclusions. But Wiley's robots.txt document doesn't exclude the trap urls, so "well-behaved" spiders, such as googlebot are also caught. As a result, the fake Wiley page is indexed in Google, and because of the way Google aggregates the weight of links, it's actually a rather highly ranked page
- The download urls for the fake article don't download anything, but instead return a 500 error code whenever an invalid pseudo-DOI is presented to the site. This is a site misconfiguration that can cause problems with link checking or link verification software.
- Because the fake URLs look like Wiley DOI's, they could cause confusion if circulated. Crossref discourages this.
- The trap URLs as implemented by Wiley can be used for malicious attacks. With a list of trap URLs, it's trivial to craft an email or a web page that causes the user to request the full list of trap URLs. When the trap URLs trigger service suspensions this gives you the ability to trigger a suspension by sending the target an email.
- Apparently, Wiley used a special cookie to block the downloading. Have they not heard of sessions?
- The blocks affected both subscription and open-access content. Umm, do I need to explain the concept of "Open Access"?
- It's just not a smart idea (Even on April Fools!) for a reputable publisher to create fake article pages for "Constructive Metaphysics in Theories of Continental Drift. (warning: until Wiley realizes their ineptness, this link may trigger unexpected behavior. Use Tor.) It's an insult to both geophysicists and philosophers. And how does the University of Bradford feel about hosting a fictitious Department of Geophysics???
Instead of trap urls, online businesses that need to detect automated activity have developed elaborate and effective mechanisms to do so. Automated downloads are a billion dollar problem for the advertising industry in particular. So advertisers, advertising networks, and market research companies use coded, downloaded javascripts and flash scripts to track and monitor both users and bots. I've written about how these practices are inappropriate in library contexts. In comparison, the trap URLs being deployed by Wiley are sophomoric and a technical embarrassment.
If you visit the Wiley fake article page now, you won't get an article. You get a full dose of monitoring software. Wiley uses a service called Qualtrics Site Intercept to send you "Creatives" if you meet targeting criteria. But you'll also get that if you access Wiley's Online Library's real articles, along with sophisticated trackers from Krux Digital, Grapeshot, Jivox, Omniture, Tradedesk, Videology and Neustar.
Here's the letter I'd like libraries to start sending publishers:
[Library] has been investigating activity that causes spyware from advertising networks to compromise the privacy of IP-authenticated users of the [Publisher] Online Library, a service for we have been billed [$XXX,XXX]. We have identified numerous third party tracking beacons and monitoring scripts infesting your service as evidenced by the log file below.
We will need to restrict [Publisher]'s access to our payment processes if we do not receive confirmation that this has been remedied within the next 24 hours.Notes:
- Here's another example of Wiley cutting off access because of fake URL clicking. The implication that Wiley has stopped using trap URLs seems to be false.
- Some people have suggested that the "fake DOIs" are damaging the DOI system. Don't worry, they're not real DOI's and have not been registered. The DOI system is robust against this sort of thing; it's still disrespectful.
Update June 23:
- Tom Griffin, a spokesman for Wiley, has posted a denial to LIBLICENCE which has a tenuous grip on reality.
- Smith-Unna has posted a point-by-point response to Griffin's denial in the form of a gist .