Go To Hellman: How to check if your library is leaking catalog searches to Amazon

Thursday, December 22, 2016


I've been writing about privacy in libraries for a while now, and I get a bit down sometimes because progress is so slow. I've come to realize that part of the problem is that the issues are sometimes really complex and technical; people just don't believe that the web works the way it does, violating user privacy at every opportunity.

Content embedded in websites is a huge source of privacy leakage in library services. Cover images can be particularly problematic. I've written before that, without meaning to, many libraries send data to Amazon about the books a user is searching for; cover images are almost always the culprit. I've been reporting this issue to the library automation companies that enable this, but a year and a half later, nothing has changed. (I understand that "discovery" services such as Primo/Summon even include config checkboxes that make this easy to do; the companies say this is what their customers want.)

Two indications that a third-party cover image is a privacy problem are:

the provider sets tracking cookies on the hostname serving the content.
the provider collects personal information, for example as part of commerce.

For example, covers served by Amazon send a bonanza of actionable intelligence to Amazon.

Here's how to tell if your library is sending Amazon your library search data.

Setup

You'll need a web browser equipped with developer tools; I use Chrome. Firefox should work, too.

Log into Amazon.com. They will give you a tracking cookie that identifies you. If you buy something, they'll have your credit card number, your physical and electronic addresses, records about the stuff you buy, and a big chunk of your web browsing history on websites that offer affiliate linking. These cookies are used to optimize the advertisements you're shown around the web.

To see your Amazon cookies in Chrome, go to Preferences > Settings. Click "Show advanced settings..." (it's hiding at the bottom).

Click the "Content settings..." button.

Now click the "All cookies and site data" button.

In the "Search cookies" box, type "amazon". Chances are, you'll see quite a few.

I've got 65 cookies for "amazon.com"!

If you remove all the cookies and then go back to Amazon, you'll get 15 fresh cookies, most of them set to last for 20 years. Amazon knows who I am even if I delete all the cookies except "x-main".
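Why would cookies picked up on Amazon.com ride along on a cover image request at all? Under the cookie rules in RFC 6265, a cookie whose Domain attribute is amazon.com is attached to requests for any subdomain, images.amazon.com included. The sketch below implements only that domain-matching step (RFC 6265, section 5.1.3); it is a simplification that ignores paths, the Secure flag, and the SameSite and third-party-cookie restrictions browsers have tightened since this post was written, any of which can keep a cookie off a given request. The function name is illustrative.

```python
import ipaddress


def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 (section 5.1.3) domain matching.

    Returns True if a cookie scoped to cookie_domain would be attached to a
    request for request_host. Ignores path, Secure, SameSite and any
    browser-specific third-party cookie blocking.
    """
    host = request_host.lower()
    # A leading dot in the Domain attribute is ignored (RFC 6265, section 5.2.3).
    domain = cookie_domain.lower().lstrip(".")
    if host == domain:
        return True
    try:
        ipaddress.ip_address(host)
        return False  # IP-address hosts only ever match exactly
    except ValueError:
        pass
    # Otherwise the request host must be a subdomain of the cookie's domain.
    return host.endswith("." + domain)


for host in ("images.amazon.com", "www.amazon.com", "hollis.harvard.edu"):
    print(host, domain_matches(host, ".amazon.com"))
# images.amazon.com True
# www.amazon.com True
# hollis.harvard.edu False
```

The point of the example: the catalog page lives on hollis.harvard.edu, but each cover image is a separate request to an amazon.com host, and it is that request, not the catalog page, that carries Amazon's cookies.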

Test the Library

Now it's time to find a library search box. For demonstration purposes, I'll use Harvard's "Hollis" catalog. I would get similar results at 36 different ARL libraries, but Harvard has lots of books and returns plenty of results. In the past, I've used "What to expect" as my search string, but just to make a point, I'll use "Killing Trump", a book that Bill O'Reilly hasn't written yet.

Once you've executed your search, choose View > Developer > Developer Tools.

Click on the "Sources" tab to see the requests made to "images.amazon.com". Amazon has returned 1x1 clear pixels for three requested covers. The covers are requested by ISBN. But that's not all the information contained in the cover request.

To see the cover request, click on the "Network" tab and hit reload. You can see that the cover images were requested by a script called "primo_library_web". (Hollis is an instance of Ex Libris' Primo discovery service.)

Now click on the request you're interested in. Look at the request headers.

There are two of interest: the "Cookie" and the "Referer".
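Clicking through requests one at a time gets tedious on a busy results page. A rough alternative, assuming you can export the Network tab: Chrome and Firefox can save the captured requests as a HAR (HTTP Archive) file, a JSON document whose log.entries list holds each request's URL and headers (in Chrome, right-click the request list and pick one of the "Save all as HAR" options). The Python sketch below scans such a file for requests to amazon.com hosts and reports the Referer and whether a Cookie header went along. Two caveats: recent browser versions may strip Cookie headers from HAR exports unless you opt in to including sensitive data, and the watched-domain list, function names, and file name are illustrative.

```python
import json
from urllib.parse import urlsplit

# Domains to flag. images.amazon.com (covered by "amazon.com") is the host seen
# in this post; add any other cover or tracker hosts you find in your own logs.
WATCHED_DOMAINS = ("amazon.com",)


def get_header(headers, name):
    """Case-insensitive lookup in a HAR headers list of {"name", "value"} dicts."""
    for h in headers:
        if h["name"].lower() == name.lower():
            return h["value"]
    return None


def watched_requests(har, domains=WATCHED_DOMAINS):
    """Yield (url, referer, cookie_sent) for each request to a watched domain."""
    for entry in har["log"]["entries"]:
        request = entry["request"]
        host = (urlsplit(request["url"]).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in domains):
            referer = get_header(request["headers"], "Referer")
            cookie_sent = get_header(request["headers"], "Cookie") is not None
            yield request["url"], referer, cookie_sent


def report(path):
    """Print every watched request found in the HAR file at path."""
    with open(path, encoding="utf-8") as f:
        har = json.load(f)
    for url, referer, cookie_sent in watched_requests(har):
        print(url)
        print("  Referer:", referer)
        print("  Cookie sent:", cookie_sent)
```

Usage would be something like report("hollis.har") after saving the Network tab under that (hypothetical) name. The header lookup is case-insensitive because HTTP/2 requests typically record lowercase header names.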

The "Cookie" sent to Amazon is this:

x-main="oO@WgrX2LoaTFJeRfVIWNu1Hx?a1Mt0s";
skin=noskin;
session-token="bcgYhb7dksVolyQIRy4abz1kCvlXoYGNUM5gZe9z4pV75B53o/4Bs6cv1Plr4INdSFTkEPBV1pm74vGkGGd0HHLb9cMvu9bp3qekVLaboQtTr+gtC90lOFvJwXDM4Fpqi6bEbmv3lCqYC5FDhDKZQp1v8DlYr8ZdJJBP5lwEu2a+OSXbJhfVFnb3860I1i3DWntYyU1ip0s=";
x-wl-uid=1OgIBsslBlOoArUsYcVdZ0IESKFUYR0iZ3fLcjTXQ1PyTMaFdjy6gB9uaILvMGaN9I+mRtJmbSFwNKfMRJWX7jg==;
ubid-main=156-1472903-4100903;
session-id-time=2082787201l;
session-id=161-0692439-8899146

Note that Amazon can tell who I am from the x-main cookie alone. In the privacy biz, this is known as "PII", or personally identifiable information.
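A Cookie header is just semicolon-separated name=value pairs, so it is easy to pull apart. The Python sketch below parses a few of the cookies shown above (copied verbatim, not the full header). It also reads session-id-time as a Unix timestamp; that interpretation is an assumption, since Amazon doesn't document the value, and the trailing "l" is simply stripped. Read that way, it lands on January 1, 2036, roughly two decades out, which is in line with the 20-year cookie lifetimes mentioned earlier. Function names are illustrative.

```python
from datetime import datetime, timezone

# A subset of the Cookie header sent with the cover request above.
COOKIE_HEADER = (
    'x-main="oO@WgrX2LoaTFJeRfVIWNu1Hx?a1Mt0s"; skin=noskin; '
    "ubid-main=156-1472903-4100903; session-id-time=2082787201l; "
    "session-id=161-0692439-8899146"
)


def parse_cookie_header(header):
    """Split a request's Cookie header into a {name: value} dict."""
    cookies = {}
    for part in header.split(";"):
        # partition() splits on the first "=", so base64 values ending in "=" survive.
        name, sep, value = part.strip().partition("=")
        if sep:  # skip empty fragments, e.g. after a trailing ";"
            cookies[name] = value.strip('"')
    return cookies


def decode_epoch(value):
    """Assumption: treat a value like '2082787201l' as seconds since 1970 (UTC)."""
    return datetime.fromtimestamp(int(value.rstrip("l")), tz=timezone.utc)


cookies = parse_cookie_header(COOKIE_HEADER)
print(sorted(cookies))
# ['session-id', 'session-id-time', 'skin', 'ubid-main', 'x-main']
print(decode_epoch(cookies["session-id-time"]))
# 2036-01-01 08:00:01+00:00
```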

The "Referer" sent to Amazon is this:

http://hollis.harvard.edu/primo_library/libweb/action/search.do?fn=search&ct=search&initialSearch=true&mode=Basic&tab=everything&indx=1&dum=true&srt=rank&vid=HVD&frbg=&tb=t&vl%28freeText0%29=killing+trump&scp.scps=scope%3A%28HVD_FGDC%29%2Cscope%3A%28HVD%29%2Cscope%3A%28HVD_VIA%29%2Cprimo_central_multiple_fe&vl%28394521272UI1%29=all_items&vl%281UI0%29=contains&vl%2851615747UI0%29=any&vl%2851615747UI0%29=title&vl%2851615747UI0%29=any
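Nothing special is needed to read the search back out of that Referer; any URL parser will do. A short Python sketch using the standard library's urllib.parse (the function name is illustrative) shows that the parameter named vl(freeText0) carries the search string verbatim, and that the scope parameter reveals which collections were searched:

```python
from urllib.parse import parse_qs, urlsplit

# The Referer captured from the cover request above, split for readability.
REFERER = (
    "http://hollis.harvard.edu/primo_library/libweb/action/search.do"
    "?fn=search&ct=search&initialSearch=true&mode=Basic&tab=everything"
    "&indx=1&dum=true&srt=rank&vid=HVD&frbg=&tb=t"
    "&vl%28freeText0%29=killing+trump"
    "&scp.scps=scope%3A%28HVD_FGDC%29%2Cscope%3A%28HVD%29"
    "%2Cscope%3A%28HVD_VIA%29%2Cprimo_central_multiple_fe"
    "&vl%28394521272UI1%29=all_items&vl%281UI0%29=contains"
    "&vl%2851615747UI0%29=any&vl%2851615747UI0%29=title"
    "&vl%2851615747UI0%29=any"
)


def referer_params(referer):
    """Decode the query-string parameters carried in a Referer URL."""
    # parse_qs percent-decodes keys and values and turns "+" into a space.
    return parse_qs(urlsplit(referer).query)


params = referer_params(REFERER)
print(urlsplit(REFERER).hostname)  # hollis.harvard.edu
print(params["vl(freeText0)"])     # ['killing trump']
print(params["scp.scps"])
# ['scope:(HVD_FGDC),scope:(HVD),scope:(HVD_VIA),primo_central_multiple_fe']
```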

To put this plainly, my entire search session, including my search string "killing trump", is sent to Amazon alongside my personal information, whether I like it or not. I don't know what Amazon does with this information. I assume that if a government actor wants my search history, they will get it from Amazon without much fuss.

I don't like it.

Rant

[I wrote a rant, but I decided to save it for a future post if needed.] Anyone want a Cookie?

Notes 12/23/2016:

As Keith Jenkins noted, users can configure Chrome and Safari to block third-party cookies. Firefox won't block Amazon cookies, however. And some libraries advise users not to block third-party cookies because doing so can cause problems with proxy authentication.
If Chrome's network panel tells you "Provisional headers are shown", it means Chrome doesn't know which request headers were really sent, because another plugin is modifying headers. So if you have HTTPS Everywhere, Ghostery, Adblock, or Privacy Badger installed, you may not be able to use Chrome's developer tools to see request headers. Thanks to Scott Carlson for the heads-up.
Cover images from Google leak similar data, as does the use of Google Analytics. So do Facebook Like buttons. Et cetera.
Thanks to Sarah Houghton for suggesting that I write this up.

Update 3/23/2017:

There's good news in the comments!

29 comments:

Charlie Byers, December 22, 2016 at 10:40 AM
That's great information! If my library is running Primo, what's the most helpful thing I can tell them about changing this behavior? Is there a "don't do that" flag somewhere in the Primo configuration?
Jeremy, December 22, 2016 at 2:32 PM
Thanks Eric, a problem I wasn't even aware existed. On a somewhat related note, I get a Facebook certificate error when I view this page...
Justin Hoenke, December 28, 2016 at 12:36 PM
Thanks for this! I went through the process but did not see any "images.amazon.com" on the Sources tab. Does that mean that we're in the clear? I hope so!
Unknown, December 30, 2016 at 5:39 PM
This is all good information, but surely a huge problem is that this blog is hosted on Google, and therefore Google is tracking all the users of this blog (including myself) and all their other usage activity.

For all these issues, it's about culture change. We use Google Analytics on websites because it's quick and easy; we use Blogger sites for the same reason. Libraries use Amazon cover images because it's free hosting for enhanced content on their sites, and most users don't seem to care.

But to highlight the issue on a blogging platform that is leaking user information all over the place does seem like the height of hypocrisy.
JRS, January 3, 2017 at 10:34 PM
Evergreen ILS can be counted in the not-affected column. It caches cover art server-side, so the client request is always to the Evergreen server. And I don't think Amazon cover art is even supported, since the last time I checked it was against Amazon's TOS (granted, that was several years ago; I remember something about the images having to be used to drive traffic to Amazon.com).
banerjek, January 16, 2017 at 1:33 PM
It is good to be mindful that "free" services that libraries and users alike depend on typically are funded with data about users.

It is also good to keep things in perspective. It's a safe bet that the Internet access that libraries routinely provide hemorrhages much more sensitive patron data than this.
Gustav Lindqvist, March 14, 2017 at 8:56 AM
Thought I'd update that ExLibris decided to proxy all their requests for Book Covers for Primo, which solves the problem for all their users.
Unknown, March 23, 2017 at 11:12 AM
As we communicated to the entire Primo customer community a few weeks ago, I would like to give an update on the measures Ex Libris has already taken.
In order to protect privacy in Primo searches, we have redirected all requests for book covers from third party providers such as Amazon and Google through a proxy on the Ex Libris cloud data center. This way, there is no transfer of client IP data or cookies to these providers’ systems.
This solution was rolled out to all cloud environments during February.

Yuval Kiselstein
Director of Product Management,
Ex Libris Discovery and Delivery solutions
yuval.kiselstein@exlibrisgroup.com
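The fix Ex Libris describes, like the server-side caching JRS mentions for Evergreen, moves the third-party request off the patron's browser. Here is a minimal sketch of the upstream half, assuming a hypothetical cover service at covers.example.org: the library's server builds a fresh request containing only the ISBN, so the patron's cookies, Referer, and IP address never reach the provider (the provider sees the server's address instead). It uses Python's standard urllib, whose default opener does not send stored cookies. COVER_URL_TEMPLATE, upstream_request, fetch_cover, and the User-Agent string are illustrative names, not Ex Libris's implementation.

```python
import re
import urllib.request

# Hypothetical provider URL; substitute the cover service you actually use.
COVER_URL_TEMPLATE = "https://covers.example.org/isbn/{isbn}.jpg"
# ISBN-10 (last character may be X) or ISBN-13; check digits are not verified.
ISBN_PATTERN = re.compile(r"\d{9}[\dX]|\d{13}")


def upstream_request(isbn: str) -> urllib.request.Request:
    """Build the library-server-to-provider request for one cover image.

    Nothing from the patron's browser request (Cookie, Referer, User-Agent,
    client IP) is copied here; only the ISBN travels upstream.
    """
    isbn = isbn.replace("-", "").upper()
    if not ISBN_PATTERN.fullmatch(isbn):
        raise ValueError(f"not an ISBN: {isbn!r}")
    return urllib.request.Request(
        COVER_URL_TEMPLATE.format(isbn=isbn),
        headers={"User-Agent": "library-cover-proxy/0.1"},  # hypothetical UA
    )


def fetch_cover(isbn: str, timeout: float = 10.0) -> bytes:
    """Fetch the image server-side; a real proxy would also cache the result."""
    with urllib.request.urlopen(upstream_request(isbn), timeout=timeout) as resp:
        return resp.read()
```

A browser-facing endpoint on the library's own host (say, a hypothetical /cover/<isbn> route) would call fetch_cover and return the bytes, so the patron's browser only ever talks to the library. Validating the ISBN before building the URL also keeps the proxy from being used to fetch arbitrary paths on the provider.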
