Wednesday, June 12, 2024

The PII Figleaf

The Internet's big lie is "we respect your privacy". Thanks to cookie banners and such things, the Internet tells us this so many times a day that we ignore all the evidence to the contrary. Sure, there are a lot of people who care about our privacy, but they're often letting others violate our privacy without even knowing it. Sometimes this just means that they are trying to be careful with our "PII". And guess what? You know those cookies you're constantly blocking or accepting? Advertisers like Google have mostly stopped using cookies!!!

[Image: fig leaf covering ID cards]

"PII" is "Personally Identifiable Information" and privacy lawyers seem to be obsessed with it. Lawyers, and the laws they care about, generally equate good PII hygiene with privacy. Good PII hygiene is not at all a bad thing, but it protects privacy the same way that washing your hands protects you from influenza. Websites that claim to protect your privacy are often washing the PII off their hands while at the same time coughing data all over you. They can and do violate your privacy while at the same time meticulously protecting your PII.

Examples of PII include your name, address, social security number, telephone number and email address. The IP address that you use can often be traced to you, so it's sometimes treated as PII, but often isn't. The fact that you love paranormal cozy romance novels is not PII, nor is the fact that you voted for Mitt Romney. That you have an 18-year-old son and an infant daughter is also not PII. But if you've checked out a paranormal cozy romance from your local library, and then start getting ads all over the internet for paranormal cozy romances set in an alternate reality where Mitt is President and the heroine has an infant and a teenager, you might easily conclude that your public library has sold your checkout list and your identity to an evil advertising company.

That's a good description of a recent situation involving San Francisco Public Library (SFPL). As reported by The Register:

In April, attorney Christine Dudley was listening to a book on her iPhone while playing a game on her Android tablet when she started to see in-game ads that reflected the audiobooks she recently checked out of the San Francisco Public Library.

Let me be clear. There's no chance that SFPL has sold the check-out list to anybody, much less evil advertisers. However, it DOES appear to be the case that SFPL and their online ebook vendors, Overdrive and Baker and Taylor, could have allowed Google to track Ms. Dudley, perhaps because they didn't fully understand the configuration options in Google Analytics. SFPL offers ebooks and audiobooks from Overdrive, "Kindle Books from Libby by Overdrive", and ebooks and audiobooks from Baker and Taylor's "Boundless" Platform. There's no leakage of PII or of the check-out list, but Google is able to collect demographics and interests from the browsing patterns of users with Google accounts.

A few years ago, I wrote an explainer about how to configure Google Analytics to protect user privacy. That explainer is obsolete, as Google is scrapping the system I explained in favor of a new system, "Google Analytics 4" (GA-4), that works better in the modern, more privacy-conscious browser environment. To their credit, Google has made some of the privacy-preserving settings the default - for example, they will no longer store IP addresses. But reading the documentation, you can tell that they're not much interested in Privacy with a capital P as they want to be able to serve relevant (and thus lucrative) ads, even if they're for paranormal cozy romances. And Google REALLY doesn't want any "PII"! PII doesn't much help ad targeting, and there are places that regulate what they can do with PII.

We can start connecting the dots from the audiobook to the ads from the reporting in the Register by understanding a bit about Google Analytics. Google Analytics helps websites measure their usage. When you visit a webpage with Google Analytics, a piece of JavaScript sends information back to one or more Google trackers about the address of the webpage, your browser environment, and maybe more data that the webpage publisher is interested in. Just about the only cookie being set these days is one that tells the website not to show the cookie banner!
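To make that concrete, here's a rough Python sketch of what one of those tracker "hits" can carry. The hit URL is made up for illustration, and the parameter names (`dl`, `dt`, `ul`, `sr`) are assumptions modeled on commonly observed Analytics requests, not something taken from the Register's reporting or from SFPL's pages.

```python
from urllib.parse import urlparse, parse_qs

# Assumed parameter names, modeled on commonly observed Analytics hits.
# They are illustrative labels, not an authoritative list.
FIELD_LABELS = {
    "dl": "page address",
    "dt": "page title",
    "ul": "browser language",
    "sr": "screen size",
}


def describe_hit(hit_url):
    """Return the human-readable fields found in a tracker hit URL."""
    query = parse_qs(urlparse(hit_url).query)
    return {
        label: query[param][0]
        for param, label in FIELD_LABELS.items()
        if param in query
    }


# Hypothetical hit for a hypothetical library catalog page.
example_hit = (
    "https://www.google-analytics.com/g/collect?v=2"
    "&dl=https%3A%2F%2Flibrary.example%2Ftitle%2Fparanormal-cozy-romance"
    "&dt=Paranormal%20Cozy%20Romance"
    "&ul=en-us"
    "&sr=390x844"
)

if __name__ == "__main__":
    for label, value in describe_hit(example_hit).items():
        print(f"{label}: {value}")
```

Notice what's missing from the example: no name, no email, no library card number. Nothing that a lawyer would call PII, and yet the page address alone says what the reader was looking at.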

From the Register:

The subdomain SFPL uses for library member login and ebook checkout, sfpl.bibliocommons.com, has only a single tracker, from Alphabet, that communicates with the domains google-analytics.com and googletagmanager.com.

The page is operated by BiblioCommons, which was acquired in 2020 by Canada-based Constellation Software. BiblioCommons has its own privacy policy that exists in conjunction with the SFPL privacy policy.

In response to questions about ad trackers on its main website, Wong acknowledged that SFPL does use third-party cookies and provides a popup that allows visitors to opt-out if they prefer.

With regard to Google Analytics, she said that it only helps the library understand broad demographic data, such as the gender and age range of visitors.

"We are also able to understand broad interests of our users, such as movie, travel, sports and fitness based on webpage clicks, but this information is not at all tied to individual users, only as aggregated information," said Wong.

The statement from Jaime Wong, deputy director of communications for the SFPL, is revealing. The Google Analytics tracker only works within a website, and neither SFPL nor its vendors are collecting demographic information to share with Google. But Google Analytics has options to turn on the demographic information that libraries think they really need. (Helps to get funding, for example.) These options used to be called "Advertising Reporting Features" and "Remarketing" (I called these the "turn off privacy" switches) but now they're called "Google Signals". Google Signals works by adding the Google advertising tracker, DoubleClick, alongside the regular Analytics tracker. This allows Google to connect the usage data from a website to its advertising database, the one that stores demographic and interest information. This gives the website owners access to their user demographics, and it gives the Google advertising machine access to the users' web browsing behavior.

I have examined the trackers on the relevant webpages from SFPL, as well as on the customized pages that BiblioCommons, Overdrive, and Baker and Taylor provide for SFPL. Here's what I found:

  • The SFPL website, SFPL.org, has Analytics and DoubleClick ad trackers enabled.
  • The BiblioCommons website, sfpl.bibliocommons.com, has two analytics trackers enabled, but no advertising tracker. Probably one tracker "belongs" to SFPL while the other "belongs" to BiblioCommons.
  • The Overdrive website, sfpl.overdrive.com, has Analytics and DoubleClick ad trackers enabled.
  • The Baker and Taylor website, sfpl.boundless.baker-taylor.com, has Analytics and DoubleClick ad trackers enabled.
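If you want to run this kind of check on your own library's pages, one rough approach is to collect the request URLs a page makes (your browser's developer tools will list them) and sort them by hostname. Here's a minimal Python sketch of the sorting step. The function names are mine, the example URLs are hypothetical, and `doubleclick.net` is my assumption for the DoubleClick ad domain; the two analytics domains are the ones named in the Register's report.

```python
from urllib.parse import urlparse

# google-analytics.com and googletagmanager.com are named in the Register's
# report; doubleclick.net is assumed here as the DoubleClick ad domain.
ANALYTICS_SUFFIXES = ("google-analytics.com", "googletagmanager.com")
ADVERTISING_SUFFIXES = ("doubleclick.net",)


def _matches(host, suffixes):
    """True if host is one of the suffixes or a subdomain of one."""
    return any(host == s or host.endswith("." + s) for s in suffixes)


def classify_request(url):
    """Label a request URL as 'advertising', 'analytics', or 'other'."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, ADVERTISING_SUFFIXES):
        return "advertising"
    if _matches(host, ANALYTICS_SUFFIXES):
        return "analytics"
    return "other"


def summarize_trackers(urls):
    """Group the tracker hostnames seen in a list of request URLs."""
    found = {"analytics": set(), "advertising": set()}
    for url in urls:
        kind = classify_request(url)
        if kind in found:
            found[kind].add(urlparse(url).hostname.lower())
    return {kind: sorted(hosts) for kind, hosts in found.items()}


if __name__ == "__main__":
    # Hypothetical request list, as might be copied from a browser's
    # network panel after loading a library catalog page.
    requests_seen = [
        "https://library.example/catalog/search?q=cozy",
        "https://www.googletagmanager.com/gtag/js?id=G-EXAMPLE",
        "https://www.google-analytics.com/g/collect?v=2",
        "https://stats.g.doubleclick.net/g/collect?v=2",
    ]
    print(summarize_trackers(requests_seen))
```

A page that shows only "analytics" hosts is in the BiblioCommons situation above; one that also shows an "advertising" host has the ad tracker riding along. This only looks at hostnames, so treat it as a first pass, not a full audit.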

So it shouldn't be surprising that Ms. Dudley experienced targeted ads based on the books she was looking at in the San Francisco Public Library website. Libraries and librarians everywhere need to understand that reader privacy is not just about PII, and that the sort of privacy that libraries have a tradition of protecting is very different from the privacy that Google talks about when it says "Google Analytics 4 was designed to be able to evolve for the future and built with privacy at its core." At the end of this month, earlier versions of Google Analytics will stop "processing" data. (I'm betting the trackers will still fire!)

What Google means by that is that in GA-4, trackers continue to work despite browser restrictions on third-party cookies, and the tracking process is no longer reliant on data like IP addresses that could be considered PII. To address those troublesome regulators in Europe, they only distribute demographic data and interest profiles for people who've given their permission to Google to do so. Do you really think you haven't somewhere given Google permission to collect your demographic data and interest profiles? You can check here.

Here's what Google tells Analytics users about the ad trackers:

When you turn on Google signals, Google Analytics will associate the session data it collects from your site and apps with Google's information from accounts of signed-in, consented users. By turning on Google signals, you acknowledge you adhere to the Google Advertising Features Policy, including rules around sensitive categories, have the necessary privacy disclosures and rights from your end users for such association, and that such data may be accessed and deleted by end users via My Activity.

In plain English, that means that if a website owner flips the switch, it's the website's problem if the trackers accidentally capture PII or otherwise violate privacy, because the website is responsible for asking for permission.

Yep. GA-4 is engineered with what I would call "figleaf privacy" at its core. Google doesn't have fig leaves for paranormal cozy romance novels!

