|A Maginot Line fortification. Photo from the US Army.|
At the same time, "big data" has clouded the picture of what constitutes sensitive data. The correlation of digital library use with web activity outside the library can impact privacy in ways that never would occur in a physical library. For example, I've found that many libraries unknowingly use Amazon cover images to enrich their online catalogs, so that even a user who is completely anonymous to the library ends up letting Amazon know what books they're searching for.
Recently, I've been serving on the Steering Committee of an initiative of NISO to try to establish a set of principles that libraries, providers of services to libraries, and publishers can use to support privacy patron privacy. We held an in-person meeting in San Francisco at the end of July. There was solid support from libraries, publishers and service companies for improving reader privacy, but some issues were harder than others. The issues around data collection and use attracted the widest divergence in opinion.
One approach that was discussed centered on classifying different types of data depending on the extent to which they impact user privacy. This also the approach taken by most laws governing privacy of library records. They mostly apply only to "Personally Identifiable Information" (PII), which usually would mean a person's name, address, phone number, etc., but sometimes is defined to include the user's IP address. While it's important to protect this type of information, in practice this usually means that less personal information lacks any protection at all.
I find that the data classification approach is another Maginot privacy line. It encourages the assumption that collection of demographics data – age, gender, race, religion, education, profession, even sexual orientation – is fair game for libraries and participants in the library ecosystem. I raised some eyebrows when I suggested that demographic groups might deserve a level of privacy protection in libraries, just as individuals do.
OCLC's Andrew Pace gave an example that brought this home for us all. When he worked as a librarian at NC State, he tracked usage of the books and other materials in the collection. Every library needs to do this for many purposes. He noticed that materials placed on reserve for certain classes received little or no usage, and he thought that faculty shouldn't be putting so many things on reserve, effectively preventing students not taking the class from using these materials. And so he started providing usage reports to the faculty.
What this example illustrates is that libraries MUST collect at least SOME data that impinges on reader privacy. If reader privacy is to be protected, a "privacy impact assessment" must be made on almost all uses of that data. In today's environment, users expect that their data signals will be listened to and their expressed needs will be accommodated. Given these expectations, building privacy in libraries is going to require a lot of work and a lot of thought.