Monday, February 9, 2015

"Passwords are stored in plain text."

Many states have "open records" laws which mandate public disclosure of business proposals submitted to state agencies. When a state library or university requests proposals for library systems or databases, the vender responses can be obtained and reviewed. When I was in the library software business, it was routine to use these laws to do "competitor intelligence". These disclosures can often reveal the inner workings of proprietary vendor software which implicate information privacy and security.

Consider for example, this request for "eResources for Minitex". Minitex is a "publicly supported network of academic, public, state government, and special libraries working cooperatively to improve library service for their users in Minnesota, North Dakota and South Dakota" and it negotiates licenses databases for libraries throughout the three states.

Question number 172 in this Request for Proposals (RFP) was: "Password storage. Indicate how passwords are stored (e.g., plain text, hash, salted hash, etc.)."

To provide context for this question, you need to know just a little bit of security and cryptography.

I'll admit to having written code 15 years ago that saved passwords as plain text. This is a dangerous thing to do, because if someone were to get unauthorized access to the computer where the passwords were stored, they would have a big list of passwords. Since people tend to use the same password on multiple systems, the breached password list could be used, not only to gain access to the service that leaked the password file, but also to other services, which might include banks, stores and other sites of potential interest to thieves.

As a result, web developers are now strongly admonished never to save the passwords as plain text. Doing so in a new system should be considered negligent, and could easily result in liability for the developer if the system security is breached. Unfortunately many businesses would rather risk paying paying lawyers a lot of money to defend themselves should something go wrong than bite the bullet and pay some engineers a little money now to patch up the older systems.

To prevent the disclosure of passwords, the current standard practice is to "salt and hash" them.

A cryptographic hash function mixes up a password so that the password cannot be reconstructed. so for example, the hash of 'my_password' is 'a865a7e0ddbf35fa6f6a232e0893bea4'. When a user enters their password, the hash of the password is recalculated and compared to the saved hash to determine whether the password is correct.

As a result of this strategy, the password can't be recovered. But it can be reset, and the fact that no one can recover the password eliminates a whole bunch of "social engineering" attacks on the security of the service.

Given a LOT of computer power, there are brute force attacks on the hash, but the easiest attack is to compute the hashes for the most common passwords. In a large file of passwords, you should be able to find some accounts that are breachable, even with the hashing. And so a "salt" is added to the password before the hash is applied. In the example above, a hash would be computed for 'SOME_CLEVER_SALTmy_password'. Which, of course, is '52b71cb6d37342afa3dd5b4cc9ab4846'.

To attack the salted password file, you'd need to know that salt. And since every application uses a different salt, each file of salted passwords is completely different. A successful attack on one hashed password file won't compromise any of the others.

Another standard practice for user-facing password management is to never send passwords unencrypted. The best way to do this is to use HTTPS, since web browser software alerts the user that their information is secure. Otherwise, any server between the user and the destination server (there might be 20-40 of these for  typical web traffic) could read and store the user's password.

The Minitex RFP covers reference databases. For this reason, only a small subset of services offered to libraries are covered here. The authentication for these sorts of systems typically don't depend on the user creating a password; user accounts are used to save the results of a search, or to provide customization features. A Minitex patron can use many of the offered databases without providing any sort of password.

So here are the verbatim responses received for the Minitex RFP:

LearningExpress, LLC
Response: "All passwords are stored using a salted hash. The salt is randomly generated and unique for each user."
My comment: This is a correct answer. However, the LearningExpress login sends passwords in the clear over HTTP.

OCLC
Response: "Passwords are md5 hashed."
My comment: MD5 is the hash algorithm I used in my examples above. It's not considered very secure (see comments). OCLC Firstsearch does not force HTTPS and can send login passwords in the clear.

Credo
Response: "N/A"
My comment: This just means that no passwords are used in the service.

Infogroup Library Division
Response: "Passwords are currently stored as plain text. This may change once we develop the customization for users within ReferenceUSA. Currently the only passwords we use are for libraries to access usage stats."
My comment: The user customization now available for ReferenceUSA appears at first glance to be done correctly.

EBSCO Information Services
Response: "EBSCOhost passwords in EBSCOadmin are stored in plain text."
My comment: Should note that EBSCOadmin is not a end-user facing system. So if the EBSCO systems were compromised only library administrator credentials would be exposed. 

Encyclopaedia Britannica, Inc.
Response: "Passwords are stored as plain text."
My comment: I wonder if EB has an article on network security?

ProQuest
Response: "We store all passwords as plain text."
My comment: The ProQuest service available through my library creates passwords over HTTP but uses some client-side encryption. I have not evaluated the security of this encryption.

Scholastic Library Publishing, Inc.
Response: "Passwords are not stored. FreedomFlix offers a digital locker feature and is the only digital product that requires a login and password. The user creates the login and password. Scholastic Library Publishing, Inc does not have access to this information.”
My comment: The "FreedomFlix" service not only sends user passwords unencrypted over HTTP, it sends them in a GET query string. This means that not only can anyone see the user passwords in transit, but log files will capture and save them for long-term perusal. Third-party sites will be sent the password in referrer headers. When used on a shared computer, subsequent users will easily see the passwords. "Scholastic Library Publishing" may not have access to user passwords, but everyone else will have them.

Cengage Learning
Response: "Passwords are stored in plain text."
My comment: Like FreedomFlix, the Gale Infotrac service from Cengage sends user passwords in the clear in a GET query string. But it asks the user to enter their library barcode in the password field, so users probably wouldn't be exposing their personal passwords.

So, to sum up, adoption of up-to-date security practices is far from complete in the world of library databases. I hope that the laggards have improved since the submission date of this RFP (roughly a year ago) or at least have plans in place to get with the program. I would welcome comments to this post that provide updates. Libraries themselves deserve a lot of the blame, because for the most part the vendors that serve them respond to their requirements and priorities.

I think libraries issuing RFPs for new systems and databases should include specific questions about security and privacy practices, and make sure that contracts properly assign liability for data breaches with the answers to these questions in mind.

Note: This post is based on information shared by concerned librarians on the LITA Patron Privacy Technologies Interest Group list. Join if you care about this.

6 comments:

  1. Actually, md5 is considered to be broken for security purposes. "Software developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity. As previous research has demonstrated, it should be considered cryptographically broken and unsuitable for further use." http://www.kb.cert.org/vuls/id/836068

    ReplyDelete
    Replies
    1. I agree, MD5 by itself is not considered secure. Thanks for the correction.

      Delete
    2. It's worth understanding the implications of the MD5 vulnerability. An attacker with the MD5 hash should be able to find another password with the same hash, giving them access to the account. But it's not the user's password, so it won't be useful to access a bank account, which is unlikely to be using an unsalted MD5 hash.

      Delete
  2. I very much like this approach. I also appreciate the user-oriented approach to understand the importance of various access points. That said, it's surprisingly easy to encrypt passwords. If MD5 is considered insecure, what about SHA encryption? Any thoughts about this in the context of the value of the data (e.g. user stats, as mentioned above)?

    ReplyDelete
    Replies
    1. I'd guess that even with the MD5 vulnerabilities, rainbow table attacks, which evade the strength of the hash function will be easier to execute. (Rainbow tables are what result when you take the top million most common passwords, and compute their hashes.) Nonetheless, we use an SHA256 hash with a random salt on passwords for Unglue.it - that's what comes built-in with Django.

      Delete
  3. These salting techniques are all obsolete and now trivial to crack. Reasonable security now requires a real password-hashing algorithm like bcrypt or PKBDF2:

    http://arstechnica.com/security/2012/08/passwords-under-assault/1/

    http://en.wikipedia.org/wiki/PBKDF2

    http://en.wikipedia.org/wiki/Bcrypt

    The difference is stretching: multiple rounds of hashing.

    Here I calculated 500,000 Windows passwords (salted and hashed, but not stretched) in a few seconds: https://samsclass.info/123/proj10/px16-hashcat-win.htm

    Here I could only calculate 500 Kali Linux password hashes in a reasonable amount of time, because it uses salting, hashing and stretching: 5000 rounds of SHA-512:

    https://samsclass.info/123/proj10/p12-hashcat.htm

    ReplyDelete