On Password Protection

Joël Alwen, Wickr Cryptography 

April 17, 2018


The Wickr platform is built around two types of secrets: cryptographic keys and passwords. This paper is focused on passwords. First, I’ll explain why Wickr has opted to use passwords. Next, I’ll describe how exactly how they are used. Finally, I’ll address the potential threats password use exposes us to and what Wickr does to mitigate those threats.

Why Passwords?

A central privacy feature of Wickr is its ability to support accounts identified by arbitrary usernames. This plays a particularly important role in providing users with a platform for both end-to-end secure and anonymous communication (Wickr Me). However, when departing from a model where accounts are bound to verifiable external identities — such as, say, a phone number, email address, Facebook account we must address the question of how access to Wickr accounts can be restricted to their owners? This is even more critical when Wickr’s goal is to allow users to access their accounts from multiple devices. To this end, presently, access to Wickr accounts is protected using passwords.

On the positive side, this means that users can create completely anonymous accounts not tied to any external identifying information (especially when using Wickr via an anonymizing service like Tor). It also means that the security of these accounts does rely on the security of an external system — think an email provider or a phone network. However, if not designed and implemented correctly, it also may create potential new vulnerabilities. This paper goes over exactly how passwords are used by Wickr and explains how we help minimize these risks. 

How Are Passwords Used?

From the user’s point of view, passwords serve a single purpose: to log in to user accounts from a fresh device. Behind the scenes, the login process actually consists of two steps. Once a user has provided the Wickr app with a username and password, the app first executes the authentication protocol with the Wickr server. Assuming this step succeeds, in the second step, the server sends over the account’s escrow bundle. This is basically a ciphertext encrypted using the account’s password containing all the secret key material, contacts and other data belonging to the users account. The app uses the password to decrypt the escrow bundle and finishes setting up the account on the local device.

In order to better understand the threats (and their countermeasures), it is helpful to go over these two steps in more detail. Suppose a user wants to log in to her account with username U using password P on her device D. Once she has filled in U and P in the login screen of her Wickr app, the authentication protocol is executed between D and Wickr server S as follows:

1. Device D uses its pinned public key for server S to establish an encrypted and authenticated secure (bi-direction) channel with S. It sends username U to S to initiate a fresh login attempt for that account.

2. S retrieves the pre-salt s1 which it has on file for account U and sends that over to D.

3. D computes salt s2 = SHA256(s1, U) and new password P’ = P || U.

4. D computes auth-key = scrypt(P’, s2) with scrypt parameters N = 217, r = 8 and p = 1. Then it sends auth-key to S.

5. S retrieves AES256-CTR “verification ciphertext” Cauth it has on file for user U. Authentication is accepted by S if and only if Cauth decrypts to plaintext U when using key auth-key.

6. If the authentication failed, then S returns an error is to D. Otherwise, S retrieves the escrow bundle B for U and sends it over to D.

7. Bundle B is parsed by D to read out a salt value s3 and an (AES256-GCM) ciphertext Cescrow.

8. D computes k = scrypt(P, s3) with the same script parameters as before.

9. Finally, D uses k to decrypt Cescrow obtaining all the account data for U. **

An immediate takeaway here is that the only data for account U stored on S is the tuple (s1, Cauth, s3, Cescrow). Moreover, this is the only way that the password for an account is ever used by the Wickr platform.

It is also worth noting that the server is essentially stateless in that a successful login attempt has no other effect than to provide D with the escrow bundle. In other words, successfully completing the authentication step does not confer any special privileges or abilities to D beyond obtaining B. Rather, such abilities (e.g. sending messages on behalf of U) are gained via possession of the key material contained in Cescrow.

Threats & Mitigations

Now with a better understanding of the details of how passwords are used, we can talk about what threats this system faces.

  • Dictionary Attacks

When it comes to passwords, one of the biggest threats is attackers simply trying as many possible passwords until the correct one is guessed. Such attacks are referred to as “Dictionary Attacks” as the guesses are normally derived based on a dictionary of common words, names and passwords. Broadly speaking, dictionary attacks can either be “on-line” or “off-line”.

An on-line dictionary attack requires interacting with an honest party (such as the server S) in order to verify each new password guess. For instance, an attacker could try to repeatedly log in as user U continuously guessing passwords until they find the right one. In fact, such an attack is at least somewhat unavoidable. However, as a very effective countermeasure, the Wickr server limits the number of failed login attempts it is willing to serve before it locks the account for a 24 hour period (the exact number depends on the type of Wickr network).  Effectively, this slows the attack to such an extent that all but the most easily guessable passwords will remain out of reach within any reasonable time frame.

An off-line dictionary attack can be launched by an attacker who has obtained some kind of data that allows them to test each guess using nothing more than some local computation. For example, an attack that obtains the escrow bundle B of an account can perform such an attack by extracting the salt s3, computing the key k for a given guess and attempting to decrypt Cescrow. As no further interaction is needed by the attacker, the rate limit on login attempts will not help here. Therefore, to mitigate these attacks the login protocol is designed in such a way that testing any password guess requires evaluating a fresh instance of scrypt. Concretely, the only data that can be used to test password guesses are the two ciphertexts Cauth and Cescrow (and the matching salts). See the next section for the details about how the parameters for script are chosen and what effects they have on slowing off-line dictionary attacks.

  • Network Based Attacks

Suppose an attacker is somehow able to break the privacy of the channel between D and S. That is, the attacker learns all the messages exchanged (inside the encrypted channel). The data observed by the attacker in this scenario can be used to mount an off-line dictionary attack. However, that is as far as it goes. Clearly, neither the password nor any other sensitive account specific information is ever transmitted in an unprotected form. Moreover, although the attacker learns auth-key which suffices to trick the server into accepting the authentication step, there is no added benefit of this to the attacker. That is because the only effect passing this step has is to cause the server to send the escrow bundle to the adversary. In particular no state is changed on the server and the attack gains no further access or abilities by supplying auth-key to the server.

  • Server Database Leak 

Suppose an attacker gets access to the server’s database (e.g., via an SQL injection). In this case the adversary learns sufficient information to launch off-line dictionary attacks on each account (although the attack will have to be relaunched for each target account since they all use different salt values). However, being able to mount an off-line dictionary is the limit of the benefit to the attacker.

  • Server Compromise

Suppose an attacker A takes complete control of the server learning not just the database but being able to interact with honest users attempting to log in. As in the case of a database leak, the attacker learns enough to mount an off-line dictionary attack. However, once again this is the limit of the benefit, at least in terms of extracting passwords and/or key material for honest users accounts.

To see this, notice that decrypting Cauth is of no direct use to A since the plaintext encrypted in Cauth is fixed anyway. Moreover, to decrypt Cescrow, A must first somehow learn k = scrypt(P, s3). Now, the only data besides U provided by D to the server (i.e. to A) is auth-key. However, auth-key is computed using a modification of both the password and the pre-salt provided by the server. So, for example, to trick D into using s3 as the salt when computing auth-key, A would have to come up with pre-salt s in step 2 such that s3 = SHA256(s || U). However, efficiently inverting SHA256 is not feasible, making this an unrealistic task for A. Thus, even if D is willing to repeatedly supply auth-key values to A for various pre-salts, nothing more is obtained by A than data for use in off-line dictionary attacks.

Scrypt Parameters

The scrypt function [2] is designed specifically such that evaluating it on any input in a reasonable amount of time requires a large number of reads and writes to random locations in a large array [3]. As soon as the array becomes too big to store in cache, the entire computation begins to slow down dramatically because off-chip memory access is orders of magnitude slower than pure computation (and even on-chip memory access). The size of the array is determined by setting the N and r parameter for scrypt. 

The choice of these parameters used by Wickr is quite conservative. A standard scrypt evaluation with Wickr’s parameters results in an array of size approximately 16MB of data being accessed 221 times. More to the point, with these parameters each fresh evaluation of scrypt (i.e., each fresh password guess) requires approximately 380 milliseconds [4]. Moreover, even optimized implementations on a GPU or an FPGA result in speed ups of little more than a factor of 2 for this parameter range [1]. In contrast, an off-the-shelf general purpose CPU can decrypt around 2 GB/s [5], so even for very large escrow bundles this would easily result in tens of thousands of passwords guesses per second instead of 2 or 3.

To get an idea for what one can expect in terms of defense against off-line dictionary attacks, here are a couple of scenarios for how passwords could be chosen and how long one can expect a dictionary attack to run before such a password is found:

  • Suppose the password is chosen by selecting two random words from a list of 4096 words. That would give the password 24 bits of entropy which means that, on average, an attacker would need about 223 guesses before they find the password. At .38 seconds per guess, this would take roughly 37 days to find.
  • Suppose the password is chosen by human and is 10 characters subject to the rules that it may not be an English word and must contain at least one uppercase character, lowercase character and one digit. NIST estimates the entropy of such a password to be roughly 32 bits. (See Appendix A in [6]). Thus, on average about 231 guesses are required to find such a password. At .38 seconds per guess, this takes about 26 years.
  • Suppose the password is chosen by selecting 8 characters at random (with repetition) out of the total 95 characters on a standard QUERTY keyboard. That results in about 52.5 bits of entropy which means that, on average, about 251.5 guesses are needed to find the password. At .38 seconds per guess this would take roughly 38 million years.
  • Suppose the password is chosen by selecting 10 random characters from either uppercase letters, lowercase letter, digits 0-9, ! or *. This results in a password with exactly 60 bits of entropy. So on average 259 guesses are required to find the password which would take, at .38 guesses per second, roughly 13.9 billion years. So even using 1 million off-the-shelf desktop computers in parallel for the dictionary attack it would still take nearly 14 thousand years.

Password Policy

In light of the above discussion one of the most effective and yet relatively easy to implement defense mechanism available to Wickr users is to institute sound password policies. Such a policy should be aimed at ensuring users select their passwords with sufficient entropy to cause off-line dictionary attacks to become computationally infeasible (or at least extremely expensive). In general, the topic of what constitutes a good password policy is surprisingly involved and itself an active topic of research and a full exposition is beyond the scope of this document. Nevertheless, we do highlight three important points to consider (and further references) when formulating a password policy that balances security against cost and usability.

  • Blacklists 

Analysis of several sets of real world passwords shows [7] that humans tend to choose their passwords according to Zipf’s Law. In other words, the most common password is twice as likely to appear as the second most likely password, three times as likely as the third most frequent password, etc. What this means is that a relatively small number of passwords end up being used a disproportionally large amount of the time. Clearly this is bad for security. To mitigate this effect, one option is to institute the use of a blacklist policy, which bars users from selecting the most common choices (e.g. “12345”, “password”, “asdf”, etc.) as their passwords. Thus, the overall entropy of the resulting choices can be significantly increased even using only relatively short (and thus inexpensive to use) blacklists. To this end, various blacklists are maintained: some available for free [8] while others are available commercially (e.g., [9]) and their use is commonly recommended (e.g., by the UK [11] and US governments [12]).

  • Password Resets 

For a many years, it was common practice to force users to reset their passwords periodically. However, recently this wisdom is more and more being contravened [10] as such a policy does little to increase the entropy of passwords. In fact, it has been found to often lead to weaker password choices for various reasons. For one thing, as users need to keep track of more and more passwords, they tend to be more and more likely to reuse the same or similar passwords across accounts. Moreover, when forced to choose a new password, users tend to introduce very small and predictable differences to their password (e.g., by adding digits “1” then “2” then “3," etc. at the end). So on the one hand,  the entropy of passwords subject to periodic reset policies tend to be only marginally greater, if at all, than those subject to no such policy. Yet, on the other hand, forced resets tend to introduce significant new risk factors and vulnerabilities. These insights are reflected in various recent advisory documents on password policy recommendations [11,12] which explicitly recommend against periodic forced resets.
It is important to note though that in the event of a breach or even just the suspicion of a breach mandatory password resets are absolutely in order. The above discussion applies only to otherwise unmotivated period reseting of passwords.

  • Computer Assisted Password Generation 

A potentially very strong policy which can, when implemented correctly, almost completely eliminate the threat of off-line dictionary attacks involves assisting users in selecting their password by providing them with a password generator. Not only does this guarantee sufficient entropy in the resulting passwords so as to make off-line dictionary attacks infeasible, it also eliminates the need for the blacklists described above.

However, some care must be taken with this approach. If not done correctly, it can lead to significant usability issues as well as additional costs to the wider organization: e.g. due to frequent password loss caused by users not being able to remember their passwords [13]. To mitigate this, various systems have been developed to help encode sufficiently random passwords into forms more amenable to human memorization (e.g., repeated consonant-vowel-consonant triples, human pronounceable passwords, or even randomly chosen pass phrases to name a few such encoding schemes).


Bibliography

[1] Markus Dürmuth, Thorsten Kranz: On Password Guessing with GPUs and FPGAs. PASSWORDS 2014: 19-38.

[2] C. Percival, S. Josefsson: RFC7914 - The scrypt Password-Based Key Derivation Function.

[3] Joël Alwen, Binyi Chen, Krzysztof Pietrzak, Leonid Reyzin, Stefano Tessaro: Scrypt is Maximally Memory-Hard. EUROCRYPT 2017: 33-62

[4] Aaron, Toponce. Further Investigation Into Scrypt and Argon2 Password Hashing. June 29, 2016

[5] AES-NI SSL Performance. December 07, 2017

[6] NIST Special Publication 800-63-2: Electronic Authentication Guidelines.

[7] J. Blocki, B. Harsha and S. Zhou, On the Economics of Offline Password Cracking2018 IEEE Symposium on Security and Privacy (SP), San Fransisco, CA, US, , pp. 35-53.

[8] Carey Li, NIST Bad Passwords - JavaScript library for detecting common passwords. Retrieved March 22, 2018

[9] PasswordRBL

[10] National Cyber Security Center - GCHQ, The problems with forcing regular password expiry. Retrieved March 22, 2018.

[11] National Cyber Security Center - GCHQ, Password Guidance: Simplifying Your Approach. 08 Aug 2016. Retrieved March 22, 2018.

[12] NIST Special Publication 800-63B: Digital Identity Guidelines. Wed, 21 Mar 2018.

[13] Center For the Protection of National Infrastructure - GCHQ, Password Guidance. Retrieved 22, March, 2018.