Secure Messaging Protocols Part 2: The Bleeding Edge

The rapid proliferation of Secure Messaging Protocols (SMPs) in government, industry, and the wider public have accelerated both industry and academic cryptographic and security engineering research and development to create better SMPs. In this post, I’ll go over several of the most innovative and exciting directions these efforts are leading.

In case you missed it, in my previous post in this series I covered some important terms including end-to-end (E2E) security, forward secrecy (FS), and asynchronous communication, as well as some important milestones over the past 3 decades that laid the foundations for today’s R&D. So, if you’re unfamiliar with any of the terms or want more background, you can check out the previous post to see what I mean as I’ll be using them again here.

The Wickr Messaging Protocol and Double Ratchet based SMPs have shown us how to get E2E security, FS, and at least some level of PCS for our asynchronous message protocols. But truth be told, this is just the beginning of what we could ask for from an SMP. To that end, various companies and research institutions have stepped up their R&D efforts, resulting in a bunch of new and interesting ideas.

Managed Secure Messaging & Federation

One of my personal favorites is the advent of what I call Managed Secure Messaging. Put simply, these are messaging networks which have special administrator users in charge of them. The admins can do things like manage user accounts on the network and set security policies. For more info see my blog post on the subject over here. It seems quite clear that as SMPs are used more and more within organizations (be they government, industry, or others), we will see more and more need for managed secure messaging.

Strictly speaking, having a managed network like Wickr does not automatically place any special requirements on the underlying SMP. However, a natural and very powerful feature managed networks can have is the capability to federate with other managed networks (say, within a wider government or with the networks of business partners). That is, users of different federated networks can communicate with each other (subject to the restrictions defined by their administrators).

For this to work, the SMP (or at least its routing and setup) must be designed to accommodate federation. This, in turn, raises interesting (and sometimes surprisingly subtle) security considerations. For example, it requires striking a balance between the privacy of users with respect to admins vs. the capability of those admins to dictate and enforce federation policy. After all, the more E2E crypto used by the SMP, the less insight into the communication there will be by middle-men such as the admins.

Much remains to be discovered and explored in this area. Today, few publicly available secure messaging platforms out there are managed, let alone federated. Wickr is one of the few exceptions, with federation capabilities of its managed AWS Wickr networks (which can also federate with the unmanaged public Wickr Messenger network).

Post Compromise Security

One hot topic of research in this area amongst cryptographers focuses on the security property of an SMP called Post Compromise Security (sometimes also called “Future Security” or “Channel Healing”). In a nutshell, PCS means that even though participants in a messaging session may be compromised at times, the security (i.e. privacy, integrity, authenticity, etc.) of such a session should eventually be restored through normal use once the adversary loses access to compromised devices. For example, access could be lost when the device owner uninstalls a malicious app or when they install a security update closing a vulnerability being exploited by the adversary.

Given how long many messaging sessions last (often for years), it is not surprising that this security notion has caught the attention of both industry and academic cryptographers. In fact, some of Wickr’s own peer reviewed research in the area focuses (amongst other things) on precisely characterizing and identifying what type of PCS is obtained by Double Ratchet-based protocols (such as Signal, WhatsApp, Wire, etc.). For more on those results and a much more detailed exposition of the state-of-the-art in PCS research have a look at Wickr’s SMP research program.

I think it’s safe to say that much remains to be understood around PCS. Generally, PCS can be understood as a class of security goals rather than a concrete notion. The quicker and easier that a protocol recovers — and the more invasive the type of compromise it can recover from — the “better” the concrete version of PCS the protocol achieves. Currently, we know that deployed SMPs like Wickr and the Double Ratchet-based SMPs satisfy reasonable (though incomparable) notions of PCS. However, recent academic results show that significantly stronger PCS variants are, at least in theory, still achievable. What’s more, when it comes to group SMP (like MLS or ART), we know next to nothing about the type of PCS they achieve, let alone what the spectrum of achievable PCS might look like when considering arbitrary practical group SMPs. It will be very interesting to see how our understanding and the feasibility of PCS evolves with new research in the future.

Group Messaging

From the perspective of the day-to-day use of secure messaging, one of the most straightforward focuses of research is the goal to support much larger groups.

Currently, all SMPs used in practice have the following in common: when sending to a group of N recipients, the amount of computation and bandwidth required by the sender both scale linearly in N. For example, for every additional user, an extra call to AES encryption is needed and an extra 512 bytes of data must be sent out on the network. Roughly speaking, a Secure Group Messaging Protocol (SGMP) should allow the sender’s computation and bandwidth to scale logarithmically in the number N of recipients. So, for example, the amount of computation and bandwidth today’s protocols need to send a message to 12 recipients should be enough that using an SGMP one could send to 212 = 4096 recipients!

While there are several closely related areas in cryptography which consider similar settings and constraints, none seem to really capture all of the key functionality and security requirements one would expect of an SGMP. In particular, an SGMP should allow for fully asynchronous communication, both PCS and FS security, a dynamic user base with participants continually joining and leaving multiple concurrent sessions, all while not making use of any sort of trusted central authority (e.g. for generating secret key material or sampling a common reference string). This means that group messaging remains an almost entirely unexplored topic of cryptographic research.

The Asynchronous Ratchet Tree Protocol. One of the rare exceptions is the recent paper on the Asynchronous Ratchet Tree SGMP protocol. While it lacks formal definitions or proofs, the paper does introduce some great new ideas for how to build an SGMP by introducing the ART protocol. Put simply, an ART messaging session revolves around a group secret known to all group members. As time goes by, the group secret is constantly being updated by the group members. Each new update begins a new ‘epoch’ with a fresh group secret. All messages sent during an epoch are encrypted (and authenticated) using keys derived in an FS secure way from the current group secret. ART also has a type of PCS security. Suppose some group member’s internal state leaks to an adversary (including all of their secret key material and group secrets). Once all members have initiated an update, the adversary can no longer use the leaked state (and network traffic) to compute the new group state. That is, privacy and authenticity for the session are restored.

Technically, my favorite part of ART is the Ratchet Tree data structure it uses for updating the group secret. ART arranges the N group members as leaves in a binary tree of depth log(N). Each node in the tree is assigned a public/secret key pair that also evolve over time (to help with FS and PCS security). The root keys are used to compute the group secret. As an invariant, each member always knows all (and only) the secret keys for all nodes on the path from the user’s leaf up to the root of the tree. Very roughly speaking, whenever a member updates keys in the Ratchet Tree (including the group secret), they can disseminate the new secrets to the rest of the group by encrypting them to the sibling nodes of all the nodes on the path from their leaf to the root. Crucially, for every other leaf in the tree, at least one of those siblings is contained on that leaf’s path to the root. So, by the invariant mentioned above, the member assigned to such a leaf can decrypt and recover the updated Ratchet Tree. In other words, using only log(N) cipher texts, the updated information can be disseminated to the entire group!

Messaging Layer Security. This brings us to the bleeding edge of R&D around SGMP; namely the Messaging Layer Security (MLS) work group. In my opinion, this is probably the most important ongoing project in the entire area of secure messaging protocols. MLS is the name of a new SGMP protocol being developed under the auspices of an eponymous IETF workgroup consisting of many of the key industry players in secure messaging (including Wickr), as well as a growing list of academic cryptographers from around the world. The goal is to develop a new, cryptographically principled SGMP as an open protocol for use by the wider industry as a foundation for building asynchronous secure communication platforms. It is roughly based on the paradigm I just described above from the ART paper. But it goes much further in fleshing out many practical considerations: e.g. how to handle members joining and leaving during a session, how to implement crypto and version agility, support for a post-quantum secure variant, federation, and much more. It is a work in progress and, frankly speaking, a gold mine in terms of well-motivated open cryptographic research problems.

Message Franking

Another topic where the boundaries of modern SMPs are being pushed forward is Message Franking. First introduced by Facebook in 2016 when they rolled out a Messaging Franking solution for their Messenger platform, the topic has since been picked up by various teams of cryptographers.

Basically, the goal of a message franking system is to provide users with the means of proving receipt of content to third parties while not breaking E2E privacy. In Facebook’s case, the purpose was to introduce verifiable abuse reporting for their users in a way that only leaks the content of encrypted comms to Facebook when the receiver decides to report the content as abusive. The system was designed to be “verifiable” in the sense that it shouldn’t be possible for a malicious user to convince Facebook that it received some fake content (or from a different sender).

It turns out that building such a system is actually a fair bit more subtle than one might expect. Indeed, after slew of research papers in 2017 and 2018 laid out a more rigorous, fine-grained, and principled approach to what exactly such a system should achieve and how they can be built, a group of cryptographers from Cornell, NYU, and the Royal Holloway University Of London took another look at Facebook’s construction. Unfortunately, they found they could actually break the system in a very strong sense. They were able to send two attachments that would decrypt to two different JPGs (representing arbitrary pictures). Yet, when the receiver reported the second as abusive, Facebook would only see the first JPG as having been sent twice. Central to the attack was a technique to produce an (AES-GCM) cipher text for the Messenger protocol that, depending on which of two keys was used to decrypt it, results in one of two JPGs representing any pair of target pictures. Of course, the researchers also introduced a new construction which prevents this attack. Moreover, they proved that it satisfies a mathematical definition of security for what a Message Franking scheme should achieve. Still many questions remain unanswered in this subfield and I expect further results to come out soon.