Christian Huitema's Latest Posts


The list of past posts, including those previously published on Wordpress, is available here.

Another Picoquic bug, not limiting the size of TLS messages

Posted on 05 Sep 2024

I just fixed another bad bug in Picoquic. The bug, found by Asakura Mizu, relates to unbounded buffering of the TLS stream.

The issue dates back to the decision in 2017 to embed TLS in QUIC, and to send the TLS data in a series of “crypto streams”. Initially, the crypto stream was just a data stream with the number 0, but after discussion we modeled the TLS stream as a set of three “crypto streams”, one for each stage of the TLS handshake: initial mode, handshake mode, and application mode.

The crypto streams are a calque of the data streams, with two key differences: they use their own “CRYPTO_HS” frames, similar to but different from the stream data frames, and there is no flow control. Instead of flow control, Section 7.5 of [RFC9000](https://www.rfc-editor.org/rfc/rfc9000.html) suggests limiting “data received in out-of-order CRYPTO frames”. The idea is that data received “in order” can be fed directly to the TLS controller and thus does not need to be buffered, so the limit only applies to “out of order” data. This is a bit more lenient than simply limiting the overall size of TLS messages, but not by much: if a TLS message is split into several packets and the first one is lost, the amount of out-of-order data will be more or less the size of the message.
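For illustration, here is a minimal sketch of that check, assuming a per-stream counter of buffered out-of-order bytes; the constant, the structure, and the function names are illustrative, not picoquic's actual code:

```c
/* Minimal sketch of the check suggested by Section 7.5 of RFC 9000: track how
 * many bytes are held in out-of-order CRYPTO frame buffers and refuse to
 * buffer more than a fixed limit. */
#include <stdint.h>
#include <stddef.h>

#define CRYPTO_OUT_OF_ORDER_LIMIT 16384 /* RFC 9000 suggests 4096; see below */

typedef struct {
    uint64_t next_offset;  /* first crypto byte not yet delivered to TLS */
    size_t buffered_bytes; /* bytes currently held in out-of-order buffers;
                            * released as gaps fill and data is delivered */
} crypto_stream_state_t;

/* Returns 0 if the frame can be accepted, -1 if the connection should be
 * closed with a CRYPTO_BUFFER_EXCEEDED error. */
int crypto_frame_check(crypto_stream_state_t *s, uint64_t offset, size_t length)
{
    if (offset <= s->next_offset) {
        /* In-order (or overlapping already-delivered) data can be fed
         * straight to the TLS stack, so it does not count against the limit. */
        return 0;
    }
    if (s->buffered_bytes + length > CRYPTO_OUT_OF_ORDER_LIMIT) {
        return -1; /* too much out-of-order handshake data buffered */
    }
    s->buffered_bytes += length;
    return 0;
}
```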

Somehow, I missed the memo and did not implement this “crypto flow control” in Picoquic. Thanks to Asakura Mizu for reminding me! The fix was easy: add a check of the amount of out-of-order data in the code, then add a test in the test suite to verify it. To write the test I had to use error injection, which should have been simple but needed a fix. Of course I made a mistake in writing this fix, but the fuzz test in the test suite caught the error. The usual.

And then, I found out that the 4096 bytes limit suggested in Section 7.5 of [RFC9000](https://www.rfc-editor.org/rfc/rfc9000.html) caused one existing test to fail. That test verifies that the code can abide by the amplification limit of 3 specified in RFC 9000 even if the server’s first flight is very large, 8KB in my test, requiring 8 packets. The test anticipates the large messages that may be needed if we handle the large keys of Post Quantum algorithms. And the test includes injection of packet losses, thus almost 8KB of “out of order crypto data” — which showed that the 4KB limit was too low. I bumped it to 16KB in the code, and that works now.

I can’t help thinking that this “maximum amount of out of order TLS data” should be exposed in the QUIC protocol. The current state is that each host picks a limit, but that means senders do not know exactly how much data they can send. It leads to a game of handshake roulette: send more data than the peer’s limit and it will work very well most of the time, but cause a connection failure if the wrong packet is lost. A sender that wants to be safe should treat the limit as a flow control window, initially set to the lowest common denominator (4096 bytes) and opening progressively as packets are acknowledged. That will work, but be very slow. Maybe we should have a transport parameter advertising the local value of the window. It would not help the very first packets, but it would help everything else. In particular, if the largest TLS message is the server first flight, knowing the client’s flow control limit would help send that flight quickly and safely.
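A minimal sketch of that conservative sender strategy, with illustrative names: the window edge is the amount of acknowledged crypto data plus the 4096-byte floor, and it slides forward as acknowledgements arrive. This is conservative, since the peer's limit only applies to out-of-order data, but it is safe against any compliant receiver:

```c
/* Sketch: treat the peer's unknown reassembly limit as a flow control window,
 * starting at the 4096-byte floor suggested by RFC 9000 and opening as crypto
 * data is acknowledged. Illustrative only, not picoquic's actual code. */
#include <stdint.h>

#define CRYPTO_WINDOW_FLOOR 4096

typedef struct {
    uint64_t bytes_sent;  /* highest crypto stream offset sent so far */
    uint64_t bytes_acked; /* contiguously acknowledged crypto bytes */
} crypto_sender_t;

/* How many more crypto bytes may safely be sent right now. */
uint64_t crypto_send_allowance(const crypto_sender_t *s)
{
    /* Acknowledged data has been delivered to the peer's TLS stack, so the
     * window slides forward as acknowledgements arrive. */
    uint64_t window_edge = s->bytes_acked + CRYPTO_WINDOW_FLOOR;
    return window_edge > s->bytes_sent ? window_edge - s->bytes_sent : 0;
}
```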

Attacking the QUIC handshake

Posted on 18 Aug 2024

QUIC is an encrypted control protocol. That’s a big difference from TCP, and it prevents some of the known attacks against TCP, such as a third party sending a TCP “Reset” packet (RST) and closing a TCP connection. In QUIC, the equivalent of the RST packet would be a “Connection Close”. It will only be accepted if sent inside an encrypted packet, and once the connection is established only the two endpoints can send that. But there is a catch. The cryptographic keys are negotiated during the TLS handshake carried in the initial packets. These packets are encrypted, but the encryption key is a function of the initial connection identifier and the QUIC version number. We can thus have the following scenario:

- Alice sends an Initial packet to Bob to set up a connection;
- Eve, who can observe the traffic, reads the initial connection identifier and the version number from that packet, and computes the encryption keys;
- Eve sends Alice a spoofed packet that appears to come from Bob, such as a “Connection Close”, and the connection attempt fails.
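To see why Eve can compute the keys, here is a sketch of the first step of the Initial secret derivation from Section 5.2 of RFC 9001, written against OpenSSL 1.1.1+ (link with -lcrypto). Both inputs are public: the salt is a constant published in the RFC, and the Destination Connection ID is read from the clear-text header of the Initial packet. The DCID below is the test vector from Appendix A of RFC 9001, so the output can be checked against the values printed there:

```c
/* Sketch: HKDF-Extract step of the QUIC v1 Initial secret (RFC 9001, 5.2).
 * Everything needed is visible on the wire, so any observer can do this. */
#include <stdio.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>

/* Published salt for QUIC version 1, from RFC 9001. */
static const unsigned char quic_v1_salt[20] = {
    0x38, 0x76, 0x2c, 0xf7, 0xf5, 0x59, 0x34, 0xb3, 0x4d, 0x17,
    0x9a, 0xe6, 0xa4, 0xc8, 0x0c, 0xad, 0xcc, 0xbb, 0x7f, 0x0a };

int main(void)
{
    /* DCID read from the Initial packet header; this value is the test
     * vector from Appendix A of RFC 9001. */
    unsigned char dcid[8] = { 0x83, 0x94, 0xc8, 0xf0, 0x3e, 0x51, 0x57, 0x08 };
    unsigned char initial_secret[32];
    size_t out_len = sizeof(initial_secret);
    EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);

    if (ctx == NULL || EVP_PKEY_derive_init(ctx) <= 0 ||
        EVP_PKEY_CTX_hkdf_mode(ctx, EVP_PKEY_HKDEF_MODE_EXTRACT_ONLY) <= 0 ||
        EVP_PKEY_CTX_set_hkdf_md(ctx, EVP_sha256()) <= 0 ||
        EVP_PKEY_CTX_set1_hkdf_salt(ctx, quic_v1_salt, sizeof(quic_v1_salt)) <= 0 ||
        EVP_PKEY_CTX_set1_hkdf_key(ctx, dcid, sizeof(dcid)) <= 0 ||
        EVP_PKEY_derive(ctx, initial_secret, &out_len) <= 0) {
        fprintf(stderr, "HKDF-Extract failed\n");
        return 1;
    }
    /* The client and server Initial secrets follow by HKDF-Expand-Label with
     * the labels "client in" and "server in", also computable by the observer. */
    for (size_t i = 0; i < out_len; i++) {
        printf("%02x", initial_secret[i]);
    }
    printf("\n");
    EVP_PKEY_CTX_free(ctx);
    return 0;
}
```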

In practice, this looks a lot like the RST attack against TCP, the main difference being that the attacker has to be quick – if Bob’s response arrives before Eve’s message, the attack will fail. This is not a new attack that I just discovered. It was discussed at length during the standardization of QUIC. There is in fact a whole class of such attacks – Eve could use Version Negotiation packets, Retry packets, spoofed Client Hello packets, or packets that contain a protocol error. Eve could also send spoofed packets to Bob, such as a “connection close” packet that arrives after Alice’s first message but before Alice’s response to Bob’s initial message. Fixing all that was deemed too difficult, and so RFC 9001 merely states that “After completing the TLS handshake, the client will have learned and authenticated an identity for the server, and the server is optionally able to learn and authenticate an identity for the client.”

Back in 2019, Kazuho Oku and I proposed an [Authenticated Handshake for QUIC](https://datatracker.ietf.org/doc/draft-kazuho-quic-authenticated-handshake/). The idea was to use a shared secret established before the connection to authenticate the Initial messages, and prevent third parties from injecting spoofed messages. The draft is now expired, maybe because the requirement to share secrets before the connection was not practical. There were later attempts to use the Encrypted Client Hello to establish a shared secret and protect the handshake, but that too did not work, because the ECH derived secret is only available too late to protect the first initial packets. Overall, it seems that protecting the QUIC handshake is one of those ideas that comes up periodically and always fails to gain traction, maybe because we have not seen many of these RST-style attacks yet, so there is no urgency. But I would rather see security developed before the attacks rather than after them.

Maybe it is time to revisit these old drafts, and come up with a robust solution that does not require shared secrets at all.

Token overflow in picoquic

Posted on 12 Mar 2024

I was just checking my mail today when I found a new issue filed in picoquic: Stack-based buffer overflow in picoquic_verify_retry_token. Oh well, time to eat some humble pie. The issue happens in the function that decodes the “Retry Token” - see Section 8.1.2 of RFC 9000. This function tries to decrypt a retry token into `uint8_t text[128]`. This token is received from the network, so it could be crafted by a malicious actor. There’s no explicit restriction on its length, so there’s no guarantee the result will fit into the buffer. The length of the buffer is not passed to the decryption function, which just assumes it is big enough. This could result in a stack overwrite with potential exploit implications, or in a denial of service.

This was easily fixed, but it shows a couple of interesting issues. First, a classic failure to check an external input. When I wrote that code, I was assuming that clients would only send the tokens provided by the proper server, and that in any case, since the tokens were encrypted, any tampering would result in a decryption error and cause the token to be rejected. But the particular decryption procedure that I use will overflow if the decryption buffer is too small. The fix was to test that.
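The shape of the fix is a length check before decryption. A hedged sketch, with illustrative names and an assumed 16-byte AEAD tag, not the actual picoquic code:

```c
/* Sketch of the fix: bound the incoming token length before handing it to a
 * decryption routine that writes into a fixed 128-byte stack buffer. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define RETRY_TOKEN_TEXT_MAX 128
#define AEAD_CHECKSUM_LENGTH 16 /* assumed AEAD tag size */

int verify_retry_token(const uint8_t *token, size_t token_length)
{
    uint8_t text[RETRY_TOKEN_TEXT_MAX];

    /* The decrypted text is the token minus the AEAD tag. Since the token
     * comes from the network and may be attacker-controlled, reject anything
     * that cannot fit in the buffer before decrypting. */
    if (token_length < AEAD_CHECKSUM_LENGTH ||
        token_length - AEAD_CHECKSUM_LENGTH > sizeof(text)) {
        return -1;
    }
    /* ... decrypt the token into text[] and validate its contents ... */
    (void)memset(text, 0, sizeof(text));
    return 0;
}
```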

The second issue is that this should have been found in testing. Picoquic has extensive tests, including fuzzing of initial packets. But the fuzzing process was not programmed to create tokens of arbitrary sizes, so the bug was not detected.
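For completeness, here is a sketch of the kind of fuzz case that was missing: drive the token decoder with lengths drawn from the whole range, well past the 128-byte buffer. The verifier stub below is hypothetical and stands in for the real function, which takes more arguments; the point is the length distribution, not the call:

```c
/* Sketch of the missing fuzz case: feed the token decoder inputs of arbitrary
 * length, including lengths well past the 128-byte decryption buffer. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real token verifier. */
static int verify_retry_token_stub(const uint8_t *token, size_t token_length)
{
    uint8_t text[128];
    /* A correct implementation rejects tokens whose decrypted size cannot
     * fit in text[] instead of decrypting blindly. */
    if (token_length > sizeof(text)) {
        return -1;
    }
    memcpy(text, token, token_length); /* placeholder for decryption */
    return 0;
}

int main(void)
{
    uint8_t token[1024];

    srand(12345); /* fixed seed so failures are reproducible */
    for (int i = 0; i < 100000; i++) {
        /* Draw lengths across the whole range, not just "plausible" sizes. */
        size_t len = (size_t)rand() % sizeof(token);
        for (size_t j = 0; j < len; j++) {
            token[j] = (uint8_t)rand();
        }
        (void)verify_retry_token_stub(token, len);
    }
    return 0;
}
```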

Well, nobody is perfect…

The Retire Connection ID stuffing Attack against QUIC

Posted on 12 Mar 2024

Back in December, Marten Seemann found an attack against QUIC. Malevolent clients could exploit the “path validation” mechanism to create large queues of “Path Response” messages, eventually saturating the server’s memory. Back then, it turned out that the picoquic implementation was not vulnerable, because it only responded to the last challenge received. Marten has since found another similar issue, this time exploiting the handling of “NEW CONNECTION ID” frames. And this time, I did have to fix the picoquic code.

QUIC was designed with security in mind. There were lots of reviews, lots of discussions, and the development of formal proofs that the security of QUIC is equivalent to that of TLS 1.3. There are known limits to that security: on-path observers can cause the initial handshake to fail before the session keys are fully negotiated; they can “fingerprint” the encrypted traffic and match it to known patterns; on-path attackers can drop packets or mess with IP headers. These are very hard issues, the kind that will need serious efforts to resolve – QUIC is not better than TLS 1.3 in that regard. But outside of that, most attacks against QUIC are attacks against implementations that did not correctly implement the specification. Marten’s attacks are different, because they are based on the QUIC specification itself, and potentially affect every implementation.

QUIC packet headers include a “connection identifier” that allows nodes to link packets to connection contexts. The connection IDs are created by the QUIC node that will eventually decode them from packet headers: the client uses connection IDs provided by the server in the packets sent to the server, and vice versa. To facilitate that, each node sends “NEW CONNECTION ID” frames to its peer. And to control the use of resources, both nodes tell their peer how many connection IDs they are willing to receive – trying to send more than that triggers a protocol violation. When a node does not need an old connection ID anymore, it sends a “RETIRE CONNECTION ID” frame, and the peer will provide a new one to replace it.

But there is a catch. Servers and clients need to maintain a list of valid connection identifiers so they can successfully direct packets to connection contexts. In big server farms, this is done by the load balancers, and it is often done using encryption and decryption: the connection ID typically contains bytes encrypting a server identifier and a random number, with a key known to the load balancer, see for example the QUIC-LB draft. These keys are rotated regularly, and when an old key is discarded, the corresponding connection IDs need to be discarded and new ones need to be provided. The new connection IDs will be carried in “NEW CONNECTION ID” frames, with an attribute asking to retire connection IDs with a sequence number lower than a “retire prior” value.

This is the mechanism exploited in the attack. The malevolent client sends a series of “NEW CONNECTION ID” frames saying something like “this is connection ID number N, please retire connection ID number N-1”. The server will accept each frame because, once the previous connection ID has been retired, the total number of connection IDs provided by the peer remains at or below the maximum number allowed. The server will also send a “RETIRE CONNECTION ID” frame confirming that the old connection ID number “N-1” has been retired.
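A toy model shows why the protocol check never fires: each round retires one connection ID and installs one new one, so the count of active CIDs never exceeds the advertised limit, while the queue of confirmations grows without bound. The limit value and the names are illustrative:

```c
/* Sketch of why the server accepts the attack frames: the active CID count
 * stays legal on every round, yet one RETIRE_CONNECTION_ID frame is queued
 * per round, and nothing in the protocol bounds that queue. */
#include <stdint.h>
#include <stdio.h>

#define ACTIVE_CID_LIMIT 4 /* value advertised as active_connection_id_limit */

int main(void)
{
    uint64_t active_cids = ACTIVE_CID_LIMIT;
    uint64_t queued_retire_frames = 0;

    /* Attacker's loop: "this is CID number n, please retire CID number n-1". */
    for (uint64_t n = ACTIVE_CID_LIMIT; n < ACTIVE_CID_LIMIT + 100000; n++) {
        active_cids -= 1;          /* the "retire prior" request retires one CID */
        active_cids += 1;          /* the new CID takes its place */
        queued_retire_frames += 1; /* the server must confirm the retirement */

        /* The protocol check passes every time... */
        if (active_cids > ACTIVE_CID_LIMIT) {
            return 1; /* never reached: the attack stays within the limit */
        }
    }
    /* ...while the unregulated confirmation queue grows without bound. */
    printf("active CIDs: %llu, queued RETIRE_CONNECTION_ID frames: %llu\n",
        (unsigned long long)active_cids,
        (unsigned long long)queued_retire_frames);
    return 0;
}
```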

That, by itself, does not sound too bad, but wait. The client sends a series of “NEW CONNECTION ID” frames to force the server to send a series of as many “RETIRE CONNECTION ID” frames. Clients have many other ways to cause the server to send traffic, such as requesting web pages. But all other types of traffic are regulated by flow control mechanisms, which limit how much data will be queued. In contrast, there is no limit to the number of “RETIRE CONNECTION ID” frames that can be queued.

Still, these frames are very small, so they would normally not be queued for very long. The attack only works if the client manages to slow down the server, for example by only acknowledging a fraction of the packets that the server sends. This will cause the congestion control mechanisms to reduce the congestion window, eventually dropping to a minimum of 1 or 2 packets per RTT. Each such packet can carry several hundred “RETIRE CONNECTION ID” frames, so the queues will start to build up if the client sends thousands of “NEW CONNECTION ID” frames per RTT. There are certainly network scenarios in which that is doable. In those scenarios, the queues will build up, and the process memory will keep growing. Do that long enough and the server will run out of memory, especially if it is a small server, such as an embedded system.

The discussion above only considers the size of the queued frames, which is the only effect required by the specification. But implementation choices may make servers more sensitive to the attack. For example, some servers may keep the memory allocated for a connection ID until the “RETIRE CONNECTION ID” frame has not just been sent, but also acknowledged by the peer. In that case, the attack will cause much larger memory allocations. And then, the code handling connection IDs may be designed for the small number of CIDs normally expected; the attack could increase the size of the table, the handling would become inefficient, and the CPU load would increase.

The attack is not hard to mitigate: just limit the number of “RETIRE CONNECTION ID” frames that the server is willing to queue, then detect an anomaly and break the connection if the client sends more than that. But it is yet another example that designing network protocols is hard. Dozens of engineers have reviewed the QUIC specification, yet Marten found the issue three years after the specification was complete. We have since reviewed the specification again and could not find other similar issues. But I guess that we have to keep looking!
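The mitigation can be as simple as a counter check when queuing the frame. A sketch, with an illustrative cap value and illustrative names:

```c
/* Sketch of the mitigation: cap the number of RETIRE_CONNECTION_ID frames
 * waiting to be sent, and treat exceeding the cap as a protocol anomaly
 * that breaks the connection. */
#include <stdint.h>

#define MAX_QUEUED_RETIRE_CID 128 /* illustrative cap */

typedef struct {
    uint64_t queued_retire_frames; /* decremented as frames are sent and acked */
    int connection_broken;
} cid_queue_state_t;

int queue_retire_connection_id(cid_queue_state_t *s)
{
    if (s->queued_retire_frames >= MAX_QUEUED_RETIRE_CID) {
        /* A legitimate peer has no reason to force this much CID churn. */
        s->connection_broken = 1;
        return -1;
    }
    s->queued_retire_frames += 1;
    return 0;
}
```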

Oh, and if you are building a project using picoquic, the fix for that issue was merged on March 12 at 18:00 UTC. Please update to a recent version!

The long slow path to QUIC multipath

Posted on 28 Feb 2024

I started working on QUIC multipath in 2017, see this old draft. Seven years and 21 drafts later, we have still not converged on a solution. We thought we were close last November with version 6 of the IETF draft, but no, we had to go back to the drawing board.

At this point, my implementation of QUIC in picoquic supports three ways to implement multipath:

- a “simple” version, which keeps the single packet number space of RFC 9000;
- the design of version 6 of the IETF draft, with one packet number space per connection identifier;
- the latest “unique path identifier” proposal, with one packet number space per path.

The simple version made just the minimal set of changes to RFC 9000 to allow transmission over multiple paths in parallel. All application data packets were numbered from the same number space, regardless of the transmission path. This leaves the protocol simple, which is very good if we only need to support simple scenarios like “make a transition to the new path before breaking the old one.” On the other hand, if the scenario requires high speed transmission on parallel paths, implementers need to be very careful in the way they handle acknowledgements and detect packet losses. With enough care, the performance is within a few percent of the more complex solutions, but there is still a hole: we would need more work to properly track ECN marks per path. Most developers were convinced that we needed instead a solution that manages multiple number spaces, so we can keep the ACK management and loss discovery algorithms of RFC 9000, without extra complexity.

Using multiple number spaces means that packets sent on a path can be numbered “naturally”, following the order of transmissions on that path. That means we could have packets 1, 2, 3, 4, etc. on path A, and different packets with the same numbers on path B. Loss recovery is simpler, but we have a problem with encryption. QUIC relies on AEAD encryption, which requires a unique “nonce” per packet, and RFC 9001 solves that by using the 64 bit packet number of QUIC as a nonce. But if we have different packets with the same numbers, the nonce is not unique anymore, and simple tricks can be used to guess the contents of packets with colliding nonces. The solution is to build a 96 bit nonce that combines 64 bits of sequence number and up to 32 bits of path identifier.
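A sketch of such a nonce construction, following the RFC 9001 pattern of XORing a big-endian sequence into the static AEAD IV; the exact byte layout here is illustrative, not necessarily the one chosen by the draft:

```c
/* Sketch of a 96-bit multipath nonce: 32 bits of path identifier followed by
 * 64 bits of packet number, XORed into the static AEAD IV as in RFC 9001. */
#include <stdint.h>
#include <stddef.h>

void build_multipath_nonce(uint8_t nonce[12], const uint8_t static_iv[12],
    uint32_t path_id, uint64_t packet_number)
{
    uint8_t seq[12];

    /* Big-endian encoding: 4 bytes of path ID, then 8 bytes of packet number. */
    for (int i = 0; i < 4; i++) {
        seq[i] = (uint8_t)(path_id >> (8 * (3 - i)));
    }
    for (int i = 0; i < 8; i++) {
        seq[4 + i] = (uint8_t)(packet_number >> (8 * (7 - i)));
    }
    /* XOR with the static IV, exactly as RFC 9001 does for the packet number
     * alone; two packets collide only if both path ID and number collide. */
    for (size_t i = 0; i < 12; i++) {
        nonce[i] = (uint8_t)(static_iv[i] ^ seq[i]);
    }
}
```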

The version 6 draft does just that. The design starts by observing that all QUIC packets start with a “Connection Identifier” (CID) — a string of bytes that is unique to a connection. In principle, a CID is used on only one path, if only for privacy reasons. The CIDs are allocated by one node and sent to the other in “New Connection ID” frames, and are identified by a “CID sequence number”. We can use that sequence number to identify a path, but when we do that we are also using it to identify a packet number space. That works, but requires data structures in which packet numbers and ACK management are “per CID”, which is only an approximation of “per path”. It is a pretty good approximation and it works well most of the time, but there is a corner case.

QUIC nodes will often perform “CID renewal” for privacy reasons. Since the CID value is carried in each packet, renewing the CID breaks the easy correlation between an old series of packets with the old CID and the new series with the new one. That’s a gain for privacy. Draft 6 handles that easily: when a packet arrives with a new CID but the same source and destination IP addresses and port numbers as some previously defined path, we can handle that as “CID renewal”. We start a new packet number space, but we can retain old path properties like round trip time (RTT) measurements and congestion control state.

The paths carrying QUIC packets sometimes incorporate Network Address Translation devices (NAT). These NATs map an incoming IP address and UDP port to different outgoing values, a relation described as a “port mapping”. In principle, port mappings remain stable long enough for most connections, but sometimes they change in the middle of a connection. QUIC nodes detect a mapping change when they receive packets with the same CID as an old path, but a different IP address and port. They then apply “NAT rebinding” rules to continue the connection while keeping the timing and congestion control information of the path.

But what if the CID renewal happens at the same time as a NAT rebinding? This is not a theoretical question, because nodes will want to renew the CID when there is a significant time gap between the old series of packets and the new one, and time gaps are exactly when NATs may lose their port mappings. The simple “CID based” path identification breaks when NAT rebinding and CID renewal happen at the same time: the new series of packets will be treated as a completely new path, requiring new RTT measurements and reinitialization of the congestion control algorithm. Arguably this does not happen very often, and some rare reinitialization of congestion control is not a very big deal. But still, this is a weakness in the design.
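One way to see the corner case is as a two-by-two classification of incoming packets, where exactly one cell is ambiguous. A sketch, with illustrative names:

```c
/* Sketch of the corner case: classify an incoming packet by whether the CID
 * and the address 4-tuple match a known path. With CID-based identification,
 * the "new CID, new address" cell cannot be distinguished from a brand new
 * path, even when it is really CID renewal plus NAT rebinding on an old one. */
#include <stdbool.h>

typedef enum {
    PATH_SAME,          /* known CID, known address */
    PATH_NAT_REBINDING, /* known CID, new address: keep RTT and cwnd */
    PATH_CID_RENEWAL,   /* new CID, known address: keep RTT and cwnd */
    PATH_UNKNOWN        /* new CID, new address: must start from scratch */
} path_class_t;

path_class_t classify_packet(bool cid_known, bool address_known)
{
    if (cid_known) {
        return address_known ? PATH_SAME : PATH_NAT_REBINDING;
    }
    return address_known ? PATH_CID_RENEWAL : PATH_UNKNOWN;
}
```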

Enter the third proposal, which starts by redefining how CIDs are organized. RFC 9000 organizes a series of CIDs with unique sequence numbers within the connection. The new design organizes the set of CIDs per path, or rather per potential path. Each CID has a path identifier, and a sequence number that differentiates it from other CIDs on the same path. When doing CID renewal, nodes must replace the CID of the path with another one carrying the same path identifier. The encryption nonce is composed of the packet sequence number and the path identifier. The sequence number is per path instead of per CID. If NAT rebinding happens in the middle of a CID renewal, the path identifier derived from the CID tells us that this is the same path, and the nodes can retain their path information. It is a nice structure, and many path management functions become simpler.
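A sketch of what this organization might look like in memory, with illustrative names and sizes; the point is that the packet number space and the path properties live with the path, while the CIDs are subordinate records:

```c
/* Sketch of the "unique path ID" organization: CIDs are grouped per path and
 * carry a (path_id, sequence_number) pair, so a renewed CID on the same path
 * keeps the path identifier and the per-path packet number space. */
#include <stdint.h>

#define MAX_PATHS 8
#define MAX_CIDS_PER_PATH 4

typedef struct {
    uint8_t cid_bytes[20];
    uint8_t cid_length;
    uint64_t sequence_number; /* differentiates CIDs within the same path */
} path_cid_t;

typedef struct {
    uint32_t path_id;            /* also feeds the AEAD nonce */
    uint64_t next_packet_number; /* per-path packet number space */
    uint64_t rtt_smoothed_us;    /* survives CID renewal and NAT rebinding */
    uint64_t congestion_window;
    path_cid_t cids[MAX_CIDS_PER_PATH];
} multipath_path_t;

typedef struct {
    multipath_path_t paths[MAX_PATHS];
} multipath_state_t;
```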

Of course, this has a cost. Path management is simpler, but CID management is more complex. The proposal has to replace the New Connection ID and Retire Connection ID frames with new “Multipath” variants, and to introduce a mechanism to control the flow of New Connection IDs. There are still options to be discussed, such as whether the CID chosen by the sender and the receiver should have the same path number, or the details of NAT Rebinding. Plus there is the cost to implementers — it took me a week to implement the “unique Path ID” proposal in picoquic, and I probably need to work some more to reach the quality of the previous code.

In the old days, the IETF tended to find a solution that was good enough, ship it, gather experience, and then revise the standard later to fill the gaps. If we had followed that process, we could probably have published a QUIC Multipath RFC last year, or maybe the year before — draft 6 was definitely good enough, and the previous draft was probably OK as well. But we have decided instead to discuss all details before approving the draft. The result will probably be better, although gathering experience sooner would also have helped improve quality. In any case, the process is not quick. As things go, we will be lucky if we converge on a proposal by the next IETF meeting in July 2024, and even November 2025 will be challenging!