29 Nov 2012
I just read the paper on Teredo published in the Computer Communication Review: "Investigating the IPv6 Teredo Tunnelling Capability and Performance of Internet Clients" by Sebastian Zander, Lachlan L. H. Andrew, Grenville Armitage, Geoff Huston and George Michaelson. This is a very well done study, which used image links on web sites to test whether clients could use IPv4 and IPv6, and to distinguish native IPv6 connections from Teredo connections. Their conclusion is that many more hosts would use IPv6 if Microsoft shipped Teredo in the "active" instead of the "dormant" state in Windows Vista and Windows 7, but that communications using Teredo incur some very long delays, at least 1.5 seconds more to fetch a page than with native IPv4 or IPv6 connections. Both of these issues can be traced to specific elements of the protocol design, and especially to our emphasis on security over performance.
I proposed the first Teredo drafts back in 2000, shortly after joining Microsoft. The idea was simple: develop a mechanism to tunnel IPv6 packets over the IPv4 Internet, in a fashion that works automatically across NAT and firewalls. It seemed obvious, but was also quite provocative – the part about working automatically across firewalls did not sit well with firewall vendors and other security experts. In fact, this was so controversial that I had to revise the proposal almost 20 times between July 2000 and the eventual publication of RFC 4380 in February 2006. Some of the revisions dealt with deployment issues, such as minimizing the impact on the server, but most were answers to security considerations. When Microsoft finally shipped Teredo, my colleagues added quite a few restrictions of their own, again largely to mitigate security issues and some deployment issues.
The connection between Teredo clients and IPv6 servers is certainly slowed down by decisions we made in the name of security. When a Teredo client starts a connection with an IPv6 server, the first packet is not the TCP "SYN," but rather a Teredo "bubble" encapsulated in an IPv6 ICMP echo request (ping) packet. The client will then wait to receive a response from the server through the Teredo relay closest to the server, and will then send the SYN packet through that relay. Instead of a single round trip, we have at least two, one for the ICMP exchange and another for the SYN itself. That means at a minimum twice the setup delay. But in fact, since the client is dormant, it will first send a qualification request to the Teredo server, to make sure that the server will be able to relay the exchange, thus adding another round trip, for a total of three. The Teredo servers are often quite overloaded, and the queuing delays in the servers can add quite a bit of latency. This is very much what the study is demonstrating.
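The round-trip count above can be turned into a rough back-of-the-envelope model. The sketch below is only an illustration: the mode names, the `setup_delay_ms` function and the sample numbers are hypothetical, and it assumes every round trip takes the same time, which is not true in practice since the qualification and ICMP exchanges go through the Teredo server rather than straight to the destination.

```python
# Round trips needed before the TCP handshake completes, as counted in the text.
# The mode names are illustrative labels, not protocol terms.
ROUND_TRIPS = {
    "native": 1,          # SYN / SYN-ACK only
    "teredo_active": 2,   # ICMP echo exchange, then SYN / SYN-ACK
    "teredo_dormant": 3,  # qualification, ICMP echo exchange, then SYN / SYN-ACK
}


def setup_delay_ms(mode: str, rtt_ms: float, queuing_ms: float = 0.0) -> float:
    """Estimate connection setup delay in milliseconds.

    Simplifying assumption: all round trips cost the same rtt_ms.
    queuing_ms lumps together any extra delay in overloaded servers.
    """
    return ROUND_TRIPS[mode] * rtt_ms + queuing_ms


if __name__ == "__main__":
    for mode in ROUND_TRIPS:
        print(mode, setup_delay_ms(mode, rtt_ms=100.0))
```

Even with no queuing at all, the dormant case costs three times the native setup delay in this model; any server queuing comes on top of that.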
We could have engineered the exchange for greater speed. For example, instead of first sending a ping to the server, we could have just sent the TCP SYN to the server, and used the SYN response to discover the relay. This would probably have increased the connection success rate, as many servers are "protected" by firewalls that discard the ICMP packets. But at the time we convinced ourselves that it would be too easy to attack. A hacker could send a well-timed spoofed TCP SYN response and hijack the connection. The randomness of the source port number and of the initial TCP sequence number provides some protection against spoofing, but these are only 16 and 32 bits, and that was deemed too risky. The ICMP exchange, in contrast, can carry a large random number and is almost impossible to spoof by hackers not in the path. So the protocol design picked the "slow and secure" option.
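To put rough numbers on that trade-off, here is a small calculation of how hard a blind, off-path guess would be in each case. It is a sketch under stated assumptions: the 64-bit nonce size is picked for illustration (the text only says "a large random number"), the 48-bit figure for TCP is a best case that assumes both the port and the sequence number are fully random and independent, and `blind_guess_probability` is a hypothetical helper.

```python
TCP_PORT_BITS = 16
TCP_SEQ_BITS = 32
# Best case for TCP: port and sequence number are both fully random.
TCP_BITS = TCP_PORT_BITS + TCP_SEQ_BITS

# Assumption for illustration only: the ICMP payload carries a 64-bit nonce.
NONCE_BITS = 64


def blind_guess_probability(random_bits: int, attempts: int) -> float:
    """Chance that an off-path attacker hits the right value
    when sending `attempts` distinct spoofed packets."""
    space = 2 ** random_bits
    return min(1.0, attempts / space)


if __name__ == "__main__":
    attempts = 1_000_000  # spoofed packets sent within the response window
    print("TCP SYN response:", blind_guess_probability(TCP_BITS, attempts))
    print("ICMP nonce:      ", blind_guess_probability(NONCE_BITS, attempts))
    print("search space ratio:", 2 ** (NONCE_BITS - TCP_BITS))
```

Under these assumptions the nonce makes the search space 65,536 times larger than the TCP best case, and the real TCP margin is smaller still whenever ports or sequence numbers are predictable.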
The connection to IPv6 hosts is just one example of these design choices favoring security over performance. There are quite a few other parts of the protocol where we could have chosen more aggressive options, using optimistic early transmission instead of relying on preliminary synchronization. But we really wanted to deliver a secure solution; secure computing was indeed becoming a core tenet of the Microsoft culture by that time. We were also concerned that if Teredo was perceived as insecure, more and more firewalls would simply block it, and our deployment efforts would fail. All of these were valid reasons, but the long latencies observed in the study are also an impediment to deployment. If I did it again, I would probably shift the balance a bit more towards performance.
But then, the really cool part of the study is their point that removing some of the restrictions on Teredo would almost triple the number of hosts capable of downloading Internet content, adding IPv6 capability to 15-16% of Internet hosts. That would be very nice, and I would be happy to see that!