Christian Huitema's blog

Cloudy sky, waves on the sea, the sun is shining

Locator-ID split in a new transport protocol

15 Apr 2012

It seems that Internet engineers like to periodically revisit old discussions. For example, every year or so, there will be intense exchanges on the IETF mailing list about the continued use of ASCII for formatting RFCs. The discussion will explore a variety of options before settling on the status quo. Of course, ASCII RFCs are just one of these recurring topics. There are a couple more, notably the "ID-locator separation," which periodically resurfaces as one of the opportunities that were supposedly missed during the selection of IP next generation, now IPv6.

The ID-locator separation argument is fundamentally about routing, and how to scale the routing infrastructure to match an ever larger Internet. In the current Internet, IP addresses have two roles. They tell the network the destination of a packet, i.e. the location of the target, and they tell the TCP layer the context of the packet, i.e. the identification of the peer. The argument is that if we separated these two roles, for example by splitting the address in two parts, we could use one part as a host identifier that never changes, and another part as a "locator" that could be changed over time to accommodate multi-homing and mobility, or to facilitate NAT traversal. TCP connections would be identified by the identifier part, and that same identifier part could also be used by firewalls and other filters.
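
To make the dual role concrete, here is a toy sketch of how a TCP-like stack finds connection state today: the lookup key is the 4-tuple of addresses and ports, so the same peer arriving from a new address looks like a stranger. The table layout and the function names (open_connection, deliver) are illustrative assumptions, not taken from any real stack.

```python
# Toy model: connection state keyed by the address 4-tuple, as TCP does today.
connections = {}


def open_connection(local_ip, local_port, remote_ip, remote_port):
    """Record an established connection under its 4-tuple."""
    key = (local_ip, local_port, remote_ip, remote_port)
    connections[key] = {"state": "ESTABLISHED", "bytes_received": 0}
    return key


def deliver(local_ip, local_port, remote_ip, remote_port, payload):
    """Return True if the segment matches a known connection."""
    conn = connections.get((local_ip, local_port, remote_ip, remote_port))
    if conn is None:
        return False  # a real stack would typically answer with a reset
    conn["bytes_received"] += len(payload)
    return True


open_connection("192.0.2.10", 443, "198.51.100.7", 50000)

# Same peer, same address: the segment is accepted.
same_addr = deliver("192.0.2.10", 443, "198.51.100.7", 50000, b"hello")

# Same peer after moving to another network: the address is part of the
# connection's identity, so the lookup fails.
moved = deliver("192.0.2.10", 443, "203.0.113.9", 50000, b"hello")
```

Here the address is both the locator and part of the identifier, which is exactly the coupling that the separation argument wants to remove.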

The proposal looks nice, but it is actually quite hard to deploy. The proposal makes most sense if the locators can be rewritten inside the network, e.g. when crossing the boundaries between management areas, but that can only be done if we design and deploy a new service that retrieves the appropriate "locator" for the identifier. This seems expensive, and the practical solutions are merely variations of NAT, which only rewrite the locators when entering a "private" network. But more importantly, the proposals belong to the "infrastructure" category, i.e. new functions that must be deployed by everybody before anybody reaps the full benefits. That means deploying new TCP stacks in pretty much every host and every router of the Internet. We saw with IPv6 how long this kind of deployment takes. See you back in 10 years!

Even if we could actually deploy it, we would have to resolve two nasty issues, security and privacy. The current linkage of address and identification provides a minimal form of security, by checking that the return address works. If host A sends a message to host B and receives back a valid response, host A can reasonably assume that the initial message was delivered to the intended address, and that the response is indeed coming from host B. The assumption is false if the packet routing was somehow hacked, but hacking packet routing is hard, so this provides some level of protection. If there is no linkage between location and identity, if locators can be changed at will, we lose that minimal security.

Today, hosts get new addresses when they move to different network locations. That provides some measure of privacy. Of course, hosts can still be tracked by other means, e.g. cookies, but at least the address is different. If we give hosts a strong "network identifier" that will not change with network location, we enable a new way of tracking. If the identifier cannot be changed, then network services can track users using their network identifier, even if these users take the precaution of destroying the web cookies in their browsers. Of course, the privacy issue could be mitigated by changing the identifier often, but that contradicts the idea of a stable identifier and a variable locator.

The more I think about it, the more I believe that the locator-ID split would be better addressed by building the ID into the transport protocol, instead of trying to redefine the network layer. Basically, the IP addresses with all their warts and their NATs become the locators, and the identifiers are entirely managed by the transport protocol. Of course, changing TCP is just as hard as changing IP, but we don't necessarily have to "change TCP." There can only be a single network protocol, but there can definitely be multiple transport protocols. We could define a new transport protocol for use by applications that want to manage multi-homing and mobility, or even NAT traversal. The broad lines are easily drawn:

Each connection has its own identifier, a large number that is independent of the IP addresses used to route the packet.
The identifier is negotiated during the initial exchange, using some cryptographic procedure.
The cryptographic procedure generates a secret key used to prove the authenticity of packets.
Packets can be sent or received through any pair of IP addresses, as long as they arrive at the destination.
Apart from the change in identifiers, the protocol behaves like TCP.
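
The broad lines above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the 16-byte identifier, the truncated HMAC-SHA256 tag, and the names Connection, Endpoint, seal and receive are all made up for the example, and the cryptographic negotiation is replaced by random bytes that both sides are simply assumed to share. There is no replay protection and no TCP-like reliability here.

```python
import hashlib
import hmac
import secrets

ID_LEN = 16   # assumed size of the connection identifier
TAG_LEN = 16  # assumed size of the truncated HMAC-SHA256 tag


class Connection:
    def __init__(self, conn_id, key, peer_addr):
        self.conn_id = conn_id      # identifier, independent of IP addresses
        self.key = key              # secret from the (unspecified) negotiation
        self.peer_addr = peer_addr  # current locator of the peer; may change


def seal(conn, payload):
    """Build a packet: connection ID, payload, then an authentication tag."""
    body = conn.conn_id + payload
    tag = hmac.new(conn.key, body, hashlib.sha256).digest()[:TAG_LEN]
    return body + tag


class Endpoint:
    def __init__(self):
        self.connections = {}  # keyed by connection ID, not by address

    def add(self, conn):
        self.connections[conn.conn_id] = conn

    def receive(self, packet, from_addr):
        """Return the payload of an authentic packet, or None."""
        if len(packet) < ID_LEN + TAG_LEN:
            return None
        conn = self.connections.get(packet[:ID_LEN])
        if conn is None:
            return None
        body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
        expected = hmac.new(conn.key, body, hashlib.sha256).digest()[:TAG_LEN]
        if not hmac.compare_digest(tag, expected):
            return None
        # The packet is authentic, so follow the peer to its new locator.
        conn.peer_addr = from_addr
        return body[ID_LEN:]


# Stand-ins for the negotiated identifier and key.
conn_id = secrets.token_bytes(ID_LEN)
key = secrets.token_bytes(32)

server = Endpoint()
server.add(Connection(conn_id, key, ("198.51.100.7", 50000)))
client_view = Connection(conn_id, key, ("192.0.2.10", 4433))

# The client has moved: the packet arrives from an address never seen before.
packet = seal(client_view, b"hello")
data = server.receive(packet, ("203.0.113.9", 61000))

# A packet with a corrupted tag is dropped and does not move the connection.
forged = server.receive(packet[:-1] + bytes([packet[-1] ^ 1]), ("192.0.2.66", 1))
```

The lookup never consults the source address; the address is only recorded, after authentication, as the place to send the next packet.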

This will provide a large part of the advantages of the ID-locator separation, without requiring any update to the network nodes. If I had to design it now, I would define an encapsulation on top of UDP, so that the packets could be sent across existing NATs. In fact, adding a NAT traversal function similar to ICE would enable deployment behind NAT. The main issue there is the design of the connection identifier negotiation, and its linkage to some form of host identity. There are many possibilities, from variations of Diffie-Hellman to designs similar to TLS. Clearly, that needs some work. But we would get a secure transport protocol that supports multi-homing, renumbering, mobility and NAT traversal. Seems like we should start writing the Internet drafts!
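
As a sketch of what the UDP encapsulation might look like, the snippet below frames a transport segment so that it can travel as the payload of an ordinary UDP datagram. The header layout is entirely hypothetical: a 1-byte version, a 16-byte connection identifier, and TCP-like 32-bit sequence and acknowledgment numbers are assumptions made for the example, not a proposed wire format.

```python
import struct

# Hypothetical header, network byte order:
#   1-byte version, 16-byte connection ID, 4-byte sequence number,
#   4-byte acknowledgment number.
HEADER = struct.Struct("!B16sII")


def encode(conn_id, seq, ack, payload, version=1):
    """Frame one segment as the payload of a UDP datagram."""
    return HEADER.pack(version, conn_id, seq, ack) + payload


def decode(datagram):
    """Split a received UDP payload into header fields and data."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram shorter than header")
    version, conn_id, seq, ack = HEADER.unpack_from(datagram)
    return version, conn_id, seq, ack, datagram[HEADER.size:]


conn_id = bytes(range(16))  # placeholder identifier for the example
datagram = encode(conn_id, seq=1000, ack=2000, payload=b"hello")
fields = decode(datagram)
```

The encoded bytes would then be handed to a regular UDP socket, for example with socket.sendto, so that NATs along the path only ever see UDP.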