Wednesday, June 13, 2018

Peering with Root DNS Servers

The Domain Name System is a hierarchical system for resolving host names to IP addresses and IP addresses back to host names. That is really handy, since ideally no one interacts with IP addresses directly and instead refers to servers by names like google.com or blog.thelifeofkenneth.com.

When you're resolving a hostname like blog.thelifeofkenneth.com, it's actually a multi-step process: you first figure out the DNS servers for the .com domain, then ask them where the name server for thelifeofkenneth.com is, then ask that server what the address for the "blog" host is. This process is well documented elsewhere, but what I'm particularly interested in is that first little step where you somehow find the first DNS server. This is done by asking one of the 13 root name servers (lettered A through M), whose addresses are hard coded into every recursive DNS implementation as a starting point for resolving any other name.
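
To make the order of those steps concrete, here is a minimal Python sketch that lists which suffix gets asked about at each step, starting from the root. The function name `delegation_chain` is made up for illustration, and it assumes one delegation per label, which isn't always true in practice (some zones span several labels).

```python
def delegation_chain(hostname: str) -> list[str]:
    """List the name suffixes a resolver works through, root first.

    Assumes every label is its own delegation, which is a simplification.
    """
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # the root, served by the A through M root servers
    # Walk from the rightmost label (the TLD) toward the full host name.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


for step in delegation_chain("blog.thelifeofkenneth.com"):
    print(step)
```

The root servers only ever handle the first hop: they hand back a referral to the .com servers and are out of the picture after that.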

I'm interested because I recently became part of the team running the Fremont Cabal Internet Exchange. IXPs often peer with root name servers to make the fabric more valuable, since root name servers tend to be really important for the other networks connecting to the IXP. This is possible because many of the root servers aren't implemented as one enormous DNS server in a specific place like you'd imagine, but are actually many identical copies of the same server, with every instance advertising the same anycast prefix.

This means that even though we're a small IXP in the bay area, there's a real chance that several of the root servers already have an instance close by, or would be willing to ship us the equipment to host one. We have spare rack space, so giving them the space and power is worth it to increase our value and make the Internet generally better.

For curiosity's sake, I've been stepping through the list of root DNS servers to try and find what information I can on them, and figured these notes would be useful for some small fraction of other people online.


  • A ROOT - Run by Verisign
    • Homepage
    • Status: Only hosted in six locations; Ashburn, Los Angeles, New York, Frankfurt, London and Tokyo
  • B ROOT - Run by University of Southern California, Information Sciences Institute
    • Homepage
    • Status: Only hosted in Los Angeles and Miami
  • C ROOT - Run by Cogent
    • Homepage
    • Status: Only hosted in 10 locations; LA, Chicago, New York, etc
  • D ROOT - Run by University of Maryland
    • Homepage
    • Status: 136 sites
    • Partially hosted by Woodynet (AS42), which means they're already in FMT2
  • E ROOT - Run by NASA Ames Research Center
    • Homepage
    • Status: 194 sites
    • Also partially hosted by Woodynet (AS42), which means they're already in FMT2
  • F ROOT - Run by Internet Systems Consortium
  • G ROOT - Run by Defense Information Systems Agency
    • Homepage
    • Status: Only 6 sites, none in California
  • H ROOT - Run by US Army Research Lab
    • Homepage
    • Status: Only 2 sites; San Diego and Aberdeen
  • I ROOT - Run by netnod
    • Homepage
    • Hosting Requirements: Contact info[at]netnod[dot]se
    • Peering Requirements
    • Status: 68 sites, including one somewhere in San Francisco 
  • J ROOT - Run by Verisign
    • Homepage
    • Requirements include:
      • 1U space, 2x power
      • 2x network, peering LAN and /29+/64 management interface
    • Status: Already somewhere in San Francisco 
  • K ROOT - Run by RIPE
    • Homepage
    • Hosting Requirements include:
      • Provide a Dell server with 16GB RAM, quad core, 2x500GB HDD, etc.
      • Public IPv6 address with NAT64
    • Status: Seems the physically closest one is on TahoeIX in Reno.
  • L ROOT - Run by ICANN
    • Homepage
    • Hosting Requirements:
      • Sign NDA
      • Purchase code named appliance to host inside own network
    • Status: Somewhere in San Jose, per their FAQ they are not joining any additional IXPs.
  • M ROOT - Run by WIDE Project
    • Homepage
    • Status: Somewhere in San Francisco, nine sites total

Summary:
  • Roots that will never be in the bay area: A, B, C, G, H
  • Roots already in the bay area: D, E, F, I, J, L, M
My rationale for the first list is that several of the root servers only have 2-10 instances spread across the world, so they're presumably not in the business of deploying the 100-200 anycast nodes that several of the other ones are. If they don't happen to already be in the bay area, it's not like we can afford to lease fiber out to where they already happen to be. 

Root servers on the "already in the bay area" list are also problematic, since our exchange currently is only in Hurricane Electric's building. If they already have a local node, it's unlikely that we would be able to convince them to build another node in the east bay just for us unless they already happen to be co-located with us.

But you'll notice that between those two lists, there are only 12 roots... K root isn't in the bay area.
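
The bookkeeping is simple enough to check as a set difference. This is just a sketch of the two summary lists above; the variable names are mine.

```python
import string

all_roots = set(string.ascii_uppercase[:13])  # A through M
never_local = set("ABCGH")       # roots that will never be in the bay area
already_local = set("DEFIJLM")   # roots already in the bay area

# Whatever is left over appears on neither list.
unaccounted = all_roots - never_local - already_local
print(sorted(unaccounted))  # ['K']
```
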
So I did some more digging. Using the Atlas probe in my rack, I can see that K root is currently 70ms away from us, so it has a not quite optimal latency to the bay area. It looks like it's currently reachable via its node in Utah, but it has a physically closer node in Reno connected to the TahoeIX.
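
If you don't have an Atlas probe handy, a rough way to get a similar reading is to time a single UDP query yourself. The sketch below builds a CHAOS-class TXT query for `id.server` (the instance-identification name described in RFC 4892) by hand and times the round trip. The function names are made up for illustration, 193.0.14.129 is the published IPv4 address of k.root-servers.net, and whether a given instance answers `id.server` (some only answer `hostname.bind`) is an assumption you'd need to check.

```python
import socket
import struct
import time

K_ROOT_IPV4 = "193.0.14.129"  # k.root-servers.net


def build_chaos_txt_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for `name` with QTYPE TXT (16) and QCLASS CH (3)."""
    # Header: ID, flags (0 = standard query, no recursion), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT, ARCOUNT all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 16, 3)


def time_query(server: str, name: str = "id.server", timeout: float = 2.0):
    """Send one query over UDP and return (round trip in ms, raw reply bytes)."""
    packet = build_chaos_txt_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(4096)
    return (time.perf_counter() - start) * 1000.0, reply


# Example (needs network access, so it is left commented out):
# rtt_ms, reply = time_query(K_ROOT_IPV4)
# print(f"{rtt_ms:.1f} ms", reply)
```

The raw reply is not parsed here; if the instance answers, its identifier shows up as a readable TXT string inside the bytes, which is usually enough to tell which node you landed on.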

TahoeIX is interesting for two reasons:
  1. They're a fantastic example of another tiny IXP that has done a remarkably good job of collecting value-add peers on their network: Verisign, PCH/WoodyNet, Akamai, AS112, K root, and F root.
  2. Hurricane Electric is in "provisioning" with them, so presumably at some point soon, HE will have access to K root from Tahoe, dropping its latency to the bay area quite a bit.
So this opportunity posed by K root being the last root server not yet built out in the bay area very well might disappear soon. This is a bit of a drag since that might make RIPE less likely to entertain us hosting yet another California node, and the K roots don't come for free. We would need to provide a Dell server meeting all of their specifications, which I just priced out at $1475 for a Dell R230.

Bummer.

So, at this point, it's possible that I will be able to get F root to join FCIX, since they're just a cross connect away in the same building (and I happen to already be friendly with them), plus D and E if they happen to be on the local AS42 node. I, J, and M are in the area, so getting them on the fabric is conceivable except that they aren't in our building, so that problem would need to somehow be solved. And K is currently several states away, so I'd need to convince them that yet another west coast node is worth their bother, and we'd need to pony up $1500 to get the gear they require.

There are other value-add networks we can work on getting on our IXP, such as AS112 to trap bogon DNS requests, and CDNs like Akamai, Cloudflare, and (maybe?) Netflix. It seems like there are also 13 DNS servers for gTLDs, but I can't find much information on who hosts those or how they're rolled out. Presumably they're hosted by just Verisign, so one of those would come with a J root.

3 comments:

  1. 👍 fascinating stuff.. never really thought about this before. How are the servers set up to do anycast? Are BGP advertisements made for smaller than /24 to enable this?

    Replies
    1. Each server uses a whole /24 just for the one service.

  2. Would love to set up an IXP like yours, but where I live, there is no colo facility. :(
