Sunday, April 15, 2018

Creating an Internet Exchange for Even More Fun and Less Profit

Last quarter, I was pulled into the slightly odd underground of people running their own autonomous systems, and since then, our circle of friends running autonomous systems at Hurricane Electric's FMT2 has slowly been growing.
Which is great, except that we're all running autonomous systems, which means that we can set up peering links, and are you really friends with another network engineer if you're not running a cross connect between your two networks? This wasn't too bad for the first few networks joining our little cabal of networks, but due to that pesky quadratic growth issue, the number of new cross connects needed when the fifth or sixth person joined started getting ridiculous. (It's like, four or five!)
This is, of course, an issue that real networks have to deal with as well, so when we had an eighth friend sign a service agreement with Hurricane Electric this week, the idea was (half jokingly) floated that we should just start our own Internet Exchange Point to cut down on the number of cross connects we need for each new member.
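To put rough numbers on that quadratic growth, here is a minimal sketch comparing a full mesh of cross connects against a single shared exchange switch. The function names are mine, and it assumes every network wants a direct link to every other network.

```python
def full_mesh_links(n: int) -> int:
    """Cross connects needed so every pair of n networks is directly linked."""
    return n * (n - 1) // 2


def new_links_for_joiner(n: int) -> int:
    """Extra cross connects a full mesh needs when the n-th network joins."""
    return n - 1


def exchange_links(n: int) -> int:
    """Cross connects needed when each of n networks plugs into one switch."""
    return n


if __name__ == "__main__":
    print("networks  full mesh  added by newest  via exchange")
    for n in range(2, 9):
        print(f"{n:8d}  {full_mesh_links(n):9d}  "
              f"{new_links_for_joiner(n):15d}  {exchange_links(n):12d}")
```

With eight networks, the full mesh is up to 28 cross connects, seven of them just for the newest member, versus one port per network on an exchange.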

An Internet Exchange is basically just a single L2 Ethernet switch which every network plugs into, such that every network can directly set up BGP peering with, and route packets to, every other network on the fabric. Furthermore, to make it even easier to add new networks to an Internet Exchange, many IXs run "route servers," which are BGP peers that redistribute all the connected routes. This is convenient because it means that only the IX operator and the new network need to adjust their BGP configuration when a network joins; everyone else is already peered with the route server and starts getting the new routes (and which router on the switch to send that traffic to) as part of their already existing connection to the route server.
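As a toy illustration of what a route server does, the sketch below models members announcing routes and the server handing each member everything learned from the others, with the next hop and AS path left untouched. This is a simplified model, not real BGP; the class names, the documentation-range ASNs (64496 to 64498), and the example prefixes are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # destination network being announced
    next_hop: str    # router on the exchange fabric to send traffic to
    as_path: tuple   # ASNs the announcement has passed through


class RouteServer:
    def __init__(self):
        # member ASN -> list of routes that member announced
        self.routes_by_member = {}

    def add_member(self, asn):
        """The only step needed on the exchange side when a network joins."""
        self.routes_by_member.setdefault(asn, [])

    def announce(self, asn, route):
        self.routes_by_member[asn].append(route)

    def routes_for(self, asn):
        """Routes from every other member, passed along unchanged."""
        return [
            route
            for member, routes in self.routes_by_member.items()
            if member != asn
            for route in routes
        ]


if __name__ == "__main__":
    rs = RouteServer()
    for asn in (64496, 64497, 64498):
        rs.add_member(asn)
    rs.announce(64497, Route("203.0.113.0/24", "198.51.100.2", (64497,)))
    rs.announce(64498, Route("192.0.2.0/24", "198.51.100.3", (64498,)))
    for route in rs.routes_for(64496):
        print(route)
```

Note that the existing members never call anything when a new network shows up; they just see more entries the next time they look at what the route server is sending them.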

So we were all sitting there, contemplating the idea of ordering seven more cross connects and once again all logging into our routers to update our configs, and at that point, the idea of creating an Internet Exchange instead didn't seem too bad.

We could instead have all gotten cross connects into one of the existing Internet Exchanges in the HE FMT2 building, such as SFMIX, but they charge $995/year for a port on their fabric, which is more money than it's worth for all of us to cross connect for amusement's sake (most of us are amateurs and not making money on our networks). So screw it, hold my other beer, and away we go!

And that's how the Fremont Cabal Internet Exchange was born. 

We even made a website and everything.

We allocated a /64 IPv6 subnet from my /48 (which was originally allocated from another guy's /32), drummed up an IPv4 /24 that was currently between projects, and very carefully selected the private ASN 4244741280, and all that was left to get was a switch for everyone to connect to.
Thankfully, my entire network in my cabinet is built on a Cisco 6506, which is technically a switch, so we called that close enough, and instead of having to find another piece of hardware, just allocated a VLAN on my 6506 as the switch fabric, and we were all set. Besides, we were getting a little worried that there were getting to be too few Internet Exchanges running on Cisco 6500s these days.
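As a quick sanity check on the ASN picked above, this small sketch tests whether a number falls inside the ranges reserved for private use (64512 to 65534 for 2-byte ASNs and 4200000000 to 4294967294 for 4-byte ASNs, per RFC 6996). The helper name is my own.

```python
# Private-use ASN ranges reserved by RFC 6996 (inclusive bounds).
PRIVATE_ASN_RANGES = (
    (64512, 65534),            # 2-byte private range
    (4200000000, 4294967294),  # 4-byte private range
)


def is_private_asn(asn: int) -> bool:
    """Return True if the ASN sits in one of the private-use ranges."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)


if __name__ == "__main__":
    print(is_private_asn(4244741280))
```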

Now whenever someone wants to connect to the FCIX (Fremont Cabal Internet Exchange) fabric, they just get a cross connect to my cabinet, I set another port to be an access port to the FCIX VLAN, and they're hooked up to everyone.

It's only 1Gbps to each network, but most of us are only originating a few prefixes for a few servers, so we aren't really pushing the limits of a single 1G link per participant yet. Just like in any real IX, as soon as someone starts saturating their link to FCIX, they can set up direct peering links to other networks to start shedding that traffic off their exchange link. You know... when that happens...

Ideally we would have applied for a public ASN for the exchange, but that $550 + $100/yr for a registered ASN kind of went against the objective of saving money on cross connects, and I figured the chances of someone connecting to FCIX already using this one random 4-byte private ASN inside their network were pretty low. Since the IX ASN is never appended to any routes going through the exchange, no one outside the exchange will ever see this ASN either, so it seems like a pretty acceptable trade-off for a group of amateurs for now. (The biggest downside I can think of is that we might not be able to register this IX on PeeringDB with a private ASN, to further prop up the facade that this is an Internet Exchange to be taken seriously.)

Edit: OK, I stand corrected. PeeringDB had no problem and we're now live on there as well. That was not expected.
The last piece to really make adding new members to this peering fabric convenient is setting up two route servers, so that each new member doesn't trigger everyone needing to log into their routers to add a new BGP peer. Instead, everyone peers with the route servers and they handle the full N-to-N exchange of routes. When a new member joins, they set up their router on the fabric's /24+/64, and peer with the two route servers, and the only other involvement needed is from one of the IX admins (which is really just me, currently) to add them to the route server. Every other member doesn't need to be involved and can just enjoy the new routes appearing on their router.
We have two BGP route servers so that when I need to restart one of them for maintenance, everyone can still trade routes over the other one, and I don't trigger a reconvergence every time I restart the daemon or VM. We even managed to get the second VM on a different hypervisor in Javier's cabinet instead of mine, for further fault tolerance.

We're still working to figure out exactly which route server software we want to use. I'm the most familiar with Quagga, but Quagga tries to emulate the Cisco model where all config changes are made on the fly through the console, whereas I don't want to be hand crafting config changes every time we add a member. So I'm currently taking a crash course in running BIRD as one of our route servers, and will likely be swapping various daemons in for each route server as we learn more.
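One way to avoid hand crafting config for each new member is to keep a member list and render the route server config from it. The sketch below is only that, a sketch: the member names, documentation-range ASNs, and addresses are hypothetical, and the stanza text is modeled on my understanding of BIRD 1.x BGP protocol blocks, so treat it as an assumption to check against the BIRD documentation. A real route server would also want import and export filters.

```python
from dataclasses import dataclass

IX_ASN = 4244741280  # the exchange's private ASN


@dataclass
class Member:
    name: str   # hypothetical short name used as the protocol label
    asn: int    # member's ASN
    ipv4: str   # member's router address on the exchange fabric


# Assumed stanza shape, loosely following BIRD 1.x syntax; verify before use.
TEMPLATE = """protocol bgp {name} {{
    local as {ix_asn};
    neighbor {ipv4} as {asn};
    rs client;
}}
"""


def render(members):
    """Render one BGP stanza per member, in list order."""
    return "\n".join(
        TEMPLATE.format(name=m.name, ix_asn=IX_ASN, ipv4=m.ipv4, asn=m.asn)
        for m in members
    )


if __name__ == "__main__":
    members = [
        Member("member_a", 64496, "192.0.2.10"),
        Member("member_b", 64497, "192.0.2.11"),
    ]
    print(render(members))
```

Adding a member then becomes a one-line change to the list followed by regenerating and reloading the config, rather than typing at a router console.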

Sunday, April 1, 2018

Measuring the Internet

Progress on the whole "running my own bit of the Internet" project has been going well. We've got a router, and some servers, and even a NAS, so one of the next questions is how well our network is behaving over time.

There are plenty of different ways to ask this question, and plenty of different metrics to look at. For example, to track my bandwidth usage, I'm using LibreNMS, which is a pretty good SNMP front-end to query my router every five minutes to see how many packets I'm moving.


One network monitoring tool that I've discovered as part of this project is RIPE Atlas. It is a worldwide network of measurement probes spread across the Internet, which the RIPE NCC uses to measure the health of the Internet as a whole, and which others can also request measurements on.

To get started, you can request a probe and if approved, they mail you the simple hardware (clearly based on a TP-Link router running their custom firmware) to plug into your network. Once it's powered on, the probe starts taking measurements from your Internet connection, and you start earning credits to spend on your own custom measurements.
For example, I requested the probe, and although I never got any kind of email, a DHL package showed up about 6-8 weeks later with the probe + some cables inside. Now that I've plugged it in and registered its serial number to my account, I'm accruing 21600 credits per day for keeping the probe online, plus another 5000-50000 credits per day for "results delivered" which I presume is running other people's custom measurements.
I haven't come up with any long term custom measurements yet, but to give you a sense of scale, a single traceroute costs 60 credits, so running a traceroute to my network from 10 random probes costs 600 credits, and RIPE's traceroute report is pretty slick.
The main reason I haven't programmed any periodic custom measurements yet is that the probe comes with a set of "built-in" measurements where it automatically measures the latency and packet loss to its first and second hop, all the root DNS servers, and some Atlas infrastructure, which already answers most of my questions on how well my network is doing. I really should set up some user-defined queries to monitor my HTTP servers, but for now I'm just accruing credits.
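To get a feel for how far those credits go, here is a back-of-the-envelope sketch using only the numbers above: 21600 uptime credits per day and 60 credits per traceroute result. It assumes a flat per-result cost, which may not match how RIPE actually prices every measurement type, and the function names are mine.

```python
UPTIME_CREDITS_PER_DAY = 21600  # earned for keeping the probe online
TRACEROUTE_COST = 60            # credits per traceroute result


def measurement_cost(cost_per_result: int, probes: int, runs: int = 1) -> int:
    """Total credits for running a measurement from several probes."""
    return cost_per_result * probes * runs


def runs_per_day(daily_credits: int, cost_per_result: int, probes: int) -> int:
    """How many times per day a measurement fits in the daily credit income."""
    return daily_credits // (cost_per_result * probes)


if __name__ == "__main__":
    print(measurement_cost(TRACEROUTE_COST, probes=10))
    print(runs_per_day(UPTIME_CREDITS_PER_DAY, TRACEROUTE_COST, probes=10))
```

On uptime credits alone, a 10-probe traceroute fits 36 times a day, or roughly one run every 40 minutes.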

You can see all the public information coming from my probe here. You can even order Atlas measurements specifically from my network to your network if you specify the measurement should be sent to probe #34742, which I find rather amusing.

One thing I noticed right away is that I'm seeing 100% packet loss (solid red) to a few IPv6 root DNS servers... This is actually because Hurricane Electric and Cogent have been having a peering dispute over IPv6 for... pretty much forever, so the IPv6 Internet is actually split brain and I'm just not able to reach some parts of the IPv6 Internet from my Hurricane Electric transit...

One of the perks of running my own autonomous system is that I'm able to work on getting myself blended transit from someone other than Hurricane Electric and fix this problem myself... (Anyone with blended IPv6 transit on the west coast want to help me out?)

The probe uses about 3kbps to take its measurements, so the network load from it is what I would describe as "pretty much undetectable" considering my main transit link hovers around four orders of magnitude higher than that. This plot is from LibreNMS for my "DHCP" /29 subnet, which I use for my Atlas probe and plugging in my laptop to a spare port on my router when I'm standing in the datacenter working on my rack.
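For a sense of scale on those numbers, this small sketch converts the probe's roughly 3kbps into a daily data volume and works out what "four orders of magnitude higher" comes to. It treats 3kbps as a steady 3000 bits per second, which is an approximation.

```python
PROBE_BPS = 3_000          # about 3kbps of measurement traffic
SECONDS_PER_DAY = 86_400


def scale_by_orders(bps: int, orders: int) -> int:
    """Bitrate that is the given number of orders of magnitude higher."""
    return bps * 10 ** orders


def megabytes_per_day(bps: int) -> float:
    """Daily volume in megabytes (10^6 bytes) at a steady bitrate."""
    return bps * SECONDS_PER_DAY / 8 / 1_000_000


if __name__ == "__main__":
    print(scale_by_orders(PROBE_BPS, 4) / 1_000_000, "Mbps")
    print(megabytes_per_day(PROBE_BPS), "MB/day")
```

That puts the transit link at around 30Mbps, and the probe at a little over 32MB of traffic per day.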