Tuesday, April 3, 2012

Enabling QoS on WRT54G(L) Tomato

After my previous aside explaining the basic concept of buffer bloat and what makes the consumer internet such a miserable experience at times, let's talk about what you can do to improve the situation.

Consumer Internet connections have been getting progressively faster, but this growth has been more than offset by the inappropriate growth of the buffer the modem uses to stage packets before squeezing them through the inevitably finite upload bandwidth available.  As these buffers fill, packets take longer and longer to get uploaded, just as a longer line at Disneyland means you spend more and more time waiting: if they had limited the ride queues to only 15 minutes' worth of people, you could have ridden Star Tours three times over instead of doing nothing but standing in line for Space Mountain.  Faster Internet connections mean that you can download and upload larger files in less time, but the vast majority of the time this isn't the property of your Internet connection that you even care about.

When you're using a web browser, you usually care more about the latency of your Internet connection than its bandwidth.  What difference does it make if a picture loads in 50ms or 100ms when it doesn't even start loading until 2 seconds after you clicked on it?  Under heavy load, your modem becomes a black hole, where packets disappear for several seconds before hopefully reappearing again out on the Internet.  Even once the modem has run out of buffer and dropped a packet, it takes so long for news of the drop to reach the sender (via the packet *not* being delivered to the receiver, who then gives up waiting for it before deciding that it really is lost) that the damage has already been done. TCP merrily continues to dump packets onto the Internet, and a whole bunch of control theory mumbo-jumbo shows that the system is very likely to oscillate and be unstable (if not collapse entirely, as happened several times in the ARPANET and NSFNet days).

So this needs to be fixed.  Jim Gettys et al. have already made remarkable progress in bringing the issue to light, and hopefully future Internet device manufacturers will better follow the rules.  This is all well and good, but what can the home user do now to mitigate the problem while waiting for someone to build a better modem?  Many routers, such as the WRT54GL and the newer WNDR3800, are built on open-source firmware, meaning that researchers and home users can reach in up to their elbows right now and start tweaking knobs in their routers to fix the issue.  Another advantage of running a custom third-party firmware like Tomato comes to light...
Fixing the buffers on the modems would be the "better" solution, but the modern home router is intelligent and powerful enough that it can be used to mitigate the less-accessible modem buffer right now using a strategy based on prioritizing traffic.

The key to mitigating the modem's buffer is to keep the modem's buffer empty.  Using a service like SpeedTest or Netalyzr, figure out how fast your modem can push data out onto the Internet, and then use your (much smarter) router to ensure that you never hand the modem packets faster than that.  By feeding it packets slower than it can upload them, we make sure the modem never has an excuse to start hoarding packets and destroying the uplink's latency, at the small cost of a few percent of gross bandwidth.  This moves the critical buffer off of the modem and onto the router, which is much more under our control.
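Under the hood, this rate cap is what the Linux traffic-control tool `tc` is for, and it is the mechanism Tomato builds its QoS system on. As a hedged sketch of the idea (the interface name `vlan1` is the usual WAN device on a WRT54GL running Tomato, and the 900 kbit/s figure is an assumed example; substitute your own measured upload rate):

```shell
# Shape all outbound WAN traffic to ~90% of the measured upload rate
# (here assuming a 1000 kbit/s uplink, so we cap at 900 kbit/s).
# "vlan1" is the typical WAN interface on a Tomato WRT54GL; yours may differ.
tc qdisc add dev vlan1 root handle 1: htb default 30
tc class add dev vlan1 parent 1: classid 1:1 htb rate 900kbit ceil 900kbit
```

Because the router now refuses to send faster than 900 kbit/s, any queue builds up here, where we can manage it, instead of inside the modem.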
Almost all third-party firmwares offer some form of Quality of Service (QoS) system, which allows you to write rules determining how to classify traffic, and how much of your finite upload bandwidth to guarantee to each kind.  For example, in a typical Tomato installation, visiting the QoS Basic Settings page gives you the choice of enabling QoS, setting the maximum upload bandwidth (which you should set a few percent below the upload result you got from testing), and then allocating bandwidth to up to ten different priorities of traffic (ten is a little excessive; I can only come up with enough meaningful rules for five).  You also have the option of controlling inbound traffic, but remember that incoming traffic has already passed through the DSL bottleneck in our network, so I tend not to bother with download QoS on edge routers (though I do use it to rate-limit guest access points inside my network).

One thing that is not particularly clear on the Basic Settings page is the significance of the two numbers for each class of traffic.  I will go into this in more detail when I talk about the tc tool running the back-end of this, but they mean "this class is guaranteed this much bandwidth" and "this class can borrow unused bandwidth from higher-priority classes up to this much."  So in the screenshot above, I guarantee "High" priority traffic 20% of the bandwidth, but it can go on to borrow unused bandwidth from the "Highest" class up to 100%.  Looking instead at the "Low" and "Lowest" classes, I guarantee them almost no bandwidth, and don't even let them borrow all of the bandwidth from higher classes, because these classes carry background jobs where latency is unimportant, so I'd rather let bandwidth go unused while making sure it's available for higher classes than try to have my BitTorrent cake and upload it too.
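In tc terms, these two numbers map onto an HTB class's `rate` (the guarantee) and `ceil` (the borrowing limit) parameters. A hedged sketch, assuming the uplink has already been shaped to 900 kbit/s by an HTB root class `1:1` on the WAN interface `vlan1`; the class IDs and exact rates are illustrative, not Tomato's actual values:

```shell
# "rate" is the guaranteed share; "ceil" is how far the class may grow by
# borrowing unused bandwidth from its siblings. Lower "prio" borrows first.
tc class add dev vlan1 parent 1:1 classid 1:20 htb rate 180kbit ceil 900kbit prio 2  # High: 20%, borrow to 100%
tc class add dev vlan1 parent 1:1 classid 1:40 htb rate 18kbit ceil 450kbit prio 4   # Low: ~2%, capped at 50%
```

The "Low" class can never starve: it always gets its guaranteed 18 kbit/s, but even on an idle link it is clamped at half the uplink.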

As for exactly how many classes and what bounds you use, that is more than anything else a matter of personal preference and experimentation. My rules are all written with the first priority being everyone else in the house not noticing how much I hammer the internet, but obviously your priorities may differ.

In conjunction with defining each class's guaranteed and allowed bandwidths, you need to tell your router what kind of traffic it should classify as each priority.  Tomato gives you a very wide range of choices, letting you assign traffic classes to specific IP addresses or computers, or drill down farther and classify traffic by destination port (web traffic to port 80, ssh to 22, etc.), or even by looking inside packets to try to identify which application created them (L7 or "deep" packet inspection).  Tomato takes each packet and, starting at the top, works its way down until it finds a rule that matches, then assigns that class, so the order of the rules on this page does matter.  For example, I give web traffic a high priority (since uploaded web traffic usually consists of short requests for content), but once a web connection transfers more than 512kB, I decide that the user is probably uploading something big, like a picture, instead of doing something latency-sensitive like requesting an HTML page.
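Behind this page, classification of this sort is done with iptables rules in the mangle table that mark packets, plus tc filters that map each mark onto an HTB class. A hedged sketch of the mechanism (the chain name, mark values, and class IDs here are hypothetical illustrations, not Tomato's actual internals):

```shell
# Route all outbound WAN traffic through a classification chain.
iptables -t mangle -N QOSOUT
iptables -t mangle -A POSTROUTING -o vlan1 -j QOSOUT

# Rules are checked top-down; returning after the first successful mark
# reproduces the first-match-wins ordering of the web interface's rule list.
iptables -t mangle -A QOSOUT -p udp --dport 53 -j MARK --set-mark 1                    # DNS
iptables -t mangle -A QOSOUT -m mark ! --mark 0 -j RETURN
iptables -t mangle -A QOSOUT -p tcp -m multiport --dports 80,443 -j MARK --set-mark 2  # web
iptables -t mangle -A QOSOUT -m mark ! --mark 0 -j RETURN
iptables -t mangle -A QOSOUT -j MARK --set-mark 3                                      # fell through: default class

# Map each mark onto a tc class, e.g. mark 2 -> class 1:20.
tc filter add dev vlan1 parent 1: protocol ip prio 1 handle 2 fw flowid 1:20
```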

Again, how you write these rules depends most on what kinds of traffic your network tends to produce and which of that traffic you care about most.  Getting the settings "right" on the first two screens is very much an iterative process based on qualitative measurements, lots of small changes, and watching the results on the last two screens (and eventually asking the internal traffic control system exactly what is going on, but that's for my next blog post).
Tomato just wouldn't be Tomato without fancy vector graphics. You'll notice that the connection count includes a large number of "unclassified" connections; these are connections which the router sees, but which don't happen to leave the network, so they aren't important for the QoS system.
The details tab lists every connection that it is tracking and indicates how it is classifying it.  I find this useful when fishing for new rules, by trying to figure out what each connection is doing and deciding if it is important (or unimportant) enough to write a filter to reclassify it as something other than the default "medium."

Now, to finish off this article, let's walk through my classification list and talk about my justifications for each rule.  QoS design is a very qualitative art, so I not only expect, but hope, that you will disagree with my decisions and have your own ideas about what you think is important.
The "Highest" class is guaranteed the lion's share of our network's upload bandwidth, but I only classify traffic as "Highest" if it is time-sensitive, so it inevitably won't use all 70% and will give away the left-over bandwidth to lower classes; this class simply gets first pick.  The first rule looks specifically for any traffic originating from our femtocell, which is a miniature cell phone tower giving us flawless reception routed through our Internet connection.  Cell phone VoIP traffic is, of course, very sensitive to latency, so we want to give it the utmost priority.  The other highest-priority rule is for any traffic destined for port 53, which is the Domain Name System, needed to resolve host names into IP addresses.  Every time a user types a URL into a web browser, the computer must first use DNS to turn that name into an IP address before it can even start loading the website, so this traffic should be handled before any other.  DNS traffic tends to be very small (a short query for "google.com" and a short reply carrying its address), so handling it before anything else should be trivial.

Moving down the list, any packets that look like SIP (the VoIP system I use myself) and small web requests (which are usually destined for ports 80 or 443) are given "High" priority.  Again, the vast majority of web requests are small and trivial.  Remember that most web traffic flows from the server to the client (which is why your Internet connection's download bandwidth is so much greater than your upload), so we are not dealing with the lion's share of web traffic, which is the content you ask for, but only the initial requests ("Facebook, please give me these 12 jpegs to look at").  Ideally, we would also filter this traffic as it arrives from the rest of the Internet, but that would have to be done on the far side of the DSL bottleneck, a point well beyond our control, and why download QoS is usually ignored.

While most outgoing web requests will be very small, only asking for the contents of specific documents, some can become very large.  While you will usually be looking at pictures uploaded by other users, you will occasionally decide to upload your own batch of photos.  It's not unusual to see jpegs of half a megabyte or more, so uploading an entire album can quickly turn into a web request on the order of tens of megabytes.  Of course, when you're uploading photos, you don't much care whether it completes in 5 minutes or 5 minutes and 10 seconds, so while interactive requests for content are latency-sensitive, these bulk uploads are much less so, and should be treated as "Low" priority traffic.  Prioritizing the small requests for content over the large uploads of new content means that you can open another tab while waiting for the upload to finish and continue to browse the web, without the upload making the Internet unbearably slow.
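The 512kB cutoff from my classification list can be expressed with iptables' `connbytes` match, which tracks how much data a connection has transferred so far. A hedged sketch, assuming outbound traffic passes through a mangle chain (hypothetically named `QOSOUT` here) where mark 4 means "Low"; when building the chain, this rule must be added before the generic web rule so it matches first:

```shell
# A web connection that has already sent more than 512 kB upstream is a bulk
# upload, not interactive browsing: re-mark it "Low" (mark 4 here).
iptables -t mangle -A QOSOUT -p tcp -m multiport --dports 80,443 \
    -m connbytes --connbytes 524288: --connbytes-dir original --connbytes-mode bytes \
    -j MARK --set-mark 4
```

`--connbytes-dir original` counts only bytes sent by the connection's initiator, so a large download in the same connection won't trip the rule.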

Finally, the last three rules are looking for traffic which is obviously large bulk uploads, and giving them the lowest priority.

Anything the router can identify as BitTorrent traffic (my P2P transfer client of choice) is clearly unimportant, since interaction with that program is measured on the order of hours, not seconds.  This L7 classification rule unfortunately depends on being able to read the BitTorrent packets, which is easily defeated by encrypting the torrent connections.  I personally make sure that any torrent clients on my network are configured not to encrypt traffic, so this single rule works quite well, but a more adversarial network may require more sophisticated rules.  Some remote peers will require you to encrypt your traffic to defeat QoS on their end of the Internet, and these users will also often use port 53, since it must be open for the web to work at all.  Adding the additional rule that any traffic seen on port 53 which amounts to more than 2kB is clearly not actually DNS does an appreciably good job of catching this encrypted traffic.
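The fake-DNS rule is another use of the `connbytes` match. A hedged sketch, again assuming a hypothetical `QOSOUT` mangle chain where mark 5 means "Lowest"; like the bulk-upload rule, it must appear in the chain before the ordinary port-53 rule that grants DNS the highest priority:

```shell
# A "DNS" connection that has already moved more than 2 kB in either
# direction is almost certainly disguised peer traffic, not real DNS:
# demote it to the lowest class (mark 5 here).
iptables -t mangle -A QOSOUT -p udp --dport 53 \
    -m connbytes --connbytes 2048: --connbytes-dir both --connbytes-mode bytes \
    -j MARK --set-mark 5
```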

Finally, I have the last rule: any connection which uploads more than 4MB must be of the lowest priority.  Of course, scenarios do exist where a single connection could upload this much data while latency still matters; any real-time video game will likely generate this much data over the course of a game, and would be inappropriately classified as lowest priority.  I do not see any gaming traffic on my network, so I ignored this possibility; your network may not allow you the same shortcut.

If the router has made it all the way to the bottom of this list of filters, and still hasn't managed to find a match for a connection, the QoS system will then look to the setting in the Basic Settings tab for the default traffic class. I decided to treat this traffic as the unknown middle-ground between latency-sensitive and obviously bulk traffic and classify it as medium priority, but others will argue that if they can't decipher what the traffic is, it must be either significantly lower or higher priority than anything matching a written rule, and change the default class accordingly.  Again, your choice for the default behavior is your own.

For even more detail, and probably a better design overall, Toastman on the Linksys forums has some excellent write-ups on the topic: QoS Tutorial, Example Rule Set.

So that's the glossy exterior of the Tomato QoS system.  Using just these four pages, you can write a set of rules which will do a decent job of making your Internet experience less painful in the face of adverse circumstances.  In the next part of this series, I'll lift the hood on the QoS system and discuss the mechanics of exactly how traffic is treated between when it is classified and when it is finally sent off to the Internet.  This part of the system doesn't lend itself to easy reconfiguration in the Tomato firmware, but understanding the router's queuing behavior and being able to monitor the queue buffers can be useful while tuning your QoS rules in the web interface.

