The Bufferbloat Crisis
Jan. 9th, 2011 04:08 pm
[Warning: this one's mainly for the techies]
I confess that I'm way behind on my technical blogs, so everyone may already know this one. But I only found out about it today, when my mother (the least techie member of the family) sent me a link to a brief Mother Jones article that boiled down to "Death of the Internet -- news at 11". That led to a slightly overwrought and confusing blog post from Robert Cringely on the subject of "bufferbloat", which he predicted will be next year's big online problem. Fortunately, *that* led me to a long series of blog posts by Jim Gettys, who is the one who put all the pieces together, and who actually explains the problem in ridiculously gory detail.
The above link into the Gettys series picks up in the middle, at the point where he transitions from lots of posts about experiments to talking about what's actually going on. I commend the series to anybody who is inclined toward the technical side of the Internet, understands terms like "TCP", "router", "buffering", and suchlike, and has some attention span -- it's not brief.
The upshot, though, is that the Internet appears to be Pretty Damned Broken at the moment. The problem mostly shows up as weird lagginess, especially when you're sharing a line with a high-bandwidth flow (such as video). The crux of the issue is a well-intentioned tragedy of the commons, an accident of Moore's Law. Since memory is so cheap, everyone is building Massively Huge Memory Buffers into pretty much every piece of networking equipment -- and unless those buffers are paired with smart traffic shaping, they totally screw up TCP's traffic management. TCP senses congestion mainly by noticing dropped packets; a giant buffer swallows the overload instead of dropping it, so senders keep pushing long after a link is saturated. The result is that everyone is slamming way more traffic onto the Net than they should be, over-saturating a lot of connections and causing surprisingly bad packet loss and latency.
Or to put it more simply: there's no *good* reason why people see as much lagginess on the Internet as they do today -- it's largely the result of building buffers into network equipment that individually make some sense, but that collectively make things worse.
At least, I think that's what he's saying -- I won't kid you, I haven't spent anywhere near enough time to really internalize the argument yet. But it does sound like there's a subtle but pervasive bug in the way people are building equipment, one that is causing the Net to badly underperform relative to what it *should* be able to do -- and the problem is getting steadily worse...
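If you want to see the shape of the failure, here's a toy back-of-the-envelope model (my own illustrative numbers, not Gettys's): a sender ramps up in TCP slow start, and only backs off when a packet is dropped -- which, with a big buffer in the path, doesn't happen until that buffer is completely full.

```python
# Toy model of why oversized buffers delay TCP's congestion signal.
# All numbers are illustrative assumptions, not measurements from the post.

LINK_BPS = 1_000_000  # bottleneck link: 1 Mbit/s

def queueing_delay_at_first_drop(buffer_bits, rtt=0.1):
    """Seconds of queueing delay built up before the first packet drop."""
    rate = LINK_BPS / 8   # sender starts well below link capacity
    queued = 0.0          # bits sitting in the bottleneck buffer
    while True:
        # Per round trip: bits arriving minus bits the link can drain.
        queued += max(0.0, (rate - LINK_BPS) * rtt)
        if queued >= buffer_bits:
            # Buffer is full: only NOW does a drop tell TCP to slow down,
            # and the standing queue is already buffer_bits / LINK_BPS deep.
            return buffer_bits / LINK_BPS
        rate *= 2         # no drop seen, so slow start keeps doubling

print(queueing_delay_at_first_drop(8 * 10_000))     # ~10 KB buffer: 0.08 s
print(queueing_delay_at_first_drop(8 * 1_000_000))  # ~1 MB buffer:  8.0 s
```

Same link, same sender -- the only difference is the buffer size, and the bloated buffer builds up eight full seconds of latency before TCP even notices anything is wrong.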
(no subject)
Date: 2011-01-09 10:34 pm (UTC)
(In that respect, it's somewhat like the IPv4/IPv6 problem -- it's clear what's wrong, it's clear how to fix it, but since it requires *everybody* to work together to fix it, we've come right to the brink of Absolute Disaster and it's still not fixed.)
I do suspect things will gradually be improved, but it may well be a slow and painful process...
(no subject)
Date: 2011-01-10 01:20 am (UTC)
The difference is that it doesn't do any good for one person to switch to IPv6, but one router getting its buffers reduced might help everybody a little bit.
(no subject)
Date: 2011-01-10 05:54 pm (UTC)
And as a problem, 'too much memory' is one of the better ones to have. Just pull/reallocate it.
(no subject)
Date: 2011-01-10 09:00 pm (UTC)
What might need coordination is adopting ECN (explicit congestion notification), which has been implemented for 10 years but has never been widely deployed, because it can lead some old firewalls to drop packets. Even there, the coordination required is less "everybody do this on Friday" and more "everybody take your ancient routers to the dump by Friday".
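For the curious: on a reasonably modern Linux box, asking for ECN on outgoing connections is a one-line sysctl (a sketch; it needs root, and the values are 0 = off, 1 = request ECN, 2 = only accept it when the peer asks, which is the usual default):

```shell
# Request ECN on outgoing TCP connections (Linux; run as root).
# If ancient middleboxes on your path mangle the ECN bits, this is
# exactly the setting that will expose them.
sysctl -w net.ipv4.tcp_ecn=1
```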
(no subject)
Date: 2011-01-11 12:50 pm (UTC)
But it isn't, really. The large buffers don't actually help anybody, and at least some people can benefit by shrinking the buffers in their own hardware.
(no subject)
Date: 2011-01-14 06:02 am (UTC)
1. The problem is that TCP (the protocol that makes sure data gets delivered, for the non-tech audience) assumes that congestion shows up quickly as packet loss. Big buffers absorb the overload instead of dropping it, and that delay breaks some of TCP's algorithms in ways that cause re-transmissions. So, yes, I think you understand it.
2. For any given pair of points on the Internet (say, www.livejournal.com and your computer), the problem can only be entirely solved if all of the individual links between the two points fix it. It's not enough to swap out your own router for a new one if the other routers along the path still show the problem.
(no subject)
Date: 2011-01-14 12:56 pm (UTC)
No, not really. Er...strictly speaking, yes; but each link that removes bufferbloat reduces the problem. Under load, a buffer increases the latency on a link: if a link's bitrate is N bits/second and the buffer holds M bits, then a full buffer adds M/N seconds of latency. That extra latency on each link adds up over the whole path, so any link that reduces its buffers reduces the total latency of the path.
In practice, I suspect bufferbloat is almost entirely a function of edge equipment—home routers, cable modems, and the stuff at the ISP's POP. Core routers have generally been built to route at wire speed, with no need for buffers. At least, that's how things were done when I was participating in the IETF around 2001. Bit rates have not been keeping up with Moore's Law, so I expect wire-speed routing is easier these days, not harder.
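Plugging some plausible edge-equipment numbers (my own illustrative guesses, not the commenter's) into that M/N formula makes the edge-versus-core point concrete:

```python
# M/N latency from the comment above: a full buffer of M bits on a link
# of N bits/second adds M/N seconds of delay. The sizes below are
# illustrative guesses at typical home-gear numbers, not measurements.

def full_buffer_delay(buffer_bytes, link_bps):
    """Extra latency (seconds) a completely full buffer adds: M / N."""
    return (buffer_bytes * 8) / link_bps

# A 256 KB buffer in front of a 1 Mbit/s home uplink:
print(full_buffer_delay(256 * 1024, 1_000_000))    # ~2.1 seconds
# The same buffer in front of a 100 Mbit/s link is nearly invisible:
print(full_buffer_delay(256 * 1024, 100_000_000))  # ~0.021 seconds
```

Which is why the same buffer that's harmless in a fast core router can be catastrophic at a slow edge link.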