About Corey Quinn
Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.
An IPv6 packet walks into a bar. Nobody talks to it.
Welcome back to what we're calling Networking in the Cloud, a 12-week networking extravaganza sponsored by ThousandEyes. You can think of ThousandEyes as the Google Maps of the internet. Just like you wouldn't dare leave San Jose to drive to San Francisco without checking to see if the 101 or the 280 was faster, businesses rely on ThousandEyes to see the end-to-end paths their apps and services are taking, and for localized traffic stories that mean nothing to people outside of the Bay Area. This enables companies to figure out where the slowdowns are happening, where the pile-ups are, and what's causing issues. They use ThousandEyes to see what's breaking where, and importantly they share that data directly with the offending service providers to hold them accountable in a blameless way and get them to fix the issue fast, ideally before it impacts their end users.
Learn more at thousandeyes.com. And my thanks to them for sponsoring this ridiculous podcast mini-series.
This week we're talking about load balancers. They generally do one thing and that's balancing load, but let's back up. Let's say that you, against all odds, have a website, and that website is generally built on a computer. You want to share that website with the world, so you put that computer on the internet. Computers are weak and frail and often fall over, invariably at the worst possible time. They're herd animals. They're much more comfortable together. And of course, we've heard of animals. We see some right over there.
So now you have a herd of computers that are working together to serve your website. The problem now of course, is that you have a bunch of computers serving your website. No one is going to want to go to www6023.twitterforpets.com to view your site. They want to have a unified address that just gets to wherever it has to happen. Exposing those implementation details to customers never goes well.
Amusingly, if you go to Deloitte, the giant consultancy's website, the entire thing lives at www2.deloitte.com. But I digress. Nothing says we're having trouble with digital transformation quite so succinctly.
So you have your special computer or series of computers now that live in front of the computers that are serving your website. That's where you wind up pointing twitterforpets.com to, or www.twitterforpets.com towards. Those computers are specialized and they're called load balancers because that's exactly what they do; they balance load, it says so right there on the tin. They pass out incoming web traffic to the servers behind the load balancer so that those servers can handle your website while the load balancer just handles being the front door that traffic shows up through.
This unlocks a world of amazing possibilities. You can now, for example, update your website or patch the servers without taking your website down with a "back in five minutes" sign on the front of it. You can test new deployments with entirely separate fleets of servers. This is often called a blue-green deploy or a red-black deploy, but that's not the important part of the story. You can start bleeding off traffic to the new fleet and, "Oh my god, turn it off, turn it off, turn it off. We were terribly wrong. The upgrade breaks everything." But you can do that; turn traffic on, turn traffic off to certain versions and see what happens.
Load balancers are simple in concept but they're doing increasingly complicated things. For instance, you're a load balancer. You're in front of 200 servers that all do the same thing, because they have the same website and the same application code running on them. How do you determine which one of those receives the next incoming request?
There are a few patterns that are common. The first and maybe the simplest is called round robin. You'll also see this referred to as next in loop. Let's say you have four web servers. Your first request goes to server one. Your second request goes to server two. Then come server three and server four, and the fifth request goes back to server one. It just rotates through the servers in order and passes out requests as they come in.
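To make that rotation concrete, here is a minimal Python sketch of round robin. The server names and the `round_robin` helper are made up for illustration; a real load balancer tracks connections rather than strings.

```python
from itertools import cycle

def round_robin(servers):
    """Return an iterator that hands out servers in order, forever,
    looping back to the first one after the last."""
    return cycle(servers)

# Hypothetical fleet of four web servers.
servers = ["server1", "server2", "server3", "server4"]
picker = round_robin(servers)

# Five incoming requests: the fifth one wraps around to server1.
assignments = [next(picker) for _ in range(5)]
print(assignments)
```

Note that nothing here knows whether a server is healthy or busy; it only knows whose turn it is.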
This can work super well for some use cases, but it does have some challenges. For example, if one of those servers gets stuck or overloaded, piling more traffic onto it is very rarely going to be the right call. A modification of round robin is known as weighted round robin, which works more or less the same way, but it's smarter. Certain servers can get different percentages of the traffic.
Some servers, for example, across a wide variety of fleets, can be larger than others and can consequently handle more load. Other servers are going to have a new version of your software or your website and you only want to test that on 1% of your traffic to make sure that there's nothing horrifying that breaks things, because you'd fundamentally rather break things for 1% of your users than 100% of your users. Ideally you'd like to break things for 0% of your users, but let's keep this shit semi-real, shall we?
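A rough Python sketch of weighted round robin, using the 1% test scenario. The names `stable` and `canary` and the integer weights are assumptions for illustration. This version simply repeats each server in the rotation according to its weight, so it sends traffic in runs; production load balancers typically interleave more smoothly.

```python
from collections import Counter
from itertools import cycle

def weighted_round_robin(weights):
    """weights maps a server name to an integer weight.
    Each server appears in the rotation as many times as its weight."""
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)

# Hypothetical split: 99 parts old version, 1 part new version.
weights = {"stable": 99, "canary": 1}
picker = weighted_round_robin(weights)

# Tally where 1,000 requests end up.
counts = Counter(next(picker) for _ in range(1000))
print(counts)
```

Over 1,000 requests the new version sees 10 of them, which is the 1% you asked for.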
You can also go with the least loaded metric type of approach. Some smarter load balancers can query each backend server or service that they're talking to about its health and get back a metric of some kind. If you wire logic into your application where it says how ready it is to take additional traffic, load balancers can then start making intelligent determinations as to which server to drop traffic onto next.
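As a sketch of the least loaded idea in Python: assume each backend has already reported a load figure between 0 and 1 through some health check. The figures, server names, and `pick_least_loaded` helper below are invented; how a real load balancer gathers that metric varies by product.

```python
def pick_least_loaded(load_by_server):
    """Return the server whose most recently reported load is lowest."""
    return min(load_by_server, key=load_by_server.get)

# Hypothetical health-check results: fraction of capacity in use.
reported_load = {"server1": 0.82, "server2": 0.35, "server3": 0.60}

chosen = pick_least_loaded(reported_load)
print(chosen)
```

The decision is only as good as the metric the application reports, which is why the readiness logic has to live in the application itself.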
Probably one of the worst methods you can use to determine how to pass out traffic from load balancers is random, which does exactly what you'd think, because randomness isn't. There's invariably going to be clusters and hot spots, and the entire reason you have a load balancer is to not have to deal with hot spots: one server's overloaded and screaming while the one next to it is bored, wondering what the point of all of this is.
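If you want to see the clustering for yourself, here is a small Python sketch of random selection. The seed, server names, and request count are arbitrary choices for illustration; run it with a few different seeds and the tallies will usually come out uneven over a small number of requests.

```python
import random
from collections import Counter

def pick_random(servers, rng):
    """Pick a server uniformly at random, ignoring load entirely."""
    return rng.choice(servers)

servers = ["server1", "server2", "server3", "server4"]
rng = random.Random(2020)  # arbitrary seed so a run can be repeated
total_requests = 40

# Tally how many of the requests each server received.
counts = Counter(pick_random(servers, rng) for _ in range(total_requests))
for name in servers:
    print(name, counts[name])
```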
There are other approaches too that offer more deterministic ways of sending traffic over to specific servers. For example, taking the source IP address that a connection is coming from and hashing that. You can do the same type of thing with specific URLs where the hash of a given URL winds up going to specific backend services.
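A minimal Python sketch of the hashing approach, under a few assumptions: the key is a string (a source IP or a URL), SHA-256 is used as the hash only because it is in the standard library, and the server list is fixed. The `pick_by_hash` name and the example addresses are made up for illustration.

```python
import hashlib

def pick_by_hash(key, servers):
    """Map a key (source IP or URL) to a server deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into a number, then into an index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["server1", "server2", "server3", "server4"]

# The same source IP lands on the same server every time.
first = pick_by_hash("203.0.113.7", servers)
second = pick_by_hash("203.0.113.7", servers)
print(first, second)

# The same trick works with a URL path as the key.
print(pick_by_hash("/images/logo.png", servers))
```

One caveat with this simple modulo scheme: if the number of servers changes, most keys get remapped to a different server.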
Why would you necessarily want to do that? Well, in an ideal world, each of those servers is completely stateless and each one can handle your request as well as any other. Here in the real world, things are seldom that clean. You'll find yourself very often with state living inside of your application. So if you have a backend server that handles your first request and then your next request goes to a different backend server, you could be prompted to log in again and that becomes really unpleasant for the end user exper...