Check out the full unconventional guide here!
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if wanting new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit and tell them Corey sent you, and watch for the wince.
Pete: Hello, and welcome to AWS Morning Brief. I am Pete Cheslock.
Jesse: I'm Jesse DeRose.
Pete: This is Fridays From the Field. Triple F.
Jesse: I feel like we've really got to go full Jean-Ralphio, Parks and Rec there. “Friday From the Feeeeeeeeeeild.”
Pete: Yeah, so we're going to need to get an audio cut of that and add some techno beats to it. I think that's going to be our new intro song.
Jesse: [imitates techno beats].
Pete: Yeah, we're going to take both of those things. I'm glad we got this recorded because that's going to turn into a fantastic song. So, we're back to talk about The Unconventional Guide to Cost Management. And this is the first episode, this is the first of a whole slew of these that we're going to be going through from the field, these different ways that companies can impact their spend. And no, it doesn't mean go and buy the cloud management vendor of the moment to look at your spend or fire up Cost Explorer. Those are all pieces of it, but broader things, the big levers, the small levers, the levers that don't actually go back and forth, but you turn and you would have no idea because it was designed by an Amazon UX engineer.
Jesse: Yeah, it's really important to call out that this discussion is looking at your cloud spend from a broader perspective and if you didn't get a chance to listen to our episode from last week, we did a little bit of an intro, framing this entire discussion. Go back and take a listen, if you haven't yet. Really talking about why looking at cloud costs through these different lenses is important. Why are you thinking about cloud cost, not just from the perspective of, “Oh, I'm going to delete these EBS snapshots,” or, “I'm going to tag all my resources,” but why is it important to think about cloud costs from other mediums?
Pete: Exactly. So, don't forget, you can go to and put your questions right in that box. Your name is optional. You can just leave your name blank if you don't want anyone to know who you are. Or if you want to say something really nice about me and Jesse, and you just feel a little shy—
Jesse: Aww.
Pete: —that's fine, too. But just put a question in there. And we're going to dedicate some future episodes to answering those questions and diving a little deeper for those that want to know a little bit more. But as being the first episode, we got to talk about something, so what are we talking about today, Jesse?
Jesse: Today we are talking about architecture and architecture context. Now, this is a really, really interesting one for me because the first thing that I think anybody thinks about when they think about cutting costs with their AWS spend is architecture decisions: something related to your infrastructure, whether that's tearing down a bunch of resources, or deleting data that's lying around. But there's a lot more to it than that: context is everything. Knowing why your infrastructure is built the way it is, knowing why your application is designed the way it is, is really important to understanding your AWS cloud costs.
Pete: This is where I feel like the Cloudabilitys, CloudHealth, CloudCheckr, Cloud-whatever companies, their products, sadly, fall down. And similar for every Amazon recommendation engine inside of AWS, they all break down. They lack the knowledge and the context of your organization. I remember a really long time ago, I had installed CloudHealth for the first time, and it said, “Hey, we've identified all these servers. They're sitting idle. Do you want us to turn them off for you?”
Those servers were actually my very large Elasticsearch cluster. They were idle because if no one's querying them they don't do anything, but they sure do hold a lot of data, and they really do need to be available. So, please, please don't turn those off. But that same thing could happen if you were—you know, due to risk or compliance reasons, you had to run some infrastructure as a warm standby in another availability zone or region. Yeah, sure, it's not taking requests, it’s not doing anything, but that doesn't mean that it's not supposed to be running.
Jesse: And this is really getting at one of the first big ideas, which is: work with other teams within the company. Not just other engineering teams, but product teams, possibly also security teams to understand all of the business context for your application and for your infrastructure in terms of data retention, in terms of availability, in terms of durability requirements. Because ultimately, you as a platform engineer, or an SRE, or a DevOps engineer, or whatever the hot new title is going to be a year from now, you need to understand why the infrastructure exists, and you may see servers that are sitting around idly doing nothing, but that's your disaster recovery site that is required by the business, by a service level agreement to be available at a moment's notice if something goes wrong. And so it's really important to understand what those components are and how they work together to build your overall application infrastructure.
Pete: Yeah, that's a great point. I mean, if you've been at a company for years, you've got a lot of this historical knowledge. People have come and gone, they've come, they've done things, they've implemented items, they've brought new features, they've gone. As companies grow, there may not be a single person who really, truly understands the impact of various changes. I think we saw that most clearly when Amazon had their Kinesis outage: the number of different services that were impacted was pretty large because it's just all too big for any one person to understand.
But that doesn't mean that you shouldn't always continually be working to understand those different usage requirements, and chatting with the non-tech teams. Product teams, I feel like are often ignored in startups because you don't really want more work, and that's what those product teams normally do, right? But they're going to have a lot of context.
I remember working in SaaS companies and looking at things like, “This? We don't use this anymore. There's no way we use this. I'm going to turn this off.” And then, well, smarter minds prevail. I say, “Well, let me go talk to product people.” And they go, “Oh yeah. We can't get rid of that one super important API because this one client of ours paid us an obscene amou...