Quick note: this episode isn't sponsored. I'm building a new kind of IDE for the AI era called Rex. If that sounds interesting, the link is in the description. OK, let's unpack this. You hear serverless, right? And the pitch is that it's basically magic. No servers to manage, infinite scaling. And the best part, the part that gets everyone to sign up, is that you only pay for what you use. Sounds, well, perfect. It does sound perfect until the end of the month rolls around.
Exactly. The magic fades pretty fast when you open that AWS invoice and realize "pay for what you use" actually means "pay for every millisecond you didn't realize you were wasting." That is usually how the story goes. You start with this dream of efficiency, and you end up with a bill that makes your CFO want to have a very serious, very uncomfortable conversation with you. So that is the mission for this deep dive. We are strictly looking at the
bottom line. Today we've pulled together a stack of technical reports, the latest AWS pricing guides, and some really rigorous architectural analysis from experts like CloudZero, Edge Delta, and Lumigo. We want to stop guessing how AWS Lambda charges work and start engineering them to be cheaper. And to do that, you have to adopt a specific mindset. It's not just about writing code anymore. It's about understanding the
machinery underneath the code. It's about, you know, looking at your logs, seeing where the money is bleeding out, and sometimes realizing the best way to save money on Lambda is to not use Lambda at all. I love that: the best Lambda is no Lambda. But OK, before we get to the philosophy, we have to start with the paradox. There is this core tension in the research we looked at.
It's this idea that making your function smaller, you know, giving it less memory, should logically save you money. But we're finding that might actually double your bill. It's a classic trap. It is intuitively correct: fewer resources equals less money. But in the world of Lambda it is technically wrong. We are going to dig into that because it kind of breaks my brain a little bit, but first we
have to eat our vegetables. We need to look at the billing equation itself, because if you don't understand the unit of measurement, you can't optimize it. You have to know the rules of the game to win it. So lay it out for me. When I look at that bill, what are the actual levers? There are essentially two main levers. First, you've got requests. This is just a flat fee. OK? Every time your Lambda wakes up to do something, whether it succeeds, fails, or times out, that's one request.
Currently the going rate is about $0.20 per million requests. Which, to be honest, sounds incredibly cheap. I mean, $0.20 for a million invocations? I feel like I could just ignore that. For many people you can. I mean, unless you're operating at massive, massive scale, like ad tech levels or something, that request cost is often just noise. So where's the real money? The real money, the place where budgets go to die, is the second lever: duration. The time the code is actually running.
Precisely. But here is where it gets a little nuanced. You aren't just paying for time. You don't pay per second. You are paying for a compound unit called GB-seconds. OK, GB-seconds, unpack that for me. It's basically a multiplication game. You pay for the amount of memory you allocated to the function, multiplied by how long the code runs. So let's say you have a function
configured with 1 gig of RAM. OK, if you run that for one second, the price is, let's call it X. But if you configure that function with two gigs of RAM and run it for that same second, the price is 2X. So you're paying for the size of the container and the time it exists. Exactly. It's volume times duration, and there's a crucial detail here that changed fairly recently. It used to be that AWS rounded your duration up to the nearest 100 milliseconds.
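To make the billing equation concrete, here is a minimal Python sketch of the two levers. The $0.20 per million requests figure comes from the discussion above; the per-GB-second rate is an assumed x86 list price for one region and should be checked against the current AWS pricing page. The function name and the `granularity_ms` parameter are illustrative only.

```python
import math

# Assumed prices (check the current AWS Lambda pricing page for your region).
PRICE_PER_REQUEST = 0.20 / 1_000_000   # $0.20 per million requests
PRICE_PER_GB_SECOND = 0.0000166667     # assumed x86 rate

def invocation_cost(memory_mb, duration_ms, granularity_ms=1):
    """Cost of one invocation: flat request fee plus GB-seconds.

    granularity_ms=1 models billing rounded up to the millisecond;
    granularity_ms=100 models the older round-up-to-100 ms behavior.
    """
    billed_ms = math.ceil(duration_ms / granularity_ms) * granularity_ms
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND

# A 130 ms run at 1 GB under each rounding rule.
fine_cost = invocation_cost(1024, 130)
coarse_cost = invocation_cost(1024, 130, granularity_ms=100)
print(f"1 ms granularity:   ${fine_cost:.10f}")
print(f"100 ms granularity: ${coarse_cost:.10f}")
```

Doubling `memory_mb` doubles the GB-seconds term and leaves the request fee unchanged, which is the "X versus 2X" relationship described above.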
Oh right, I remember this. So if my code finished in like 12 milliseconds, I was paying for 100. You paid for 100. If it ran for 101 milliseconds, you paid for 200. There was so much waste. That feels like the old mobile phone plans, where you paid by the minute. It was exactly like that, but now billing is rounded up to the nearest single millisecond. Wow, so that changes the incentive structure completely? It does. It means micro-optimizations
actually pay off now. Before, shaving 20 milliseconds off your code was just vanity; it often didn't change the bill. Now it's direct savings. Every millisecond you cut is money kept in your pocket. However, nothing is ever purely good news, is it? There was a scary note in that Edge Delta report about a new tax involving the initialization phase. Yes, the init billing change. This kicked in around August 2025. Right, so walk me through the life cycle. A request comes in. What happens?
The Lambda life cycle has two distinct parts. First you have the init. This is the cold start. OK, the container has to spin up, the OS loads, it downloads your code, and it starts the runtime. Then once that's ready, you enter the invoke phase, where your actual handler function runs. For the longest time AWS didn't charge for that first part, right? The init was on the house.
It was a free lunch and developers loved it, especially those using heavy languages like Java or C#. You could have a massive Spring Boot application that took 5 or 6 seconds just to wake up, and AWS just ate that cost. But that's gone now. Gone. You now pay for the initialization phase too. So if you're running a heavy Java app that takes 5 seconds to wake up, yeah, you are paying for those 5 seconds, billed in GB-seconds, every single time a cold start
happens. Ouch. So looking at your logs becomes critical here. You need to see how long that init phase is actually taking. Exactly. If you check your CloudWatch logs and see a massive spike in duration during cold starts, you have a problem. You might need to look at mitigation strategies like SnapStart. Which is like a memory snapshot. Right, right. It resumes instantly. Or you could use provisioned concurrency. That's where you pay to keep them warm. You can, yes.
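One way to see init time in the logs: each invocation ends with a REPORT line in CloudWatch Logs, and on cold starts that line carries an Init Duration field. The sketch below parses such a line with a regular expression. The sample line is invented for illustration and the helper name is hypothetical.

```python
import re

# Sample REPORT line in the shape Lambda writes to CloudWatch Logs
# (all values are invented for illustration).
SAMPLE = (
    "REPORT RequestId: 8f5a1c2e-0000-4000-8000-000000000000\t"
    "Duration: 102.25 ms\tBilled Duration: 103 ms\t"
    "Memory Size: 1024 MB\tMax Memory Used: 87 MB\t"
    "Init Duration: 5312.40 ms"
)

# Longer field names come first so "Billed Duration" is not read as "Duration".
FIELD = re.compile(
    r"(Billed Duration|Init Duration|Max Memory Used|Memory Size|Duration)"
    r": ([\d.]+) (ms|MB)"
)

def parse_report(line):
    """Return a dict of numeric fields found in a Lambda REPORT log line."""
    return {name: float(value) for name, value, _unit in FIELD.findall(line)}

report = parse_report(SAMPLE)
init_ms = report.get("Init Duration", 0.0)  # field is absent on warm starts
print(f"init: {init_ms} ms, handler: {report['Duration']} ms")
print(f"memory used: {report['Max Memory Used']} of {report['Memory Size']} MB")
```

Running this over a day of logs and comparing the init figures against handler durations gives a rough picture of how much of the bill cold starts account for.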
That's basically paying AWS to keep a certain number of environments warm and ready to go. That sounds like the solution. Just pay to keep the lights on. It solves the latency issues, sure, but be very, very careful. You are effectively moving from a serverless pay-per-use model back to a server model where you're paying per hour. So if your traffic drops at night, but you're paying for provisioned concurrency, you're just burning cash.
So it's a double-edged sword. OK, so we have the billing basics down. Request counts are cheap, duration is where the pain is, and we're paying for init time now. That's the foundation. Now I want to circle back to that paradox you teased at the beginning. The aha moment regarding memory. The memory-duration paradox. This is where most people get Lambda optimization completely wrong. So play devil's advocate with me. I'm a developer. I'm looking at the console.
I see a slider for memory. Logic tells me I'm paying for GB-seconds. If I cut my memory from 1 gig to 512 megs, I'm paying half the rate per second, therefore I save 50%. Why is that wrong? It's wrong because in AWS Lambda you cannot decouple memory from CPU. Memory equals power. Memory equals power? When you slide that memory toggle in the console, you aren't just giving the function more RAM to store variables, you are proportionally allocating more CPU cycles and network bandwidth.
Oh interesting, so a 128-megabyte function isn't just small memory, it's weak CPU. It is incredibly weak. You are getting a tiny sliver of a processor. In fact, the sources highlight a very specific magic number you want to remember. It's roughly 1,769 megabytes. That is a suspiciously specific number, 1,769. It is. At around 1.8 gigs of memory, AWS allocates your function the equivalent of one full vCPU. And below that? Below that you are fighting for
fractional CPU time. At 128 megabytes you might only be getting like 7% of a core. So let's play that out in a real scenario. Say I have a CPU-intensive task like resizing an image or parsing a massive JSON file. If I put that on the 128-megabyte setting? It's going to struggle, yeah. It might take 10 seconds to process because the CPU is just totally bottlenecked. You're paying a low rate, sure, but you're paying it for 10 long seconds. What if I bump that memory up to two gigs?
Then you have at least a full vCPU. That same task might finish in what, 200 milliseconds? Wait, so let me do the mental math. The rate per second went up 16 times, but the duration dropped by like 50 times. Exactly. You are paying a higher rate for a much shorter time. When you multiply it out, the total cost, the GB-seconds, is actually lower at the higher memory setting. You get the job done faster and cheaper.
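The arithmetic behind that claim can be sketched in a few lines of Python. The 10-second and 200-millisecond durations are the hypothetical figures from the conversation, not measurements, and the CPU share is a rough proportional estimate based on the 1,769 MB figure quoted above.

```python
FULL_VCPU_MB = 1769  # memory setting at which a function gets roughly one vCPU

def gb_seconds(memory_mb, duration_s):
    """Billable compute for one invocation, in GB-seconds."""
    return (memory_mb / 1024) * duration_s

def approx_cpu_share(memory_mb):
    """Rough fraction of one vCPU, assuming CPU scales linearly with memory."""
    return memory_mb / FULL_VCPU_MB

small = gb_seconds(128, 10.0)   # slow run on the smallest setting
large = gb_seconds(2048, 0.2)   # fast run on 2 GB

print(f"128 MB:  {small:.2f} GB-s, ~{approx_cpu_share(128):.0%} of a vCPU")
print(f"2048 MB: {large:.2f} GB-s")
print(f"larger setting costs {large / small:.0%} of the smaller one")
```

The comparison only holds while the speed-up outpaces the memory increase, which is why it applies to CPU-bound work and not to functions that mostly wait on I/O.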
That is wild. It completely flips the cloud optimization mindset of "turn it off or turn it down." It does, yeah, but there is a catch. This works for CPU-bound tasks. If your function is just, you know, waiting for a database to respond, adding more CPU doesn't make the database faster. Right, you're just paying more to wait. You're just paying more to wait. So how do I know? I mean, I can look at the logs, right? Yes, start with your logs. Look for two things.
First, look at Max Memory Used. If you provision 2 gigs but your function only ever uses 100 megs, you might be over-provisioned. But again, remember the CPU link, right? Second, look for timeouts. If you see your function timing out at the lower memory settings, it means it's just not powerful enough to finish the job. You are paying for the execution up to the timeout and getting zero value from it. So do I just guess? Let's try one and a half gigs today and see what happens?
Please do not guess. You'll go crazy trying to manually benchmark this. There's a fantastic open source tool mentioned in the reports called AWS Lambda Power Tuning. Power Tuning? Is that an AWS service? It's a community tool, but it uses AWS Step Functions. You deploy it into your account, you point it at your function, and you tell it: test this function at 128 MB, 256, 512, 1 GB, and 2 GB. And it just runs them all.
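As a rough illustration, the tool is driven by a JSON input passed to its Step Functions state machine. The sketch below builds such an input in Python. The field names reflect my understanding of the project's documented input format, so treat them as assumptions and verify them against the version you deploy; the function ARN is a placeholder.

```python
import json

# Placeholder ARN; replace with the function you want to tune.
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:my-function"

def build_tuning_input(function_arn, power_values, invocations_per_value=50):
    """Build the execution input for the power-tuning state machine."""
    return {
        "lambdaARN": function_arn,
        "powerValues": power_values,     # memory settings to test, in MB
        "num": invocations_per_value,    # invocations per memory setting
        "payload": {},                   # event passed to the function under test
        "parallelInvocation": True,      # run the invocations concurrently
        "strategy": "cost",              # optimize for cost rather than speed
    }

tuning_input = build_tuning_input(FUNCTION_ARN, [128, 256, 512, 1024, 2048])
print(json.dumps(tuning_input, indent=2))
```

The printed JSON would be supplied as the execution input when starting the state machine, for example from the Step Functions console.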
It runs them all in parallel, measures the execution time and the cost for each, and then, this is the best part, it plots a curve on a graph. You can visually see exactly where the sweet spot is. That seems like a no-brainer. Every engineering team should be running that as part of their CI/CD pipeline. Absolutely. It takes 5 minutes to set up and it can save you 20 to 30% on your
bill permanently. Speaking of easy wins, we have to talk about hardware. The reports mentioned that if you're running on the default settings, you're probably on old hardware. Likely, yes. By default many functions still run on x86 architecture. You know, think Intel processors. But AWS has its own custom silicon called Graviton. The ARM-based processors.
Right, Graviton2 or Graviton3. For almost all interpreted languages, Python, Node.js, Ruby, switching to Graviton is literally just changing a drop-down menu in the settings. And the benefit? Usually about 20% better price performance instantly. It's cheaper per millisecond and often faster. And no code rewrite? For those scripting languages, usually zero.
If you're using compiled languages like Rust or Go, you do have to recompile for ARM, but for most people it's the easiest money they'll save all year. So step one, check your logs and verify memory with Power Tuning. Step two, switch to Graviton. Now let's get into the code itself, because you can have the best hardware in the world, but if you write bad code, it's still going to cost you. True. And when we talk about Lambda code efficiency, we have to talk about scope.
Scope. You mean like variable scope, global versus local? Exactly. This is the number one coding mistake I see in serverless functions. Remember we talked about the init phase and the invoke phase? Right, init is the startup, invoke is the handler running. The handler function is what runs on every single request, but any code you write outside the handler function runs only once, during the init phase. OK, so this is about where you put your heavy lifting. Give me a concrete example.
Let's say you need to connect to a database. A classic mistake is putting the db.connect line inside the handler function. I can see why people do that. I received a request, I need the database, let me connect. Logical, but so expensive. If you do that, your code has to open a fresh connection to the database every single time a user makes a request. It has to do the handshake, authenticate, establish the socket. Which takes time.
It takes time, latency you pay for, and it puts massive stress on the database. So what's the fix? You move that connection logic to the global scope, outside the handler. You initialize your database clients, your AWS SDKs, your secrets, all of it at the top of the file. So it runs during the init phase. Yes, and here's the magic: AWS reuses containers. If a second request comes in five seconds later, AWS will likely use the same container.
Your handler runs, but it sees the DB client variable is already populated. It skips the connection and goes straight to business. That's the warm start. Exactly, you are caching the connection. It's faster and cheaper. But you mentioned stress on the database. I feel like I've heard horror stories about Lambda and relational databases like MySQL or PostgreSQL specifically. Well, the stories are real.
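The global-scope pattern described above can be sketched as follows. This is a minimal illustration, not a real database client: `connect_to_database` is a hypothetical stand-in that just counts how many times a connection would have been opened.

```python
CONNECTIONS_OPENED = 0

def connect_to_database():
    """Hypothetical stand-in for an expensive handshake with a real database."""
    global CONNECTIONS_OPENED
    CONNECTIONS_OPENED += 1
    return {"connected": True}

# Module scope: runs once per execution environment, during the init phase.
db_client = connect_to_database()

def handler(event, context):
    """Runs on every request and reuses the client created during init."""
    return {"statusCode": 200, "db_ready": db_client["connected"]}

# Simulate three warm invocations in the same environment.
for _ in range(3):
    handler({}, None)
print(f"connections opened: {CONNECTIONS_OPENED}")
```

Had the `connect_to_database()` call lived inside `handler`, the counter would grow with every request instead of staying at one per environment.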
It's the connection storm. See, Lambda scales pretty much infinitely. If your marketing team sends a push notification and 10,000 users open your app at once, Lambda spins up 10,000 concurrent functions. And if each one tries to open a connection to my poor little PostgreSQL instance, boom. The database runs out of memory, rejects connections, and crashes. And your service goes down, and you still pay for the 10,000 Lambdas that failed. Nightmare scenario. So what's the fix?
Do we just not use relational databases with serverless? You can, but you need a mediator. That's where RDS Proxy comes in. It's a managed service that sits between Lambda and the database. Like a bouncer. Exactly like a bouncer. It pools the connections. So even if 10,000 Lambdas wake up, RDS Proxy might only keep 50 connections open to the database and just route the traffic through them. That's amazing.
It prevents the database from dying and it reduces the time your Lambda sits idle waiting for a handshake. OK, so we've optimized the memory and the code scope, and protected the database, but one of the most provocative ideas in the research was this concept of Lambda-less architectures. This is my favorite part. It requires really thinking outside the box. The cheapest millisecond of compute is the one you don't use.
Which sounds like a riddle. It means we often use Lambda as glue where we don't need to. Like, let's say you have an API endpoint that just receives a contact-us form and saves it to DynamoDB. Sure, standard operating procedure. API Gateway triggers a Lambda. The Lambda parses the JSON, maybe validates it, and writes it to DynamoDB. But why? Why pay for compute just to move data from point A to point B?
API Gateway is perfectly capable of writing directly to DynamoDB. Wait, really? Without any code in the middle? Without any Lambda code. You use something called VTL templates, Velocity Template Language, or the newer direct service integrations. You configure API Gateway to say: take the body of this request and put it in this DynamoDB table. So you cut the Lambda out entirely. Completely. The request hits API Gateway, API Gateway talks to the database and responds.
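For a sense of what such an integration looks like, here is a minimal sketch of a request mapping template for a DynamoDB PutItem integration, held in a Python string so its structure can be sanity-checked as JSON. The table name and form fields are hypothetical, and a production template would also need input escaping and validation.

```python
import json

# Hypothetical mapping template for an API Gateway -> DynamoDB PutItem integration.
# $context.requestId and $input.path(...) are resolved by API Gateway at request time.
PUT_ITEM_TEMPLATE = """{
  "TableName": "ContactMessages",
  "Item": {
    "id": {"S": "$context.requestId"},
    "email": {"S": "$input.path('$.email')"},
    "message": {"S": "$input.path('$.message')"}
  }
}"""

# The template is JSON with placeholders, so its shape can be checked locally.
parsed = json.loads(PUT_ITEM_TEMPLATE)
print(sorted(parsed["Item"].keys()))
```

Nothing in this path invokes a function: the template is configuration attached to the API Gateway integration, and the only Python here is the local check.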
You paid $0.00 for Lambda compute. You also get lower latency because there is no cold start. I have to admit though, VTL, I've seen it, it looks a bit nasty. It is not developer-friendly, I will grant you that. It has a learning curve, but for high-volume simple endpoints it is worth its weight in gold. So the mindset shift is: is this function adding value or is it just transport? Exactly. If it's just transport, delete
the code. I love that. Now, another pattern that came up for cost saving is using queues, specifically SQS. But again, this feels counterintuitive. I'm adding more infrastructure, a queue, to save money? It sounds wrong, but it goes back to batching. Let's look at the math without a queue. If 1,000 requests come in simultaneously, you invoke 1,000 Lambdas. You pay for 1,000 inits, 1,000 execution overheads. But if you put an SQS queue in the middle, the requests pile up in the buffer.
Then Lambda can wake up and say, give me a batch of messages. How big of a batch? Up to 10, or even 10,000 with some configurations, but let's say 100. So now you process 100 user requests with one Lambda invocation. Oh, I see. You amortize that startup cost, that expensive init and network setup, across 100 records instead of one. Exactly, it is drastically more efficient, but there was a historical gotcha here. Which was? Partial failures. Let's say you grab 100 items.
You process 99 of them perfectly, but item number 42 is malformed and causes an error. Does the whole batch fail? Do you have to rerun everything? In the old days, yes, the whole batch would fail, the messages would go back to the queue and you'd reprocess the 99 good ones again. Wasted money. That sounds messy.
But AWS fixed this with a feature called ReportBatchItemFailures. Basically your Lambda can return a specific response saying, hey, I processed 99 of these perfectly, but here is the ID of the one that failed. And the queue handles the rest? The successful ones are deleted from the queue and only the bad one is kept for a retry. That is huge, so you don't waste money reprocessing successful work. Precisely. And if you want to take filtering a step further, maybe you don't even want the
data to reach the queue. You should look at EventBridge Pipes. Pipes. That's a newer service, right? Relatively new. Imagine you have a stream of data coming in, maybe transactions, but you only care about transactions over $100. Historically, you'd trigger a Lambda for every single transaction. Right, with code to check: is the amount over 100? And if not, just exit. But you still paid for the invocation just to say no, right? Exactly. You paid to say no.
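Returning to the partial-failure response described a moment ago, a handler for an SQS-triggered function might look like the sketch below. It assumes ReportBatchItemFailures is enabled on the event source mapping; the `process` function and the sample event are hypothetical.

```python
import json

def process(body):
    """Hypothetical business logic; raises on malformed input."""
    order = json.loads(body)
    if "amount" not in order:
        raise ValueError("missing amount")

def handler(event, context):
    """Process an SQS batch and report only the messages that failed."""
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])
        except Exception:
            # Only this message is reported back for a retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

sample_event = {"Records": [
    {"messageId": "m-1", "body": '{"amount": 120}'},
    {"messageId": "m-2", "body": "not json"},
    {"messageId": "m-3", "body": '{"amount": 40}'},
]}
print(handler(sample_event, None))
```

An empty `batchItemFailures` list signals that the whole batch succeeded, so the good messages are never reprocessed.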
With EventBridge Pipes, you can put a filter rule before the Lambda. The pipe checks the data payload. If it's under $100, it drops it instantly. Your Lambda never wakes up. You never pay. That connects back to the "cheapest code is no code" idea. It really does. Filter at the infrastructure level, not the application level. OK, I want to pivot to a hidden cost that seems to catch everyone off guard. The sources call it the VPC trap. Oh, this is a painful one.
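A filter of that kind is expressed as an event pattern rather than code. The sketch below shows a pattern using numeric matching, written as a Python dict, plus a tiny local function that mimics what the filter decides. The `amount` field and the assumption that the message body is JSON are illustrative; the local matcher handles only this one rule and is not the EventBridge matching engine.

```python
import json

# Event pattern with numeric matching: keep only events whose body.amount > 100.
FILTER_PATTERN = {"body": {"amount": [{"numeric": [">", 100]}]}}

def passes_filter(event, threshold=100):
    """Local stand-in for the single rule above (not the real matching engine)."""
    amount = event.get("body", {}).get("amount")
    return isinstance(amount, (int, float)) and amount > threshold

events = [{"body": {"amount": 250}}, {"body": {"amount": 40}}]
delivered = [e for e in events if passes_filter(e)]

print(json.dumps(FILTER_PATTERN))
print(f"{len(delivered)} of {len(events)} events would reach the Lambda")
```

Only `FILTER_PATTERN` would be attached to the pipe; `passes_filter` exists purely to show the decision locally.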
Yeah, I have seen grown engineers cry over this bill. So the scenario is, I'm a responsible engineer, I want security, so I put my Lambda inside a VPC. It feels like the right thing to do. It does. It feels secure. But here is the catch. By default a Lambda inside a private VPC cannot talk to the Internet. It is locked in a padded room. OK, but often your code needs to call a third-party API or even public AWS services like DynamoDB or S3. Yeah, to get out of that private
room, it needs a door. And that door is? A NAT gateway. And I'm guessing NAT gateways aren't free. Far from it. They're one of the most expensive line items on many AWS bills. You pay an hourly charge just for it to exist. Roughly $30 to $40 a month per availability zone. So for high availability, that's over 100 bucks a month before you even send anything, right? But the killer is the data processing fee. You pay per GB of data that passes through that gateway.
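To see how the two NAT charges add up, here is a small Python estimate. The $0.045 per hour and $0.045 per GB rates are assumed list prices for one region, in line with the roughly $30 to $40 a month mentioned above; check current pricing before relying on them.

```python
# Assumed NAT gateway prices (verify for your region).
NAT_HOURLY_USD = 0.045
NAT_PER_GB_USD = 0.045
HOURS_PER_MONTH = 730

def nat_monthly_cost(gateways, gb_processed):
    """Fixed hourly charge per gateway plus the per-GB data processing fee."""
    fixed = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH
    variable = gb_processed * NAT_PER_GB_USD
    return fixed + variable

# One gateway per availability zone, at increasing data volumes.
for gb in (0, 1_000, 10_000):
    print(f"3 gateways, {gb:>6} GB: ${nat_monthly_cost(3, gb):,.2f}")
```

The fixed part is paid even at zero traffic, while the per-GB part grows with every byte routed through the gateway.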
So if I'm processing a lot of data, say downloading images from S3 or reading from DynamoDB, and I'm routing it through this NAT? You're paying a tax on every single byte. I have seen bills where the actual Lambda compute was $50 and the NAT gateway charges were $500. That is brutal. So what is the fix? Do we take the Lambda out of the VPC? That is the simplest fix if you don't strictly need private networking. If you aren't connecting to a private database, just take the
Lambda out of the VPC. It gets public Internet access for free. But what if I do need the VPC? What if corporate security requires it? Then use VPC endpoints. How do those help? A VPC endpoint is like a secret tunnel. It allows you to talk to AWS services like S3 or DynamoDB without leaving the AWS private network. The traffic never hits the NAT gateway, and for S3 and DynamoDB specifically, the gateway endpoints are completely
free. $0.00. Free is my favorite price, so check if you're routing traffic through a NAT that doesn't need to be there. Absolutely. It's one of the first things I look for in a cost audit. We have covered a lot: memory math, code scope, queues, networking traps. There is one last area I want to touch on: orchestration. What about when we need to wait for things? This is the golden rule. Lambda is terrible at waiting. Because we pay for duration.
Right. If you write code that says sleep 1000, or pause for 10 seconds because you are waiting for an API response, you are literally burning money while the CPU does absolutely nothing. You're paying for a taxi to sit in the driveway with the meter running. Exactly. So what should you use instead? AWS Step Functions. This is the state machine service? Yes, it allows you to coordinate steps visually.
You can have a state that says wait for 10 seconds, or wait for a callback from this external system. And the billing there? For Standard Workflows, you pay per state transition, basically per step you take, but the time spent in the state, the waiting time, is free. You can wait for up to a year and it costs $0.00. That is a massive difference compared to paying per millisecond. It is. So the architectural pattern is: use Lambda for compute, transforming data,
calculating things. Use Step Functions for flow: waiting, retrying, branching logic. Don't mix them up. I like that distinction. Lambda for compute, Step Functions for flow. And just a quick note: if you have high-volume, really fast workflows, look at Express Workflows in Step Functions. They are cheaper for high throughput, though they don't have the wait-forever-for-free benefit. But for most orchestration, get the logic out of the Lambda. This has been incredibly comprehensive.
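As an illustration of moving the wait out of the function, here is a minimal state machine definition in Amazon States Language, built as a Python dict. The state names and the Lambda ARN are placeholders.

```python
import json

# Placeholder ARN for the function that does the actual compute.
CHECK_FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-result"

definition = {
    "Comment": "Wait in Step Functions instead of sleeping inside Lambda",
    "StartAt": "WaitTenSeconds",
    "States": {
        "WaitTenSeconds": {
            "Type": "Wait",      # the pause happens here, with no Lambda running
            "Seconds": 10,
            "Next": "CheckResult",
        },
        "CheckResult": {
            "Type": "Task",      # Lambda runs only for the actual compute
            "Resource": CHECK_FUNCTION_ARN,
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

In a Standard Workflow this definition is billed as state transitions, so the ten seconds spent in `WaitTenSeconds` adds no duration charge.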
Let's try to summarize this into a checklist before people go audit their AWS accounts. Sure, let's break it down. Number one: right-size your memory. Don't assume smaller is cheaper. Use the AWS Lambda Power Tuning tool to find the sweet spot. Remember that 1.8 GB threshold. Number two: hardware. Switch to ARM-based Graviton processors. It's an almost guaranteed 20% saving for most languages. Number three: think outside the box with
batching. Put an SQS queue in front of your Lambda and process messages in bulk to amortize the startup costs. And the architecture stuff? Use direct integrations. If you are just moving data, see if you can skip Lambda entirely, and please watch out for that NAT gateway tax. Use VPC endpoints wherever possible. And finally, stop waiting inside your functions. Use Step Functions. Exactly.
Don't pay the taxi to wait. You know, what strikes me about all this is that cost optimization in serverless isn't really about writing cheaper code in the traditional sense. It's not about writing a more efficient sorting algorithm. That's the key insight. In a traditional server environment, efficient code meant using less RAM or fewer cycles. In serverless, efficient code is often about architecture. It's about knowing when not to
use the compute service. Which leads to our final thought for you, the listener. We've given you a lot of tactics, but what's the big strategic takeaway? I'd leave you with this. The ultimate goal of serverless optimization is to delete your code. Every line of code you write is a line you have to debug, maintain, and pay to execute. If AWS has a service that can do the job for you, use it. That is a great question to chew on. Are you reinventing the wheel
and paying for the privilege? Ask yourself, are you writing code that AWS has already written for you? Thank you so much for breaking this down. This was a true deep dive. My pleasure. And to our listener, good luck with those bills. Go crack the code. We'll see you on the next deep dive.
