Design Local Delivery Service - GoPuff - System Design Interview Series | Programmers podcast

00:00

Quick note, this episode isn't sponsored. I'm building a new kind of IDE called Rex because existing ones make it hard to work across multiple projects in parallel. I'm sharing it to get feedback from listeners. I'd really love to hear your thoughts. The link is in the description. And now let's move on with today's super interesting episode.

00:20

It is 11:00 PM on a Tuesday. You are sitting on your couch, you're three episodes deep into a show you have definitely seen before, and suddenly it hits you. You need chips. And it is never just chips, is it? It's a very specific, annoying craving. Exactly. It's not just salty potato. It is specifically the jalapeño cheddar kettle-cooked chips. The ones in the purple bag.

00:43

You know the ones, of course. So you pull out your phone, you open an app, maybe it's GoPuff, maybe it's Gorillas or Getir. You tap a button and the app says 15 minutes. Which, if you actually stop to think about the logistics involved, is... It's absolutely absurd. It feels like magic, but you and I know better. We know that underneath that magic is a terrifying, complex web of logistics, databases, and code. Right, it is not just a driver driving fast.

01:07

It is a system that has to know exactly which dusty corner of which micro warehouse has that specific bag of chips, and it has to promise it to you before someone three blocks away grabs it. And it has to do all of that math, all of those checks, in what, milliseconds? Maybe 200 milliseconds. If it takes longer than that, you have probably already closed the app or switched to a competitor.

01:30

Exactly, and that high-stakes, high-speed environment makes this specific problem a goldmine for engineering interviews. So that is our mission today. OK. We aren't just analyzing how these apps work from the outside. We are going to build one. We are entering the simulation. This is the system design mock interview. We are going to walk through a classic senior-level prompt: design a local delivery service. And we've pulled together, well, a whole stack of resources for this.

01:58

We're looking at system design guides, some pretty dense research on database concurrency, even some real-world logistics papers. We're going to need them. This is one of those problems that looks easy on the surface (I just want a bag of chips) but turns into a nightmare as soon as you look at the scale. Oh yeah, the goal is simple.

02:13

By the end of this hour, you, the listener, should be able to walk into a whiteboard interview at a top tech company and explain how to handle inventory scaling, heavy read traffic, and the absolute nightmare that is the double booking problem. And hopefully get hired. Hopefully. So, let's set the scene. We walk into the interview room. The air conditioning is humming. The whiteboard is fresh. The marker smells like chemicals.

02:38

And the interviewer looks at us and says, "Design a system for instant delivery," and the panic sets in. Where do we even start? You start by taking a deep breath and not building everything. What do you mean by that? The biggest mistake candidates make, and I see this all the time when I mock interview people, is that they try to boil the ocean. OK. They try to design the user registration flow, the credit card processing, the driver routing algorithm, the notification system, the push

03:07

notifications, and then... Run out of time? They run out of time in 10 minutes. In a 45-minute interview, you have to be ruthless. You have to draw a box around the problem. We call this scoping. You need to clarify the requirements and cut anything that isn't the core challenge. OK, let's practice that. If we are building a GoPuff clone, what stays in the box and what gets thrown out? Well, let's look at the problem. What makes this hard? Is it signing up a user? No.

03:31

No, that's a standard CRUD app. Create, read, update, delete. It's, you know, a solved problem. It's boring for an interview. OK, so user profiles are out. What about payments? That seems pretty important. It's critical for the business, but for a system design interview, it's boring. Boring? How can payments be boring? From a design perspective, yes. You aren't building Stripe or PayPal, you are just integrating

03:57

with them. You draw a box on the whiteboard, you label it payment gateway, and you move on. So you just assume it works? You assume it works. Do not waste 5 minutes discussing credit card validation or PCI compliance unless the interviewer specifically asks for it. Got it. So we are laser focused. What about the drivers? I feel like the routing, getting the driver from point A to point B efficiently, is the hardest part. Don't we need to solve that?

04:20

It is a hard problem. It is the traveling salesman problem. It is a massive mathematical challenge involving graph theory and optimization. Right, and that is exactly why you throw it out. (Laughs) You just ignore it because it's too hard? For this specific interview slot, yes. If you get bogged down trying to calculate the optimal route for a driver to hit five houses in a specific order, you will never get to the database architecture. So you just hand-wave it away.

04:45

You just tell the interviewer, "I'm assuming we have a separate downstream service that handles routing logistics." You delegate. OK, fair enough. We are delegating the hard math to another team. So what is left? What is the core puzzle we are actually solving here today? We care about two things: availability and ordering. OK. Can the customer see if the item is in stock right now based on where they are standing, and can they lock that inventory and buy it without the system crashing?

05:12

Availability and ordering. It sounds simple. It sounds simple until you look at the numbers. Let's talk scale. Yeah, how big is this hypothetical app? Are we building this for my neighborhood or for the world? To impress the interviewer, you always want to aim high. If you design a system that works for 100 people, you are designing a toy. If you design one that works for 100 million, you are operating at a staff engineer level. So let's go staff level. What are our parameters? Let's

05:39

assume we are operating globally. We have maybe 10,000 distribution centers, or DCs. Whoa, wait, 10,000? Amazon has what, a few hundred fulfillment centers? 10,000 sounds insane. It does, but remember these aren't those massive million-square-foot warehouses with robots and conveyor belts. These are micro warehouses, dark stores. Exactly. A converted garage in Brooklyn or a basement in London. Small footprint but high density. They are everywhere. OK.

06:07

And the items? Let's say each one holds maybe 5,000 to 10,000 unique items, but across the whole network our total catalog is maybe 100,000 different items. OK. And the traffic? How many people are buying chips? Let's push for 10 million orders per day. 10 million orders. Let me do some napkin math here. We have 10 million orders. There are 86,400 seconds in a day. So that is roughly what, 115 orders per second? On average. But averages are a trap. Why?

06:35

Because nobody orders chips at 4:00 AM on a Tuesday. The traffic is almost zero. But everyone orders lunch at noon, and everyone orders snacks during the Super Bowl halftime show or the season finale of a big show. So the peak traffic isn't 115 orders per second. No, your peak traffic might be 5,000 or 10,000 orders per second, and you have to design for the peak, not the average. If you design for the average, your system crashes every single lunch hour. Every single time.
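The napkin math above can be checked in a few lines. This is only a sketch using the figures quoted in the conversation (10 million orders per day, a peak of 5,000 to 10,000 orders per second); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds
ORDERS_PER_DAY = 10_000_000             # figure quoted in the episode

# Average write rate if orders were spread evenly across the day.
average_ops = ORDERS_PER_DAY / SECONDS_PER_DAY

# Peak range quoted in the episode (an estimate, not a measurement).
PEAK_LOW, PEAK_HIGH = 5_000, 10_000

# How many times larger the peak is than the average.
peak_ratio_low = PEAK_LOW / average_ops
peak_ratio_high = PEAK_HIGH / average_ops

print(f"average: {average_ops:.1f} orders/s")
print(f"peak is roughly {peak_ratio_low:.0f}x to {peak_ratio_high:.0f}x the average")
```

The gap between average and peak, a factor of several dozen, is the reason capacity is planned for the peak.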

07:04

And that is just the orders, that is the writes to the database. Right. Think about how you use the app. You don't just open it and buy. You scroll. You search for ice cream. You check candy. You click on an item, look at the picture, check the price, go back. You are generating dozens, maybe hundreds of reads for every single write. So if we have 10 million orders, we might have a billion read requests per day.

07:24

Exactly. So we are looking at a system that is incredibly read-heavy but has these terrifying, bursty spikes of write traffic that absolutely cannot fail. That is the constraints profile: high read volume, bursty write volume, and, this is critical, low latency. Right. If I tap chips, I need to know if they are in stock now. And not just low latency. Accuracy. We need strong consistency. Can you define that for us? What does that mean in this context?

07:54

It means if the app says in stock, it must be physically on the shelf. We cannot sell the same physical bag of chips to two different people. We cannot double book inventory. OK, so we have our constraints. Fast reads, strictly consistent writes, massive scale. Now we have to actually draw the shapes on the whiteboard. We need to define our data model, the nouns and verbs of the system. This is where I see a lot of

08:16

people trip up immediately. They grab the marker and draw a box called items, and in that box they put quantity. That seems logical. I mean, I have an item, it has a quantity. Why is that wrong? It seems logical for a grocery list on your fridge. It is catastrophic for a distributed system. You have to distinguish between an item and inventory. OK, pause. Aren't they the same thing? If I have a bag of Cheetos, that is the item and it is also the

08:40

inventory. Think of it like object-oriented programming, or even simpler, think of it like a menu versus a meal. Go on. An item is the definition. It is the abstract concept of Flaming Hot Cheetos, 8 oz bag. It has a name, a description, a photo of the bag, a nutritional label, and a UPC code. And that information is static. It's the same everywhere. It is the same whether you're in New York, London, or Tokyo. A bag of Cheetos is a bag of Cheetos. That information is the class or

09:09

the definition. It lives in a global product catalog service. And it rarely changes. It rarely changes. Maybe once a year they update the packaging photo. You can cache that data aggressively and put it on a CDN, a content delivery network. You almost never have to query the master database for the picture of the chips. So what is inventory? Inventory is the instance. It is the physical reality of

09:30

that item at a specific place. It is a row in a database that says: at distribution center #402 in Seattle, on shelf B, there are 42 bags of item #999. I see. Item is global, inventory is hyperlocal. Exactly. And inventory is volatile. It changes every second. Every time someone buys a bag, that number ticks down. Every time a delivery truck arrives to restock, that number ticks up. This distinction matters because of how we store it. Right. Precisely.

09:59

You don't want to store the product description, the text, the photo URL in the inventory table. That is a waste of space. The inventory table should be lean and mean. So it's a high-performance table. It should basically just be three columns: distribution center ID, item ID, and quantity. That makes sense. It is a mapping table. So we have the user, the distribution center, the item, and the inventory. Now how do these talk to each other? What does the API look like?
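The item versus inventory split described above can be sketched with two small record types. This is a minimal illustration, not a schema from the episode: the field names, the placeholder description and UPC, and the example CDN URL are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Global, rarely changing catalog entry (the 'class' in the analogy)."""
    item_id: int
    name: str
    description: str
    photo_url: str   # hypothetical CDN URL, cacheable for a long time
    upc: str


@dataclass
class InventoryRow:
    """Lean, volatile row: how many of one item sit in one distribution center."""
    dc_id: int
    item_id: int
    quantity: int


# The catalog is keyed by item ID only; it is the same everywhere.
catalog = {
    999: Item(999, "Flaming Hot Cheetos, 8 oz bag", "placeholder description",
              "https://cdn.example.com/items/999.jpg", "000000000000"),
}

# Inventory is keyed by (distribution center, item): the hyperlocal instance.
inventory = {
    (402, 999): InventoryRow(dc_id=402, item_id=999, quantity=42),
}

row = inventory[(402, 999)]
print(f"DC {row.dc_id} holds {row.quantity} x {catalog[row.item_id].name}")
```

Keeping the descriptive fields out of the inventory row means the hot, frequently updated table stays small.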

10:24

We need to keep it simple for the interview. We really only need two main endpoints to satisfy our scope. First we need GET availability. And what does that take as input? Just the item ID? No, and this is the trap. If I just ask the server, do you have chips? The server has to ask, where are you? Right. It doesn't help me if there are chips in Chicago if I am in Miami. So the input must include the user's location, latitude and longitude. OK, input: lat, long. Output?

10:51

Yeah, a list of items and their specific quantities available at that specific location. Correct. And the second endpoint? POST order. This is where the magic happens. The input is the user ID, the location again, and the list of items they want. The output is essentially success or fail. OK, we have the box defined. We have the data model. Now let's start building the flow. I open the app. I want to see if I can get my chips. What happens? First, the first step is the

11:18

read path. The user opens the app and the request hits our API gateway. Now, before we can check inventory, we have to figure out where to look. We need to solve the nearby problem. Right, which warehouses are close enough to deliver to me in under an hour? Exactly, if we have 10,000 warehouses, we can't check all of them. That would be insanely slow. Imagine querying 10,000 database tables just to load the homepage. The spinner would spin forever, so we need a dedicated nearby

11:43

service. This service does one thing: it takes a lat/long and returns a list of distribution center IDs that are relevant. How does that actually work? Do we just draw a circle around the user? Essentially, yes. In a simple interview answer you would use geospatial math. You store the warehouses in a database that supports spatial queries, like Postgres with the PostGIS extension. I've heard of PostGIS. That lets you treat location like a data type, right?

12:10

Exactly. Instead of just storing numbers you store a point, and then you can run a query like: SELECT ... FROM warehouses WHERE ST_DWithin(location, user_location, 5 miles). Wait, ST_DWithin? That sounds fancy. What is that doing? It stands for spatial type, distance within. Under the hood, the database isn't calculating the distance to every single warehouse. That would be way too slow. It uses a spatial index. Break that down for me. What is a spatial index?
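The query described above could look roughly like this when issued from application code. This is a sketch under assumptions: the table name warehouses, its location column (assumed to be of PostGIS geography type), and the psycopg-style named placeholders are all hypothetical. ST_DWithin on geography values takes its distance in meters, so the 5 miles is converted first.

```python
METERS_PER_MILE = 1609.344

# Assumed schema: warehouses(id, location geography(Point, 4326)).
# ST_MakePoint takes longitude first, then latitude.
NEARBY_SQL = """
SELECT id
FROM warehouses
WHERE ST_DWithin(
    location,
    ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
    %(radius_m)s
)
"""


def nearby_params(lat: float, lon: float, radius_miles: float = 5.0) -> dict:
    """Build the parameter dict for NEARBY_SQL, converting miles to meters."""
    return {"lat": lat, "lon": lon, "radius_m": radius_miles * METERS_PER_MILE}


# With a real connection this would be run as cursor.execute(NEARBY_SQL, params).
params = nearby_params(40.7128, -74.0060)
print(params)
```

Passing the location as parameters rather than formatting it into the string keeps the query safe from injection and lets the database reuse the plan.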

12:37

OK, think of a regular index like a phone book. It's sorted alphabetically so you can find a name fast. A spatial index is kind of like a grid. Imagine laying a big grid over the map of the world. OK, each square of the grid has a list of all the warehouses inside it. When your request comes in, we figure out which square you are in. Then we only have to look at the warehouses in that square and maybe the 8 squares touching it. So we ignore the warehouses in

12:59

the other 99% of the world. Exactly, it makes the lookup incredibly fast. Common implementations are things like quadtrees or geohashes. OK, so we use a grid system to find the candidates. We get a list of maybe what, three or four warehouses that are within 5 miles. Right. Now, in the interview you might mention a constraint check here. The prompt usually says one-hour delivery. Does 5 miles equal 1 hour? Because where I live, definitely not.
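The grid idea described above can be sketched as a dictionary from grid cell to warehouse IDs. This is a toy illustration, not how PostGIS, quadtrees, or geohashes are actually implemented: the fixed 0.1 degree cell size is an assumption, and the sketch ignores longitude wrap-around and the way cells shrink toward the poles.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # assumed cell size in degrees (roughly 11 km north to south)


def cell_of(lat: float, lon: float) -> tuple:
    """Map a coordinate to the (row, col) of the grid square containing it."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)   # (row, col) -> list of warehouse IDs

    def add(self, dc_id: int, lat: float, lon: float) -> None:
        self.cells[cell_of(lat, lon)].append(dc_id)

    def candidates(self, lat: float, lon: float) -> list:
        """Return warehouses in the user's square and the 8 squares touching it."""
        row, col = cell_of(lat, lon)
        found = []
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                found.extend(self.cells.get((row + d_row, col + d_col), []))
        return found


index = GridIndex()
index.add(1, 40.68, -73.94)    # Brooklyn
index.add(2, 51.50, -0.12)     # London
print(index.candidates(40.75, -73.99))   # a user in Manhattan
```

Only nine dictionary lookups are needed per request, no matter how many warehouses exist worldwide.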

13:25

Not in Manhattan at 5:00 PM. 5 miles in Manhattan is a day trip. 5 miles in rural Kansas is 5 minutes. So simple distance isn't enough. And this is where you get bonus points. You mention the Haversine formula. The Haversine formula. I feel like I learned that in trigonometry and immediately deleted it from my brain. It is the formula for calculating the distance between two points on a sphere. Because the Earth is curved, a straight line on a map isn't

13:50

accurate over long distances. Haversine gives you the as-the-crow-flies distance. But crows don't have to wait at traffic lights. Exactly. So a senior or staff engineer would say this: for the initial filter I will use Haversine or a PostGIS query because it is fast. But to be truly accurate, we would take those top three candidates and call an external distance matrix API like Google Maps, or a proprietary routing engine, to get the real-time driving ETA. I see. So it's a funnel.
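For reference, the Haversine formula mentioned above fits in a few lines. This sketch assumes a spherical Earth with a mean radius of 6,371 km, which is the usual approximation; the function name is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle ('as the crow flies') distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(f"{haversine_km(0.0, 0.0, 0.0, 1.0):.1f} km")
```

This gives straight-line distance only, so it is suited to the cheap filtering stage of the funnel and not to the final driving-time estimate.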

14:19

Start with thousands of warehouses. Use the grid to get it down to 50. Use Haversine to get it down to five. Use the traffic API to get the final list. Perfect. That shows you understand trade-offs between speed and accuracy. OK, so the nearby service has done its job. It returns Warehouse A, Warehouse B, and Warehouse C. What happens next? Now we have the union problem. Warehouse A might have the chips. Warehouse B has the soda. Warehouse C has the ice cream. But the user needs to see a

14:45

unified menu. They don't care which warehouse it comes from, they just want to know: can I get this stuff? Right, so the availability service has to query all three of them. It queries the inventory tables for those specific DC IDs and aggregates the results. We call this effective availability. It is the sum of what is reachable. But here is where it gets tricky. We talked about scale earlier. 10 million orders a day means maybe 100 million or 500 million page views.
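The effective availability aggregation described above amounts to summing quantities per item across the nearby distribution centers. A minimal sketch, assuming inventory is available as a nested dictionary (the data shape and the function name are illustrative):

```python
from collections import Counter


def effective_availability(inventory_by_dc: dict, dc_ids: list) -> dict:
    """Sum in-stock quantities per item across the reachable distribution centers."""
    totals = Counter()
    for dc_id in dc_ids:
        for item_id, quantity in inventory_by_dc.get(dc_id, {}).items():
            if quantity > 0:              # hide items that are out of stock
                totals[item_id] += quantity
    return dict(totals)


inventory_by_dc = {
    "A": {"chips": 3, "soda": 0},
    "B": {"soda": 12},
    "C": {"ice_cream": 4, "chips": 2},
    "D": {"chips": 100},                  # too far away, not in the nearby list
}

print(effective_availability(inventory_by_dc, ["A", "B", "C"]))
```

Note that a summed quantity says the items are reachable, not that a single warehouse can fill the whole order; that split-shipment question comes up later in the discussion.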

15:11

Or a billion. Querying the main SQL database for every single page load sounds like a recipe for disaster. It is. It is a massive bottleneck. Relational databases like Postgres are amazing at consistency, but they are expensive to scale for massive read throughput. If every person browsing the app hits the hard drive of the database, the server will melt. So we need to offload that traffic. We need caching.

15:35

Caching is our best friend here. We need a layer between the service and the database that stores frequently accessed data in memory. Redis or Memcached are the industry standards for this. Walk us through the strategy here. What exactly are we caching? We represent the inventory as a key-value pair. The key would be a combination of the distribution center ID and the item ID. So, DC 123, item Cheetos, and the value, the quantity: 50. OK, so when I open the app, what is the flow?

16:01

We use a pattern called cache-aside. Step one: the app asks the Redis cache, do we have chips at DC 123? Step two: if the cache says yes, 50, we return that immediately. This takes microseconds. We don't touch the database. It's a cache hit, right? Step three: if the cache says I don't know, which we call a cache miss, then and only then do we ask the database. And once we get the answer from the database, we write it into the cache so the next person doesn't have to ask the database, right?

16:31

We populate the cache for future requests. That saves the database from millions of hits. But I see a risk here. What if the cache says there are 10 bags of chips, but someone just bought five of them a split second ago? The database knows the truth, but the cache is old. It's stale. That is the stale data problem. There is a famous quote in computer science: there are only two hard things in computer science, naming things and cache invalidation.

16:57

So how do we handle it? Do we show the user the wrong number? For browsing, for the read path, we accept a little bit of staleness. It is a trade-off. It is better to show a slightly outdated number instantly than to make the user wait 3 seconds for the perfect number. So we just set a time limit on the data. Yes, a TTL, or time to live. We might set the cache to expire every minute, so at worst the data is 60 seconds old. But wait, if I try to buy it and

17:22

it isn't there, that is bad. That is a bad user experience. It is, but it is better than the system being down. However, we can be smarter. When a purchase happens on the write path, we can actively fix the cache. We blast the cache. We invalidate it. As soon as the database is updated with an order, we delete that specific key in Redis. The next time someone asks, the system is forced to go to the database and get the new, true number. OK, so we have handled the browsing.
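The cache-aside flow, the TTL, and the invalidate-on-write step described above can be combined in one small sketch. Plain dictionaries stand in for both Redis and the database here, so this shows the logic only; the class and method names are hypothetical.

```python
import time


class InventoryCache:
    """Cache-aside reads with a TTL, plus invalidation after a purchase."""

    def __init__(self, db: dict, ttl_seconds: float = 60.0, clock=time.monotonic):
        self.db = db              # (dc_id, item_id) -> quantity; stands in for the database
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so expiry can be tested without sleeping
        self.entries = {}         # key -> (quantity, expires_at); stands in for Redis
        self.db_reads = 0

    def get_quantity(self, dc_id, item_id) -> int:
        key = (dc_id, item_id)
        hit = self.entries.get(key)
        if hit is not None and hit[1] > self.clock():
            return hit[0]                         # cache hit: database untouched
        self.db_reads += 1                        # cache miss or expired entry
        quantity = self.db.get(key, 0)
        self.entries[key] = (quantity, self.clock() + self.ttl)
        return quantity

    def record_purchase(self, dc_id, item_id, count: int) -> None:
        key = (dc_id, item_id)
        self.db[key] -= count                     # update the source of truth first
        self.entries.pop(key, None)               # then delete the cached key


db = {(123, "cheetos"): 50}
cache = InventoryCache(db)
print(cache.get_quantity(123, "cheetos"))   # miss, reads the database
print(cache.get_quantity(123, "cheetos"))   # hit, served from memory
cache.record_purchase(123, "cheetos", 5)
print(cache.get_quantity(123, "cheetos"))   # miss again after invalidation
```

Deleting the key, instead of writing the new value into the cache, keeps the write path simple: the next reader repopulates it from the database.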

17:51

Millions of people can look at pictures of chips without crashing the system because we're serving 99% of requests from Redis, right? But now comes the moment of truth, the Buy button. The write path. This is where the interview is won or lost. I like to call this the Taylor Swift ticket effect, or maybe the PS5 launch. You have one item left. You have two users, Alice and Bob. They are neighbors. They're both hungry. It is the classic race condition.

18:14

Let's play this out. Alice and Bob both have the app open. They both refresh their screens at 11:00 PM. The system tells both of them: one left. So far so good. Alice has quick fingers. She taps Buy at 11:00:00.01. Bob taps Buy at 11:00:00.01 and one millisecond. Yeah, and in a naive system, a system designed by a junior engineer who hasn't dealt with concurrency, here's what happens. Tell me. Alice's request hits the server. The server reads the database. It sees quantity:

18:45

One. It thinks, great, one is greater than zero. Proceed. Alice's process enters the checkout phase. Maybe it takes 10 milliseconds to format the order object. And inside those 10 milliseconds, Bob's request arrives. Exactly. Alice hasn't finished yet. She hasn't decremented the number, so the database still says quantity 1. Bob's server thread looks at it and says, great, one is greater than zero. Proceed. Alice's thread finishes. It subtracts 1. The quantity is now 0.

19:10

She gets a confirmation screen: chips are on the way. 2 milliseconds later, Bob's thread finishes. It subtracts 1 from 0. The quantity is now negative 1. And Bob also gets a confirmation screen. He does, and now the warehouse manager is staring at an empty shelf with two orders to fill. And you have a customer service nightmare. This is the double booking problem. And to solve it, we need locking. Yeah, we need to stop Bob from even looking at the shelf until Alice is done.
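The interleaving just described can be replayed step by step. To keep the result deterministic, this sketch does not use real threads; it simply executes Alice's and Bob's check and decrement steps in the unlucky order from the story. The function names are illustrative.

```python
inventory = {"chips": 1}
confirmations = []


def check_stock(item: str) -> bool:
    """Step 1 of the naive flow: read the quantity and compare it to zero."""
    return inventory[item] > 0


def finish_order(user: str, item: str) -> None:
    """Step 2 of the naive flow: decrement and confirm, with no re-check."""
    inventory[item] -= 1
    confirmations.append(user)


# Both requests pass the check before either one has decremented.
alice_may_buy = check_stock("chips")   # sees quantity 1
bob_may_buy = check_stock("chips")     # also sees quantity 1

if alice_may_buy:
    finish_order("Alice", "chips")     # quantity becomes 0
if bob_may_buy:
    finish_order("Bob", "chips")       # quantity becomes -1

print(inventory, confirmations)
```

The bug is the gap between the read and the write: nothing ties the decision to the state it was based on.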

19:34

So we need a bouncer at the door of the database row. Effectively, yes. In an interview you need to discuss the two ways to implement this bouncer: optimistic locking and pessimistic locking. Let's start with optimistic. That sounds hopeful. It is hopeful. Optimistic locking assumes that conflict is rare. It assumes Alice and Bob aren't usually fighting over the same bag. So it doesn't actually lock anything? It doesn't actually lock the database row physically. Instead

19:59

it uses a version number. Like a timestamp? Similar. Yeah, imagine every row has a column called version. Right now it is at version 1. Alice reads the row. She notes, OK, I see one bag and it is version 1. OK, she does her processing. Then she sends a command to the database: update quantity to 0, but only do this if the version is still 1. Ah, I see. It is a conditional update, compare-and-swap. Exactly. If she succeeds, the database changes the version to 2.

20:30

Now poor Bob comes in. He also saw version 1 originally. He tries to save: update quantity to 0 if version is 1. The database says no. The database looks at the row and says, sorry buddy, the current version is 2. Your condition failed. And Bob gets an error message: sorry, this item is no longer available. Correct. The database remains accurate, no negative inventory. That sounds perfect. Why wouldn't we just use that? It sounds really fast. Because of the Taylor Swift

20:57

problem. Imagine it is not just Alice and Bob. Imagine it is 1,000 people trying to buy that last ticket. OK, with optimistic locking, 1,000 people try, 1 succeeds, 999 fail. But the database still had to process 1,000 requests, check the versions, and send back errors. It creates a huge amount of churn and wasted CPU cycles. It's noisy. So for high-contention items like the PS5 launch or the lunch rush, optimistic locking might actually overload the database with failures. Right.
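The version-number scheme described above comes down to a conditional UPDATE. This sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres; the table layout and function names are assumptions, but the same UPDATE ... WHERE version = ? pattern applies to any SQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    " dc_id INTEGER, item_id INTEGER, quantity INTEGER, version INTEGER,"
    " PRIMARY KEY (dc_id, item_id))"
)
conn.execute("INSERT INTO inventory VALUES (402, 999, 1, 1)")
conn.commit()


def read_row(dc_id: int, item_id: int) -> tuple:
    """Return (quantity, version) as seen by the caller at read time."""
    return conn.execute(
        "SELECT quantity, version FROM inventory WHERE dc_id = ? AND item_id = ?",
        (dc_id, item_id),
    ).fetchone()


def try_buy(dc_id: int, item_id: int, seen_quantity: int, seen_version: int) -> bool:
    """Decrement only if nobody has changed the row since it was read."""
    if seen_quantity < 1:
        return False
    cursor = conn.execute(
        "UPDATE inventory SET quantity = ?, version = version + 1 "
        "WHERE dc_id = ? AND item_id = ? AND version = ?",
        (seen_quantity - 1, dc_id, item_id, seen_version),
    )
    conn.commit()
    return cursor.rowcount == 1      # 0 rows updated means the version moved on


alice_view = read_row(402, 999)      # both read quantity 1, version 1
bob_view = read_row(402, 999)
alice_ok = try_buy(402, 999, *alice_view)
bob_ok = try_buy(402, 999, *bob_view)
print(alice_ok, bob_ok, read_row(402, 999))
```

Bob's failed update is cheap for one user, but under heavy contention every loser still costs the database a round trip, which is the churn discussed above.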

21:26

So for high-contention systems we prefer pessimistic locking. This is the safe approach. This is the SELECT ... FOR UPDATE command in SQL. This locks the door as soon as you walk in. Yes, when Alice's transaction starts, she tells the database: I am reading this row and I intend to change it. Do not let anyone else touch this row until I am done. So Bob is just stuck waiting outside. His request just hangs. Bob's database transaction will literally pause.

21:53

It blocks. It waits until Alice commits her transaction. Once she finishes, Bob is allowed in. He reads the row, he sees quantity 0 and his logic immediately tells him out of stock. That sounds safer, but doesn't it slow everything down if Bob is waiting and Charlie is waiting behind Bob? It does.
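The blocking behavior described above can be imitated without a database. In this sketch a per-row threading.Lock plays the role of the row lock that SELECT ... FOR UPDATE would take inside a Postgres transaction; it is an in-process simulation of the idea, not database code, and the class name is hypothetical.

```python
import threading
import time


class LockedInventory:
    def __init__(self, stock: dict):
        self.stock = dict(stock)
        # One lock per row, standing in for the database's row-level lock.
        self.row_locks = {key: threading.Lock() for key in self.stock}

    def buy(self, key) -> bool:
        # Like SELECT ... FOR UPDATE: wait here until the row is free.
        with self.row_locks[key]:
            if self.stock[key] < 1:
                return False                 # out of stock, seen after waiting
            time.sleep(0.001)                # simulated checkout work while holding the lock
            self.stock[key] -= 1
            return True                      # leaving the block releases the lock, like COMMIT


inv = LockedInventory({(402, 999): 1})
results = []


def shopper():
    results.append(inv.buy((402, 999)))


threads = [threading.Thread(target=shopper) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True), "succeeded;", inv.stock[(402, 999)], "left")
```

Because the check and the decrement happen while the lock is held, the twenty shoppers are processed one at a time and the stock cannot go negative.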

22:11

It reduces concurrency. It turns parallel traffic into serial traffic, one by one. But for an inventory system where accuracy is paramount and the business cost of overselling is high, pessimistic locking is usually the preferred answer in these interviews. So the guarantee is worth the performance hit. For this specific problem, yes. It guarantees that operations happen one at a time, in order. Now I have to ask about something else. A lot of people online suggest using Redis for locking.

22:36

Since Redis is so fast, why not put the lock there? (Groans) The Redis lock. You don't like it, I can tell. I hate it for this specific use case, and interviewers often hate it too. Here is why. A lot of developers love Redis because it is fast. So they say, hey, let's put a lock in the cache. I will set a key in Redis that says item locked, do my work, and then delete the key. That sounds faster though. Redis is in memory. It is faster, but it is architecturally dangerous.

23:08

You're creating a distributed lock. Now you have two sources of truth. You have the lock in Redis and the data in Postgres. What happens if they get out of sync? Disaster. What happens if your server crashes after writing to the database but before releasing the Redis lock? The lock stays there forever. Now nobody can buy the item. Or what if Redis crashes? Or what if there is network lag? The lock might expire before the database write is finished, letting someone else in.

23:33

You lose the ACID properties that the SQL database gives you for free. So the lesson is keep the source of truth in one place. Exactly. Don't reinvent the wheel. Use the database's built-in locking mechanisms. It is what they are there for. It's battle-tested. OK, so we have chosen pessimistic locking. We are safe. No double bookings. But we mentioned earlier that we have 10 million orders a day. Can one single Postgres database handle that? Absolutely not.

24:02

A single monolithic database will choke. We need to scale the write path. This is where we get into partitioning and sharding. Yes, we need to split the data up. And for a delivery app, the strategy is remarkably obvious. Is it? Think about it. Does a user in New York ever order from a warehouse in London? No, that would be a very stale bag of chips. Does a user in Texas ever order from a warehouse in Tokyo? No, the data is naturally isolated by geography, so we use geo-sharding.

24:30

We split the database into smaller pieces based on the region ID or distribution center. So shard A handles all transactions for New York, shard B handles California. Exactly. This allows us to scale linearly. If we get more users in Texas, we add a Texas shard. The logic remains the same, but the load is distributed across different physical servers. New York traffic never touches the California database. That is elegant, but what about the read path? Do we shard that too?
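Geo-sharding as described above needs only a lookup from distribution center to region and from region to database. The sketch below assumes a static mapping; the region names follow the examples in the conversation, while the DC IDs and connection strings are made up for illustration.

```python
# Hypothetical connection strings, one master database per regional shard.
SHARD_DSN = {
    "new_york": "postgresql://orders-ny.internal/orders",
    "california": "postgresql://orders-ca.internal/orders",
    "texas": "postgresql://orders-tx.internal/orders",
}

# Hypothetical mapping from distribution center ID to its region.
DC_REGION = {
    101: "new_york",
    102: "new_york",
    201: "california",
    301: "texas",
}


def shard_for_dc(dc_id: int) -> str:
    """Pick the database that owns all inventory and orders for this DC."""
    return SHARD_DSN[DC_REGION[dc_id]]


print(shard_for_dc(101))
print(shard_for_dc(201))
```

Because an order only ever touches warehouses near the customer, a transaction stays inside one shard and never needs a cross-shard commit. Adding capacity for a new region means adding entries to both mappings.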

24:54

We can, but typically for reads we use read replicas. OK, what are those? We have one master database for each shard that handles the writes, the locking, the ordering. Then we have multiple slave copies that just mirror the data. The availability service, the one checking if chips are in stock, reads from the copies. But there is a delay, right? The copy isn't updated instantly.

25:16

That is called replica lag. It might take a few milliseconds, maybe even a full second in some cases, for the master to update the slave. So we are back to the stale data problem. I buy the chips on the master but the slave still says in stock for a split second. Yes. And again, for availability checks, we accept that eventual consistency. But here is a pro tip for the interview. You can use something called sticky sessions. Sticky sessions. Imagine you just bought an item.

25:44

For the next minute, the system pins your user ID to the master database. So for that specific user, we skip the replica and go straight to the source of truth. Exactly. Yeah. That way if you refresh the page immediately after buying, you see the accurate data because you are looking at the master. Everyone else browsing can still use the replicas. That is a clever optimization. It keeps the user experience smooth without overloading the master for everyone.
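The pinning trick described above can be sketched as a small read router. This is an illustration under assumptions: the one-minute window comes from the conversation, while the class name, method names, and in-memory bookkeeping are hypothetical (a real deployment would need shared state across servers).

```python
import time


class ReadRouter:
    """Send a user's reads to the master for a short window after they write."""

    def __init__(self, pin_seconds: float = 60.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock                 # injectable for testing
        self.last_write_at = {}            # user_id -> time of most recent purchase

    def note_write(self, user_id: str) -> None:
        self.last_write_at[user_id] = self.clock()

    def target_for_read(self, user_id: str) -> str:
        wrote_at = self.last_write_at.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return "master"                # recent buyer: skip the lagging replica
        return "replica"                   # everyone else tolerates replica lag


router = ReadRouter()
router.note_write("alice")
print(router.target_for_read("alice"))    # pinned to the master
print(router.target_for_read("bob"))      # served by a replica
```

Only the small set of users who just purchased add read load to the master, so the replicas still absorb nearly all browsing traffic.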

26:08

It balances the load. It shows you're thinking about the user's perception. So let's zoom out and look at the whole architecture we have built. It is quite a beast. It is. Let's trace a user journey from start to finish. Do it. Step 1: user login. They open the app. The nearby service uses a geospatial index to find the three closest warehouses. Step 2: browsing. The availability service takes those three warehouse IDs.

26:33

It checks the Redis cache first. If data is missing, it hits the read replicas of the database shards. It sums up the inventory and shows chips available. Step 3: the moment of truth, buying. The user taps Buy. The order service takes over. It identifies the correct database shard based on location. It opens a transaction. It acquires a pessimistic row lock on the inventory items. It checks stock. If good, it decrements stock, creates an order record, and commits.

27:03

And finally, cleanup. The system fires an event to invalidate the Redis cache for those items, so the next browsing user sees the new numbers. That is a robust system. It handles the scale, it handles the concurrency, and it keeps the data safe. It is a passing answer. Just passing? That felt pretty comprehensive. Well, in an interview they grade you on levels. What we just described is a solid senior engineer answer. What does a junior answer look like? A junior or mid-level engineer

27:30

will usually get the API right. They will say we need a database table for items, but they might forget about the race conditions, the double booking. They missed the concurrency. They missed the concurrency, or they might suggest a naive distance calculation that kills the CPU. They focus on features, not failure modes. And what about the staff or principal engineer? How do you blow the interviewer away? The staff engineer looks at the edge cases and the business

27:54

logic. They talk about the union of inventory, how to handle partial orders, where the chips come from warehouse A and the soda comes from warehouse B. Oh, that's a whole new problem. Do we split the shipment? Do we delay it? That gets into the logistics complexity. They might also discuss advanced locking strategies based on item popularity.

28:15

What do you mean? They might say, maybe we use optimistic locking for toothpaste because no one fights over toothpaste, but pessimistic locking for the hot new video game. They nuance the architecture. They show they're not just applying a single rule to everything. Right. And they might also bring up the academic research. A staff engineer might mention the Calvin protocol, for instance. Calvin, I remember that from one of the papers we looked at. Can you summarize that for us?

28:41

Calvin is a deterministic ordering system for distributed transactions. It's a different way to think about concurrency. It basically eliminates the need for complex commit protocols like two-phase commit by agreeing on the order of transactions before they are even executed. So it's a scheduler. A very smart one. A staff engineer might say, if we really wanted to scale globally without some of the headaches of traditional sharding, we could look at a deterministic scheduler like Calvin.

29:10

It shows they know the cutting edge research, even if they stick to Postgres for the practical interview answer. That is the extra credit knowledge. It shows you aren't just a coder, you are a student of the craft. Exactly. Before we wrap up, I want to pivot to a thought that came up in our research. We have spent this whole time designing a system to be perfectly accurate. We don't want to oversell a single bag of chips. Right. Strong consistency.

29:33

We've been hammering that point. But does the business actually care? That is the final provocative question. In the real world, business logic often overrides system design. What do you mean by that? Think about it. What is the cost of a database lock? It slows down the system. It limits how many orders you can take per second. It adds friction. Now what is the cost of overselling one bag of chips? Well, you have to email the customer, say sorry, refund them their $3, and maybe give them a

30:01

$5 coupon for their next order. Exactly. For a bag of chips, it might actually be more profitable to run a loose optimistic system, process orders as fast as possible, and just apologize to the 0.1% of people you oversell to. The cost of the apology might be lower than the cost of the lock. So we built a Ferrari, but maybe the business just needed a delivery van. It depends on the item. For Taylor Swift tickets you need the Ferrari. You cannot oversell a seat.
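The apology cost mentioned above can be put into rough numbers using the figures from the conversation: a $3 refund, a $5 coupon, and 0.1% of orders oversold. The assumptions here are loud ones: that the 0.1% rate applies across all 10 million daily orders, that every coupon is redeemed at full value, and that the refund counts as a pure loss. The cost of locking is not quantified in the episode, so this covers only one side of the comparison.

```python
ORDERS_PER_DAY = 10_000_000       # scale assumed earlier in the episode
OVERSOLD_PER_THOUSAND = 1         # 0.1% of orders, as quoted
REFUND_USD = 3                    # price of the bag of chips
COUPON_USD = 5                    # goodwill coupon, assumed fully redeemed

# Integer arithmetic keeps the result exact.
oversold_orders_per_day = ORDERS_PER_DAY * OVERSOLD_PER_THOUSAND // 1000
apology_cost_per_day = oversold_orders_per_day * (REFUND_USD + COUPON_USD)

print(f"{oversold_orders_per_day:,} oversold orders per day")
print(f"${apology_cost_per_day:,} per day in refunds and coupons")
```

Whether that figure is cheap or expensive depends on what the extra throughput from looser locking is worth, which is the business judgment the speakers are pointing at.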

30:28

The stadium has a physical limit. For chips, the delivery van might be better. True seniority is knowing when to break the rules we just designed. That is a great perspective. It isn't just about code, it is about the product and the business context. Always. Well, we have built the back end of a logistics giant in under an hour. We have covered scope, data modeling, caching, locking, and sharding.

30:49

It is a lot to take in, but if you structure your answer like this (scope, model, read, write, scale), you will ace that interview. Good luck to everyone out there preparing. Go build it. Thanks for listening to the deep dive. See you next time.

Transcript source: Provided by creator in RSS feed: download file
