Fast Infrastructure

⁠¶ Intro / Opening

00:13

If you enjoy this content and want to support it, go to makeitwork.tv, join as a member, and watch the full conversation as a 4K movie. You can stream it straight from the CDN or from the Jellyfin media server.

⁠¶ Weekend projects

00:33

It was summer, 5pm on a Saturday, and I sent the following email to support at namespace.so. Hi, I would like to debug a GitHub Actions workflow locally. Is it possible to run the Namespace managed Ubuntu container in Docker? And 12 minutes later I received a reply. Hi Gerhard, unfortunately we don't have that possibility yet, but it is something that we are working

00:59

on. What we often suggest to folks who want to debug image-related issues is to rely on the breakpoint action, which allows you to stop the execution of a workflow for debugging purposes. So where does replying to customer support requests on a weekend fit in your CEO role? It's a great question and thanks for actually reaching out because we love working with developers and I think that just boils down to that.

01:29

We care so much about offering great support and we are engineers, developers ourselves and many of these starting points of projects also started at the weekend. In fact, that's how Namespace itself started. It was a weekend project.

01:46

So I have a lot of kind of a connection with that and balancing it out with regular life, but whenever we see a request coming in that could benefit from being unblocked, we try to do that very quickly because we care deeply about offering great support and kind of from an engineer to an engineer. And that goes to everyone in the company. It happened to be me replying that time, but it could have been

02:12

someone else in the team as well. That's something that we try to embed as much as possible in our company culture. As first experiences go, that was a great one. So thank you very much. And it set the tone, I have to say, and to this day, all my interactions with Namespace have been like this. Whenever there's a problem, I am confident there's someone on the other end and I will get the help that I need. Oftentimes something that I didn't know. So always there's

02:40

something to learn. I know about the breakpoint action, for example. This is a very useful one. And since then, obviously, I've learned about the NSC CLI and a couple of other things, but it's all there and we all meet as people and we're all passionate about this thing because who else, or how else could you get this sort of interaction 5pm on a Saturday in the middle of the summer when maybe you would be out and about and doing things.

03:06

You know, I don't remember anymore, but perhaps I was out and about. It's possible, yes, you're on your phone somewhere. And the request came in and you're the first one to pick it. Okay, so I know

⁠¶ Love for all things infrastructure

03:17

that you have a deep love for all things infrastructure. And this is something that I've learned over the months that we've been in contact and have two related questions. How did this love for infrastructure start and how does it translate to your day to day? I've always been fascinated by how things work and it's very hard to put my finger on when did that start. But I think it started early. I think the earliest memory that I have was in some very distant early

03:58

years. Christmas, I got a present, a remote-controlled car. And I'm old, so it's like very rickety, early stages, remote-controlled car. And one of the first things that I did was open it up and see how it was inside. So I think it's just something wired in my head that I have just some curiosity to how things work. And then over time, also how complex things work and how they are the sum of simple things composed together to work in concert.

04:38

And then over time, not just technology, but also people. They also have their own sets of complexities and they're also systems at work. So I think it came down from just a natural curiosity. I got involved with technology a little bit accidentally. Maybe another, Actually, another interesting story. I had my tonsils removed very

05:08

early on. I was four or five years old and I have this distinct memory of turning to the side and seeing a screen which was probably plotting either my heart rate or plotting something. And I started asking about that screen because it's just in the haze of going under for surgery. It probably was the thing that came to mind as a small kid. And the nurse said, "Don't worry, just calm down and we'll show you everything about this screen afterwards."

05:48

And they never did, but that was my first connection with computers and the idea of screens and things that show up on the screen. And a few years later, started at a time where you would still buy magazines that had printouts of code. And then it's how I got introduced into the idea that you can actually program these machines. And then later on, I was lucky enough when I was 12, I got my first

06:27

computer. But a couple years before that, I had access to a school where there was a computer I could use. So I started kind of playing around by myself. But then when I got my first computer, you start to explore, navigate, eventually the internet becomes a thing. My first connection to the internet was actually with a dial-up modem. But where I lived, you didn't

06:51

have an RJ11 plug. So we had like older plugs with three prongs. I actually don't even know what kind of plug it was. And I was 14, and I got the modem for Christmas. And it came with this RJ11 on one side. I really want to use this, but I don't have anywhere to plug it. And I thought, "Well, there must just be electricity." So I unplugged this old-school plug from the wall at my place. And I kind of tear apart RJ11.

07:27

And I start trying different combinations of cables, which probably I shouldn't have. But eventually, I got a dial tone. And this magical (mimics dial-up tone) of the modem starting to dial out, which I had heard before. And I was like, "Wow, this is the beginning of something." And the internet... Yeah, that was the thing. Did it ever happen for you to receive a phone call while you were messing with wires? Well, not messing with the wires, because...

07:59

It's that moment when you're plugging the wires in, because that was my moment when I realized I shouldn't be doing that. I did exactly the same thing. And there was a phone call coming in, so you get a little bit of a shock. Not too much. But I wasn't much older than you. And I tried the same thing. And I remember, "Okay, so that's why you don't mess with wires because they're live." And, yeah, I mean, it's not like the voltage is very low. I forget

08:22

exactly how much it is. It's enough to feel it, to feel the phone call. But that was my moment when I... Same approach. Let's figure this thing out. Let's wire them together. And at the same time as I was wiring them, there was a phone call coming in. So it got a bit of a shock. But nothing happened apart from that. You were shocked twice. I was shocked twice, yes. Once for real? Wow, okay, I shouldn't be doing this. So, yeah, it was no good deal.

08:49

I wasn't lucky enough to be shocked. But it was very common that either my mom would want to dial out or someone would be dialing in and it would interfere with a connection. And that was definitely a lot of drama around the fact that you cannot really utilize the line. The beginning was two phone lines. So, I had one friend that had two phone lines for this very purpose. I was like, "Oh, wow, he is living the dream." Two phone lines. One for internet and one

09:25

for like, you know, regular phone. Yeah. And I had a friend that had ISDN at home. Oh, that was just... He was rich. He was one of the rich kids. I can tell. That's like he lived in a part of the city where people couldn't afford ISDN. Now to this day, we were talking about this yesterday, your connection is unheard of. I think even for most people, like what they have at home, can you tell us a little bit about it, about the connection that you have currently?

09:57

So I live in Switzerland. And there's

⁠¶ Hugo's 25 gigabit home internet connection

09:58

this fantastic ISP here called Init7. And they don't pay me to say this. They're really fantastic. So, they actually started many years back. When I moved here 12 years ago, I already had one gigabit symmetric. So, up and down. And fiber to the home. But nowadays, they have 25 gigabit symmetric to the home. They're nerds as well. And it's a great

10:28

company for other nerds. Obviously, I don't utilize the full 25 gigabit per second because it's kind of unearthing more so than anything else. In the office, we also have Init7. And we do have 25. And there, sometimes we do exercise the full 25. But it's great. It's great. Cannot complain. That's amazing. I can show you a quick demo if you want. Yes, please. Let's see it. We have a Chrome window here. Let's go to fast.com. Oh, wow. That's not real. It's a bit slow. Yeah, it is a bit

11:04

slow. No, that can't be right. Can you try speedtest.net? I can't believe fast.com... Oh, wow. 3.5 gigabits per second. Yeah. Oh, wow. So, a couple of things are happening here. So, I'm on my Mac Studio. It has a 10 gig ethernet connection. Then it goes over to an ethernet 10 gig switch as well. And then I go to our router that has a 25 gig port. But I actually, because I've done a few changes, I used to have, so a part of my infrastructure at home is

11:50

fiber. I set up by me and I used to have it like I had a couple of racks downstairs and I had it kind of connected down with fiber. And I think I damaged one of the fibers. So, I think there is some loss. I haven't measured it because this used to be I used to be able to go to eight on my Mac. But so, I think there's actually a constraint now that the signal is not as good. And I haven't checked this. Yeah, it's too slow. Right. 3.5 gigs

12:20

is too slow. I love that. Like only a nerd would say that like, Hey, I'm like pushing almost four gigs per second, both up and down, but it's too slow. This could go faster. That's amazing. Sometimes you want to upload something and it's... Right. Well, I think the problem that you will see with this, and I'm sure you have hit it a couple of times, wherever you're uploading to, sometimes they can't accept more than one

12:47

gigabit per second. So, sometimes they're limited, you know, on their end, because they don't expect users to have this type of setup. But that's very nice. Very, very nice. Okay. Where I really have seen, so nowadays, for many years, I haven't really played any games. Many years ago, I used to play quite a bit of Blizzard games, so World of Warcraft, Starcraft, and they have an installer that internally uses, it might even be BitTorrent, but something like BitTorrent, at least.

13:18

So you can really get like multi stream. Yeah. And that's just incredible. Like you can easily use your whole link because you just be able to pull from multiple sources. So for things like that, it's, it's really you can really tell a difference.
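As a rough illustration of the arithmetic behind this part of the conversation: a single transfer is capped by the slower end, while pulling from several sources at once can fill the local link. The sketch below assumes ideal conditions (no protocol overhead, decimal gigabytes), and the function names and example figures are made up for illustration.

```python
def transfer_seconds(size_gigabytes: float, rate_gbps: float) -> float:
    """Ideal transfer time in seconds; 1 gigabyte is 8 gigabits, overhead ignored."""
    return size_gigabytes * 8 / rate_gbps


def single_stream_gbps(client_gbps: float, server_gbps: float) -> float:
    """A single stream can go no faster than the slower of the two ends."""
    return min(client_gbps, server_gbps)


def multi_source_gbps(client_gbps: float, source_gbps: list[float]) -> float:
    """Several sources add up until the client's own link becomes the limit."""
    return min(client_gbps, sum(source_gbps))


if __name__ == "__main__":
    # Hypothetical 10 GB upload from a 25 Gbps line to a server capped at 1 Gbps.
    rate = single_stream_gbps(25, 1)
    print(f"single stream: {rate} Gbps, {transfer_seconds(10, rate):.0f} s")

    # Hypothetical download from 30 peers offering 1 Gbps each.
    rate = multi_source_gbps(25, [1.0] * 30)
    print(f"multi source: {rate} Gbps, {transfer_seconds(10, rate):.1f} s")
```

Under these assumptions the capped upload takes 80 seconds, while the multi-source download is limited only by the 25 Gbps line.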

⁠¶ How does this love for infrastructure translate to Namespace.so?

13:33

So how does like all this love that you have for infrastructure for networks for, you know, fast things translate to namespace? We first and foremost, build something for ourselves. Well, the origin of Namespace Labs, and the name is that we were going to build an infrastructure company that focuses on software defined storage, because it was kind of a big thing that both me and another person that is not here, HDR,

14:04

have a passion for. But as we were building it out, we kind of found a few challenges along the way. And then we moved over to build an application platform. And as we were doing that, we wanted to run a lot of tests in parallel very quickly, because we didn't want to wait minutes for an EKS cluster to be created, or even a GKE cluster, a Kubernetes cluster, to be created.

14:31

I put together something that kind of cut through all the layers and just focused on the essentials to start a Kubernetes cluster really, really quickly, because we wanted to run many of them in full isolation to test this application platform. So

14:48

that was the genesis. And it was really for us, because we are developers ourselves, actually, majority of the company is engineers, and we have an appreciation for infrastructure that works well, that is understandable, and that is fast. So that's something that we try to project into the products that we build. And many of the things that we do at Namespace, one of our product principles

15:17

is fast is a feature. So we try to spend quite a bit of energy on making things as fast as possible. Yeah. When you say Kubernetes clusters

⁠¶ What does it mean for a Kubernetes cluster to spin up fast?

15:30

that spin up fast, or very fast, what does that mean to you, very fast? It had to be seconds, like that's what made sense. But it wasn't just a bullish, a we need to, this should be seconds, but it came from the source of, can we know

15:47

how things work? So we know how long Linux takes to boot up, like the kernel to start, we know how long it takes to scan devices, we know how long it takes to mount a file system, we know how long it takes to start a process. We know, you know, if you kind of add all of those things up, you get to a point where you start questioning, why does it take minutes? And so there's kind of inefficiencies in the system.

16:14

And even today in like the Kubernetes API server is fantastic, but it has a few things built in that are not kind of level triggered. So there's kind of waiting periods, even we wanted to make it faster. But even to make it even further fast, we would have to go and change the implementation. So it really came down to how fast we think this should be. And I did some kind of back of the envelope kind of calculations.
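A back-of-the-envelope budget of the kind described here can be sketched as a simple sum over startup phases. The phase names follow the conversation; the durations below are illustrative placeholders, not measurements from Namespace, and the 10-second target is the figure mentioned in the discussion.

```python
# Hypothetical per-phase durations in seconds. These are placeholders chosen
# for illustration only, not measured values.
BOOT_PHASES = {
    "kernel_start": 1.0,
    "device_scan": 0.5,
    "mount_filesystem": 0.5,
    "start_processes": 2.0,
}

TARGET_SECONDS = 10.0  # target for a single-node cluster, as discussed


def boot_budget(phases: dict[str, float]) -> float:
    """Add up the known phases to get a rough lower bound on startup time."""
    return sum(phases.values())


def headroom(phases: dict[str, float], target: float) -> float:
    """Time left within the target once the known phases are accounted for."""
    return target - boot_budget(phases)


if __name__ == "__main__":
    total = boot_budget(BOOT_PHASES)
    left = headroom(BOOT_PHASES, TARGET_SECONDS)
    print(f"known phases: {total:.1f}s, headroom to target: {left:.1f}s")
```

The point of such a budget is the question it raises: if the known phases add up to a few seconds, anything measured in minutes is waiting somewhere else in the system.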

16:44

And I said it shouldn't take more than 10 seconds to start a single node Kubernetes cluster. So that was the starting point. Creating Kubernetes clusters from scratch, fully isolated, so not, you know, a pod running in another Kubernetes cluster, but rather like a virtual machine where you have access to the kernel, your own kernel, you have your own rootfs, so you can decide what gets

17:15

packaged into it. And then that allowed us to start kind of running more tests and both faster, but the main thing was the fan out, we wanted to run many in parallel. Yeah. So just to have a better understanding of the scale that we're talking about, and I'm just looking for a magnitude, are we talking thousands of Kubernetes clusters? Are we talking 10s of 1000s, hundreds of 1000s?

17:40

Like, how much are we talking about in like, what period of time as well, just for listeners to have an appreciation of the scale that this operates at? Thinking about Namespace, we do many millions of runs over a short period of time. So that's kind of the scale that we're operating. And every single instance is fully unique. So it's completely new virtual machine. Everything gets started from scratch, the network gets programmed dynamically for that instance, there's like

18:11

everything is from scratch. But when we started, like our target was to run like 100 Kubernetes clusters in parallel. So you can see the humble start, and now we have customers that start a high magnitude of concurrent jobs. And that's even one of our biggest challenges nowadays, supporting that type of performance. So very low latency creation at a very high

18:42

concurrency. We have tenants, so that's kind of the unit in our system, that create 1000s of jobs in an extremely small period of time. And those run over many, many, many machines. But even today, if you go to a Kubernetes cluster, and you start 1000 pods, which is kind of the, if you define some kind of equivalence with another system, you'll see how long it takes for those pods to be created.

19:12

Because first, you need to commit the state, the scheduler needs to decide on which machine they're going to run, then the machine needs, you need to have IPAM. So you need to have like an IP address assigned to the pod. So there's kind of many things that need to

19:28

happen. And as you scale out the concurrency, you hit serialization limits, because some of these they need to be, you need to have like a consistent view of the universe to be able to make a decision, like you cannot assign the same IP address to two pods. So you need to have some sort of serialization. Yeah. And so that's kind of the types of challenges that we're tackling today.
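The serialization point can be illustrated with a toy IP address allocator. This is a minimal sketch, assuming a single in-process pool guarded by a lock; it is not how Kubernetes or Namespace implement IPAM, and the class and function names are hypothetical. The lock is the serialization: every allocation has to pass through one consistent view of which addresses are free, so no two pods can receive the same address even when many requests arrive at once.

```python
import ipaddress
import threading
from concurrent.futures import ThreadPoolExecutor


class IpAllocator:
    """Toy allocator: one pool of addresses, one lock guarding it."""

    def __init__(self, cidr: str) -> None:
        self._free = list(ipaddress.ip_network(cidr).hosts())
        self._assigned = {}
        self._lock = threading.Lock()

    def allocate(self, pod_name: str):
        # Concurrent requests queue up here. This is the serialization limit
        # that shows up as concurrency grows.
        with self._lock:
            if pod_name in self._assigned:
                return self._assigned[pod_name]
            if not self._free:
                raise RuntimeError("address pool exhausted")
            address = self._free.pop()
            self._assigned[pod_name] = address
            return address


def allocate_many(allocator: IpAllocator, pod_names: list[str]) -> list:
    """Request addresses for many pods from concurrent worker threads."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(allocator.allocate, pod_names))


if __name__ == "__main__":
    allocator = IpAllocator("10.0.0.0/22")  # 1022 usable host addresses
    names = [f"pod-{i}" for i in range(1000)]
    addresses = allocate_many(allocator, names)
    print(len(set(addresses)), "unique addresses for", len(names), "pods")
```

Partitioning, for example one pool per customer, lets independent allocators run in parallel. Within a single pool the requests still have to be ordered, which is why scaling inside one tenant is the harder case.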

19:50

Because when you have a little bit of an aside, but when you have a natural partitioning scheme, like two customers, for example, scaling across customers is a little bit easier, because you can partition your infrastructure. But when you go inside one customer, then things start to become a little bit more challenging. And that's those types of kind of scaling challenges that we have today. Yeah, especially the big customers that you mentioned that start a lot of jobs at once. And a job,

⁠¶ What does a job mean in infrastructure terms?

20:25

what does a job mean? Like, what does it translate to in infrastructure terms? Are we talking containers, virtual machines, how many CPUs, how much memory, like, what does the job look like? The unit of compute in our world is an instance. But that instance is a combination of a virtual machine. So you get full access to the kernel and everything in that virtual machine. But it's an environment that is designed to run

20:54

containers. So you don't get an Ubuntu virtual machine, and then you go and, you know, deploy systemd units, that's not how we think about the problem. We approach it from, we use containers as a distribution mechanism. So you define your application, whatever you want to run, you encapsulate that in a container because it has all of the software that you need. It also tells us how to start it, has a few other properties. And we place that

21:24

container or multiple containers in a virtual machine that can use an arbitrary set of resources. So you can decide whether it uses two or 16 CPUs, or whether it uses, you know, two or 256 gigs of RAM. So you have like

21:48

full flexibility on that. And then also from a network perspective, like if you want to interact with whatever is running in that instance, you get a few management properties out of the box, like you can SSH in, you don't need to configure anything. But if you want to access the service that you have, then you also have primitives for that too, to kind of program our ingress. We say jobs, because we kind of approach the problem in a layered way.

22:17

We think of the compute platform, which is a little bit more generic, as one thing, and then applications built on top as something separate. And a lot of our customers, they use Namespace to run jobs. And so those jobs are usually something that starts, has a purpose, wants to go really fast, that's usually the case. And then it ends. And it could be a GitHub job, it could be a Buildkite job, it could be a GitLab job, it could be a CircleCI

22:50

job. But it can also be your custom job, like that you want to run a system test. So for example, we have customers that deploy system tests on instances. And they can rely on something that scales out without being constrained by whatever resources that they have available in the job where they started. I think

⁠¶ Let's talk about your last major outage

23:13

how people deal with adversity is very telling. When something fails, especially when it fails, how you handle that tells everything about you at many levels, as an individual, as a team, as a company. And the reason why I say that is because I know that you had a major outage this year. And it was one of the things that you don't expect will happen. You prepare for it. And when it happens, you're like, wow, I'm so glad we had some preparations. But it's very difficult to

23:48

simulate that. It's very difficult to fire drill that. It's really, really hard. So can you tell us more about what happened? And how did you respond? We've been running Namespace now for some time, so for close to two years. And we've had our challenges along the way. But nothing as big as this more recent outage. A couple interesting things there. Like, we had two issues that happened at the same time.

24:24

And I'm lucky that our team is experienced, and we've operated and kind of supported and built large scale systems over our years before Namespace. So that gives us a little bit of preparation, like how can things fail? That's very often when we approach building something. It's not just a functionality that it has, but something that is part of our conversation is what are the failure

24:49

modes? What if you have an application, it's stateless, but it pushes some state into some database? Well, what happens if you don't have access to that database? What happens if you have multiple requests going concurrently? And you compromise on your serializability of your transactions? Like, how does your application react to potential inconsistent states that you had to do for other reasons? So we try to incorporate as much as possible, like a failure mode into how we approach

25:21

features. This big outage that we had, it was kind of a combination of two things. Namespace, when we started, we used exclusively hardware provided by others. So actually, we started with bare metal in AWS, and then we switched over to Equinix Metal, or Packet. And then we kind of worked with other

25:45

providers over time. And fairly early on, it became obvious to us that in order to offer a great product that had an emphasis on performance, we had to have a lot more control over the hardware, not just individual servers, but also the layout of the rack. So how much network capacity there is? Do we know that one compute node is next to another compute node? Is it in the same switch or not? So all of those things started to play a role in how we approached

26:19

our development. And we couldn't find a good mix that would give us both the global reach that we needed, because we have some customers that want to run workloads in North America. We have customers that run workloads in Europe. And we realized, well, we have to do it ourselves. So, Namespace deploys its own hardware, and software stack on

26:42

top of that hardware. So that means we decide everything from CPU, RAM, how much storage, how much networking, what's the layout of the rack, how our racks kind of, the spine-leaf setup, how that is, so all of that is done internally. And we set ourselves on a journey to kind of move completely to our own hardware. And we've been on a catch up for some time.

27:09

We had in October, a major expansion of one of our sites coming in, where the distributor that we work with, they made a mistake in their order, and they ordered the wrong DIMMs for those servers. And it's a lot of DIMMs. It's not just, you know, 20 DIMMs or 30 DIMMs that you can go to a shop and get. It's actually so many that they had to go and order directly from the source. And that added three weeks more to that delivery. We were counting on that hardware, because we knew that

27:51

we were already running quite hot. So quite hot, as in like our utilization is high. So part of the reason why we were okay with that is because we have tools that allow us to manage utilization across sites. We can run continuous optimizations where we try to maintain each site kind of hot enough, but not more than that. So we can kind of move things around. But globally, because of that missed

28:22

delivery, we were running quite hot. At the same time, one of our existing deployments in a company that offers kind of bare metal that we used, they started having an issue in their network product, which we use to connect multiple servers together into a single layer two segment, where it led to sporadic packet loss. And at first, well, the internet is built on sporadic packet loss, so things just kind

28:56

of work. But as that became worse over time, it was so bad that it had a real impact on our customers. And we interacted with that vendor and for various reasons, they kind of acknowledged, but they didn't react quickly enough to the problem. So we decided that that wasn't acceptable, the level of service that we're offering to our customers, the fact that we were a source of flakes, because of that kind of random packet

29:30

loss, it was not acceptable. So we strategized and we made a decision on changing our network setup so that we wouldn't depend on that particular feature. That meant though, that we had a dip in our capacity, because we had to redeploy that part of our infrastructure. That has to do with the fact that we run an immutable, we try to be very immutable. So as machines move from one setup to another setup, they need to be

30:01

reset up to get new keys. So there's kind of something else that kind of plays a role there. And that took some time, we had practiced that, but it took longer

30:14

than we anticipated. One of the challenges was we rely a lot on state that lives on individual machines and not on the networks to enable fast performance, bootups, etc. And distributing that state, because we had a much bigger fleet versus what we had done before, for that particular region took longer than we expected. So it's kind of distributing all of the state across all machines, it highlighted a few bottlenecks that we had. And that build up took some

30:43

time. So we were running really, really hot for some time, we had part of the team just trying to support our customers, making decisions on, okay, we're going to move this customer to this part, because now it's actually their peak time. And we want to make sure that they get as good of

31:04

experience as possible. So there was kind of part of the team that was just trying to offer as good of support to our customers as possible, where the other part of the team was just kind of rebuilding the region. And we did it, but it was extremely taxing. It's primarily because we feel such a strong commitment to the services that we offer, because then we've experienced, there's something that we depend on as a developer, and then it's not working. It's just the

31:37

worst, right? I cannot do my job. So I think emotionally, it was extremely taxing. I look at it a lot from the human side, like you're trying to do something that is, you're trying to do a great job to your, you're trying to provide a great service to our customers, but then we let them down in that particular moment. And we tried to be transparent about it. We wrote a postmortem. The things that I mentioned and more are there. We learned a lot in

32:12

that experience. And to be honest with you, we were expecting that some customers would come to us and say that this is unacceptable and we're moving on. But not a single customer left due to that outage. And we actually got a lot of support, and I have a big appreciation for our customers. I met one of our customers in San Francisco a few weeks after the outage, and they said, "Yeah, it was 3

32:42

p.m." And we decided, "Okay, we're going to call it a day now, because it seems like our jobs are not running. But we're so happy with the service that you folks usually provide to us that it was one day, and that's okay." But yeah, it felt very bad. It wasn't a complete outage, right? It was a degradation, a significant degradation, but not every customer was impacted. So this was limited to one region. That was the blast radius, and you have multiple

33:12

regions. So that's one. The second one is that not all customers were impacted the same amount, right? Because as this was happening, you're also moving customers off, which I, you know, that's something which I missed. And I will go back to the post-mortem, by the way, I will add a link to the show notes. The way you handled something failing, and something failing in a very

33:35

significant way, right? The whole region going away, or being unusable. You were able to be hands-on, you understood how all the pieces fit together, which means that you were able to do something about it, rather than putting your hands up in the air and saying, "Hey, it's the provider, we can't do anything about that." Think about what happens when you're in AWS, or GCP, or Azure, or, you know, one of those big providers, what can you do? And you say, "Well, I'm going to move things off it."

34:03

You may have so much stuff to move off that you can't move it off, not to mention that if there's an outage, how are you going to move stuff off? Especially if the DNS is there, you can't get to the DNS, you can't update it. And this happens, you know, for many companies, and many, many businesses. And at the end of the day, we are humans. The internet does go down, or at least half of

34:24

it. I remember when Fastly went down, they had an outage, or Cloudflare went down, or Facebook starts, you know, the BGP routes get all messed up. Now that's bad. But in all of this, there's always something to learn. There's always something to improve. And the best approach is to get better on the other side. You mentioned something that I think is really important that there's a particular company that we work with, and they played, like one of their products wasn't

34:53

working to spec. But from my perspective, that's on us. We decide who are the companies that we work with. And I don't even want to throw them under the bus. Like they, I think, perhaps other customers didn't have the same choice. Perhaps it was the way that we were using their infrastructure that led to that. And it's really on us. And obviously, when we work with someone, and they can provide us great support that helps us get the resolution faster.

35:19

Great. But in that case, it was I felt, and the whole team felt like a commitment to our customers. And it doesn't matter if it's, if it was like a delayed delivery, or if it was a particular upstream, or if it's a particular provider, that's really on us. And that's we felt that, okay, we need to do something, we're not just going to say, you know, this is not usable. We need to do something to get back to service to our customers. I think a lot of props should go to the

35:51

team. Obviously, I kind of team. But I think some of these, the our ability to handle some of these situations is, if you find yourself in a situation and it's new, and you're not prepared, it's going to be much harder. So the more you prepare both, hey, this disaster scenario is

36:09

possible. And maybe you don't even run an exercise, maybe you just talk about, here's what we would do, so that you just have a shared understanding of what are the tools that would be available to us if something like that happens. I think that's already the first level of preparation. And then it goes to the architecture as well. We try to present a global service to you so that you don't have to think about regions and capacity and all

36:42

of that. But behind the scenes, there is partitioning, both for performance reasons, but also for reliability reasons. And that design principle also allowed us to continue to serve our customers, even at a very degraded state while recovery was going on. Because from their perspective, things got slower, because we just didn't have enough capacity for all of their jobs. So how does

⁠¶ What does Namespace.so look like in practice?

37:15

Namespace come together for a user like myself, or for regular users? I'm very keen to basically see what it means when we use Namespace on the command line. How does it compare with the stuff that you may have running locally? Because that's also like, sure, run things locally. But not everyone has 25 gigabits at home. But even then, when you do, you want the resilience. So I'm wondering if there's something that we can screen

37:43

share. There's something we can look at just to see how this comes together in practice. Yeah, I'm happy to show you a couple things. I would preface though with, we're very pragmatic. There are certain parts of your developer workflow or certain things that you want to do where running them locally will always be kind of the right thing.

38:09

We really like to think about what's the right tool for a particular job. Where we try to excel is scale-out. So you can get things running really well on your machine. But now you want to run 1000 of them. And we could try to find 1000 machines to run them at home. But you just probably

38:31

that's not kind of the best, right. So we try to apply things where we can provide like a non-trivial amount of value, where it kind of makes sense to move over from whatever else that you're doing, whether it's local development or something else. Then with Namespace, there's different ways to approach the product. So we're both an infrastructure provider. So that's where the nsc CLI comes in. But we're also a service

38:58

provider. And I would say actually, the majority of our customers, they use our prepackaged solutions. So they want to do Docker builds, they want their Docker builds to be as fast as possible. So we have kind of a prepackaged Docker build product. They want to run a Kubernetes cluster really fast, or 100 or 1000 or 10,000 of them, we have a prepackaged

39:21

product for that. They want their CI runs to go really fast, or in a very cost-effective way. Actually, we start hearing a lot more around kind of cost management. So we have products for that as well. We'll focus today a bit more on the infrastructure. So kind of under the covers. But if you want your CI to go fast, you don't actually have to run the nsc CLI, like there's products that make that super easy for you.

⁠¶ Namespace Foundation - Open-source Kubernetes app platform

39:51

I was thinking of starting at the origin. So, things started in kind of building this application framework. And this application framework, we leaned on something akin to what we had built at Google, which is a platform called Boq, where it tackles how you write services, how do services talk with each other? How do you build services? How you test services? How do you deploy services? How you observe services in production?

40:25

To see how Namespace tests Foundation, the open source application platform inspired by Google's Boq, find the YouTube video link in the show notes. After Hugo's demo, we look into how a remote Docker build can be faster than a local one. That is a separate YouTube video, link in the show notes. OK, let's start wrapping this episode up. We see more and more use cases around

⁠¶ Complex preview scenarios

40:57

kind of complex scenarios with previews. This is an area that has also been a pain point for us. And we want to do better. Another thing is instances right now, they're fully isolated. So two instances, they don't share any networking. But we have a POC internally that uses Tailscale where you can connect multiple instances. But we're also thinking of just adding a tagged mode where you tag an

41:28

instance with kind of a network. And then instances that are tagged the same in the same network, they can communicate with each other. So you could have like a front end that calls a back end. Our goal is not to kind of cover all of the possible, you know, compute use cases, but just things that are helpful and ideally kind of easy to use to achieve what you want to achieve.

41:55

Typically creating a preview, you can go all the way to a PaaS, like go to a solution that just packages everything and then you have very little flexibility, or you can go, "Okay, I need to do everything from scratch." And we try to be somewhere in the middle where you have kind of building blocks that are helpful but you still can make it your own. You can still decide what goes inside of my container? Is it multiple containers?

42:24

Whether I want authentication. So all of that, it's kind of more our mental model, kind of our design principle of being somewhere in the middle, with not fully packaged but also not completely done from scratch.

⁠¶ One last thought

42:37

As we prepare to wrap up, one last thought, one last takeaway from our conversation for people that stuck with us to the end. What would you like them to take away from our conversation? I was asked recently, how does one become good at something? And I've worked with so many engineers that are extremely good. And I've been looking for patterns, like what are the things that are common across these engineers?

43:10

And I find that it's usually some kind of unrelenting curiosity that really propels people beyond just being good to being excellent. And I think that kind of comes back to when we approach how we build our products is with that same level of unrelenting curiosity and willingness to break through and change things that may help us build a better product.

43:45

And I think having that courage has been helpful for us, but when we bring people in, we try to instill that same spirit of just go deep, read the code, try different things, see how it works, that just really helps propel us to just do better. Well, on that note, thank you very much for joining us today, Hugo. I look forward to all the improvements you'll be driving in Namespace. I think you're on to something here.

44:15

I really like the speed, I really like the simplicity in many ways, and I know that behind it, there's a lot of complexity that you need to handle to make things this simple and this fast. Thank you very much. Thank you. And I look forward to the next one. It was a pleasure to be here. Thank you.

Transcript source: Provided by creator in RSS feed
