Docker and Container Optimization: Strategies for Efficient DevOps - DevOps 205 | Adventures in DevOps podcast

00:05

Okay, what's going on, everybody? Welcome to another episode of Adventures in DevOps. I'm your host, Will Button, and back after attending the DevOps conference in Zurich last week is Warren, right, Warren? Yeah, for sure. And you know, I actually listened to a little bit of last episode, and I felt like you introduced me even though I wasn't there. So I feel like I have a free pass; I don't need to introduce myself this episode.

00:32

I will say that I owe everyone an interesting fact. This time, when we were at the conference, we specifically went out and hosted an open space where I talked with a group of experienced DevOps practitioners, and we were really trying to define what a platform even is. I think this word is thrown around everywhere, and after an hour we came to a great conclusion: it's something horizontal, and I think there's an aspect here where it doesn't

01:06

offer value to the end user directly. I think that's an important thing. So one definition we had was "everything other than the product," which is interesting: it means the platforms we build aren't value-add, but, you know, there's an interesting deep dive there. But yeah, it was great. It was a great conference. I'm back, and I'm happy to be back. Right on, cool. Happy to have

01:29

you back. And also joining us in the studio is the co-founder and CEO of Kurtosis, Galen Marchetti. Galen, welcome. Thank you, thank you for having me, and hello, everybody. Yeah, I'm excited to have you here. You know, we've talked a little bit in the past, and I've also been using Kurtosis a little bit, at a very basic level. But just give us the high-level overview of Kurtosis, and then we're going to dig into what led you to this point

02:02

in life. So the highest-level overview is that Kurtosis makes it easy to get development and testing environments set up for backend systems. Think of something like Docker Compose or Helm charts, but with more parameterizability, more flexibility, and stronger portability guarantees. Gotcha. So when we're talking about dev environments: as a developer working on a product, I need to test the code that I'm writing, but that means I need to run

02:37

the application. But then I also need all the supporting microservices, and those are completely worthless without the data that gives me some kind of real-world or pseudo-real-world scenario to test against. Is that right? That's about right. And I think a good way to think about the pain that motivated the whole thing is that part about running other people's code. Running your own code, people usually have a handle on that one.

03:04

It's when you need six or seven other microservices plus the data. Now you have to dig into how to set these things up, and a lot of the time people get stuck at that phase for an inordinate amount of time. That's kind of the heart of what we're doing. Yeah, for sure. I can think back to different scenarios where I've addressed this in the past, and some of the approaches I've used are the, you

03:30

know, the seven-thousand-line Bash script. There's the golden machine, or the golden VM that no one's allowed to touch because it has everything on it. And then there's trying to bribe your security team to let you connect to the staging environment across the firewall. So those are some of

03:53

the approaches I've used in the past. Yeah, that's exactly right. A lot of the time people get a good system set up for the golden path they do all the time, but then there's this long tail of things. You do a long-tail thing maybe once a week, but it's a different long-tail thing each time, and it's that set of issues: security, having to touch the golden VM for it and breaking it. It just adds up, for

04:24

sure. Warren, what's your approach to solving this problem? Oh, well, you know, I think we only have about an hour for this. There's something here where, if you have really high separation and low coupling between your individual microservices, you can just spin up the one that you've got on your machine. We use LocalStack for our infrastructure, but I think that's a very extreme perspective, and it takes a while to really get there,

04:57

and we focused on having that from the beginning. If you come from a monolith world, or one where the architecture wasn't really figured out, there's this sort of dead zone in the middle where your services are still unfortunately coupled in some way, and you really can't get value out of them without having some of the ancillary pieces also spun up and working at the same time. Right

05:19

on, for sure. So, Galen, obviously there was a set of circumstances in your life that led you to say, "This is a real problem, and I'm going to give it my best shot," and it led to a company. What kind of scenarios were you working in that made you realize this is what you wanted to do? Yeah, I mean, there are a couple. The earliest instance of this pain came in my senior year of college. I was doing a

05:56

research project on distributed databases. This was right before the launch of Ethereum, which is a cryptocurrency, and I was exploring different consensus mechanisms, so low-level protocol communication between nodes. I wanted to spin up three nodes of this network just to experiment with my own consensus mechanism. It was nearly impossible to do that at the time, and a lot of my project

06:29

was oriented around simply getting the things spun up in the first place. Of course, the code worked on the developers' laptops and on the DevOps laptops; it didn't work on my laptop. And it wasn't about building a particular node. It was about getting the cluster set up correctly, and there's really no good abstraction that gets you that consistently. You can get it decently consistently in certain architectures, in certain ways. But that was

06:58

the first instance. It was very frustrating: why won't this thing just spin up? It's the simplest application; just run it as it is. That was the very first one. I saw it later in my career as well, working at a company that had an internal tool that spun up a full microservice stack, think maybe two hundred microservices, and it was used for a variety of purposes: dev, demoing, and UAT, so high-level testing of user workflows to see how they would work.

07:33

It took a lot of effort to get that system spun up, and at its essence it's the same issue: I just want to run the system in a configuration that makes sense for the work I'm doing, and it can be remarkably hard to do that. So with these sets of issues, it's the feeling of, "This shouldn't be this way." It should be decently easy to just run software. It should be hard to build software that's amazing, but it should be easy

08:07

to run it. That feeling of the way the world should be is what motivated the entire start of Kurtosis. Right. And I think maybe part of that is we build these complex systems, but for most of us, once we build our production environment, the people who really have the skills

08:33

to build those don't build a second production. Where this pain is really felt is in the development teams, where they're trying to build and tear these down to test different features and add new features. Yes, and one of the main themes I've learned on this journey is that a lot of the time these development environments are about getting as close to production as possible, but safe, so you can do the

09:03

workflow you need to do. But the main value axis is how close can you get to production? So, like Warren mentioned, with LocalStack you can run AWS services on your laptop. That's one approach: bring production locally, or bring production to your development environment. The other approach is

09:24

the opposite. You put your development environment into the cloud, where you basically fork everything you have in production, so your cloud services sit in the exact same environment as your development code. That's the other approach.

09:37

But basically, everyone who's investing heavily in great development environments does one of those two things, or some combination of both: bringing production locally, or putting your local environment into production, in the sense of into your cloud environment. Right, right. What's the barrier to entry for using something like Kurtosis? Yeah, right now the biggest barrier to entry is defining the setup logic for your environment. We have a definition language.

10:15

We use Starlark, which is a Python dialect that Bazel uses to define its build rules. We use it for properties similar to the ones it has in the build system: you can determine that the program is going to halt, because it's not a Turing-complete language. But you need to define your setup logic in this language. Once you do it once, everyone else can reuse it. But what happens a lot of the time

10:43

is: who knows how to do that? Especially in an environment with a lot of different open source tooling at varying degrees of maturity, the person who has the expertise to spin it up the right way in different situations might not maintain that project anymore, so you have to track them down or relearn it somehow. For sure.
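To make "setup logic" concrete: as far as we know, real Kurtosis packages are Starlark files (typically a `main.star`) exposing a `run(plan, ...)` entrypoint that calls things like `plan.add_service(...)`. The sketch below is plain Python, not Starlark, and `Plan`, `ServiceSpec`, and the `example/...` image names are hypothetical stand-ins rather than the Kurtosis API. It shows the part that usually lives only in one person's head: which services exist and which must be running before which, written down once so anyone can rerun it.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ServiceSpec:
    """Hypothetical stand-in for one service in an environment definition."""
    name: str
    image: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Plan:
    """Hypothetical recorder: collects services instead of starting containers."""
    services: dict[str, ServiceSpec] = field(default_factory=dict)

    def add_service(self, spec: ServiceSpec) -> None:
        self.services[spec.name] = spec

    def start_order(self) -> list[str]:
        # Catch typos like depends_on=["dbb"] before anything is started.
        missing = {
            dep
            for spec in self.services.values()
            for dep in spec.depends_on
            if dep not in self.services
        }
        if missing:
            raise ValueError(f"unknown dependencies: {sorted(missing)}")
        # Dependencies must be running before the services that need them,
        # so start in topological order of the depends_on graph
        # (graphlib raises CycleError if the graph has a cycle).
        graph = {name: set(spec.depends_on) for name, spec in self.services.items()}
        return list(TopologicalSorter(graph).static_order())


def run(plan: Plan) -> Plan:
    # The "setup logic": written once, then rerun by everyone on the team.
    plan.add_service(ServiceSpec("api", "example/api:dev", depends_on=["db", "cache"]))
    plan.add_service(ServiceSpec("worker", "example/worker:dev", depends_on=["db"]))
    plan.add_service(ServiceSpec("db", "postgres:16"))
    plan.add_service(ServiceSpec("cache", "redis:7"))
    return plan


if __name__ == "__main__":
    print(run(Plan()).start_order())
```

The order is derived from declared dependencies rather than from the order lines happen to appear in a script, which is one of the things that makes a seven-thousand-line Bash script fragile.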

11:11

Yeah. And I think a related piece is not only building it, but then reverse engineering it, for lack of a better term, to make sure that what you've built for your local dev environment actually mimics what's really going on in production, because there are a lot of different ways you can make all of these components talk to each other, and you've got to try to replicate that.
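One lightweight way to check that a local definition still matches how production is wired is to diff the two descriptions. A minimal sketch, assuming both environments can be exported as flat mappings of setting name to value; the keys and values below are hypothetical examples, not taken from any real system.

```python
def config_drift(local: dict, prod: dict) -> dict:
    """Return {key: (local_value, prod_value)} for every key whose values differ.

    A key present on only one side is reported with None for the missing side.
    """
    drift = {}
    for key in sorted(set(local) | set(prod)):
        local_value, prod_value = local.get(key), prod.get(key)
        if local_value != prod_value:
            drift[key] = (local_value, prod_value)
    return drift


if __name__ == "__main__":
    # Hypothetical settings exported from each environment.
    local = {"postgres.version": "16", "api.cache_url": "redis://cache:6379", "api.tls": "off"}
    prod = {"postgres.version": "15", "api.cache_url": "redis://cache:6379", "api.queue": "sqs"}
    for key, (lv, pv) in config_drift(local, prod).items():
        print(f"{key}: local={lv!r} prod={pv!r}")
```

Run in CI, a non-empty result is an early signal that the local environment has drifted from production, well before someone loses a day debugging a difference that only exists on their laptop.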

11:39

Yes, yes, and that's a lot of work. That's one of the reasons a lot of people opt to forgo that whole process and just put their development code into a production-like cluster in the cloud as fast as possible. Their reasoning is that local is never going to be close enough to production; it's going to take never-ending effort to get it close. So let's get it into a staging cluster as fast as possible, let things soak there, and then go to prod.

12:11

It's another approach. There are pros and cons to each one, for sure. Yeah. What are some of the big... well, first of all, how long have you been building Kurtosis? We've been doing this full time since December of 2020. So what is that? Three years and change? Yeah, yeah. Cool. So how has your thought process changed in those three-plus years of building development environments? I mean, I'm sure you've learned

12:48

lots of lessons and tried a lot of things that didn't quite work. Absolutely, absolutely. A couple of things. First, we started our company selling basically binaries. So one lesson is how developers and development teams want to consume developer tooling. We started out saying we're going to sell a compiled binary that you get to run for a license fee, and for the first couple of clients we operated that way, and it

13:26

was fine in the very beginning. We were solving a very important problem for those clients, so it was acceptable at the time, but we very quickly learned that this is not the way developers, for really good

13:41

reasons, want to interact with software. You do have to have the source code available, you do have to have a free license, and you have to have a free trial period, because they're integrating at a deep level with their tooling. If they're going to try it out and invest in it, they want to know it's not going to get yanked out from under them, and that's totally fair.

14:03

So we changed our licensing entirely and made the repo open, and that changed everything about the rate at which we were learning: the exposure to different developers, people being able to try out the tool, getting different perspectives. In retrospect, it was pretty obvious. Maybe a lot of these learnings are obvious in retrospect, but that was maybe the biggest change we had in the business: opening a tool

14:26

like that. No, I think you're in good company. Over the past few weeks or months, I feel like there's been a huge shift in focus on the podcast. It keeps coming up: Will bringing up this point about selling your idea within the DevOps space as a product management challenge, so really understanding what the teams who are your consumers need. It's difficult, but also really interesting, and it can

14:50

help you build a great product for them. Yeah, and developers are amazing users to work with, but it's a very different dynamic than products that are sold to sales folks or business-focused folks. The way products are integrated and interacted with is quite different and has different dynamics. But once you're solving a problem that developers really care about, you develop really deep relationships with users, especially

15:24

with a community-based approach around an open product. It's been game-changing having that kind of relationship working for us. Yeah, I think that's a really cool point, because you're talking about who your target market is. Whenever you think about production deployments, deploying my application for my target customers, that's a very different process than deploying my stack for the local development team so they can build on

15:56

it. It's a different customer, and they have different goals and objectives. The consumers of my product want high availability, uptime, and fast response, whereas the developers want more granular access and the ability to tweak and turn the knobs across the environment to see how they can solve their problems. Is that a fair statement? Absolutely, absolutely. In fact, there are even three different profiles to consider. There are the developers who are the end

16:30

users of the tool. There's what we might call the platform team, DevOps team, or infra team; they're the ones responsible for implementing and configuring the tool. A lot of the time they're the ones granting access to the GitHub repos and doing the high-level configuration and administration of the installation, and they're concerned about the security posture and all that. And then ultimately you also have the director or head of engineering, whoever approves

17:00

the budget for whatever you're charging for an enterprise version. So there are three different sets of concerns to worry about, and you have to think about each one differently. When you think about a minimum viable product, how much of each of the three parties' concerns do you really have to address to get something off the ground? So it's a multidimensional problem. No one envies a startup founder,

17:30

I mean, they do. They think you get to build your dream, but then you deal with these quite challenging situations where your buyers are not the users you're really focused on. I have a question on that, actually. I know I'm going to put myself in some bucket somewhere, but I assume everyone's using OpenTofu today. I know that's not the case, but maybe there's a huge audience there, especially if we talk about people who are in

17:56

the quote-unquote DevOps space. If they're familiar with this terminology, they're probably using some sort of infrastructure-as-code solution. We use CloudFormation extensively ourselves. How does that interact with Kurtosis here? Like, where are

18:12

the touch points? If we've already defined our infrastructure in, unfortunately, HCL, or worse, in a programming language that doesn't really match the declarative nature, what do we do to spin up a dev environment? Absolutely. So right now, Kurtosis operates after you apply your OpenTofu or Terraform scripts to spin up the cloud services you need for your development or testing environment.

18:42

You can really think of it as an abstraction layer over Kubernetes and Docker.
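The idea of "one definition, several backends" can be sketched in a few lines. This is not how Kurtosis is implemented; it is a minimal illustration under stated assumptions, where `ServiceSpec`, `DockerBackend`, `KubernetesBackend`, and `deploy` are hypothetical names. The Docker backend renders a `docker run` argument list using the standard `-d`, `--name`, `-p`, and `-e` flags, and the Kubernetes backend renders a minimal `apps/v1` Deployment manifest as a dict. Neither actually talks to a daemon or cluster.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ServiceSpec:
    """Hypothetical backend-neutral description of one service."""
    name: str
    image: str
    port: int
    env: dict[str, str] = field(default_factory=dict)


class Backend(Protocol):
    def render(self, spec: ServiceSpec) -> object: ...


class DockerBackend:
    """Renders a spec as a `docker run` argument list."""

    def render(self, spec: ServiceSpec) -> list[str]:
        args = ["docker", "run", "-d", "--name", spec.name, "-p", f"{spec.port}:{spec.port}"]
        for key, value in sorted(spec.env.items()):
            args += ["-e", f"{key}={value}"]
        args.append(spec.image)
        return args


class KubernetesBackend:
    """Renders a spec as a minimal apps/v1 Deployment manifest."""

    def render(self, spec: ServiceSpec) -> dict:
        labels = {"app": spec.name}
        container = {
            "name": spec.name,
            "image": spec.image,
            "ports": [{"containerPort": spec.port}],
            "env": [{"name": k, "value": v} for k, v in sorted(spec.env.items())],
        }
        return {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": spec.name},
            "spec": {
                "replicas": 1,
                "selector": {"matchLabels": labels},
                "template": {"metadata": {"labels": labels}, "spec": {"containers": [container]}},
            },
        }


def deploy(backend: Backend, specs: list[ServiceSpec]) -> list[object]:
    # The environment definition never mentions Docker or Kubernetes directly;
    # only the chosen backend does.
    return [backend.render(spec) for spec in specs]


if __name__ == "__main__":
    db = ServiceSpec("db", "postgres:16", 5432, {"POSTGRES_PASSWORD": "dev"})
    print(deploy(DockerBackend(), [db])[0])
    print(deploy(KubernetesBackend(), [db])[0]["kind"])
```

In the workflow described here, Terraform or OpenTofu would still provision the cluster itself; a layer like this only decides how the services land on whatever runtime is already there.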

18:49

So you can use your Terraform or OpenTofu scripts to spin up your Kubernetes clusters and provision everything required there, and then deploy onto the clusters with Kurtosis. I see, gotcha. So that's a way to leverage some of the tools you're using to deploy into production, so you're not duplicating a lot of effort. Right, so you can still leverage the learnings from Terraform or OpenTofu or any

19:32

other production-grade tools you're deploying with. Exactly, exactly. And the cloud provisioning side of things is an area we're now exploring. It's not covered by the current open tool or its direct associates, where it's, "Hey, can you also handle forking my production environment?" And as long as you could fork an

19:56

environment, can you clone everything necessary there? Right now, a lot of that happens in people's infrastructure and Terraform scripts. That's an area we're exploring and learning about right now. You know,

20:11

what's important when you're looking at designing those scripts. Yeah, for sure. When it comes to documenting, I think that would be an important artifact of this process: we've discovered how our production infrastructure works and reverse engineered it. How do you approach documenting that, so that when the next development team comes along, the learning curve is not so steep for them? Yeah, right now,

20:48

I guess there are two ways to think about documentation. One is: how much can you have your users write code that is self-documenting? Can the environment definition be the documentation, as much as possible? I like having code that is extraordinarily readable and self-documenting, and the way our tool is set up, you tend to get that in the code. It tends to be very composable, it tends to be very readable. It tends to be very

21:22

intuitive. So that's one aspect of it. Another is the functionality we offer: we have high-level environment parameterizability. What that means is when you write your environment definition, you can very easily expose how to modify it in a sensible way. So you start it and you say, "Well, maybe I want to modify it this way, this way, and this way." You don't need documentation for that, because when you want it with

21:52

that configuration, you get it out of the box. So we're trying to reduce the need for documentation as much as possible through the design of the definition language. But you can only do so much; after that, it's traditional documentation. You've got to rely on your users to

22:07

write the READMEs, to write the docs for those specific systems. That reminds me of a quote I heard a long, long time ago, back when everyone used to have wikis for documentation. One guy I worked with said, "Yeah, the wiki is where documentation goes to die." But I think one of the cool things that's happened is

22:32

Jupyter notebooks, if you've ever used them. I love the way they let you mix different syntax in there, so you can have a Markdown block and then a code block, and you can integrate the documentation alongside the code. For iterative or interactive environments like that, it works really well, because it lets you keep everything in one place, and when you try to update multiple places,

23:00

it just never happens in reality. Yeah, and I love the Jupyter notebook experience. One of my favorite developer experiences in the world is spinning up and hacking something together in a Jupyter notebook, and the dream is to feel that way about all of our development work. I'm not sure it's realistic, but I love that experience. I mean, that's an interesting thread. What would the world have to look like for regular development to

23:29

follow this sort of path? Is it a matter of what we're working on, or can we already start doing that today? Yeah. I mean, one thing that's great about it is how fast you see your changes work. It's extraordinarily fast. You write it, run, run, run, and you get the feedback right away. That's the build-deploy loop: how long does it take for you to see, on real stuff,

24:00

what your changes look like? Usually in Jupyter notebooks, a lot of the fun use cases are running on relatively small datasets, so everything fits in memory. So you get that fast build-deploy loop, and that's just tricky to get elsewhere. You can get it with hot reloading into containers that are

24:21

running in your environment. You can get decently close to it, but you're still stuck at the level of: what is the smallest build increment you can define for your backend system? I know a lot of people spend a lot of effort getting their Bazel build rules so well defined that they minimize the build-deploy loop as much as possible, but that takes a lot of effort.
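The hot-reloading half of that loop boils down to "notice what changed, rebuild only that." A minimal polling sketch using file modification times, assuming a single source directory; real tools generally use operating-system change notifications instead of polling, and `rebuild` is a hypothetical hook you would wire to your own rebuild-and-restart step.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict, Optional, Set


def snapshot(root: Path) -> Dict[str, float]:
    """Map every file under root to its last-modified time."""
    return {str(p): p.stat().st_mtime for p in root.rglob("*") if p.is_file()}


def changed(before: Dict[str, float], after: Dict[str, float]) -> Set[str]:
    """Paths added, removed, or modified between two snapshots."""
    added_or_modified = {p for p, mtime in after.items() if before.get(p) != mtime}
    removed = set(before) - set(after)
    return added_or_modified | removed


def watch(
    root: Path,
    rebuild: Callable[[Set[str]], None],
    interval: float = 0.5,
    max_polls: Optional[int] = None,
) -> None:
    """Poll root and call rebuild(changed_paths) whenever something changes.

    rebuild is a hypothetical hook (e.g. rebuild one image, restart its container).
    max_polls=None means poll forever.
    """
    previous = snapshot(root)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = snapshot(root)
        delta = changed(previous, current)
        if delta:
            rebuild(delta)
        previous = current
        polls += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "handler.py"
        source.write_text("print('v1')\n")
        before = snapshot(root)
        # Simulate an edit; set the mtime explicitly so the demo does not
        # depend on the filesystem's timestamp resolution.
        source.write_text("print('v2')\n")
        mtime = before[str(source)] + 1
        os.utime(source, (mtime, mtime))
        print(changed(before, snapshot(root)))
```

The size of the set handed to `rebuild` is exactly the "smallest build increment" question: the finer your build rules can map changed paths to targets, the less work each turn of the loop does.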

24:45

You really have to invest in it to get that experience. I'm not sure there's a really good way to get that today. No, for sure not, but I think tools like yours definitely help reduce the loop here, which is the most important aspect, because no matter how much time you spend, every second you cut off your loop lets that feedback happen faster, which improves the

25:14

experience. I think it's a hackneyed point by now, but realistically, waiting is what makes people unhappy in software development, right? Having to wait for the response. Will, you know, is all about the instant gratification. Nothing better than hitting F5 and having your program instantly work or tell you what you did wrong, and I don't think it's just me. I'm convinced that we are a TikTok

25:42

generation. Whether you're a TikTok user or not, we've all gotten addicted to short-form content. If you're going to do anything that requires an attention span of over sixty seconds, you really have to engineer for that. We've been digging a lot into that little feedback loop, and in the dream world there are two really important things. One is getting that build-deploy loop super,

26:14

super fast, so you code the change and immediately see the change. The other is, when you see the change, how close is it to what would happen in production? Can you maximize both of those things, so you get a really, really fast build-deploy loop, but the deployment is also, safely, basically production? That's the dream world. It would take a lot to get all the way there, I think, but you kind

26:42

of see, roughly, what's happening. With more and more sophisticated dev infra along the software development life cycle, you're getting more and more optimized on both of those axes, to the point that eventually it feels like you're at the limit of what's possible.

27:02

I think there's a sort of conundrum here with the providers offering production infrastructure or platforms: the amount of money you're paying them for your production environment is so vast that it never makes sense for them. It's never justified for them to make

27:22

it faster to run or to close the loop, because they don't care. How much does your test, staging, or ephemeral environment cost in actual dollars, euros, pesos, whatever, compared to production? It's like, what, one

27:37

percent? Even so, the Amazons, Googles, and GCPs of the world aren't going to make anything there, so I feel like it would be on them to really change that, because that's the canonical interface you have to abstract or make the same as much as possible. Yes, absolutely, and basically everyone's working against that limitation

28:02

as far as I can tell. That's the blocker; you're basically just trying to design more and more innovative solutions to get around exactly that obstacle. Yeah, because let's be completely transparent here: the gold standard is Vim on the production

28:23

server. Yes, you said it. I feel like that's where I started my career, actually. Right, for sure. I was working at a company where you did software development by SSHing into another server somewhere, and there was an environment running there that you logged into, and you literally changed production objects by opening

28:45

a text editor there. It wasn't Vim; that would have been slightly better than what was being used. You changed them, and then you had to record the changes you made in an overgrown internal system that would somehow get those changes applied to actual production when there was a release. So I really do feel like what was old will be new again, and I honestly don't know if I want that world.

29:12

My introduction to programming was doing something very similar, and the guy who showed me how to do it said, "Okay, when you get here, you want to run the ls command, and you can see all of the files there." But then, for any given file, say main.c, there was main.c.1, main.c.2,

29:36

main.c.3. He's like, "So you want to look and see what the biggest number is here and copy main.c to the next number, so that if you screw anything up, you've got a backup." And that was our version control system. Oh no. I think people figured out the date-based version, right? You don't need any of these fancy tools. All you need to do is append the timestamp of the last time you edited the file, and then you'd be good.

30:03

Yeah, I mean, Vim on the production server, at least for me, is clearly the best DevEx. I love it. It's just dangerous, super dangerous, and staying up all night to fix an outage is miserable, so that experience is not great. But the feeling of Vim directly on production is the ideal. So to what degree can you mitigate risk to

30:36

get as close to that as possible? I mean, you said it right: things like Kurtosis shorten the time it takes to make a change, so rather than Vim on the production server, it's Vim on your machine, or one of the better tools out there, and then that change immediately gets to a clone of production that is almost exactly the same, and you can see that change running as if you had actually applied

31:00

it, and you get all the benefits without the danger of making a production change. Yes, yes, absolutely, that's the dream. So what is the learning curve for someone who's done the seven-thousand-line Bash script and is looking for something different and wants to check out Kurtosis? What's the learning curve, or the amount of time

31:25

it takes to be productive with it? As framed, very low. The reason is that person did the seven-thousand-line Bash script part. They figured out how it all works together, what needs to be scripted, all of the hard stuff. So that learning curve is extraordinarily low. In Kurtosis, you're using a Python-like dialect, and most people are familiar with Python. The set of, I guess, functions

31:53

and tools you have are very basic. Half a day to rewrite it in something that's cleaner and is going to take you a lot further is probably about what you're looking at. Now, the part that's frustrating for us, and that we're trying to figure out how to make better, is: what if you haven't written the seven-thousand-line Bash script? Then what do you do? You say, hey, someone else wrote the seven-thousand-line Bash script, I don't understand it

32:19

yet, but I want all the benefits I can get from Kurtosis. Now you have a really big learning curve, but the learning curve is mostly on the side of the system itself, not the tool. Interesting, for sure. And can you give us some success stories of different problems that customers have solved? I always think that's kind of cool, because it gives us something we can resonate with and be like, oh, yeah,

32:51

I remember when I had to do that. What are some cool things you've seen people do with Kurtosis? Well, I feel like with your whole work history here, you could just recount those things. Yeah, yeah, my career could be a Forrest Gump story in tech. Yeah. I mean, the biggest use case we have for Kurtosis, and the one we're most proud of, is what the Ethereum Foundation is doing with

33:30

Kurtosis. That's probably the most incredible thing. For folks who aren't deep into the blockchain ecosystem, the Ethereum mainnet, which is the production deployment of Ethereum, is now extraordinarily complicated. There are something like twenty different open source components that have to get spun up together. There's no centralized DevOps team for this, so it's all different people running

33:57

their own services with loose communication between them. What the Ethereum Foundation is using Kurtosis to do is package all of that together into a one-line run with high-level configurability, so you can say, hey, I want to run it with this topology, with these clients and these clients, and it works remarkably well. It works remarkably well for such a

34:24

large degree of complexity. It's a beast. I don't know what it would look like if we unfolded it into the Bash script scenario you described; it would be remarkable. But now you can one-click replicate the Ethereum mainnet on your laptop, inside one of your cloud boxes, or in a Kubernetes cluster at scale. So you can kind of hit dev all the way

34:51

to late-stage testing and performance testing. There was a team that built what's basically sad-path testing on top of this tool, perturbing the network in lots of different ways, what you'd call chaos testing, and it takes you almost all the way to deploying to prod if you're going to make a big change in the deep layers of the network. So that's the thing I'm the most proud of when it comes to what our tools have

35:25

been able to do. There's a large degree of DevOps complexity encapsulated in that project, and then there's a bunch of people using it to do some really cool things on top of it. I feel like that makes a lot of sense there as well. When you have a lot of different testing scenarios that need to be captured, and the configuration for them is also determined by the constraints of the test, then you have a huge dimensionality of the

35:53

data and configuration that you need to specify to actually execute it. In some other environments that may not always be the case, but in this one, yeah, for sure, I can totally imagine it, especially given that there is no official production environment or cloud provider that Ethereum is running on. Exactly, exactly. And it happens in cases where you're doing a

36:16

lot of different testing things. And it also happens with what I guess you could call a feature branch environment. It's these situations where you're pre-merge and you want to spin up a representative environment that's specific to your change. You're not really deving on it, because it's not inside your build-and-run loop in your IDE, but it's kind of close to

36:39

your dev, because you're not ready to put it into CI yet. In that situation, inside an organization or an open source community, you might have hundreds of those a week with slightly different configurations, so it's either easy to configure or it's not. But if it's not, this

36:59

is a great way to do it. This gives me PTSD flashbacks to the previous environment I was working in, where we had different production environments for different customers or different internal business units, and then we had copies of them as test environments that were supposed to match the production ones, because realistically, given the ridiculous amount of

37:24

configuration, you almost needed to run your tests in every single clone of production, however it could possibly be configured. And then of course the actual switches in production didn't match what was in your testing environment, so not only did you have to run your tests in all of these fake versions of production, you also had to make changes to them. That for sure was quite the nightmare. And there's a mode of thought

37:49

that I'm honestly a fan of. There's a mode of thought where you just say, can we forgo all of that, make shipping to production as safe as possible, and just test in production? That mode of thought is sometimes possible, I think. I think that's Honeycomb's big model. They're an observability tool: test in production, and we're going to give you really good monitoring tooling so the moment something goes wrong,

38:15

you can catch it right away, before it impacts your entire customer base, and roll back. Some products, I guess, can do this. For others, like medical tooling, a tiny little bug is a disaster. I don't know. I mean, I want to do it for sure, sign me up. But we're at Authress with the five-nines SLA, where for sure we can't let stuff go through. I think the biggest challenge depends on what

38:43

your business actually is, what the business logic looks like. The fear I've always had with these automatic rollbacks is: what happens if the problem is compounded by changing the database version? I always come back to Knight Capital's four hundred and forty million dollar loss in forty-five minutes. The ridiculous thing is they had the wrong version of the software running on only one machine, and they made it

39:06

worse by restarting the cluster. If they hadn't restarted anything, it would have been fine. They assumed that the problem was with the rollout, so they rolled back everything and compounded the issue. It's those things that really scare me. I would be totally for it if the automatic rollback could calculate whether or not the database is at fault and also include that, but I feel like I'm not going

39:30

to see that in my lifetime. I don't know. To me, it just sounds like we're making excuses to keep from giving people VIM access to the production server. I mean, I think you're in control of that at Polygon, right? Well, you could just go and turn that on and let that happen, and it could be the real case study. You could do that, write up whether it was

39:53

a success, and publish that blog post about what you're doing and why it's so great, and you know, you could get on board with it. And then next week's episode will be reviewing my resume, because I'm looking for a job. I'll never forget that article published by GitLab about how they accidentally deleted their production database. It was like a decade ago now. It was like, oops, I ran this command and I thought I was on the replica server,

40:23

and I was actually in production. I've done that too. Yeah, it was in AWS, and I thought I was in the staging environment and just dropped an entire database cluster. Oh, that's not that bad, right? I mean, you just restore that from

40:47

the one that got snapshotted an hour ago. It's much worse when you write that UPDATE statement in SQL and forget the WHERE clause that comes after it, right? People say you should write that part first: write it as a SELECT first and then change your SELECT to an UPDATE. Yeah, it's just ridiculous

41:06

that those mistakes can be made. But at every company I've been at, there's been a special name for that, like, oh, don't eighty-four the database, because that just happened to be what the update was, where someone didn't set the restriction correctly and now there's some column with this wrong value in it. And it's so noticeable too, right? It's

41:24

not like, oh, the database is gone, let's fix that. It's, oh, when did someone change the name of all of our customers to eighty-four? I mean, that's kind of a badge of honor though. If you were the person who implemented eighty-four, would you put that on your LinkedIn? Yeah, I'm that guy. I think that was my biggest fear when I started my engineering career: doing one of these things. I was terrified of touching anything that could look and feel something

41:59

like deleting a production database. And I think it is a badge of honor. It's the moment when you realize, once you do something like this, hopefully at a smaller scale, that this is a process issue: I never should have been able to do it. It shouldn't have been a risk that I would do this. But when you start out, you feel like, I'm gonna get fired if I do one big thing wrong. Yeah, and it's

42:28

one of those things. I don't think you can communicate that. I think the only way to learn it is to earn that badge. This is a failure everyone has to go through in their life. You suffer that trauma, and you'll live with the PTSD from that moment on, about the larger system, because by then you've gained more experience and you're in a more important position. Your impact on the company is obviously

42:51

going to be higher. You'll never make that same mistake again. Yeah, it's like riding a bike. You don't really understand the pros and cons of riding a bike until you face-plant into the side of your house. You're like, oh, gotcha, brakes. It was a mailbox for me.
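A minimal sketch of the "write it as a SELECT first" habit mentioned above, in Python with the standard-library sqlite3 module. The helper name safe_update, the customers table, and the max_rows limit are hypothetical illustrations, not something described on the show. It previews the affected rows with a SELECT that uses the same WHERE clause, refuses to proceed when too many rows match, and rolls back if the UPDATE touches a different number of rows than the preview.

```python
import sqlite3


def safe_update(conn: sqlite3.Connection, table: str, assignment: str,
                set_params: tuple, where: str, where_params: tuple,
                max_rows: int) -> int:
    """Hypothetical guard: preview with SELECT, then UPDATE with the same WHERE.

    `table`, `assignment` and `where` are interpolated into the SQL, so they must
    be trusted, hard-coded fragments; values always go through `?` parameters.
    """
    if not where.strip():
        raise ValueError("refusing to run an UPDATE without a WHERE clause")

    # Step 1: the SELECT you write first, to see how many rows would change.
    (matching,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {where}", where_params
    ).fetchone()
    if matching > max_rows:
        raise RuntimeError(f"{matching} rows match; limit is {max_rows}")

    # Step 2: turn that SELECT into an UPDATE with the identical WHERE clause.
    # In its default mode, sqlite3 opens a transaction implicitly before an UPDATE.
    cur = conn.execute(
        f"UPDATE {table} SET {assignment} WHERE {where}", (*set_params, *where_params)
    )
    if cur.rowcount != matching:
        conn.rollback()
        raise RuntimeError("row count changed between preview and UPDATE; rolled back")
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers (name) VALUES (?)",
                     [("Ada",), ("Grace",), ("Linus",)])
    conn.commit()

    # A targeted rename passes the guard.
    safe_update(conn, "customers", "name = ?", ("Ada L.",), "id = ?", (1,), max_rows=1)

    # A WHERE clause that matches everyone is stopped before anything changes.
    try:
        safe_update(conn, "customers", "name = ?", ("84",), "1 = 1", (), max_rows=1)
    except RuntimeError as err:
        print(err)  # 3 rows match; limit is 1
```

The row-count limit is a design choice for the sketch: it turns "everyone's name is now eighty-four" into an error raised before any row is written, rather than something discovered afterwards.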

43:14

Cool. So let's talk about mimicking the production environment. Production environments are always changing, with updated code, updates to the database, and sometimes breaking changes. Since we've been talking about databases, take breaking schema changes: with Kurtosis, how do you make sure that those production changes get integrated back into the local dev environments? Yeah,

43:45

right now, that's something the end user has to configure. We're looking into ways to provide a productized way to get this automatically. But right now, when there's a production change and your dev environment has to update to it, you've got to have a process in place for updating the configurations. Ideally, any time you make a production change, it

44:14

goes through your Kurtosis setup first. That's the ideal scenario, but it's

44:20

not always going to happen that way, of course. Yeah. No, I mean, I imagine you're making the database changes in your source code anyway to run a migration. Whether it's schema-based or schemaless, you're making the associated code changes, and you can just refresh your dev environment, blow it away and start fresh, because potentially the migration wouldn't even need to be run and it would just be

44:45

the same as production. Well, at least that would be my hope. Yes, yes. It gets goofy when people do things like hotfixes, where other people hotfixed stuff and you never looked at it. It can get goofy. But in the happy-path scenario, it's not that hard to get it back up to date. You know, these experiences are important. I remember having to do a production fix, and at the time that meant

45:16

that a different team created an SVN feature branch for us. It was a hot patch of production, and I ended up with this problem where there was code that should have worked in production, but it didn't play nicely with what was there, and when I compared it to what was actually in dev, it didn't match. What I found out is that there could be multiple production fixes that had to be done at the same time,

45:40

and merging in SVN didn't really work that well. What happened is you would get merge conflicts, and this team, which was not the development team and didn't know how the feature should actually work, was doing the merge conflict resolution themselves and pushing that out. That was a very horrifying learning experience for me. I think I'd rather drop the production database.
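The earlier point that a dev environment can simply be blown away and rebuilt, because the database changes live in source control as migrations, can be sketched as follows. This is a minimal, hypothetical Python example using the standard-library sqlite3 module; the MIGRATIONS list, the schema_version table, and migrate() are illustrative names, not Kurtosis features. It also shows the catch raised in the hotfix discussion: a change applied directly to production and never added to MIGRATIONS will not appear in the rebuilt environment.

```python
import sqlite3

# Hypothetical migrations, committed to the repo alongside the application code.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customers_email ON customers (email)"),
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the recorded version; return the final version."""
    conn.isolation_level = None  # autocommit mode: we issue BEGIN/COMMIT ourselves
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    for version, statement in MIGRATIONS:
        if version <= current:
            continue  # already applied in this environment
        conn.execute("BEGIN")
        try:
            # SQLite DDL is transactional, so the schema change and the
            # version bump either both happen or neither does.
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        current = version
    return current


if __name__ == "__main__":
    fresh = sqlite3.connect(":memory:")  # a brand-new, throwaway dev database
    print(migrate(fresh))  # 3: every migration is replayed from scratch
    print(migrate(fresh))  # 3: running it again is a no-op
```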

46:07

I mean, you don't even have the code to look at to know what the issue was, because I had to go find these other branches that weren't merged to production, merge them into my code, and it still wasn't right. Where is this wrong code? It just didn't exist anywhere. And then of course you have the incident response team coming to your desk, because in those days you had to be in the office, asking you, why are we losing

46:30

one hundred thousand dollars every minute because of this problem? And I'm like, I have literally no idea. Yeah, that's always cool too. That's just another fun experience of being able to quantify the cost of downtime, like, oh wow, this is serious. I feel bad enough to work twenty-four hours a day, seven days a week, and never walk away from my computer in case something happens. Right, that's the solution,

47:08

twenty-four seven with VIM on the production server. Yeah, if you keep bringing that up, I'm going to write this down in my notebook and see when Will gets this product ready to go, where it maybe uses SSH under the hood, or your IDE, and you just call VIM in production. It even lends

47:29

itself to a name, VIP, right? We'll see where that goes. Right, I'm just going to keep campaigning until they let me give a talk at PlatformCon about how to do this. Well, you know how to get accepted to those conferences: you first just say you've done it, and then if you get accepted, you have to come up with the whole product, actually try it out, and then present

47:54

it. We're full of useful tips today. I mean, I really like this idea of getting a tool in the mix here for helping build

48:09

the environment for real. Yeah, and to help illustrate the scale of this: your success with the Ethereum Foundation, that's a monster project that's completely open source. So I'm assuming that if I wanted to contribute to the Ethereum project, I could... Warren, I think I'm going to ask for access to update this tool, and I'd be like, well, you know, Will, making changes to

48:45

it? I'm not sure. I don't know. But if I wanted to contribute to Ethereum, I could use Kurtosis to build my local dev environment, which helps me write the code, test it, and then open a pull request that's going to be more likely to meet

49:07

their contribution guidelines. Is that a fair statement? I mean, it probably is the easiest way to do that. Yeah, yeah, versus the alternative, because I worked with Ethereum before they started adopting Kurtosis. Prior to that, it was like twenty or thirty steps to get this thing running, where you were downloading from all these different repositories, and then you were compiling it, and then you had

49:36

to figure out which parts of the config files you had to change to make these different things that you just compiled, but weren't really sure what they were, talk to each other. So yeah, having this type of solution makes it easier, and that could be another benefit for you, especially if you're an open source project: making it easier for people

49:58

to contribute to your product. There's actually a good rule of thumb for when an open source project or community would benefit from using Kurtosis: you go to the docs, find the section on how to run locally, and the longer that thing is, the more you're going to get out of it. The general rule of thumb is, how long is that thing? So yeah, for Ethereum, it was huge. That's pretty much how it

50:30

works. And it does help with contributing. Most of our implementations are entirely open source, so it's people contributing back to those: people who use them putting things back and getting more options available. And that really helps, because our team doesn't have time to handle all of the different scenarios

50:53

when we start building examples and everything. So, on having people contribute back: I think the Ethereum package (a package is what we call the definition of how to spin up an environment) really is a remarkable work of open source contributions coming back. They're very, very engaged users. I think you touched on this a little bit earlier, about

51:15

the licensing model you started with and then changing off of that. If someone's going to get started with this today, do I install a CLI and write a configuration file? Is there a hosted version, where these packages are available and you'll deploy them automatically to my cloud environment? What does this look like? There are actually two versions right now. The standard way is to download the CLI through Homebrew or apt-get or whatever your preferred

51:40

installation method is. Most people who start are starting because they see a package they want to run. So most people are just running a package when they start, to see how it spins up, and then they start modifying that package if they want to, or they're like, hey, this is cool, I'm going to write my own for my own system. It all happens on your own laptop and your own hardware, and you just

52:01

run with it and go. And then we do have a cloud right now that we host where, let's say you don't want to download the CLI and you don't want to use your own compute to run the environment, you can just click a button. It's a no-code interface to run any existing package. There is also a no-code interface for modifying the package to a certain degree. That's evolving, so you can get more and

52:28

more power over modifying it. And then once you're excited about it, you can write your own package and deploy that there as well. And are the packages the ones that end up showing up in the Kurtosis catalog? Yes, yeah. Because the Kurtosis catalog is actually really cool.

52:47

Going back to the Ethereum example, you can just grab that from the catalog and it runs for you, right? And there are actually quite a few out there in your catalog. One that I happen to know of is the Polygon CDK, which went up in the catalog just last week. So if you want to run that, just grab it from the catalog and there

53:10

you go. Interesting. Yeah, they're right there. And the way the catalog works right now is it scans public GitHub. So if it's an open source package, it's going to get detected and populated into the catalog, and then you can just run it, because it's an open source tool. So with the Polygon CDK, for example, the Polygon team put a lot of work into getting it into a package in a very nice format, and now it's there and you can just

53:38

click a button and run it. And one of the most exciting things about having taken this step of opening everything up is that we're seeing stuff happen without us knowing it or tracking it too closely, like, oh, look at what the Polygon team has done, this is amazing. And that's very exciting. Yeah, for sure. And I think that's because y'all are completely open source as well, right? So the tool is actually under the BSL license. Okay,

54:07

source-available. You can do everything except recreate our product on it,

54:15

which seems like a fair constraint. Exactly. Yeah, no, but I think it's really cool having the catalog there, especially for the open source communities, and it seems that Kurtosis is really resonating there by making it easy for people to just run, contribute, and improve your product in a consistent format, because I think one of the biggest challenges of running an open source project is verifying or vetting the contributions that you

54:52

get, to make sure they've done it right, which in some cases is just tribal knowledge. Yes, yes, yes. One thing we've been noticing, especially as we work with more and more people who have cloud dependencies, is that the more your system is composed of open source components, the more important this problem is to you, because you're dealing with a bunch of different software components built by different organizations, where you can run that entire software component

55:23

yourself. That's the whole point of open source. You don't have the issue where a cloud-provider-run production environment has zero incentive to let you modify it or move it. In open source communities that's not the case; you actually can do it entirely. So the more open source your tool is, in terms of the percentage of services

55:45

in your environment that are open source, the more you get this benefit when it comes to verifying that a contribution is going to work and makes sense. And you do kind of have to run it

55:57

all to see it. Yeah, I would honestly take it even further: most open source solutions out there that don't have a strategy for making it easy for contributors to come in and run the software would benefit a lot from a strategy like this, because realistically, if you go out and find an open source solution, no matter what it is, even one used by thousands of organizations, it is quite the challenge to get it up and

56:23

running. Even if it's a Docker container, even if it's in a registry, there's still a lot that goes on with it. I remember it was only recently that you no longer needed ZooKeeper to run Kafka. It was always this sort of mess: if you ever wanted to run Kafka yourself, you needed three different services running on your machine, with multiple nodes and clusters at the same time, in order to even test

56:51

this out. And I'm like, what? Yes, yes, it's a big issue, and for the most part, in my opinion, it's a bit of a silent issue, because if you're contributing back to an open source project, a lot of the time you're doing it with almost a hobby mentality. It's like, I want to help this project, I'm interested in it. If there's a really big friction bar for me to do that, well, I'm not getting paid to contribute back to it. That's

57:17

the whole idea. So I'm not going to put an inordinate amount of effort into knocking down the door to say, make my testing environment better, make my environment better. I'm just going to disappear. So the number of open source contributions that you're missing because people got frustrated in the first five minutes and said, you know what, someone else is going to handle this contribution at some point, hopefully... I can just imagine

57:42

the velocity that you're losing there. I mean, I hope that's really the case, that that's even where they're falling off. Unfortunately, I find that the state of open source today is that you find an issue out there, or you even want to make a change, and you open an issue or open that pull request and just never hear back. And with stuff like this, you know that the team has invested in what that experience is

58:05

for building stuff. So it could really be a badge of honor to say this open source project is using Kurtosis, or another tool that makes this experience better, because you know that they thought about it, that they care about getting

58:15

those contributions. So, is there a thought pattern behind this? When we talk about production servers, with the whole DevOps movement we came up with this philosophy that servers are cattle, not pets, referring to the fact that they should be short-lived and you shouldn't have an emotional attachment to them. You shouldn't be hand-curating them.

58:45

But I feel like dev environments maybe didn't fall under that umbrella. Is there a thought process or something you've noticed through Kurtosis where you see people shifting away from long-lived development environments, because it's a little easier and more standardized to bring them up? No, I don't know if I would say that it's a shift away from it. In some ways it

59:14

is, for certain use cases. How could I describe this? One thing Kurtosis does is, if you run your integration tests in CI inside Kurtosis, it's quite easy to rerun them locally on your laptop. So if you want to debug a failing integration or end-to-end test, rather than recreating it in a remote, shared cluster, you just rerun it on your laptop. And in that case there's a shift towards ephemeral environments,

59:45

where it's just for dev and you're just running a version for yourself. But there are also people who are like, I'm going to use Kurtosis to set up a persistent development environment, because for the other devs on my team it's just significantly easier to ping an endpoint that's sitting there, stable, and get into that than to even download a tool and rerun some commands every single time. And then you get into the problem of data management.

01:00:10

If you've invested in your dev environment having a particular data set behind it that's useful for that dev project, then investing the extra effort into cloning and moving that data for an ephemeral use often doesn't make sense; you just use the same thing. So I think it really is dependent on the use case. But there definitely is a meaningful difference between how you treat your dev servers and your prod servers. So the cattle

01:00:45

not pets thing: I think it's totally fine for your dev server to be a pet. There are times when that makes a lot of sense. The moment you want to see that it looks like prod, you do kind of want to turn it into cattle, just to see how it works. But there are a lot of compelling reasons to treat it like a pet, especially if you are debugging an issue or trying to figure

01:01:07

out the way things work. You want to have that state sitting around, and you want to really preserve it, because it's going to take you some thinking and creative time to work with it. And if it's

01:01:19

cattle and you lose it halfway through your work, it's miserable. You've got to restart. Right. And I think I heard you say that there's a use case here for using something like Kurtosis to create a persistent dev environment that provides a common set of services for your dev team to access.

01:01:39

So not everyone has to run that locally; there's one that's centrally available to them that maybe has a specific configuration, or something unique to it that everyone needs to be thinking about or addressing in their local development efforts. So you just provide that as a shared service for everyone. Yes, and

01:02:00

that's useful for a lot of development workflows. But then there's another set of development workflows where you're going to say, I actually do want to isolate, for an ephemeral purpose, a particular set of microservices or a data set, and

01:02:16

I want that to be my sandbox that nobody else is touching. And that happens a lot as well, because it's like, hey, what if three different teams want to use the dev cluster and they're all hitting each other? Then you have to go on Slack and say, hey, wait a minute, let me use it for the next two hours, you guys, or you blow away their work, and it's miserable. But that's only sometimes. So the best world is that you have the choice. When you

01:02:40

need isolation, you can get it quite easily. When you don't need it, you can use other people's work. And that's the direction we're trying to go. At a high level, you want it to feel like you're in Harry Potter and you have a wand, and you just go like that and you have the thing you need. That's the highest-level user experience that you want to get to. So, cool: how do you design the wand such that you can do

01:03:07

that? I think you just want the configurability to be able to pick: in this scenario, I want the shared dev cluster, or I want the isolated ephemeral one. Right on. So, continuing on that thought, what does the future of Kurtosis look like? In addition to that, what are some of the things that you would like to see getting

01:03:30

implemented? Absolutely. As I said before, right now Kurtosis operates like this: you use your Terraform scripts or OpenTofu to provision your hardware, so to speak, and maybe your virtualization layer, like the Kubernetes cluster or Docker Engine, and then Kurtosis deploys on top of that. We want Kurtosis to move in the direction where you no longer have to do that first part. You no longer have to think about and dig into how

01:03:59

do I split my cloud assets, or how do I provision my hardware? We want to get better and better at providing tools so you don't have to worry about that. That's where we're going. You know, the folks in the blockchain space use a lot of open source tools where it can all be containerized and you

01:04:23

get everything in the container. The more you're working with folks who rely on AWS services or managed authentication services and that kind of thing, the less useful that abstraction is, because those are

01:04:34

the issues: the cloud stuff. So building an abstraction there that captures that stuff in a way that's useful is where we want to go, and that's where we're doing all this research. And that's where this thought experiment comes in: to some degree you're limited by the cloud provider allowing you to easily get everything

01:04:55

forked over; to some degree you have those limitations. So then, given that's the world we live in, how do we get as close as possible? That's the design work we're looking at right now. Well, I think there's totally a gap here in the IaC world, where it doesn't feel like any of the offerings are super compelling. I mean, I think anyone who assigns themselves a DevOps title has a unique perspective on complaining about

01:05:20

literally everything that exists. So of course the tools we have are not as good as they should be. And I think there's always space for someone to say, I wrote a better Terraform, or whatever, because I feel like that's the abstraction layer that exists over the cloud providers. And I'm not saying that anyone should go and create a new IaC tool. I'm sure someone just got that idea right now. You can do that; you absolutely can do that and wrap the SDKs

01:05:47

out there. But I do feel like there is definitely room here to grow in that direction. Yeah, and that's maybe a fork in the road. One of the two philosophies is to create that layer so that it's very easy to do those forks. The other one is what I guess you'd call the Honeycomb or LaunchDarkly philosophy, where it's, can you build tools that just make production safer, like feature flags, so now you just don't worry about

01:06:18

testing so much. Just get into production under a feature flag, then turn it on for a certain set of customers, and you'll figure out how things work. Or the observability thing: instead of focusing so much on forking your staging and testing environments, you just go into production, and we're going to monitor everything so closely that we're going to minimize any kind

01:06:38

of damage. And I think there are gaps on both sides, and different engineering leaders and platform leaders have different opinions about which way they want to go, so there are lots of different takes on it. I mean, I definitely have an opinion on that. Realistically, it would be nice if we could get to that world, but I don't think the cloud providers offer us the sort of control over the environment that would allow us to

01:07:04

perform, say, one out of every million requests against the changes. I just feel like we're much better off pulling production down to another environment to validate that the code we have executes exactly as we want, and we can do that by creating synthetic requests from our production environment

01:07:25

and replicating them against the ephemeral, dynamically created dev environment. This is a perfect example of what we're seeing: Warren is like, let's get things synthetically mocked, let's get everything running locally, let's get it as close as possible and make sure things work, and then we have Will, who wants to vim in prod. Team vim right here. I think they're both valid approaches.
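Turning a change on for only a certain set of customers under a feature flag, as described above, can be sketched with a tiny in-memory flag store. This is a minimal illustration under stated assumptions: the `FeatureFlags` class, the `"new-pricing"` flag, the customer IDs, and both pricing functions are hypothetical, and real products such as LaunchDarkly ship their own SDKs, which this does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store: flag name -> customer IDs it is on for."""
    enabled_for: dict[str, set[str]] = field(default_factory=dict)

    def enable(self, flag: str, customer_id: str) -> None:
        self.enabled_for.setdefault(flag, set()).add(customer_id)

    def is_enabled(self, flag: str, customer_id: str) -> bool:
        # Unknown flags default to off, so new code stays dark until enabled.
        return customer_id in self.enabled_for.get(flag, set())


def legacy_pricing(amount: float) -> float:
    """Existing behaviour (hypothetical business logic)."""
    return round(amount * 1.20, 2)


def new_pricing(amount: float) -> float:
    """New behaviour being rolled out (hypothetical business logic)."""
    return round(amount * 1.15, 2)


def price(flags: FeatureFlags, customer_id: str, amount: float) -> float:
    # The new path is deployed to production but only runs for opted-in customers.
    if flags.is_enabled("new-pricing", customer_id):
        return new_pricing(amount)
    return legacy_pricing(amount)
```

Usage: after `flags.enable("new-pricing", "customer-a")`, only `customer-a` gets the new calculation; every other customer keeps the legacy path, which is the "switch" nature of flags that comes up later in the conversation.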

01:07:54

People have different appetites for what they want to do. No, I mean, there are still definitely things where I'm a huge proponent of testing in production, and we do that under certain circumstances where we make sure that we

01:08:05

have it wrapped. Parallel execution is a huge thing for us. When we're changing some of the business logic of how something works, we'll usually execute two different paths simultaneously in production and validate whether or not the outcome is exactly as it should be before turning off the legacy one. When you do that, does that output still go back through, like, your forked traffic, or is it in process? It's in process. It's basically the

01:08:35

equivalent of having two threads. You call this method and it does two things at the same time. We validate the new result against the legacy one and then throw it away, and if we get any validation failures, that's an alert that someone goes and looks at and comes back and says, okay, here's what happened. And if we don't get any alerts after a period of time, then that level of testing has been completed and we can

01:09:00

just delete the old code and actually return the new stuff. I love that setup. I think that's great. I mean, it's a far cry from, like, vim in prod, but it's almost a step towards that, right?
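The parallel execution described above, running the legacy path and the new path at the same time, returning only the legacy result, and raising an alert on any disagreement, could look roughly like this. It is a sketch, not the guest's actual code: it assumes Python's standard `concurrent.futures` thread pool, and the email-normalization functions and the `mismatch_alerts` list are hypothetical stand-ins for real business logic and a real alerting system.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("parallel_execution")

# Shared worker pool so each request does not pay thread start-up cost.
_POOL = ThreadPoolExecutor(max_workers=4)

# Hypothetical alert sink; a real service would page or open a ticket instead.
mismatch_alerts: list[dict[str, str]] = []


def normalize_email_v1(email: str) -> str:
    """Legacy business logic (hypothetical): lowercase the whole address."""
    return email.strip().lower()


def normalize_email_v2(email: str) -> str:
    """Rewritten logic under validation (hypothetical): lowercase only the domain."""
    local, _, domain = email.strip().partition("@")
    return f"{local}@{domain.lower()}"


def normalize_email(email: str) -> str:
    # Start both code paths at the same time, like two threads.
    legacy_future = _POOL.submit(normalize_email_v1, email)
    new_future = _POOL.submit(normalize_email_v2, email)

    # Legacy behaviour is unchanged: its result (or exception) goes to the caller.
    legacy = legacy_future.result()

    try:
        candidate = new_future.result()
    except Exception:
        # The new path must never break the request; record it and move on.
        logger.exception("normalize_email_v2 raised")
        mismatch_alerts.append({"function": "normalize_email", "kind": "error"})
    else:
        if candidate != legacy:
            logger.warning("normalize_email_v2 disagreed with normalize_email_v1")
            mismatch_alerts.append({"function": "normalize_email", "kind": "mismatch"})

    # The new result is thrown away; only the legacy output is returned.
    return legacy
```

Once the alert list stays empty for long enough, the comparison and the v1 function can be deleted and `normalize_email` can return the new result directly, which matches the "delete the old code" step described above.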

01:09:15

You're at least able to get something into production more safely, even though I imagine there are constraints. Like, you wouldn't be able to do that if there was very sensitive data going into the response that you shouldn't see. You wouldn't be able to validate it that well. Well, I mean, we sort of have that, because we're an

01:09:31

identity provider doing federation and aggregation, and no one sees the data. It's just, you know, hey, there's something here that doesn't match, and you can sort of revalidate that, and realistically who the actual user is doesn't really matter in those circumstances. You can just sort of see what it looks like. If for some reason there is some concern, we encrypt the logs that get written even before they leave the production environment. We hash the data

01:09:57

that would get seen by any of our engineers automatically, before it leaves. So, I mean, it's very rarely the case that, like, what someone's last name was is actually relevant for us. That's a very cool setup. I

01:10:12

like that. That's some discipline there, though. That sounds to me like a very disciplined approach, because you're being very deliberate about what you're testing, writing that second path for it, and then you have to specifically define and measure the success criteria for it. That's a charitable way of looking at it. You know, maybe that's

01:10:42

my naivety. You don't do that at all. I mean, there was probably a time in my life where I'm like, you know, you're releasing that code to production, why is there an arbitrary try-catch block around that? But there is another aspect here: if you care about reliability, that your stuff doesn't go down in production, and about the code you're changing,

01:11:01

there are too many unknowns there. Realistically, you didn't test literally every case, and sometimes it's not so critical that you have tested everything, but you can validate that something happens in production. Like, oh, we don't know if a customer is using this. Well, you can at least create a log message that says, hey, a customer is using this thing, don't turn it off yet, and run that for a bunch of weeks so that

01:11:25

you have the data to validate. Yeah, I mean, you laugh, but if you don't take that first step... Like, this is absolutely advice that someone could be using to improve their decision-making ability: not some spreadsheet with a list of features that are on or off per customer, but really what is happening in your production environment. And this is just the next level after that, which is: release the code, have

01:11:49

multiple versions running at the same time, and measure performance, reliability, consistency, whatever that is. This is just how we happen to be doing it, and I feel like it works really well for us, so, you know, we're not going to migrate to something else. I haven't heard of anyone else doing something that really changes this. I mean, there are blue-green deployments, where you're siphoning traffic over to your new

01:12:15

version. But that often doesn't really make sense when you have a lot of features going to production over time, especially where the features aren't interacting with each other. Having them wait in a queue to get through this and then deciding whether to roll back or not is a hugely expensive process. As you said, feature flags can work here, although that's again just a switch. So really, parallel execution is the only thing I can think of

01:12:38

that really made sense for us early on. And how do you think about how that compares to, like, a canary environment? I mean, this is sort of the same thing. If you have millions of requests coming into your service all the time, then some number of

01:12:55

people would be affected by any change, no matter how small. And if you have the budget, which is one of the key terms we talk a lot about in observability, then go for it, right? You don't need to take any extra steps.
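The "budget" here is the error budget from SLO-based observability: how many failed requests a service can absorb in a window before it misses its objective. A small worked example, assuming a hypothetical 99.9% success objective at the "millions of requests" scale mentioned above, shows how to check whether a canary fits inside that budget. The function names and the worst-case model (every canary failure counts against the budget) are illustrative assumptions.

```python
def error_budget(total_requests: int, slo: float) -> int:
    """Failed requests allowed in the window before an SLO such as 0.999 is missed."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return round(total_requests * (1.0 - slo))


def canary_fits_budget(total_requests: int, slo: float,
                       canary_fraction: float, canary_failure_rate: float,
                       already_spent: int = 0) -> bool:
    """True if a canary failing at `canary_failure_rate` stays within the remaining budget."""
    # Failures the canary would cause if it misbehaves at the given rate.
    canary_failures = total_requests * canary_fraction * canary_failure_rate
    return already_spent + canary_failures <= error_budget(total_requests, slo)
```

With one million requests and a 99.9% objective the budget is 1,000 failed requests. A canary taking 1% of traffic (10,000 requests) that fails 5% of the time spends 500 of them, so it fits, unless more than 500 failures have already been spent elsewhere in the window.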

01:13:08

But if it's just a matter of changing a bunch of parameters somewhere and you want to see how it validates, especially when you already have all the business logic on your side, right, you're not calling a third-party service, you're not doing some other things, if you can really test it all in your service in production at runtime, why not do this to validate? Was that hard to set up? I mean, really, think of it as: you're changing a method,

01:13:34

any engineer out there. You're changing the internals of some method. All you need to do is duplicate that method into a second block, call it V2, and have the caller execute both of them, wrap the new one in a try-catch block, and then validate the outputs in your code, and if they don't match, log a message. It's honestly not a lot. You can create some structure around that with a framework where you can wrap parallel

01:14:00

execution, pass methods into it, and have it do this automatically. It's not a lot of extra overhead if you feel like you're doing this very often. I think roughly one out of ten features that gets implemented goes in like this. It also gives us some security, so I wouldn't say it's complicated. I think a lot of engineers know that this is a great way to test, and I think this helps junior engineers navigate this, right? You know, throw

01:14:28

a try-catch around there, I'm scared about this affecting production. I think bringing that fear back in helps us. Thinking that the code we write is invincible is the problem here. Yes, yes. So, can you just share your screen for us, Warren? I'm kidding. It's really boring right now. I mean, when I'm doing the podcast, I literally have a notes page and my audio equipment up and running, and that's it. There's nothing else on my desktop right now.
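The "framework" idea above, wrapping parallel execution once and passing methods into it, might be sketched as a decorator. `shadowed` is a hypothetical helper, not an existing library: it calls the legacy function, then the V2 candidate in the same thread inside a try/except, returns only the legacy result, and records a SHA-256 fingerprint of differing outputs rather than the raw values, in the spirit of hashing data before it leaves production. The `display_name` functions are made-up examples.

```python
import functools
import hashlib
import logging
from typing import Any, Callable

logger = logging.getLogger("shadow")

# Hypothetical record of validation failures; a real system would emit alerts.
validation_failures: list[dict[str, str]] = []


def _fingerprint(value: Any) -> str:
    # Hash instead of logging raw values, so a mismatch is visible
    # without exposing the underlying data to whoever reads the logs.
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()[:16]


def shadowed(candidate: Callable[..., Any]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Run `candidate` next to the decorated legacy function and compare outputs."""
    def decorator(legacy: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(legacy)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            expected = legacy(*args, **kwargs)
            try:
                actual = candidate(*args, **kwargs)
            except Exception as exc:
                # A crashing V2 is recorded, never surfaced to the caller.
                validation_failures.append(
                    {"function": legacy.__name__, "error": type(exc).__name__})
                logger.warning("%s candidate raised %s", legacy.__name__, type(exc).__name__)
            else:
                if actual != expected:
                    validation_failures.append({
                        "function": legacy.__name__,
                        "expected": _fingerprint(expected),
                        "actual": _fingerprint(actual),
                    })
                    logger.warning("%s candidate output differs", legacy.__name__)
            return expected  # callers only ever see the legacy result
        return wrapper
    return decorator


def display_name_v2(first: str, last: str) -> str:
    """Hypothetical rewrite that trims a trailing space for empty last names."""
    return f"{first} {last}".strip()


@shadowed(display_name_v2)
def display_name(first: str, last: str) -> str:
    """Hypothetical legacy implementation."""
    return first + " " + last
```

Callers keep calling `display_name(...)` unchanged. Once `validation_failures` stays empty for long enough, the legacy body and the decorator can be removed and the V2 logic promoted.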

01:15:03

We're completely different. I never close windows, and I never reboot. That's exactly what somebody who wants to vim in prod would say. Well, I saw a meme yesterday and it cracked me up. It said, damn, I just accidentally closed that browser tab that I've been meaning to read for two years. I'm very careful about my usage there. Ctrl+Shift+T brings that back, or the previous window that I've closed, you

01:15:33

restore all my tabs. Yeah, for sure. I was using Xmarks on Firefox back in the day, before they bought into it, because it's just so critical for me that I can actually get that back. The worst one I have is emails in my inbox, because when you accidentally click archive or delete and that email was months old, good luck finding what it was. We have a security policy that automatically deletes our emails after a specified

01:16:03

period of time, so email management is automated for me. Cool. So, with Kurtosis, have you found an area where it's not a good fit? Yes, definitely. With where Kurtosis is right now, if you are able to adequately express your system with a Docker Compose file or a relatively small Helm chart, in the sense that your Helm chart repo doesn't have a huge number of lines in it, it's got, like, you

01:16:47

know, a sane number of configuration settings. If you have those two things, you're doing great, and Kurtosis is not going to be great for you. It's the situation where those things have to be augmented by a seven-thousand-line Bash script and a bunch of different documentation on how to configure these things. If your Docker Compose file or your Helm charts have a bunch of lines commented out that say, if you want this behavior, please uncomment these

01:17:13

lines, and it's all over the place, that's where you should take a look at Kurtosis, and that's where we're going to provide a lot of value at the moment. Right on, excellent. So you mentioned Docker Compose and Helm, so I think I can safely infer that you're a big

01:17:36

fan of containerization. Yes. Is that deliberate, or is it a result of what you learned from your customers? Or were you, from the beginning, just containers all the way? I mean, I would love to say it was entirely deliberate, from the very beginning. Definitely in the beginning it was like, hey, this is the standard that people are using to ship these things. I think it's a great abstraction.

01:18:05

I think it's wonderful to embrace the world of containers. Right now, when people think about containers, they often also think about the Docker runtime and the Docker build process. So they think not only about the idea of the container, but also about what it looks and feels like when you build everything inside of the Dockerfile, which is separate from

01:18:32

having a container. So what I love is the container abstraction. What I don't so much love is that it's very easy to bloat the Dockerfile. It's very easy to bloat the container so that it's significantly bigger than it really needs to be. There's a lot of stuff to improve there. But when it comes to how you wrap things up, the ideal world is

01:18:51

containers as lightweight as possible, wonderful. The only issue is that it is pretty annoying to get in and out of containers and deal with questions like: is this a port that is exposed inside of the container, or is it on my laptop? Which port is it? At what level of networking? It's pretty annoying. It can also be difficult to map between the file systems of different containers and your own file system. Whether you have mounts mapped or not, that stuff can be

01:19:21

tricky. A lot of the abstraction that we provide in our runtime tries to make that stuff better when it comes to interacting with the dev environment. But I think the reaction folks have against containers when it comes to environments often comes from them being perceived as heavyweight, which is often true. If you're running Docker, it often is true. There's a lot of stuff going

01:19:45

on there. But a lot of the time it doesn't have to be heavyweight. It's just that the way we build containers now tends to lean towards a more heavyweight approach. No, I think you nailed it there. Realistically, it's like this: there's no hot reload automation. When you make a code change, how do you get the necessary auto-compiling and running? You just want to debug it as soon as possible. I think it's

01:20:11

a small subset of things. And then there's this other set, which you for sure mentioned: what if you're changing something at the container level? You're definitely going to make some changes and you're going to need to test that in some way. I don't think I know anyone who wrote a Dockerfile and got it right the first time, or the second time, or the

01:20:29

third time. I did learn about this thing where you can now make a builder out of it, and that can really help reduce the size of your containers by not storing ninety percent of what's in there. I still feel like that's really esoteric, and most people are not aware that the Open Container Initiative supports that sort of reduction. Are you talking about using a multi-stage Docker build, where you run your build process and then store

01:20:55

that as an artifact and then launch a new Docker image with it, all in the one Dockerfile, right? For sure, multi-stage. In the Dockerfile you can specify multiple images, and for one of the images you can say, this is my builder, and have the tools in there, and then another image lower down that says, this is the actual image I want, because there are just so many build-time versus runtime problems that come up, where in the previous world we were

01:21:25

in a state where they were the same thing and it was really expensive, but now we're much better off. You can get Rust containers down from gigabytes to megabytes, because the builder is big and expensive, but the actual built image is pretty much just a raw binary. Yeah, that's the crazy thing,

01:21:44

especially in the area of build optimization. There's so much low-hanging fruit everywhere, but it's all different low-hanging fruit, and you have to do some degree of looking into it to figure out that, oh, you can grab this thing and put it in here,

01:22:00

and all of a sudden, your gigabytes become megabytes. It's so common to hear, oh, I brought my build time down from ninety seconds to nine seconds, or from five minutes to one minute, and these are massive improvements for something you do all the time. They're just sitting right there, but it's often not connected, right? Yeah. For a lot of people, they've done the build process and it took

01:22:29

a long time, and they're just like, oh, I guess that's the way it is, not realizing that there are a lot of knobs you can tweak to change that. Yeah, and I think it depends on where you are in the build process. If you're on your laptop and you're building, you want it to be super fast, and if

01:22:46

it's two minutes, it's going to be a bit annoying. But if you're merging your PR and you're waiting for it to go to a staging environment, if it's five minutes, you're probably okay with it. You're probably like, okay, this is fine, and getting it from five to two minutes is probably not that big of a deal for you, because you don't do it that often. So when would you ever want to optimize that? But across all the devs and all

01:23:11

the merges, it adds up. It's a lot of time, and it could have gone faster. For sure. Well, sweet, is there anything else we should talk about? I feel like we kind of just ran the gauntlet here. We did. So if folks are interested in learning more about Kurtosis, where do y'all hang out online? Discord? Well, first of all, I recommend going to our GitHub. The GitHub is the place for you to start. It's got our code, it's got our

01:23:48

docs there, it's got the README. So if you want to learn more, go right to the GitHub. And if you want to chat with us, we're always chatting with the community, so join our Discord and we'll be there. You can DM us, you can chat with us in the general channel. It's fun. Right on, cool. Yeah, and I've been chatting with y'all a little bit. Y'all really are super active and super responsive online, which is nice and refreshing to see. It's

01:24:18

the fun part. The fun part is chatting with the users. See, that's just so anti-tech right there. Like, I would love tech if it weren't for the damn users. I feel like there's a difference between the outward appearance and what you're thinking when they're actually saying things. And it's also about which users you're talking with. When you're talking with people who are doing the same thing,

01:24:49

whether they're software engineers or whatever, I think that's a different experience than trying to talk Mom through getting her dial-up modem connected and clearing out her Internet Explorer cache. Yeah, I mean, there's definitely a difference between support for something and real engagement with your customers. Yeah, for sure. I usually come away from them like, oh, we should have that feature, I can't believe we didn't think

01:25:15

of it. I mean, one of the fun things about having a developer tool is that it's by engineers, for engineers. There's a lot of empathetic connection across the product-user barrier, whereas if we weren't the exact same profile, you'd have to bridge the gap somehow. Well, you know, I've heard many times that what makes a successful startup is solving a problem that you personally have. I totally agree with that.

01:25:47

Yeah. Cool. Well, Galen, thanks for being on the show, man. This has been a really cool talk. Thanks for having me. Cool. Warren, thanks for joining me. Welcome back, glad to have you back on the show. Glad to be here. And for all of the listeners, thank you for listening. Really appreciate it. Hope you found it useful. Oh, right, yeah, yeah, picks. All right, you brought it up, kick us off, Warren. I just, you know, I know some listeners were going to be mad if they didn't hear them on the

01:26:17

recording. Yeah, so mine this week is Turn the Ship Around by David Marquet, which is a book that, regardless of your experience in technology or whether you're focused on core engineering or somewhere else, can really help take your understanding of how to work effectively to the next level. It's really about leadership in every way, starting from the bottom all the way up, and

01:26:48

how to think about it. And I think it's really interesting to read, and if you feel like you can't commit to that, you can find ten minutes to watch the short video online. It's something about greatness; I can never remember the title. It's a YouTube video, you can find it. It's hilarious and it's really good content. Right on, awesome. Galen, putting you on the spot: what'd you bring for a pick? My pick? It can be, it can be music. Oh, for

01:27:18

sure. Yeah, so I have been very into Jelly Roll lately. I don't know if you guys know Jelly Roll, the country singer. Oh yeah, yeah, yeah, he's the country Post Malone. Yes, yes, that's hilarious. I've never heard that before, but as soon as you said it, I knew you were thinking of the right person. Yes, country Post Malone. Jelly Roll is amazing. It's rare that I find an artist and then just listen to a dozen of his

01:27:49

songs on a regular basis. He hit me with all of them. So if you haven't checked out Jelly Roll, if you're into country, if you're into hip hop, it's a really nice combination of both. For sure. Yeah, last year I stumbled across HARDY, and HARDY is like country meets heavy metal or hard rock. And then at some point he did a collaboration with Jelly Roll, which took me down

01:28:17

that path. Yeah, so I'll agree with you on that one. It's pretty cool, and it's definitely not what you would expect from country music. It's quite a bit different. I'm not sure Hank Williams Sr. would approve of it, but here we are. Cool. So I brought two picks this week. The first one is going to totally destroy my nerd credibility, because I have never read the book Dune or seen the movie Dune. So I am starting to read the book Dune. Dude, maybe

01:29:02

one of the best books I've ever read. I mean, the things that Herbert does in the book are just unmatched by other stories, realistically. You can tell by the number of new concepts and things that were just made up. It's not a world that's easily imaginable, right? And I'm looking forward to it. I've heard from multiple people... you know, it's a common saying, oh, the book's better than the movie, but I've heard from multiple people, like, no, no,

01:29:29

the book is better than the movie on many, many levels. So I'm looking forward to getting into that. The other pick I'm going to bring up: I kind of fell out of love with Chrome as my browser, and I wanted to see what else was out there. Chrome, it's not you, it's me. I just felt like seeing other browsers. I came across the Arc browser, and I think I completely

01:30:00

locked up here. There you are. Can you hear me? I'm back. I got completely kicked out for talking trash about Chrome, didn't I? I'm going to go through with it anyway. The Arc browser has been really cool. I've been using it for about a week now, and there are so many cool, subtle little features that I didn't realize belonged in a browser. My favorite one is when you're in a Google Meet and you navigate off

01:30:41

to another tab, and then somebody asks you a question. You can never find which tab is the one for Meet, right? So then you've got this awkward silence and you've got to admit that you were screwing off and not paying attention to the meeting. Well, with the Arc browser, when you leave the tab, it just puts a little thumbnail in the corner of your screen with that video meeting still going. So you never have to go looking for it. It's just down there in the corner, you click into it

01:31:06

and it pops you right back in. I just thought that was super cool. So that's my second pick for the week, and with that, I think we're done here. Galen, thanks again. This has been pretty cool. I've enjoyed it. I hope the listeners enjoyed it. Thank you for listening. We appreciate your support, and yeah, hope to see you all next week.

Transcript source: Provided by creator in RSS feed
