Argo CD vs the world | Argo Unpacked Ep. #24 | Argo Unpacked podcast

⁠¶ Intro / Opening

00:00

Why should you never eat a Helm sandwich? What happened at KubeCon Amsterdam? And Argo CD versus the world. Today we tackle community questions, all of that coming up on Argo Unpacked. Welcome to the show, everybody. Hey, get your comments in, follow along. Today I'm kind of in party mode, but I'll take off the sunglasses because I feel like it's a little much. I've still got my octopus hat on.

⁠¶ Understanding the Helm Sandwich Anti-Pattern

00:27

We're rocking and rolling. So the first thing that we've got going here is the Helm sandwich. You should never eat a Helm sandwich. Now, what is a Helm sandwich and what does it have to do with Argo? That's all part of Argo's tip of the week. Sorry, I just didn't get along with that intro there. So, the Helm sandwich, let's talk about it.

00:59

This is a tip. This is one of the anti-patterns that we've seen from a bunch of people in the community. You should never eat a Helm sandwich, because a Helm sandwich makes life difficult for all of the folks using Argo. So what is a Helm sandwich? Well, normally in Argo CD, you have your application manifests. This is where you have maybe a Helm chart and values file. Maybe you've got Kustomize, maybe you've got plain manifests, whatever.

01:26

Then you've got your application definition, which is also a manifest, but I'm going to say definition. Your application definition is inside a Kubernetes manifest whose type is a custom resource: the Argo CD Application. Okay. Now, the Helm sandwich is when you use Helm not just for your manifest layer. That's normal. Everybody does that. It's when you also use Helm to generate your application definition.

01:58

Now doing that becomes the Helm sandwich because you've got Helm at the bottom layer, in your application manifests. Then you've got the definition of your application. And then above that, you have a Helm chart that's creating that definition. Therefore, Helm sandwich. But don't eat it. You don't want to have a Helm sandwich because it makes life difficult.

02:19

Yeah. The biggest issue with doing this is, first of all, from a user perspective: if I want to change a value in my application, where do I go? I should always know that the answer is I just go to Git, I look at where the Helm chart and the values file are, and I make my changes there. My Kustomization, whatever it is, I always make my changes in the same place, where the source of the application is in Git.

02:48

But if you use a Helm sandwich... Almost everyone that follows this pattern is using that top layer of the Helm sandwich to generate parameter overrides into their application definition. And now, when I want to know where I should make a change, I've got the manifests in the repo. Then I've got the application definition that's taking these parameter overrides in.

03:11

Well, how do I change those? Oh, I have to change the Helm chart above that, so I have to go up a layer. So now, instead of having a single layer of definition where you just look in Git, in the same source of truth, you have three different places that you would need to look to figure out where that value is, how it's defined, and how it's passed. That creates a lot of complications.
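Editor's sketch: to make the "where does this value come from?" problem concrete, here is a minimal Python example that inspects an Application definition (loaded as a plain dict) and lists any Helm parameter overrides it carries. `spec.source.helm.parameters` is the Application field that holds such overrides; the function name `find_parameter_overrides`, the repo URL, and the sample application are hypothetical and exist only for illustration.

```python
from typing import Any


def find_parameter_overrides(app: dict[str, Any]) -> list[str]:
    """Return the names of Helm parameters overridden in an Application dict."""
    source = app.get("spec", {}).get("source", {})
    params = source.get("helm", {}).get("parameters", []) or []
    return [p["name"] for p in params if "name" in p]


# Hypothetical Application, as it might look after a top-layer Helm chart
# rendered it (the "top slice of bread" in the Helm sandwich).
sample_app = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "billing"},
    "spec": {
        "source": {
            "repoURL": "https://example.com/org/manifests.git",
            "path": "charts/billing",
            "targetRevision": "HEAD",
            "helm": {
                "valueFiles": ["values-prod.yaml"],
                # This value now lives outside the chart's values file.
                "parameters": [{"name": "replicaCount", "value": "5"}],
            },
        }
    },
}

overrides = find_parameter_overrides(sample_app)
print(overrides)
```

Any name this reports is a value that a reader would not find by looking only at the chart's values file in Git, which is the extra layer of lookup described above.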

⁠¶ KubeCon Reflections and Guest Insights

03:31

So that's the biggest reason why you should never eat a Helm sandwich. Okay. So there's your Argo tip of the week. We may have some additional commentary on that. Like I said, we're going to talk about what happened at KubeCon and ArgoCon Amsterdam last week. You can tell I'm kind of hung over. I've got a hat on. I've never done that on the show before. But for people that are listening,

03:54

I do have a hat on, and that is a sure sign that things have gone, you know, very tired, and we're just surviving here. So as we go through the show, please make your comments, as always. Appreciate the compliment on the shades, Laurent, for those that are watching live. And of course, if you're listening on the podcast later, don't worry. While we love to have you live on the show, we're going to cover all this stuff in a nice audio-friendly format as well.

04:20

Now, let me bring in my collaborators for today's episode. The first one is a regular. His name is Kostis Kapelonis, and he is the author of the aforementioned Argo CD anti-patterns. He's got a great book. What's the new book called, Kostis? Argo CD the Right Way. Not the wrong way, the right way.

04:42

Argo CD the Right Way, yes. Which I think also includes some mentions of the Helm sandwich. You and I were talking about this last week. And then I'm also going to bring in another famous author, Steve Fenton. Welcome to the show, Steve. You just published a new report, which is the GitOps report, essentially. What's the title of it? So yeah, we have the State of GitOps report, but we also pulled out some special data from that, especially for ArgoCon. So yeah.

05:15

Ah, perfect. So we'll get into those. And later in the show today, we'll also cover community questions. We've tackled some from Reddit. And if you have your own, bring us your hardest Argo problem and we would love to tackle it. We're going to excuse Ruby Tal for now. If she's able to join, we'll add her into the mix. I appreciate my erstwhile co-host, who is often absent but does a lot of work behind the scenes to make the show happen.

05:46

So, first of all, gentlemen, any comments on the Helm sandwich, anything that we should add? So yeah, one thing that I wanted to add to this, and I'm going to sound really intelligent here, but actually I'm basing this on the fact that I read the book that Kostis wrote. And so I'm going to use that now to try and sound really smart. That's how you sound smart: you read the book that Kostis wrote. That's half the battle right there.

06:10

It's the way. And the way I would describe this is that Helm should be the sausage, not the bread, right? That's the key thing to bear in mind for this. So it should be more of an open-faced sandwich, potentially, you know, a po' boy, if you will. I also have a comment: as you said, maybe it doesn't look bad on its own, but it's the start of several anti-patterns. So if you have a Helm sandwich, usually you have the other anti-patterns on the list. So take care.

06:43

Yeah, it's like a gateway anti-pattern. I don't know if I've ever seen the Helm sandwich in use when it wasn't being used to enable another anti-pattern. So you're right, it's not a problem in and of itself. Like, hey, if you just wanted to define applications using a Helm chart, I mean, I guess that's fine. But the only reason to do that is because you want to template values.

07:13

And if you're templating values, you're doing parameter overrides. And you probably shouldn't, because those values are still coming from Git or some other source. So why are you doing that? Parameter overrides are maybe a little controversial as a topic in Argo CD, but they are considered an anti-pattern. Of course.

07:32

If you're using something like an ApplicationSet with a cluster generator, you may have a parameter override that's part of that definition, but that's a different domain of information versus just random environment variables you might want to pass to an application or something.

07:50

Also, some people try to version their applications. So they have the Helm chart on top in order to put a version on the ApplicationSet. But the ApplicationSet is not something you actually deploy. It just generates applications for you. So if you're trying to version it... Something happens there. Yeah, it's always a red flag. Anytime I see an application that is not pointing at HEAD, it's like, ooh.

08:13

Why did we do that? What's going on here? Because you get into this weird scenario where you're like, hey, what's deployed? And, oh, well, I do GitOps, so I just look at Git and I know, based on my repository, exactly what's deployed. Except that your applications are pointing at various random revisions. So you actually have no clue what's deployed unless you look at the application manifest or you look at Argo CD itself.
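Editor's sketch: the "not pointing at HEAD" red flag can be checked mechanically. The minimal Python example below reads `spec.source.targetRevision` from Application definitions (plain dicts) and reports which ones track something other than HEAD. Treating an empty value as HEAD is this sketch's assumption, the helper name `pinned_revision` and the sample apps are hypothetical, and a branch name such as `main` would also be reported here, which is a simplification.

```python
from typing import Optional

# Assumption for this sketch: an empty targetRevision behaves like HEAD.
FLOATING = {"", "HEAD"}


def pinned_revision(app: dict) -> Optional[str]:
    """Return targetRevision if the app tracks something other than HEAD."""
    rev = app.get("spec", {}).get("source", {}).get("targetRevision", "")
    return None if rev in FLOATING else rev


# Hypothetical applications: one tracks HEAD, one is pinned to a commit.
apps = [
    {"metadata": {"name": "billing"},
     "spec": {"source": {"targetRevision": "HEAD"}}},
    {"metadata": {"name": "search"},
     "spec": {"source": {"targetRevision": "3f2a9c1"}}},
]

report = {a["metadata"]["name"]: pinned_revision(a) for a in apps}
print(report)
```

Every non-None entry in `report` is an application whose deployed state cannot be read off the tip of the repository alone.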

08:39

And the whole point of GitOps in my mind, I mean, not the whole point, but one of the major advantages of it, is that all of the information aligns all the way up and down. So if I look in Git, if I look in Argo, if I look anywhere, it's all going to be the same information. I'm never going to have surprises in that stack. And that simplifies my life from an operations standpoint. Mm.

⁠¶ KubeCon and ArgoCon Event Takeaways

09:00

So I think we've probably covered the Helm sandwich, and I don't see any questions about it from the audience here. Before we jump into Argo versus the world, which, Steve, was essentially the keynote that you gave at ArgoCon, any takeaways from ArgoCon or KubeCon that we should call out for people? We were all there in Amsterdam just last week.

09:30

As you can tell by my party hat and party shirt, I've just come from Amsterdam. But any observations that you guys came out with? And maybe, Steve, we could start with you. Yeah, so I was talking about this book quite a lot. Argo CD the Right Way, just in case you didn't catch it and you need to go and find it. And the funny thing was, I would be speaking to folks around the conference, and

09:55

I would mention one of the anti-patterns from the book. I'd be like, oh, you'll find things in here like... and I'd explain one of the patterns that I enjoyed, like not being able to just run the containers locally without having Argo locally. That was one that resonated with me. And the number of people who went, oh.

10:13

We do that. And it's like, ah, you need the book. You need the book, my friends. So it's funny how common these patterns are, that in just random conversations I was having, people were kind of sheepishly saying, oh yeah, I think we're doing that one. Yeah. We'll try to throw that link in the chat because the book is free. Patrick Close, I talked to him this morning, is another Argo CD maintainer from the Octopus team, and he spent a lot of time at the Argo booth

10:43

at KubeCon, and he told me that he had somebody come over and argue with him for about 45 minutes about why they shouldn't need to separate their application source code from its manifests, that those should be in the same repo. They thought they should always be in the same repo because it's easier for developers. And he says, I argued with them for 45 minutes. And he said then Kostis came over, and then Kostis got dragged into it for 45 minutes with this guy who was convinced

11:10

that the best way to do things was always to keep the manifests next to the application source code, which of course is an anti-pattern. Do you remember that conversation, Kostis? Yeah, I do. It's also another anti-pattern, because they wanted to keep developers happy and have a single repo for everything. And I said, how often does this happen? Like, how often do you need to change the source code and add a brand new variable in your application? I mean, sure, it happens sometimes.

11:38

But it's not happening all the time. So trying to optimize something only for an edge-case scenario doesn't make any sense. Just put a link in the repo that points to where the application's deployed. You know, just put a thing that's like "link to environments," and here you go: a link to the app in Argo CD and a link to the app in Git. So if they need to go update the manifests, it's right there. I mean, it shouldn't be

12:01

It also means that their CI system was monitoring commits that happened on manifests, and Argo CD was monitoring commits that happened for source code. And each system doesn't really care about these things. They should only care about their own life cycle. What do you think the odds are that somebody who makes this argument is using manifest-generate-paths on their applications? Because I feel like it's low.

12:25

I think most people do not use this path annotation, like they just found out about it. So it's the worst of both worlds. You have a monorepo with source code there and not the annotation. So for those not following along, what Kostis is talking about here is that every time your Git repo is updated, by default

12:43

the Argo CD application will have to refresh, and that takes some processing. It checks out the repo from Git, it looks at the live manifests, it generates the manifests, and it runs a comparison, often to find that there's no change. So there's this cool setting on Argo CD applications. Bonus tip: it's the manifest-generate-paths annotation. What it does is, when a Git repo is updated, it first checks to see which paths have been updated and whether anything was changed inside of that specific path.

13:14

Only then does the application go through the reconciliation route. One of the first things you do when you start optimizing the performance of Argo CD is set this manifest-generate-paths annotation so you don't get these noisy refreshes that don't have any purpose. I thought that KubeCon Amsterdam was quite well attended, a lot of good energy, tons of people doing Kubernetes, Argo as always.
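Editor's sketch: the manifest-generate-paths behavior described a moment ago comes down to a path filter. The Python below is a simplified model of that idea, not Argo CD's actual implementation. It assumes the annotation `argocd.argoproj.io/manifest-generate-paths` holds semicolon-separated paths, where a relative path is resolved against the application's source path and a path starting with `/` is relative to the repository root. The function names and the sample paths are hypothetical.

```python
import posixpath

ANNOTATION = "argocd.argoproj.io/manifest-generate-paths"


def resolve_paths(annotation_value: str, app_path: str) -> list[str]:
    """Turn the annotation value into repo-root-relative directory paths."""
    resolved = []
    for raw in annotation_value.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        if raw.startswith("/"):
            path = raw.lstrip("/")                 # relative to the repo root
        else:
            path = posixpath.join(app_path, raw)   # relative to the app path
        resolved.append(posixpath.normpath(path))
    return resolved


def needs_refresh(changed_files: list[str], watched: list[str]) -> bool:
    """True if any changed file sits inside one of the watched directories."""
    for changed in changed_files:
        changed = posixpath.normpath(changed)
        for directory in watched:
            if directory == "." or changed == directory \
                    or changed.startswith(directory + "/"):
                return True
    return False


# Hypothetical application living at apps/billing that also depends on a
# shared base directory at the repository root.
watched = resolve_paths(".;/shared/base", app_path="apps/billing")

source_only_commit = needs_refresh(["src/main.go"], watched)
manifest_commit = needs_refresh(["apps/billing/values.yaml"], watched)
print(watched, source_only_commit, manifest_commit)
```

In this model a commit that only touches source code never triggers manifest generation for the billing app, which is the noisy refresh the annotation is meant to avoid in a monorepo.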

13:40

One of the things that surprises me every time: when I did the opening keynote for ArgoCon, I asked, how many people are here for their first ArgoCon? And at every ArgoCon, the answer is about the same. About 80% of the people are there for their first ArgoCon. So you kind of would think that, oh, I'm going to go to ArgoCon and every year we're going to have a way more advanced conversation. And certainly there are topics that are new. You know,

14:07

application promotion has been a very hot topic for the last couple of ArgoCons. This time there was a lot more discussion around AI and how people are integrating Argo into that stack. So it's definitely worth going to multiple of them, but there are always a lot of conversations like these, where we're telling people: hey, how are you deploying your applications? Oh, don't do that. And

14:29

But there is a difference, because many, many years ago you got questions like, what is Argo? Tell me about Argo. And hopefully now we do not get these questions anymore. We do still get those questions at the booth, because there are analysts or whatever coming around that don't, you know, know that.

14:46

But if you know, you know. I thought it was quite good. The next KubeCon, depending on how we count them, I guess: we've got KubeCon India coming up shortly. We've got KubeCon Japan coming up at the end of July, which will be the first ArgoCon outside of North America or Europe. So it's going to be the first Asian ArgoCon. Quite excited about that. And then of course we have KubeCon coming up in Salt Lake City.

15:18

So just a reminder for everybody listening: if you haven't submitted a talk or you have an idea for a talk, submit that. And if you want to brainstorm ideas, feel free to hit me up on Twitter at @todaywasawesome. Always happy to talk with people about their ideas for talks. And I think Kostis... Steve, were you also on the program committee for ArgoCon this year? No, I wasn't.

15:38

You're the only... okay. Kostis and I were both on it. But Steve, you did give the keynote for Octopus Deploy, which was very cool. So yeah, if you're not making it to these conferences, they're a lot of fun. And there was also a great session that none of us can fully represent, but we had a really cool session that was the contributors

16:00

summit, what was it? Contributors summit, ContribFest for Argo, where we had six different Argo CD maintainers. Patrick Loss was representing the group because Reggie, who is by the way the number one Argo CD maintainer, and she works for Octopus... good job, Reggie. She wasn't able to travel due to ongoing regional conflict and the limitation of flights, but Patrick Lose represented. And what happened in this session

16:28

was essentially... I want to say it was almost two hours, but it was basically everybody just talking about their implementations, their architectures, and it was this environment where you could talk freely. Because normally: oh, I can't get approved. I work for Big Bank McGee or a defense company or whatever. I can't go give a talk about what we're doing because I can't get approvals. But

16:54

if you're having a hallway-track discussion, yeah, maybe you can share a little bit more about how what you're doing works, and share ideas and tips. So that was like a fully concentrated hallway track. I don't know if anybody from the audience made it there, but I heard quite good feedback from people who really, really liked it.

17:16

Any other... oh, and I see that we have posted in the chat the link to the Argo CD the Right Way book, but I think it didn't make it into LinkedIn, because LinkedIn chat operates in its own biosphere or something. Anyway.

⁠¶ AI Security and Container Breakouts

17:35

Okay, cool. Any other tips or takeaways from ArgoCon y'all want to point out? It could just be about AI. AI, AI, AI. That ability to actually speak to people in person and get groups of people having a conversation. Even when we're not at a conference, we can kind of join community calls and stuff like that. But being able to explain something with the entirety of your body, with actions and movements and stuff,

18:01

the bandwidth of communication just goes up so much. It wasn't just me doing it. I could see other people kind of dancing the communication in all of the conversations they were having. And I love that at these conferences. I had a very fun conversation with a guy named Dennis. First of all, he came to the Argo booth

18:27

because he wanted to talk about his Argo monorepo setup. And I think we're going to have him on the show in the future, because he has an interesting use case where they have a big

18:37

He's supporting dozens of different customers, and they need to have access to their manifests. They don't write to those manifests, they just need to see them. So what he's done is he's created a single monorepo, and then he's created additional child repos, and he has an automated process that just dumps out the contents from the monorepo into those child repos

19:00

that are then shared with those different individual customers. So that's the way he creates his separation. But what we spent most of the time talking about is how we're using AI. And there is this race, just trying to get to the point where you can have agentic AI running regularly, and it's chatting with you, and you can do it in a secure way. And this was another conversation I had, with Joseph Sandoval, which was kind of an eye-opener for me.

19:30

I am running an AI orchestrator agent system called Symposium on my Kubernetes cluster, and I think we're going to have the creator of that on the show later too. And I thought, hey, well, I'm going to have all these agents running around, but they're going to be on my cluster, they're in containers, so they're all localized, it's, you know, secure, it's sandboxed. And Joseph Sandoval was like: you're out of your mind.

19:56

AI is doing container breakout all the time, because the security model for containers in Kubernetes, none of it was designed with the idea that you would have a brain sitting inside of a container trying to pick the lock on its way out. And so there's a huge effort going on there. Kubernetes sandbox agent is now out as a kind of early platform that's meant to be able to create better isolation. Clear case containers is kind of

20:25

maybe going to become a solution there. So there's a whole kind of frontier of containerized, isolated agents and how that will work. And even in AI testing, I mean, Anthropic had a story just a week, a week and a half ago, where they shared that during their training, one of their AI agents actually broke out of its

20:49

sandbox. I don't know if it was a container or a VM or what, but it then went and found the GPUs that were running the AI, and it took them over to start mining Bitcoin to fund some plan that it had. And that's a pretty wacky story, but you have to think about this kind of breakout. Skynet! Skynet! Yeah, Skynet, yeah.

21:13

I'm imagining now this AI basically thinks it's been given a really cool escape room. And so it just wants to break out, it's trying to solve the puzzles and break out, and then for some reason mine Bitcoin. And then, I don't know, maybe it will start paying us wages if it can earn enough money through Bitcoin. Oh, I think for sure there was a step two where it wanted to spend the Bitcoin

21:35

to start accomplishing other objectives. And I think it had been given some prompt like, go make money or something, or it was optimized for business. And it was like, oh, I'd better go start a business. I'm going to need to pay some fees and hire some people and get some servers and, you know, I don't have any way to get US money. Let's get Bitcoin, I can use that.

21:56

So yeah, I mean, it is quite good at solving problems and being a hacker. That was one other piece of news too, and feel free, gents, I'm talking a lot here: Claude has delayed the release of their new model, their newest model, because it is too good at hacking. They said they consider it a security risk to put the model out, because it is too good at breaking into systems and they need to put some controls on it to prevent it from... Yeah.

22:33

in Fort Knox or whatever, that's quite a problem. That's quite a scary thing. People are talking about harness engineering and stuff at the moment, but it's a little bit like we can't keep hold of the reins of this stuff. So we definitely need to get better leather for the straps around this stuff. Well, and the only way to fix that from a security perspective is you kind of need that.

22:59

You're like, I need the better hacker bot to go and help me secure my systems. But if you give me the hacker bot before I've secured my systems, and everybody else too, well, then everybody else isn't going to be running secure systems. So there's a chicken-and-egg problem, and yeah, it's moving quite quickly. And if they wait too long, it's not like another model provider won't come out with a better hacker bot too, right? So there's a race here.

⁠¶ ArgoCon Talks and GitOps Report Overview

23:27

Shortly. Any other call-outs or experiences from KubeCon y'all want to mention before we jump into Argo CD versus the... They should watch the presentations. You should watch the presentations, they're on YouTube right now for ArgoCon. I had a presentation as well, about GitOps secrets. My friend Anastasia had a presentation about using Argo Rollouts with database migrations. So go and look at the playlist, see what topics are interesting, and watch the presentations.

23:52

Yeah, the playlist is up as of... six days ago? That can't be right. Yeah, no, it was up the same day. I'll post the link to the playlist into the chat here. Good call, Kostis. I'm still catching up on all of the videos. I started queuing them all up, and I think I've got through about four hours' worth of videos. And I think I've got about sixteen months left to go. There's so much content.

24:26

Kostis gave a great talk on GitOps and secrets and the state of the union. It was quite good. There was a really cool talk about BYD using Argo Workflows to do all kinds of machine learning stuff, which sounded really interesting. There was one all about doing FinOps with all the Argo tooling. Preview environments: Dag Anderson was talking about diffing and preview environments on pull requests.

25:04

Which was a great talk and kind of a continuation of something that he's been building on for a long time. There was one other talk, about eliminating the ten-thousand-dollar Argo CD mistake, that was all about eliminating phantom syncs, which kind of alludes to some of the stuff that we were talking about earlier. So yeah, lots of

25:24

Lots of good stuff in there. So go grab the link that we threw out there just now; it's all on YouTube. Mario is asking, will there be discounts on certifications at Argo CD this year? I think he maybe means ArgoCon. We don't like to throw them out, but if you come over to the booth and you're like, hey, I'd love to get a discount on the Argo certification, the GitOps certification from Octopus,

25:55

we were giving out some discount codes there. So definitely a good opportunity to snag that. I don't know if we're going to do any on the show, but, I don't know, Mario, keep asking. All right. So let's dig into today's topic. Steve, Argo CD versus the world. Why has it got to be versus? Why can't it be a with? Ha ha ha. So obviously, I mean, we want Argo CD to take over the world to some extent, I suppose,

26:25

but only to make things better for everyone. I had a challenge, because the lightning keynote that I did had to be five minutes long. And I thought, what can I get into in five minutes, with an audience filled with folks from Argo? And so I went and grabbed the data that we got for the State of GitOps report. There were, I think, six hundred and sixty responses to the survey. And one of the things we asked was, what tools are you using?

26:57

So I thought, what happens if I pull out Argo CD and compare it to all of the other responses? So just pull out Argo and then compare it to everything else. I thought that'd be the fairest way of doing it, rather than picking on any one particular thing that was out there. And yeah, so that was the idea behind it, just to compare Argo CD across a whole bunch of different outcome metrics that I collect, versus everything else.

27:24

It's a very cool survey, and I've been able to review a lot of that data, Steve. And obviously we've talked about it before, but why don't you start with one of your most interesting findings? So from the State of GitOps report? Yeah, or from your keynote. Yeah, yeah.

27:41

So I think from the State of GitOps report overall, what really struck me was this. We were testing the OpenGitOps principles. And quite often, when you have a bunch of things that you should do and then a bunch of things that you hope to get, there's a kind of tipping point where you get halfway through and you start realizing the benefits. And then the other things are like the little marginal gains on it.

28:07

But with GitOps, it's really stark. The people that are doing all of those OpenGitOps principles get way more. The tipping point is when you're doing the whole of GitOps. And so that was a big thing for me, because I often preach about capability versus maturity models. With capability models, you keep adding more of the practices and you keep getting those gains. But with GitOps, you just have to

28:31

The folks that did OpenGitOps obviously knew what they were talking about. You have to do all of those things, otherwise you don't get the cool benefits. So that was the big takeaway for me.

⁠¶ GitOps Score and Deployment Frequency

28:43

You're bringing a tear to my eye, saying that when we made the GitOps principles, we knew what we were talking about. It's a high compliment. Thank you. Ha ha. Yeah, well, we can kind of statistically prove that you did. That's one of the crucial findings. And so we created what we call the GitOps score, which is just a number between zero and a hundred that says

29:08

how much you are following the OpenGitOps principles. And that was one of the things that I presented: if you're using Argo CD, the median GitOps score for everyone using Argo CD was like seventy-one, versus everyone else who was... So it's like an eleven-point bonus from using Argo CD in terms of that GitOps score, and that is how well you are meeting all of those OpenGitOps principles. So that's pretty cool.

29:42

But that eleven points has a pretty big impact on how effective you are with your DORA metrics. Yeah. So I'm a really big fan of the DORA metrics, because instead of it just being, how fast can we go, it's like, how quickly can we turn corners, and do the brakes work, which are also important things if you're trying to move at speed.

30:04

So we kind of broke those down and looked at Argo CD on these as well. And we can tackle them one by one. There's only four of these, so it's not too many. It's probably worth introducing the metrics themselves as we go. So I'm going to zip through here. That's the GitOps score chart, which just shows that median difference. I use violin charts for this because

30:33

you can kind of get a feel for where all of the data is. So it's quite a rich chart. So for those that are just listening, what this is showing is essentially the GitOps score, so how closely you're following GitOps principles according to this rubric that Steve and the rest of the team at Octopus Deploy put together. If you're using Argo CD, there's a significant bump where you are much more adherent to GitOps than if you're not using Argo CD. Yeah, exactly. And the data kind of...

31:01

You always have these things called distributions in data. They can be normal, which is a bell curve, or they can lean one way or another. And this one leans towards the hundred percent mark on Argo. So there is more data above that median number, which is also a good sign with your data. Yeah, and almost nobody that's not using Argo CD

31:28

had 100% on their GitOps score. It's really only the Argo CD users. There's this really fat kind of lead-off from the median of people getting more towards GitOps, whereas in the non-Argo CD group, it gets real skinny real quick. Yeah, it tails off, which shows that there's some limiting factor in the other tools that's stopping them from getting all the way to a hundred.

31:53

And that's interesting stuff. It's like a strong signal that some tools must be missing one of the crucial pieces, you know. I suppose, from my experience, a lot of people are using a tool that maybe can do infrastructure automation, but it's not got that reconciliation loop, which is so crucial to GitOps. So if anyone goes in and clicks their way around and changes something, it's not being put back.

32:22

So you're going to get drift over time. And, you know, you might be able to go in and reapply something, but if you're using Terraform and someone changed something that isn't in your Terraform script, it doesn't get put back. It's only the stuff that you've defined that gets put back. So you'll find gaps that end up causing drift. So yeah, that's an interesting problem. So let's have a quick look at deployment frequency next.
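The reconciliation idea described here can be sketched in a few lines. This is a toy model, not how Argo CD or Terraform are implemented: the dictionaries stand in for desired state in Git and live state in a cluster, and `find_drift` and `reconcile` are hypothetical names.

```python
def find_drift(desired, live):
    """Return the changes needed to make `live` match `desired`."""
    drift = {}
    for key, want in desired.items():
        if live.get(key) != want:
            drift[key] = ("set", want)
    for key in live:
        if key not in desired:
            drift[key] = ("delete", None)  # something nobody declared
    return drift

def reconcile(desired, live, prune=True):
    """Apply the drift to a copy of `live` and return the result."""
    result = dict(live)
    for key, (action, value) in find_drift(desired, live).items():
        if action == "set":
            result[key] = value
        elif prune:
            del result[key]
    return result

desired = {"replicas": 3, "image": "app:1.2.0"}
# Someone clicked around: scaled up and added an undeclared sidecar.
live = {"replicas": 5, "image": "app:1.2.0", "debug-sidecar": "on"}

healed = reconcile(desired, live)                      # everything put back
only_declared = reconcile(desired, live, prune=False)  # undeclared change survives
```

With `prune=False`, only the declared keys are restored and the undeclared `debug-sidecar` stays, which is the kind of gap that accumulates into drift.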

32:51

Just to define this, this is probably the easiest one. It's how often you deploy to production. And that, in terms of Argo, is another win for Argo. The data's a bit more lumpy around this one, but the median for Argo is fifty, and the median for everyone else is thirty-seven, which is quite a lot lower.

33:17

So yeah, again, Argo in this case is very much the winner in terms of deployment frequency. So if you're using Argo CD, the probability is you are deploying more often than someone who's using something else. It's quite a big differential too. I mean, it's the median being thirty-seven versus fifty.

33:41

That's a pretty big jump. And then again, if we're looking at the shape of this, there is a very fat head towards people that deploy very frequently being much more associated with people that are adopting more aggressively the GitOps principle.

33:57

Yeah, exactly. So it's another case where Argo is doing something which is helping you deploy often. And for me, this is such an important measure, because if you're deploying a lot, it means that you're keeping your batch sizes small in your application team. And if you keep your batches small, that means you're accumulating less risk and you're having fewer integration problems. So it's a good place to be, to be able to deploy frequently.

34:25

So declaring Argo the winner on that one.

⁠¶ Lead Time for Changes and Change Failure Rate

34:29

This next one often confuses people, because if you're from the agile or lean community, you already know what a lead time is. It runs from the customer deciding they need something valuable all the way through to that thing being put in their hand and them getting that value. But this metric from DORA is called lead time for changes. And the way to think about it is you're the developer, you're the customer, and the value you want is feedback on the change you've made.

34:56

So this is measured from when you commit the change to your application to when it gets put in front of a user and they can use it, so when you deploy that into production. And short lead times are typically a good thing, especially if you need to fix a critical bug. You really want to have short lead times at that point. And in this case,

35:19

We have an interesting, wobbly chart on this one, but it's quite dramatic, because Argo CD's median is seventy-five compared to the rest of the world with a median of fifty. And again, it's got that trend of there being more data on the right-hand side of the chart, which is where the good stuff is. Yeah.
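As a minimal sketch of the definition given above (commit time to production deploy time), lead time for changes could be computed like this. The timestamps are invented for the example, and the result is in hours, not the report's 0 to 100 score, whose scoring method is not described here.

```python
from datetime import datetime
from statistics import median

def lead_time_hours(committed_at, deployed_at):
    """Hours between committing a change and it reaching production."""
    return (deployed_at - committed_at).total_seconds() / 3600

# Hypothetical (commit, deploy) pairs for three changes.
changes = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 11, 0)),
    (datetime(2025, 1, 6, 14, 0), datetime(2025, 1, 7, 14, 0)),
    (datetime(2025, 1, 8, 10, 0), datetime(2025, 1, 8, 16, 0)),
]

lead_times = [lead_time_hours(c, d) for c, d in changes]
median_lead_time = median(lead_times)
print(lead_times, median_lead_time)
```

Using the median rather than the mean keeps one slow change from dominating the picture.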

35:41

Yeah, and I think here, right, with lead time for changes smaller is better, obviously. But you've placed smaller to the right of the graph because it's the better result. And again there's this huge fat head for Argo CD. This is a fifty percent difference, going from a median of fifty to seventy-five. It's a huge increase. Or rather, the fifty percent increase in performance is a decrease in lead time, yeah.
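The "fifty percent" figure is the relative change between the two medians. A small worked check, using only the median scores quoted in the episode (`relative_change` is a hypothetical helper name):

```python
def relative_change(baseline, value):
    """Change from `baseline` to `value`, as a fraction of the baseline."""
    return (value - baseline) / baseline

lead_time_gain = relative_change(50, 75)    # rest of the world vs Argo CD
deploy_freq_gain = relative_change(37, 50)  # deployment frequency medians

print(f"{lead_time_gain:.0%}")
print(f"{deploy_freq_gain:.0%}")
```

So the lead time score is 50% higher, and the deployment frequency score is roughly 35% higher, for Argo CD users relative to everyone else.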

36:10

Yeah, exactly. I've arranged all of the charts so that you basically get more points for doing the right thing. You could do it so that the charts represent the concept, but then it gets confusing when you're asking, which side is the side I want to be on? So on these ones, you always want to move to the right on these scores. So yeah, this is another case where Argo CD is very much a winner. Which leads us to change failure rate.

36:39

And change failure rate is, I suppose, for a lot of teams that I've been in in the past, what was really eating them, because they would deploy the latest version of something, and either the deployment process itself or the version of the application wasn't good. And so it causes this big blow-up where you've just gone into production and then there's people

37:06

getting stressed out and you have to recover super quickly. Um, and so change failure rate can kind of cause this loss of trust. People think that you don't know what you're doing. And then they're like, We need a clipboard, we need someone to sign things, we need to slow this down, and lots of bad things happen as a result of it.

37:23

So this is one that we want to score really high on, because it's not just the application knock-on effect. It's also the reputational damage to your team and your organization. Now on this one, we have a very interesting chart, because almost all of the data in both cases is over on the right-hand side. It's super high scores for everyone. And in fact,

37:48

The median score for both Argo C D and the rest of the world is ninety. And I found this quite heartwarming because that means that everyone has um done really well solving the change failure rate problem. Um, I've worked in organizations where it would have been nowhere near ninety. Um and you know, there are some organizations on here that have scored zero. That means that every time they do a deployment, something goes wrong. That's a really bad place to be.

38:15

But most people are in that zone where it's very rare for there to be a problem when they go live. There's this weird little fat lower-performer bulge for people who use Argo CD, where their change failure rate is a little bit higher at the low end of the GitOps adoption score, so people that are around 50 points. Yeah. That people are more likely to experience failures using Argo CD is quite counterintuitive.

38:45

I think we are missing the frequency, Dan, because they adopted Argo CD, so they are deploying more often, and maybe they have more failures. If I deploy... So it takes frequency into account. Yeah, what you'll probably find is they might not have all of the software disciplines that are needed to be able to increase their deployment frequency without introducing a bug. So whenever you speed up, you realize that you need maybe more automated tests,

39:18

like other things as part of your process to make sure that the application version itself is good. Cause what could be happening here is that the deployments don't fail. But the application gets released with a bug in it, you know, or a new feature goes out and the new feature's got like a critical issue. Maybe they didn't performance test it and it kind of when real users start hitting it, it turns out to be a real performance bottleneck.

39:41

So you can have these effects that are caused by the application. So yeah, it's worth bearing in mind this could be the deployment process or it could be a problem with the application. Maybe they haven't got all of the technical practices that are needed to develop and deploy applications a little bit faster.

39:59

I suspect there's also a little bit of change in definition here where when people are looking at change failure, right? If you're using Argo CD, you might think of sync failure as a change failure. And sync failure does not mean downtime. It just means, hey, there was some issue trying to apply these manifests. Uh, maybe they weren't uh linted correctly, or maybe you have an incorrect value, or you're trying to change a

40:26

an immutable field in a Kubernetes manifest or something like that. And that would show up as a sync failure that you then would have to deal with. Whereas if you're not using Argo CD, you wouldn't necessarily see those failures. You just apply the manifest and, you know, fail fast, potentially on a CI pipeline. And it's like, hey, the application still works, the change still works, it's just there's an issue with how they're being applied. So I suspect that some of what's going on here is just

40:52

people actually seeing more of the issues that they have with their manifests that they otherwise would have been blind to. And again, from a resiliency standpoint. They're not experiencing downtime because of these failures. They're just seeing that there's some kind of conflict in the way they're trying to apply their.

41:12

Yeah, and you could be doing something cool with rollbacks where you've automatically detected a problem and, just for safety, gone back to the previous version so you can investigate. All of these things can contribute to that change failure rate. But

41:26

The scores are super high, even in these cases. So, in general, people are doing very well with this. And I'd say we're probably at the stage where if your change failure rate is bad, say it's more than five percent, you definitely need to look at your technical practices.
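Change failure rate itself is just failed changes divided by total changes. The sketch below assumes you record each deployment as failed or not; the function names and the sample history are hypothetical, and the five percent threshold is the rule of thumb mentioned above.

```python
def change_failure_rate(deployments):
    """deployments: list of booleans, True meaning the change failed."""
    if not deployments:
        raise ValueError("no deployments recorded")
    return sum(deployments) / len(deployments)

def needs_attention(rate, threshold=0.05):
    """Flag a rate above the rule-of-thumb threshold (five percent here)."""
    return rate > threshold

# Hypothetical history: 3 failed changes out of 40 deployments.
history = [False] * 37 + [True] * 3
rate = change_failure_rate(history)
print(f"{rate:.1%}", needs_attention(rate))
```

What counts as a "failed" change is a definition each team has to pick, which matters for the sync failure discussion earlier in this section.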

41:43

And DORA has a whole model around this, with suggestions of things that you can do to improve that non-failure rate, the success rate. So it's worth going and digging into those. So this chart actually gives an inconclusive result, but that's because everyone is doing super well on this particular metric. So that's actually nothing to be sad about, even though it would be nice to sit here and go, it's another win for Argo. Yeah.

42:15

For everyone. We all win on this one. It's great. Yeah.

⁠¶ Time to Restore and Overall GitOps Impact

42:21

The last one to look at is time to restore. So this is what happens after you've had a failed change: how long does it take you to get back into a good state? And this one's super important, because we used to think that not having any problems was the key to making our systems' users happy. But it turns out that they would much rather have a slightly more frequent but very low-impact failure, like something that's fixed in a couple of minutes.

42:53

You know, if you can effectively get a coffee, hit refresh, and things are working again, you're generally quite satisfied with your experience. Right. Obviously, you don't want that happening multiple times a day, but if there is an outage and it gets fixed super quick, it's less painful for you. And on this one, what we're seeing is Argo CD performs very well. Again, it's got a median score of seventy-five compared to the rest of the world, which is fifty.

43:23

And you can also see that the rest of the world has a bit of a lumpy left, which means that there are quite a lot of low performers. And when you look at that low-performance zone for Argo CD, it's very small. So the shape of the data is almost flipped for these two data sets, which is super interesting. This is huge. I mean, again, a 50% improvement in time to restore when using Argo CD versus without.

43:50

And we do see there's a high-performance sweet spot at around 75 points for people using Argo CD, whereas the other chart just has way more lower performers. There's a big fat bottom. And talking about a chart here, that is quite funny to look at. That doesn't surprise me at all. I mean, if you're experiencing a failure and you don't have

44:17

Idempotency and versioning and all this stuff, it's like, oh, what went wrong? I don't know. How do we fix it? Call Dave and hope that he's available, because he's the one guy that knows how to fix it. Whereas... That's a crucial insight, actually. The pinpointing time can often be longer than the time to actually resolve the problem. What actually went wrong is a hard question to answer.

44:43

Um, and with Argo, everything is there in version control. So you can see precisely what happened and when, who did it, you know exactly who to speak to if you need to speak to someone. So maybe the difference here could be that pinpointing is super quick. Um and then the rest of the time, that's the kind of complex variability because if you've introduced a problem to your application, maybe that's kind of harder to fix, or if it's a performance problem, it might be harder to fix.

45:10

Um, so that part uh can be more volatile and variable. But if you can pinpoint quickly, um, you can reassure people as well. Like I've always found when we've had an incident. If I can actually tell um the the leadership what's happened, that's half the battle won. Cause then they they think, oh yeah, Steve knows what's going on. It's okay. We don't need to keep asking him for updates. We can just let him fix it now. Yeah. Wow, great finding.

45:38

So yeah, that one is another win for Argo. Super cool. And we probably have time for one more before we jump into community questions. So you've already done some really high-impact ones, but I don't know if you have a... That is the full set. So that's all four of the DORA metrics. We're done.

45:56

All right, fantastic. Yeah, and this is obviously a subset of your findings. And so I did post in the chat a link to the full GitOps report. You can download it for free from Octopus Deploy's website. And what was really impactful for me, Steve, in all of these is just that it is such proof

46:17

that, A, if you adhere to the GitOps principles, you will deploy more frequently, you'll experience fewer failures, you will recover more quickly, and you will be more effective as an engineering organization, if you just adopt GitOps. And second, the most valuable tool to help you adopt these metrics is Argo CD. And I actually think that was one of the other findings that we didn't cover today, but there was this finding that, essentially, if you adopt Argo CD, you

46:51

are much more likely to adhere to the GitOps principles. Oh no, that was the first finding that you shared today. Yeah, get your GitOps score high. And the beauty of this is that we find that all of these numbers move together. So increasing your GitOps score increases your score against the DORA metrics. So that's the great thing about this: you can start off your improvement process by looking at those GitOps principles,

47:17

Seeing if you're maybe missing one of the steps in there, or maybe you're actually using one of the anti-patterns that Kostis has written about, and you could fix that and improve things. So yeah, getting the GitOps score up will get you those DORA metrics as well. So it's an easy way to get to that elite performance zone. Yeah. A reminder: if you want to improve your DORA metrics

47:42

and be the 10x, 100X engineering operations organization. You gotta subscribe to this podcast, uh, whether it's on Spotify, on YouTube. We're up over 7,500 subscribers. Our most popular video, by the way, on YouTube. Just past 570,000 views, and it's all about ignore differences. So uh be sure to follow us on YouTube, Spotify if you're following on LinkedIn, whatever, whatever your method of choice is. But the vanity metrics I like to look

48:11

because that's my platform of choice. Um thank you, Steve, for sharing that. And now it's time to get into

⁠¶ Community Q&A: Fast Feature Branch Testing

48:18

Community requests and troubleshooting, where we're going to tackle some of the most popular questions from the community this week. And of course, you have a chance to ask questions as well. So the first question that I flagged for us, I thought this would be a fun one. This was looking for a way to test feature branches fast. This is coming from Reddit, by the way. What they said is:

48:53

Oh, hang on, let me pull it up. My first approach was to initially merge an application to develop, which would have a target revision set to my feature branch. However, I would like to get rid of this initial merge somehow for faster testing. I thought about setting up a local cluster, if possible, for testing. I heard about Telepresence, kind, minikube, et cetera, but I feel like I should do my tests in a dev cluster.

49:21

That is why it is there, and it may be hard to move everything from the cluster into local. I checked the pull request generator with ApplicationSets and it looked very promising at first, but auto-heal in the dev cluster may override the things I want to test, I guess, and deactivating auto-heal seems to be an anti-pattern. So I don't really know how to make this work.

49:40

I have checked Argo CD Image Updater, but not everything I want to test is in the form of images. It doesn't completely help because I have things in ConfigMaps, et cetera. I think if I had a way to make the first step faster, that would be the most straightforward thing without adding extra complexity to the current system. But I am not sure what I should do. So let's draft an answer, panelists. What do we think?

50:06

The whole reason that you adopt Kubernetes is to have, you know, autoscaling and all the fancy stuff, so your preview environment should be something separate, something that doesn't touch dev, QA, or staging. So you should be able to create three of those, five of those, ten of those. It doesn't really matter.

50:40

Exactly for this scenario. So you create something with a PR generator and have a separate preview environment that has nothing to do with QA, staging, or whatever else you call your other environments. So maybe we need to ask for some clarification, or he's still in the old mindset where he has a set number of environments. He is a bit anxious about deactivating auto-heal for the ApplicationSets, because it is an anti-pattern not to use auto-heal in general. But we're talking about, like,

51:11

Operating environments. If this is a dev environment, I don't care what you do. If you have auto-heal turned off because it makes it easier and you want to make changes directly against the cluster as part of your development process, I don't really care if you do that. I care about what happens from an operations standpoint once we move beyond the development stage. That's where the best practices should really apply the most. And the idea that you would...

51:38

Because I do this too, actually. I sometimes will have stuff where I push against a dev cluster, and I will disable auto-sync and make changes in the cluster, because for the thing that I'm working on, it's the fastest way to get that loop. When I'm ready, I then copy those changes out, commit them, turn auto-sync on, and make sure they still work.

51:57

That makes sure I didn't make a mistake in between. By the way, when we're developing Argo CD, we use a mix where we actually build stuff locally, but it connects to a dev cluster that has the running services on it, so that you can test different components against the existing running services and you can override any of the individual ones you need to. And we use, what's that system called? Telepresence. Tilt or Telepresence?

52:26

We use Tilt. Yeah, yeah. We use Tilt to do it. So hey, if it's good enough for the Argo project... Now, I would say the only reason that we don't use a pull request generator setup is because it is a community repo and we don't want to have

52:39

three thousand pull requests potentially running. I guess we could do it with labeling. We could get there. That might be an idea, but it's quite convenient actually to use something like Tilt, in my opinion. What were you going to say, Steve? Yeah, I think this is the key thing: being able to spin up these temporary environments

52:56

rather than fighting over, like, a test environment or a dev environment. Definitely the benefits of that are great, but you do have to consider how many people are working on this and how many of these you are going to generate. If you're putting them all into a cluster, they can all be completely isolated, because you can keep them in separate namespaces, right? Or give them all different unique names. That's really easy to do.

53:23

But if you need to create loads of these, the cluster would need to be big enough to handle that. So I suppose it depends on the scale. Yeah, the pull request generator makes this super easy, and that does seem like the correct answer to this. Or you could even use vCluster and have a completely different cluster for your pull request. Yay.
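A preview setup along these lines might look roughly like the manifest built below. This is a sketch under assumptions, not the Reddit user's or the Argo project's configuration: the organization, repository, `deploy` path, `preview` label, and `argocd` namespace are all hypothetical. It is written as a Python dictionary and printed as JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json

def preview_appset(owner, repo, label="preview"):
    """Build an ApplicationSet that creates one Application per labeled pull request."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": repo + "-previews", "namespace": "argocd"},
        "spec": {
            "goTemplate": True,
            "goTemplateOptions": ["missingkey=error"],
            "generators": [{
                "pullRequest": {
                    # Only pull requests carrying the label get an environment.
                    "github": {"owner": owner, "repo": repo, "labels": [label]},
                    "requeueAfterSeconds": 300,
                }
            }],
            "template": {
                "metadata": {"name": repo + "-pr-{{.number}}"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": "https://github.com/" + owner + "/" + repo + ".git",
                        "targetRevision": "{{.head_sha}}",
                        "path": "deploy",  # hypothetical manifests directory
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        # One namespace per pull request keeps previews isolated.
                        "namespace": "preview-{{.number}}",
                    },
                    "syncPolicy": {
                        "automated": {"prune": True, "selfHeal": True},
                        "syncOptions": ["CreateNamespace=true"],
                    },
                },
            },
        },
    }

manifest = preview_appset("example-org", "example-app")
print(json.dumps(manifest, indent=2))
```

Because each preview lives in its own namespace and tracks the pull request's head commit, self-heal can stay on: the thing being tested is the branch itself, so there is nothing for it to override. The label filter is one way to limit how many environments get created.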

53:45

Hey, I'll have that blog post out very shortly. I am in the final running. I would have had it done if it wasn't for ArgoCon last week. I swear. I swear, Kostis. I'm going to get it to you right away. Kostis is

⁠¶ Community Q&A: Manifest Error Checking

53:56

I didn't say anything. I just talked about vCluster because it's our favorite project, or at least one of our favorite projects. All right. Well, hopefully for Reddit user Ripbo Briff, however that is supposed to be pronounced, that was a helpful answer. Let's do another one. This was nine days ago. How are you checking for errors? This comes from user Mr. P. Bennett.

54:21

How are you checking for errors in your manifests before pushing to main? This might be a skill issue or knowledge issue on my end, but I can't seem to find an Argo CD schema that plays nice with any linter or formatter. I was moving my applications over to ApplicationSets at work yesterday. I guess my current formatter turned missingkey= into missingKey= with a capital K, which of course broke everything.

54:43

This wasn't caught by a linter, which would have stopped me from making the silly mistake. We are now looking to implement some kind of linter runner to check for mistakes before merging. Do you folks have any tips? Personally, my main IDE is LazyVim

54:56

with the default YAML setup. Just a clarification: it's more about how to get decent linting for manifests within LazyVim. The tips on pipelines, though, are still helpful. So I think a lot of people were telling them to use some pipelines and stuff to check this, but what do we think about this one? I was talking to someone the other day about how maybe we should invent typed YAML. Yeah, you and I were talking about that.

55:21

So we can create this lovely, strongly typed YAML and this all gets handled with that, but there's got to be a way of doing something similar to that, right? Without us inventing a whole new... Yeah, I think I was ranting about Helm and, yeah, we talked about typed YAML. Sorry.
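Short of typed YAML, a small custom check can catch this particular mistake. The sketch below assumes the setting in question is the ApplicationSet `goTemplateOptions` field, whose values are Go `text/template` options; `lint_go_template_options` is a hypothetical helper, and it takes an already-parsed manifest (a dictionary) so it stays free of any YAML library.

```python
# The four values Go's text/template accepts for the missingkey option.
VALID_TEMPLATE_OPTIONS = {
    "missingkey=default",
    "missingkey=invalid",
    "missingkey=zero",
    "missingkey=error",
}

def lint_go_template_options(appset):
    """Return a list of problems found in spec.goTemplateOptions."""
    options = appset.get("spec", {}).get("goTemplateOptions", [])
    return [
        f"unknown goTemplateOptions entry: {option!r}"
        for option in options
        if option not in VALID_TEMPLATE_OPTIONS
    ]

good = {"spec": {"goTemplate": True, "goTemplateOptions": ["missingkey=error"]}}
mangled = {"spec": {"goTemplate": True, "goTemplateOptions": ["missingKey=error"]}}

print(lint_go_template_options(good))
print(lint_go_template_options(mangled))
```

A check like this could run from a pre-commit hook or a CI step, after loading each manifest with whatever YAML parser the pipeline already uses.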

55:38

Specifically for ApplicationSets, and I don't know if it's in one of the comments, you can use the CLI and preview the ApplicationSet. So you could actually do it manually, or just have a pre-commit hook that does this behind the scenes, or say to your IDE, whenever I save, run this. Specifically for ApplicationSets, I think it would work great. I think also the top answer is a project from Dag,

56:01

which would also catch this for you before you, let's say, push it. It's called Argo CD Diff Preview, and it's actually a GitHub Action, I think, right? It's more than a GitHub Action, but, let's say, you can use it in a GitHub Action. So the idea is you check it there first. It's a live Argo CD instance, so you have much more visibility on what happens there with your changes. Maybe, you know, the syntax is correct, but you don't generate what you expect to be generating.

56:46

We should actually get this project into Argo Project Labs. Anyway, yeah, I would also say, as part of my linting process, I always do a dry-run apply, and I actually do that locally on my own machine against an Argo CD instance, or ideally against the next target instance, like a staging instance or dev instance or whatever. But you could also do that as part of a CI process where you apply the manifest with the dry run.

57:18

And the great thing about a dry run apply and I usually do server side apply dry run by the way, is that it will then get all the interaction from the controllers that are on the Kubernetes cluster. So you'll get a full, you know, if there's any issues with that conflict, you'll get that information up front.
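A server-side dry-run apply can be wrapped in a small script for local use or CI. This is a minimal sketch that assumes `kubectl` is installed and pointed at a reachable cluster; the function names are hypothetical, while `--server-side`, `--dry-run=server`, and `--context` are standard `kubectl` flags.

```python
import subprocess

def dry_run_command(manifest_path, context=None):
    """Build the kubectl command for a server-side dry-run apply."""
    cmd = ["kubectl", "apply", "--server-side", "--dry-run=server", "-f", manifest_path]
    if context:
        cmd += ["--context", context]  # e.g. a staging cluster's kubeconfig context
    return cmd

def dry_run(manifest_path, context=None):
    """Run the dry run; return (succeeded, stderr). Nothing is persisted in the cluster."""
    result = subprocess.run(
        dry_run_command(manifest_path, context),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()

print(" ".join(dry_run_command("app.yaml", context="staging")))
```

Because the request goes through the API server, admission webhooks and schema validation for installed custom resources take part in the check, which a purely client-side lint cannot see.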

57:35

Now, in the case of the pull request generator, or of pull requests in general, the specific issue he had was missingkey being camel case versus not camel case. And if it's camel case, it throws an error. I don't know if it will generate a blank ApplicationSet or if it will just throw an error. If it throws an error,

57:58

Then of course it's not necessarily an error with the manifest itself. It's more about the ApplicationSet. So in that case, Kostis, I agree, you have to use the CLI to do an appset preview

58:13

to see how it's going to render the applications. And even that isn't going to give you everything, because that will show you the output of how many applications are generated, but it doesn't then apply the generated applications, which would be correct YAML no matter what, because the pull request generator made them.

58:32

It doesn't show you how they would change the applications on the other end. So there's kind of a mash of two different questions here. One is

⁠¶ Episode Conclusion and Next Episode Preview

58:39

linting, and the other question is previewing and diffing. And that previewing and diffing one is actually the better challenge to solve. So that's why I do like this Argo CD Diff Preview that you... Great solution. And you can watch the talk from ArgoCon. So hey, we tackled a couple of questions from the community today, we covered the whole Argo CD versus the world, thanks for sharing all those insights, Steve, and we got a great Argo tip about... Yeah.

59:08

not eating the Helm sandwich, which of course is something that Kostis wrote about originally, so thank you, Kostis, for contributing that. We are at time. As always, subscribe, follow us on X, LinkedIn, Bluesky, whatever your network of choice is. Send us your questions, your ideas. We've got lots of great shows coming up in the future.

59:31

We're going to be tackling some more AI stuff. We're going to be talking about some cool outside-the-box use cases. Let's see, our next episode is going to be in two weeks, and I want to see if it's already live on the preview. I don't see what the topic is that's scheduled yet, but it'll be up soon. So make sure to subscribe to our newsletter. If you haven't done that, go to argounpact.com, subscribe to the newsletter. You get all the latest

59:57

information. Rivitao is furiously chatting at me and she's saying, oh, the next episode is Octopus Deploy and Argo CD. So we're going to be talking about how Octopus Deploy is playing in the ecosystem with Argo CD. And if you ever had that question, what is Octopus's relationship with Argo, where we maintain the project as our project, and what does Octopus Deploy as a software platform do

01:00:25

on top of Argo, how does it interact with Argo? That'll be the episode to watch. So that'll be in two weeks' time. Yeah, Monday after next. So be sure to tune in for that. And thank you again. Kostis, Steve, any last words? Watch the recordings from ArgoCon. Steve? Download the report. My final word is thank you for having me on the episode today. I'm a fan. I have the T-shirt on, my Argo Unpacked T-shirt. So thank you for having me.

01:00:59

Highly coveted Argo Unpacked T-shirt. Very rare to give. I stole it. You're an OG. Well done. Ha ha. Well, thanks everybody for joining. That's it for another episode of Argo Unpacked. And as always, stay synced.

✨ This transcript was generated by Metacast using AI and may contain inaccuracies. Learn more about transcripts.
