How Adobe uses Argo CD | Argo Unpacked Ep. #19 | Argo Unpacked podcast

⁠¶ Welcome to Argo Unpacked

00:01

Welcome to another episode of Argo Unpacked, the first of 2026. Today we have an awesome show for you. We're going to be talking with Mike Tujeron. We've got Kostis on the show, Revital returning as co-host, and of course we're going to be talking about application sets and what Adobe is doing with Argo CD.

00:19

All coming up on Argo Unpacked. So gather around, grab your popcorn, get your friends, get settled, because today's going to be a great episode. Now, if you're joining us on the podcast later: while we always do have some visuals on the show, you can always just listen in and it works pretty much just as well. Don't forget to like and subscribe, and follow us on whatever you use to get your podcasts. And today we're going to kick off with a little tip about applications.

00:50

So for those of you that are just tuning in, my name is Dan Garfield. This is Argo Unpacked. My co-host, Revital, will be joining shortly. And of course, we're going to have some guests. We talk about all things Argo, all things GitOps, and helping you get the best out of your software delivery tools, like Argo and GitOps best practices, all that kind of good stuff. Before we jump into today's tip, I want to celebrate a milestone.

⁠¶ Celebrating Top Argo Maintainers

01:18

Last quarter, these last three months as we're starting this new year, the number one Argo maintainer by activity was Octopus Deploy. Cheer, cheer, cheer. Great work. This is a huge effort from the team. The big winner here is the community, of course, but the biggest contributor this last quarter was Reggie from the Octopus Deploy team. So this is maybe the first quarter that Michael Crenshaw has been dethroned as the number one maintainer

01:55

in a couple of months. Of course, Michael's still very active, and we love Michael and all his contributions. So don't get me wrong, I'm not putting him down in any way. Also, Pat is in the top 10, another Octopus Deploy contributor. And of course, we have some other maintainers, myself included, who are

02:12

not in this list, who are a little bit below this group of people, but it's a huge win from the team. Great work, Reggie, great work, Pat and the other maintainers at Octopus Deploy, for getting number one for the last quarter. We love to see it, and hopefully we'll be seeing more of that going into 2026. So before we get into today's overall topic with the group,

⁠¶ Advanced Application Set Tip

02:36

We're going to do a quick tip on application sets. The tip is this: there are more ways to combine application sets than just using a matrix generator. If you don't know what application sets are, they're a way to programmatically generate Argo CD applications. You can generate them for clusters that are added to Argo CD, for pull requests that exist in GitHub, or for merge requests in GitLab.

03:05

You can generate applications off of folder structures, files, Git repositories, all kinds of stuff. You can even build your own generator and generate applications off of anything you want. Now, application sets are super powerful, and they have this wonderful feature called the matrix generator. The matrix generator allows you to combine multiple generators in an application set so that you can get values from two different things.
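
As a rough illustration of what "combining generators" means here, the sketch below models a matrix generator in plain Python as the Cartesian product of two lists of parameter sets. It is not Argo CD code: the `matrix` helper, the cluster names, and the directory paths are all made up for the example.

```python
from itertools import product


def matrix(first, second):
    """Pair every parameter set from one generator with every parameter set
    from the other, merging each pair into a single dict."""
    return [{**a, **b} for a, b in product(first, second)]


# Hypothetical output of a cluster generator and a Git directory generator.
clusters = [{"name": "stage-1"}, {"name": "prod-1"}]
directories = [{"path": "apps/frontend"}, {"path": "apps/backend"}]

# One generated application per (cluster, directory) combination.
applications = matrix(clusters, directories)
for params in applications:
    print(f"{params['name']}: {params['path']}")
```

Two clusters and two directories yield four parameter sets, which is why both inputs have to exist before the matrix can be evaluated.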

03:32

So for example, if you wanted to use a cluster generator plus a Git repository generator, you can have applications generated for everything within these different Git repos onto each of the clusters that are around. So imagine this scenario, which is the one we're going to do today. What if I have a pull request? I want to generate applications for every pull request, but I want to have a better level of isolation. So I actually want to generate a cluster.

04:05

And we're going to use vCluster in this case. I want to generate a vCluster for each pull request that gets created and then deploy my application to it, along with whatever other dependencies I have, so that I get an entire ephemeral cluster that I can spin up and tear down in seconds. How would I do that using an application set?

04:30

The quickest answer that most people would jump to is: I'm going to use a matrix generator. I'll generate some kind of vCluster, some kind of resource, and then I'll add a cluster generator to it. But here's the thing. If you do that, those clusters have to be created first, so they can't be used as part of that matrix generator. So here's what you can do. Instead of trying to combine all of your application sets

04:56

into a single matrix generator... And people do build a matrix of matrix generators, and it gets very complicated, very fast. If you can avoid it, I definitely encourage you to. Mike, when we have him come on, can tell you all about how he's generated tens of thousands of applications across clusters using, I think he was using, the

05:20

matrix generator. But a way that you can actually do this is to use multiple application sets that provide resources to each other. So here's how I would tackle the problem of creating an ephemeral cluster for each pull request. First, I would create an application set with a generator keyed off of pull requests, and I would put a filter on it for whatever pull requests I actually wanted to generate for. Its job is merely to install vCluster and to add annotations to that cluster

05:53

that will give me information about what pull request is being checked out, all that kind of stuff. Once that cluster is generated, I'm going to have a post-sync step that will automatically add that cluster to Argo CD. It'll generate the cluster secret that Argo CD looks for to add that cluster. Now, if you don't want to do that step and you're paying for vCluster, they have an enterprise feature. This is not sponsored by vCluster

06:20

or by that team in any way. But if you have the enterprise version, it can automatically add new vClusters to Argo CD. Either way, once you've done that, you now have a cluster that's been added to Argo CD. You then add a second application set that uses the cluster generator, and it's going to be looking for clusters that have a certain annotation: the pull request annotation.

06:46

And once it's done that, based on the annotations that you created in the first application set, it will know which pull request needs to be checked out. It will then check out that pull request, and you can have it deploy whatever applications you have associated with it. You could also use a matrix generator at this point to bring in other dependencies or things like that.

07:07

So this is a very cool way to create two application sets that solve a single problem: generating an ephemeral environment. Now, what happens when the pull request is closed? Well, the first thing it does is trigger the deletion of that cluster. And once that cluster is deleted, of course, all the resources that are on it are deleted as well. So it's able to spin itself up and tear itself down in an ephemeral way.
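
The hand-off between the two application sets can be sketched in plain Python. This is a toy model under stated assumptions, not Argo CD's API: the function names, the `preview` label filter, the `pull-request` and `branch` annotation keys, and the sample data are all hypothetical, and the step that registers the cluster with Argo CD is reduced to passing a list along.

```python
def vcluster_appset(pull_requests, required_label="preview"):
    """First application set: one ephemeral cluster per open, labeled pull
    request. Each cluster record carries annotations describing its PR."""
    return [
        {
            "name": f"pr-{pr['number']}",
            "annotations": {
                "pull-request": str(pr["number"]),
                "branch": pr["branch"],
            },
        }
        for pr in pull_requests
        if pr["open"] and required_label in pr["labels"]
    ]


def workload_appset(registered_clusters):
    """Second application set: pick up only clusters that carry the
    pull-request annotation and deploy that PR's branch to them."""
    return [
        {
            "app": f"my-app-{c['name']}",
            "destination": c["name"],
            "revision": c["annotations"]["branch"],
        }
        for c in registered_clusters
        if "pull-request" in c["annotations"]
    ]


pull_requests = [
    {"number": 101, "branch": "feature/login", "labels": ["preview"], "open": True},
    {"number": 102, "branch": "fix/typo", "labels": [], "open": True},
]
long_lived = [{"name": "prod-1", "annotations": {}}]

ephemeral = vcluster_appset(pull_requests)
apps = workload_appset(long_lived + ephemeral)
print(apps)

# Closing the pull request removes its cluster, and the workload goes with it.
pull_requests[0]["open"] = False
apps_after_close = workload_appset(long_lived + vcluster_appset(pull_requests))
print(apps_after_close)
```

The second function never looks at pull requests directly. It only reads what the first one left on the cluster, which is what lets the two application sets stay independent.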

07:37

If you want more information on how to do this, I do have a blog post coming out on the Octopus Deploy blog. But if you don't want to wait for that, you can do the GitOps certification. In the level three certification, we actually have a full article

07:56

explaining the process of combining two application sets in that way. But of course, that's very easy to do. It's just two different application sets that happen to work off of each other's resources. By doing that, you can combine application sets in all kinds of wonderful ways. So I hope that was a helpful Argo tip for you. Now let me introduce my co-host.

08:18

Revital, welcome back to the show. I feel like you've been traveling a lot lately. We haven't been able to have you on as much. That's true. Were you studying up while you were away, deploying stuff to Argo CD and building out infrastructure? You know, I was actually writing an additional blog post for Argo CD 3.2, and then I realized 3.3 is out there. And you normally give a brief status about it. I'm not sure we did.

08:46

No, I don't think we did. I don't think we talked about Argo CD 3.3. Yeah, so I think this is something we should do. But in general, I am getting from the release champions (for 3.3 it was Nitish) a short video explaining all about the new RC, the new features, and the process. So it's on our YouTube channel, Argo Unpacked. If you haven't subscribed yet, you should.

09:15

Now, I think the release champion for 3.2 was Nitish, but for 3.3, I think it's Peter, isn't it? Yes. And I have a work in progress for this. Cool. And Argo CD 3.3, I think the release candidate came out December 15th, so we expect that the final release will happen at the beginning of February. Cool. So people can start playing with it right now. Now, anything else that we need to address before we bring in Mike and Kostis?

09:47

Okay, all right, then let's introduce Mike Tujeron from Adobe. Mike, welcome to the show. We've been trying to have you on for a year, practically, but scheduling's been difficult. We finally got you on. And of course, Kostis, who is basically the godfather of all things Argo. Some people call me the GitOps

10:10

godfather. Mostly just me, I call myself that. But Kostis is really the brains behind the whole operation here, so thanks, Kostis, for joining. I think Mike mentioned that he wanted to talk a little about application sets, and Kostis was like, I mean, if you guys are talking application sets, I'm going to be there. So I'm glad you came. Yeah, happy to be here. Thank you. I appreciate you having me.

⁠¶ Adobe's Argo CD at Scale

10:34

So Mike, for people that don't know: anybody that's been going to ArgoCon knows that Adobe is a big Argo user. You guys have a lot of applications, but maybe give the folks some stats. Give them a little bit of a flex so that they know that you guys are doing something bigger than maybe what they're doing at home. Yeah, so Adobe has around 500 clusters, and these run across

11:03

AWS and Azure and our private data center. They're a combination of AKS, EKS, and some on just Azure VMs, because we haven't finished migrating them over to AKS yet. We run Argo CD across, I want to say, a dozen or two Argo CD instances, and those are spread out across stage, prod, different regions, stuff like that. I want to say it's around 200-and-something system applications. Those are the add-ons, things like Cluster Autoscaler,

11:39

CoreDNS, Cilium, things like that that we run on them. We also use Argo CD to deploy and build our clusters. So we use controllers like ACK, ASO, and Cluster API to build the cloud infrastructure. And this all happens via GitOps as well. So it's a commit of a basic YAML file, Argo CD applies it, and the controllers build up the cloud infrastructure. Argo CD then starts applying the manifests for all those system controllers that then go on top.

12:13

I was asked a little bit earlier how many deployments we do per month. I ran the stats for October. Now, they're a little inflated because we were doing a migration, trying to get a bunch of stuff in just before the end of the year. But we had 1,196 PR merges to the two repositories that trigger some form of deployment to production.

12:38

Now, those are to one or more clusters. So each of those PRs could have affected 200 clusters or could have affected one cluster, but that's kind of the rate of change. More realistic on a monthly basis is around seven or eight hundred deployments a month. And it's growing every month, right? Yeah, yeah. We do expect a large number of new clusters next month, and then things just keep growing. You know, things never slow down.

13:08

Now, what's been really cool: you and I go back a fair bit. We've chatted at basically every stage of your journey, before you guys even really started with Argo, when you did a lot of work with Argo Workflows, moving to Argo CD, monolithic Argo CD for a while, splitting up instances. And most people don't know this, but Mike, I'll also mention that you and I have an affinity for chicken wings

13:38

that we share, and not just with us. But in Las Vegas (I don't think you were at re:Invent, because I was looking for you) we had actually the best wings I've had of the year. Wait, better than KubeCon Detroit? Oh, actually, well, I said of the year. Detroit was the previous year. I don't know if you can top Detroit, but we had really good wings, actually, in Las Vegas. Sorry to derail.

14:01

But where you've evolved to. For the history for everybody, the thing is, every KubeCon, we try to find the best wings in the city that KubeCon is in. So every place in the country, in the world, you know, we try to find what that city has for the best buffalo wings. This is a Joseph Sandoval operation that he helps run with Mike. And one time I mentioned Buffalo Wild Wings, and they banned me from coming for like a year.

14:30

So it took me a while to work my way back onto the list to be invited. But yeah, we had really good wings in Vegas. So anyway.

14:37

In all of your time evolving, you've come to a place where you guys are actually using application sets quite heavily. Kostis and I have had a lot of discussion about this, because anytime we review a customer and they have more than, like, five application sets, I think, on a cluster, we flag it as, ooh, this might be a problem area that we need to go dig into, because there's kind of a lot going on, a lot of applications.

15:03

So where are you guys at right now on application sets, and what kind of generators are you taking advantage of? Let me actually clarify something here real quick. This is for the system team. This is for the infrastructure team. On top of this, we have several Argo CD instances that are running tens of thousands of applications

15:21

for the application teams that are running the Adobe applications, things like Photoshop, things like Firefly, things like AEM, and all of those sorts of things that are what makes Adobe Adobe. That's an entirely different stack of Argo CD that runs tens of thousands of applications as well, on top of what we're doing. So if you think what I just described was big, double it. And that's our Argo footprint. The last count I did, which was

15:55

early this year, was that we had something like, for the infrastructure team, seventy-ish thousand applications after all the application sets had fanned out. And I want to say it's up to about 250 potential application sets that are deployed. The way we do it is actually quite interesting. As I mentioned, we have a dozen or two dozen Argo CD instances, and the 500 clusters that we have are registered to various ones of them.

16:31

We have 250 application sets that are potentially deployed to every cluster. So we actually apply those 250 application sets to every single Argo CD instance that we have. We don't deploy them differently based upon environment or anything like that. It's the same application set for every single environment. Are they pointed literally to the same file in Git? Mm-hmm. The exact same file, the exact same everything. And that's one of the really cool things about application sets

⁠¶ Managing Clusters with Attributes

17:03

is you can run your fleet and your applications entirely attribute driven. When you register a cluster into Argo CD, you can set annotations and you can set labels, and we use those very heavily. We set things like environment, team, organization, and cloud provider, and some other things. We set labels for selectors, as well as annotations like AWS account ID, for things that we don't need to do selectors on and that just provide information about the cluster.

17:36

And then inside those application sets, as part of either the matrix or the merge or the cluster generator, we say, hey, I'm selecting all the environment-stage clusters that are part of this particular maintenance group. We break it down so that we don't deploy to all five hundred clusters all at once. That would be some crazy sauce right there. That's when you get into accidental DDoS territory. Yeah. And, you know, the potential for risk is just huge.
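
To make the attribute-driven selection concrete, here is a small Python sketch of label matching in the style of a cluster selector. It models the idea only. The label keys (`environment`, `maintenance-group`, `cloud-provider`), the annotation key, and the cluster names are assumptions for illustration, not Adobe's actual scheme.

```python
def matches(labels, selector):
    """True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_clusters(clusters, selector):
    """Return the names of the registered clusters a selector would pick."""
    return [c["name"] for c in clusters if matches(c["labels"], selector)]


# Hypothetical registered clusters. Labels drive selection; annotations only
# carry extra information (for example an account ID) and are never matched.
clusters = [
    {
        "name": "stage-use1-a",
        "labels": {"environment": "stage", "maintenance-group": "canary", "cloud-provider": "aws"},
        "annotations": {"aws-account-id": "000000000000"},
    },
    {
        "name": "stage-use1-b",
        "labels": {"environment": "stage", "maintenance-group": "general", "cloud-provider": "aws"},
        "annotations": {"aws-account-id": "000000000000"},
    },
    {
        "name": "prod-weu-a",
        "labels": {"environment": "prod", "maintenance-group": "canary", "cloud-provider": "azure"},
        "annotations": {},
    },
]

stage_canary = select_clusters(clusters, {"environment": "stage", "maintenance-group": "canary"})
all_stage = select_clusters(clusters, {"environment": "stage"})
print(stage_canary, all_stage)
```

Because the same selector is evaluated on every Argo CD instance, an instance with no matching clusters simply generates nothing, which is what allows one application set file to be applied everywhere.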

18:06

But we'll deploy to, let's say, a canary maintenance group of a particular environment. And then what it'll do is apply the Helm chart or the Git repo or whatever it happens to be for that particular application set. And whichever segment of those, let's call it 20, Argo CD instances has that combination

18:29

from that selector, it'll create that application on that one and deploy it out. And it'll be the exact same running everywhere. So when an infrastructure engineer has to go and look at Argo CD, they don't have to think, is this one dealing with stage? Is this one dealing with prod? Is this dealing with this?

18:48

It's just Argo CD. It's cookie cutter. So we treat our Argo CDs as cookie cutter, as we do what we call our Ethos clusters as well. They're Kubernetes clusters; Ethos is essentially our internal code name for our Kubernetes flavor: just how we run it, what we install on it, that, you know, our CNI is Cilium and

19:10

we run Karpenter and stuff like that. So it's not a custom install of Kubernetes, but it's all that flavor that makes Kubernetes, well, Kubernetes. Really, we've talked about this before, but Kubernetes is maybe best understood as a customizable private cloud, right? Because

19:29

the stuff that you're talking about running, Karpenter, Cilium, this is all stuff that maybe you would actually have built into your cloud. But when you're running Kubernetes, you get to decide which of those things you're going to have. And so, having that kind of systematized approach, you've built your custom cloud operating system on Kubernetes with the components that you need for your teams to be successful, and everybody knows what to expect.

19:54

And that's pretty cool. I think that's one of the big advantages of Kubernetes. It's one of the reasons that people pick it. The great thing about your approach is that you did go with this attribute model. And this is something that, Kostis, you've written a ton about,

20:07

because you've seen people do it the wrong way. I think when Mike described it, everybody listened and nodded along and thought, yeah, that makes perfect sense. But Kostis, you've seen people do it a different way that maybe is less optimal. Yes, there is an article that we have written about this, and several webinars, where we see people that try to describe each specific Kubernetes cluster and have a huge list of enabled and disabled stuff.

20:35

And it's like the snowflake approach. Each cluster is unique, and you need to say what this cluster has versus what this other cluster has. And what Mike is talking about is the thing I like. You don't care about individual clusters. It's, as he said, cookie cutter: you are talking about cluster groups and cluster labels.

20:53

You label your clusters, and then you deal with cluster groups. So you forget about what this specific cluster is doing. You don't care about this specific cluster, you care about cluster groups, and this is how everything else is organized. So application sets are targeting cluster labels. That's how the whole thing works, and that's the correct way, because for day-two operations it's the easiest

21:18

and also the one that requires the fewest changes. So every time, you know, you change something, you want a new group or something else. If you have 200 clusters of enable, disable, enable, disable, things get real messy real fast. If you have cluster groups, you don't really care about the number of clusters. Whether it's one cluster, five clusters, or two hundred clusters, it's just cluster groups for you.

21:42

So that's the correct way, and I'm really happy that, you know, Mike is the leading example of how you should do it. Because, you know, I'm saying this is the theory, but then Mike says, okay, this is also the practice, and we know this works. Yeah, it works great. I won't say we're perfect at it. There are a few situations with a few apps where we use the merge generator, where we combine the cluster selector and the Git generator.

22:07

And basically the existence of a file in Git triggers the installation of a particular application. This is usually more for things like the Firefly clusters, where they may be doing something super custom on their inference or training clusters,

22:28

where they're doing some sort of, hey, I want to test something super new and cutting edge. And so it's not really a group of the Firefly clusters anymore, it's this one cluster, and we don't want to say, hey, do a cluster selector on cluster blah. You know, I would rather have a single file committed into Git than do that.

22:52

So we do do that occasionally, but yeah, most of the time it is entirely attribute driven, and we do that across the board. That's how we build clusters, that's how we deploy the clusters, that's how we think about clusters: in groupings. Well, and I think you've struck the balance there, because by having that component be a Git-file-driven thing, it's not cluttering up an application set with that additional conditional. Like,

23:21

targeting specific cluster names in an application set, that's a nightmare. And we definitely have seen people do that. And like Kostis said, it just gets totally unruly. Is it one of the anti-patterns? Is it in your top thirty anti-patterns blog post? Yeah, yeah, it's one of those.

23:38

Yeah, so there's a lot of things. Hey, Argo CD can do a lot of stuff. It doesn't necessarily mean you should do all that stuff, but the approach that you have, Mike, I think is one that we are pretty happy about. You mentioned that they're all pointing at the same application set file. So each application points at its own application set, just to be clear. So there's 250-ish application set files, but for all our Argo CDs, it's not like there's a different application set file

24:13

that each Argo CD points to, or each environment. So wait, are they all pointing to one application set file, or are there 200 application set files? There's 200, 200-ish application set files. Oh, okay, okay, okay. So if you wanted to change that application set across the board, you would go through and you would update each file

24:36

to do that. So you would update one, and that would update maybe one cluster, and then you would update the other ones. No, no, no, no, no. So think about this: there's 250 system applications that could potentially get... Yeah. Okay. So for each application, there's a single application set. Okay. So if I want to update the Karpenter application... I get you. There's a Karpenter application set

25:03

that anything that has that label that triggers Karpenter will get Karpenter applied to it. Correct.

⁠¶ Versioning and Progressive Syncs

25:14

So now we have the different cluster selector generators inside. We have them broken down into different blocks for the different, as I mentioned briefly earlier, maintenance groups. So the clusters are assigned to different environments and maintenance groups. And so we have different cluster selectors inside of there, and each one has a values.version, which is the version of the chart that gets installed.

25:45

And so we define the version of the chart that gets installed inside of the app set, which I know some people consider an anti-pattern. Some people are okay with it. But what then happens is, using the Go templating of an application set, we take it from that cluster generator and we plug it into the application set spec. I have an example of this if you want me to screen share it. But we take it from there and we plug it into the spec to say, install this version of the Helm chart.

26:16

So when we update each maintenance group, we can just update that one version for that cluster selector group, and then it'll push out the new version there. It's kind of like a manual version of progressive syncs. And that's because progressive syncs, which is really an awesome feature, has a few quirks that don't quite work for how we want it to work.
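
The per-group version pinning described here can be modeled with a short Python sketch. It is an illustration under stated assumptions rather than the application set itself: each selector block carries a `values.version` that is copied into the generated application's target revision, and the chart name, label keys, cluster names, and version numbers are all invented for the example.

```python
# Hypothetical selector blocks, one per environment / maintenance group.
selector_blocks = [
    {"selector": {"environment": "stage", "maintenance-group": "canary"}, "values": {"version": "1.3.0"}},
    {"selector": {"environment": "stage", "maintenance-group": "general"}, "values": {"version": "1.2.0"}},
    {"selector": {"environment": "prod"}, "values": {"version": "1.2.0"}},
]

clusters = [
    {"name": "stage-a", "labels": {"environment": "stage", "maintenance-group": "canary"}},
    {"name": "stage-b", "labels": {"environment": "stage", "maintenance-group": "general"}},
    {"name": "prod-a", "labels": {"environment": "prod", "maintenance-group": "canary"}},
]


def render_applications(chart, clusters, blocks):
    """Expand every selector block into one application per matching cluster,
    plugging the block's version into the application's target revision."""
    apps = []
    for block in blocks:
        for cluster in clusters:
            labels = cluster["labels"]
            if all(labels.get(k) == v for k, v in block["selector"].items()):
                apps.append({
                    "name": f"{chart}-{cluster['name']}",
                    "cluster": cluster["name"],
                    "targetRevision": block["values"]["version"],
                })
    return apps


before = {a["cluster"]: a["targetRevision"] for a in render_applications("karpenter", clusters, selector_blocks)}

# Promoting the next maintenance group is a one-line change to its block.
selector_blocks[1]["values"]["version"] = "1.3.0"
after = {a["cluster"]: a["targetRevision"] for a in render_applications("karpenter", clusters, selector_blocks)}
print(before, after)
```

Each promotion touches one version field for one group, which is the "manual progressive sync" the speaker is describing: the rollout order lives in the sequence of commits rather than in the controller.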

26:40

Hey, it just went to beta last release. It moved from alpha to beta. I missed that. That's awesome. Now I really gotta get my feature requests in there and get some code time to make those changes. There's a few things it does a little bit differently than we would like. And so we need to get some feature flags in there so that it works the way we need it to. But it's an awesome feature. I love it. I'm glad that it's progressing,

27:10

because then at that point we won't have to be making those changes. We just make the change once, and it goes across all those different environments. Yeah, because that was going to be my concern: okay, how do we handle an update, you know, and make sure it doesn't hit everything at once? So it sounds like you've covered that. Kostis is the guardian of best practices. Version in the application set, yay or nay? What do you think? Yes. Yes for Mike, for Mike specifically.

27:39

We do have a tool. No, no, we need to be clear. We always talk about best practices, and best practices are, you know, the starting point. If you don't have any other experience and you don't have any other opinion, this is how you start. If you know what you're doing and you are also at the level of Adobe, then it's fine to break the rules if you need to. But this should be a decision where you know the problem, you know the solution, and you know what you are going to do.

28:12

And then you're responsible for your choice. Okay? You can say, after one year, these were the options, and I sat down, I talked to my team, and I made the selection according to what I know. That's great. What I hate is, you know, people who don't think at all. They just choose the first thing they see, they try it, oh, it seems to work, yeah, let's go, and that's it. Okay. So I know that Mike is thinking before making a choice.

28:38

Yeah, we went through several different options before taking that approach. And we actually made this decision knowing that progressive syncs was coming and was going to become more stable. We wanted to be able to rely on progressive syncs' ability to say, update our version from this to this, and do it progressively across all these different environments and maintenance groups, you know, safely, ten percent at a time,

29:07

as it goes along. And that's our end result. So this is an interim step until that is there, and then, you know, we won't have to deal with that. Now, we do have tooling in place to update that automatically. It is PR gated, so it's not like it just rolls on its own. So does it need to make a commit every time the version is updated? Oh, it is actually making a commit in Git specifying the version, but you're making that change on the application set.

29:42

Yes. Oh, interesting. Okay. So yeah, yeah, yeah. Okay. So I guess the alternative approach would be that the application set is pointing at files that specify essentially what version is supposed to be deployed, and then those files are getting updated. In that case, it would be a source of your application set generator doing that. And the advantage would be that you could

30:13

know what version is supposed to be deployed without running the application set itself. But you're already running the application set, so it's not like you've bought a big problem or anything there. You gotta remember also, for us, though, we're 500 clusters. Yeah. So we would have to be updating 500 files. And so from a maintenance perspective, that's a huge pain in the, you know, proverbial keister.

30:39

And so that's a choice we chose not to make. Now, we thought about, okay, could we simplify it down and have files by maintenance group again? And then it was kind of like, well, then there's kind of a disconnect there, because over here we have it by grouping, but over here we don't.

30:56

And so we felt that it was a better decision to just have all that grouping in one place. We thought about adding it as attributes onto the cluster itself, but then every time we'd make a change, we'd have to update the registration in Argo. And so, like Kostis said, we thought about it. We went through five, six, seven different ways to do it before we came upon this temporary solution until progressive syncs is fully in place, and then we'll be making a change.

31:27

No, that makes sense. I think it's a very reasonable approach. You've kind of struck the balance here between the advantages and disadvantages of these different approaches. So on progressive syncs: is it worth talking about what specifically you guys are looking for before it can solve the problem, or is it too in the weeds? It's actually two very simple things. One is that it now no longer does autosync.

31:53

So for us, that would require hitting sync on 20 different Argo CD instances. Again, we would lose the automation, which we don't want. And the other thing is, it doesn't quite handle sync windows the way we use sync windows. So we don't use sync windows, I don't want to say a lot, but we don't not use them a lot. You don't not use them a lot. Right. So, I mean, we use them, but we don't use them much. And we probably don't use them the way they were intended.

⁠¶ Sync Windows for Incident Response

32:30

How do you use them? Yeah, we basically use them in the situation of, let's say somebody does a deployment, and it just blows something up for one particular cluster, you know, and it causes some sort of customer service outage, what we call a CSO.

32:54

At that point, what we'll do is throw a sync window on that cluster, either globally or for that one particular application, to allow somebody to... let me clarify: if they're not able to just quickly switch that tag back and roll it back. Because that's preferred. That's step one, because it's really easy to revert a PR

33:19

And do that. But if for some reason they're not able to do that and they have to actually say, I need to make manual changes to fix this problem, which, goodness forbid, is not the case. But if it is, because it does happen, they'll throw that sync window on, make the manual changes, recover. And then they'll go back and fix it in Git with what needs to be there. And the sync window will prevent self-healing from reverting the changes they had to make manually.

33:44

No, so you use them for incident response. Incident response primarily. I totally get that. I actually think that's a great use case for sync windows. I think a lot of people think of them as just being like, oh, we don't deploy on weekends. Okay. Yeah, you can totally use them that way. That's great. But I think incident response is absolutely one of the main use cases for sync windows.

34:07

And to be able to say, like, okay, something's gone wrong. We need to do some cowboy stuff. Hey, ideally we don't do cowboy stuff, but we know the world isn't always ideal. To throw up a sync window so you can go in and figure out what's going on and

34:20

And then make your changes. And then the nice thing about it is you go and make them in Git, and it'll start showing as in sync, and you're like, great, I can take away my sync window and it won't even have to do any synchronization, because I've already gotten it there. So I think that's a great use case, and I honestly wish more people would use sync windows for incident response

34:40

like that and have it part of their toolkit. I think it's great. Yeah. Just as a side note, Dan, that was your recent tip, actually, in the previous episode. That's right. We had one on, I think it was recovering Argo CD from when it gets locked up. And you use a sync window to block all synchronization. Exactly. Things calm down. You can access Argo CD, see what's causing the problem. Absolutely.
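
As a rough illustration of the incident-response pattern described above, here is a minimal Python sketch that builds a deny sync window entry for an AppProject's `spec.syncWindows` list. The field names (`kind`, `schedule`, `duration`, `clusters`, `applications`, `manualSync`) are the ones Argo CD documents for sync windows; the helper name, the cluster name, and the choice of an every-minute schedule to keep the window open are assumptions for illustration, not something the speakers described.

```python
import json


def incident_deny_window(clusters=None, applications=None,
                         duration="2h", allow_manual_sync=True):
    """Return one entry for the spec.syncWindows list of an AppProject."""
    window = {
        "kind": "deny",
        # Assumption: a schedule that fires every minute should keep the
        # window active for as long as the entry exists in the project.
        "schedule": "* * * * *",
        "duration": duration,
        # Lets operators still trigger a sync by hand during the window.
        "manualSync": allow_manual_sync,
    }
    if clusters:
        window["clusters"] = list(clusters)
    if applications:
        window["applications"] = list(applications)
    return window


# "prod-us-east-1" is a hypothetical cluster name.
window = incident_deny_window(clusters=["prod-us-east-1"])
print(json.dumps({"spec": {"syncWindows": [window]}}, indent=2))
```

Once Git matches the manually fixed state, removing the entry from the project lets automated sync and self-heal resume.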

35:04

Yeah, you mentioned the no weekend deploys. That was actually one of the things I was happiest about when we moved to Argo CD. We used to have the don't deploy on weekends, don't deploy on Fridays thing, and now we just go for it. Just let it roll. Yeah. Our risk got reduced by so much.

35:23

Because once we had self-healing, once we had our, you know, like you said, we used to have a monolith. Now we have small individual Helm charts. We have, it sounds like a lot, 250 applications. Well, you gotta remember that used to be one application that would deploy everything all at once, and that's a lot scarier.

35:41

No-deploy Fridays are, like, considered a dark pattern, but I think that they are actually the sign very often. It's like the bell curve meme. It's a sign of a very unperformant organization.

35:54

Or a hyper-performant organization. You know what I mean? Like you're either deploying on Friday because you don't know better or you're deploying on Friday because you're so good at it that it's not a problem. And I'd much rather be deploying on Friday because we're good at it, right? Then I feel like

36:09

you're a high-performing operations team. If you're telling me you can't deploy on Friday because, like, deployments are terrifying, we are not willing to ever deploy on Friday. It's like, okay, well then you're probably a low-performing deployment organization. Yeah, now people optionally don't because they're like, well, if something goes wrong at four o'clock on a Friday, I don't want to stay late just in case. Yeah. You know, people do. And that's a great thing about Argo CD

36:36

and GitOps, because like I said, with the tags in the application set, all a revert is is going from 1.2.3 back to 1.2.2. It's a quick flip like that. It's not like reverting a whole bunch of code, recompiling code, because we use Helm charts. And so the Helm charts already exist. It's just flipping it back and it just goes. So a revert is, you know, two minutes, three minutes. It takes like no time at all. Luke says no problem to deploy on Friday, you have all weekend to fix the outage.
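
To make the "quick flip" concrete, here is a small Python sketch of a revert as a one-field change. It assumes, purely for illustration, that each application is an element in an ApplicationSet-style list with `name` and `version` keys; those key names and the sample apps are hypothetical and not taken from Adobe's setup.

```python
def with_version(elements, app_name, version):
    """Return a copy of the elements with one app pinned to `version`."""
    return [
        {**element, "version": version} if element["name"] == app_name
        else dict(element)
        for element in elements
    ]


# Hypothetical generator elements: one chart version per application.
elements = [
    {"name": "ingress", "version": "1.2.3"},
    {"name": "logging", "version": "4.0.1"},
]

# The revert: the chart for 1.2.2 already exists, so only the tag changes.
reverted = with_version(elements, "ingress", "1.2.2")
print(reverted)
```

In a GitOps flow this change would land as a reverted pull request rather than a function call; the point is only that nothing has to be rebuilt.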

37:11

Thanks. Yeah, so in your case, I think it totally makes sense the way you're using them. So the issue that you brought up with progressive sync, the big disadvantage, of course, is that it does disable auto-sync. And that is probably its Achilles' heel. Have you looked at GitOps Promoter? Because part of the reason that they built that was they were like, we really like progressive sync, but we don't like disabling auto-sync.

37:39

So we've actually been having a nice debate about that in Slack today, me and a couple of other coworkers, which is great. But again, GitOps Promoter doesn't have the 10 promotions and things like that. So they both have pros and cons. And so we'll end up somewhere in between with one or the other, or at some point maybe the two will merge. I don't know where things will end up.

38:05

I mean, Mike, I'm contractually obligated to say this sounds like a great use case for Octopus Deploy. But hey, you know what? That's not the free option. So I get it. Well, let's say there is a free option with Octopus Deploy to do this, but it's not for 400 clusters.

38:20

Yeah, you know, we always run into the problem of people always say, oh, this is great. You know, we're thinking, oh wait, you're at how many? A lot of the solutions, they just don't scale at, you know, 80,000 applications. You know, Argo CD is great, but, you know, on a lot of our clusters we need to watch config maps across the entire cluster. And when you have 300,000 config maps,

38:46

You can't run 500 clusters inside of a single Argo CD when several of them have that many config maps. It's just not tenable. Yeah. It's a real challenge. But it's doable the way that we have it set up. We've had to make, you know, some design decisions back and forth, with some pros and cons. And there's some stuff that we struggle with because of it, but we build other tooling around it. I'm not sure how familiar people are with Krew plugins for kubectl.

39:17

But we've leaned very heavily into that. So we have a lot of plugins that we've written for that that allow us to interoperate with Argo CD. Basically, it'll shell out to the Argo CLI or it'll make API calls to Argo CD itself, to abstract away the fact that we're running those two dozen Argo CD instances as you try to get information about a particular application on there.
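
The idea of hiding multiple Argo CD instances behind one command can be sketched roughly as follows. This is not the plugin described here; it is a minimal Python illustration that assumes a hypothetical lookup table from cluster name to Argo CD server address and only builds the `argocd app get` command line (using the CLI's `--server` and `-o json` options) without executing it.

```python
import shlex

# Hypothetical mapping of workload cluster -> the Argo CD instance managing it.
INSTANCES = {
    "prod-us-east-1": "argocd-a.example.internal",
    "prod-eu-west-1": "argocd-b.example.internal",
}


def app_get_command(cluster, app_name, instances=INSTANCES):
    """Build the argocd CLI call for an app, picking the right instance."""
    server = instances[cluster]  # KeyError if the cluster is unknown
    return ["argocd", "app", "get", app_name, "--server", server, "-o", "json"]


command = app_get_command("prod-us-east-1", "ingress")
# A real plugin might pass this list to subprocess.run(); here it is printed.
print(shlex.join(command))
```

The caller only names the cluster and the application; which of the instances to talk to is resolved by the lookup.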

39:44

and stuff like that. Oh, that's cool. Have you considered putting any of those out in the community or are they already out there? They are very poorly written. Many of them are my 10% time sort of tools of like, this is bugging the heck out of me. I'm just gonna

40:02

And so I haven't done that. There is one tool that we wrote that is pretty cool that I want to get into the open source community. I just haven't figured out how to abstract it away enough yet. It's that, you know, the fact of the two dozen Argo CD instances. And Argo CD notifications are great, and you can make some of them more data rich. But it's hard to enrich it with Git information, like who actually made the last commit.

40:37

Or, you know, what was the commit message? You have things like the SHA and stuff like that. And so I wrote a tool that watches the health of the applications, grabs it, goes to Git, grabs the commit information from the various repos, looks it up, pings the person on Slack directly, pulls it in, pulls in additional metadata from Argo itself, and goes and looks on the cluster itself for what's going on and gives you a nice rich status message

41:09

of what's going on when a cluster works or when a deployment fails or not. If you saw my Argo talk at, I think it was Europe last year. You know, when Argo CD doesn't deploy, you know, it may sync, but something didn't deploy all the way. And there's those kind of long-running deploys, I call them. You may not know for a couple of hours that something didn't work. And so you gotta have something that comes... Oh, thanks.

41:43

Something that comes back and lets you know later that something didn't work versus what you're just looking at. And so what this tool does is it basically enriches it with that additional information for why exactly something didn't sync through properly. Instead of just the standard, hey, something's out of sync, it tells you why it's really out of sync, beyond just the data that Argo CD has.
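
The enrichment step can be pictured with a short Python sketch. The real tool is not public, so everything here is assumed for illustration: the shape of the `app`, `commit`, and `events` inputs, their field names, and the message format. It only shows the idea of merging Argo CD status, Git commit details, and cluster-side observations into one status message.

```python
def enrich(app, commit, events):
    """Combine app status, last commit, and cluster events into one message."""
    lines = [
        f"{app['name']} on {app['cluster']} is {app['health']} "
        f"(sync: {app['sync']})",
        f"Last commit {commit['sha'][:7]} by {commit['author']}: "
        f"{commit['message']}",
    ]
    lines += [f"  cluster event: {event}" for event in events]
    return "\n".join(lines)


# All values below are made up for the example.
message = enrich(
    app={"name": "ingress", "cluster": "prod-us-east-1",
         "health": "Degraded", "sync": "Synced"},
    commit={"sha": "abc1234def5678", "author": "jdoe",
            "message": "Bump chart to 1.2.3"},
    events=["Deployment ingress: 1/3 replicas available"],
)
print(message)
```

In this made-up case the application reports as synced while still being degraded, which is the kind of situation plain out-of-sync alerts do not explain.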

42:10

With that other stuff. But getting information from a remote cluster and from Git and all of that makes it hard to open source. Yeah, yeah, yeah. You have to abstract that stuff. And that is, like... Yeah, that sounds like an enterprise feature. You know what I mean? Like that sounds like something that we would stick in our kind of stuff in our

42:35

But again, I'm not trying to make a sales pitch here. It's just, like, when you mention it, you're like, oh yeah, they're talking about the thing that we do. Yeah, yeah. I mean, it's really handy and it's useful information. But yeah. Yeah. Hey, you know, between you and me and I guess the rest of the world now, since we're talking here, I would have loved to have gone with you guys years and years ago. But

42:56

Yeah, it just wasn't in the cards. You know, we love the open source side and it's all good. It's all good. We're all open source, but you can with Argo. That's the awesome thing with Argo. I mean, we really haven't run into anything with Argo CD that we couldn't do with the open source version. Yeah. Which is just incredible. Well, we love to see it. So, Revital, we've got

43:21

ten, fifteen minutes left here. Mike's taking us through the story. Should we react to some Argo stuff together? Some Argo news. Do you wanna talk about Argo 3.3 or do you wanna go deeper on something that was... No, I was actually, I think that Argo 3.2 didn't get enough attention from you, Dan.

43:46

I thought you went through the blog post, but then when I created the one for Octopus Deploy, I noticed that, you know, I like UI and UX, and there were a lot of cool features that were part of the 3.2 version that no one talked about. And as I went through them, I even learned about the, what's the name of the component? Source Hydrator, yes.

44:15

And I was learning a lot of cool stuff. And that's when I decided maybe we should create another blog post that contains those new cool features, and we can share it with the world. Wait, didn't you do a blog post on 3.2? Yes, it's not published yet. But I think as a result of the fact that I was just navigating through the features, I realized that some of the UX features did not get enough attention. Yeah.

44:49

Yeah. So maybe we should do that. Well, let's save that maybe for next time, because you'll have your blog post out and you can talk us through it. Fair enough. Do you have any insights or things you want to say about 3.3?

⁠¶ New Features in Argo CD 3.3

45:04

Well, we just had the blog post out right before Christmas on 3.3, and there was some good stuff going on. Pre-delete hooks are coming, and I can pull up... Yeah, pre-delete hooks are coming to 3.3. So this lets you run a job before Argo CD starts deleting applications. This was one that was really hard to work around if you had a use case where you needed to do some kind of cleanup thing before.

45:39

It was hard to do. And so you basically had to do, like, a PreSync operation that had some kind of conditional check to see if something needed to get deleted. Not very fun. Pre-delete's easy though. So this is a great feature. And Kostis and Mike, if you have any comment on this, feel free to jump in. This is going to be really helpful for us for some of our CI stuff. We have some stuff where we have to shove some data into Vault.

46:07

And so this will allow us to make it a little easier to clean up Vault on the fly with our CI code. So I'm actually really looking forward to being able to use this. Yeah, nice. Do we foresee any abuse of pre-delete? Don't ask me just yet. I will take a look at the Slack channel, as I always do. This is how I find the anti-patterns. I follow the Slack channel. That's the way to do it. Yeah, you're going to Slack and I just say, wait, what are you doing?

46:51

Yeah, pre-delete, I was thinking there's not an anti-pattern, I'm sure. And then I thought, people are so creative. They're gonna show us one. I'm excited to see what it is. Oh, OIDC background token refresh. This one's really important. So there's just this issue. This is kind of a UI issue where, when you're using OIDC tokens to access the UI,

47:18

they weren't refreshing automatically, and so you would just randomly get logged out and you'd have to go and log in again. It was annoying. There is a similar issue that's being worked on right now where, once your token was set, it wouldn't recheck to see if your token was still valid. So you could stay logged in after you were flushed. So that one I think is being solved before 3.3 comes out as part of this.

47:51

So that'll be a good one. A lot of work being done on Source Hydrator. So inline parameter support, which we probably think sounds like an anti-pattern, but it's one of those things where, like you said, Kostis, there are cases where it does make sense and you've bought the problem and you're willing to do it. So it makes sense to have the support for it. But I would definitely be very careful using the inline parameter support.

48:25

Make sure you have a really good reason to do it. Better monorepo support. I don't know exactly what this contribution was. I know there were a bunch of performance improvements, which is cool. So we needed to see that. Any thoughts from the group on Source Hydrator improvements? Let's wait to see how people are using it. Yeah, fair enough. Okay. We're not actually using it, so I'm not sure.

48:53

Yeah, well, Source Hydrator is very new and not very many people are using it yet. I think that it may not be considered completely feature complete at this point, but a lot of those components are there. So you could definitely start using it and trying it out.

⁠¶ Automating Deployments & Tools

49:08

Let's actually take one of these questions from the audience here. So we pull this out. So Craig says, I'm testing Argo CD with a pull request generator in dev. My Azure DevOps pipeline builds an image and pushes it to ACR. Argo CD creates a temporary preview app from the PR and deletes it when the PR is closed. Okay, so far so good.

49:30

What should I configure so that the same image from ACR is automatically deployed to UAT immediately after the PR is closed/merged, right after the preview environment is removed? Ooh. What do we want to do so that... That's a good idea. The image is automatically deployed to UAT once the PR is closed/merged.

49:53

So, Dan, he's asking two more questions and they are much easier to answer. Let's jump to two and three and then let's come back to this one. Oh, I was gonna say the easiest way is that you have Octopus Deploy just configured to do that. But I know that's not the... I don't want to be the shill guy. Isn't this what GitOps Promoter is meant to handle?

50:18

Yeah, you could use GitOps Promoter to do this, because once the PR is merged, it would pick up the next thing. It would go and automatically update the next thing. You could also have... Yeah, I think both of those. But let's do the next ones and see if we have an additional answer. So Craig also says... Craig? It's Craig with a K. He must be a Kubernetes professor.

50:43

Second, do I need to use Helm with Argo CD, or can I use Kustomize instead in dev and later in prod? I prefer Kustomize. Helm templating is inconvenient for me. Yes, you can use Kustomize everywhere. I love Kustomize as well. That's not a problem. See, super simple, super fast. Craig, I've got even another tip for you. Whenever I use Helm charts, I don't even use Helm. I use Kustomize to render the Helm.

51:13

Because it gives me so much control. I can do all the stuff with Helm, plus all the stuff with Kustomize. I basically never use Helm directly. I always wrap it in Kustomize, because I always am like, I might need to change something or do something or tweak it, or maybe I want to add a layer or whatever. So even if you use Helm, you don't even have to use... The only thing you can't do with that is the ignore missing values file.

51:41

You can't use the ignore missing values file with Kustomize. Mm-hmm. Yeah. What does it do? Does it just fail to render? The Helm option in Argo has a flag, ignore missing value files, and it will automatically pop off the list any values files that don't exist. Whereas if you use Kustomize or the, was it the CMO? Is that the right... Your own plugin. CMP. CMP. Yeah. You have to do your own checks to make sure the values files exist.
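
The "do your own checks" step amounts to filtering the list of values files before rendering. The following Python sketch shows that filter under the assumption that a wrapper script or config management plugin receives a list of candidate paths; the function name and the file names are hypothetical, and the injectable `exists` predicate is only there so the example can be exercised without touching the filesystem.

```python
import os


def existing_values_files(candidates, exists=os.path.exists):
    """Keep only the values files that are present, preserving their order."""
    return [path for path in candidates if exists(path)]


# Hypothetical layering: base values, then optional per-environment and
# per-cluster overrides that may or may not be checked in.
candidates = ["values.yaml", "values-prod.yaml", "values-prod-us-east-1.yaml"]

# Pretend only the first two files exist.
present = {"values.yaml", "values-prod.yaml"}
print(existing_values_files(candidates, exists=lambda path: path in present))
```

With the default predicate the function checks the real filesystem, which roughly mirrors what the `ignoreMissingValueFiles` option does for Argo CD's native Helm source.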

52:18

Oh, okay. Okay. Good point. Good point. Um all right. Craig, third question. Regarding team communication, when a given application is created by Argo CD, is it possible to send a notification to Microsoft Teams about its status? Could you send any link?

52:36

I don't know, does it have a built-in MS Teams integration or do you have to use the webhook one? If it doesn't, it should have, like, webhook support. I assume Teams has webhook support as well. That's my assumption. So yes. Yes, for certain, if there is native support. And yes, if there is webhook support, and I assume Teams has webhook support.
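
For the webhook route, a notification is just an HTTP POST with a JSON body. The Python sketch below builds such a request without sending it. The webhook URL is a placeholder, the helper name is hypothetical, and the simple `{"text": ...}` payload is an assumption about what the receiving Teams webhook accepts; check the current Teams and Argo CD notifications documentation before relying on it.

```python
import json
import urllib.request


def build_teams_request(webhook_url, app_name, health):
    """Build (but do not send) a POST carrying an app status message."""
    payload = {"text": f"Argo CD application {app_name} is {health}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real one would come from the Teams channel settings.
request = build_teams_request(
    "https://example.invalid/webhook", "ingress", "Degraded"
)
# urllib.request.urlopen(request) would send it; omitted to avoid side effects.
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

This is the same shape of call as the curl mentioned in the discussion, just expressed in Python.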

53:00

Yeah, you can set up notifications in Argo CD to run, and you can basically just have it run a curl. So yeah, you should be able to send. I don't use Microsoft Teams, but I assume that they have an API, I'm sure they do, to do that. On the first question, Craig said the CI is a repository called app. The second one is a repository called Argo CD, according to best practices. The second one is a repository.

53:39

The second repository is called Argo CD. Yeah, he's saying that the source code exists in its own Git repo, which is called app, and this has source code. And then there's a separate Git repo, Argo CD, which is the correct best practice, to have separation between source code and manifests. I think this is what he's saying. I hope he's not saying that every app gets its own second repository. I would, like, make one probably, but

54:09

But yeah, so I think, since it's separated, you can deal with everything basically in the second repo. Because the first repo has created the app. Now it's going to go through testing. That lifecycle is done. You've done the application lifecycle. Now you're dealing with an operations lifecycle. So everything's going to take place in that second repository.

54:29

And you can do a couple of things. You can also open up both pull requests at the same time. Trying to think if you could create a check to see if the other one had completed yet. You could probably do that. But I think the better way to do it is not that. The better way to do it is not what I just said.

55:02

So let's go back here to this question. So, Kostis, do you have a better idea now? Just a quick parenthesis: I just checked that there is Teams support for Argo CD notifications. Okay, if you search for Argo CD notifications Teams. Closed means you reject the PR, while merged means you like the change. So for me, you should have a workflow where if a PR is closed, nothing should happen.

55:39

Nothing at all. This change wasn't valid. Something was wrong. If it's merged, then this means that you have a new Git file, and then you can monitor it any way you like, or maybe it pushes a container image and you pick up this container image with Image Updater or Octopus Deploy or something else. So make a clear distinction between I like this PR and I want the process to continue versus I don't like this PR and I want nothing to happen. This should be super clear for everybody.
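
The closed-versus-merged distinction can be written down as a tiny decision function. This Python sketch assumes a simplified pull request record with a `state` string and a `merged` boolean; those field names are illustrative and would need mapping to whatever the actual Git provider or pipeline exposes.

```python
def next_step(pr):
    """Decide what the pipeline should do for a pull request event."""
    if pr["state"] != "closed":
        return "wait"        # still open: only the preview environment exists
    if pr["merged"]:
        return "promote"     # change accepted: continue towards UAT
    return "nothing"         # change rejected: nothing should happen


print(next_step({"state": "closed", "merged": True}))
print(next_step({"state": "closed", "merged": False}))
```

Keeping the rejected branch as an explicit no-op makes the intended behaviour visible to everyone reading the workflow.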

56:08

Yeah, something you can do is once it's merged, you can also add a tag to that image. Like you can tag it for staging, right? Or whatever. And then if you were using something like Image Updater, you're looking for updates under that tag. It would be tagged staging with a version, right? And it's looking for certain versions. And then it would go and update Git automatically to pull that new version.
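
Selecting "the newest version under a staging tag" can be sketched as below. This is not how Argo CD Image Updater is implemented; it is a standalone Python illustration that assumes a hypothetical tag convention of `staging-MAJOR.MINOR.PATCH` and picks the highest version among the tags that match it.

```python
import re

# Assumed tag convention for this example: staging-<major>.<minor>.<patch>
STAGING_TAG = re.compile(r"^staging-(\d+)\.(\d+)\.(\d+)$")


def newest_staging_tag(tags):
    """Return the staging tag with the highest version, or None if none match."""
    versions = []
    for tag in tags:
        match = STAGING_TAG.match(tag)
        if match:
            # Compare numerically so 1.10.0 sorts above 1.9.9.
            versions.append((tuple(int(part) for part in match.groups()), tag))
    return max(versions)[1] if versions else None


tags = ["staging-1.2.3", "staging-1.10.0", "prod-2.0.0", "staging-1.9.9", "latest"]
print(newest_staging_tag(tags))
```

A tool watching the registry would then write the selected tag back to Git, which is the step that triggers the actual deployment.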

56:35

The quickest answer to this question is use Image Updater. So by the time you merge something, there is a push to a container registry, and if it's your production container registry, you start a production deployment. If it's your staging registry, you start a staging deployment, or something like this. That's the easiest way to start. Cool. Yeah, and I would probably have it.

56:57

You could have a CI job that runs once the PR is merged that adds the tagging or whatever. That'd be another way to do it. We mentioned using GitOps Promoter to potentially do it. Obviously Octopus Deploy supports this as, like, a very obvious... basically, once you add an artifact, you would have your plan

57:20

of how that's gonna be rolling out to different environments, different applications, if those are gonna be Git updates, if it's gonna update just Kubernetes directly, however you want that to be done, if it's gonna work with Argo CD or not. That would be just a configuration you could do. So, nice. These are great questions. Let's take one more before we close. This one's from Piyush. At Adobe, GitOps with Argo CD is all about using Git

57:46

as the single source of truth for Kubernetes deployments at massive scale. Teams define application state declaratively in Git and Argo CD continues to reconcile with surrounding clusters. This seems like he's actually just explaining. Just making a comment. He works for Adobe. Summarizing. Oh, you know this guy. Okay, cool.

58:04

His comment is too long to see the rest of it on stream. But wait, Dan, maybe we need to mention this because Mike didn't say anything about this. Adobe is a heavy user of Argo CD. And they have been so nice that if you go to the previous KubeCons and ArgoCons, you can see presentations from Adobe on many different topics. So if you want to see, you know, how

58:25

Argo CD is used in a really big company, you know, even if you forget about this show, you can go to YouTube and find lots of information on what they do and how they do it, including Mike. Like, Mike has, you know, lots of presentations, both at ArgoCon and KubeCon. It's both, since 2020 or something, 2021. Yeah, and those other talks will talk a lot more about application deployments, not just system infrastructure deployments.

58:54

So you'll learn more about, you know, deploying your application that runs your site or your tool or your app or whatever, not just how to run your infrastructure, which is what I... We did a talk, Mike and I together, about benchmarking, I think, Argo CD and Figueroa. Oh, is that with Joe? Okay. Joe and I. Sorry. Also Adobe. Mike, you and I did a talk together though, didn't we? No, that was when I got in the car accident and wasn't able to do it.

59:24

Oh, you were supposed to do it. I was supposed to do it, but I got hit. You thought it would be more fun to get in a car accident. It was hard for me not to take that personally, Mike. I thought that was like, geez, I've never seen somebody want to get out of a talk with me so bad that they're willing to get injured. Wow. Mike also mentioned the Cluster API, which I think is also another very interesting topic on how you create the cluster,

59:52

so we can invite him a second time and he can talk about that topic as well. If you're not familiar with Cluster API, go and look at it. Oh, okay. Wait, wait, wait. So there is an additional little question here. So let us... we'll take it. Okay.

⁠¶ Scaling GitOps Without Bottlenecks

01:00:10

When using GitOps, how do you prevent Git from becoming a deployment bottleneck? Specifically when hundreds of teams are committing simultaneously, environments drift independently, and emergency fixes bypass Git. What concrete mechanisms do you use to reconcile speed, safety, and eventual consistency without breaking the GitOps concept? Ah, great question. So we do have a variety of different ways. This is, oh man.

01:00:42

So individual teams have their own individual Git repos. So a lot of this is solved just by the fact that different teams have different repos, so there's not going to be that bottleneck coming into it. It's not one massive monorepo. You know, within my team, as I talked about, there's those 250 different applications. Each of those is in its own repository for where the Helm charts are,

01:01:12

and that kind of stuff, and then the application sets themselves are in their own, and you don't end up getting into a bottleneck there, and they all can roll out independently of each other. Basically, you know, when it comes to breaking the GitOps contract, if somebody were to make a manual change without putting in place the sync window, Argo CD will just write right back over that manual change. So the manual change will get reverted.

01:01:47

Um, if there is a sync window in place, we do have dashboards that show all of the sync windows that are in place and say, hey, look, these are the things that are currently paused. And people are constantly going back and checking, hey, is this still supposed to be paused? Is this still supposed to be there? And we have logging systems. So every single deployment gets logged into a centralized system.

01:02:05

That says, hey, this is all the change that's ever happened on the cluster. It's both inside of Adobe's centralized system as well as a much more searchable and viewable dashboard system that my team can use to show not just when changes happen, but what the current version of everything is on the different clusters. Let's see, did I get everything there?

01:02:30

So safety always comes first. We care first and foremost about stability and uptime. I mean, that is critically important. I mean, Adobe's a multi-billion-dollar company. We can't have outages without people getting really, really upset and costing a lot of money. So we have a lot of CI/CD checks in place before things even get merged,

01:02:53

as part of the application, as part of the Helm chart, before things even get to production. And then we do slow and steady deployments out before things even reach a production stage. So there's a lot of steps that happen before something can even reach production, where there's multiple sets of CI, or not CI, CD and testing happening before things get there, for that speed and safety. Yeah, this question might be boiled down to

01:03:24

what kind of delay is acceptable from when Git is committed to and when it's deployed? Yeah. I mean, theoretically, if we were like, hey, you know what, YOLO, you know, we really could deploy fast. We could get something out in a matter of minutes. But realistically, you know, it takes, you know, an hour or so if we were trying to go super fast.

⁠¶ Episode Wrap-Up & Next Show

01:03:53

Yeah. All right. This is maybe worth a follow-up discussion for us to tackle on another episode of Argo Unpacked. So, Revital, we're at time. What do the people need to know? What's coming up? So we're gonna have Luke, Ben, Kostis, and you, Dan, of course. On the next episode, we will be talking about Argo CD and CDEvents integration. That will be taking place within two weeks.

01:04:25

I think it was scheduled for the 19th, but that's a holiday. So did we decide if we're going to broadcast on the 19th or if we're gonna put it on the 20th, or did we pick it up? We're going to record it next week and then we will stream it

01:04:41

on the nineteenth. Okay, cool. All right. So plan on that. And then, of course, don't forget to share these episodes with your friends. We've got a lot of YouTube Shorts and other short videos if you're on other platforms, LinkedIn shorts, whatever. So find those, like them, comment on them. Love the questions. Craig, Piyush, thanks for bringing those questions to the team. As always, you can find us on X, Bluesky, all the social media, at Argo Unpacked.

01:05:08

And feel free to send us ideas for episodes, questions you have. We'll try to address them in future episodes, either with tips or with guests. If you have an idea for something that you'd love to see, let us know. And with that, thank you, Mike. Thank you, Kostis, for joining. Revital, thanks for hosting and putting this whole thing together. Thanks for having me. I appreciate it. All right, everybody, as always, stay synced.

✨ This transcript was generated by Metacast using AI and may contain inaccuracies. Learn more about transcripts.
