Environment as Code ft. Adarsh Shah - DevOps 190 | Adventures in DevOps podcast

00:14

Hey, everybody, welcome to another thrilling episode of Adventures in DevOps. I'm your host today, Will Button, and joining me today is Adarsh Shah. Adarsh, welcome. Thank you, thanks for having me. My pleasure. So do you want to give us a quick introduction and tell us what we're going to talk about today? Sure. My name is Adarsh Shah. As you mentioned, I'm the founder and CEO at CompuZest. I've basically played a lot of different roles over the years, from a developer to an architect, to

00:44

a consultant to technical product owner, mentor, advisor. But I'm an engineer at heart who loves to build stuff that helps business, technology stuff that helps run business. And right now I have built a product as part of a company that I founded, and also do some consulting and advising around the DevOps and infrastructure as code space. Right on. So one of the things, and I

01:15

want to jump in because I'm really excited about this. One of the things that you put in one of the articles I read from you was talking about not infrastructure as code, but environment as code. Do you want to bring everybody up to speed on that? Yeah, definitely. So what I and some of the colleagues that I've worked with realized over the years is, like, infrastructure as code is great, right, because it helps us automate provisioning of various

01:44

infrastructure resources on cloud providers or on premises. But what we realized is what people need is an entire environment. Right. Just having your networking or just having your EC2 instances or what have you is not gonna give you everything

02:01

you need to run your application. So what I and my teams have done in the past is kind of go through this painful process of, like, writing pipelines that run stuff and connect various pieces of these, if you think about them as Lego pieces, bring them together, and those things get really complicated and very difficult to manage. So thinking about this more and more in the last few years, I was like, you know what, there has to be a better way of bringing

02:35

these pieces together. So started thinking of this as a concept and eventually named

02:39

it as environment as code. It's just an abstraction over infrastructure as code that basically provides a way of defining an entire infrastructure environment, right, not just a single component or a resource, but also it manages state for the entire environment, including the dependencies between various resources, and also supports best practices, right. Like when you look at infrastructure as code, one way of getting

03:08

this whole environment is by writing this monolith infrastructure as code where you put everything together. Right, that's not a good practice, right, thinking about loose coupling and things like that. So, yeah, that's what I call environment as code. But basically it allows teams to

03:27

deliver entire environments instead of individual infrastructure resources rapidly and reliably at scale. Yeah, when I first read the term, it was like this huge light bulb went off, because I think the term itself is so much more descriptive of what we're trying to accomplish many times with infrastructure as code, because the infrastructure itself is just a component of the overall environment, right, and each piece,

03:53

each individual piece of the infrastructure is pretty meaningless unless it's coupled with all the different pieces of that environment. And like you mentioned, I've built many environments using that monolith approach, and then you have to update one dependency in there or change out one of the components. And it follows very much along

04:16

the lines of building a monolith software application. You know, where just making small changes becomes this task that you dread and procrastinate and hope that you don't have to do, knowing full well that you're going to have to anyway. Yeah, exactly, and some of the other things. Right.

04:32

So some of the principles that we talk about, like immutability, right. So you want to be able to easily provision a brand new set of, let's say, if we talk about servers, rather than updating and keeping a long-lasting server. In this case, it's applying that same principle to an entire environment, right. So why keep your various components living for long? Can

05:00

you actually replicate an entire environment rather than just an individual component? So for someone listening who's liking this idea and wants to explore it further, what do you think the first steps to migrating from infrastructure as code to environment as code are? Yeah, so, I mean, like I said, I think it's just an abstraction. So thinking about these various pieces and how they fit together. So what we did is, essentially, to keep it really simple,

05:34

we created a simple YAML format. Again, this is new, so a lot of what I'm saying is, at least from my perspective, what I've seen. I haven't seen anyone talk about it this way, it's new. So what we did is we provide a very simple format, a YAML format, that actually specifies these individual resources or components and how they connect with dependencies. I have an article that I think I've shared that talks about the principles

06:06

and what this means. Yeah, I mean it's, like I said, early stage. So I would say, something that allows you to create an abstraction on top of infrastructure as code that gives you an entire environment rather than an individual component, right, because a lot of people, like I said, write pipelines to do this, and then you have to think about things like, hey, how do I provision stuff? But how do I

06:33

tear down, right? Because teardown would be the totally opposite graph, right. So you have to tear down those leaf nodes first, and you just write all this complex logic if you've changed. So defining an entire environment in an easy format that provides ways to connect those Lego pieces, if you talk about it that way, and then something that takes that and interprets it and provisions those individual pieces so that you

07:01

get an entire environment. One of the things you put in that article, you mentioned keeping everything in source control, but then you added to that a little thing that I know I've slipped up on a lot in the past, and I've encountered situations where other people have done the same, like those one-off scripts that you only run every once in a while,

07:24

about how important it is to get those into source control as well. Yeah, definitely. I mean, I think you're talking about the patterns and practices in general for infrastructure as code or environment as code, because it doesn't matter which. So yeah, same, I've been right in the situation where, like, hey, we just ran the script and, you know, forgot about it, right, it's on someone's machine and then you try to find who did that and how it was done.
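The point made a little earlier about provisioning versus teardown order (teardown walks the dependency graph in the opposite direction, leaf components first) can be sketched in a few lines. This is only an illustration under stated assumptions: the component names and the dictionary layout below are hypothetical and are not the YAML format or the tool discussed in the episode.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical environment definition: each component maps to the list of
# components it depends on. Names are illustrative only.
ENVIRONMENT = {
    "networking": [],
    "platform": ["networking"],
    "database": ["networking"],
    "application": ["platform", "database"],
}


def provision_order(environment):
    """Return components ordered so every dependency comes before its dependents."""
    return list(TopologicalSorter(environment).static_order())


def teardown_order(environment):
    """Return the reverse of the provisioning order, so leaf components go first."""
    return list(reversed(provision_order(environment)))


if __name__ == "__main__":
    print("provision:", provision_order(ENVIRONMENT))
    print("teardown: ", teardown_order(ENVIRONMENT))
```

Deriving both orders from one declared graph is the design choice here: nobody has to hand-write separate teardown logic in a pipeline when a component is added.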

07:49

So yeah, I mean source control being a place where you can track things, have auditing, right, who pushed, so you can talk to the person about it. And so yeah, I think people say, like, hey, just put your Terraform code or Pulumi code or configuration there. No, everything, including Bash scripts that are lying around, stuff that you use once in a while, everything needs to go there. Yeah, what about documentation? Do you like documentation in the code itself or do you have a preferred tool

08:18

like Confluence or something else for keeping track of docs? Yeah, I mean, so one of the things with docs is a lot of people say this, like, hey, you do infrastructure as code, everything is in code, why do you need documentation, right? And I mean, yes, you need less documentation, but you

08:39

still need documentation, right. And in general documentation is hard, I understand. A lot of people are like, and I used to be at one point, hey, we don't need documentation because, you know, everything is codified, but you still need it, right. And talking about documentation, right, quality documentation, making it available where, let's say there is an error, you provide a link that can take a person who's actually looking at it directly

09:05

to that. And also it's providing a way to keep it updated, right. So I can draw a diagram or write documentation, and then, you know, it's out of date, and it's not going to be useful because it's actually giving you wrong information. So yeah, I prefer, obviously, naming things better in your code, right, because that tells you what it is. But definitely have some documentation that's external to the code that will get you started. It could even be like, hey, here's

09:41

an index of things that you look at. This is how you onboard a team member or a customer or a user of your system. Also look at things like runbooks, right, that help you look at specific scenarios around troubleshooting and things like that. Another important aspect about documentation is, not everything is automated, and you actually don't need to automate everything right away, right. But you hit that second or third time you're trying to do the

10:11

same thing. Maybe document it first if things are not automated, and then use that runbook or what have you to actually automate things later, right, because that way there is a progression into getting to that automated state. Yeah, I think that's a really good point, everything doesn't have to be automated, because some things, like in my experience, one of the

10:35

common scenarios I've hit over and over again. I work a lot with early stage startups, and so we'll build stuff and then that will fail and it just gets scrapped and never comes back to life. And so when I think about automating things, you know, I try to automate the things that are going to survive into the future, and it seems to save a lot of

10:58

time. And I know there's a tendency to automate everything, but for some things that are just going to fail fast and then disappear and never be brought back to life again, you can actually move a lot faster if you don't try to automate those. Yeah, I mean exactly. Especially, let's say there is an error in one of your environments, and then it's like fixing or troubleshooting, right, even documenting those steps and then

11:26

eventually using that documentation to actually have a fix. Right. So whenever you talk about environments as code, is there a right size of team or environment where it starts to make sense to implement this? Yeah, I think if you have a very simple setup, it doesn't make sense to start complicating things, because obviously with this

11:50

also comes a little bit of complication. So yeah, I mean I think I don't have a number for that, but I would say it's like a typical setup where you have various layers of your environments, right, so your networking layer, your platform layer, your application infrastructure layer, right like databases

12:11

and things like that. Once you get into those kinds of things, and your setup is not as straightforward as, let's say, I am just provisioning an EC2 instance or just, you know, using Lambdas

12:26

or what have you, right, which is very straightforward. Yeah, I would start thinking about it. Because the other way to look at it as well is that, hey, let's start with writing these pipelines, right, that provision these to start with, and then once you start feeling that pain of adding the third component or the fourth, and I was like, okay,

12:48

now this is getting, like, how do we tear it down? Then start looking. As with anything in software, right, don't overcomplicate to start with. But once you start feeling that pain, think about a way to abstract it, which makes sense for provisioning the entire environment. I think we can come up with a new buzzword there. Instead of test-driven development, we can call it pain-driven development. I love it. Yeah, that's awesome. It seems to

13:15

be my big motivator. Yeah, so you've got some thoughts here on testing. What's your high-level overview of the type of testing that you need to have wrapped around this? Yeah, so testing for infrastructure as code or environment as code, I mean, thinking about the test pyramid that we keep talking about when we talk about application testing, right, so kind of a similar idea, but obviously it's different, right. Like, writing

13:43

unit tests for your infrastructure as code is going to be complex. I mean, there are tools that allow you to; whether you need it is another question, right. So yeah, let's kind of talk about a

13:58

typical test pyramid for something like this. I start with, at the bottom, do static analysis, right. So if you're doing Terraform, do things like terraform validate, you have lint, or maybe you're using Puppet or what have you. All of these tools have some static analysis tool.

14:18

When you get into unit testing, it really depends. What I tend to suggest is, if it's something like Terraform that is declarative, that actually allows you to define the desired state, and it's Terraform's responsibility to actually get you to that desired state, versus more like your conditionals and things like that, where you need to write some complex unit tests. I would say you don't need unit tests when you're talking about declarative stuff because you are

14:50

not writing that logic to actually make something happen. You're just defining a desired state there. And then it's the tool's job, right, so if there's a failure within the tool itself, it'll tell you that, hey, it can't get you to that desired state. So in most of the cases, you don't have that complex logic, right. But then if you move up this pyramid, you talk about integration tests, right, which make sense in

15:18

this case. It could be, in your provisioning pipeline or whatever you're using, you can bring up an ephemeral environment, a temporary environment, and then run some

15:28

tests against that. So let's say you need a setup with networking and some platform like EKS and what have you, you can run some basic integration tests to see if those resources have the right things, for example, running a check to make sure, like, hey, none of your S3 buckets are public. So you do those things ahead of time rather than when you actually provision that production environment, right, so you don't want to be finding

15:54

those issues late. Right. And then the last one there, which I have found useful, is, the best way to test your infrastructure or environment that you have provisioned is by deploying a dummy app. So let's say you're provisioning a Kubernetes cluster. You can run one simple dummy app that is close to what you would be typically running and then run some smoke tests just to see, like, you know, if you can get there.
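As a rough illustration of the integration-level check mentioned above (making sure no S3 bucket is public before anything reaches production), here is a minimal sketch. It assumes the pipeline has already collected a plain inventory of the ephemeral environment; the inventory shape, the `public_access` field, and the function names are hypothetical, and a real check would read this information from the cloud provider.

```python
def find_public_buckets(buckets):
    """Return the sorted names of buckets flagged as publicly accessible."""
    return sorted(b["name"] for b in buckets if b.get("public_access", False))


def assert_no_public_buckets(buckets):
    """Fail the pipeline step if any bucket in the inventory is public."""
    offenders = find_public_buckets(buckets)
    if offenders:
        raise AssertionError(f"public buckets found: {offenders}")


# Hypothetical inventory gathered from an ephemeral environment.
SAMPLE_INVENTORY = [
    {"name": "app-assets", "public_access": False},
    {"name": "logs", "public_access": True},
    {"name": "backups"},  # missing flag is treated as not public in this sketch
]

if __name__ == "__main__":
    print(find_public_buckets(SAMPLE_INVENTORY))
```

Run against a temporary environment, a check like this fails early and cheaply, which is the reason for placing it below the more expensive dummy-app smoke tests.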

16:22

So again, it's a test pyramid. So at the bottom, where you're doing static analysis and things like that, you'll have more of those tests, and as you go up the pyramid, you have fewer of these tests because they are more expensive and time consuming. Yeah, exactly. And I think one of the things you said in there that

16:41

I really want to call attention to is talking about unit tests. And with a declarative type framework like Terraform, there's really not a lot of value in unit tests because at that point you're testing, like, did the tool actually do its job? And a tool like Terraform already has tests to cover that, so you can save yourself a lot of time and effort by just acknowledging that the tool has tests showing that the tool is doing its job.

17:12

Right. But yeah, I mean if you're writing, let's say, Bash, where you have all this logic, then in that case that's different. Yeah. So how do you deal with baking security into the environment? Yeah, and that's a very important aspect, right, and a lot of people tend to say, like, hey, why do we need to care about security here? But yeah, I

17:34

mean I think a few different things to consider. One is, with these tools, right, since it's automated, you need to give these languages, tools or whatever is running access to your AWS, Azure or whatever you're using to actually provision these, right. So the easiest thing is give root access to everything. You can do everything, and then, you know, if it's compromised.

17:57

Right. So I think one of the things is using role-based access control, or trying to limit the surface area of attack, right, and looking at, if there are various layers of your infrastructure environment, maybe having different roles for those that give limited access to doing those things. Because especially when you have production running, one credential gets compromised or so, you can basically compromise everything, and I've had those issues that I've gone through with multiple customers,

18:33

so that'll just stop everything, right, like the whole thing is compromised, rather than saying, like, hey, there's limited access, so even if things are compromised, the surface area is limited. Another thing you can do is, if you use tools like HashiCorp Vault or, I think, even AWS Secrets Manager, they have a rotation policy, right, so you can have these dynamic credentials created that are then used. So again, it's time-based, they expire, right, so

19:06

you're not exposing it for a long time. But in general, you need secrets, right, when you are provisioning your environments or infrastructure. So in general, just basics, right, keep it in some kind of a good secrets management tool which stores that. And then also looking at things like security scanning, so basically running those as part of your provisioning process

19:33

early on, where you can spin up ephemeral environments. There are tools like CIS Benchmark, Amazon Inspector and things like that that allow you to run these security tests or scans to see if you have things exposed, right, like is your S3 bucket public somehow, or are your databases exposed, and things like that, and it'll also check for other vulnerabilities. So yeah, I mean these are some of the things. Writing some common security

20:03

tests, like using InSpec and things like that, also helps. It could be like, for any infrastructure environment you provision, we just run these kinds of common security tests to see that your network is secured, your resources are secure, that

20:19

would also be a good idea. Yeah, and I think that's one of the possibly hidden benefits to using this approach that you've got, the environment-based approach, because if you have this monolith infrastructure and you have security groups in there, you've got to think about this security group, say, okay, this security group needs access to MySQL and Elasticsearch and the Redis cache, and then there's an API server, and

20:48

there's all these different rules and exceptions and all that kind of stuff in there that creates this huge vulnerability footprint. But if you break it down like you're recommending here, where everything is just a component, then when you're building out that component, you can say, well, this component needs to talk to Postgres, so we'll open the Postgres port and you don't have to

21:11

open anything else up. And then when you tear it down, you know that you can safely tear down that security group because you know the one and only place that it was ever used. Yeah, exactly. And one of the things that I talk about when I talk about environment as code is

21:27

like blueprints or templates for your entire environment. Right, so if you build these blueprints using best practices, keeping security, compliance and other best practices in mind, essentially you can templatize it and then use that and say, like, hey, we did this and this is certified, you know, by our team, and then you can share it with other teams

21:55

within your org. So that gives you that, versus if you look at doing it using pipelines or other things, it's very difficult to, like, you know, create these templates for the entire environment. Yeah, because once you change the pipeline, everything downstream is now in an unknown state. Yeah. Plus these pipelines are not meant to run these things. They are more generic, and then you write all the logic to

22:23

make whatever happen, right, so it just gets complicated. So, speaking of the pipeline, whenever you do the infrastructure and build this out, what are the pros and cons of using a tool like Jenkins or CircleCI to deploy this out versus writing it and building it and deploying it

22:45

manually? Yeah, I mean it's a very important aspect, especially when, again, one thing I say, if you have a very simple setup and you're getting started, maybe you don't need it right away because, let's say, you're the only developer, you know, you have a very

23:00

small team. But as you grow and you look at provisioning and automating these things, having this shared automated execution environment, right, I call it, where things are set up and replicated, right, so the hey-it-works-on-my-machine problem, right, like, hey, I ran this, but oh, you had this version of Terraform or this version of that, right,

23:25

so you can avoid that. And basically I feel like using any of these pipelines is one way, right, to have this common setup that you use every time, right. So it's not your machine, it's something that has all the tools and right versions and things like that. That definitely gives you that. It also gives visibility, especially if you're running these tools a lot of times. You know, they manage state, and if you've run a lot of

23:55

times things locally, you have local state and stuff like that. Moving all that to a shared place helps, because then, you know, if I'm doing something or one of my team members is running something, we all go to the same place to do that. Also, there's another way, GitOps, which is more and more talked about recently. Essentially, how I see GitOps is, basically, you take your infrastructure as code and add a workflow to it, which is typically your PR process, which all the developers are used to. It's like, hey,

24:27

create a PR, someone approves it. Once it's approved and merged to your main branch, it automatically provisions that, so now you don't need to push a button or do anything, right. But it also has this flow, which I feel is a nice flow, of, like, you know, the PR process, right, so you can have people approve it, take a look, comment on it before you merge

24:52

it. And then I also feel that in GitOps there is this control loop logic, right, that talks about detecting drift and bringing the actual state back to the desired state if things go out of sync. So I feel like that GitOps way is another way of looking at it when you're running your infrastructure as code or environment as code, but using like the Shared Environment

Transcript source: Provided by creator in RSS feed
