#11 - FinOps

Jan 22, 2023 · 27 min · Season 1 · Ep. 11
Episode description

In this episode we talk about FinOps: what it is and, beyond the buzzword, what it means to be financially aware. What are good systems to put in place, metrics to track, tools to use, and concepts to adopt?

Trivy: https://www.aquasec.com/products/trivy/

Meir's blog: https://meirg.co.il
Omer's blog: https://omerxx.com
Telegram channel: https://t.me/espressops

Transcript

Hello everyone, and welcome to the 11th episode of DevOps Topics. Thank you to everyone who is here with us today. Hi everyone. So I guess, since it's a podcast, nobody really understands what we're doing; we're just looking around and looking at nothing. Yeah, there's no one here. Yes. Today we are going to talk about FinOps, financial ops, okay? "Fin" not as in a fish's fin, but "fin" for financial, okay?

So that's important, in case you weren't aware of it, okay? Good. So, Omer, the surprise question, like each time. We're talking about FinOps, so, apart from thinking about fish... I wonder what it's going to be. Yes, apart from thinking about a fish's fin now, because I, you know, implanted this idea in your head. Now all I'm thinking about is a fin. Okay.

So what's the first thing, the first "fin", that comes to your mind when you think about DevOps and FinOps? What's the first thing? Go. Yeah, okay. So the very first thing, the first connotation, would be the FinOps Foundation, because FinOps, like DevOps, is a cultural thing, a system we need to think about and develop. Sometimes it's also someone's title in the organization.

But the first thing I'm thinking of, intuitively, is the FinOps Foundation, which is a foundation kind of like the CNCF; you can more or less compare the two. It's all around everything FinOps: what it is, what the culture is, promoting the culture, promoting the tools that surround it, promoting, I don't know, certifications, and making things more institutional, let's put it like that.

So basically FinOps, at its base, like we said, is a kind of culture. It's the culture of being informed about how much you spend in the cloud, knowing how to handle it, knowing how to prepare for it, and putting some kind of system in place to manage the costs. That can include so many things. And as I'm speaking, I see you thinking in the back of your mind, and you probably have so much to say yourself.

I mean, if you were to approach a company, or just start a new role in a company, and they said: you know what, don't work on infra, all we need from you... like, we're at financial risk. I don't like to say "recession" because everyone is using the term too much, but we're kind of in a financial downturn lately, so companies, more often than not, tend to handle their costs rather than build new shiny infra, a product, or anything like that. They focus a lot on cost reduction, cost utilization, however you want to frame it, and they start looking at where they can save. So if you were to approach someone and they asked you, "Meir, take our bill down," what are you going to do? Do you have something in mind, a first place you'd rush into?

Actually, first I'm going to go into their bill and see which services they're using, and according to that I'll see the ones that cost the most, and then start attacking those, you know, focusing on them. But I've got to say, I didn't expect to get this kind of answer from you, because you gave me not only the thing that comes to your mind, but also what FinOps culture is and everything. If I say it in my own words, to me FinOps is saving money when I create infrastructure. That's it.

You know, I didn't expect to get such a deep and full answer, so thank you for that, because it will be easy for me to write down the subject for this one, since it's also about what FinOps is and the culture. All right. So I need to throw it back to you, because what you said, that's the core of it.

Of course, at what you do, you're very much aware: you know what you're doing, you know how to build infra, you know how to save money, you know how to, I don't know, pick the best-utilized resource for a specific task.

What happens when you're working in a larger organization, and you have a team, and you have developers, and those developers have managers, and a manager wants to set up something for his POC: "It's just five minutes in the cloud, that's it, I'm just running a script, it's nothing." You know those. And on the other hand you have the developers, who are sometimes building stuff around the application: "I just need a small SQS queue, it's just a little instance, that's all." You know those. So when you don't have a system in place, it's hard. So that's my question, basically: what do you do? How do you structure for that? Do you limit them, do you help them, do you put something in place? How do you handle it?

It depends on the use case, because usually I work with small companies, you know, not enterprise-grade companies. Usually I just ask the guys what they want to do, and according to that we start to design the infrastructure and think about what's best when it comes to utilization and costs. But most of the time, when customers ask me, "Can you please help lower our bill?", or even where I work right now, we just go to the costs, look at what costs the most, attack that, and see if something is off.

Sometimes you also get notifications from cloud resellers, I don't know if I can call them that, such as DoiT International. They can tell you: listen, we saw an anomaly in your AWS account, and CloudWatch is costing you way, way more. And suddenly you realize that some developer left the debug trace flag on for one of the services, and your CloudWatch Logs are costing something like $100 per day, just because of a single application.

So it really depends on how I want to attack it. Remember we talked about being preventive, you know, acting before something happens or after something happens? I think FinOps is also like that: do you want to prevent it, or attack it afterwards?
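The "we saw an anomaly" notification described above can be approximated with a simple trailing-window check. This is only a minimal sketch in Python: the daily figures, the 7-day window, and the threshold are all assumptions made up for illustration, not how any reseller or AWS service actually detects anomalies.

```python
from statistics import mean, stdev

def find_cost_anomalies(daily_costs, window=7, threshold=3.0):
    """Return (day_index, cost) pairs whose cost is far above the trailing average."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # Use at least 5% of the baseline as spread so a very flat history
        # does not turn every tiny wobble into an alert.
        limit = baseline + threshold * max(spread, 0.05 * baseline)
        if daily_costs[i] > limit:
            anomalies.append((i, daily_costs[i]))
    return anomalies

# Hypothetical daily CloudWatch Logs spend in USD; debug logging is left on from day 10.
cloudwatch_daily_usd = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.0, 4.2, 3.9, 97.5, 101.2]
spikes = find_cost_anomalies(cloudwatch_daily_usd)
print(spikes)
```

With this data the check flags the first day of the jump. Once the spike is inside the trailing window it inflates the baseline, so the following day is not flagged again; a real alerting setup would need to account for that.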

Yeah, so because you brought up a company's name, I have this window, since we operate in the same field. But I won't bring up Zesty as a product on its own; it's just a name for something that companies can do. So as part of your FinOps system... most companies probably don't call it a FinOps system, they just call it saving money, okay? FinOps is just this nice new buzzword. But when it comes to saving money on AWS, for example, one of the best things you can do for yourself is reservations.

And I'm saying reservations; if you're not familiar with the term, it's basically committing to a certain period of time during which you're going to use a certain resource, and then you get a large discount because you've committed. There are all kinds of schemes: maybe you pay it all upfront, maybe you pay in chunks every month, maybe you pay as you go. But as long as you're committed to one year or three years on Amazon, and it's probably the same on other cloud providers, you can save a lot of money. And if you're using a lot of EC2 instances, as most companies do, or maybe RDS clusters or instances, ElastiCache, any kind of long-running resource that can be provisioned and committed to, it's probably wise to plan yourself accordingly. Just another way of treating things.
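The trade-off behind a reservation can be worked through with a little arithmetic. The sketch below assumes a hypothetical on-demand rate of $0.10 per hour and a hypothetical 40% discount; real rates and discounts depend on the provider, instance type, term, and payment scheme.

```python
HOURS_PER_YEAR = 24 * 365

def on_demand_cost(hourly_rate, utilization):
    """Yearly cost when paying only for the hours actually used."""
    return hourly_rate * HOURS_PER_YEAR * utilization

def committed_cost(hourly_rate, discount):
    """Yearly cost of a commitment: every hour is billed, used or not."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR

def break_even_utilization(discount):
    """Fraction of the year the resource must run for the commitment to pay off."""
    return 1 - discount

rate, discount = 0.10, 0.40  # hypothetical numbers
print(f"committed for a year: ${committed_cost(rate, discount):.2f}")
for utilization in (1.0, 0.75, 0.5):
    print(f"on demand at {utilization:.0%} usage: ${on_demand_cost(rate, utilization):.2f}")
print(f"break-even usage: {break_even_utilization(discount):.0%}")
```

Under these assumptions the commitment wins only if the resource runs more than 60% of the time, which is why long-running resources are the natural candidates.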

Now, you mentioned something actually very important: you mentioned the bill itself. And I don't know if that's the intuitive way, "we're spending a lot of money, let's go see the bill", because that's usually what people do with everything, not just their cloud resources. But it's very, very important, especially on Amazon, because the cloud bill is where you'll see the entire thing, running and not running, the entire description of what's going on with your costs. It lets you open and collapse little areas, and you can see costs filtered by region, by a certain service, and then by a certain action. It's really helpful for understanding what's going on and where you're losing money. And I think people most often focus on the big money, right?

I'm focusing on the thousands of dollars, I'm focusing on the huge services that I use, which is not bad, it's very much good. But there are lots of what we call unused resources that you've probably left lying around. Maybe instances are lying in regions you don't use; maybe you have some kind of workload that someone set up just for testing, and they tell you, "No, I stopped the instance, I stopped it," but then you have EBS snapshots lying around that you pay for, maybe Elastic IPs, all kinds of resources that you may be paying for even if you're not using them.
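To make the idea of hunting for leftovers concrete, here is a small Python sketch that works on an already-exported inventory. The `Resource` record, the IDs, and the monthly prices are all hypothetical; a real scanner would pull this data from the cloud provider's APIs, and snapshots would need their own rule since they are never "attached".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Resource:
    resource_id: str
    kind: str                    # e.g. "elastic_ip" or "ebs_volume"
    region: str
    attached_to: Optional[str]   # instance ID, or None if nothing uses it
    monthly_usd: float           # hypothetical monthly charge

def find_unattached(resources):
    """Resources that nothing is attached to are the first candidates for cleanup."""
    return [r for r in resources if r.attached_to is None]

def waste_by_kind(resources):
    """Sum the monthly cost of unattached resources per resource kind."""
    totals = {}
    for r in find_unattached(resources):
        totals[r.kind] = totals.get(r.kind, 0.0) + r.monthly_usd
    return totals

inventory = [
    Resource("eip-1", "elastic_ip", "us-east-1", "i-0abc", 0.0),
    Resource("eip-2", "elastic_ip", "eu-west-1", None, 3.6),
    Resource("vol-1", "ebs_volume", "us-east-1", "i-0abc", 8.0),
    Resource("vol-2", "ebs_volume", "ap-south-1", None, 40.0),
    Resource("vol-3", "ebs_volume", "ap-south-1", None, 12.5),
]
idle = find_unattached(inventory)
waste = waste_by_kind(inventory)
print([r.resource_id for r in idle], waste)
```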

We won't get into the details of each and every one. It's probably wise to run some kind of scanner, and I don't think you need to build it on your own. Maybe we can add some in the description; there are tons of open source tools, mainly geared towards specific technologies. When I say that I mean, for example, there's one scanner that you can attach to your Terraform pull requests. Whenever you have a pull request that is going to be merged into Terraform, and preferably only then is it going to run and deploy to production, you can see the actual bill... not the bill, I'm sorry, the actual cost that the resources you're going to provision or delete are going to cost you further down the line.
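The episode does not name the pull-request scanner; Infracost is one open source tool that works this way. The Python sketch below only illustrates the underlying idea of turning planned changes into a monthly cost delta. The price table and resource keys are invented for the example and are not real prices or a real tool's data model.

```python
# Hypothetical monthly prices in USD; a real tool looks these up from the provider's price list.
MONTHLY_PRICE_USD = {
    "aws_instance.m5.large": 70.0,
    "aws_db_instance.db.t3.medium": 50.0,
    "aws_nat_gateway": 33.0,
}

def monthly_cost_delta(planned_changes):
    """planned_changes: list of (action, resource_key), action is 'create' or 'delete'."""
    delta = 0.0
    for action, resource_key in planned_changes:
        price = MONTHLY_PRICE_USD[resource_key]
        if action == "create":
            delta += price
        elif action == "delete":
            delta -= price
        else:
            raise ValueError(f"unsupported action: {action}")
    return delta

def pull_request_comment(planned_changes):
    """Format the delta as a one-line comment a CI job could post on the pull request."""
    delta = monthly_cost_delta(planned_changes)
    sign = "+" if delta >= 0 else "-"
    return f"Estimated monthly cost change: {sign}${abs(delta):,.2f}"

changes = [
    ("create", "aws_instance.m5.large"),
    ("create", "aws_instance.m5.large"),
    ("create", "aws_nat_gateway"),
    ("delete", "aws_db_instance.db.t3.medium"),
]
comment = pull_request_comment(changes)
print(comment)
```

Showing the number at review time is the preventive side of FinOps: the cost is visible before anything is provisioned.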

So that's one thing. Another thing is an operator you can install, I think Kubecost, and there are other open source ones. You can install these into your Kubernetes clusters, even if they're multi-cloud or multi-region, wherever you run, and you get a summary or management layer on top of your cluster. Something that I do... you know how in companies, I don't know if you do it, you have some way to present metrics to developers, like actual graphs of CPU, memory, what's going on in the cluster? Yeah. So cool, we use Grafana, I guess a lot of companies do, and in addition to those dashboards I have, not a FinOps dashboard, but a cost dashboard. It shows you how much we've paid this month: you have this rising graph, filtered by service, and we have a daily graph, of course. That's just one of the dashboards, and sometimes it's presented on the screen at the company. So that's what we do, and that keeps us a little bit aware of what's going on.

That's it. I think we can discuss more tools, but the most important part is just being aware that it's there. Of course you're aware, you're paying for it, but as a developer and as an engineer you don't always understand what's going on. Even I am sometimes really surprised by what's in the bill. So I find myself going to the bill, like you mentioned, and then I go to Cost Explorer. By the way, on AWS that's a super powerful tool. Cost Explorer is an interactive part of the AWS console that lets you look, at fine grain, at the services you use, what you're paying for, and why you're paying for it. You can filter by service, by region, by action, by whatever you want, so that's very useful as well.
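The kind of slicing described here, by service, by region, or by both, is just a group-and-sum over billing records. The Python sketch below runs on a handful of made-up records (the field names and amounts are assumptions) rather than calling the real Cost Explorer API.

```python
from collections import defaultdict

def total_by(records, *dimensions):
    """Sum the 'usd' field grouped by the given dimensions, largest total first."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record["usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical billing line items.
records = [
    {"service": "EC2", "region": "us-east-1", "usd": 1200.0},
    {"service": "EC2", "region": "eu-west-1", "usd": 300.0},
    {"service": "CloudWatch", "region": "us-east-1", "usd": 450.0},
    {"service": "S3", "region": "us-east-1", "usd": 80.0},
    {"service": "EC2", "region": "ap-south-1", "usd": 25.0},
]
by_service = total_by(records, "service")
by_service_region = total_by(records, "service", "region")
print(by_service)
print(by_service_region)
```

Grouping by service shows where the big money goes; adding the region dimension also surfaces the small stray items, such as the EC2 spend in a region nobody thought was in use.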

What do you think? Anything else?

I just want to focus on cool things that happened to you when it comes to FinOps. So I'll give three examples of mine, and then you give three examples of yours, or maybe fewer, okay?

"Cool" wouldn't be the best word for this. "Cool" for paying money for nothing? That's cool? "Cool" here is for pain, pain, pain, and pain.

Okay, so three things that happened to me when it comes to FinOps and that I was sad about. The first one, I told you: CloudWatch. You forget this debug tracing, whatever, and then suddenly you realize your bill is going crazy, okay?

Second thing, you won't believe me: Transit Gateway. You know, we've all enabled it, right? Yeah. So you just pay for nothing if you have a VPC attachment. If you don't really use that VPC and you attach it to a transit gateway, you pay for that attachment each month. And if you don't really use it, why would you attach the VPC at all? It's not big money, but it's a waste, it's for nothing. So that's the second thing I can think of that happened to me recently.

And the third one is, you know, those S3 rules, the lifecycle policy rules where you move stuff to Deep Archive? Yeah, this is crazy, okay? Suddenly you can get a $500 bill because you moved terabytes or gigabytes of data from standard storage to Deep Archive. If you know it's going to happen, if it's on your schedule, that's fine. But if it's a sudden move and you suddenly see $500 or $1,000 just for moving stuff, then you need to think.

Wait, that was just for moving stuff?

Yeah, because if you move tons of data, like 20 terabytes or something like that, from standard storage to Deep Archive, what you actually pay for is the requests, the PUTs. It's exactly like with Intelligent-Tiering: they tell you that if you make tons of requests when you do this move, it's going to cost you tons of money. So you need to think about how and when you do it.

I actually don't think I knew that. So if I have a bucket with a hundred files and I want to move that bucket to a lower tier, that would actually mean running PUT a hundred times?

Yeah, but let's talk about hundreds of thousands, okay? I think it was something like half a million objects, or even more. When you're talking about those numbers, yeah, it's going to cost you, because it's charged like PUT requests. Those are the transfer costs.

My professional response to that...

I'm just saying it's okay if it happens, because eventually... okay, so this month it cost you maybe $500, but in the following months it's going to cost you way, way less, because the storage itself is going to cost you maybe 25%, you know, like a 75% discount or something like that. So that's okay, as long as you can predict it. I'm also, and I'm just bringing up another topic here, okay with having a large bill if I know that it's going to happen. I just don't like paying for stuff that I'm not aware of, where I'm like, "Whoa, why am I paying for that?"

You know, you brought something to my mind. We talked about reservations a minute ago. Most of the time, people just reserve things. They plan for a year or three years, depending on the period. And then let's say they committed to the middle scheme, where you pay a lot upfront in a big chunk and then you keep paying on the first of every month. The first time we did that, we paid a huge chunk, which we expected. And then we started getting these huge bills on the first day of every month, and our forecast for the end of the month was huge. It was like we were going to pay tens of thousands of dollars instead of something like 900. I'm obviously changing the numbers, but we were saying, "What's going on?" And that's part of it. You need to be aware. You need to be able to predict, because if you chose a certain scheme, you have to be able to plan for it financially ahead of time. And when I say that, you don't have to have a finance background. All that said, in some companies, if you're large enough... I mean, FinOps is a title these days. There's literally either a FinOps group or a FinOps person, a FinOps engineer; that's what they do, and they handle the FinOps aspects of the cloud.

Well, I want to ask you about the commitment. So yeah, it's hard to commit, okay? I have my own theory, and I'm not going to say it until you say yours. So there are development resources and there are production resources. We can divide them into prod and non-prod, right? Now my question to you is: which ones do you commit to? Maybe you commit to both, or maybe you commit only to development, or only to production. I want to know your strategy when it comes to committing to development resources versus production resources.

Okay. So what you're doing now is throwing two curveballs, and I'll say why. First of all, before going to the product: you said something really smart. Are you doing it for production? Are you doing it for staging? And that brought to my mind the importance of tags on every resource you put up. Tags are good for scanning the environment and understanding what belongs to what, but they're also good for financial management, if you tag everything not only by environment but also by product. A lot of times, you won't expect it, but the CTO will suddenly come and ask you: if we onboard a new customer, how much does it cost? How much do we pay for this product? How much do we pay for the back end of that product? And you have no idea. But if you put tags in place, just because that's the common practice, and you go to Cost Explorer on AWS, or the equivalent on any other platform, and filter by tags, you can actually say what's costing you what per day.

I think there are also cost allocation tags, something like that, in AWS. I don't know why it comes to mind. You'll handle those.

Yeah, yeah. I don't remember exactly, but yeah, it sounds very familiar. So tags are very important.

Back to the core part of your question: we do it for everything, because that's what Zesty does. That's the product. We can commit over everything. We don't do it at 100%, but it will be 90 and up, and we can commit to this because if we decide we don't want to use them, we have solutions for resources that are unutilized. We can take the commitments...

So put on your consultant brain right now, not your Zesty mind. When you go to a startup and they tell you, listen, we've got these workloads in development and these workloads in production... I mean, it's a startup. You don't know how they're going to grow and when a new customer will arrive, right? Right. So how would you commit to that? How would you commit to resources for that?

Would you buy a savings plan and say, let's commit to 50% of what you have now, and when the time comes maybe we can increase that? Or would you initially say, let's commit to 70% of what you have now? Do you have a strategy for that?

I can't say I do. Honestly, I can't, because it very much depends on so many variables. Every time, as a consultant, what I had to do was sit with the CTO or the VP R&D, and we had to guess. It's always based on guesstimations you make according to your infrastructure. Yeah. It wouldn't be different between production and staging, because as long as you keep the infra in place, it doesn't really matter. Correct, staging is usually not as long-living as production, and you may change it and you may be playing games with it. But staging, I think, is often very much like production, and it's going to live as long as production lives, because you need it. It's there; they come together. So I think most of my reservations would be for both, and they would be for the core services. I would make an assumption about what kind of compute resources we're going to use and put that in place. There are all kinds of additional variables with reservations. For example, you can buy convertible RIs, where you save a little less but have the flexibility of changing the instance family later on. Maybe Amazon releases a new instance type, maybe you want to change it because of business requirements.

There's flexibility: there are plans for compute and stuff, so you don't really have to commit to anything specific, because you don't really know what's going to happen, you know.

Right. Yeah, exactly. So that's the next variable. You have all kinds of ways to play with it; there are too many to count here.

But I do have a rule of thumb. I don't mind committing for development stuff. When I say development, I also include staging.

So let's say I don't mind committing for my internal Kubernetes cluster. I don't mind committing for my CI/CD runner workloads, as long as I don't commit to all of them. So let's say all my non-production workloads cost, on average, $10,000 per month. I hope not, but let's just say that. I would commit to something like $6,000 per month for development. But for production I feel less comfortable committing, because I don't know how the workload is going to grow, or even shrink. If you get a new customer, you're going to get tons of their end customers and whatever, and then the bill is going to grow like hell, and then you'll want to commit to more once you have a steady load and stuff. But it might also happen, especially nowadays, that you kill a product, and you've committed upfront to a lot of money, and then you don't really need that workload. So for production workloads I feel less comfortable committing, unless it's something like 20% or 30% of them. But for development workloads my guesstimations are better. For production, I don't know whether product and sales are going to kill a product or bring new customers. But for development workloads I'm more aware of what's happening, so I feel more comfortable committing to them. So that's my rule of thumb.

I'm smiling because it's the complete opposite of my thinking, and it made me think.
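The rule of thumb above can be written down as a small calculation. In this Python sketch the $10,000 non-production figure and the 60% and 20-30% ratios come from the conversation; the production spend of $25,000 and the "product killed" scenario are hypothetical numbers added only to show what an over-commitment looks like.

```python
def plan_commitment(monthly_spend, coverage):
    """Monthly commitment per environment: current spend times the coverage ratio."""
    return {env: monthly_spend[env] * coverage[env] for env in monthly_spend}

def unused_commitment(committed, actual_spend):
    """What you still pay for but no longer use if the workload shrinks."""
    return max(0.0, committed - actual_spend)

monthly_spend = {"non_prod": 10_000.0, "prod": 25_000.0}  # prod figure is hypothetical
coverage = {"non_prod": 0.60, "prod": 0.25}               # 60% non-prod, 20-30% prod

plan = plan_commitment(monthly_spend, coverage)
print(plan)

# Hypothetical scenario: a product is killed and production spend drops to $5,000.
print(unused_commitment(plan["prod"], 5_000.0))
# For comparison, the waste if production had been committed at 90% instead.
print(unused_commitment(monthly_spend["prod"] * 0.90, 5_000.0))
```

The lower the coverage on the less predictable environment, the smaller the bill for capacity nobody uses when the business changes direction.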

Good, good. It makes a lot of sense. Yeah, I didn't know. So I need to respond to that. So, okay, you've heard my opinion. I want to fight with you now. That's why we're here; we have two different opinions. I want to know, why no fight? I'm here to let you know, I'm here to fight. I need to process that. It's very, very interesting. Okay, fine. Okay, we fight. Okay. No, it's very interesting. It's super interesting, what you said.

And again, I think that's one of the things that depends on the company you work for. Because if you're a company with many products, one of them may fade, or one of them may look like it's going to shoot up and take the entire company with it. Or maybe you work at a traditional company that has one product, and wherever it goes, the company goes with it, and then there's no point in doing that. But it is very important to keep yourself aware of what's going on. So I think that's the essence of what you said. That's the core: understanding your company, its needs, the product, where it's going, and where it's wise to commit, regardless of whether that's staging or production. Although what you said makes a lot of sense to me, specifically in my context.

So I think that's a summary of what just happened. Make sure you are aware of the business before you commit to anything, because sometimes the product can be growing and you don't really know it, or shrinking and you don't really know it. That's the best thing to do, because you can commit to anything, you know.

Exactly. And you can keep yourself aware with the things we mentioned at the beginning: dashboards with metrics of what's going on, presenting things to the developers so that they're aware, maybe presenting the costs in the pull requests, maybe putting an open source operator in Kubernetes, or on AWS or GCP, to show you daily metrics of what's going on as far as costs go.

Okay. So much for FinOps. So now it's the corner. I don't even know how to say it in English, but let's call it "the corner", because it's funny to translate it directly from Hebrew to English. So let's move to the corner. Okay. In the corner we talk about cool experiences we had this week. I'll start with mine. It's super short. You know, two weeks ago it was "Yay, I built Wasm". You remember, C++ to Wasm, in the corner?

Oh yeah, WebAssembly, exactly. So this week it was "Yay, also for Android." Okay. So that's like, woo-hoo. That just happened. Okay. So yeah, now you. All right. For me it's mostly tools and experiences, although they're kind of intertwined. A cool tool I learned about from a very good engineer friend of mine, who's probably listening right now: the tool is called Trivy. Trivy is an open source security scanner by Aqua, I think.

Super cool. Why I like this tool so much: first of all, it's open source, and you can run it locally without any API keys or anything whatsoever. It can scan images. It can scan, sorry, your AWS account, depending on the service. It can scan Kubernetes clusters, but it can also run as an operator installed in the cluster, reporting security issues. And there are tons of other, how do you call them, sub-scanners in the tool. Trivy, it's T-R-I-V-Y. Okay.
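As a rough illustration of wiring a scanner like this into a pipeline, the Python sketch below counts findings by severity in a JSON report and decides whether the build should fail. It assumes a report shaped like Trivy's JSON output (for example from `trivy image --format json`), with a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field. The sample report is hand-written with placeholder IDs, and the blocking rule is an assumption, not something from the episode.

```python
import json
from collections import Counter

# Hand-written sample in the assumed report shape; IDs are placeholders.
SAMPLE_REPORT = """
{
  "Results": [
    {"Target": "app-image (alpine)", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
      {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
    ]},
    {"Target": "requirements.txt", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0003", "Severity": "CRITICAL"}
    ]},
    {"Target": "package-lock.json"}
  ]
}
"""

def count_severities(report):
    """Count findings per severity; targets without findings are skipped."""
    counts = Counter()
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts

def should_fail_build(counts, blocking=("CRITICAL", "HIGH")):
    """Assumed policy: any finding at a blocking severity fails the pipeline."""
    return any(counts[level] > 0 for level in blocking)

counts = count_severities(json.loads(SAMPLE_REPORT))
print(dict(counts), should_fail_build(counts))
```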

Right. Really cool. I suggest you go check it out. It can scan so many things and it's so easy to implement. So I started incorporating it in our CI pipelines, starting off with just containers, but I think we'll take it forward to anything else you can scan. So that's mine for this week. Okay. So that's it for today. Thank you for listening. Thank you, Gamova. Thank you, everyone. Okay. I'll see you next week, amigo. And thanks to everyone in the crowd. Thank you, Meir, for being here. And bye. Bye. Bye.

Transcript source: Provided by creator in RSS feed