
S5E30 - Azure Container Storage - Native container volume management in Azure

Aug 23, 2024 | 39 min | Season 5 | Ep. 30

Episode description

This week Alan and Sam discuss Azure Container Storage, a volume management service built natively for containers that enables cost-effective performance scaling and simplified management of volumes for stateful container applications.

  • What are containers and their storage requirements?
  • Why Azure Container Storage is useful
  • How do you create storage? What are the configuration options?
  • How much does the service cost?

What did you think of this episode? Give us some feedback via our contact form, or leave us a voice message in the bottom right corner of our site.


Transcript

Hello and welcome to the Let's Talk Azure podcast with your hosts Sam Foote and Alan Armstrong. We're a pair of Azure and Microsoft 365 focused IT security professionals. It's episode 30 of season five. Alan and I recently had a discussion around Azure Container Storage, which is a volume management service built natively in Azure for containers, which enables cost-effective performance scaling and simplified management of volumes for stateful container applications. We cover: what are containers and their storage requirements? Why is Azure Container Storage useful? How do you create storage? What are the configuration options? And how much does the service cost? We've noticed that a large number of you aren't subscribed. If you do enjoy our podcast, please do consider subscribing. It would mean a lot to us for you to show your support to the show. It's a really great episode, so let's jump in. Hey Alan, how are you doing this week?

Hey Sam, not doing too bad, how about you? Yeah, not too bad. Thank you. I am off on my mandatory, no, not mandatory, just very common August holiday. It seems like everybody sort of rotates through August taking time off, but yeah, no, it's good to have some downtime. How about you? Yeah, it's been, been an okay week so far. It's been quiet. Seems to be that when either of us are off, it's quiet for the other person. It's like we're back. It's not like it's one way or the other sort of thing.

I see what you're saying.

Yeah, it's like on our own we're quiet either side, but when we're together it's busy. So yeah, it's been a busy week. I don't think there's been anything come out that I've been aware of. There might have been a few things; can't think off the top of my head what they are, but I did take a quick look at what's new in my area, cuz I know that's a couple of weeks away. There's a couple of interesting things out there. So yeah. And this is our 30th episode of this season.

Wow. It's the longest running. That is. Yeah. That is quite scary. Really. How many have we missed? How many weeks have we missed this year? It's been two, maybe two. Yes. Yeah. So yeah, considering we used to do. Was it 20 episode blocks we used to do. Yeah, I think we had a week or two week break. So I suppose we've kind of done that by it. Not by accident, but just due to. Yeah, yeah. Life and things and illness and technical. Issues every now and again and yeah.

That was it, illness and technical issues, but apart from that we've done every week. So yeah, let's see if we can continue on to the end of the year. I was just trying to quickly work out in my head how many more there would be. We're midway through August, so what, September, October, November, December; four months, four episodes each, 16 episodes. Looking at like 46 out of 52 weeks. Yeah, it's not bad. Yeah, that's not bad at all. That's crazy, isn't it.

But yeah. So what we doing this week then, Sam? We are looking at Azure container storage. It's a relatively new sort of add on to the Kubernetes service. I think the reason why I sort of prioritize this one is because volumes and volume management can be kind of tricky for sort of containerized applications. So I think this is a really good addition from Microsoft. So I thought we should probably. Yeah. Prioritize taking a look at it.

Okay. And is this a bit of an add-on to your storage account episode, is that right? Or is it slightly different? I can't quite remember. I thought it was one of the options that you had for storage accounts. But anyway, I suppose we'll talk about it in a minute. Yes, yes, we will. Okay. Okay, so let's get started. So I guess, you know, should we start with containers and, you know, their storage requirements?

Yeah. So let's zoom out a minute and just talk about the problem, or the challenge, I suppose, that we have. So a container is essentially very lightweight. You can think of it almost as a lightweight virtual machine; how abstracted away the kernel space is is sort of what denotes whether something is a virtual machine or a container. Containers have the benefit of increased isolation and security. They're also more lightweight because they share more host resources, which means essentially better scaling than virtual machines. They can be started, shut down, rebuilt almost instantaneously, so you don't have a lot of the weight associated with a virtual machine. So what we generally tend to see nowadays is more applications being containerized, I would say. And as applications have got more complex, or at least the orchestration of how applications function has got more complex, new container orchestration tools have had to be made. And that is because we're not just using one or two or maybe three containers for an application. Some large organizations and applications can use hundreds if not thousands of containers, and orchestrating how they interconnect can be very challenging. So probably the standard in this space is Kubernetes, or K8s for short. I don't know who the main contributors are now, I assume it's still Google, but born out of Google's internal orchestration needs, Kubernetes has been open sourced and maintained in public for a long time now. So if you've got a virtual machine, that virtual machine will have a disk attached to it. And usually that is a virtual disk: a file that sits on top of another machine's operating system that you, what's called, mount into the virtual machine. You can also do pass-through of devices.
So you can pass a physical device, an actual hard drive, NVMe storage, X, Y and Z, into a virtual machine. So there's a few different ways you can approach that. Containers have the concept of volumes. Volumes are essentially an amount of space that is allocated to a container for it to store data in. This might be things like your database files, any blob storage that you might need. Your application code is usually in your image, but other artifacts that your application might need, it might log to a volume, XYZ. So applications generally require some level of local storage. Now, when you run a container, because you can think about it like a very lightweight virtual machine, you can actually write files into that container while it's running. But if your container is rebuilt, say you deploy a new version of your software, your changes are going to be overwritten by the new image that comes down and builds your container. So that's not really a safe space to write files to. In theory you could write caching files, or files that you could lose and regenerate, but generally you're going to want some level of storage with it. In previous versions of Kubernetes there were a lot of different storage, what's the best way of putting it, storage connectors built into Kubernetes. You could connect to things like SMB shares, local volumes, and various other mechanisms. Kubernetes has been slowly deprecating and stripping a lot of that functionality out of the core build of Kubernetes. The reason for that is to make the whole platform lighter and more manageable. So a lot of that storage functionality is now in community packages, essentially, to bring those in. And volume management is a challenge, because volumes are sort of abstracted within Kubernetes itself.
Generally, volumes aren't just a folder on the host operating system; a volume is like a sandboxed, packaged environment where you can store data. And that can be a challenge to orchestrate against multiple containers. What if two containers need access to the same volume and they're on different hosts? That could be challenging, because one host could be in the UK and another host could be in US East, as an example. How do you orchestrate two containers talking to the same volume? How do you back them up? What's your disaster recovery strategy? All of those things you've got to think about as well. And it can be a challenge with volumes for things like SQLite databases that need certain read-write attributes and access. So for instance, a strategy might be to connect a container to an SMB share or an NFS share, but that won't work with something like SQLite, which can't write to network locations correctly. You do need some element of local storage in certain scenarios, and you may need performance as well, which is why locally managed volumes can be really important.
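To make the volume idea here concrete: in Kubernetes, a container asks for persistent storage through a PersistentVolumeClaim and then mounts it into the pod. As a rough sketch, with illustrative names that aren't from the episode, the manifest can be built as a plain Python dictionary and serialized:

```python
import json

# Illustrative PersistentVolumeClaim: the pod asks the cluster for 5 GiB
# of storage; a provisioner (e.g. Azure Container Storage) satisfies it.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},  # hypothetical claim name
    "spec": {
        "accessModes": ["ReadWriteOnce"],  # one node reads/writes at a time
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

# The pod template then mounts the claim, so data survives image rebuilds
# and container restarts, unlike files written into the container itself.
pod_volume = {
    "name": "data",
    "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
}

print(json.dumps(pvc, indent=2))
```

The point is the separation: the image is disposable, the claim is not, which is exactly why rebuilding a container doesn't lose the database files living on the volume.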

Yeah. Okay, so in effect, just to recap a little bit: containers are, in effect, lightweight virtual machines, but very, very cut down. It's all built from an image that you generate, and your code is executed in that environment. And in effect, if you wanted to save any data, which could be config, it could be, like you said, database files, things like that, you need some equivalent of external storage, be it local, SMB, etcetera, or access to the local disk somehow, to be able to store that information. So that, like you said, when you rebuild your image or you have to restart it or anything like that, then that data is kept. It's persistent storage, in effect, at that point. Is that fair to say, in very simple terms?

Yeah, yeah, definitely. Okay, cool. Okay, so I guess, you know, Azure Container Storage, what is that then?

Okay, so what we essentially want for our containers is what's called block-level storage, essentially mimicking, as closely as possible, a real hard drive, SSD or NVMe drive, both in the way that you access and write to it, but also for performance and latency reasons. And at scale that can be a real challenge; you essentially have to manage all of that infrastructure yourself. So Azure Container Storage, I believe, is a wrapper around an open source tool called OpenEBS, open elastic block storage, I believe, is the name, which is essentially a storage provisioner that runs inside of Kubernetes and allows you to connect to different types of storage mechanisms. And I believe that's been driven by the moving out of those storage providers from the Kubernetes core. Now, you could go ahead and use OpenEBS to build your own storage solution in Azure, I suppose, or even on-prem. But Azure Container Storage is going to take a lot of that complexity away for you. And this does tie back into our previous conversation around Azure storage accounts, because some of the methods of storing this are actually linked into the storage account. So you can connect your containers to local NVMe storage for essentially super high performance, also local SSD. We'll talk about pricing later on, but the pricing is pretty interesting, the way that they've approached it. You can also connect it to Azure Disks, which is going to get you an element of replication there, or a more well supported replication strategy, because Azure Disks is going to take a lot of that away from you. And Azure Elastic SAN is also supported; I believe that's to get you even more levels of scalability, essentially.
So there are two things I'm going to mainly focus on because I think they're important, which are storage pool expansion and resizing, replication, and zone-redundant storage. Sorry, three, apologies. So let's talk about storage pool expansion. In Azure Container Storage there is the concept of a storage pool. A storage pool essentially holds a number of volumes inside of it, and that storage pool is resizable. You can allocate more storage to it as and when you need that sort of hyperscale; when we get to pricing, we'll talk about some of the size limitations they're talking about here. I think that's really important, because resizing of data stores and their underlying infrastructure can be a real pain for organizations. Then there's replicating between machines or availability zones, with zone-redundant storage, which is also supported. I believe the best level of replication is with Azure Disks, because you've got that remote blob storage element we talked about in our last episode, about how files are replicated between availability zones and potentially even regions. So those three elements are a big focus for this, and they can be really hard to manage. So, you know, why Azure Container Storage over rolling your own? You've got loads of different block storage offerings there; I've talked about four. They were previously only available for virtual machines, and now they're being made available for containers. So especially when you're talking about low latency, some workloads are very sensitive to latency, and the more traditional, relatively remote, I'll call it, object storage protocols that could be used could be pretty bad for latency-sensitive applications. So yeah, but you've also got the scale element there as well.
You effectively install this solution into Kubernetes and it's managed within the Kubernetes control plane, so there isn't a separate thing that you've got to learn to manage it. It hooks into Azure Kubernetes Service like any other sort of connection. And you don't need to pre-provision any sort of resource, even in Azure, so your total cost of ownership is a lot further reduced compared with making some sort of storage appliance; obviously on-prem is the more traditional one. But you've also got the scaling and consumption billing model from Azure: you essentially pay for what you use and no more than you actually need. What this can allow you to do is rapidly scale your pods out, because you have the scale of Azure sort of on your side. You can fast attach and detach persistent volumes, and you can really work on your pod lifecycle and build out in a really resilient way across Azure without having to build all of that yourself. So yeah, I think the real benefit is that it's natively integrated with Azure; it's essentially a first-party service wrapped around an open source service.

All right, okay. So as you said, it's not like a resource you deploy and then hook up to Kubernetes; in effect you install it into Kubernetes, what do you call it, the management sort of solution that sits on top of it, and that then gives you the ability to spin up the resources, is that right? To say what you want to attach and detach in Azure?

Yeah, that's basically it. You manage it in the Kubernetes control plane essentially like you would with any other connector. There's essentially an extension that runs. You do deploy it as an extension via Azure, but then it hooks into Kubernetes natively.

Okay. Yes, like you said, that is really good: if you're used to using Kubernetes and things like that, then it's all in the same place. You don't have to worry about jumping out into Azure to then manage that. Okay, so I guess you kind of talked about this a little bit already, but why would you use this, a native cloud storage service?

I think the main thing is, if you really want high-performance block-level storage, you're going to want to look at a system like this. Also, if you don't want the management overhead of dealing with these volumes at this type of scale, you're going to get a lot of benefits and creature comforts from outsourcing this problem to Microsoft. You could look at that in two ways; you could look at it in a sort of vendor lock-in way. The underlying technology is open source, so in theory I suppose you would be able to port out; that would be my guess, I can't 100% confirm that. So you do give up some flexibility from that standpoint. But as most of the time the challenge is resource and cost, these types of solutions can be very beneficial for those two challenges that organizations have.

Yeah, and I guess when you talk about vendor lock-in and things like that, from a container perspective that's not the case, is it? Because the container is just seeing a volume that Kubernetes is providing, kind of thing. Is that probably fair to say? It's more just around the Kubernetes management that's obviously potentially vendor-locking you into Azure.

Well, the Azure Kubernetes Service is essentially just Kubernetes with their load balancer and networking built in, essentially. Because containers are stateless, effectively, right? Best practice is just to deploy a container image, use it, and connect volumes if you want storage. So actually migrating Kubernetes to another provider, or any other containerization system for that matter, you're not super locked in from that perspective. I suppose you are from a load balancing and networking perspective, but that's no different than any other provider, because if you went to somebody like, I don't know, Hetzner or GCP or AWS, you'd need to use their load balancer for their networking anyway. So the volumes are really where the challenge is. And it would be interesting to see how locked in you are with this system, but it doesn't feel like you'd necessarily be really badly locked in, because the underlying technology is open source. You would just have to migrate the data to a different provider and probably install it and maybe manage it yourself.

Yeah. Okay. Okay, so what are the key features we should know about?

Okay, so I think volume snapshots are really important, which essentially allow you to, as the name suggests, snapshot a volume. So it's not a backup, but it is a point-in-time sort of restoration point for an actual volume itself. And you can also restore those volumes as well, so if you do want to take something offline, migrate, etcetera, you can use snapshotting and restoring. What's also good, that's natively supported, is expanding a persistent volume. So in Kubernetes you have the concept of a PVC, a persistent volume claim, essentially saying, hey, I want this much storage, I want 5 GB worth of storage as an example, and your storage provider should allocate you that amount. And once you hit that limit, you're artificially limited in Kubernetes, if that makes sense. You can expand volumes just like you can with other storage systems as well, which is really, really important. You can also clone volumes. I'm not 100% sure why you would clone a volume instead of just snapshotting and restoring it; I assume it's to actually make an in-place duplicate. You might use that for your DR strategy, potentially. There is also multi-zone redundancy, so you can create what's called a multi-zone storage pool, which, very much like an Azure storage account, when you use Azure disk storage you can make zone-redundant as well. So you could have either standard SSD zone-redundant storage or premium zone-redundant storage when you actually create your storage pool. So yeah, if you want literally zone-redundant storage, you can back off to that existing technology that's there and integrate it. I've talked about Azure Disks, Azure Elastic SAN, local NVMe, so those are the types of storage mechanisms that you're going to be connecting in. What was the other part that I just wanted to talk to you about and make you aware of?
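The snapshot-and-restore flow described here maps onto the Kubernetes CSI snapshot API. A rough sketch with illustrative names (the claim, snapshot and snapshot-class names are assumptions, not from the episode):

```python
import json

# Illustrative VolumeSnapshot (CSI snapshot API): a point-in-time copy of
# the data behind an existing claim, not a backup, but restorable later.
snapshot = {
    "apiVersion": "snapshot.storage.k8s.io/v1",
    "kind": "VolumeSnapshot",
    "metadata": {"name": "app-data-snap"},          # hypothetical name
    "spec": {
        "volumeSnapshotClassName": "my-snapclass",  # hypothetical class
        "source": {"persistentVolumeClaimName": "app-data"},
    },
}

# Restoring: a new PVC whose dataSource points back at the snapshot.
restore = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data-restored"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "5Gi"}},
        "dataSource": {
            "apiGroup": "snapshot.storage.k8s.io",
            "kind": "VolumeSnapshot",
            "name": "app-data-snap",
        },
    },
}

print(json.dumps(snapshot, indent=2))
```

Expanding a PVC is even simpler: you just raise the `storage` request on the existing claim, assuming the backing storage class allows volume expansion.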
Yeah, I just did want to talk about resource consumption, because in Kubernetes your resources for each container are very highly controlled. You can split CPU cores into, is it hundredths or thousandths, I can't remember what the increments are, but you can essentially subdivide virtual CPU cores into tiny, tiny amounts. And that is because you might have microservices that use like five megabytes of RAM; they might run once a day and you want to limit them to a quarter of a vCPU, as an example. So CPU and RAM, vCPUs and RAM, are controlled inside the Kubernetes cluster itself. So I just want to talk about the resource consumption of the different storage mechanisms. If you are using local NVMe storage, you should account for 25% of your CPU cores as a CPU core requirement for local NVMe storage. I think that's just because of the performance that's there and what the infrastructure is going to have to do to support that. You should also provision 1 GB of memory for that as well. If you're just using what they call temporary SSD storage, so just local SSD storage, you're looking at one CPU core and one gig of RAM. If you're using Azure Disks, you should also account for one CPU core and one gig of RAM. If you're utilizing Azure Elastic SAN, again that's not something that I've used or tested out, then you don't have any performance hit to Kubernetes itself; I believe that's all handled on the SAN side of the connection. So yeah, you just need to think about those types of requirements for each connection that you're making.
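Those per-option overheads can be summarized in a small sketch. The figures below are just the ones quoted in the episode, so treat them as indicative rather than authoritative:

```python
# Resource overhead per storage option, as quoted in the episode.
# CPU is either a fixed core count or a fraction of the node's cores.
OVERHEADS = {
    "local_nvme":    {"cpu_fraction": 0.25, "memory_gib": 1.0},
    "temporary_ssd": {"cpu_cores": 1.0,     "memory_gib": 1.0},
    "azure_disks":   {"cpu_cores": 1.0,     "memory_gib": 1.0},
    "elastic_san":   {"cpu_cores": 0.0,     "memory_gib": 0.0},  # handled SAN-side
}

def cores_left_for_workloads(node_cores: float, storage: str) -> float:
    """Cores remaining after reserving the storage option's overhead."""
    o = OVERHEADS[storage]
    reserved = o.get("cpu_cores", node_cores * o.get("cpu_fraction", 0.0))
    return node_cores - reserved

print(cores_left_for_workloads(8, "local_nvme"))   # 6.0
print(cores_left_for_workloads(8, "azure_disks"))  # 7.0
```

So on a hypothetical eight-core node, local NVMe reserves two cores for storage, whereas the Elastic SAN option leaves all eight available to your pods.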

Okay, cool. Okay, I guess the big question, you know, how does this, how much does it cost?

Okay, so I really like this billing model, because Azure Kubernetes Service in itself, I believe, is free and you essentially pay for the resources that you use. This system is very similar, up to a certain amount. So if you provision a storage pool, and I believe it doesn't matter how many storage pools you've got, but if you provision a storage pool and you're using less than five, I think it's called tebibytes, TiB, it's not terabytes but it's very similar, I think it's just 1024 versus 1000, can't remember, let's call it five terabytes, you're on the free tier. I don't think there's any difference in terms of the service on the free tier; it's just how big your storage pool is. It's completely free at that free tier, as the name suggests, so you don't pay anything at all. When you go over five TB, you pay an orchestration charge on top of that of $0.006 per gigabyte per month, so I think that's about $6 per terabyte per month of storage. I've got a couple of examples just from the pricing page that I can talk us through. So an example is a customer's got two Azure Container Storage pools; they're both locally redundant storage using premium SSDs. One's four terabytes, one's nine terabytes. For the four-terabyte one, there's no orchestration charge because it's under the five-terabyte limit. You do pay for the underlying storage, just like you would anyway; four terabytes is $379 per month in this example. So that's all you're paying: you're just paying for the underlying storage that you're essentially using. The second deployment, though, is over the five-terabyte limit, so you pay the orchestration charge across the whole pool; nine terabytes at $0.006 per gigabyte comes to about $55 a month on top of the storage. Storage is about $850, so it's about $910 a month essentially.
So that orchestration charge on nine terabytes' worth of storage is only, you know, $55 a month. So I think that works out to around $6 per terabyte for the orchestration charge, because once you get to scale, you can't really think about the free tier, right? Because it's only the first five terabytes. So if we just pretend the free tier doesn't exist, then for every terabyte of storage you've got, you're looking at around $6 a month on top, which I don't really think is too bad, to be totally honest with you. Because under the five terabytes, I don't know the sort of distribution of customer accounts, what the histogram is going to look like, but I assume you can fit a lot in five terabytes, and I believe you can have multiple storage pools. So could you architect around that? I don't know, but yeah, seems pretty good to me. Azure Container Storage is, slash was, in preview. If you were on the preview, you have to upgrade and install a new version of the extension into your Kubernetes cluster to move over to GA. I believe a lot of the documentation still says preview, even though there is actually GA pricing and GA availability now.
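Working through that pricing example: the numbers line up if you assume the $0.006/GiB orchestration charge applies to the whole pool once it exceeds the five-tebibyte free tier. That reading is an inference from the figures quoted in the episode, so double-check it against the current pricing page:

```python
GIB_PER_TIB = 1024      # tebibyte vs terabyte: 1024-based, not 1000-based
PRICE_PER_GIB = 0.006   # $/GiB/month orchestration charge, as quoted
FREE_TIER_TIB = 5

def orchestration_charge(pool_tib: float) -> float:
    """Monthly orchestration fee: free up to 5 TiB, then charged on the pool."""
    if pool_tib <= FREE_TIER_TIB:
        return 0.0
    return round(pool_tib * GIB_PER_TIB * PRICE_PER_GIB, 2)

print(orchestration_charge(4))  # 0.0  -> the four-terabyte pool is free
print(orchestration_charge(9))  # 55.3 -> matches the ~$55/month example
```

Nine tebibytes is 9216 GiB, and 9216 × $0.006 ≈ $55.30, which is where the $55 figure comes from; the underlying storage cost is billed separately on top.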

Cool. Yeah, that definitely doesn't seem too bad, to have that flexibility there. And like you said, if other mechanisms are being deprecated slowly, then there's definitely a way to help you, you know, have a connector that's supported. And as you said, five tebibytes, we've probably actually butchered that, but it seems like quite a lot for a storage pool you might have for a service. Obviously with larger organizations or larger applications that could be consumed, like you said, quite quickly, but generally. And to be fair, for development, the, what do I call it now, the friction to get started; there's no friction. Yeah, you can just get building. Exactly.

No excuses, like, start building, you know. Yeah, I know, but I've got to think about my locally redundant storage of my volumes. No you don't, not anymore. Just set up a timer job to snapshot your volumes. Get building, basically. Yeah. Yeah. Cool. Okay. Is there anything else?

No, nothing else from me on Azure Container Storage. Yeah, check it out if you're running an AKS cluster; it's something you'll want to take a look at. Previous episode call-out, I would say, is our previous episode on Azure storage accounts. We don't really touch on it too much here, but it was sort of the stimulus for covering this episode. Next episode, Alan, what are you covering?

Yes, I'm going to talk about Microsoft Security Exposure Management within the Defender XDR portal. It came out in March at Microsoft Secure. In effect, briefly, at a high level, it's bringing in the secure score and other signals from various things. It kind of seems like, at the moment, I'm going through a stage of looking at pre-breach stuff as a protection, you know, getting yourself in a good state to reduce that risk of being breached, rather than thinking about the post-breach sort of scenarios. So yeah, I'm going to continue on that theme a little bit and look at this and how it can help identify your risks, or where you are against certain things.

Nice. Yeah, sounds great. It's a really new, really good addition to sort of Microsoft's security tooling. So yeah, it'll be really interesting to see what your initial thoughts are.

Yeah. Cool. Okay, so did you enjoy this episode? If so, please do consider leaving us a review on Apple, Spotify or YouTube. This really helps us reach more people like yourselves. If you have any specific feedback or suggestions, we have a link in our show notes so you can get in contact with us, or leave a comment against the episodes on YouTube. And with that, thanks everyone for listening, if you made it this far, and we'll catch you on the next one. Yep, thanks all. Bye.
