S5E25 - Azure Storage Actions - Serverless storage actions across your storage accounts

Jul 19, 2024 · 35 min · Season 5 · Ep. 25

Episode description

This week Alan and Sam discuss Azure Storage Actions, a serverless framework currently in preview that allows users to perform common data operations on millions of objects across multiple Azure Storage accounts without needing additional compute resources. It involves creating storage tasks with defined conditions, operations, and assignments. Users can monitor and manage these tasks, which run asynchronously, via metrics and reports. Azure Storage Actions is integrated with Azure Event Grid for event handling and is supported in several regions. Here is what we covered:

  • What is Azure Storage Actions and why are they useful?
  • How do you create them? What are the configuration options?
  • How much does it cost?

What did you think of this episode? Give us some feedback via our contact form, or leave us a voice message in the bottom right corner of our site.

Transcript

Hello and welcome to the Let's Talk Azure podcast with your hosts Sam Foote and Alan Armstrong.

If you're new here, we're a pair of Azure and Microsoft 365 focused IT security professionals. It's episode 25 of season five. Alan and I recently had a discussion around Azure Storage Actions, a new serverless framework currently in preview that allows you to perform common data operations across multiple Azure storage accounts without needing additional compute resources. Here's what we covered: What is Azure Storage Actions and why are they useful? How do you create them? What are the configuration options? And how much does it cost? We've noticed a large number of you aren't subscribed. If you do enjoy our podcast, please do consider subscribing. It would mean a lot to us for you to show your support to the show. It's a really great episode, so let's jump in. Hey Alan, how are you doing this week?

Hey Sam, not doing too bad. How are you? I'm good, thank you. Congratulations, Alan has been renewed as an MVP. What are your categories? SIEM and XDR, and cloud security. So two categories there. Two categories, yeah. Congratulations, Alan. Were you at all nervous? Sleepless nights?

Yeah, a little bit. Well, maybe not sleepless nights, but I have those anyway. But no, it was a bit nerve-wracking, I guess, because we kept getting pushed on a little bit around when we'd actually be told. But a lot of the other security MVPs that I know now have all renewed as well, so that's good. So yeah, onwards and upwards for another year.

Yeah, that's really good. Congratulations. And to all the other MVPs as well, I suppose, because a lot of work goes into it. I would say they set the bar pretty high, and I suppose that's a good thing, a positive thing for the community. I don't think it's just a case of filling out a form each year. You definitely have to keep on top of it, keep producing content, helping people. So yeah, it's really good. What else do you mainly submit? I suppose it's mainly the podcast, but are there other bits that you can.

Sort of cover? Yeah. So for the submission, yeah, it was mainly the podcast, but it also included some of the events that I've been to where I've spoken, like Infosec, and some of the webinars that we've done with Microsoft as well. Plus just indicating some of the customer connection program feedback and things like that I've been doing around products. So a bit of everything really, and keeping up with certs and keeping up with the technology to make sure that we're still current. So yeah, it's definitely like a second job.

Yeah, that's for sure. Yeah. Well, well done. Like you said, onwards and upwards for another year. 100%. Yeah. Okay, so what's this episode around then, Sam?

I'm going to cover Azure Storage Actions. It's a newer addition to Azure. I definitely checked that this time, unlike one of my earlier episodes where I got very excited about a piece of technology and it had been out forever. Yeah. So it's currently in preview and I think it's going to have some really good applicable use cases, which is why I wanted to cover it. Okay, so I guess let's get into it. So what is, or what are, Azure Storage Actions?

Yeah, so there's a resource type in Azure called Azure storage. Have we done an episode on Azure storage? That's a good question.

It's just for background, it doesn't really matter, but it's probably one of the core offerings in Azure. I suppose it is a group of different technologies, but one of the big parts is Blob Storage. This is where you can upload files to have them stored in Azure, and the whole point of it is that you can have a repository of your content. When we're talking about blobs, we're talking about binary large objects, so things like PDFs, Word documents, zip files, images, anything that you can save on a disk on your machine. So yeah, you can upload these into storage accounts. There are many different options for storage accounts: different tiering of storage, different redundancy and durability that you can set. But the whole premise is that you provide a file to Azure in a storage account and it's their responsibility to keep it safe for you. So that's great. And there are some other technologies that layer on top of storage accounts, like data lakes and various different scaling mechanisms as well, but we don't really need to go into that for this conversation. Sorry, Alan, did you find out about.

Yeah, we haven't done an episode on Azure Storage.

Right. Okay, well, I'll do an episode on Azure Storage. So yeah, we store these files essentially in folders, in containers, in storage accounts. It's just the naming scheme. I don't know what the maximum storage capacity of a storage account is. I don't know if it's terabytes or petabytes, I can't remember. There might be different tiers, right? But essentially you could store millions of files in these storage accounts, and you may have to do administrative work on them for various scenarios. And I will go into that, but because it's not like an SMB share or a folder, you can't manipulate it as easily as you would in, say, Explorer on your desktop. It's not presented to you just as a folder, if that makes sense. It's this abstraction and this structure that they put on top of it to manage it. You can get clients for it. There are third-party clients and first-party clients to access the files, and you see the files in a file browser like you would with Explorer. But when you're interacting with them, it's very client driven. If you want to download one, if you want to edit one, you're downloading it first and then you're saving it back up. You're not making any of those changes on the server side, if that makes sense. From the portal you can do, I would say, very manual work of renaming, deleting, changing metadata, changing tags, tiering options, those types of things. But the portal is really designed for manual actions on those storage accounts, and there are many different reasons that you might have to do things in bulk. So that's where Azure Storage Actions comes in. It's what the industry refers to as a serverless framework, which essentially means that you don't have to bring up any additional compute capacity to support this, because if people need to do things in bulk in Azure Storage, they might write a script for it.
You might write a PowerShell script that uses the SDK to access the storage account, manipulate those storage items and then save them back, potentially, or update their metadata. As an example, you might use one of the management APIs for Azure to do it as well. So there's all this custom work that needs to go in to support that, and you might not be developer focused or want to set those types of things up. I mean, you can do this with retention and lifecycle actions, but let's say you had a job where you wanted to delete certain things from your storage account. As a very basic example, you might have to write a script to do that. You might want to pick certain items to delete. It might not be as simple as delete everything that's over a year old. Maybe you've got some business logic that you need to put into that process. So yes, you've got to write your script, you've got to write your business logic, and then you need a mechanism in order to run that. You might need an automation account, an Azure function or logic app, insert solution there, and you need to pay for that resource. You need to make sure that it's up and working, it's healthy. You've got logging in a production scenario. You want to know if it runs every night. You might have an SLA that you need to keep. So Storage Actions takes some of those use cases away and wraps them up into a portal-driven experience. You can configure this in the portal and you create what are called storage tasks. Those storage tasks essentially have three different components. First, conditions. They're sort of like clauses. Conditions are there to define and filter which storage blobs you're actually going to action.
So you use conditions to say, hey, only show me files that are PDFs, only bring those into scope, those types of things. And then you've got operations, which is actually the task that you want to perform on each one. So, for example, there is an operation to delete a blob. And then finally you've got assignments, which is how you assign this storage task to an account, but you can also target it to a subset of objects inside of there as well. Just to clarify, conditions are really about identifying the individual items themselves. Assignments are more about the scope that you're wanting to look at. So you set this up in the portal and Microsoft handles everything about running them. You can do them one-off, you can do them on a schedule. There are all of those types of controls in place as well.
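To make the three components described above concrete, here is a minimal Python sketch of how conditions, operations and assignments relate to each other. It is an illustration only and runs entirely in memory: the `Blob`, `StorageTask` and `Assignment` classes, the `set_tier` helper and the sample data are all hypothetical names invented for this example, not the Azure SDK or the real storage task definition format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List


@dataclass
class Blob:
    """Illustrative stand-in for a blob; not an Azure SDK type."""
    name: str
    container: str
    created: datetime
    tier: str = "Hot"


@dataclass
class StorageTask:
    """Condition: which blobs match. Operations: what happens to each match."""
    condition: Callable[[Blob], bool]
    operations: List[Callable[[Blob], None]]


@dataclass
class Assignment:
    """Scope: which containers the task is pointed at (hypothetical model)."""
    task: StorageTask
    containers: List[str]

    def run(self, blobs: List[Blob]) -> int:
        """Apply every operation to each in-scope blob that meets the condition."""
        touched = 0
        for blob in blobs:
            if blob.container not in self.containers:
                continue  # outside the assignment's scope
            if not self.task.condition(blob):
                continue  # in scope, but filtered out by the condition
            for operation in self.task.operations:
                operation(blob)
            touched += 1
        return touched


def set_tier(tier: str) -> Callable[[Blob], None]:
    """Build an operation that moves a blob to the given tier."""
    def apply(blob: Blob) -> None:
        blob.tier = tier
    return apply


# Fixed "current time" so the example is reproducible.
NOW = datetime(2024, 7, 19, tzinfo=timezone.utc)


def is_old_pdf(blob: Blob) -> bool:
    """Condition: a PDF created more than a year before NOW."""
    is_pdf = blob.name.lower().endswith(".pdf")
    return is_pdf and NOW - blob.created > timedelta(days=365)


task = StorageTask(condition=is_old_pdf, operations=[set_tier("Archive")])
assignment = Assignment(task=task, containers=["invoices"])

blobs = [
    Blob("a.pdf", "invoices", datetime(2022, 1, 1, tzinfo=timezone.utc)),
    Blob("b.pdf", "invoices", datetime(2024, 6, 1, tzinfo=timezone.utc)),
    Blob("c.docx", "invoices", datetime(2022, 1, 1, tzinfo=timezone.utc)),
    Blob("d.pdf", "other", datetime(2022, 1, 1, tzinfo=timezone.utc)),
]
touched = assignment.run(blobs)
```

In this sketch only `a.pdf` is changed: `b.pdf` is too new, `c.docx` fails the condition, and `d.pdf` sits outside the assignment's scope. That mirrors the distinction drawn in the conversation, where conditions pick individual items and assignments set the scope.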

Okay, this sounds quite interesting. It's like you said, you're saving on potential. These actions and things like that that you want to perform. You're saving on compute and things like that, or you're bringing it at least closer to the resource at least anyway, or the, the PaaS solution there rather than having something else outside of it to maybe do these actions. So yeah, how do you configure these actions on the azure storage?

Okay, so they're their own separate Azure resource called a storage task. You can just jump into the portal and search for it in the top bar. As I said, as we're speaking today, they're currently in preview, so it's currently marked up as a preview service. The first thing you would do when you go to the Azure Storage Actions blade is say create new storage task, essentially. And that's what I was describing before. You select where that storage task is going to live, because it's got its own home, it's its own resource. So you've got to pick a subscription, a resource group, a name for it and a region that it's going to sit within as well, just like you would with any other resource. The next tab is the conditions tab. This is where you set up this sort of if-this-then-that visual builder. You can provide code to this as well, so you can script this part of it and give it a payload. But in the interface there's actually a query builder, basically, and you can have groups of queries and subqueries. In this visual builder you set the conditions that you're looking at and then the actions in one place. So you'll basically say, if the blob name contains PDF, then set an archive tier for them. Let's say you said everything that's older than a year and is a PDF, then set a different, cooler archiving storage tier as an example. You might not need to do that with this, but you might have some other business logic that you want to put in there. And then, yeah, let's just actually talk about some of the conditions and the operations that you can do. When you're deciding what blobs you want to target, there are quite a few different properties that you can base it off.
You can look at things like the blob's current access tier, the name of the blob, the type of blob that it is, the container that it lives within, any tags that it has, the last time it was accessed, the last time it was modified, when it was created, and whether it's the current version of the blob, because blobs have versioning as well. So there are quite a few different properties that you can hang off to identify these. Once you've set up all of that, you can essentially... let me just get back to my notes. Sorry, I just jumped around because I got them in the wrong order. You essentially then move on to assigning this storage task. This is where you set the subscription and the storage account that it's actually going to be focused on. You can also set things like the run frequency, whether you want to run it once for one singular job or whether you want to run it on a frequency. Because sometimes when you're doing these things, you do just want to run things once instead of running it every night. You might just say, hey, I want to archive a bunch of different content once every three months, so I'll just log in and run it manually, potentially if you want to keep an eye on it or you want to make changes. If you set it on a frequency, you can set a start and an end date to time-bound it as well, and you can change the interval between each run. So it doesn't have to just be each day. But I believe the minimum is a day because the repeat is in days. So you could say seven days, 14 days, 30 days if you wanted to. I'm not sure if you could do 0.5 days as an example. I don't know if it's whole numbers. I didn't test that. I just assumed it was a minimum of a day. Basically you create your assignments, you can add tags to it just like any other resource, and then you create it and it will just start running.
And that's all you need to do in terms of actually configuring it.
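The recurring schedule described above (a start date, an end date and a repeat interval in days) can be sketched in a few lines of Python. This is only a model of the idea, not how the service computes runs: the `run_dates` function is a hypothetical name, and both the whole-number-of-days rule and the inclusive end date are assumptions, the first of which follows the untested guess made in the conversation.

```python
from datetime import date, timedelta
from typing import List


def run_dates(start: date, end: date, interval_days: int) -> List[date]:
    """List the dates a recurring assignment would run on.

    Assumes the interval is a whole number of days (minimum 1) and that
    the end date is inclusive; both are assumptions, not documented facts.
    """
    if interval_days < 1:
        raise ValueError("interval_days must be a whole number of days, at least 1")
    if end < start:
        raise ValueError("end must not be before start")
    runs = []
    current = start
    while current <= end:
        runs.append(current)
        current += timedelta(days=interval_days)
    return runs


# A weekly run, time-bound to August 2024.
weekly = run_dates(date(2024, 8, 1), date(2024, 8, 31), 7)
```

Compared with a cron expression, a single "days between runs" number leaves very little room to schedule something far more often than intended.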

Yes, really, really simple to do. And you can kind of understand it, you don't have to spin up anything. Maybe it's a shared resource, shared compute, to run that against your storage account or storage accounts. So I guess running once a day at most is probably somewhat of a failsafe, to make sure it's not run too much, I guess. Yeah, exactly. Kind of makes sense.

Yeah, because there are costs that are incurred with this, which we always talk about at the end. But yeah, you also want to make sure that you don't accidentally set your cron format to every minute instead of every day or something like that. This is very simple: how many days do you want in between each run? Basically, it's very hard to confuse that.

Yeah. Okay. So I think you started talking about some of the things you can do with them. You've covered how you can scope them and define when they trigger and what is in play. So what are some of the actions that you can take as an outcome, once you've determined what's in scope for an action or a task?

Yeah. So there are currently seven supported operations. The first is setting a blob tier, so whether it's hot, cold or archive tiered storage. That could be good in archiving scenarios or more advanced archiving scenarios. I think there are other ways of doing that in storage accounts, but this is another way of doing it. You can set a blob expiry as well. You can set an absolute expiry, which is like an actual dead date for that expiry. You can set it to never expire. You can set an expiry timeline relative to the creation time of the blob, and also relative to the current time that you're running the thing. So if you wanted to say, I've got a container full of PDFs for this customer, I want them all to be moved to archive storage, and then I want them to expire in 15 years, then you could do that as one operation in this scenario. Maybe that's your data retention period for these blobs, if that makes sense. You might still want to run this manually because you'll want to target a specific customer. But you can set up more advanced scenarios where you can run multiple operations in the same task. There's delete blob. There's also undelete blob as well. If you've got soft delete enabled, you can undelete a blob. That could be good if you have a bit of a mistake in a storage account and you want to undelete a bunch of stuff, because you might otherwise have to do that manually, so that could be pretty cool. You can set tags on a blob. You can set a blob immutability policy, which is a policy that goes on top of a blob which says this can't be deleted, essentially. You can also set a date and time for when that immutability policy expires, so that's good as well. And you can also set blob legal hold as well, true or false, essentially.
So if you are wanting to, like if you wanted to quickly create an operation to legal hold a whole like folder full of content for maybe a customer or something like that, in sort of an ediscovery scenario, you can do that as well. I'm just thinking about it more from manual processes, but I assume there must be people out there that have scripts to manage a lot of these sort of blob lifecycle states as well.
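The four expiry options mentioned above (absolute, never, relative to creation time, relative to the current time) differ only in which reference point the expiry is measured from. The sketch below shows that arithmetic in Python. The `expiry_time` function and its mode strings are hypothetical names chosen for this example; they are not the option names used by the service.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry_time(
    mode: str,
    *,
    created: datetime,
    now: datetime,
    absolute: Optional[datetime] = None,
    offset: Optional[timedelta] = None,
) -> Optional[datetime]:
    """Return when a blob would expire, or None if it never expires.

    The mode names are illustrative labels for the four options described
    in the episode, not real API values.
    """
    if mode == "never":
        return None
    if mode == "absolute":
        if absolute is None:
            raise ValueError("absolute mode needs an explicit expiry date")
        return absolute
    if mode in ("relative_to_creation", "relative_to_now"):
        if offset is None:
            raise ValueError("relative modes need an offset")
        base = created if mode == "relative_to_creation" else now
        return base + offset
    raise ValueError(f"unknown mode: {mode}")


created = datetime(2020, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 7, 19, tzinfo=timezone.utc)
thirty_days = timedelta(days=30)

from_creation = expiry_time("relative_to_creation", created=created, now=now, offset=thirty_days)
from_now = expiry_time("relative_to_now", created=created, now=now, offset=thirty_days)
never = expiry_time("never", created=created, now=now)
```

The same offset gives very different results depending on the reference point: for an old blob, an expiry relative to creation may already be in the past, while one relative to the run time never is.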

Yeah. Okay. A question, and this might be because it's in preview, so it might not be available or known yet, but do you know if you're able to trigger them from the Azure resource API, do you think? You've probably not looked into it, but I'm just thinking, like you said, if you've got a legal hold requirement, you can go in there manually, or else you could say, actually, just trigger this and go and sort it.

For me, it's only a resource, so it can definitely be created automatically using AZ API or similar, management.azure.com, so yeah, you could definitely IaC these. There is also monitoring. There's a whole monitoring tab of how they've run and all of those things, and there is also a full Azure Monitor metrics and data schema for the storage tasks themselves and when they actually run and all of that sort of stuff. So if you do want to monitor them as well, you can do so quite easily. Your question was about the first part, about creating it, but I also just want to make everybody aware that you can track the output after it's run using Azure Monitor, if you need to. So you've got the full lifecycle, end to end. I haven't seen anything about a dedicated API for it, but it is there. The other part that I should probably call out, actually, is that the events are pushed using Azure Event Grid as well. So you can subscribe other services off the back of Azure Event Grid. If you wanted to, you could trigger a logic app or an Azure function to run a post-run event or something like that, if that makes sense. So if you wanted to put this inside an ETL process that you might have, then that is also possible to do. I haven't gone to the depth of looking at that side of things. Yeah, that could be very helpful.

Yeah. So running a task against the data, whatever it might be, and then you triggering something else to then go and process the data in the format or something, whatever it's done. Yeah, that sounds quite good actually. Yeah. It's another trigger for, yeah, because you.

Could use Azure Monitor to create an alert for when it's finished and stuff like that, couldn't you? Or with Event Grid. Well, not really Azure Monitor, because it's not used for that, but with Event Grid you could trigger some sort of programmatic update to your ERP system or something like that, or whatever it is that you're using.

Yeah. You could change the status of the files to confirm they're in archiving or something, in a CRM or something. Yeah. Cool. Okay. You kind of mentioned that there is some form of cost to this. So I guess the question is, you know, how, how much does it cost?

Yes. So it's currently in preview, so the actual service itself is zero cost, but there is a cost associated with the underlying storage account actions that you run. Right. So if you're modifying blobs, deleting them, et cetera, the whole underlying pricing of the storage account does come into play for some of these operations. So you need to think about that side of things.

Yeah. So pulling stuff out of archive and things like that, it's not a thing you'd want to be doing unless you really need to, isn't it?

Exactly, yeah. So taking things out of archive storage, moving them to cold or hot storage. Because these are en masse as well, that could be quite scary. I should probably just cover permissions and roles and things like that. There are a few limitations. Per subscription you can only have 50 storage tasks, and a storage task can be assigned to 50 different storage accounts, if that makes sense. So you could run this across a fleet of them. I mean, 50 sounds like a lot, but it's not really in the world of Azure Storage, because it's hyperscale storage. Storage task assignments per storage account, so the other dimension of that, is 50 as well. And you can have different storage task definition versions, the different edits that you make, and you can have 50 versions per storage task as well. I need to just find it again. Sorry, I'm just trying to find my notes on it, because it's a bit frustrating. Supported regions currently today, so no UK South for us yet: North Central US, North Europe, West Europe, West US, Canada Central, India, Brazil South, Australia, Germany West Central, France Central. Good coverage, but just no UK at the moment for any of our UK listeners.

Yeah, I think that's normal though, isn't it? The initial push out is normally North and West Europe anyway. Yeah. I don't know. We usually get the second wave, don't we? So there might be capacity issues or something like that. And they do say quite clearly in the documentation that the pricing information will be published before general availability. Okay, so there may be a cost, but we don't know what it is yet.

My guess is it's going to be per storage run or something like that, or blob interaction. It's going to be like. I reckon there's going to be a. I reckon there's going to be like a million operations a month for free or something like that, and then pence per thousand or something like that. That's what I'm going to guess. I don't know anything, but that's what my guess would be.

Yeah. I mean, a lot of the cost really is around the actual storage account though, isn't it? About moving data and things like that? So it might just be per run. It might be, I don't know, 3p per run or something. Or unless they do something like x pence, or x cents, per run multiplied by how long it was running for or something.

I reckon it's not going to be that. I reckon it's going to be the amount of blobs interacted with, because I think that's the metric. I think it's going to be like, on this run there were a thousand blobs that were targeted, we ran a thousand successful actions, so we're going to charge you a thousandth of a penny or something like that, or whatever the number is. So it'll be interesting to see how this scales, because what you'll usually find is that these PaaS solutions start off really cheap, but then when you get to scale, there becomes a tipping point where it might actually be more cost efficient to run a PowerShell script in a function as an example. Right. Because the more abstraction there is away from you, there's less risk to you. But the trade-off is that per compute unit you spend a lot more money, and you pay for the privilege of no, or very low, operational overhead and ease of deployment. So I'm not criticizing it at all, because they need to make a margin on top of the IP that they've built and also the underlying services they're using as well. So it's always going to be more expensive than running a virtual machine that does millions of API calls to storage. There is going to be a tipping point at some point.
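The tipping point described above is a simple break-even calculation: per-operation billing overtakes a fixed-cost alternative once the monthly operation count passes a certain level. The Python sketch below shows the arithmetic only. No pricing had been published when this episode was recorded, so the pricing model follows the hosts' guess (a free allowance, then a charge per operation) and every number in the example is an invented placeholder, not a real or predicted price.

```python
import math


def break_even_operations(
    fixed_monthly_cost: float,
    price_per_million_ops: float,
    free_operations: int = 0,
) -> int:
    """Monthly operation count at which per-operation billing costs at
    least as much as a fixed-cost alternative such as self-run compute.

    All inputs are hypothetical; real pricing was unpublished at the time.
    """
    if price_per_million_ops <= 0:
        raise ValueError("price_per_million_ops must be positive")
    if fixed_monthly_cost < 0 or free_operations < 0:
        raise ValueError("costs and allowances must not be negative")
    billable = math.ceil(fixed_monthly_cost / price_per_million_ops * 1_000_000)
    return free_operations + billable


# Placeholder numbers purely to show the arithmetic: a fixed alternative
# costing 10 units a month, 4 units per million operations, first million free.
tipping_point = break_even_operations(10.0, 4.0, free_operations=1_000_000)
```

Below the break-even count the managed service is the cheaper option in this model, and above it the fixed-cost route wins on price alone, before counting the operational overhead of running and monitoring your own script.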

Yeah. I think that's the general thing around PaaS though, isn't it? Slightly off topic, but App Service, things like that. If you're running a virtual machine, it might be cheaper from a resource perspective, but your overheads of patching and everything are just out the window, or whatever the term is there, where you're just paying for Microsoft to do it for you. And it's an easier life in some form.

Yeah. And it's really sort of democratization of technology to smaller companies and startups. Right. Because if you don't have the scale, you can get started really cheaply. Well, relatively cheaply I would say, because these services are expensive. So I love these types of solutions because they're quite niche, but when you need them, they're really handy to use, if that makes sense, because you don't have to, you're not like, oh God, I've got to write a function or I've got to write a script and I've got to run it manually every three weeks or something like that.

Or keep it up to date. I've got to spin up everything. Yeah, exactly. Yeah. So yeah, that's pretty much it. Quick one for this week. Pretty small feature, but it'll be interesting to see what they add to it, what other things you'll be able to do. I'd love blob moving. I think that would be cool, moving them into different locations and things like that. But I don't know if they'll just keep it to, you know, in-blob actions kind of thing.

Yeah, in container actions, if that makes sense. Yeah. So yeah, that's pretty much. Pretty much it.

Cool. Okay, so next episode then, I'm going to do one on... I was looking to see if we'd actually done a Microsoft Sentinel one, a SIEM one. It's kind of similar to storage accounts, you know, we talk about it but never actually do an episode on it. And we did do one, probably in the same scenario where we went, oh, we've never done one, but I think it was the beginning of last season or something like that. It was probably last year. I can't remember exactly the number. And we did one around the Defender XDR portal. So I'm going to bring it all together and talk about Microsoft's unified security operations platform, bringing Sentinel and Defender XDR into, as it sounds, one platform where you can, in effect, run your security operations center from. Because there's been a lot of development there, so it's probably worth calling all that sort of stuff out now.

So yeah, that's what I think. I think it's probably best to cover that because it's the hot topic again.

So, yeah, it's definitely where everything's going, if that makes sense. There are going to be like two portals for security, aren't there, at some point? It's all going to converge into security, and data security in Purview. I think that's where the line is going to be drawn, because we're seeing everything go into security.microsoft.com now, aren't we? Like everything, certainly, yeah.

So it's probably, it's just, it's, it's, it's going to be talking about the whole platform, things that will go through that. And it's kind of also like an update to the other ones as well as Microsoft are enhancing the portal and things like that. So, so, yeah, it's going to be a good one. Nice. Thanks, Alan.

Okay. So did you enjoy this episode? If so, please do consider leaving us a review on Apple, Spotify or YouTube. This really helps us to reach more people like yourselves. If you have any feedback or suggestions on episodes, there's a link in our show notes to get in contact with us. Yeah. And if you've made it this far, thanks ever so much for listening and we'll catch you on the next one. Yeah, thanks, all.

Transcript source: Provided by creator in RSS feed: download file