Azure Purview

May 21, 2021 · 35 min · Season 1 · Ep. 28

Episode description

In this episode Michael, Gladys and Mark talk with guests Gopal Shankar and Arvind Chandaka about a new data governance product, Azure Purview. We also discuss Azure Security news for the following: Azure Monitor, Storage, cryptography, Zero Trust, Incident Response, Azure Information Protection, Ransomware and more.

Transcript

Welcome to the Azure Security Podcast, where we discuss topics relating to security, privacy, reliability, and compliance on the Microsoft Cloud Platform. Hey everybody, welcome to Episode 28. This week is Mark, Gladys, and myself. We also have two guests, Gopal Shankar and Arvind Chandaka, who are here to talk to us about Azure Purview and Azure Information Protection. But before we get to Gopal and Arvind, let's head over to the news. Gladys, why don't you kick us off?

Actually, the first news that I want to talk about is the Azure Information Protection unified labeling client. There is a new version that is generally available. As many of you know, the Azure Information Protection administrative interface was deprecated as of the end of March. But the information rights management capabilities are still needed, even though certain products like Office have embedded capabilities.

For example, you may need it for third-party files, such as PDF and other non-Office file types, and this is provided by the client. So this new version includes a lot of scanner, usage, logging, diagnostic, and report improvements. If you're not familiar with the scanner, basically it's used to search for sensitive files within storage like SharePoint and file shares. So it will be really good to upgrade to this latest client.

The next news that I wanted to talk about is attribute-based access control, which is available under Azure Storage. If you're not familiar with this, attribute-based access control or ABAC is an authorization strategy that defines access levels based on attributes associated with security principals, resources, requests, and the environments being used. Azure ABAC builds on role-based access control by adding conditions to Azure role assignments.
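
To make the idea of "RBAC plus a condition" concrete, here is a minimal Python sketch. It is not the Azure condition syntax or any Azure API; the `Request` and `RoleAssignment` types, the `blob/read` action string, and the `project` attribute are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    principal: str
    action: str            # hypothetical action name, e.g. "blob/read"
    resource_attrs: dict   # hypothetical attributes on the target resource


@dataclass
class RoleAssignment:
    principal: str
    allowed_actions: frozenset
    # Optional ABAC condition, evaluated only after the role check passes.
    condition: Optional[Callable[[Request], bool]] = None


def is_allowed(request, assignments):
    """Grant access if any assignment matches the role AND its condition."""
    for a in assignments:
        if a.principal != request.principal:
            continue
        if request.action not in a.allowed_actions:
            continue  # plain RBAC check failed
        if a.condition is None or a.condition(request):
            return True
    return False


assignments = [
    RoleAssignment(
        principal="alice",
        allowed_actions=frozenset({"blob/read"}),
        # Condition on a resource attribute (assumed tag name "project").
        condition=lambda r: r.resource_attrs.get("project") == "cascade",
    )
]

same_project = is_allowed(
    Request("alice", "blob/read", {"project": "cascade"}), assignments
)
other_project = is_allowed(
    Request("alice", "blob/read", {"project": "other"}), assignments
)
print(same_project, other_project)
```

In this sketch the role alone would let `alice` read any blob; the condition narrows that to resources carrying the matching attribute, which is the narrowing effect described above.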

I'm really excited about this because it expands the zero trust principles further by enabling one to author conditions based on resource and request attributes. Finally, I wanted to talk about Azure AD sign-in logs that are currently in preview. Azure AD had some sign-in logs previously, but these new ones enable logs for non-interactive user sign-ins, service principal sign-ins, and managed identities for Azure resources sign-ins.

We released some incident response playbooks, which are really built on the experience of our DART team, our Detection and Response Team that's doing investigations and incidents, as well as some internal Microsoft teams working together and providing, hey, these are playbooks on how to deal with really three different popular attacks, password spray and phishing and whatnot. Those are out. So far, these are actually landing really well.

Normally, I get five or 10 likes on a tweet, and I'm like, woohoo. This one's sitting at somewhere around 800 now. So I'm like, oh my gosh, we definitely hit a nerve there on what people need. So we're definitely looking closer at that. How do we invest and keep getting people what they need? So definitely check those out. The URL there is aka.ms/IRplaybooks. Personally, I also took on a tech reviewer role for a book on Microsoft certifications.

So looking forward to reading through that and trying to make that as high quality as we can. I made some more guidance. I just wanted to bring this up to the top of mind. There's been a lot of headlines lately around the Colonial Pipeline attack and whatnot, and that is no exception. There is a lot of ransomware going on right now.

So I just want to remind folks to check out aka.ms/humanoperated, because those ransomware gangs have a lot of profit and a lot of technical debt that they can exploit, in terms of organizations having run IT with security as a fairly low priority in many cases for the better part of 30 or 50 years. And so there's a lot of opportunity for these attackers to really run rampant.

And they finally found a model that will allow them to do so to profit. So please, please, please follow this guidance. Get secure backups, protect against lateral traversal, and work your way through the rest of the list as well. But we do have the guidance, full plan, objectives, key results, metrics, stakeholders, checklist, technical links, et cetera. We really tried to make it as complete as possible. So please check it out.

Zero Trust Principles, core principles, just as a reminder, The Open Group released those not too long ago. So we've got a link to that. So you can check those out. It's a really nice set of principles to help organizations kind of understand Zero Trust, get their head around it, in a very vendor-agnostic kind of way. I'm actually co-chair of the Zero Trust Architecture Working Group over there. So definitely check that out.

And don't quite have the cyber reference architecture yet ready to announce. But possibly by the time we publish this podcast, it'll be out. But that one is just about ready to release. And then we do have a fun little surprise as well, fairly big one actually, that will be coming along with it as well. So those are coming soon. Just a bit of a teaser there. So there are a few items that sort of piqued my interest over the last couple of weeks.

The first three are all to do with Azure Monitor, as most of you should probably know, Azure Monitor is primarily there for data plane and control plane management or notifications and alerting and so on. Three announcements that I saw, the first one is support for customer managed keys for encryption of data at rest in Azure Monitor.

As you can imagine, some of the information can be relatively sensitive in Azure Monitor, even though best practice dictates that you shouldn't store anything sensitive in a logging infrastructure, it could happen. So some customers have asked for control of the encryption keys, so that is now available. The next one is in public preview, is the ability to have a one minute frequency log alerts update in Azure Monitor.

I don't know, to be honest with you, I don't know what the old frequency was, but I can tell you one thing, it wasn't one minute. So now we've got that ability that's in public preview. The other one: a couple of years ago, I worked with a financial organization. And one of the people there, she's really at the top of her game. It was a lady by the name of Ronnie Kwan, and she's just written an article, a blog post, on using Azure Monitor with Private Link.

Fantastic article, shows you how to hook it all up, how it all works, some of the pitfalls, one of the best sets of documentation I've seen on the topic. And as you're probably aware, if you've listened to any prior podcasts, one thing I've said all along is one thing we're seeing across more and more PaaS services in Azure is support for customer managed keys for data at rest and for Private Link, private endpoints. So here's an example of Azure Monitor meeting two of those goals.

The next item is to do with storage accounts. That's the ability to put a policy in place that prevents the use of shared key authorization, which means that you're only gonna use Azure Active Directory. Some customers I know, they only want AAD at the data plane, and they don't want the use of shared keys, and this is a way of enforcing that. And then a self-serving note, I wrote three blog posts in the last couple of weeks.

One is about some of the best security practice that I can give you in my humble opinion. I'm not gonna give the game away, go ahead and read the blog post. Another one is about being pedantic about cryptography. In other words, when you're talking about keys, and you say, how are you gonna encrypt with a key? Which key? Well, we're gonna rotate keys. Which key are you gonna rotate? Is it a data encryption key? Is it a key encryption key? Because they're two totally different things.

They're two different places where things can go wrong. So be really pedantic about your wording, especially when it comes to crypto. And what's kinda funny about that, Michael, is that when you're dealing with the upper end of sort of the security organization chart and the CISO and working with the business, it's almost the exact opposite, where you have to talk about risk and these sort of fuzzy concepts that aren't really well-defined.

But when it comes down to like the crypto, technically you have to be extraordinarily precise. So I just, I love that contrast. And it's the weirdness of security. You know, I've been in so many conversations with customers where literally my opening statement is, I'm going to be really pedantic with my wording when it comes to the crypto. So when you're describing something and it involves crypto, don't be surprised if I constantly keep asking you, which key are you talking about?

Or when you say you're doing this, what do you actually mean? Because again, the devil's in the details when it comes to crypto. And I really want to know which stuff you're talking about. I mean security in general, but crypto specifically.
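
The distinction between a data encryption key (DEK) and a key encryption key (KEK) can be sketched in a few lines of Python. This is an illustration only: `toy_xor_cipher` is a deliberately simple, insecure stand-in for a real cipher such as AES-GCM, and all variable names are hypothetical. The point is which key touches which bytes.

```python
import hashlib
import secrets


def toy_xor_cipher(key: bytes, data: bytes) -> bytes:
    """TOY stand-in for a real cipher. NOT secure; illustration only.

    XORs data with a SHA-256-derived keystream, so applying it twice
    with the same key returns the original bytes.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


# Data encryption key (DEK): encrypts the data itself.
dek = secrets.token_bytes(32)
# Key encryption key (KEK): only ever encrypts ("wraps") the DEK.
kek_v1 = secrets.token_bytes(32)

ciphertext = toy_xor_cipher(dek, b"card=4111 1111 1111 1111")
wrapped_dek = toy_xor_cipher(kek_v1, dek)

# "Rotating the key", KEK flavor: unwrap the DEK with the old KEK and
# re-wrap it with the new one. The bulk ciphertext is never touched.
kek_v2 = secrets.token_bytes(32)
rewrapped_dek = toy_xor_cipher(kek_v2, toy_xor_cipher(kek_v1, wrapped_dek))

# Reading the data after rotation: unwrap the DEK, then decrypt.
recovered_dek = toy_xor_cipher(kek_v2, rewrapped_dek)
plaintext = toy_xor_cipher(recovered_dek, ciphertext)
print(plaintext)
```

Rotating the KEK here rewrites 32 bytes. Rotating the DEK would instead mean decrypting and re-encrypting every byte of data under the new key, which is why "which key are you rotating?" is worth asking.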

And then the last one, the last blog post was about when David LeBlanc and I wrote the second edition of Writing Secure Code, I put a section in there called The Attacker's Advantage and the Defender's Dilemma, which talks about the whole asymmetry of cybersecurity. And someone brought up a topic on LinkedIn just recently and this exact asymmetry came up in this conversation on LinkedIn. It made me think about this section of the book.

So I basically reprinted that part of the book in a blog post and commented on it, what is it, essentially 20 years later. So anyway, take a look. So that's it with the news. So now let's get on to our guests. This week we have Gopal Shankar and we have Arvind Chandaka, who are here from the Azure Purview and Azure Information Protection teams. Gentlemen, thank you so much for joining us this week.

Gopal and Arvind, would you mind introducing yourselves and what you do at Microsoft and how long you've been here? Thanks Michael, thank you for having us on this podcast. We are really excited to be here. My name is Gopal Shankar and I work as a senior program manager in the cloud customer experience engineering team, which is part of the cloud security team. I've been with Microsoft for 17 years and in this team for about a year.

In my role, I focus on product adoption, product development, specifically around Azure Information Protection and Azure Purview. I work with select set of customers to help maximize their investments in these products. We're also the voice of the customers so we take customer feedback, feature asks and relay that back to the product groups. And hi everybody, my name is Arvind Chandaka, also from Gopal's team.

I'm also a program manager from the team, working on all the things that Gopal said above and particularly working on a lot of different feature initiatives that we have on growth and so on and so forth. So very excited to be here talking with everybody. Hey, thanks for the introduction. So the first question, so what is Azure Purview and why do we need it? That's a great question. So Azure Purview is a new product.

It is a unified data governance service that helps customers to manage and govern data on-premise, multi-cloud as well as software as a service. It is a cloud-based service in which you can register data sources, scan data and get deeper insights about your data estate. Your question, why do we need it? As organizations embark on digital transformation, it is clear that they are generating data everywhere, right? From IoT devices to operational devices to analytical data.

As they migrate and modernize, this is becoming even more important. Data is everywhere, spread across business units and geographies too. So with Azure Purview, customers can create a holistic map of their data landscape with automated discovery, classify the sensitive data, which is super critical for security folks, and have a deeper understanding of the data they have. I'm gonna do a follow-on question. I spoke in the news about the Azure Information Protection scanner.

Could you explain a little bit the difference between Purview and the scanner that I mentioned since I mentioned that we could scan data sources? Sure, so Azure Information Protection is our solution for scanning data on-premise.

If you wanna understand what kind of data you have on-premises, then you use the Azure Information Protection scanner to scan your resources to understand what kind of sensitive data you have so that you can classify and manage it. Purview takes it a little beyond that. It's more about managing data on-premises as well as in the cloud, right? So it's going to help you to manage data across clouds and also on-premises.

One of the things I want to add, and it's a very important distinction, is that AIP and the AIP scanner overall are highly focused on information worker data. So this would mean, you know, Office documents, Word, Excel, PowerPoint, additional ones like PDF, and so on. So these kinds of files that are basically sitting on on-prem file shares, SMB drives, and so on and so forth, these are the targets for the AIP scanner, whereas Azure Purview focuses more on operational and analytical data.

So an example of a source that Azure Purview would basically scan on-prem would be something like a SQL Server. And inside the SQL Server, you could have basically all of these kinds of data rows that could help describe application data as an example, and you'd be able to collect that and correlate that in Azure Purview, which is slightly different than what we're focusing on with AIP. So is the product available now? So we launched this product in December.

The product is currently in public preview. We've been testing the product, a lot of customers have signed up, and we're getting great feedback. It's likely to be generally available sometime in the second half of the calendar year. So until that point it will be in public preview. I mean, it's got the Azure in front of it, so I can make an assumption that it's in the portal, but how do you get access to Purview? Is it something available to everyone?

Or, you know, and where do you get to it? Like how would people actually get to try it out and check it out? Sure, so you definitely need to have an Azure account with an active subscription. That account must have permissions to create resources under the subscription. So simply sign into your Azure, you know, under resources, look for Purview, create a Purview instance.

Once you have the instance deployed, you launch Azure Purview and then make sure that you have security principals added to the various data plane roles that we have. We have Purview Data Reader, Curator and Administrator. So based on your, you know, needs, add the respective users to these roles so that they can access this portal. And there you go, you're all set. You'll be able to view all the data from there on. Cool, now who should actually be going to that portal?

I mean, you know, cause we're talking about data folks here and I know there's a lot of different interests in data, like, hey, how do we, you know, find new markets and get more insights on customers and our operations and how do I keep it secure? Like so what roles would interact with Purview and would use it? That's a great question. You know, Purview caters to a very wide range of personas.

So it provides a single pane of glass view of the data in your data catalog. To give you an example, Chief Data Officers will benefit from the holistic and, you know, coherent view of the data estate, right? Once you have all the resources configured and scanned, it's gonna give you that bird's-eye view. This helps them to understand where their data is. They can view a variety of reports in the dashboard.

Risk and compliance officers, you know, they can understand the risk of the data and, you know, what needs to be done from a compliance standpoint to meet their organizational needs or regulatory requirements. So you can actually group data sources into collections and have a nice hierarchy view of your enterprise and manage data from there.

CISOs are interested from a security aspect of the data, data source administrators, you know, they wanna make sure that they can scan all the data that's available in the enterprises, whether it is on-prem or in the cloud. So they will be able to, you know, get all those resources into Azure Purview.

And finally, you know, the data consumers, the business users, you know, who will be actually consuming this information, they will be able to search, understand where the data comes from as well as, you know, how it is classified and how they can actually get in touch with the owners of the data. So those are some of the personas. It even expands beyond that too. So ultimately, what kind of problems are we trying to solve with Azure Purview?

So, you know, generally we've talked to many customers, you know, customers today, you know, have a very manual process. They have homegrown solutions that do not adapt well and grow, you know, with the data growing in the environment. And it's a very costly affair as well as full of gaps, right? They're prone to human error. And Purview helps to reimagine data governance in the cloud.

It empowers data consumers to find valuable, trustworthy data, you know, which is spread across the enterprise, right? It helps you discover data. Data consumers, for example, can discover the data in the enterprise and, you know, obviously this has been a challenge for them. There is no one place to go, you know, and creating and maintaining documentation for data sources can be a very difficult and ongoing effort. So it becomes a barrier to sharing data across the enterprise.

So Purview solves that problem too. From a security administrator's perspective, you know, data is constantly growing and being shared in different ways. So the task of discovering, protecting, and governing this data is a super humongous task, right? So it is super important to make sure the content is being shared with the correct people, you know, and applications, with the right permissions.

So understanding the risk levels in the organization based on the sensitive data type that resides, such as credit card numbers, social security, et cetera, et cetera, you need to constantly monitor these resources for managing sensitive data. So these are the problems that, you know, Purview will be able to solve once you onboard all your sources into Purview. So you have this one place to go to basically manage your data and the security aspects of it.

So you explained briefly what Purview was used for, but can you walk through a fuller scenario? Sure. So imagine a situation where you have everything manual today and you're able to only, you know, share with limited number of people, not many people are able to see it. Once you have a Purview instance up and running, the administrator is going to basically go and register all the sources in the enterprise to bring everything into one umbrella, right?

And I mentioned earlier about having this collections view where you can actually have a holistic view and also have a deeper view based on how you want to slice and dice the data. It could be by geography, it could be by function, right?

So once you have that, now you will be able to provide access to the consumers based on their role who will be able to actually go and view this data in the portal and also understand what kind of sensitive data is available, what kind of labeling is available if they have integrated that with Microsoft Information Protection so that they get this end-to-end view. They can also see the data lineage, you know, as the data moves. So that's another big feature that they have.

So if we're thinking also about a scenario, just imagine you have some subsets of data lying around. So say you had information in various Azure data assets, ADLS, Azure Files, Blob storage, et cetera, you also had some information in Amazon S3 as an example, maybe even on-prem in SQL Server. With all of these different kinds of sources, the workflow basically will function like this.

You could first go into the registered sources area for Azure Purview, and then you're able to register each of these individual sources that you have around your various environments. Once you're able to register that and provide basically the necessary permissions and visibility into being able to scan those, you'll basically go through a scanning operation to discover all of that data that underlies these sources.

And so these data assets or metadata assets will be populated within what we call the data estate in Purview. Inside of this particular area, you'll be able to see the results of your scan and all of the sort of individual documents that exist as a result of your scan. And then you could basically filter by different kinds of settings, look into the kinds of information that you want, and so on.
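
The register, scan, and filter workflow described above can be modeled with a short Python sketch. This is a toy in-memory model, not the Purview API; the `Source` and `Catalog` classes, their methods, and the sample asset names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    kind: str                                   # e.g. "ADLS", "Amazon S3", "SQL Server"
    assets: list = field(default_factory=list)  # asset names a scan can see


@dataclass
class Catalog:
    sources: dict = field(default_factory=dict)
    data_estate: list = field(default_factory=list)

    def register(self, source):
        """Step 1: register a source so it can be scanned later."""
        self.sources[source.name] = source

    def scan(self, source_name):
        """Step 2: discover assets and record their metadata."""
        source = self.sources[source_name]  # KeyError if never registered
        for asset in source.assets:
            self.data_estate.append(
                {"source": source.name, "kind": source.kind, "asset": asset}
            )

    def filter(self, **criteria):
        """Step 3: filter the populated data estate by metadata fields."""
        return [
            entry for entry in self.data_estate
            if all(entry.get(k) == v for k, v in criteria.items())
        ]


catalog = Catalog()
catalog.register(Source("lake", "ADLS", ["sales.parquet", "customers.csv"]))
catalog.register(Source("bucket", "Amazon S3", ["orders.json"]))
catalog.register(Source("erp", "SQL Server", ["dbo.Invoices"]))

for name in list(catalog.sources):
    catalog.scan(name)

s3_assets = catalog.filter(kind="Amazon S3")
print(len(catalog.data_estate), s3_assets)
```

The sketch keeps only metadata about each asset, mirroring the point that the catalog is populated with data assets or metadata assets rather than copies of the underlying data.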

From my perspective, it sounds like this is just a really radical shift in kind of data management for an organization almost like on the level of going from physical servers to VMs or from on-premise to cloud because all of a sudden, boom, your stuff is there, obviously after it's all set up and whatnot, in one report and one console, instead of having to chase after it in a thousand places. So I'm really interested in the kind of insights that you can get now that you have this in one place.

Like, what is the value people are getting out of this? Absolutely. So there's two really large insights as a result of that workflow. One is that data estate or data catalog I was talking about. And in this particular area, imagine you were an individual, like a data scientist, as an example, going through and trying to find the data set that you need in order to basically get your models created and test them and so on and so forth.

This keeps it all in one place because you've been able to go through and identify all of this disparate data in so many various sources. You're able to go to a single holistic sort of pane of glass in order to get what you need, get access to it and so on and so forth. So that's incredibly valuable for data consumers overall.

Another piece is actually, funny enough, it's also called Insights, our data insights pillar, where you can go into this particular tab in Azure Purview and what you'll be able to get out of it are different kinds of reports on the kinds of files you were able to scan through and the results of your scan. So as an example, Purview is looking at many different kinds of classifications as you're going through these scans.

And so classifications can be considered things like sensitive information types, credit card data, social security numbers, driver's license numbers, et cetera. And you're able to identify the breakdown of this kind of sensitive information within the scans that were done. What percent of that is from Azure assets? What percent of that is from AWS assets, and so on and so forth. You'll be able to get that breakdown there.

You'll also be able to get a breakdown into any sort of sensitivity labels you can use, whether your scan worked or failed over time, understanding the different kinds of file types that are good.

It's a very rich ecosystem to be able to go through and actually see these reports because ultimately like Gopal was mentioning, you can go ahead and provide this information to your leadership, your IT team, data team security, et cetera, and be able to garner some tangible insights such that you're able to make any necessary remediations, any necessary actions to continue protecting your data, as well as maintaining a holistic database or rather data catalog.
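
As a rough illustration of the classification breakdown described above, the following Python sketch tags sample text with two simplified classifiers and reports what percentage of sensitive assets sits in each cloud. The regular expressions, the sample assets, and the function names are assumptions made for this example; real classifiers use checksums and surrounding context, and this is not how Purview itself is implemented.

```python
import re
from collections import Counter

# Hypothetical, simplified patterns for illustration only.
CLASSIFIERS = {
    "Credit Card Number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "U.S. Social Security Number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Stand-in for scan results: (cloud, asset name, sampled text).
scanned_assets = [
    ("Azure", "adls/customers.csv", "jane,4111 1111 1111 1111"),
    ("Azure", "blob/hr.txt", "ssn 078-05-1120"),
    ("AWS", "s3/orders.json", '{"card": "5500-0000-0000-0004"}'),
    ("AWS", "s3/readme.md", "nothing sensitive here"),
]


def classify(text):
    """Return the names of every classifier whose pattern appears in text."""
    return [name for name, pattern in CLASSIFIERS.items() if pattern.search(text)]


def breakdown_by_cloud(assets):
    """Percentage of sensitive assets per cloud, rounded to one decimal."""
    counts = Counter()
    for cloud, _name, text in assets:
        if classify(text):
            counts[cloud] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cloud: round(100 * n / total, 1) for cloud, n in counts.items()}


report = breakdown_by_cloud(scanned_assets)
print(report)
```

With the sample data, two of the three sensitive assets are in Azure and one is in AWS, so the report splits roughly two thirds to one third.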

Yeah, and as I'm thinking about this, because I'm both a business geek and a security geek, and like the security geek side of me is like going, okay, this is awesome because I can now see what data we have to protect and be able to ask the business, hey, what's important, what should we be focusing on? But at the same time, I'm now a little bit freaked out because now that it's easier for the business to find this, it's also easier for the attackers to use the same tool.

So actually one of the questions I was thinking about, like now you guys classify and label this, right? Now, is that tied in with Microsoft Information Protection? How do those two connect? Absolutely, you're right on the money for that. So just a little bit of a background for our audience who might not be aware, Microsoft Information Protection is a suite of capabilities really driven around basically security and information protection in M365.

One of the core components of MIP is basically this component called a sensitivity label. And what a sensitivity label does for an organization is it helps define basically how important that piece of information is. And a taxonomy in different kinds of corporations help denote that. So as an example, you could have something that is public, general, confidential, highly confidential, et cetera. Different corporations will do it in different ways.

And not only will this sensitivity label denote that, but in the scope of MIP, it also provides data at rest encryption and almost transparent encryption, so that if these information worker files were moved around, you'd still be getting some level of protection for them. Now, there is an integration. There is basically a better together story where we are integrating the sensitivity labels with Purview.

And so if you have the E5 license, you can go ahead to the Security and Compliance Center and set up the configuration of what a label is and what kind of sensitive information types or classifications are included there. And so that will help you essentially be able to identify in a Purview scan, when you find say driver's license information and credit card info, that that label will be attached to it, signifying it as confidential.
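
The label configuration just described, where finding certain classifications together causes a sensitivity label to be attached, can be sketched as a small rule table in Python. The rule format, the `label_for` function, and the choice to let the most sensitive matching label win are assumptions for illustration; they are not the Security and Compliance Center configuration model.

```python
# Example taxonomy from the discussion, ordered least to most sensitive.
LABEL_ORDER = ["Public", "General", "Confidential", "Highly Confidential"]

# Hypothetical rules: a label applies when ALL listed classifications
# were found on the asset during a scan.
LABEL_RULES = [
    ("Confidential", {"Driver's License Number", "Credit Card Number"}),
    ("Highly Confidential", {"U.S. Social Security Number"}),
]


def label_for(found_classifications, default=None):
    """Return the most sensitive label whose rule is satisfied, else default."""
    found = set(found_classifications)
    matches = [label for label, required in LABEL_RULES if required <= found]
    if not matches:
        return default
    # Assumption: when several rules match, the most sensitive label wins.
    return max(matches, key=LABEL_ORDER.index)


both = label_for({"Driver's License Number", "Credit Card Number"})
card_only = label_for({"Credit Card Number"})
print(both, card_only)
```

In this sketch an asset with both a driver's license number and a credit card number gets "Confidential", while a credit card number alone matches no rule and stays unlabeled.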

So right now we have this currently available and hopefully later in the future, we will also introduce protection capabilities as well. I was gonna ask why one needs to classify or identify data, which you have talked a little bit about already, but I wanna make sure that the audience understands what we mean by classification, especially when we are talking to government, where classification has a different connotation. What we're talking about here is labeling and/or tagging the data.

So why would an organization need to do this? Yeah, I mean, ultimately there is sort of an information protection framework that we took from MIP and AIP and are applying here in Purview as well. This kind of concept of discovering where the sensitive information is in your environment. Once you're able to discover it, analyze it and understand basically what kind of taxonomy to create off it.

And then once you get a kind of understanding of the taxonomy, understand how to go about protecting and governing it within your environment. And so we're basically trying to implement that kind of framework here as well, at least starting originally with the discovery piece, where we're able to attach these same labels that you can use in the MIP world here as well in Purview. It's something that a lot of different customers that we currently have are heavily leveraging as well.

And this is just a really great synergy we have here in being able to do that and follow in the steps of that best practice, that overall framework for information protection. Just to add to what Arvind just mentioned, the platform should be able to classify data automatically and allow manual override when possible. So it's the foundation for effective governance as well.

My guess is that Azure Purview can take data from multiple data sources and categorize it and classify it and identify it. So what does that look like right now? What sort of data sources can you use within Azure Purview? That's a great question. So right now you could classify a huge lot of different kinds of Azure data assets, ADLS, Blob storage, et cetera. You also have the capability, now in public preview, to basically scan AWS S3 buckets.

We also have on-prem resources like Power BI, SQL Server, et cetera. And then as well a lot of different SaaS connectors and integrations that we have. So Oracle DB is one that comes to mind. And there's a lot of different SaaS ones that we're working on right now that are sort of the brunt of a lot of the work going into GA. For the full list, of course, feel free to look at the documentation. We will be attaching that in this podcast as well.

So you could see the full list, but it's a pretty big litany of items. So what does the future roadmap look like? What is coming down the pipe? That's a really great question. Data sources are obviously one of the biggest things we're focusing on right now. So just the different kinds of connectors for different SaaS providers and so on and so forth. That's been a big ask from customers. We are working a lot on the security side as well.

There's a couple of stories going around for that: access governance, alerting, and sort of the work we're doing on that front is going to help bolster the security story of Purview a lot more. Unfortunately, I can't say too much other than those pieces because we are working on some pieces right now that are not public yet, but they are coming public really soon and we're excited to see how the security community reacts to it, and we are looking forward to some feedback in that area.

Essentially, there are a lot of data sources coming which will be added. And like Arvind mentioned, from the security aspect, there are also a couple of features that we're working on. There's a lot of discussion on the multi-cloud effort as well, so you will see them coming as we go towards GA and beyond. Another big piece is actually a lot of work on the multi-cloud area. So I mentioned AWS S3 bucket scanning becoming public preview.

There's a lot of other sources sort of along that area of AWS and GCP resources essentially that will also be great capabilities to have for customers. So there's a question we ask all our guests at the end and that is, do you have any last thoughts? Is there any sort of takeaway you'd like to leave our listeners with? The biggest piece here to understand is it's a product that really impacts data overall and security.

And many times when we're talking to customers, we see that these two pieces, although they should really be one and together and talking to each other and collaborating, are very much sometimes separated. So tools like this are sort of helping with the market push and driving those conversations together. And I'd like to say that it's already starting to work a little bit.

Having these engaging conversations, having sort of customers lead the way and having these kinds of conversations, and hopefully we'll continue to see this market trend. Yeah, I'll just add a few more things to that. We've talked to many customers, and they all have one common challenge. Data is growing very fast, at a very high velocity, in higher volumes and also in variety. So it's extremely important to have a comprehensive data governance solution.

So Purview is a unified, cloud-based data governance solution that supports data on-prem as well as in the cloud. So I highly recommend our listeners try Purview and help us with feedback. And with that, let's bring this to an end. Thank you so much for joining us this week. We really appreciate it. And I certainly learned a great deal. I know Mark and Gladys learned a great deal too. To our listeners, we trust you found this useful too. Thanks for listening.

Stay safe out there and we'll see you next time. Thanks for listening to the Azure Security Podcast. You can find show notes and other resources at our website, azsecuritypodcast.net. If you have any questions, please find us on Twitter at AzureSecPod. Background music is from ccmixter.com and licensed under the Creative Commons license.

Transcript source: Provided by creator in RSS feed: download file