Welcome to the Azure Security podcast, where we discuss topics relating to security, privacy, reliability and compliance on the Microsoft Cloud Platform. Hey everybody, welcome to episode 27. We have a full house this week. We have Sarah, Mark, Gladys and myself. We also have a guest, Sharon Xia. She's here to speak to us about applied data science in cybersecurity. But before we get to Sharon, let's take a quick look at the news. Mark, why don't you kick things off?
Our first piece of news is that the Open Group has released the Zero Trust Core Principles white paper. It's free, but it's behind a registration wall, so you have to sign up for an account if you're not already part of the Open Group. I'm actually co-chair of the Zero Trust Architecture Working Group over there. And the cool thing is, for those of you who are familiar with the Jericho Forum, which was the first formal challenge to the perimeter-centric view of security back in the day (we're talking probably 15 years ago now), the Jericho Forum was actually hosted in the Security Forum and became part of the Security Forum within the Open Group. And so I got to work with some of the original members of the Jericho Forum as we figured out how to modernize these ideas and recognize what Zero Trust is in that open, industry-agnostic way. And so those core principles came out, and I think they're pretty good. I worked on them some, so I'm a little biased, but I want to make sure everyone is aware that those are out there. It's another great reference point, just like NIST or any others, for a vendor-agnostic view of what Zero Trust is, because there are a lot of vendor claims out there that you just buy my product and you get Zero Trust, which I know our customers
are getting really tired of. The other thing that caught my eye was a publication of the top five VPN vulnerabilities that are being exploited by advanced actor groups. It was really interesting and triggered some thoughts, so I'm actually going to put in two links: one is the actual report, and the other is Microsoft's recommendations in this space, because we've seen a lot of this kind of exploitation of VPNs lately, largely because people don't patch them. Windows Update is easy. It's easy to patch, right? Or Microsoft Update; it's just easy to use those well-established channels, or your iPhone or whatever it may be. But once you get into appliances where you have to do downloads and other kinds of steps, it gets really challenging and tends to get forgotten, probably because it's hard. So it's really important to get those patched, but there's more that you can do to secure a VPN, like making sure you're not keeping credentials on it. Just use Azure AD to do your authentication. Most vendors (almost all, I think all the major ones) support that, and so you can authenticate with it. So there are ways to protect it above and beyond patching as well. I wanted to point folks at that guidance. The Azure Network Security book also came out, which I'm really excited about. We'll put a link there to take a look at in case you're interested in
learning a lot more about that. Yeah, I think I actually contributed a little excerpt to that particular book. The other thing I want to call folks' attention to is there's a Security Technical Content Library, essentially a technical catalog of all the security content and guidance that Microsoft publishes. And so I wanted to put that link in there for folks to find it. It's a great way to sort of find a lot of our security content and guidance in one place.
Last one's a bit of a teaser. We are very close to being done with the Microsoft Cybersecurity Reference Architecture, or the MCRA as some people like to call it: a highly complex diagram with all the Microsoft cybersecurity technology. So that will be coming soon. We don't have a link or a download point yet, but we are actively working on getting it up and running and ready to go. That's all I got. Cool. So then it's me, and I'm going to talk about, unsurprisingly,
a ton of Sentinel things. First of all, we now have some Azure Policy-based data connectors for Sentinel, which is really cool because Azure Policy is useful, and of course having that data coming into Sentinel is really, really helpful. So that's a good start. And as you know, we're always adding new data connectors to Sentinel. The next one I wanted to talk about is slightly different. We've now released, in public preview, a number of additional log sources for Azure AD. In the Azure AD connector in Sentinel, we used to just have sign-in logs and audit logs; there are now a number of additional log sources in there. Of particular note are the non-interactive logons. That was arguably a bit of a blind spot, because it wasn't something we ingested, and in the context of a couple of things that have happened in the cyber world, some attacks that have happened in the past few months, the non-interactive logons are pretty important. So what we've done is let you ingest them natively via the connector. And also the MSTIC team (the acronym, if you don't know it, stands for Microsoft Threat Intelligence Center) have updated 24 identity-related analytics rules that now perform correlations over those new non-interactive logons. There's a really cool blog post that one of the guys in my team,
Yeneve wrote, and we'll link to it in the show notes. Go and check that out, because if you're already using Sentinel and ingesting Azure AD data, you should definitely have a look at ingesting those logs. And then the last thing, slightly different. We did mention this a while ago on the show, but all the new Microsoft security exams (that's the SC-200, SC-300, SC-400, and SC-900 as well) are now out of beta. They have gone generally available. So if you are looking at taking them, you will now get your results straight away. I took them in beta, and I'm still waiting for my results, so fingers crossed I passed them. But yeah, if you want to go and do them, they're now generally available and out of beta, so go and have a look. And keep your eyes out, because in the not too distant future there will hopefully be a lot more learning resources to help you study for those as well. Obviously, when exams are new, it takes a little while for that stuff to come out, but it's definitely coming. And yeah, that's my news this week. Sarah, I'm actually really excited about the SC-200 and am currently looking at the Azure Defender and Sentinel material; I think it's pretty good. Yeah, Gladys, I took SC-200 and SC-900 in beta, still waiting for my results.
Going to be quite embarrassing if I fail either of them. I'll update everyone next time, maybe, if I've got my results by then. Maybe you could give me some hints. Anyway, I wanted to talk about a website that I found that has a lot of interesting information. This podcast is a great source of information; however, there's so much happening in Azure that it's impossible for us to cover all of it. So I always wanted to see a website or a place with a list of all the Azure services. And I actually found one, a site called Azure Charts, which I had not come across before. I had no idea that there were 250 services in Azure; I thought it was fewer. I always asked myself, where can I find that list of information: user stories, information about the latest capability releases for each service, the list of regions where each service is available, reference architectures, solution ideas, security? There's actually a section on security and compliance, and even on the parts of services that have been retired. At first, it was a little difficult to navigate Azure Charts, and I wasn't sure how to get at the information. But there's a video (I think it's video 26 in the Azure Fundamentals series, about 15 to 20 minutes long) that provides a quick overview of the site. So I really recommend watching that and then keeping Azure Charts as a source of information, because it's being updated all the time. The next thing that I wanted to talk about is Azure Purview. As everyone knows, I'm really a fan of labeling and classifying data, and I have spoken about Azure Purview before. As I mentioned, it helps manage and govern on-prem, multi-cloud, and software-as-a-service structured data, such as databases and storage resources. It does this by labeling data within defined resource sets, using built-in and custom classifiers, and even Microsoft Information Protection sensitivity labels. So for example, Azure Data Lake Storage Gen2, Azure Blob Storage, and Azure Files are some of the services that resource sets can be used for. And now Purview resource set pattern rules are available. What this does is allow you to customize or override how Azure Purview detects which assets are grouped in a resource set and how they're displayed within the catalog.
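To make the resource-set idea concrete, here is a minimal sketch in Python. The paths, regexes, and placeholder syntax are all invented for illustration; this is not Purview's actual rule language, just the general idea of many partitioned files collapsing into one logical asset:

```python
import re

# Hypothetical illustration of resource-set grouping: many partitioned
# files collapse into one logical asset, the way a data catalog groups
# them. The regex and placeholder syntax are invented for this sketch.
def to_resource_set(path: str) -> str:
    # Replace date-like path segments (e.g. 2021/04/30) with placeholders.
    path = re.sub(r"/\d{4}/\d{2}/\d{2}/", "/{yyyy}/{mm}/{dd}/", path)
    # Replace numeric part-file suffixes (e.g. part-00017) with a placeholder.
    return re.sub(r"part-\d+", "part-{n}", path)

paths = [
    "sales/2021/04/29/part-00001.csv",
    "sales/2021/04/30/part-00002.csv",
    "sales/2021/05/01/part-00003.csv",
]
patterns = {to_resource_set(p) for p in paths}
print(patterns)  # all three files map to one resource-set pattern
```

A pattern rule in Purview plays a similar role, letting you override how paths like these are grouped and displayed.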
Thanks, Gladys. Hey, there are a few things that took my interest this week. The first one is a feature in preview for Azure Automation, and that's support for managed identities. As I think I've mentioned on every single podcast so far, one thing you'll see more and more is services moving to use managed identities, because that way storing the credential is actually managed by Azure, and you don't have to worry about where that credential is stored or about it being compromised. So this is always a good foundation for client authentication, for one service to authenticate to another. And on the other side, of course, we'll use TLS for server authentication, but that's another discussion. The other thing that's kind of cool, and I'm a huge fan of this, is that the Azure Virtual Machines DCsv2 series is now in public preview in Azure Government. So these are the
VMs that are used for confidential computing. They're the ones that have the Intel Xeon CPUs that support the Software Guard Extensions, or SGX, technology. So if you're building your own secure enclaves, or you're running applications that can take advantage of secure enclaves, these are the VMs that you would use. So this is there, and it's great to see. The last one is that we've just put a new capability into Application Gateway called URL Rewrite. The notion of URL rewriting has been around for quite some time. It's not really a security feature per se, but you can certainly use it to provide some security functionality, such as rewriting specific headers based on the URL, for example, or redirecting to a different URL based on some kind of logic. So again, the concept has been around for some time, but it's now available in Application Gateway. And with that, that's the
end of our news this week. It's a relatively quiet week. So now let's turn our attention to our guest. This week we have Sharon Xia. She is a principal program manager in the Azure Cloud Security team focusing on data science. First of all, Sharon, thank you so much for joining us on the podcast this week. Would you mind spending a moment just explaining what you do at Microsoft and how long you've been with the company? Sure. Thank you for inviting me to this podcast.
I joined Microsoft three and a half years ago. I lead a program manager team of five PMs. Right now, we build threat detections using machine learning algorithms in security products like Azure Active Directory Identity Protection, Azure Defender, and Azure Sentinel. We also own a security data platform that processes trillions of signals consumed by various detections, including the machine-learning-based threat detections. I want to start with a basic
question. Like, what is artificial intelligence and machine learning? And what's the difference between them? I'm really kind of curious how we think about that. Yeah. So let me talk about those three terminologies: data science, artificial intelligence (AI), and machine learning (ML). Data science is an interdisciplinary field that uses scientific methods, processes, mathematics, algorithms, and systems to extract knowledge and insights from structured or unstructured data. So data science is related to data mining, and it includes machine learning and big data; it's a very wide, big field. Artificial intelligence, in our definition, is machines or computers mimicking cognitive functions that we associate with the human mind, such as learning and problem solving. That obviously requires algorithms, and machine learning algorithms are one kind. Machine learning is the study of computer algorithms that improve automatically through experience; it's a subset of artificial intelligence. Gotcha. So, because I went through statistics classes when I was working to get my college degree, it's like machine learning is kind of the progression of that into really sophisticated algorithms, right? And that's sort of a foundation. And then AI is kind of turning it into, hey, we're trying to mimic what
humans do to reason, right? And then data science includes doing all the data analysis and processing. Yeah. Okay, nice. Now, how do we apply this to security? What does it bring us? How does it bring value to security? That's a great question. You know, digital transformation and the tech intensity now across all organizations have led to exponential data growth, right? And regulations are constantly evolving, and the attack surface is growing faster, because organizations are moving to the cloud. Now you have hybrid clouds, multi-cloud, and then you have on-prem; the attack vectors and the surface are just growing tremendously. And the attacks are more sophisticated and stealthier. So the traditional rule-based approach no longer meets the demands of the scale and the constantly changing landscape.
And people are looking for new solutions to deal with this complexity. What machine learning is good at is dealing with big data, handling multi-dimensional, multi-variety data. It's also good at continuous improvement, as the machine learning algorithm gains experience and learns. You've probably heard lots of talk about deep learning, neural networks, all these terminologies in machine learning.
Yeah, and I kind of smile and nod like, okay, someday I'll understand that terminology. Yeah, it's mimicking the human mind, the human brain, right: being able to learn through experience. From the algorithm's point of view, it learns through data, so it can keep learning and keep adapting as the environment changes. That's why machine learning is the technology that can help us keep up with this data volume growth and this complexity of attacks. Yeah, I think of it like this: everybody in the security business loves to tell people, read your logs, which is impossible when there's a million lines a minute, right? So this is basically helping do that without having to burn out a biological mind. Right. Obviously, Sharon, I come at this from a Sentinel perspective. When I work with Sentinel, of course, we know that Sentinel has ML in it. But I know that there's far more to the AI and ML capabilities than just Sentinel. So can you tell us a bit about where ML and AI are used in different Microsoft security products? Sure. Yeah. So you asked the right person. I actually own the ML
feature in Sentinel. Sarah, you probably know. I do. I do know that. Yeah. So in addition to the machine-learning-based threat detections and behavior analytics we build in Sentinel, virtually every security product at Microsoft uses machine learning. Take Azure Defender, right? We build behavior analytics for Azure Defender for Storage. Gladys touched on this: you have Blob storage, you have SQL, you have Files, you have ADLS, Data Lake Storage. There's so much data in there. We use machine learning to analyze the access patterns and behavior to detect threats to storage, as well as to critical security services like Key Vault; we use machine learning to detect threats to Key Vault. So Azure Defender is one example. I mentioned Azure Active Directory Identity Protection: we process billions of logins every day on our machine learning platform to detect unusual, suspicious logins and potentially compromised accounts. And in Exchange and Outlook, we use machine learning to identify phishing attacks. So literally every security product at Microsoft leverages machine learning. That is a lot of machine learning there. I'm going to have to ask you, because it's my baby, and because it's something that definitely a lot of my customers are interested in. Can you tell us a little
bit specifically about the Sentinel ML, because I know that's your thing too. So just for anyone who might have heard about it but doesn't know much about it, what's your elevator pitch for the ML in Sentinel in particular? Yeah, I know all the SIEM vendors talk about using ML, and people ask, is it real? Is it hype? And I can tell you, at least in Sentinel, it's real. We have behavior analytics; in our UEBA module in Sentinel, we use machine learning. We have built-in machine learning threat detections, like anomalous SSH or RDP logins. And we have Fusion, which we call advanced multi-stage attack detection; we actually have four different machine learning algorithms in Fusion that correlate what we call yellow signals and find those multi-stage attacks. You know, sometimes a suspicious login from a Tor browser on its own maybe just gets denied. But then it's followed by data exfiltration, followed by C2 communication, and with all these steps together, then seriously, it's an attack. So Fusion detects those kinds of attacks and many ransomware patterns. We work with MSTIC, our threat intelligence threat hunters; they identify those patterns, and we feed them to the machine learning algorithm. And like I said previously, it learns and can detect those emerging threats. That's the out-of-the-box machine learning we have built. We also have the platform: bring your own machine learning (BYOML) to Sentinel. You can ingest your data into Sentinel and build your model outside of Sentinel, or run it outside of Sentinel and bring the signals back. We heard from customers that they have platform scalability issues, and the BYOML platform solves that issue for the kinds of customers who have
data scientists in their organization. That's a very high-level view of what we have in Sentinel. I know we could go on a lot about that, but in the interest of time, I guess we'll leave it there. Thanks, Sharon. Actually, Sharon, I'm not familiar with the yellow alerts or signals that I think you mentioned. Are those alerts where you need to correlate more data to determine whether something is a true positive or a false positive, or what exactly are they?
Yeah, so if you have dealt with those security products or talked to security analysts, you know the products generate lots of signals or alerts, and they apply different severity levels to the alerts: high severity, medium, low, and lots of them informational. Security analysts deal with thousands of alerts every day. They can never keep up. Most of our customers told us they never look at any alert with a severity lower than medium, and some don't even have time to look at the medium-severity alerts. So those basically fly under the radar of SOC operations. Nobody sees them. We call those yellow signals. Lots of attacks are very stealthy, like I said, and you can find many examples, including the recent SolarWinds attack. It's hiding under your radar and going on. If you think about all the security news and compromises, people say, oh, the attacker had already been in that environment for nine months before the organization discovered the attack or the compromise, right? In that kind of case, it's not that no signal was triggered; it's just that all these signals were low-confidence and were not surfaced to the eyes of the security analysts. So our Fusion machine learning algorithm actually correlates all these signals and finds that kind of multi-stage attack. Awesome. Hopefully customers can now see those medium signals, because we bring the integration, unification, and automation that can hopefully speed up the mean time to acknowledge and remediate. So you just mentioned a good case, a common case, to use
AI and ML. Are there other common use cases where one may use it? Yeah. Because AI is good at finding patterns in huge amounts of data, it's very good at behavior analytics. You can find a spike of excessive downloads from a VPN, or excessive uploads; those kinds of spikes usually indicate some problem. Or access to an IP address or host never seen before. Those are things AI and machine learning are good at. Then you correlate this abnormal behavior with the threat intel information you have, and you can elevate the signal a bit: oh, this abnormal access, maybe an outbound connection, combined with our threat intelligence, this IP is a C2, or this URL is a malicious watering-hole URL. You combine these, and you will find the trace of the attack. So those are good use cases for machine learning. Another one: we really think machine learning is good at finding emerging threats and unknowns. With rules, it's, oh, I matched mimikatz.exe, that's an attack, right? Or you have a blocklist, but that's limited to the known. With machine learning and behavior analytics, it observes the trend, it finds the abnormal behavior, and it can detect emerging or unknown threats.
So looking at all this AI and ML stuff as it relates to security, I can't imagine this is particularly easy. What are some of the challenges that you come across applying artificial intelligence and machine learning in the realm of security? Yeah, good question. We've been doing this for years. I joined the team three and a half years ago, and before I joined, the team had already been doing this for, I think, more than three years. We have lots of experience applying this, and we've also encountered lots of obstacles and problems. We also talk to customers, like Sentinel customers, about their issues. They realize machine learning is the way to go but have trouble. Basically, the number one trouble is data quality and the lack of a uniform schema. Data is everywhere, in different formats and with different meanings. Even when the format is the same, like CEF format, every field can mean something different, depending on whatever interpretation the firewall vendor or software vendor put there, right? And for a machine learning algorithm, it's basically garbage in, garbage out. So you have to do lots of data transformation, cleaning, and so on. That's one challenge. The second
challenge is a lack of labels in security. If you give machine learning algorithms labels, telling them, oh, this result is good, this is bad, then the model will learn and improve by itself. But because security information is confidential, and there are lots of data privacy constraints, the machine learning models used in cybersecurity don't get enough labels to improve. So that's the second challenge. The third challenge, which I heard from lots of customers (it's not ours), is the lack of data science resources in their organization. If you want to use machine learning in security, you really need both security skills and data science skills. It's rare already to find security experts, it's also rare to find machine learning experts, and it's even harder to find people who have both, right? So that's a huge challenge. Another challenge: even with data scientists in their organization, they have problems dealing with large data, with scalability. Sometimes they do well in prototyping, but they have trouble bringing it to production because of the data volume; they are not able to support that kind of scale. Azure is an awesome platform, it's elastic, and Sentinel runs on Azure, so it inherits
that scalability and availability. So it's the perfect platform for building machine learning on top of. Yeah, I can certainly speak to the hiring side of security. It's pretty hard to get a good security person; I can only imagine what it's like hiring someone who's both a security person and a data science person. That leads into another topic. One thing that I've been doing over the last few months is taking all the Microsoft 900-level exams, and one of them
is AI-900. And the reason why I'm doing it is essentially to make sure that, even though I'm a security guy, I'm actually focusing on the platform in general and getting a better understanding of various aspects of the Azure platform beyond security. AI-900 is an introduction to all the fundamentals of artificial intelligence. One thing that's talked about a lot in the study materials
is this notion of responsible AI. Could you just give us a quick overview of what responsible AI is? Yeah. Machine learning algorithms and models mimic the human brain's thinking, so you have to be really careful. There are many examples where an ML algorithm got abused and produced results that were not the intent of the author of the model or algorithm, right? For example, some years back Microsoft had a chatbot called Tay on Twitter. You probably know it: a chatbot holding conversations on Twitter. And it got attacked. People fed it offensive language, and it learned from that and spat out offensive language itself. So Microsoft shut it down. If you search for Tay tweet, you will find a lot of discussion and news about it. So this raises the point: when we build machine learning algorithms and advanced features, for example maybe an HR resume-screening model, we need to think about whether we unintentionally biased the ML model, whether it learned some behavior that causes bias, or whether it could unintentionally leak private
information. You know, there are lots of studies and articles on responsible AI. Microsoft is very serious about this. So how are we protecting against the introduction of bad data, then? Yeah, that's a good question. There are two things. One is the unintentional side, like I talked about. The other aspect is malicious attacks against the machine learning model and against the data used by the machine learning model. Our team has a trustworthy ML project, and we worked with MITRE. You probably know MITRE has the ATT&CK framework for enterprise, for IoT, for ICS; it's widely used in the security industry to describe the kill chain, the security tactics and techniques. Our team worked with MITRE, and we published the Adversarial ML Threat Matrix to identify and call out cyberattacks specific to machine learning systems. For example, data poisoning. Gladys, you asked about bad data, right? An attacker can intentionally poison the data used to train the machine learning model, and if the model is trained on bad data, it will produce bad results. So securing the data sources of an ML system is extremely important, no matter whether the system is used in security, in speech recognition, in facial recognition, or even in healthcare, where it's about human life. So this is really important. And there are other common attacks on ML models, like evasion attacks.
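The data-poisoning attack just described can be sketched with a toy nearest-centroid classifier. The single feature, the numbers, and the model are all invented for illustration; real detections are far richer:

```python
# Toy illustration of training-data poisoning. One feature (say, outbound
# requests per minute) and a nearest-centroid classifier stand in for a
# real model; all numbers here are invented.
def centroid(xs):
    return sum(xs) / len(xs)

def classify(x, benign, malicious):
    # Predict the class whose training centroid lies closer to x.
    if abs(x - centroid(malicious)) < abs(x - centroid(benign)):
        return "malicious"
    return "benign"

benign_train = [5, 8, 6, 7]        # normal traffic rates
malicious_train = [90, 110, 100]   # attack traffic rates

print(classify(60, benign_train, malicious_train))   # -> malicious

# The attacker slips attack-like samples into the training set labeled
# as benign, dragging the benign centroid toward the attack region.
poisoned = benign_train + [95, 105, 100, 98]
print(classify(60, poisoned, malicious_train))       # -> benign
```

After poisoning, the same traffic rate is waved through as benign, which is exactly why securing and validating training data matters.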
Basically, there is a very famous example: researchers put a few very small stickers on the road, and it fooled Tesla's ML model into driving into the opposite lane. That's scary, right? And there's research on what's called a model inversion attack, which is about privacy. A facial recognition program uses a lot of data samples to train the model to recognize a face, like Windows Hello, right? We log in with our face. So there is a specific attack where they can reverse the trained model and recover your face. That's called a model inversion attack; it can essentially invert the PII data out of your binary machine learning model. So this is definitely a privacy concern. There are lots of attacks in this area. If you are interested, look at the material we provide with this podcast; there's lots to read in this area. Okay, so lots to think about there, Sharon, and I've learned a lot today. But if someone who's listening wanted to know more about security-related AI and ML, are there any sources or
materials that you'd recommend they go and look at? There are lots of papers online. Just Bing it or Google it, applying machine learning and data science in security, and you can find a lot. This area is actually relatively new, and we're definitely going to include some links with this podcast for you to get started. There's lots of research and lots of greenfield we can explore in this area. That's basically what I have been doing, and our team is half doing research and half building features into the product. Sometimes an attempt may fail, but that's fine. So before we let you go, one thing that we ask all of our guests: do you have any final thoughts
you'd like to leave our listeners with? Yeah. Like I said, it's hard to find security experts in the market, and also hard to find data scientists; it's even harder to find both. But my background is in security, and I just jumped into applying data science to security and started learning, taking courses on Coursera, LinkedIn, and YouTube, and kept learning that way. So I would say, if you are passionate about applying data science in cybersecurity, don't worry that you don't have much knowledge. Maybe you are a security expert who doesn't know data science; that's okay. Or you're a data scientist really interested in using your data science skills in security; that's fine. As long as you are willing to learn. Threats are changing, new machine learning technology is emerging, and the only way to be successful is continuous learning. So my final thought is: jump into it. If you are really passionate about it, I see a great future here; like I said, lots of greenfield for us to explore. Thanks for that. And thanks so much for joining us this week, Sharon. We really appreciate you taking the time; we know you're extremely busy. I learned a great deal this week. It's another example of learning stuff I didn't know I didn't know. And to all our listeners out there, we hope you found it useful too.
Thanks for listening. Stay safe. And we'll see you next time. Thanks for listening to the Azure Security Podcast. You can find show notes and other resources at our website azsecuritypodcast.net. If you have any questions, please find us on Twitter at azuresetpod. Background music is from ccmixter.com and licensed under the Creative Commons license.