Hello and welcome to the Let's Talk Azure podcast with your hosts, Sam Foote and Alan Armstrong.
If you're new here, we're a pair of Azure and Microsoft 365 focused IT security professionals. It's episode three of season five. Alan and I had a recent discussion around Azure AI Content Safety. Here are a few areas that we covered: what is content safety and the scenarios it's used for, what is Azure AI Content Safety, the products contained within Azure AI Content Safety, and what are the SKUs and how much does it cost. We've noticed that a large number of you aren't subscribed. If you do enjoy our podcast, please do consider subscribing. It would mean a lot to us for you to show your support for the show. It's a really great episode, so let's dive in. Hey Alan, how are you doing this week?
Hey Sam, not doing too bad. How are you? Good, good, thank you. I think the highlight for me this week has been Global Secure Access. I've seen you doing a lot of testing of it this week. It seems really promising, doesn't it? Yeah, it's definitely growing. It's always been an interest since it's been out, but it's definitely growing on me in terms of how to use it. So yeah, just got to find out how much it's going to cost.
I don't think I've had a conversation with anybody that hasn't thought that it's a great idea, if that makes sense. And it always ends up in how much is it going to cost? And that's like the, I don't know, million or billion dollar question, isn't it? Yeah, definitely. Because we just need to understand how it compares to the other solutions out there. Exactly, yeah. Anything else that's been exciting tech wise in the Microsoft space over the last week or so?
Can't think of anything. It's been pretty busy at work, so there's not been a chance to catch up with too much. Probably last week they were doing all the virtual parties for the start of the new year for all the customer connection programs for security, management and Entra. So yeah, just catching up with those and what the new badge system is, or what the plan is for the program this year. So that's been good to catch up on.
Yeah, definitely. They're a really great source of getting involved with the technology really early on and sort of helping to shape early feedback for those products. Right. The private preview stage before public preview. Right. Yeah, definitely. And be able to shape some of them as well. So it's good to give feedback early and get it tuned more for the wider group, the wider world of use. Yeah, exactly. Definitely. Cool. So Sam, what's this week's episode about then?
Yeah, so I'm going to be covering Azure AI Content Safety, and it's quite an interesting use of cognitive services within Azure. I feel like it's a safety product, and that's obviously in the name, but I think it's a really good use case of AI; I think it's got actual tangible benefits to it. I've been playing around with it a little bit, learning more about it. I hadn't really noticed it until recently, and it's just something that I spent some time to understand and play around with, basically.
Cool. Yeah, I hadn't really heard about it or known about it until you mentioned it last week on the podcast. Shall we start from the basics then, with why content safety is important?
Yeah. To me, we see content being created and uploaded into systems all the time. Probably the most prevalent example of that is a social network: you can take any picture that you like and upload it to your social media feed, as an example. So that does lead to scenarios where potentially harmful content is added to those social networks. There is a big conversation about people's right to free speech and other people's right to privacy; we're not really going to go into those types of discussions in this episode. Social media networks are more of a public facing display of content, right? You can see content from anybody else a lot of the time; you might have to log into your account, and it might not actually be public, but it is public to the users of those networks. And then there are other scenarios as well. If you upload content to OneDrive, as an example, you can consume and use that data just for your own use, maybe for your organization or your personal OneDrive, but you can also create sharing links and then share that content forward and redistribute it. So in some scenarios, organizations may have a regulatory reason, or a desire, to check the safety of that content as it's being uploaded. Usually this comes in the form of hateful or violent content, maybe physical or emotional violence. Increasingly, there is moderation that needs to happen on this content. We see it with social media networks where they have content moderation teams, and that's effectively a massive human challenge at the moment. You have teams that have a finite bandwidth in terms of how much content they can review, and we also have to think about the real humans that are having to review all of that potentially harmful content that's being uploaded. And it's very reactive in that mode, right? It could be somebody reporting something that's been up on that site for x amount of time. So yeah, content safety is a real challenge. I can't really claim to know the regulatory reasons or the laws around what organizations have to do in terms of content safety, or what responsibility they have to their users, I'm not 100% sure, but I can imagine that private organizations that have content being added to their platforms would have a vested interest in making sure that content isn't harmful to other users. Just think about business scenarios: you're uploading files into your CRM system or SharePoint, as an example. You as an organization might want to make sure that none of that content is harmful.
Okay. Yeah, you're definitely right about having big teams, especially within the larger organizations that deal with a lot of content, being able to, like you said, moderate it, whether that's proactively as it's being uploaded where possible, or through the reporting of it within those applications and services, which can be challenging with the resource you have, as you said. So how can Azure AI Content Safety help make that content safe?
Yeah. So, content safety, by the way: when we're talking about this, I'm not going to talk about any specific categories of what it checks for, because those categories can be quite disturbing. I don't want to bring this conversation down at all, so I'm just going to talk more generally about what it looks for and how it looks for it, if that makes sense, but I won't go into specifics as I'm talking through it. Really, content safety's primary use is for developers to check content as it's being ingested into their systems. So there are SDKs and APIs which you can hook into to automatically test content in effectively near real time. A user drags an image onto your platform, you go to store that image somewhere, some sort of repository, and either once it's dropped in or as it goes to be dropped in, you can call Azure AI Content Safety and effectively get an answer back on what category of content is inside that content. Just to call out, there's a REST API, a Python SDK, a C# SDK, a Java SDK and a JavaScript SDK as well. And really what we're seeing is this content safety check being added into the ingestion pipeline inside your application, so specially trained AI models are used to detect different categories of content as the data flows in. What's also great about it is there's a studio which you can use, called the Content Safety Studio. What you can do there is upload content to a portal that Microsoft has and effectively test your configuration, because you can specify the types of content that you want to look for and tune it as it goes in. So it's a portal where you can upload dummy content and see how the system would have reacted. What's also great is that you can export the code for actually implementing that checking as well. It's probably worth me talking about the types of content and how the categories of that content work. I'll dive into these in a little more depth afterwards, but we're looking at text moderation, image moderation, jailbreak risk detection, and I'll talk about what a jailbreak is because this was new for me, and protected material detection, so effectively looking for copyrighted material being uploaded. Now, what happens? You have those different types of content that could be uploaded, text and images, we'll use those as our ongoing examples. Then you have what are called harm categories for them: all the different categories of content that these models are trained on. And then for each of those categories there are what are called severity levels. So for any particular category, if you're looking for fairness in content as an example, it's not a binary yes or no that content is fair or it's not, if that makes sense, right? It's going to be on a scale, and you as an organization can decide where on that scale you start to flag content, maybe for human review, and where you start to block it. So you might look at something and say, I want to look at content and whether it's fair, I'll use that as my example. It goes from level zero, which is categorized as, quote, safe by the model, up to level seven. And you may say that actually, from the content that we're seeing, maybe our false positive rate sits around level one or level two.
So we won't flag anything for human review unless it's level two and above. And that's something that you're going to be able to monitor and test over time with the Content Safety Studio, because it's probably worth calling out that the studio isn't just about testing content: you can monitor your online activity, so you can see the logs as they flow in, what's being checked and what levels of severity those different categories are hitting, basically. So you're able to see what your false positive rate is, based on the data that you've pushed through. To start off with, you could just push content through silently, look at the content that comes through, start to pull out content to review, and start to really decide what levels you want to act on.
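To make that flag-versus-block idea concrete, here's a minimal Python sketch of acting on the category and severity pairs the service returns. The 0-7 scale is the one described in the episode; the threshold values and the decide helper are hypothetical and would be tuned against your own false positive rate in the studio.

```python
# Minimal sketch of the "flag vs. block" threshold idea discussed above.
# The 0-7 severity scale matches the episode; the thresholds and the
# decide() helper are hypothetical values you would tune over time.

FLAG_THRESHOLD = 2   # send to a human moderator at this severity or above
BLOCK_THRESHOLD = 4  # reject outright at this severity or above

def decide(category: str, severity: int) -> str:
    """Map a category severity score to a moderation action."""
    if severity >= BLOCK_THRESHOLD:
        return f"block ({category}, severity {severity})"
    if severity >= FLAG_THRESHOLD:
        return f"flag for human review ({category}, severity {severity})"
    return "allow"

# Example: results shaped like the category/severity pairs the API returns.
for category, severity in [("CategoryA", 0), ("CategoryB", 3), ("CategoryC", 6)]:
    print(decide(category, severity))
```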
Cool. Yes. The way I see this, we were talking about the people whose job it is to do the moderation; potentially this is a mechanism to still keep that service there, to do a human check in effect, but with the AI, like you said, saying: if the severity level is above one, I want to flag it, but if it's above four, just bin it, sort of thing. And then the moderators aren't exposed to that content and the psychological toll that potentially comes with it. That's kind of where I'm going with it. So you can improve the morale, or whatever it might be, of that service by not having to look at the most severe content. It's doing your first pass, your first scan, giving you that 'bin it' or 'it just needs a check' sort of thing. So I think that's good in itself, even if you only get that part out of it. Like you said, you could flag it all initially, but then you're going to have a severity threshold where, if it's pretty confident it's a four or a five or a six, I don't know, again it depends on what application or service you're providing, it might be that it's just binned, got rid of, and no one has to deal with it. So that's really good, I think.
Yeah, and as you say, that's a really good point. I hadn't really thought about it like that. You're giving protection not just to your users but also to your moderators as well, potentially. Right. So you're saying, actually, we're going to review everything at level one, two and three, and anything above that is never going to see the light of day, potentially. And I suppose that's where free speech and responsible use of AI come into the mix. I don't think we're here for that debate more than we are the technology, but I suppose if you are an organization and it's your platform, in some respects you can decide what's on that platform or not, and what is appropriate content.
Yeah, exactly. And from the sound of it, it's very easy to integrate with your applications: you've got a lot of SDKs already out there, and the REST API. It's maybe not quite drop-in and go, but it's pretty much ready to go to start sending stuff to it, sort of thing. So that's really good as well. Not that I was expecting it not to be there, but it's good to have a wide range of programming languages covered, not just a few of them. Probably tying onto this: how do you integrate with Content Safety? Okay.
Yeah. So let's talk about the different types of content first, I suppose; that's probably a good place to start. Let's dive into those. The first one is text moderation, probably the most simplistic type of input, really. What happens here is you send a payload to the SDK or the API with the text that you're analyzing and the categories that you want to test it against. There are a number of categories, I'm not going to go into them, but you can specify which ones you want to check against. You can define a block list as well, which is basically an override, so you can add your own terms in for detection. If you know there are specific pieces of text content that you don't want to be uploaded, maybe swear words or curse words, and you just want to block all of that, you can add those in as well; that covers content that would fall outside the levels given by those categories. And then you can basically tell it what to output, whether to use four severity levels or eight severity levels, you can tweak it. What it will do is return whether it matched any of those block list words, and then give you an analysis of the categories as well: for each category you checked against, it'll tell you the severity. The next one is image moderation. Very similar to what we saw before: you upload the image content as a Base64 string to the API, which keeps it relatively simple to process and start checking against. There's no block list here, it's just against those categories, and again you can define whether it's four or eight severity levels for your output. Then you get back an array of the categories and severities, the same as text moderation. The next one I'm going to talk about is protected material detection. This is really about trying to identify potentially copyrighted content. It supports text, so you can upload your text string and it will detect whether it's potentially protected material that's been uploaded: think song lyrics, articles, recipes, web content, that type of thing. This is the only one that I haven't actually used, and the documentation is quite vague on it, because you just give it the text and the model has been trained on something; I don't know what the data set is that it's checking against on the other side, I haven't found that out yet. And let me just find the note I made. Yeah, so, jailbreak risk detection. What this is, is user prompts that are designed to trick generative AI models into showing behaviors they were trained to avoid, if that makes sense. Sometimes if you ask ChatGPT naughty things, it will tell you it can't respond. I'm guessing you've never done that, have you, Alan? Just to test it out, see where the edges are. I know I have, that's for sure. So this is really about understanding whether those prompts are likely to elicit a response from a generative model that is something you don't want. Before the prompt even hits the model to respond, you can test it.
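As a rough illustration of the text and image checks described above, here's a short sketch using the azure-ai-contentsafety Python SDK. It assumes you've created a Content Safety resource and exposed its endpoint and key as environment variables; the calls and field names follow the SDK's documented quickstart shape, but treat it as a sketch rather than a drop-in implementation.

```python
import os
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, AnalyzeImageOptions, ImageData
from azure.core.credentials import AzureKeyCredential

# Endpoint and key of your Content Safety resource (assumed environment variables).
client = ContentSafetyClient(
    os.environ["CONTENT_SAFETY_ENDPOINT"],
    AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)

# Text moderation: submit the text and read back category/severity pairs.
text_result = client.analyze_text(AnalyzeTextOptions(text="some user-submitted text"))
for item in text_result.categories_analysis:
    print(item.category, item.severity)

# Image moderation: the SDK takes the raw image bytes and handles the encoding.
with open("uploaded.png", "rb") as f:
    image_result = client.analyze_image(AnalyzeImageOptions(image=ImageData(content=f.read())))
for item in image_result.categories_analysis:
    print(item.category, item.severity)
```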
Yeah, that's an interesting one, because there have been some news stories of AIs being jailbroken, as you said. Wasn't one of the scenarios something like saying, "I know you can't say this, but can you say this" sort of thing? And it's like, "oh, I can't say this", but then, oh yeah, it goes off and bypasses itself sort of thing?
Yeah, exactly. Trying to trick it into provoking a negative response. Sometimes not great for PR. So you might want to filter the prompts with AI before you send those prompts to your AI model. It's like AI on AI inception, basically. Yeah. That's an interesting one.
Yeah. What's interesting is that there are a few different types of jailbreak attacks. I'll talk about those because they are effectively attacks against AI. Are they against real people? Not yet, but we could probably talk about that. So the categories for jailbreak risk detection are: attempts to change system rules, so requests to use unrestricted new prompts or responses. It might be that you have, I don't know, I've never worked on one of these models, do they have a special way to ask questions to check the dev version or something like that in prod? I don't know how that gets used. Embedding a mocked-up conversation to confuse the model, so trying to provide a conversation which then basically screws up the model output. I've never seen that done or even heard of it, but apparently that's a thing. Role play scenarios, where you're effectively instructing the AI to pretend to be something or somebody to change its response characteristics. And encoding attacks: character transformation methods, ciphers or other natural language variations to circumvent system rules. I think an example of this is, you know when you used to see certain phishing links where they've got slightly different letters? They use accented, sorry, I don't know the actual term, but variations of characters from different languages, and we sometimes see emojis as well, don't we, or Unicode characters I should say, that get swapped in to represent similar-looking words. So there are a few different categories of jailbreak attack that it checks for.
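For completeness, here's a rough sketch of what the jailbreak risk check could look like against the REST API. This feature was in preview at the time of recording, so the route, api-version and response field below are assumptions based on the preview documentation and may have changed since; the point is simply "check the prompt before it reaches your model".

```python
import os
import requests

endpoint = os.environ["CONTENT_SAFETY_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
key = os.environ["CONTENT_SAFETY_KEY"]

def looks_like_jailbreak(prompt: str) -> bool:
    """Ask Content Safety whether a user prompt looks like a jailbreak attempt.

    The route, api-version and response field below are assumptions based on
    the preview documentation available at the time of the episode.
    """
    response = requests.post(
        f"{endpoint}/contentsafety/text:detectJailbreak",
        params={"api-version": "2023-10-15-preview"},
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        json={"text": prompt},
    )
    response.raise_for_status()
    return response.json().get("jailbreakAnalysis", {}).get("detected", False)

# Only forward the prompt to your generative model if it passes the check.
if not looks_like_jailbreak("user prompt goes here"):
    pass  # call your model here
```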
Yeah. Cool. There are quite a few types of attacks there. Like you said, I didn't really think about what you may need to protect against. I mean, I've never had to protect an AI model against a jailbreak attack at all, have I? But I suppose these are the types of scenarios in testing and security that you wouldn't even really know about as just a normal user of a system. Right.
Yeah. And it's good that the output from the text and the image ones is, again, quite simple. It's very to the point, there's no beating around the bush: the severity is this, the category is this, bang, now you decide what you want to do with it, sort of thing. So that's really good. Coming back to the question from just before we talked about the content types: how do you integrate Azure AI Content Safety?
Yes. I think the primary way that you would integrate it into your process or application stack is really going to be around those SDKs, Python, C#, Java and JavaScript, and then the REST API, which gives you the ability to hit it from pretty much anywhere. There are some limits for the different types of checks. For instance, for text, let me just check because I made a note of it: the maximum text submission length is 10,000 characters, but for protected material detection it's only 1,000 characters, and that one also has a minimum length. So depending on the different types of detection that you want to run, you're going to need to look at the different scenarios there. There's really good documentation on Learn; it's pretty self-explanatory once you get going. As for how you integrate it, I'm not really sure what that process is going to look like, it's going to be different for every type of application, and then you can test and validate what you're seeing through the studio. That's going to be one-time uploads of example content, and it also gives you that monitoring ability to look at the content that's flowing through there. You can connect it to a virtual network; I haven't gone to that length, but just like you can with cognitive services, I believe you can run this privately as well.
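Since those submission limits differ per feature, a tiny helper like the one below, purely illustrative and using the 10,000-character figure mentioned above, could split longer content into separate requests before calling the API.

```python
# Purely illustrative helper: split long text into chunks that fit within the
# per-request limits mentioned above (10,000 characters for text analysis,
# 1,000 for protected material detection).
def chunk_text(text: str, max_chars: int = 10_000) -> list[str]:
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

article = "..." * 20_000  # some long user-submitted document (60,000 characters)
for chunk in chunk_text(article, max_chars=10_000):
    pass  # submit each chunk to analyze_text / the REST endpoint in turn
```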
Yeah, okay. That kind of makes sense, because your requests come in from your front end and go through your VNets to the service rather than going across the Internet. And I guess using that studio can validate your application as well, because you can in effect use the studio to get the scoring and then confirm that when you throw it through the APIs and SDKs it should be the same service behind it; you're just validating that the connection is correct and you're getting the same output, aren't you? You can trust the outcome. Yeah, that's good.
Yeah, exactly. And there is a really good section on the responsible use of AI in the documentation as well, talking about letting users know that there's moderation, what your moderation policies are, and being very open that you're using AI to enhance that. There's some really good documentation about how you tune your severity levels, best practices, and trying to understand what's going on. It actually really defines what a good feedback loop and moderation loop for this content looks like as well. So yeah, there's a big long page on that, so it's definitely worth checking out.
Cool. Yeah, like you said, it definitely sounds well documented, with best practice there. Okay, sounds great. I guess the million dollar question might be: how much does it cost?
Okay, so there's a free and a standard version, and we always like a good free version. So it's split into two; I'll talk about free to start off with, because hopefully that will stay free. It is GA, so maybe, hopefully, it's going to stay like that. For text, jailbreak risk detection and protected material detection, they're all text records, right? So you get 5,000 text records each month, but a text record is every block of 1,000 characters, basically. Okay.
So basically if you sent 1,000 words, sorry, not words, characters, 1,000 characters, that would be one of your text records. 2,000 characters would consume two text records, basically. And I'm guessing if you sent 500, that would be one text record as well.
I think so, yeah, basically. Their example is that if a text input sent to the API contains 500 characters, it will count as one text record. I don't know about lower than that; I assume that just means one, would be my guess. Could be, yeah. So there's 5,000 text records per month on free. There's also 5,000 images as well for image detection, so you can do 5,000 images a month, and there's seemingly no limit beyond that, no chopping of those up or anything like that. What's interesting is I can't see anything in the documentation about whether the number of categories you search against multiplies that.
Right. Because I would have thought that every time you hit a different category it would charge you for a text record potentially. I can't see any evidence of that. So it'll be interesting to see if that's the case. But there's nothing on the pricing page to say that. Yeah, because you'd feel like it's doing another search against another model then, wouldn't you? So it's kind of like you said, another submission in effect.
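As a quick sketch of how that text record counting works out, using the 1,000-characters-per-record rule quoted above; the rounding up of partial blocks is our assumption, consistent with the 500-character example.

```python
import math

def text_records(text: str) -> int:
    """Number of billable text records: one per 1,000 characters, rounded up (assumed)."""
    return max(1, math.ceil(len(text) / 1000))

print(text_records("a" * 500))   # 1 record (the 500-character example above)
print(text_records("a" * 2000))  # 2 records
print(text_records("a" * 2500))  # 3 records, assuming partial blocks round up
```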
Yeah. Okay, so that's free; there's a hard limit of 5,000 on each side for free. Then you move up to Standard, which is the same bucketing on each side, but now it's dollars. We'll go for dollars, we'll go East US. Yeah.
That's probably worth talking about actually, something I haven't mentioned; okay, fine, I'll remind myself in a second. US pricing in East US is per text record, so a text record is up to 1,000 characters, and images are $1.50 per 1,000 images. I don't know how much it costs to run a moderation team at, say, a social network; I've got absolutely no idea how much it costs them to moderate a single image, it's not something I've ever done at scale. So I don't know if that's competitive pricing, but 1,000 images for $1.50 sounds pretty good to me.
But that's 0.15 of a cent per image. Yeah, but take the example: let's say you went on holiday and then you uploaded like 100 images to Facebook or something, dumped them up to Facebook, right? If you upload 100 images, that's going to cost them $0.15, fifteen cents, to check all of those images for safety. Or is it 1.5 cents? Maybe even less for 100?
Oh no, it's fifteen cents. Yeah, the maths. Yeah. But I don't know, is it worth doing that? I suppose it really depends on your organization, because what's really good is, let's say you had a niche B2B scenario, say a private social network for local businesses in your area as an example. I don't know, maybe you wouldn't have this requirement; maybe people can upload and chat to each other but there aren't that many users, maybe you've only got 500 businesses in your directory, in your little mini social network, and you want to protect against people's accounts being hacked and harmful content being uploaded. If you don't have a lot, let's say in your application you only upload 10,000 images a month, right, you might not even do that, that would only be $15 a month, wouldn't it? Either to block it and have peace of mind that it's blocked, or even to give you the starting point, a detection layer, for your moderation platform.
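Putting rough numbers on that example, using the East US list price quoted in the episode ($1.50 per 1,000 images):

```python
# Back-of-envelope cost check using the list price quoted in the episode.
price_per_1000_images = 1.50                      # USD, East US, Standard tier (as quoted)
price_per_image = price_per_1000_images / 1000    # $0.0015, i.e. 0.15 of a cent

print(f"100 holiday photos: ${100 * price_per_image:.2f}")        # $0.15
print(f"10,000 images a month: ${10_000 * price_per_image:.2f}")  # $15.00
```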
Yeah, it's great, isn't it?
It's good that you can get started for basically nothing, but that's like a lot of PaaS and sort of SaaS solutions in Azure. By the time that you have to do a million images per month, you're probably at a scale where you could probably create something of your own, potentially, right? And I'm guessing all the big players must have their own versions of these; I don't know, maybe that's why this exists. But when you're starting out, at least you've got the ability to get started with very little upfront investment. Right. Which is a positive thing.
Right? Yeah. And you can start your development with it without having to pay a big price up front, even though $1.50 isn't much, even if you had to go up to that tier.
Well, you do have to integrate with the SDK, don't you? And you have to integrate it with your moderation process, or create a moderation process if you don't have one. So there are potential costs there as well. But the core actual detection, which I suppose is the hard part, because everything else is just, I don't know, other moderation work or moving images around, it's not exactly that complex. Yeah, the only other thing is it's not available in every region at the moment. It's quite specific: East US, I'm scrolling down the list on the pricing page as I go, and West Europe. If that is an issue, data sovereignty could be an issue there for you as well. So it's currently only in those locations, but it does have VNet support, so in theory you could secure the data in and out back to your region, I suppose, couldn't you? There are ways to get around public transiting of your data, I suppose.
Yeah, true. And I guess at least there's one in a GDPR environment in Europe, and then you've got the US one, which is probably where it started off anyway. It'll probably come to the other regions, just waiting for the AI cards to be available. Exactly. Because that's the same as we see with a lot of the AI solutions, isn't it? They're only in very limited places, that's for sure.
Yeah. And that's in effect the goal from Ignite, wasn't it: they've got new chips and it's just manufacturing them, getting them out and building new AI supercomputers in the data centers. Cool. Okay, that sounds really good, and cost effective as well, I think, for that service; even processing the images is a reasonable price. And like we said, it's protecting those moderators from having to see potentially unsafe content. Paying 0.15 of a cent to stop someone seeing an unsafe image is, I think, worth it.
Definitely, yeah. Or different content teams. 100%. Cool. Okay, so do you think there's anything else you want to talk about around this?
No, it's just something that I thought, I felt, was a good use of AI, to be totally honest with you. And I like it when we have some really useful ones, because generating images of squirrels is great, that's great, and classification problems that are solved with AI are really powerful. But this, I feel, has an actual safety element to it, which just sits right with me, to be totally honest with you. So yeah, it's a good use of AI.
Cool. Are there any episodes we previously did around AI services? I know we've probably talked about Copilot a couple of times recently, but are there any other Azure services that we've talked about?
Back in season three, episode eleven, we did Azure Cognitive Services. I don't know how out of date that episode is now, but the core of it will still be there, and a lot of this technology is built on top of Cognitive Services in the background. So yeah, if you want to look at other AI-related technology that's not specifically niched down into this use case, go and check out Azure Cognitive Services. Cool. Alan, what's the next episode that we've got lined up?
Yeah, so I think I mentioned it last week that we haven't done a Defender for Endpoint episode, considering that I've probably talked about it quite a few times and I pretty much use it day to day. It's sort of bread and, well, not bread and butter, but you know, dealing with it every day sort of thing. I can't believe I forgot about it. So yeah, we're going to do an episode on Defender for Endpoint: what it covers, how it works, how its deployment is slightly different to other AV and EDR solutions, because I think that's worth calling out, and some of the other bits and features that you get with it, because I think it's quite a powerful tool set.
Yeah, are you going to be able to cover what it integrates with in like 40 minutes or whatever, Alan, or are we just going to whistle-stop it? We've covered some sections of it piecemeal, I think, like Defender Vulnerability Management that we've talked about before and things like that, but never actually the AV itself and the EDR, which we missed. We should be able to do it. If not, we'll be whistle-stop touring.
Yeah. Or it'll be like a three-hour episode. Yeah, don't worry, it won't be. Nice. Yeah. Cool. Okay, so did you enjoy this episode? If so, please do consider leaving us a review on Apple or Spotify. This really helps us reach more people like you. If you have any specific feedback or suggestions, we have a link in our show notes for you to get in contact with us. Yeah. And if you've made it this far, thank you ever so much for listening, and we'll catch you on the next one.
Yeah, thanks all. Bye.