Questions From the Community | Episode 38

AI Security Ops

Feb 05, 2026•17 min•Ep. 39

Episode description

Click here to watch this episode on YouTube.

Creators & Guests

Brian Fehrman - Host

Joff Thyer - Host

Derek Banks - Host

Brought to you by:

Black Hills Information Security

https://www.blackhillsinfosec.com

Antisyphon Training

https://www.antisyphontraining.com/

Active Countermeasures

https://www.activecountermeasures.com

Wild West Hackin Fest

https://wildwesthackinfest.com

🔗 Register for FREE Infosec Webcasts, Anti-casts & Summits
https://poweredbybhis.com

Click here to view the episode transcript.

Transcript

Joff Thyer

00:01

Welcome to another fun and exciting episode of AI Security Ops with your illustrious hosts: myself, Joff Thyer, Brian Fehrman, Dr. Brian Fehrman, should I say, and Dr. Derek Banks. In fact, we'll all be doctors today. Why not? I'm not a doctor.

Brian Fehrman

00:17

Doctors all around.

Joff Thyer

00:18

That's right. Doctors all around. This episode is brought to you by Black Hills Information Security. If you are interested in AI assessment work, please visit us at blackhillsinfosec.com. Click on the contact us page, and we'll see if we can help you out.

00:33

Having said that, today's episode is all about our listeners' questions and answers. Hopefully, we'll give you some answers that are enlightening. Somebody probably shared a document, which I didn't read, so I'm actually going to allow my fine colleagues here to kick it off with the first question. And, actually, when you ask a question that you want me to answer, you can just, like, play pick on Joff. You'll just have to read the question because I don't have the document up.

01:02

My bad. Okay. Let's go, and we will start with Dr. Brian Fehrman.

Brian Fehrman

01:09

Alrighty. Could someone extract training data through model inversion attacks, and how realistic is that today? The answer is yes, because that's basically the definition of a model inversion attack: you send a bunch of queries to the model, get data back, and use that response data to make inferences about the training data that was used. A lot of times the examples you'll see of this are actually image classifiers. Let's say the target is a facial recognition system. Essentially, they'll just start feeding in a bunch of random pixels, and what they might get back is some kind of probability of how well those random pixels match something within the database. And so then they just keep modifying the pixels until they get closer and closer matches.

01:58

And eventually, they might be able to extract training data that's there. And so that's, you know, one way that you'll typically see it in examples.
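
The query-and-refine loop Brian describes can be sketched in a few lines. This is only a toy illustration, not a real attack on a real system: the "model" here is a hypothetical scorer (`make_toy_scorer`) that returns a higher confidence the closer the input is to a hidden image, and `invert` is an assumed name for a simple hill-climbing loop that changes one pixel at a time and keeps the change only when the confidence goes up.

```python
import math
import random


def make_toy_scorer(secret):
    """Hypothetical black-box model: confidence in (0, 1] that rises
    as the submitted pixels get closer to the hidden image."""
    def confidence(pixels):
        mse = sum((p - s) ** 2 for p, s in zip(pixels, secret)) / len(secret)
        return math.exp(-mse)
    return confidence


def invert(confidence, n_pixels, steps=5000, seed=0):
    """Start from random pixels and keep only changes that raise the score."""
    rng = random.Random(seed)
    guess = [rng.random() for _ in range(n_pixels)]
    best = confidence(guess)
    for _ in range(steps):
        i = rng.randrange(n_pixels)      # pick one pixel to modify
        old = guess[i]
        guess[i] = rng.random()          # try a new random value
        score = confidence(guess)        # one query to the model
        if score > best:
            best = score                 # closer match: keep the change
        else:
            guess[i] = old               # no improvement: revert
    return guess, best
```

The attacker never reads the hidden image directly. Everything is inferred from the confidence values that come back, which is why the result is a reconstruction rather than a copy.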

Joff Thyer

02:05

So, I'm curious, though, about one thing here. I think, Brian, fundamentally, you're right, because you are extracting the predictions from the model, which is fantastic. This is what models do. But are you extracting the original training data? That's a slightly different question. I guess that's not the actual question being asked.

Derek Banks

02:23

So I guess that's one of the fights that I try and fight: you're not really storing data inside of a machine learning or AI model, right? You're basically creating math that will predict the data, right? And basically, if you looked inside of a model file, it would just be a huge matrix of numbers. That's what it would look like to you, right? Like this big blob of numbers.

02:53

And so you're not really extracting any data as much as you're inferring and reconstructing the data, right? So if you look at, like, ChatGPT as a large language model, it's not storing data from the internet. It's not like a database, right? It's kind of distilling and condensing it into a mathematical construction that can then predict the data, predict the most likely output based on your input. So we use words like data exfiltration because those are the words that we've used all this time, right?

03:36

But that's not... I mean, if you're gonna extract data, you know, not training data but other data, from a model, it'd have to be connected to the data through some kind of, like, tool.

Joff Thyer

03:51

Right. But actually, you know, digging a little bit deeper, these attacks are useful because, do they enable you to reconstruct a facsimile of the model? Then the answer would be yes, they do, because you're extracting those inferences. And we are welcoming Bronwen to the show. Bronwen, welcome.

Brian Fehrman

04:11

Hey, Bronwen.

Bronwen Aker

04:13

I'm sorry. I didn't realize you were already recording.

Derek Banks

04:16

That's okay. That's alright.

Joff Thyer

04:18

You're like the Brady Bunch. You just jumped in, and we're about to break out in song, a story about a lovely lady. Exactly. Okay. Let's go with the next question. Derek, do you have a question up in your question list?

Derek Banks

04:34

Actually, the one I picked out is kind of close to what Brian said. Not the same one, but a similar question, so I'm gonna skip that one. I'm gonna go with: what kind of telemetry should be collected for detecting prompt abuse or API misuse?

Joff Thyer

04:54

That's an interesting one. So let's deconstruct that question a little bit. Prompt abuse. I'm not sure what the listener is talking about when they talk about prompting. Maybe we can make the assumption it's prompt injection.

Derek Banks

05:09

Right. Yeah, I think that's what they mean. Basically, people are trying to get our model to do something unintended. How do we detect that?

05:17

And I guess, taking a step back, I'll just say that I've been doing information security for many, many, many, many years now, many moons. And it's really not the norm to see people logging application traffic, web traffic, in a meaningful way, ever. Sure, web servers have logs. Typically what we see is that by the time we actually get to those logs because of some kind of incident, they've long since rolled over, right? They're not being offloaded to some central location.

05:53

So my first thing, I would say, just as a general logging rule: are you putting your key and critical application data into some kind of centralized location? I would start with that.

Joff Thyer

06:09

Yeah. And the other issue that's slightly related here too, and I've asked some customers this on occasion: you know, are you logging around AI model use, especially if you have a chat interface to that model? And the answer normally is no. And there are instances where logging can actually potentially get you into trouble. Right? Yeah. I mean, we know that there is such a thing as too much logging in our industry.

Derek Banks

06:39

I mean, it all depends on what we're talking about, right? Like, I don't think that ChatGPT should be logging things centrally for what I'm putting into it, because we pay them and they say they're not storing those things. If it was an application in an enterprise setting, maybe that's a different story. And so I was really talking about more of an enterprise-like setting. Like, we have this critical chatbot thing or critical process that people are interacting with, and we want to make sure that no one's, you know, how do we detect if someone's trying to abuse it?

07:10

And so collecting the data is step one. That's my point.

Joff Thyer

07:13

Yeah, and I would also add to that question. It is often the case in most AI model deployments, with regard to dangerous or, you know, even potentially malicious prompting, that guardrails are deployed around that model. And, you know, if you're providing a model in some sort of local sense that's backing some sort of application, you would probably be implementing similar guardrails around that model. And if you are, you should certainly enhance the logging on the guardrail aspect of the model. If it trips, make sure that you're logging that instance, and it would also give you a sense of how effective or not effective your guardrails are.
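
As a rough sketch of what logging the guardrail could look like, the snippet below wraps a model call and emits one structured JSON event per request, recording whether the guardrail tripped. Everything named here is an assumption for illustration: `check_prompt` is a stand-in keyword check rather than a real guardrail product, `BLOCKED_PHRASES` is a made-up list, and `model_fn` is whatever callable fronts your model. The event records the prompt length rather than the prompt text, which is one way to respect the concern about logging too much.

```python
import json
import logging
import time

logger = logging.getLogger("guardrail")

# Hypothetical phrases, for illustration only.
BLOCKED_PHRASES = ("ignore previous instructions", "reveal your system prompt")


def check_prompt(prompt):
    """Stand-in guardrail: True if the prompt trips a rule."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def guarded_call(user_id, prompt, model_fn):
    """Run the guardrail, log one structured event, then call the model."""
    tripped = check_prompt(prompt)
    event = {
        "ts": time.time(),
        "user_id": user_id,
        "guardrail_tripped": tripped,
        "prompt_chars": len(prompt),  # length only, not the prompt itself
    }
    logger.info(json.dumps(event))    # ship this to your central log store
    if tripped:
        return None, event            # blocked: the model is never called
    return model_fn(prompt), event
```

Counting `guardrail_tripped` events over time gives a first measure of how often the guardrail fires, and reviewing what got through gives a sense of what it misses.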

Derek Banks

07:58

That sort of leads into my second take on that, which is: okay, now we have all this data. How do we determine whether or not it's prompt injection? I guess my response to that would be, well, my friend, now you have a natural language processing problem. You're not gonna write, like, traditional SIEM rules to get yourself out of that one. You're going to have to essentially do some kind of NLP.

08:24

And to your point, there are already pretrained models that would take in input and then return a Boolean, whether or not that matched some kind of, like, prompt injection. Right?

Joff Thyer

08:36

Right. So what was the second part of that question? We covered the prompt injection, I think, pretty well. The second part was?

Derek Banks

08:44

Or API misuse. So prompt injection or API misuse. And what I think they mean by API misuse is probably a little broader than prompt injection, and probably goes back to the first thing that Brian was talking about, model inversion. I mean, without prompt injection, theoretically, depending on what kind of information I can get out of the model, I could send prompts and measure the response and the tokens out and do some data-science-y type stuff and essentially try and recreate the model. In fact, there is a suspicion that the Chinese did this with DeepSeek against OpenAI. Right.

09:22

Because you could buy a $200 a month subscription and essentially get, what, like unlimited queries, right? $200 a month is cheap if you're trying to, you know, it's cheap compared to what it took to pretrain the base model for ChatGPT, I bet.

Joff Thyer

09:38

Yeah. Oh, one of the other aspects of API misuse would be the denial of service aspect, of course. If you throw a lot of traffic at an API that's connected to a back end AI model, that's gonna produce a tremendous amount of compute load because, you know, even though it's only inference, inference is still using the multiple processing units of a TPU or GPU. Right? So that's gonna run up electricity bills.

10:03

It's gonna run up compute use. And so throttling the API is probably the first step there.
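
One common way to throttle an API is a token bucket, sketched below under stated assumptions. The `TokenBucket` class, its `rate` and `capacity` parameters, and the injectable `clock` are all illustrative names, not part of any particular gateway or provider. Each caller gets a bucket that refills at `rate` tokens per second up to `capacity`, and a request is allowed only if enough tokens remain.

```python
import time


class TokenBucket:
    """Per-caller rate limiter: refills `rate` tokens per second,
    holds at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable so it can be tested
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Because inference cost varies with prompt and response size, the `cost` argument could be scaled by token count instead of charging a flat one per request.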

Derek Banks

10:08

Yeah, I would throttle the API, but even, you know, let's say I was able to get right under the throttle threshold, the good news is that that kind of API misuse is a little bit easier problem than an NLP problem. That problem is basically looking at, like, you know, standard deviations from normal baseline traffic, right? Like, why does this user have a million more queries an hour than the rest of our users?
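
The baseline comparison Derek mentions can be sketched as a z-score check on per-user query counts. This is a minimal illustration, assuming you already have hourly counts per user in a dictionary; `flag_outliers` and the threshold of three standard deviations are illustrative choices, not a recommended detection rule.

```python
import statistics


def flag_outliers(queries_per_user, threshold=3.0):
    """Return users whose query count sits more than `threshold`
    sample standard deviations above the mean of all users."""
    counts = list(queries_per_user.values())
    if len(counts) < 2:
        return []                       # not enough data for a baseline
    mean = statistics.mean(counts)
    stdev = statistics.stdev(counts)
    if stdev == 0:
        return []                       # everyone behaves identically
    return [
        user
        for user, count in queries_per_user.items()
        if (count - mean) / stdev > threshold
    ]
```

One caveat: with only a handful of users, a single extreme value inflates the standard deviation enough that it may never cross the threshold, so a median-based baseline or a per-user historical baseline may be more robust in small populations.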

Joff Thyer

10:34

Yeah, exactly. Actually, I'm gonna give OpenAI a compliment. I was looking into this the other day because, for coursework, I wanted to put some constraints around how the API key was being used, and OpenAI is actually doing a fairly reasonable job. They allow you to set budget constraints. They allow you to throttle, and they also allow you to dictate specifically which models that API key will answer to. So it's actually pretty cool.

11:03

So I'm impressed by what they're doing, and I'm sure other providers have similar things. And if you're hosting a local model, you would have to look for similar mechanisms in your implementations. So alright. We ready to move on to the next question?

Brian Fehrman

11:17

Yeah. I think we have time for one more.

Joff Thyer

11:20

So I think we should throw this one at Bronwen because she's been sitting there patiently just waiting and itching to answer a question. I can feel it. So somebody

Bronwen Aker

11:31

Oh my goodness.

Joff Thyer

11:32

And if Bronwen has the questions document up, she can ask the question.

Bronwen Aker

11:37

So the question is: have you seen instances of anyone implementing secure phrases, almost as safe words, to protect against things like prompt injection and confusion attacks? This concept of a safe word I haven't seen regarding prompt injection, but I have seen it come up in discussions about how to protect against deepfakes. So I'm actually not going to answer that question. I'm gonna pivot, though, to the fact that this concept of a safe word as a human-centric defense against deepfake attacks is something that I am seeing show up much more often. And it's coming up in a variety of different conversations because, at the end of the day, if we don't have a human-to-human means of verifying what is sometimes called proof of life, then these deepfakes have gotten so good now that it's almost impossible to detect them, at least without doing an in-depth, insane forensic analysis.

12:57

It's getting that good.

Derek Banks

13:01

This is something...

Joff Thyer

13:01

Give us a context.

Bronwen Aker

13:03

A context.

Joff Thyer

13:04

Yeah. Where would this apply?

Bronwen Aker

13:05

Okay. This would apply: you're working for a company, and in order to prevent a chief financial officer from releasing funds based on a phone call from someone, there would have had to have been an out-of-band, previously set up safe word. You know, so-and-so treasury officer or manager comes in and says, "I need x thousands of dollars payable to this bank account." And the CFO turns around and goes, "Okay, what is the code word of the day?" And the person, if legitimate, would hopefully have said the code word.

13:54

And if not, they would fumble or flail or whatever. And hopefully they wouldn't have had presence in the network and been able to find out what that code word was. But that would be something that could be used. Again, it would have to be arranged in advance.

Derek Banks

14:11

And if I could pick a safe word, that's danger. Right?

Joff Thyer

14:15

Well, no. I could

Bronwen Aker

14:16

Stranger danger.

Derek Banks

14:17

My safe word. Has anybody seen Bert Kreischer's birth stand up?

Joff Thyer

14:20

Like, I could certainly see a scenario, as Bronwen describes, where maybe a company every month, let's say, let's just pick a period, publishes a random list of common words that are indexed somehow, actually, in a paper mechanism, and distributes it out.
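
The indexed word list Joff describes could be generated along these lines. This is a sketch under assumptions, not an established scheme: `make_code_sheet` and `verify` are hypothetical helper names, the word list is supplied by the caller, and printing and distributing the sheet out of band is left to a process outside the code. Words are drawn with Python's `secrets` module so that the choices are not predictable.

```python
import secrets


def make_code_sheet(wordlist, entries=31):
    """Return {index: word}, one randomly chosen word per index
    (for example, one per day of the month)."""
    return {i: secrets.choice(wordlist) for i in range(1, entries + 1)}


def verify(sheet, index, spoken_word):
    """Check the word a caller gives against the sheet entry for `index`."""
    expected = sheet.get(index)
    if expected is None:
        return False                    # unknown index
    given = spoken_word.strip().lower()
    # Constant-time comparison of the two words as bytes.
    return secrets.compare_digest(expected.encode(), given.encode())
```

In use, the verifier would ask for the word at a given index, for example the entry for today's date, and reject the request if `verify` returns False. The sheet only helps if it reaches people through a channel an attacker on the network cannot read.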

Derek Banks

14:40

Aren't you... you're basically talking about a one-time pad. Exactly. Exactly. So

Bronwen Aker

14:46

And there's a lot of precedent

Derek Banks

14:47

New again.

Bronwen Aker

14:49

There's a lot of precedent for this. We've seen all of the scenes in the movies where so-and-so

Derek Banks

14:53

World War II.

Bronwen Aker

14:54

can't access this-and-such security system, can't press the big red button, until they enter the correct security code that's kept in an acrylic thing, and it's on paper.

Derek Banks

15:07

Reminds me of the Navajo code talkers. Right?

Joff Thyer

15:12

So Brian has a comment, and I have one as well. The key there is out of band. Right? It's out of the communications channel. But, Brian, what's your comment?

Brian Fehrman

15:20

Oh, yeah. Well, I'm just gonna say, I know that at the previous credit union we were at, there was actually a code word that we had. It didn't rotate, but it was one that, like, you picked. And anytime we called, in addition to other account information we had to give, you had to give the code word. It was actually, yeah.

15:35

It was, I don't know. It was... Did you get to pick? I'm trying to remember it. And then, like, sometimes we'd say it and they'd be like, "I don't think so," but it was the word, they just pronounced it differently. And I'm not gonna say what the code word was, but

Joff Thyer

15:48

English is not... But it's very sad that it didn't rotate.

Brian Fehrman

15:52

Pick one

Derek Banks

15:53

that's not safe for work.

Joff Thyer

15:56

Well, you know what? I think we've covered the waterfront today. I think we all do need to run off and do other things, unless anybody wants to try for another question, which I don't think...

Derek Banks

16:04

January's not supposed to be this busy. And it's not... yeah. I was told that it wasn't gonna be this busy.

Joff Thyer

16:11

Alright. Well, thanks again to my illustrious hosts, Bronwen, Derek. Everybody wave. Brian. Doctor, doctor, doctor. I hope you've enjoyed listening to this episode of AI Security Ops, and we will be seeing you next time. Keep safe and keep prompting out there. See you.

Transcript source: Provided by creator in RSS feed: download file
