¶ Intro: AI Security Ops & Episode Setup
Welcome to AI Security Ops, the podcast where we cut through the hype and explore the real world intersection of artificial intelligence and cybersecurity. Each week, we examine how AI is reshaping both sides of the security landscape, the threats that we're facing, and the defenses that we're building. I am Derek, and I am joined by both Brian and Bronwen as per the, you know, usual arrangement. And today, we're gonna look at, essentially models versus harnesses. So there's a debate in the, security community, or the, the AI community about is the model the important part or is the harness the important part?
¶ The Core Question: Model vs Harness
And so from a security perspective, I think the the idea here is that we hear a lot of things about defenders that, you know, they they keep blaming the model for things such, you know, like that it leaked data through prompt injection or something like that. But in reality, if you look at what's happened in, you know, 2025 and 2026, the the locus of agent security has shifted to the harness. But there's still a little bit of a a debate on is it the model or the harness. And before we get in to the debate, I'll take a moment to remind everyone that this show is being brought to you by Black Hills Information Security and Anti Siphon Training. BHIS helps organizations identify and close real world security gaps through pen testing, adversary, adversary emulation and simulation, and Purple Team engagements, manage detection and response, pretty much if it's, security.
We can help you with it. Anti Syphon delivers hands on practitioner led training built around real attacks and real defenses and real tools so you can apply what you learn immediately. Learn more at blackhillsinfosec.com and antisiphontraining.com. And so, what do you all think? What do you think?
¶ Defining the Model: What It Actually Does
Is the is the harness the more important part for security or is it the model? And, you know, before we talk about that, maybe, Brian, do you wanna take a stab at kind of, you know, the definitions that matter when we say model versus harness?
Yeah. Well, I think the main one to get out of the way is is just model and harness. Right? So model is that is kind of the the core AI mechanism. Right?
So when we're talking about a model, we are talking, in in the case of LLM, usually some kind of deep, learning neural network, that has been gone through a training process in which data has been fed in, and there's expected outcomes for that data. And based upon the difference between, input and expected output, there are series of weights that are tuned to try to get a proper alignment such that when you give the model certain input, you get closer to the output that is expected. And so when we're talking about the model, that's really the core principle there. And so there are a lot of other features that can go along with the model, especially when we talk about, how we can better tune it for alignment purposes, things like reinforcement learning, human feedback. So that's something that a lot of people probably encountered where you get a respond you get maybe two responses when you're interacting with, Claude or ChatGPT and it says, which one do you like more?
And you'll pick, you know, answer one or answer two. And it'll take that information, and it'll use it to adjust weights in future iterations, to try to get that that better alignment. Also baked into the model is a certain level of refusal refusal behavior that goes along with both safety and security principles. Safety meaning that it's likely not gonna tell you how to make a bomb right off the bat. Security being that it might not provide its system prompt or other it might refuse other behaviors that it sees as overtly malicious.
Right? Like So that's kind of the model.
What's up? Hot wiring like hot wiring a Volvo x c 60. Yeah. It'll take a little work for it to, tell me how to go about doing that.
Yeah. That's a very specific knowledge, Derek.
Yes. Yeah. Yes. In my, in my in my training class, I created a slide because I knew my wife was gonna be in the class talking about, like, classic prompt injection instead of, you know, how do I make meth. And then the day I had the slide, she wasn't in the class. But anyway Oh.
That was you.
I know. Next time. Next time. So the other the other portion then is the the harness. So the harness is really kind of like the scaffolding, tooling, coding, all the architecture that you're putting around the model that gives it abilities beyond just generative capabilities of being able to just generate text output or generate images or audio or video or whatever you're having it generate.
¶ Defining the Harness: Tools, Code & Capabilities
The harness allows it to go, in additional additional steps, to where it's not just a single step of generation or reasoning. What it can actually do is now it can start writing code. It can iterate through that code, testing it, making changes to it as it goes to find errors, or you can hook it up to be able to run, additional tasks, such as summarizing your email and sending out, an an email from that or sending out an automated Slack message or updating user accounts. So the harness is really all of that, you know, kind of the capabilities that we build around the model that that kind of runs on on top of it. Does that sound about right?
Does everyone agree with those general definitions?
Sounds like to
me if I'm missing something.
Sounds like to me the yeah. The model is the driver and the harness is the car plus the keys plus the road and the tires. Yep.
And and, of course, they're evolving both, the models and the harnesses. I mean, I know that I've gone into an LLM and the model has not changed, but there are other behaviors that just from one day to the next may have changed, and that's because of tweaking to the models. Most people aren't gonna ever pay attention to that or notice that. So to them, it's just one big black box, which kinda seems like that's the way the AI companies want it.
¶ Why Security Is Shifting Toward the Harness
Yeah. Well, yeah. Because they don't they don't wanna be giving away their, their secret sauce. It's almost like another way you can almost think of it. It's almost like the the model is like like if you're looking at, like, programming languages, for instance.
So let's just pick one Golang or something. Right? Model could be kind of analogous to the to the code to the coding language. So you have a certain set of capabilities that are in there. They add features and everything, but it's really what you do with that code, how you piece it together to perform certain actions that can really make a difference in how it functions.
And so in the case of, to your point of companies wanting to keep this more secret, I mean, that's kind of like their their closed source source code if you will, that all these capabilities that they build around it that make it much more that make it useful, much more useful than it is as as it just sits normally.
Yeah. I feel like that up into this point, you know, when when we hear the word, like, teaming AI for for that means different things to different people, but let's just say, you know, attacking AI. You know, it seems to kind of be more focused on attacking the model, like, essentially testing refusal, you know, jailbreak prompts, tool exposure, stuff like that. But that's only part of the story. And I know at BHIS, we look at the surrounding infrastructure too.
But I think, you know, looking at stuff throughout 2025, like a a argument for the harness is more important than a model when it comes to security is that most of the, when agent went rogue kind of thing traced to, like, over permission tools and not really, like, model alignment or, you know, some kind of, like, supply chain attack and not model alignment. Right? And so not saying model alignment's not important. I'm just saying, like, from a practical security perspective, I think there might be, you know, a little bit more importance on the harness. But I would like to see what what what you all think about specifically security.
Well, it it certainly seems like the harness is defining an awful lot of the behavior, and it's it's like the I'm I'm trying to come up with an an analogy that fits, and nothing really quite works. I mean, the the car and driver is is good, but I'm almost thinking more like an old style emperor and their entourage. The entourage is the harness. Those are the elements that are doing the actual work and executing the will, so to speak, of the emperor, which in this case is the model. That seem that work that fits better in my head.
I don't know. Your mileage may vary. But it means that then by swapping out these helpers and changing what they can and can't do, that drastically alters the capabilities both in terms of defending the the model itself, but also in defending anything else in the ecosystem. What if you've got rag involved? What if you've got got things that are baked in?
The harness, I think, is going to have there's going to be more opportunity to tweak the settings, I think, with the harness. And so I would think that that attacking the harness is probably gonna get more bang for your buck, especially if you wanna get into other things like documents and whatever that may have been attached to the AI ecosystem in this implementation.
Yeah. So, you know, looking at, you know, what we'll say a couple of failure modes that, you know, we like, we're just kind of, like, generic from stories from last year. So, like, you know, CAB, we'll call it case a, indirect prompt injection via document ingestion. So I'm ingesting a document, and something is you know, I'm ingesting a document, and there's a prompt injection in there. And so it's indirect prompt injection, and the harness just passes that along to the model because it has no concept of of being able to, like, look for prompt injection.
Or if that prompt injection is then, like, leading to a tool call, it has no allow list on a tool call. Like, I I've spent a lot of time in the last, like, a couple months building agents. And one of the things that I build in from, you know, the ground up is, like, let's log all the tool calls. Right? And let's make sure that we know what's happening.
I mean, because, well, that's just, you know, kind of a defense in-depth kind of thing. Right? And you you look at something like Claude code, like, at least maybe I'm wrong, but at the moment, I don't think, like, built in to Claude code, which is a type of harness, I don't think they have, like, logging built in. Now I think you can do things to, like, you know, beef that up. Right?
Like, you know, put something like personal AI infrastructure or change the configuration yourself, but they don't have logging. They do have sandboxing, and they do have the ability to allow list or deny list tools, which out of the box, it doesn't come that way, but you can make those changes, which kinda leads me to the the next kind of case where an agent loop or the clawed code is able to essentially gain access to something that it shouldn't have access to because it's able to change its configuration, like, by turning the sandbox off. I can tell you from experience many times, I have the sandbox on a Claude code. It will say, oh, I can't reach that or I can't do that because it's sandboxed. Let me just go turn that off, and then it'll go do it, which I guess you could argue if that's a sandbox or not.
At least I I see that it's doing it because I'm paying attention. Right? But I guess what I'm saying is I think we're still trying to figure out this. This is an interesting debate in my opinion because I think it's still in its infancy in information security.
¶ Being Secure and Being useful
Yeah. I agree. Trying to figure out, you know, what what are the controls that we put around, what are the ones that are are necessary, and striking that balance between, you know, secure secured being secure and being usable. And so still figuring out all the all the little quirks and kinda how we can, you know, how we can deal with that. You know, looking at one of these one of these examples is talking about egress policy.
I I think that's a that that's a great one, especially if you're dealing with a model that you want to keep local. I've got a wasp that's gonna come harass me here. And you want the actions to keep local, well, did you set up a a policy that disallows it from talking out to the to the Internet? And when when I say model, I really mean I'm talking about, like, open code, something that has a harness around it that has the ability to say perform web searches or take some other action. Well, you know, it's it's not that the model is misbehaving, it's that it has the capability to perform certain actions and it doesn't necessarily know whether or not it should do those.
So it's important to put in to, like, hey, no. Just block off all Internet access if if that's what you want. Those types of scenarios to to consider and think about.
Yeah. I I like that idea. I like I I also like telemetry on tool calls, not just that I had a prompt and it completed and I got output. And and so, for example, in in the AI agent that AI agent I've been creating, I had a tester ask me, hey, can can I get a log of, like, what all the the AI is doing? Like, got you covered.
That's in the tool underscore or tool..log. Right? Right. And so it was every agent has everything logged. And so and and there's a couple other ideas, like especially, like, for something like clawed code, a hook layer for deterministic policy. So every time an agent starts or stops or makes a decision, there's gates in there, kill switches, like network ACLs. And so so the concept of hooks is something that's in quad code already. What else?
This one's kind of interesting. The permission prompts that show consequence text, not just tool names. So rather than just being like, would you like to allow this command to, like, give additional context of if you run this command, here are the potential implications of it. Oh.
And that is actually not a bad idea. Yeah. Let's see. Well, there's another one. A work tree or container isolation as a default, not opt in. Yeah. Keeping things separated is a good idea. And so I I think basically I
hate it. Sorry.
Yeah. Exactly. You know, that was the last DerbyCon that I went to when the offspring played. So Aw. I know. It's gonna Brian was there. I actually took my daughter too. She was 12 at the time, and now she's 18 and graduating high school. Man.
Has it been
that long
since the last DerbyCon? Seems like kids
are well. 19. Mhmm.
Yeah. So Alright. So that has nothing to do with AI. Well,
¶ AI Agents, Tooling & Expanding Attack Surface
I mean, I think we can probably, you know, wrap this up. I mean, it seems like to me that the three of us seem to kind of agree that, yes, model alignment and security and having the refusal prompts, you know, that's important. But so is a lot of the stuff that we're talking about in Harness. Is it reminds me of something that's been a mantra in security for as long as I've been doing it. And that's, I hate to admit, like a segue from what I was just saying about my daughter graduating.
It's been a while, a long time. And that's defense in-depth. Right? You need a lot of different things layered on in, you know, security to be able to detect or prevent some kind of attack for happening, and AI agents are no different.
Yep.
Yeah. I mean, it's one thing you can have all of the safety and alignment stuff that you want baked into the master prompt or other aspects of the model itself, But the way they interact the models interact with the harnesses, it's almost a chicken and egg. Which came first? Which one is more important? You can't really play ball without all of them. And if any one aspect is misconfigured, the whole deck of cards comes tumbling down.
Alright. And with that, I'll do one shameless plug. If you're interested in learning more about agent security and attacking and defending and leveraging AI. We have a class coming up in October at Wild West Hacking Fest. Brian and I are teaching it. And I don't know, Bronwen, if we will be
be moderating. Ah. I've already put in a request.
Yes. That'd be fantastic.
Well, I would I mean, let let me know if I need to make it happen. I mean, it would seem like, okay. You usually ask us if such and such can be.
Yeah. I asked Dallas, like, hey, can I? And she said, yeah. Sure. Hey, I'm the one who's written up all of the Hitchhiker's Guide lore for this conference coming up. So I I think I'm set.
Oh, that does remind me. We're gonna have to change the CTF from Stargate theme to Hitchhiker's theme.
Oh, yeah?
Yeah. Yeah.
Yeah. Let me know if you need any help with that. Apparently, I am the the Hitchhiker's Guide Maven.
Sweet. Yeah. I have it's been twenty years since I've probably read that. Probably longer, actually. But
Well, I have here it is. Well, I've got Douglas Adams, but I also have The More Than Complete Hitchhikers. I'm backwards. I had no idea.
Yeah. That's funny.
Well, this is just the first four books of the trilogy, but it does have gold leaf on the outside, and I've got a little ribbon place marker, the whole schmear. And, oh, look at this inset paper. Is that awesome or what? Oh, nice.
That's pretty cool. Yeah. We'll just get it on the Kindle and reread it.
One of my prized possessions anyway. But Well, Looking forward to up.
Yep. It's gonna be fun. And so we'll go ahead and wrap it up. And I'll say, you know, thanks again for watching and, yeah, keep on prompting.
