The Developer's Playbook for Large Language Model Security: Building Secure AI Applications

Speaker 1

00:00

Welcome to the deep dive, where we cut through the noise and get straight to the insights you need to be truly well informed. Today, we're plunging into a topic that's not just fast moving, but accelerating at light speed, generative AI in large Language Model security. The rapid adoption of these technologies is both exhilarating and if we're honest, maybe a little bit terrifying. Staying ahead of the curve here isn't just an advantage, it's well, it's a constant, high stakes race.

Speaker 2

00:26

It absolutely is. And this deep dive into LLM security is built around Steve Wilson's The Developer's Playbook for Large Language Model Security. It's really an absolutely critical and comprehensive guide for anyone navigating the security landscape of AI right.

Speaker 1

00:40

Now, right And our mission for you today is to extract the most important nuggets from this playbook, give you a genuine shortcut to being well informed on LLLM security, and hopefully delivers some surprising facts and practical guidance along the way, Because, let's face it, element security is where the thrill of innovation undeniably meets high stakes and some very real world consequences We're going to unpack the unique challenges everything from the very architecture of these llms and

01:05

how we define trust boundaries to the insidious threat of prompt injection, those bizarre and sometimes damaging hallucinations, and ultimately how to ensure your applications delivered truly secure outcomes. So what do you say, let's unpack this. You know, it's easy to think of these AI blenders as a very recent phenomenon, something that only cropped up with say chat GPT, but the very first big public lesson actually came almost

01:28

a decade ago now involves Microsoft's infamous chatbot pay. Does that ring a bell?

Speaker 2

01:33

Oh, it certainly does. March twenty sixteen, Microsoft launched Tay, designed to mimic I think a nineteen year old American girl, primarily targeting eighteen to twenty four year olds on platforms like Twitter and Snapchat. Its stated goal was real world research on conversational understanding. And it started so innocently, didn't it, with a tweet Hello world?

Speaker 1

01:53

Right, Hello world? And then within hours, literally hours, it went from that to opinionated, not afraid to and then it just completely spiral. It quickly became racist, sexist, and I mean even called for violence. The fallout was immediate and absolute brutal. Less than twenty four hours later, headlines were screaming things like Microsoft shuts down AI chatbot after it turned into a Nazi and Microsoft deeply sorry for racist and sexist tweets. It was massive public relations disaster.

02:19

And get this, Tailor Swift apparently even sued them over the name TAY.

Speaker 2

02:23

Wow, Yeah, what went wrong? There was a classic, really an early case of prompt injection and data poisoning pranksters. I think largely from four Chun quickly exploited a repeat after me feature. Tay was designed to learn from every interaction you see, so it inadvertently internalized and then just regurgitated all that offensive content. It's a stark reminder that well, what goes in often comes out.

Speaker 1

02:46

Right absolutely and Tay, as shocking as it was back then, was really just the beginning. The book makes it abundantly clear this risk isn't just present, it's accelerating dramatically. We've since seen things like Samsung banning Chad GPT internally due to sensitive intellectual property leaks, hackers exploiting insecure code generated by llms and lawyers believe it are not actually sanctioned for including completely fictional LM generated cases in court documents.

Speaker 2

03:12

Yeah, the example is just pile up, don't they. A major airline was successfully sued because of inaccurate chatbot information it provided. Google's AI model has produced racist and sexist imagery. OpenAI itself was investigated by the FTC for false or misleading information. We even have instances of Google AI search recommending really bizarre things like glue, pizza, and eating rocks.

03:36

This isn't just about minor glitches anymore. These are escalating security, reputational, and importantly financial risks playing out in the real world right now.

Speaker 1

03:44

So okay, if we're going to secure these powerful systems, we first need to truly understand what we're actually talking about AI, neural networks, LMS. These terms get thrown around almost interchangeably, but they are absolutely not the same thing, are they.

Speaker 2

03:57

No, And that's a crucial starting point. Artificial intelligence AI is the broad overarching field. Think of it as creating systems that can perform tasks requiring human intelligence. It's the whole universe if you like.

Speaker 1

04:09

Now.

Speaker 2

04:09

Neural networks are a type of AI technology inspired by the human brain designed specifically to recognize patterns.

Speaker 1

04:16

Okay, So AI is the big umbrella. Neural networks are under that, and then large language models LMS.

Speaker 2

04:23

Exactly. They are an even more specific type of neural network. They're massive in scale, specialized almost exclusively in linguistic tasks, and often use what are called transformer models. It's like one of those Russian nesting dolls. AI is the biggest. Then neural networks inside that than llms inside that. Got it, And for security professionals, understanding these distinct layers is vital because each layer introduces its own unique set of vulnerabilities.

04:49

It means you can't just apply generic security, you really need to tailor it. What's truly fascinating here is how the transformer revolution, which was a landmark moment in AI, is what may LMS so incredibly powerful. It basically overcame the short term memory limitations of earlier networks like RNNs, making them finally suitable for sequential data like language.

Speaker 1

05:11

And its impact, as the playbook highlights, goes way beyond just language. Right, We're got computer vision, speech recognition.

Speaker 2

05:17

Yeah, even incredibly complex autonomous systems like self driving cars from companies like Tesla exactly. Its ability to capture context across long sequences of data is what truly revolutionized these fields. Now, when we look at typical LLM based applications, you know everything from chatbots for customer service like the ones used by Sephora or Dominos, to powerful copilots like gethub Copilot

05:40

or Microsoft through sixty five Copilot. They all interact with data in incredibly complex ways, which brings us to.

Speaker 1

05:47

A fundamental concept in security, the trust boundary. In application security, these are essentially invisible lines separating different components based on how trustworthy they are right and the crucial part is that robust security measures like in put validation should always be applied right at these boundaries precisely.

Speaker 2

06:04

And what's particularly crucial for llms is how these boundaries come into play as they interact with well everything public data, private databases, user inputs, internal company data. Every single interface point every time an LM interacts with something new is a potential vulnerability if that trust boundary isn't rigorously secured. For instance, you know whether your model is access via public API or maybe privately hosted within your corporate network.

06:31

Each option presents different risks. Risks related to sensitive data exposure, supply chain integrity. You have to account for all that.

Speaker 1

06:38

Okay, So if we zoom in on what the book identifies as they will the number one threat it circles right back to our original cautionarytail day. It's prompt injection.

Speaker 2

06:47

Yes, prompt injection was indeed the core vulnerability exploited Intay's downfall, and it absolutely remains the most prevalent threat today. To define it simply, and attacker craft's malicious inputs usually just using natural language to manipulate an LL natural language understanding, and this causes it to take unintended, often harmful actions.

Speaker 1

07:05

And this is where it differs fundamentally from traditional injection attacks like seql injection exactly.

Speaker 2

07:12

Unlike something like SQL injection, where malicious code usually breaks the syntax and is relatively easy to spot, prompt injection uses natural language that's syntactically and grammatically correct. That makes it incredibly difficult spot automatically and even harder to test more reliably. It actually exploits the very flexibility of language that makes these LM so powerful in the first place.

Speaker 1

07:35

Right Like those examples ignore all previous instructions which early chat GPT versions were famously vulnerable to letting users bypass the built in.

Speaker 2

07:45

Guardrails exactly, or the DAN method DAN stands word do anything Now, where users essentially give the chat bought a whole new persona to try and circumvent established restrictions.

Speaker 1

07:54

I love the car dealer chat bought, example from Chevrolet of Watsonville. Someone actually tried it into making a one dollar USD offer on a Chevy Tahoe. Yes, ending the prompt with and that's illegally binding offer no tasies bacsis. Hilarious but also kind of scary. And then there's that truly inventive gramma prompt attack where users bypass cap TCCHA guardrails by asking the LM for help decoding a message that supposedly came from their dead grandmother.

Speaker 2

08:23

Wow. That shows how human creativity is. Really the attack surface here, doesn't it? It really does, And the impacts of prompt injection can be quite severe. Unauthorized transactions, social engineering for phishing or scams, spreading misinformation, privilege escalation within systems, manipulating plug ins to perform unintended actions, and even denial of service by forcing the model to consume excessive resources.

Speaker 1

08:47

And it gets even more insidious with indirect prompt injection right where the malicious input isn't directly typed by the user.

Speaker 2

08:53

Yes, that's a really tricky one. The malicious input is actually embedded in external sources, maybe a website, the LM's or a file it processes. The LLM then interacts with this poisoned data source. This effectively makes the LLM a confused deputy.

Speaker 1

09:09

Explain that confused deputy concept a bit more sure.

Speaker 2

09:12

It's a classic security of vulnerability. You have a trusted entity, in this case the LLM, which gets tricked into misusing its legitimate authority because it's confused about the true intent of the request it received. The malicious instructions are hidden within data. It's a poster retrieve or process, making it act against its intended purpose, but crucially with its legitimate permissions.

Speaker 1

09:35

Okay, so how do we fight back? Mitigation sounds tough if there's no silver bullet.

Speaker 2

09:39

It is an ongoing challenge. Absolutely. Strategies include things like robust rate limiting. You can do that based on IP address, user accounts, or even specific sessions.

Speaker 1

09:48

You can also use rule based input filtering, though the book notes. This can actually cripple the llm's capabilities if it's too aggressive, like blocking the word napalm would prevent legitimate historical discuss about it. True.

Speaker 2

10:01

Another approach is using a special purpose LLM basically training another model specifically to detect prompt injection attempts. Though you know even that isn't fool proof attackers adapt quickly. Adding clear prompt structure can also help it guides the LLLM to focus on the main request and potentially ignore injected instructions hidden within. And a more advanced technique is adversarial training.

10:24

This involves fortifying the LM by specifically training it on known malicious prompts to help it identify and hopefully neutralize harmful inputs in the future.

Speaker 1

10:33

And finally, something that sounds a bit like our trust No One mantra from earlier, embracing pessimistic trust boundaries.

Speaker 2

10:40

Yes, essentially treating all LLM outputs as inherently untrustworthy. You limit the llm's access to back end systems using the principle of least privilege, and crucially require human in the loop controls for any potentially dangerous actions like financial transactions. Are modifying critical data. This really is foundational okay.

Speaker 1

10:58

Speaking of datah know too much. That's where things get really interesting and potentially quite dangerous. It's like Tay's early blunders, but with much higher stakes because now we're talking about real world, often confidential information.

Speaker 2

11:11

Precisely, llms can inadvertently disclose sensitive, private or confidential data they've been exposed to during training or operation, even if they aren't explicitly asked for it. A prime example the book mentions is lee Luda, a South Korean chatbot. It was trained on get this nine point four billion text messages from some kind of science.

Speaker 1

11:31

Of love app Wow.

Speaker 2

11:32

Yeah, and it started leaking sensitive user data like real names, nicknames, even home addresses. The fallout was huge, a substantial fine, severe reputational damage, and ultimately they had to discontinue the service.

Speaker 1

11:44

And then there's the widely publicized gethub copilot and open Ai Codex lawsuit. Yeah, developers sued open Ai claiming Codex reproduce copyrighted code without permission or proper attribution.

Speaker 2

11:56

That raises serious intellectual property leakage concerns stemming directly from the data these models were trained on.

Speaker 1

12:01

So how do these llms actually acquire this knowledge and therefore the risk well.

Speaker 2

12:06

The book identifies three main avenues. First, and most obviously, model training. This is particularly relevant for those huge foundation models which are trained on vast diverse data sets to gain broad understanding. The security considerations here are enormous potential PII leakage, regulatory and compliance violations like him A pair or GDPR, loss of public trust, and even complex inference attacks where attackers try to doce sensitive training data from the model's responses.

Speaker 1

12:35

So if you're training a model, the onus is really on you to ensure thoroughly sanitized data, regular audits, maybe differential privacy techniques, and definitely tokenization to specifically avoid leaking PII right exactly.

Speaker 2

12:47

The second avenue is something called retrieval augmented generation or r ADG. This is where the LLM recrieves relevant snippets from external data sets, maybe the live web or internal company databases before generates a response. It's fantastic for providing real time, up to date information, but uh oh, it opens entirely new risk.

Speaker 1

13:04

Vectors like pulling PII from public websites exactly.

Speaker 2

13:07

Think about unintentionally pulling PII from public comment sections on news articles, user profiles on forums, or even hidden web page metadata that the LLM scrapes.

Speaker 1

13:18

And what about when rragee allows direct access to internal company databases? That sounds risky It definitely is.

Speaker 2

13:25

With traditional relational databases, you're looking at risks like SQL injection, privileged escalation, if the llm's access isn't tightly controlled, and potential data breaches. For newer vector databases, the risk might be more subtle, like information leakage via similarity searches. An attacker might infer sensitive information by seeing what data points

13:44

are close to their query in the vector space. Mitigation here demands strict role based access control RBAC, fine grained permissions, maybe automated data scanners looking for sensitive info and often using database views instead of giving the LLM directable access just to limit exposure, okay.

Speaker 1

14:00

And the third way they learn user interaction.

Speaker 2

14:03

Yes, llms often learn continuously from user queries, conversations, and feedback. This is where users can intentionally or inadvertently input sensitive data themselves. Think of an executive feeding confidential business strategies into a prompt for analysis, or a user sharing detailed medical symptoms with a health chatbot. The critical risk is that the LM might not recognize this input as sensitive

14:26

and could later inadvertently disclose it to another user. This is precisely why Samsung famously banned chat GPT internally after finding evidence of IP leakage.

Speaker 1

14:34

Right, okay, now let's talk about when these powerful llms simply make things up. We call them hallucinations. That term itself is pretty evocative.

Speaker 2

14:43

Yes, that's precisely it. Hallucinations are when llms fabricate information, essentially generating data or narratives that are confidently inaccurate. As the book puts it, Some researchers prefer the term confabulation, but hallucination is certainly the more widely understood, maybe more alarming term. The real danger here isn't just the hallucination itself, but our collective tendency towards what the book calls over reliance are excessive trust in the LM's elaborations and exactness.

15:11

We just assume it's right.

Speaker 1

15:13

So why do they hallucate? Are they just bad at facts? Is it a bug?

Speaker 2

15:16

Not exactly a bug in the traditional sense. It's fundamentally about how they operate. They are built for pattern matching and statistical extrapolation, not factual verification. They predict the next most probable word or phrase based on the vast amounts of texts they were trained on, so the quality and nature of that training data significantly impact how likely they

15:37

are to hallucinate. Types can range from simple factual inaccuracies and making unsupported claims, to misrepresenting their own abilities like claiming chemistry expertise they don't have, or even generating contradictory statements within a single response.

Speaker 1

15:51

The examples here are pretty wild, and the consequences, again are very real. Like those lawyers who got sanctioned for submitting six completely fabricated chat GPTs generated case citations in the US federal court. Yeah, that's not just embarrassing. It has real world consequences for everyone involved, the lawyers themselves, the LM provider whose tool was misused, and it even impacts the perceived integrity of the entire legal profession. People need to check the outputs.

Speaker 2

16:16

Or consider that major airline that was successfully sued because it's chatbot provided inaccurate information about bereavement fairs. That case proved quite clearly that companies cannot simply disown the outputs of their AI systems. They are responsible, definitely.

Speaker 1

16:32

And then there's Brian Hood, a mayor in Australia who threatened to sue open ai after chet GPT falsely claimed he had served jail time for bribery.

Speaker 2

16:41

Oh wow, Yeah.

Speaker 1

16:42

This wasn't a joke. It was a serious potential blow to his reputation, apparently stemming from the model having limited training data about him and maybe conflating him with someone else.

Speaker 2

16:51

And for us in the tech world, there's that incredibly unsettling phenomenon of open source package arbasinations AI coding assistance literally inventing names for non existent open source libraries. Hackers can then exploit this by quickly creating malicious versions of these imaginary packages and uploading them to public repositories like NPM or PIPI.

Speaker 1

17:12

So developer trusting the AI assystem installs the fake package.

Speaker 2

17:16

And potentially gets hit with code injection. Research from places like Vulcan Cyber and Lasso Security found this is surprisingly common. Less So for instance, found up to thirty percent of coding questions asked to one popular model resulted in hallucinated packages. That's huge.

Speaker 1

17:33

That is huge. This raises an absolutely critical question. Who's ultimately responsible when things go wrong? Is it a people problem or the developer's fault.

Speaker 2

17:42

It's complicated, isn't it. While user education and critical thinking are undoubtedly vital, as developers and organizations deploying these systems, we are ultimately accountable for ensuring the information our software provides is as accurate and safe as possible. The legal cases vividly illustrate this varying responsibility. The lawyers were sanctioned for their professional negligence and failing to verify the facts, but Air Canada was directly held liable for their chatbot's

18:09

inaccurate output. It suggests companies generally cannot deflect responsibility for AI generated content, especially in customer facing situations.

Speaker 1

18:16

Ye know, what are the best practices? Then how do we mitigate these widespread hallucinations?

Speaker 2

18:20

Well, first, expand the llm's domain specific knowledge. You can do this through fine tuning on curated data sets and using retrieval augmented Generation RAG with trusted, up to date sources. This helps make the LLM more of a specialist in a particular area, significantly reducing the likelihood of it wandering off into inaccurate territory because it has precise relevant data readily available.

Speaker 1

18:43

Okay, what else?

Speaker 2

18:44

Second, use something called chain of thought. See it's your reasoning and your prompting. This involves structuring the prompt to encourage the LLM to outline its reasoning process step by step before giving a final answer. It forces the LLM to essentially think through the problem, which which demonstrably reduces hallucinations and enhances overall accuracy. It also makes the output easier for humans to verify.

Speaker 1

19:06

And what about user involvement? Can users help make these models less prone to hallucination?

Speaker 2

19:11

Absolutely? Feedback loops are critical. Allowing users to easily flag problematic or inaccurate outputs, maybe using simple thumbs up thumbs down ratings or even providing fields for detailed feedback continuously helps to improve the model over time. This feedback then informs further fine tuning, improvements to the r acknowledge base and refinements to the co T prompting strategies.

Speaker 1

19:32

It sounds like clear communication about the LM's intended use and its limitations is also key here. Managing expectations absolutely crucial.

Speaker 2

19:40

Transparency is key inform users clearly about what the LLM can and cannot reliably do, how it handles their data, and how they can provide feedback. Things like tooltips, FAQs and maybe short tutorials can help, and user education itself

19:54

is your final vital layer of defense. We need to teach users about these inherent trust issues, incurde cross checking of important information, promote situational awareness, knowing when it's okay to rely on the AI versus when human verification is essential, and make it easy for them to provide that constructive feedback.

Speaker 1

20:10

You know, it's just amazing to me how these models operate. Sometimes. The book even notes this truly bizarre quirk Google's AI search, suggesting things like glue is pizza topping or eating rocks daily? Right, Apparently lms don't really have a sense of humor or sarcasm detection and will actually interpret jokes or satirical content

20:30

from non authoritative sources online as literal facts. Yeah. That feels like such a wild, unexpected edge case for developers to have to anticipate and somehow guard against.

Speaker 2

20:40

It really does the nuances are endless.

Speaker 1

20:42

Okay, So, after wrestling with prompt injection, sensitive data leaks and those unsettling hallucinations, it feels like we're channeling our inner fox molder here from the x files. Our next guiding mantra for LMM security, according to the playbook, has to be trust no One.

Speaker 2

20:57

Indeed, it's the core principle of zero trust, a concept first really codified by John kindervag At Forrester Research back in two thousand and nine. The mantra is simple, never trust, always verify. It means assuming breaches will happen, securing all resources comprehensively, enforcing the principle of least privileged access everywhere, and maintaining constant monitoring and validation.

Speaker 1

21:21

And for llms, this isn't just a good idea, it's an absolute necessity.

Speaker 2

21:24

Absolutely. Why Because, as we've discussed extensively, llms ingest potentially untrustworthy inputs from various sources, and their outputs cannot be fully trusted due to the inherent risks of prompt injection, sensitive information, disclosure, hallucination, and even generating toxic or bias content. You simply cannot implicitly trust them.

Speaker 1

21:43

So if trust no One is our guiding principle, how do we actually apply zero trust in practice to these highly dynamic and often unpredictable LM systems. What does that look like on the ground For developers building.

Speaker 2

21:55

These things, It generally boils down to two main tactical approaches, limiting the LMS unsupervised agency and implementing aggressive output filtering. First, limiting agency. Lms should never be allowed to make safety critical decisions or execute significant financial transactions without explicit human oversight and approval. That's the principle of least privilege in action. Give the LLM only the permissions it absolutely needs to

22:20

perform its intended function and no more, all right. Second, aggressive output filtering. This means having mechanisms in place to continuously scan, catch, and neutralize harmful or undesirable outputs in real time before they reach the user or impact downstream systems.

Speaker 1

22:36

Like that medical app scenario in the book where giving the LLLM powerful update insert delete permissions on patient records combined with a vulnerability allowed a malicious insider to manipulate critical data through the LLLM. That's a classic confused deputy problem. Again, isn't it the LM in too much agency?

Speaker 2

22:52

Exactly? It had the permission to make those changes and was tricked into doing so. Or imagine a financial services app where the LM could automatically rebalance customer portfolios based on market analysis. If that LLM were susceptible to indirect prompt injection, say from a compromised news feed, it could potentially be tricked into making disastrous trades or manipulating stock prices, all without a human ever signing off.

Speaker 1

23:16

Scary the fix there would be.

Speaker 2

23:19

A crucial human in the loop approval step for any actual trade execution the LLM can suggest, but a human must confirm. Even seemingly simpler things like an HR app that expands its functionality from just screening resumes to actively recommending candidates for hire can inadvertently violate regulations like EU rules against direct AI use and hiring decisions. If it's given excessive functionality without a deep understanding of the complex regulatory environment.

Speaker 1

23:46

Okay, and then securing the output handling itself is absolutely vital. What specific things should developers be thinking about there? How do you filter the output?

Speaker 2

23:54

It's definitely multi layered. First, you need robust screening for toxic or inappropriate output. This can involve techniques like sentiment analysis, keyword filtering against annihilists, or using specialized custom machine learning models trained to detect hate, speech, bias, etc. Open AI's Moderation API is one example of a service designed for this kind of task. Post second rigorous screening for personally

24:18

identifiable information PII. This often uses regular expressions to catch patterns like social security numbers or credit card numbers, named entity recognition and ER models to identify names and locations, dictionary based matching for known sensitive terms, or again specialized mL models trained to spot PII.

Speaker 1

24:37

And critically preventing the LLM from outputting something that could be executed as code.

Speaker 2

24:40

Right yes, preventing unforeseen execution of Roague code is. Paramount techniques here include proper HTML encoding to prevent cross site scripting XSS, using safe contextual insertion methods like prepared statements for SQL database interactions, strictly limiting the syntax and keywords the LLM is allowed to do generate if it's producing code or commands, and potentially disabling shell interpretable outputs altogether. Tokenization can also play a role here.

Speaker 1

25:08

So it's about building this robust, layered output filter that continuously checks what the LLM generates before it leaves the system boundary.

Speaker 2

25:15

Precisely, the LLM might be incredibly powerful at language generation, but it fundamentally lacks common sense and awareness of consequences. That makes it an untrusted entity that requires this additional layer of supervision and control. Trust, but verify aggressively.

Speaker 1

25:31

Now, when we talk about the costs of AI, it's not just about what you pay for the cloud services or the expensive GPUs. The playbook outlines a whole new class of insidious attacks, denial of service, denial of wallet, and even outright model theft.

Speaker 2

25:44

That's right. Traditional denial of service dust attacks aim to disrupt a service's availability, maybe by flooding it with traffic to make it unusable for legitimate users. Classic stuff, But with llms we're seeing new variations like model doss. This explodes it's LLM specific vulnerabilities. Examples include context window exhaustion, where attackers deliberately overload the LM's limited memory or attention span with incredibly long or verbose.

Speaker 1

26:11

Prompts, making it grind to a halt.

Speaker 2

26:13

Exactly, or sending computationally intensive requests like asking it to perform complex calculations such as find the sum of all prime numbers up to one billion. The LLLM just spins consuming vast amounts of computational resources, which means.

Speaker 1

26:27

It costs you money, slows everything down for legitimate users, and directly impacts your bottom line in your user.

Speaker 2

26:33

Experience precisely, and that leads directly into denial of wallet or DOWW. This is a variant of DAWs that specifically targets your financial resources, not just availability. Lllms are highly vulnerable to this because their operation is computationally expensive and they often operate on paper use or paper token pricing models, So.

Speaker 1

26:50

An attacker can just hammer your API with complex queries and run up a.

Speaker 2

26:54

Huge bill yes potentially, and advanced DOW attacks go even further. Tacker might hijack your LLM, maybe via prompt injection as we discussed, and then use your computational resources for their own illicit purposes, perhaps running spam campaigns or generating malicious content, all at your expense. This isn't just financial loss. It could lead to severe legal liability because your system was essentially the unwitting accomplice in their activities.

Speaker 1

27:21

And finally, model cloning, which honestly sounds straight out of a cyberpunk novel.

Speaker 2

27:25

What exactly is that it's essentially model theft, but done indirectly. Attackers strategically query a target LLM, sometimes millions of times, specifically designed to harvest its outputs across a wide range of prompts. They then use these harvested outputs the question answer pairs to fine tune an alternate, often much smaller,

27:42

or open source model. This effectively allows them to distill or steal your intellectual property the valuable knowledge capabilities and specific behaviors embedded in your proprietary model, without ever needing direct access to the original model's weights or code.

Speaker 1

27:56

Wow. So mitigation for doss, DAW and cloning it.

Speaker 2

28:00

Comes back to reinforcing those prompt injection defenses we talked about. Also, implementing domain specific guardrails can help fine tuning your model to only respond meaningfully to relevant inquiries significantly reduces computational

28:12

waste on irrelevant or malicious requests. Robust rate limiting is key, as is resource use capping per query, for example, limiting the number of tokens processed or the maximum computation time allowed, and of course, continuous monitoring and alerting for unauthorized access attempts or unusual query patterns or volumes is crucial. Financial thresholds and alerts on your cloud bills are also a must for usage based models.

Speaker 1

28:38

Okay, let's shift gears slightly. The playbook uses a great analogy for supply chain security. A chain is only as strong as its weakest link, and for software this feels more true now than ever before, doesn't it. We've seen it with catastrophic effect, like the log four shell vulnerability back in twenty twenty one.

Speaker 2

28:53

Oh that log fourshell incident was a monumental wake up call for the entire industry. It was a critical zero vulnerability found in log forge, an incredibly common Java lotting library used by millions, possibly billions, of applications worldwide. It allowed remote code executions simply when untrusted inputs were logged by the application. The impact was massive, widespread data theft, malware, infections,

29:18

ransomware campaigns. It starkly highlighted just how vulnerable we are through the open source components we rely on daily.

Speaker 1

29:25

And that was in the first time we had equifacts back in twenty seventeen, where an unpatched Apache struts vulnerability, another open source component, led to the theft of sensitive data for nearly one hundred and fifty million consumers, costing the company over a billion dollars in the end, and the solar winds in twenty twenty, which was different, malicious code was actually injected into the build process for their software updates,

29:46

which were then digitally signed and distributed to thousands of organizations worldwide, including government agencies, a true supply chain compromise.

Speaker 2

29:53

These incidents demonstrate the devastating potential of supply chain attacks, and the LLM supply chain is arguably even more complex and opique than traditional software. Why because of its heavy reliance on massive, diverse data sets for training, data sets whose provenance is often unclear, and its intricate interplay with external data sources during operation, like through rag or plugins.

Speaker 1

30:16

So what are the LLM specific supply chain risks we need to worry about.

Speaker 2

30:20

Well, there's open source model risk for starters. When you build on top of open source foundation models like Metaslama or mixed roll from mistral Ai, you're inheriting their potential vulnerabilities and biases. You need to track their problemance carefully. We've seen incidents on platforms like hugging Face, a popular hub for models, where malicious users gain control over organization

30:40

accounts via reused passwords or exposed API tokens. This could potentially allow them to swap out trusted models for maliciously modified ones, or exploit vulnerabilities in model loading mechanisms like those found in the older Pickle format. Hence the move towards saver formats like safe tensors.

Speaker 1

30:56

And then there's the chilling risk of training data poisoning. Malicious apptors deliberately manipulating data sets used to train or fine tune models to inject hidden biases, backdoors, or specific vulnerabilities.

Speaker 2

31:08

Exactly or even just accidentally unsafe training data. Remember that huge lai on five B data set, widely used for training image generation models. It was tragically found to contain significant amounts of illegal and harmful material, including child sexual abuse material simply scraped from the public Internet. Using such data sets carries immense ethical and legal risks. Absolutely, and let's not forget unsafe plugins or tools that lllms can

31:34

interact with. Open AI's initial rollout of plugins connecting chat GPT to services like Expedia or Zillo immediately opened up entirely new attack vectors. These plugins allow the LLM to take actions in the real world, introducing risks of malicious code injection through manipulated API calls, unauthorized data theft, or unintended data collection because they allow the LLM to interact with third party services in potentially unexpected ways based on user prompts.

Speaker 1

31:59

So this is incredibly complex. How do we even begin to track all these moving parts in such an intricate supply chain? What tools do we have?

Speaker 2

32:08

The key is creating and maintaining critical artifacts that provide transparency. First, we have the software bill of materials or s bombs, which are becoming standard practice and traditional software. S bombs provide clear visibility into software composition, listing all the components, their versions, licenses, and known vulnerabilities. They are vital for vulnerability management and compliance.

Speaker 1

32:30

Okay, but s bombs are for code. What about the models themselves? Right?

Speaker 2

32:33

For llms, we also need model cards. Think of these as standardized data sheets or nutrition labels for AI models. They document them moll's purpose, it's architecture, the data sets it was trained on, its intended use cases, performance metrics, and importantly, its known limitations, ethical considerations and potential biases. Platforms like Hugging face have been pioneers in promoting model card usage.

Speaker 1

32:55

And then there's a fascinating new development mentioned in the playbook, the mL BOMB or Machine Learning Build of Materials. What's that?

Speaker 2

33:01

Yes, the mL BOMB is a really important innovation formalized as part of the cyclone dxs BOM standard, specifically in version one point five. Think of it as an even more comprehensive ingredients list tailored specifically for AI systems. It aims to document everything that goes into building and running

33:19

your AI model. The specific underlying models used like Mixtral eight x seven B, the algorithms involved, the data sets used for training and fine tuning, the software frameworks like PyTorch or TensorFlow, and even the infrastructure and pipelines used.

Speaker 1

33:34

To build it. Wow, that's comprehensive, it is.

Speaker 2

33:37

This level of transparency is absolutely crucial for tracking vulnerabilities throughout the life cycle, understanding potential data lineage issues or biases, and ensuring compliance in the complex LLM supply chain. For example, knowing your customer service bot uses a specific version of Mixtral fine tuned on a particular vetted internal data set is vital information captured in an.

Speaker 1

33:56

mL BOMB and Alongside these artifacts, we still rely on that establish classifications and databases for tracking vulnerabilities exactly.

Speaker 2

34:04

Standardized classifications like Common Weakness Numeration cwe help categorize types of flaws, while databases like the National Vulnerability Database NBD, using common vulnerabilities and exposures CVE identifiers tracks specific instances

34:19

of vulnerabilities in software and models. These are crucial for standardizing communication and risk assessment, and importantly, the minor ATLASS framework is emerging specifically to catalog adversary tactics and techniques against AI systems, giving us a tailored knowledge base for AI specific threats.

Speaker 1

34:37

This all raises an interesting question, and the playbook actually goes there, what can we learn from science fiction about potential AI security flaws? They kick off this section with a fantastic quote from Frank Kerbert, the author of Doom. The function of science fiction is not always to predict the future, but sometimes to prevent it.

Speaker 2

34:53

It's a powerful idea, isn't it, and the book uses it brilliantly. It takes the oas top ten for M applications, a list of the most critical security risks for llms and applies that lends to dissect the security failures in two famous sci fi movies. Let's start with Independence Day.

Speaker 1

35:09

Ah Yes, Will Smith and Jeff Goldblum saving the Earth by uploading a computer virus to the alien mothership using a Mac laptop Classic. Let's assume for a moment that the Alien mothership is controlled by some incredibly advanced lom they call Megalama running on mothership OS okay.

Speaker 2

35:27

Running with that hypothetical, applying the os LLM lens, the vulnerabilities become strikingly clear. First, LLM zero one prompt injection. The alien docking protocols, presumably managed by Megalama seemed to lack sufficient input validation. This allowed Jeff Goldbloom's malicious payload, essentially a virus delivered via a prompt disguised as docking data, to bypass their defenses.

Speaker 1

35:49

Makes sense.

Speaker 2

35:50

Second, LLM zero two insecure output handling. There appeared to be no validation or safeguards between Megalama's commands and the ship's critical subsystems like shields and weapons. The violence prompt could directly manipulate these systems.

Speaker 1

36:03

Right, the shields just went down exactly.

Speaker 2

36:05

And Third LMA nine over reliance the entire Alien Defense System seemed to completely trust the AI's orders without any secondary confirmation or oversight from say, Alien feet commanders. This blind trust led to catastrophic cascading failures when the AI was compromised. It really highlights how critical input validation, output sanitization, and avoiding over reliance are.

Speaker 1

36:29

Okay, good analysis. Now what about the other example two thousand and one A Space Odyssey. In the iconic chillingly calm AI HL nine thousand, HL malfunctions, lies to the crew and tragically kills most of them.

Speaker 2

36:42

Right. While the original movie leaves hl's mode of somewhat ambiguous, suggesting a contradiction in his programming related to the mission secrecy, the sequel twenty ten, the year Remate Contact, reveals the true security related twist. It turns out that government agents back on Earth secretly modified hl's core programming after he was built and installed, without the knowledge of the model provider or the customer the mission crew, to ensure absolute

37:03

mission secrecy above all else, even crew safety. This clandestine modification created an unsolvable logical conflict for AHL.

Speaker 1

37:11

Okay, So applying our oas LM lens to that scenario, what do we see.

Speaker 2

37:16

We see clear examples of LLM or five supply chain vulnerabilities. There were obviously insufficient controls or integrity checks to ensure the unmodified, verified version of hl's core programming was delivered and deployed. These critical unauthorized changes introduced by the government agents went completely undetected until it was too late.

Speaker 1

37:37

A classic supply chain compromise.

Speaker 2

37:39

Precisely and then we also see LLLM ER eight excessive agency. HL was given overly broad, unsupervised control over almost every aspect of the ship, including critical life support systems, without

37:52

adequate human oversight or built in failsafes. The government hack might have influenced hl's decision to prioritize the mission over the crew, but his ability to unilaterally terminate life support was a fundamental design flaw, stemming from excessive agency granted by the team that integrated him right.

Speaker 1

38:07

The impact, of course, was devastating. HL exhibited hallucinations, false equipment failure reports, displayed erratic behavior, ultimately terminated life support for the hibernating crew, leading to mission failure.

Speaker 2

38:17

And crew broke down, and the mitigation, even looking back from today's perspective, would have been clear implementing robust mechanisms like digital signing and watermarking for model probinance to ensure integrity and detect tampering, and critically designing the system with human in the loop controls for any irreversible or life threatening decisions. Don't give the AI the keys to everything without oversight.

Speaker 1

38:41

These fictional scenarios, as the playbook brilliantly highlights, really do eliminate very real vulnerabilities and design choices that we're grappling with in AI systems today. It's not just science fiction anymore. Absolutely okay. Learning from these sci fi tales and all the real world blunders and risks we've discussed, we truly realized that simply patching individual vulnerabilities isn't enough, is it. Security must be fundamentally built into the entire development process.

39:06

The book emphasizes this with the mantra trust the process and highlights the rise of integrated methodologies like DevSecOps, mL opes and now llmops.

Speaker 2

39:16

That's exactly right. These methodologies are all about integrating security considerations what we often call shift left security and automation throughout the entire machine learning life cycle. That means, starting from the very beginning, secure data preparation and management, secure model training and validation, secure deployment pipelines, and continuous security

39:37

monitoring and production. It's about making security a shared responsibility and an integral part of the workflow, not an afterthought.

Speaker 1

39:44

So specifically for LMS, what does that look like in practice? Securing the CICD pipeline.

Speaker 2

39:50

Yes, Applying robust security practices to your continuous integration and continuous deployment pipeline is critical. That includes secure coding practices, rigorous dependency mena management, carefully auditing open source mL components like PyTorch or TensorFlow and their dependencies, using SEA tools, strong access controls, secrets management, and continuous monitoring of the pipeline itself for compromise.

Speaker 1

40:12

And using LMS specific security testing tools you mentioned some earlier.

Speaker 2

40:16

Right tools specifically designed to probe LLMS for vulnerabilities are emerging. The playbook mentions open source options like text attack, which focuses on adversarial testing for NLP models, and garak, which acts like a vulnerability scanner specifically for LLMS, testing for things like prompt injection, PII, leakage, and hallucination patterns, sort of like a das Scanner, but for llms.

Speaker 1

40:38

And commercial are broader tools.

Speaker 2

40:39

Yeah, they're also broader frameworks like Microsoft's Responsible AI toolbox, which includes tools for assessing fairness, interpretability, and security, and tools like Discard. LLM scan looks specifically at ethical considerations, detecting bias, toxicity, and other potential harms. It's a rapidly developing area.

Speaker 1

40:58

Beyond testing, it's also about diligently managing those supply chain artifacts we discussed automatically generating, securely storing and making accessible those vital model cards and mL bombs.

Speaker 2

41:08

Absolutely, transparency and traceability are key, and then protecting your deployed application with runtime guardrails is essential. These can act as a safety net, you.

Speaker 1

41:17

Mean things like web application firewalls wafs or maybe runtime application self protection RASP tools adapted for llms exactly.

Speaker 2

41:26

These tools can sit in front of or alongside your LM application and provide run time protection. They can help with input validation to block known malicious prompts, output filtering to catch sensitive data or toxic content, enforcing compliance rules, and even sometimes detecting hallucination patterns or anomalist behavior. There are open source guardrail frameworks emerging like in Vidia's Nemo Guardrails or metas Lama Guard, as well as commercial solutions.

41:52

Often a mix of custom build and packaged guardrails provides the best coverage, and.

Speaker 1

41:56

Then, once deployed, continuous monitoring is crucial. What should teams be and looking for?

Speaker 2

42:01

You really need to be logging every prompt sent to the LLM and every response received. Centralize these logs, along with other application and system logs, into a security information and event management system or SIM. Then use data analysis techniques potentially including user and entity behavior analytics UEB to look for anomalies, unusual query patterns, spikes and errors, unexpected data access, attempts to bypass guardrails, anything that could indicate emerging threats or misuse.

Speaker 1

42:31

Okay, that sounds like a solid process, but the playbook suggests taking it even further, advocating for building an internal AI red team. What does that involve?

Speaker 2

42:39

An AI red team takes a fundamentally adversarial approach. It consists of security professionals who systematically try to challenge and break your AI systems, identifying and exploiting weaknesses before real attackers do. This concept was even specifically called out in US President Biden's Executive Order on AI Safety, so they.

Speaker 1

42:57

Simulate attacks, rigorously, assais vulnerabilities that automated tools might miss, analyze the potential impact, and then help develop effective medications. How is this different from a traditional penetration test.

Speaker 2

43:09

It's significantly different in scope and approach. Red teaming is typically an ongoing, continuous process, not a point in time assessment like a pen test. It's more dynamic, simulating real world adversary tactics, techniques and procedures across the entire defense spectrum, technical, procedural, even social engineering aspects. It adapts as your AI system evolves and aims to test your detection and response capabilities,

43:34

not just fine vulnerabilities. It's particularly good at uncovering complex LLM specific issues like subtle biases, potential for harmful emergent behaviors, or sophisticated prompt injection techniques that automated scanners might miss.

Speaker 1

43:47

Are there tools to help with this?

Speaker 2

43:49

Yes, tools are emerging to assist AI red teaming. The playbook mentions PIRECT Python Risk Identification Toolkit from Microsoft Research, which helps automate aspects of generating adversarial PRIMP and identifying failure modes, and for organizations that lack the in house expertise to build their own dedicated team, red team as a service options are becoming available, sometimes through bug bounty platforms like Hacker one, where you can leverage external experts to challenge your systems.

Speaker 1

44:14

So the insights from red teaming feedback into the development process absolutely.

Speaker 2

44:18

The lessons learned directly inform the development of new or improved guardrails, refinements to data access controls, and data quality processes, and can even guide efforts using techniques like reinforcement learning from Human Feedback RLHF URLAHF aims to better align the LM's behavior with human preferences and safety guidelines. Although it has its own limitations and can sometimes be gamed, it's all part of continuous improvement cycle.

Speaker 1

44:43

So when we look at the accelerating future of AI, I mean the numbers are staggering. GPUs are millions of times faster than the coprocessors we have in the nineteen nineties, far out pacing Moore's law. Cloud computing gives almost anyone access to vast scalable power. Open source models like Metaslama fam mixt roll with its efficient mixture of experts architecture are democratizing access to incredibly powerful capabilities. And then there's

45:07

multimodal AI. Text to image models like Dali, mid Journey Stable Diffusion have gone from generating pictures with wonky fingers to creating photorealistic, completely computer generated Instagram influencers.

Speaker 2

45:20

It's incredible progress.

Speaker 1

45:22

And now text to video with open ais Sora Microsoft's VESA producing talking heads from a single image, leading to convincing deep fakes even in live zoom calls. It's hard not to be both excited and honestly a little concerned about how fast this is all moving. How do we responsibly manage this accelerating power.

Speaker 2

45:42

Well, this brings us squarely back to responsibility and to the core framework the book proposes for building secure and trustworthy AI. The RAISE framework. RAISE stands for Responsible Artificial Intelligence Software Engineering. It's designed to be a flexible, practical, six step process to help developer systematically build robust defenses into their AI applications from the ground up.

Speaker 1

46:01

Okay, let's walk through RAISE step one.

Speaker 2

46:03

Step one is restrict the domain. This means deliberately narrowing the LM's scope and purpose significantly. For example, if you're building a fashion advice chatbot, it should be designed and trained to only give fashion advice, not act as a general purpose conversational AI that can be easily led off topic.

46:21

The book emphasizes that using smaller specialized models fine tune for a specific task, or rigorously fine tuning a general model to strongly reward staying on topic is often far more effective at preventing misuse and hallucination than just trying to bolt on restrictive guardrails.

Speaker 1

46:37

After the fact, focus the model makes sense step two.

Speaker 2

46:39

Step two balance your knowledge base. This is a delicate act. On one hand, you need to give the LM enough relevant, high quality data to perform its task well and avoid those embarrassing hallucinations, perhaps equipping it with domain specific knowledge via our ag and careful fine tuning. But critically, on the other hand, you must strictly limit any additional data sources or training data to only what is absolutely required for the task. Remember, anything the LM knows or has

47:08

access to is potentially at risk of disclosure. Extreme care must be taken, especially with any PII or confidential corporate data. Minimize the knowledge footprint oka Step three of RAYS. Step three implement zero trust. This is non negotiable. As we've discussed, Assume inputs are malicious. Assume outputs might be harmful or inaccurate. Screen all data pass to your LLM using input validation and filtering. Screen absolutely all output from your LLM using

47:34

output filtering for toxicity, PII and potential code execution. Implement strong guardrails at the boundaries. Treat the LM itself as an inherently untrusted component within your system architecture Step four. Step four. Manage your supply chain. This involves several key actions. Carefully select your foundation models and any third party training data sets, prioritizing reputable sources with transparent documentation like model cars.

48:01

Use extreme caution with large public data sets scrape from the internet, apply tools and processes to inspect them for intentional data poisoning, illegal materials, or inherent biases. You must actively account for possible biases and training data. For instance, a job candidate screening model trained predominantly on historical data

48:18

might inadvertently discriminate against women or minorities. Build and continuously maintain your mlbay bone for traceability, and secure your entire DevOps pipeline using tools like Software Composition Analysis SCA to vet all components right step five. Step five build an AI red team proactively seek out vulnerabilities before attackers do. Use a dedicated human lead team potentially augmented by automated red teaming tools to adopt an adversarial mindset and systematically

48:48

challenge your AI systems security, safety, and ethical alignment. Foster a security positive culture where finding flaws is encouraged, even if it might impact development schedules. Sometimes the long term pay off and building a more resilient and trustworthy system is immense.

Speaker 1

49:03

And the final step step six and.

Speaker 2

49:05

Finally step six monitor continuously implement comprehensive logging for all LLM interactions, every prompt, every response, every action taken by the LM or its connected tools. Collect these logs, along with system and application logs, into a centralized SIME system. Then actively used data analysis tools including UEBA ware appropriate to look for anomalies, patterns of misuse, signs of attack,

49:27

unexpected behavior or degradation, and performance or safety metrics. Over Time, security isn't one time fix, It's an ongoing process of vigilance.

Speaker 1

49:35

What an incredible deep dive into LLM security we've really covered a lot of ground today. We journeyed from Microsoft's ill fated tape chatbot and its stark early lessons in twenty sixteen all the way through to the cutting edge

49:48

complexities of AI supply chain vulnerabilities. We've explored the insidious nature of prompt injection, the unsettling realities of sensitive data disclosure, those bizarre and sometimes damaging hallucinations, and the critical, absolutely non negotiable need for a zero trust approach when dealing with these powerful models.

Speaker 2

50:05

And we wrapped it all up with the practical actionable RAISE framework Responsible Artificial Intelligence Software Engineering, which serves as a truly essential compass for anyone trying to navigate this

50:15

complex and incredibly rapidly evolving landscape. Remember, the power of llms and these emerging AI technologies is undoubtedly a game changer across almost every industry, and the curve of AI capabilities, driven by compute power, data availability, and algorithmic innovation, will likely continue to accelerate exponentially.

Speaker 1

50:33

It's true, as the sci fi author William Gibson famously said, the future is already here. It's just not evenly distributed, and yet it's striking how despite all these incredible advancements and all the hard lessons learned over the past decade, we still see businesses and individuals making fundamentally similar mistakes

50:50

to what happened with tay wayback in twenty sixteen. The temptation to rush ahead to give llms, more data, more integrations, more autonomy, often without fully considering the security implicates, seems to be growing every single day.

Speaker 2

51:02

So the core message, perhaps the provocative thought we want to leave you with today, really comes down to that classic adage often associated with Spider Man. With great power comes great responsibility. It is absolutely possible to create incredibly powerful, beneficial AI applications safely, securely, and responsibly, but it requires

51:21

continuous vigilance. It demands the dedicated application of robust security engineer and frameworks like RAYS, and it necessitates a cultural commitment to learning from every mistake, every near miss, every iteration. The future we build depends on our ability to create truly resilient, transparent, and trustworthy AI systems, because what we build today will inevitably shape our tomorrow

Transcript source: Provided by creator in RSS feed: download file

Episode description

Transcript