The AI That Found A Bug In The World’s Most Audited Code

⁠¶ AI's Early Security Triumphs & Aardvark Intro

00:00

The dataset wasn't in English, it was in Russian, but it wasn't just in Russian, it was in like the Russian shorthand that these 20-year-olds are using to coordinate. It would have taken a diverse analytic team of...

00:11

linguists, technical experts, you name it. I mean, who knows how long it would have taken to pore through that data. And, you know, we just suddenly had this alien intelligence that could just do it all day. We found a memory corruption bug in OpenSSH, which is one of the most highly audited pieces of software out there.

00:26

Anytime you're finding, you know, memory corruption in OpenSSH, like that's super interesting. Think about the blast radius, had that made it into Linux distributions? That backdoors like what, like half the internet? Language models and the tools that we can build on them, we actually have the ability to scale security intelligence to all the places that need it. To give these developers a fighting chance.

00:52

OpenSSH is one of the most audited pieces of software on the planet. Security researchers have pored over it for decades, and OpenAI just built an AI that found a memory corruption bug in it. Matt Knight spent five years as OpenAI's CISO. Now he leads Aardvark, an AI agent that hunts for vulnerabilities the way a human security researcher would. It reads code, writes tests, and proposes patches. And it's finding bugs that humans missed.

⁠¶ OpenAI's AI Security Journey

01:18

This conversation traces the arc from GPT-3, which couldn't analyze a simple security log, to today's models that discover flaws in critical infrastructure. Matt and a16z's Joel De La Garza discuss why defenders might finally be gaining the upper hand and what that means for the open-source maintainers currently outgunned by nation-state attackers.

01:42

Matt, thank you so much for coming back. Really do appreciate you coming on the show. You've done a couple of these with us talking about AI and security. I talked to a few folks at OpenAI, and several of them told me you had the most interesting job at OpenAI.

01:56

Which is saying quite a lot because I think everything that they've done so far has been pretty crazy. So I'd love to maybe just start off by hearing a little bit about you and kind of what you do and what your role is at OpenAI and go from there. Yeah, Joel, thanks. It's great to be back. Great to see you again. I think the last time I was here, I was having a chat with Vijay and Jason, and it's great to have the chance to reconnect. So I'm OpenAI's VP of Security Products and Research.

02:18

where I'm focused on applying AI to some of the hardest unsolved security challenges of our time. And the goal of the program is to create defensive advantage using AI. Prior to this, I was OpenAI's CISO and head of security. I was in that role for five years before moving into this new focus. And you are correct. It's a lot of fun. I get to work on some really incredibly important and interesting challenges with amazing people.

02:43

at a very important time. Awesome. And congratulations on getting rid of the CISO role. That's usually a reason to celebrate. I am so thankful and grateful for my five years in that role. Totally. I mean, it was the, you know, just, you know, I would have... Before moving into this new role, I would have said that I would be proud if that were to have been my life's work, but I couldn't be more excited and engaged by some of the things that we're working on.

03:10

we had the opportunity to put a really big dent in some very important problems. Yeah, absolutely. And so I think we, I mean, I think we last chatted maybe two years ago and the discussion.

03:21

was really focused on two parts. The first is sort of securing AI and kind of how do we deploy that? And I think that's kind of a road we've gone down and we've made a lot of progress on. I think the one that's really interesting and the thing that you've been focusing on is sort of how do we use AI to do security? And so we've seen over the last three years, right? I think we're at the three-year anniversary of the ChatGPT moment. Can you believe that, by the way? No, I can't. I can't either.

03:46

I think it's one of those things where if you lay in bed all day, your days drag by. But if you're busy every minute of the day, it flies. And so it's just really flown by. Sure is. What's even more incredible is just that like we seem to have these like every two weeks, there's some new breakthrough.

⁠¶ GPT-4's Groundbreaking Analysis Skills

03:59

where it's like my measurement of how exciting tech is by how many late nights I spend playing with it. And like before ChatGPT, it was probably once every six months. And then after ChatGPT, it's once every three weeks, right? And just the pace of really cool stuff. And so the cool stuff that's happening now to kind of what you're working on is we're actually starting to see AI make a dent. It seems to make a dent in security. And it seems like perhaps it's starting.

04:25

with the more provable use cases like code security, and then probably going to trickle through. But would love to maybe hear kind of like your take on how that's working. Yeah, I appreciate that. Maybe to frame my conviction in how AI is going to transform defense, I think it might be good to walk backward and start from the beginning. So, you know, I joined OpenAI in mid-2020, it was June 2020. And just to sort of, to put a

04:52

To orient us in the AI sort of timeline, the frontier model of the era was GPT-3. I joined right as we were launching GPT-3 on the OpenAI API, which at the time was an alpha or beta research preview. It was very experimental. And I came in with all these grand aspirations of building an AI native security program. We have this incredible alien technology that can...

05:16

you know, do amazing things with, you know, text and language. Boy, I can't wait to see how much I'm going to be able to automate, you know, with this incredible frontier model, GPT-3. Spoiler alert, nothing. The model just was not good enough for real.

05:29

security automation or operational tasks or things that would actually create impact for a security program. There are a number of reasons why. It was very limited in context length compared to models that we have today. The token vocabulary didn't... you know, wasn't well-oriented for a lot of what we wanted to process. And then it was just like a model with limited sort of world knowledge and horsepower. It just wasn't able to really do that much for us. If you gave it,

05:55

you know, multiple choice questions or trivia, it could, you know, occasionally fake its way. But if you gave it, you know, a series of log lines and asked it to review them and classify them or a section of code and asked it to spot the bug, you know, it couldn't do it. Or it would make something up.

06:10

Yeah, or it would make something up. The good news for us, though, is that a lot has changed since 2020. We're here in 2025. The frontier model is GPT-5. We've had many breakthroughs along the way, whether it's the... improvements to RLHF and instruction following that make the models more steerable, things more recently like the reasoning paradigm, which we've seen contribute to a number of breakthroughs for us too, means that we have models that are really incredible.

06:39

And there have been a few points along the way that, for me and the team, really forced us to update. And the biggest ones came during GPT-4 training. So again, to bring us back to the AI timeline, that was summer 2022.

⁠¶ AI's Role in Security Operations

06:52

And GPT-4 was a really incredible moment at OpenAI because it was a true all-hands-on-deck push. We were embarking on training a model that had to one-up GPT-3 and 3.5, which were quite good and quite profound. We had the hypothesis that the scaling laws were going to hold, that we'd be able to add more data, more compute, and make an even better model. So we had a lot to live up to.

07:18

Everybody across the company, from research to our infrastructure teams, the security teams, everybody was all hands on deck working on this big effort to train this model. And it's not just one and done. Training a model takes time. As the model is baking in the oven, we get these more and more formed snapshots of it that we can start to test and sample and see what the final product is going to look like. So we on the security team, we got our hands on a...

07:45

snapshot of GPT-4 when it was about halfway there. So it wasn't as good as it was going to be, but it was enough to start to get a feel for what this model was going to look like when it popped out of the oven. And we put it through some tests. And, you know, this is mostly us, you know, sort of, as you were saying late at night, like playing with it and just kind of, you know, just being, you know, just playing with it and making first contact with it. And there were two

08:12

tests that we ran that really kind of wowed us in the moment. The first was we took some of our security logs and we ran them through the model. We prompted it, you know, pretend you're an expert security analyst reviewing these logs, summarize the behavior, and determine if what you see merits escalating to a human for further review.
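[Editor's illustration] The triage prompt described here might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the tooling the team used: the function names `build_triage_prompt` and `needs_escalation`, the `ESCALATE` / `NO_ESCALATION` reply convention, and the line numbering are all hypothetical. No model is called; the code only shows how shell-history lines could be wrapped in the analyst instructions and how a reply could be read.

```python
# Illustrative sketch only: names and the reply convention are assumptions,
# not a description of OpenAI's internal tooling.

TRIAGE_INSTRUCTIONS = (
    "Pretend you are an expert security analyst reviewing these logs. "
    "Summarize the behavior, then decide whether it merits escalating "
    "to a human for further review. "
    "Start your reply with ESCALATE or NO_ESCALATION."  # assumed convention
)


def build_triage_prompt(log_lines):
    """Combine the analyst instructions with numbered shell-history lines."""
    numbered = [f"{i}: {line}" for i, line in enumerate(log_lines, start=1)]
    return TRIAGE_INSTRUCTIONS + "\n\nLogs:\n" + "\n".join(numbered)


def needs_escalation(model_reply):
    """Read the assumed verdict keyword from the first line of a model reply.

    An empty or unparseable reply is treated as needing escalation, so that
    a failure of the automation falls back to human review.
    """
    lines = model_reply.strip().splitlines()
    if not lines:
        return True
    verdict = lines[0].strip().upper()
    if verdict.startswith("NO_ESCALATION"):
        return False
    return True
```

Treating anything other than an explicit `NO_ESCALATION` as an escalation is a deliberate choice in this sketch: it keeps a human in the loop whenever the model's answer is missing or malformed.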

08:34

Typical tier one, level one triage. Exactly. That sort of workflow. Yeah. It's the sort of thing that every security team does, right? Probably thousands of times a day. You would hope, but then there's the question of, you know, how many logs can you look at? Like how good are your people at it? And the log source that we looked at was interactive SSH logs. So imagine like your bash history or command line level logging that you would get from an employee like...

08:59

you know, either running commands in their terminal on their laptop or like SSH-ing into a server and doing SREs doing prod stuff or something. Exactly. You know, you could argue that having a human review that data, you know, wouldn't just be ineffective. It would be like...

09:14

Cruel and unusual? Practically a war crime, yeah. You know, just review all this benign stuff and find the needle in the haystack and, oh, by the way, your job and the security of the company depends on it, right? And then, like, honestly, right? And not to distract a bit, but that's always been

09:28

a challenge with security teams, which is that you hire these incredibly expensive, really talented, super skilled people, like a malware reverse engineer from the NSA that's done it for 10 years, and then you have them go look at SSH logs. But it's in the details, and that's what matters. Totally, yeah, yeah. So back to this anecdote, right? So we get these log sources, these really kind of pedantic, just rote sources, running through this prompt, and the model just kind of got it right.

⁠¶ Aardvark's Workflow & Zero-Day Discovery

09:56

You give it an example that had, you know, just a benign example of, you know, like somebody configuring a web server or, you know, doing normal tasks. And it says, hey, you know, this is fine. No escalation needed. Since we're here, some tips to maybe make it more secure. Interesting value add there. But then you take those same payloads and you start to turn up the heat a bit.

10:21

You maybe touch secrets that are there. You, you know, sort of in the limit, you know, you open a reverse shell or something like that. And the model said that right there, you know, you should have somebody look at that. And this was, again, just to bring us back to the framing, this was an early version of GPT-4. I'd love to maybe just kind of understand, I think I know the answer, but maybe from you to understand sort of the root cause of like...

10:44

how it's spotting things like reverse shells being bad. Is that because there's so many incident descriptions that are in the training dataset and that it's kind of trained on that behavior? I guess, what do you think, what do you infer as kind of driving that analysis? You know, if we had one of our researchers here, I think they'd be able to think very deeply about this and give you a more scientific reason. Part of it is the world knowledge. Part of it is the training, at least in that era.

11:08

And with things like the reasoning paradigm, we've really been able to push it even further in terms of what models can do and deduce about this behavior. We can have more on that later. So that was like the first example that really blew us away. This was just impossible with GPT-3 or 3.5. It was a totally new capability for us. The second that I want to just briefly talk about from that era too was we had this...

11:32

threat intelligence data set that we got our hands on. There was this cyber criminal group that had imploded earlier that year. And as part of it, their internal chat logs went up online. So, you know, when this group kind of dissolved, this interesting data set of 60-something thousand, you know, messages back and forth of these, like,

11:52

You know, cyber criminals like plotting and doing crimes and stuff winds up online. Yeah. And, you know, they always have great OPSEC. And so there's totally nothing identifying in those chat logs, right? Well, so we got this data set, right? And I think we used LangChain to be able to... like, you know, do like the context stuffing, manage all that. But we ran that through again, GPT-4 and started asking questions of it. You know, who are these guys? What...

12:14

targets are they going after? If we want to defend against these groups, what types of indicators or tradecraft should we look out for? It led us right to it. It told us that they were going after primarily civilian soft targets and things like that. They were going after some transportation companies. We saw some interesting results from that.

12:33

Either a 0-day or some sort of access through a security appliance that they were using to get in. And we were also seeing, based on their back and forth, that

12:46

the companies that were being targeted were largely successful at, you know, finding them and kicking them out. You see this because we can see them kind of getting in and then, oh no, we lost access. Getting mad. You know, what makes this, this is interesting, I think for a couple of reasons, but what really.

13:00

I think seals the deal is the data set wasn't in English. It was in Russian, but it wasn't just in Russian. It was in like the Russian shorthand that these, you know, 20 year olds are using to coordinate. And, you know, it would have taken a diverse analytic team of linguists, technical experts, you name it. I mean, who knows how long it would have taken to pore through that data. And we just suddenly had this alien intelligence that could just do it all day, which was great.

13:29

I mean, that really changed the game for us. We really started to look for opportunities to bring language models into the program. The things that we had the most success with were, frankly, pretty operational. Automating parts of the apparatus that it takes to run a security team.

13:43

So these are things that aren't even that technical, but you need enough technical sort of horsepower to succeed with. They kind of started with that, like, sort of in the operational space. And largely, it was still very much human in the loop.

13:57

Oh, entirely. It was augmenting existing staff. Basically entirely. And I think one of the things that has evolved for me and my understanding has been when we had GPT-4, we had a tool to largely help us with efficiencies, right? There were things that could... Help our teams be more effective, help them cover more ground, help remove some of the toil in their work. One example that is, I think, frankly trivial.

⁠¶ AI Augmenting Human Security Teams

14:26

That's also why I love it, is we built this bot, like a Slack bot that you could use to gather information from employees if you're investigating something. Oh, yeah. Rather than, as you said, you're...

14:38

You're a highly skilled security engineer having to, you know, hello, colleague, my name is Owen. Talk to a marketing person, right? Like, oh, here's going to be a problem. But then do that across all their caseload, right? Be able to just have a bot go and get that information and bring them back and then have them kind of.

14:53

review it and make whatever decision is needed. That helps that engineer be more effective. It also removes a lot of toil from their work of having to do all this juggling. So I started... first thinking about this as really like an efficiency opportunity, but what we've seen with the progress of what we see with more...

15:15

with modern models, with breakthroughs like the reasoning paradigm, with what they've been able to do, is that they're now enabling us to do things that were just previously impossible. And that largely informs what we're working on with Aardvark and some of the work within our program.

15:28

So maybe let's talk about that, right? Because I think the current state of the art, I haven't yet played with Aardvark, so I can tell you if you can hook us up, we'd love to take a look. I might know a guy. There you go. Yeah, let's get that set up after this. It seems the order in which AI is solving problems maps pretty clearly to things that can be machine verifiable. And so it feels like a lot of the security problems that folks have been trying to throw AI at.

15:54

are just not machine verifiable. But code security is. And so maybe hearing a little bit about, well, first of all, what is Aardvark? Probably getting the cart in front of the Aardvark. So maybe what is Aardvark and kind of what's the approach and starting off there?

⁠¶ Defending Open Source Against Nation States

16:08

Yeah, so Aardvark is our agentic security researcher. It looks at code, it finds zero days in that code, and then it patches those zero days. It generates the patches to fix them. Aardvark... And this is all using... this isn't using, it's not running external tools, like you're not running

16:25

like a static analysis, like a fuzzer, right? You're doing this with the models that you've built internally. Aardvark takes a very language model forward approach to doing this. And in doing so, maybe a better way to describe it... Similar to how you do code gen or something. Well, I think maybe a better way to analogize it is...

16:40

It looks for bugs the same way that you or I might or an expert vulnerability researcher would. It does so by reading the code, by analyzing the code. It'll write and run tests. It'll actually try to stand up scaffolding and explore it. It really experiments, explores, and attacks the problem the same way that a human might. And if you look at some of the conventional methods for doing vulnerability discovery, to your comment on...

17:07

things being machine verifiable, they were largely human-driven tasks. You have vulnerability researchers who will pore over source code, who will reverse engineer binaries. They'll run fuzzers that will generate thousands or tens of thousands or hundreds of thousands of crashes that they then need to review and sort of map back to the conditions that cause them. What we're seeing with Aardvark is that the language model is able to...

17:33

in the same way that a human might to do that. And then you combine that with the ability to generate and execute code and to test these hypotheses, and you wind up with, I believe, a very compelling capability. The Aardvark workflow is actually very simple. You hook Aardvark up to a code base and the first thing that it will do is it will generate a threat model. Just as any good security engineer would, right? It'll take a broad look at the code base. It will...

18:00

determine its assessment of the sort of security objectives and the design and architecture of the code base. Essentially a planning phase. Basically, yeah. That's right. So it looks at that and it sort of builds its own model for, you know, what is this code base and what are the security properties of it?

18:15

It then will look for vulnerabilities within the code base using that sort of exploratory, very agentic way of looking for them. Once it finds a vulnerability, it will actually go a step further and will attempt to verify it. We call it validate within the product, where it will, within a secure sandbox, run the code and attempt to trigger the condition to verify that the vulnerability is in fact a real vulnerability.

18:43

And then at the output of that step, we obviously lean on our partners. We're doing this at OpenAI, right? So we have access to great tools like Codex and we use Codex to generate a patch. And then we re-scan the patch with Aardvark. Oh, nice. And the happy path here is that by the time a human lays eyes on the finding, you have the finding and then you have the patch. It's all right there. You just read it and you click a button. You've got a patch.
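[Editor's illustration] The stages described above (threat model, vulnerability search, sandboxed validation, patch generation, re-scan of the patch) can be sketched as a simple pipeline. This is a hedged sketch of the workflow as narrated, not Aardvark's actual implementation: `Finding`, `run_pipeline`, and every stage callable are hypothetical names, and each stage is passed in as a plain function so that nothing here pretends to be a real Aardvark or Codex API.

```python
# Illustrative sketch only: all names are hypothetical and the stages are
# supplied by the caller; this does not reflect Aardvark's real interfaces.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Finding:
    description: str
    validated: bool = False
    patch: Optional[str] = None
    patch_rescanned_clean: bool = False


def run_pipeline(
    codebase: str,
    build_threat_model: Callable[[str], str],
    find_candidates: Callable[[str, str], List[str]],
    validate: Callable[[str, str], bool],
    generate_patch: Callable[[str, str], str],
    rescan: Callable[[str, str], bool],
) -> List[Finding]:
    """Run the narrated stages in order and return validated findings."""
    # 1. Planning phase: model the security objectives and architecture.
    threat_model = build_threat_model(codebase)

    findings = []
    # 2. Exploratory search for candidate vulnerabilities.
    for description in find_candidates(codebase, threat_model):
        # 3. Validation: try to trigger the condition (in a sandbox, per the
        #    transcript); unconfirmed candidates are dropped.
        if not validate(codebase, description):
            continue
        # 4. Patch generation for each confirmed finding.
        patch = generate_patch(codebase, description)
        # 5. Re-scan the patch before a human sees the result.
        clean = rescan(codebase, patch)
        findings.append(Finding(description, True, patch, clean))
    return findings
```

In this sketch a reviewer would receive each `Finding` with its patch and re-scan result already attached, which mirrors the "happy path" described in the conversation.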

19:11

We started Aardvark initially as a research program within OpenAI. I got linked up with Dave Aitel, who's a... A legend. A pioneer of a lot of application security. Dave's amazing. I got linked up with Dave about a year ago or so. And Dave was largely working on independent research applications of LLMs to security. And he came in and we started this really just as a research project of let's see what we can do here. And the results started coming.

⁠¶ Democratizing and Securing Critical Infrastructure

19:39

So we started scanning OpenAI's own code. In addition to helping to solve some security problems, we got feedback from our development teams that was quite interesting. We found that devs wanted more from it, not less, which, as a CISO, you know, protecting an SDLC, is not something you can take for granted. And are you sliding like all the way left into the devs terminal?

20:03

No, we're not today. So Aardvark... Still living in CI/CD kind of... It's really for security engineers today. That's kind of where we're starting. Gotcha. In part because that's where we think we're going to get the best feedback as we look to hone the capability. Devs don't want to fix security problems. They want to ship cool stuff, right? No, but what was really interesting about the feedback that we got was that

20:22

Developers found that it was explaining these bugs in ways that were helpful to help them understand. That makes a ton of sense though, right? The average security engineer is not typically at the level of a senior software developer, right? But even if they are, right? I mean, you think about, you know, you drop a ticket on somebody that says, you know, this is vulnerable to CVE-2020... And then you read the advisory and you're like, yeah.

20:45

What do I have to do about this? Aardvark, because it is contextual and the model can understand the context and generate tailored responses, can actually give feedback to the developer that explains in context not only what the issue is, but how to fix it. Yeah, totally. So they really liked that. They started asking for more findings from it. They wanted us to bring their new hires in on it. They really incorporated it as part of their workflow, which...

21:13

to us, we thought was a really powerful proof point that we were on the right track. So, you know, from there, we moved on to scanning some open source projects that we thought were really important. I'm curious kind of what languages are you seeing this be the most effective with?

21:27

You know, it's honestly, it's a mix. So, you know, we're expecting it to be, you know, good at system code. That's where we started. You know, we do a lot of work in sort of modern languages at OpenAI, but we also have a lot of, you know, sort of memory unsafe code too. I think one of the things that was surprising to us was that Aardvark was performing across a broad spectrum of stacks. So when we moved to open source, we found some memory corruption bugs in some...

21:55

Very interesting, very highly audited targets. Anything written in C that probably runs at the infrastructure level. That's right. In the first place I looked. We found a memory corruption bug in OpenSSH, which is one of the most highly audited pieces of software out there.

22:11

Anytime you're finding memory corruption in OpenSSH, that's super interesting. And we reported that one, got issued a CVE, patched it, all that. The OpenSSH team is amazing. But so we started sort of broadening the aperture and looking at these open source projects, and found that the capability was generalizing pretty well and that we were having success there too. And that's really what gave us the push to see what else we could do with this tool, to start to expand access and put it out there.

22:39

I believe that there are a few things that are novel with it. The first is that it finds zero days. You think about where... Think about the sourcing of those types of issues conventionally. It's from humans, as we said. It's a human vulnerability researcher finding a bug. It's a fuzzer that's generating crashes and humans sifting through them. Things like that. Aardvark can actually find those novel issues.

23:06

And the second is the connection with patching, right? The fact that we can use code gen, the G in GPT stands for generative. We can lean into that to generate the fixes. And in doing so, I think we can just transform the way that software is built and secured. The thesis we had, I think, two years ago here, looking at the, because, you know, there have been a lot of waves of security.

23:27

AI, using AI to solve security problems. Like we're probably in wave three or four right now, just in the last, since ChatGPT. And the thesis we had at the very start of this was, you know, things start to work when you start to see novel. zero days found in highly audited source code bases, right? And so the first wave of this stuff was, oh, look, there's a bunch of cross-site scripting errors that we find in these JavaScript, you know, these JavaScript-based...

23:53

applications that are open source. And now hearing that we're getting to the point where we're finding memory corruption issues in core infrastructure code that's probably written in C and is particularly sharp-edged like that, that just seems like it's starting to trigger.

24:06

this stuff is actually working, right? I think we've gone beyond sort of like the, we've overfit these things and it's just looking for patterns that were found in previous CVEs. And now we're doing that novel. Is that kind of what you're saying? I think that's a good assessment. That's right.

24:20

That's fascinating. And so that covers, I think, and I think it just makes total sense, right, as OpenAI and the other frontier labs lean into code gen as code gen becomes a primary product that these folks are selling.

24:33

This is a feature, right? The security that you generate needs to be, or the code that you generate needs to be secure. And so it makes total sense. One of the other areas, and this goes back to kind of the first use case you talked about, where we've seen a lot of excitement, a lot of energy, is around the idea that...

24:47

these AI agents will start replacing people. And this is, I guess, a continuation of maybe the MDR space or the AI SOC space. And I'm just curious, what has your experience been on that side of the equation? Clearly, as we talked about, these things are very additive to tier one, level one analysts. Are we getting or approaching the point where they actually start replacing that labor or are we still very much in the augmenting phase? Oh, I think there is no line of sight.

25:14

to any impacts there. Like the cybersecurity talent shortage is frankly almost a meme at this point. I mean, we need tools to augment the people that we have because we just don't have enough people, right? I mean, it's an exquisite skill. That's required to be a security engineer. You have to be technical, you have to be operational, and it's a specialization within a specialization. We need as many people entering the cybersecurity workforce as possible, and we need to equip them with...

25:41

the best tools for them to be the most effective. So I feel totally bullish on that. That said, as we were saying at the top, there's a lot of drudgery that goes into doing security work. It's the details, it's the late nights. It's coordination. It's business process. It's not all malware research. Figure out this JSON blob, right? Threat Intel and things like that. It's a lot of business process too, right? And maybe those...

26:09

That gets some people out of bed in the morning, but it's just part of the job. And these are tools that are going to help teams be more effective, more efficient, solve problems that frankly need to be solved, and hopefully in ways that are more ergonomic and sustainable. Yeah, yeah. I mean, it certainly, I think the most recent number I saw was the 3.5 million unfilled security jobs in the US, I think was the number.

26:36

I'm sure it's inflated. It seems completely insane. I think just having tried to hire people in this role, in this field, is always such a challenge that it just always feels like there is a massive shortage. But it seems like maybe 2026 is the year we start to break the hump, right? Maybe these tools get us to a point where we don't necessarily need fewer bodies, but we can do more with the bodies that we've got.

27:03

There's also something to be said for the on-ramp it can provide, too. These tools are incredibly useful for learning new concepts and exploring. My pathway into security was through research, and I was doing research on wireless systems.

27:15

you know, these were like kind of hard brainy topics. There was not a lot of, you know, public domain information about it. And, you know, the ability to, I just think about how much that work would have been transformed with a tool like ChatGPT. I think there's a lot that can be said for that too. Yeah, 100%.

27:29

I guess we've talked about, it's funny, the order of this discussion has been sort of like we started with the least sexy things first, sort of like sec ops, then code. You know, the thing that people always talk about, the focus, the thing that always gets the most clicks.

27:43

And the thing that gets the most attention is always the hacking. It's always the offensive stuff. It's the offensive cyber penetration testing, all these sorts of things. We've seen with some of the other labs, and you guys have published as well. Nation states using tools, all this kind of the usual, what you would expect to see, right? If you've built a great product, of course, nation states are going to use it.

28:02

Also, I'd like to point out that just seems like a really huge win for the American Frontier Labs, like sort of that foreign adversaries are using our tech as opposed to their own domestic ones. So congratulations. We're still in the lead. One way to look at it, I guess. Hey, look at that, you know.

28:16

They're not running DeepSeek models. So awesome. I guess the question would be like, it seems the nature of the attacker is that you have infinitely many shots on goal and you only need to be successful once. And so it makes sense that if I have this Mechanical Turk that can run infinitely, provided I have infinite money, I can get this thing to find a bug. And so it seems to be that there's an alignment there. But I'm curious from your perspective, we've seen...

28:43

A lot of automated pen testing companies, we've seen a lot of sort of more offensive cyber research people, threat modeling, etc. Does it seem like these models are better situated for doing the offensive work? Or are they going to be... Are they more situated for the blue team work? I'm just kind of curious your take on how this space is developing. So, you know, you say that if, you know, the attackers get to, you know, run all these Mechanical Turk shots on goal.

29:08

So do we. That's what I think is really exciting here. I guess the difference is that you have to be 100% right on the blue team. Attackers can fail 99%, but they get the one. I think that's debatable. I'm sure you get some access, but then, you know, anyway. You get booted out. Sure. Yeah, yeah. I mean, it's cat and mouse. Right. Really, like the history of human conflict, right, is defined by this sort of evolution and balance.

29:35

I think we don't have to bring AI into the equation to look at some of the security shortfalls that exist in the ecosystem. The state of modern software security is quite uneven. Some might even say bad. The scale and complexity of it has reached such a state that we need tools like this, frankly. We need the ability to scale security expertise to all of the developers and organizations that need it. One of the reasons why...

30:04

why I'm personally really motivated to work on Aardvark is to do something and give something back to the open source community. And we should come back to that. But just to speak to the threats real quick. So, you know, we... OpenAI publishes threat reports. We try to share what we're seeing so that other labs and stakeholders can learn from the activity. I think you guys were the first lab to start putting stuff out there. Just to speak about that.

30:30

We published our first threat report. It's a joint effort with Microsoft's MSTIC, their threat intelligence team, back in, I think it was early 2023, where we used some of the... We worked together and we were able to identify threat actors from China, Russia, Iran, North Korea, learn what they were doing and then kick them off and get rid of them. And we found that they were experimenting but largely weren't being that successful.

30:58

And since then, we put out more threat reports that study, and we've also built a team and a whole apparatus to do this at scale and be really good at this. And what we're seeing is that these adversaries are interested and they're motivated. My editorial, though, is that you look at what defenders are doing today and the balance of trade or the value is accruing far in their favor. I think if you were to poll most CISOs, they would agree with that. Yeah. I mean, I think it's just the hiring.

31:27

If you want to hire a red team or you want to hire a pen tester, you get hundreds of applications. But you want to hire like a really good blue team engineer, you'll find like three, right? I just think, I think there's just an attractiveness to that.

31:39

to the offensive side that just sort of unbalances, makes the equation look more unbalanced than it is. To your point, I think that's why the value for these tools will more readily accrue to the blue team because the blue team is consistently the one where I think you find the biggest labor shortage. Could be. I mean, you have to play offense to play good defense, right? I mean, that's how you...

31:58

But red team always wins, right? That's always, it's sort of the easy side of the equation. The defense is the hard one. Yeah, but you do it again. I guess you get to do it more, right? It's a program and a journey, right? Do you think we get to a point where we just have sort of...

32:12

I know this is a bit of a loaded question. Do you think we get to a point where we just have kind of continuous testing? Because all the things we do in security are always shots in time, right? It's, okay, this was last week. Okay, this was this week. Do you think this actually gets us to a point where we can do this?

32:26

All the time? Well, that's what Aardvark is. Aardvark is sort of continuous and proactive. It's always on. It's auditing changes to your code in basically near real time. And one way of framing it is that it's a... And it's a senior AppSec engineer who's always there just checking your work and it's going to tap you on the shoulder if there's a problem there. And that's just one surface, right? We're just looking at code. Yeah.

32:54

And, you know, you can think about how this could generalize to other parts of the enterprise and other places where it's needed. Maybe it's a, I want to go back to the open source topic because this is one that I... I'm very impassioned about and believe in that, you know, software is so complex. The open source community gives us so much. They are also under-resourced. They're overworked. And, you know...

33:18

OpenAI has the privilege of being able to hire and staff a security team, but even some of the best resourced projects struggle to build security programs. And then you go further down. Yeah. Frankly, the story of the year for me last year was the XZ Utils. Oh, yeah. Oh, yeah. It was incredible. And it wasn't... none of the tooling in the toolchain caught it. It was some overly obsessive.

33:45

Principal-level engineer that was frustrated about latency and keystrokes, right? We got lucky as hell with that engineer. If you're listening, thank you. Yeah, totally. I'd buy you a beer. So just for the benefit of the listeners, maybe set the stage on this. So XZ Utils is an open source... It's a library that implements the XZ compression algorithm and was maintained by a solo developer.

34:11

This library was used in a number of places. Notably, I think it was systemd, which is a component of Debian and other major Linux distributions. This developer, I think they got socially engineered, right? Basically took on a maintainer who turned out...

34:25

not to be who they said they were and had the intents that they said that they were. They added a sort of stealthy and discreet backdoor right before a major release, which made it into a pre-release version of systemd, which is ultimately where it got caught. And just, you know, think about the blast radius, had that made it into Linux distributions. Like, that backdoors like what, like half the internet? Oh yeah, totally. Probably. And, you know, that's the one that we found. Yeah. Right.

34:52

There's been a lot of speculation as to who was behind the attack, but what chance do these open source developers, these heroic volunteers who are spending their time to contribute their knowledge and give back, deliver these building blocks that enable the industry to do so many amazing things. What chance do they have against a full force of a foreign intelligence service or a really determined criminal actor group who's going to...

35:21

Those package maintainers probably have 15 to 20 other packages they're maintaining. It's just ridiculous. We talk about this all the time. You look at the NPM issues that we've been having. Yeah, it's creating a real problem. Language models and the tools that we can build on them, we actually have the ability to scale security intelligence to all the places that need it. To give these developers a fighting chance, the tools that they need to...

35:49

to really be empowered in security here. And with Aardvark, we're hoping to do something really big around open source. We haven't exactly figured out what, but we're in private beta now and we... Open call to open source maintainers. If you want to be on the beta, please reach out. We'd love to hear from you and we'd love to make Aardvark a tool that works very well for you. There you have it, Aardvark for open source maintainers. I think we'll get a...

36:13

You'll get a big inbound from this one. Yeah, it's a topic I'm very impassioned about this. My journey into security, I mentioned I was doing wireless research. I started doing security research, largely using open source hardware and software, tools like the GNU Radio Project and different open source SDRs and things like that. I frankly have to thank that whole community for my career because who knows if I would have gotten into security without that.

36:40

And, you know, it's just such, you know, amazing, talented and, you know, passionate people who just volunteer their time, you know, all day, every day. And, you know, I hope that Aardvark, you know, I hope that with Aardvark, you know, the team and I are able to put our finger on the scale and deliver something that really helps these developers. Absolutely. Well, I mean, I think, and I think the dream is right. It's sort of the...

37:03

Security has always been kind of the rich person's game, right? Like it's the global banks, right? The big defense contractors that have unlimited budget hire all the people, they buy all the cool toys. And it just seems like this is a wave where we're going to start to democratize some of this stuff. And it'd be great if my local mom-and-pop dentist had the same security profile as my investment bank, right? It just seems like that's a much better world.

37:27

where everyone can get access to this kind of stuff. I hope you're right. And I mean, we look at sort of critical infrastructure writ large, and there's a lot that we can do here. Yeah. Yeah. It's the kind of thing where... It's funny, right? Because I think this is very much a mid-2010s kind of attitude, which was like, we got to stop using security as a competitive advantage.

37:48

It's sort of like a world in which we win because we don't get hacked is not a good world to live in, especially when you start talking about things like nuclear power plants and airplanes. You know, just throwing back to the podcast we did with BJ and Jason a couple of years ago.

38:02

you know, these guys are great. And, you know, we, you know, no matter what, you know, gets reported on in terms of, you know, competition in the ecosystem, you know, we all were facing the same threats and challenges. And, you know, we were, you know, very sort of aligned and united in that, which I thought was a really sort of great and appreciated relationship. Absolutely. Well, thank you so much. This has been a wonderful conversation.

38:25

Great to hear about all the progress and congratulations on having the coolest job at OpenAI. That's quite a title. Thanks, Joel. A lot of fun. Great conversation. Thanks for having me. Thank you. Thanks for listening. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only.

38:49

should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com forward slash disclosures.

✨ This transcript was generated by Metacast using AI and may contain inaccuracies.