Amazon is betting on agents to win the AI race

⁠¶ Intro / Opening

00:00

Support comes from ServiceNow. We're for people doing the fulfilling work they actually want to do. That's why this ad was written and read by a real person, and not AI. You know what people don't want to do? Boring, busy work. Now with AI agents built into the ServiceNow platform, you can automate millions of repetitive tasks in every corner of your business, IT, HR, and more, so your people can focus on the work that they want to do. That's putting AI agents to work for people.

00:27

It's your turn. Visit servicenow.com. Support for this show is brought to you by CVS Caremark. You know the saying, less is more. Well, with CVS Caremark, it changes to more for less. With more care, more guidance, and more expertise, CVS Caremark helps your plan members spend less on their prescription drugs. CVS Caremark leverages their scale to negotiate lower net costs for medications every day.

00:54

And that's exactly what your members can count on from CVS Caremark. More ways to maximize their benefits. Go to cmk.co slash stories to learn how we help you provide the affordability, support, and access your members need. Support for this show comes from Robinhood. Wouldn't it be great to manage your portfolio on one platform?

01:18

With Robinhood, not only can you trade individual stocks and ETFs, you can also seamlessly buy and sell crypto at low costs. Trade all in one place. Get started now on Robinhood. Trading crypto involves significant risk.

01:32

Crypto trading is offered through an account with Robinhood Crypto, LLC. Robinhood Crypto is licensed to engage in virtual currency business activity by the New York State Department of Financial Services. Crypto held through Robinhood Crypto is not FDIC insured or SIPC protected. Investing involves risk, including loss of principal. Securities trading is offered through an account with Robinhood Financial LLC, member SIPC, a registered broker-dealer.

⁠¶ Introduction to AI Agents

01:57

Welcome to Decoder. This is Alex Heath, your Thursday episode guest host and deputy editor at The Verge. One of the biggest topics in AI these days is agents. Or the idea that AI is going to move from chatbots to reliably doing things for us in the real world. The problem with agents right now is that they aren't really that reliable at all. There's a lot of work happening to fix that.

02:22

Which brings me to today's guest, David Luan, the head of Amazon's AGI Research Lab. I've been wanting to chat with David for a long time. He was an early research leader at OpenAI, where he helped drive the development of GPT-2, GPT-3, and DALL-E. After OpenAI, he co-founded Adept, a research lab focused on agents. And last summer, he left Adept to join Amazon,

02:46

where he now leads the company's AGI lab in San Francisco. We recorded this episode right after the release of GPT-5, which gave us an opportunity to talk about why he thinks progress on AI models is slowing down. The work that David's team is doing is a big priority for Amazon, and this was the first time I've heard him really lay out what he's up to. I also had to ask him about how he joined Amazon.

03:13

His leaving Adept was one of the first of many deals that I call the reverse acquihire, where a big tech company all but actually buys a buzzy AI startup to avoid antitrust scrutiny. I don't want to spoil too much, but let's just say that David left the startup world for big tech last year because he knew where the AI race was headed. I think that makes his predictions for what's coming next worth listening to.

⁠¶ David Luan's AI Journey

03:52

David, welcome to the show. Thanks so much for having me on the show. I'm really excited to be here. It's great to have you. We have a lot to talk about. I'm super interested in what you and your team are up to at Amazon these days. But first, I think the audience could really benefit from hearing a little bit about you and about your history and how you got to Amazon.

04:12

You've been in the space for a long time and you have a pretty interesting career leading up to this. And yeah, could you walk us through a little bit kind of your background in AI and how you got to Amazon? First off. I find it absolutely hilarious that I've been around the field for a long time because it's true in relative terms. This field is so new.

04:32

And yet, nonetheless, I've only been doing AI stuff for about the last 15 years. So compared to many other fields, it's just so new. Well, that's an eternity in AI years. It is an eternity in AI years. I remember when I first started working in the field, I worked in it just because I thought it was interesting. I thought having the opportunity to be able to build systems that could

04:52

think like humans and hopefully deliver superhuman performance at things was such a cool thing to do. And I had no idea that it was going to blow up the way that it did. But my personal background, I led the research and engineering team at OpenAI from 2017 to mid-2020, where the teams did GPT-2 and GPT-3 and CLIP and DALL-E. And every day was just so much fun because you would show up to work

05:16

and it's just your best friends, all trying a bunch of really interesting research ideas. And there was none of the pressure that exists right now. Then after that, I led the LLM effort at Google, where we trained a model called PaLM, which was quite a strong model for its time, but

05:32

a bunch of us shortly after that decamped to various startups, and my team and I ended up starting Adept. It was the first AI agent startup. We ended up inventing the computer-use agent, effectively. There was some good research beforehand; we had the first production one, and Amazon brought us in

05:46

to go run agents for Amazon about a year ago at this point. Great, and we'll get into that and what you're doing at Amazon. But first, given your OpenAI experience, and we're talking now less than a week from the rollout of GPT-5, I'd love to hear...

⁠¶ Reflecting on GPT-5 and AI Convergence

06:01

you reflect on the release of GPT-5 and what it says about the industry, what you thought when you saw it. I'm sure you still have colleagues at OpenAI who worked on it. But yeah, what does that release signify? I think it really signifies a high level of maturity at this point. The labs have all figured out how to reliably tape out increasingly better models. One of the things that I always harp on is that your job as a frontier...

06:28

model lab is not actually to train models. Your job as a frontier model lab is to build a factory that repeatedly churns out increasingly better models. That's actually a very different philosophy for how to make progress.

06:42

All you do is you think about, you know, let me make this tweak. Let me make this tweak. Let me try to glom on people to get a better release. If you care about it from the perspective of a model factory, what you're actually trying to do is you're trying to figure out how you can build all the systems, processes, and infrastructure to make these things smarter.

06:59

But with the GPT-5 release, I think the part that I found most interesting about it is that a lot of the frontier models these days are converging in capabilities. I think in part, there's an explanation that one of my old colleagues at OpenAI, who's now a professor at MIT, came up with, called the platonic representation hypothesis. Have you heard of this hypothesis? No. So the platonic representation hypothesis

07:21

is this idea, similar to Plato's cave, which is really what it's named after, that there is one reality, but we as humans, for example, only see a particular rendering of that reality. Like in Plato's cave, it is the shadows that you see on the wall of the cave, right? And so that's the same thing for LLMs. LLMs see slices of this reality through the training data that they see. A YouTube video of, for example, someone going for a nature walk in the woods somewhere is all

07:52

ultimately generated by the actual reality that we live in. And as you train these LLMs on more and more and more data, the LLMs become smarter and smarter. They all converge to representing this one shared reality that we all have. And so if you believe this hypothesis, what you should also believe then is that all LLMs will converge

⁠¶ AI Benchmarking and Human Attachment

08:10

to the same model of the world. And I think that's actually happening in practice from seeing frontier labs deliver these models. Well, there's a lot to that. I would maybe suggest that a lot of people in the industry don't necessarily believe we live in one reality. When I was at the last Google I/O, Sergey Brin and Demis were on stage, and they both seemed to maybe believe that we were existing in multiple realities. So I don't know if that's...

08:36

I don't know if that's a thing that you've encountered in your social circles or work circles over the years, but not everyone in AI necessarily believes that, right? I think that hot take's above my pay grade. I do think that we only have one. Yeah, we have too much to cover. We can't get into multiple realities. But to your point about everything converging, it does feel like benchmarks are starting to not matter as much anymore and that the actual improvements in the models,

09:03

like you said, are commodifying, everyone's getting to the same point and GPT-5 will be the best on LMArena for a few months until, you know, Gemini 3 comes out or whatever and so on and so on. And if that's the case, I think what this release has also shown is that maybe what is really starting to matter is how people actually use these things and the feelings and the attachments that they have to them. So OpenAI bringing back

09:31

4o because people had a literal attachment to it as a thing that they felt. And people on Reddit saying it's like my best friend's been taken away. And so it really doesn't matter that it's better at coding or that it's better at writing. It's your friend now. And that's freaky. But I'm curious, when you saw that and you saw the reaction to GPT-5, did you predict that? Did you see that we were moving that way? Or is this something new for everyone?

10:01

There was a project called LaMDA or Meena at Google in 2020 that basically was ChatGPT before ChatGPT, but only available to Google employees. Even back then, we started seeing employees start developing personal attachments to these AI systems. Humans are so good at anthropomorphizing anything, right?

10:20

And so I wasn't surprised to see that people form bonds with certain model checkpoints. But I think that when you talk about benchmarking, the thing that stands out to me is benchmarking is really all about, at this point, people are just studying for the exam, right?

10:34

We know what the benchmarks are in advance. Everybody wants to post higher numbers. It's like the megapixel wars, right, from the early digital camera era. Like, they just clearly don't matter anymore. They have very loose correlation with how good of a photo this thing actually takes. And I think the question

⁠¶ Beyond Chatbots: The AGI Vision

10:48

and the lack of creativity in the field that I'm seeing boils down to this: AGI is way more than just chat. It's way more than just code. Those just happen to be the first two use cases that we all know work really well for these models. There's so many more useful applications and actually useful base model capabilities that people haven't even started figuring out how to manage.

11:10

And I think the better question to ask now, if you want to do something interesting in the field, is what should I actually run at? Why am I trying to spend more time making this thing slightly better at creative writing? Why am I trying to spend my time trying to make this model X percent better at the International Math Olympiad when there's so much more left to do?

11:32

What keeps me and the people who are really focused on this agents vision that we have going is looking to solve a far greater breadth of problems than what people have done so far.

⁠¶ AGI for Amazon: Universal Teammate

11:44

Okay, that brings me to this topic. I was going to ask it later, but yeah, AGI, you're running the AGI Research Lab at Amazon. I have a lot of questions about what AGI means to Amazon specifically, but I'm curious first for you. What did AGI mean to you when you were at OpenAI helping get GPT off the ground? And what does it mean to you now? Has that definition changed at all for you?

12:06

The OpenAI definition for AGI we had was a system that outperforms humans at economically valuable tasks. And while I think that was an interesting, almost doomer North Star back in 2018, I think we've gone so much past that as a field. What gets me excited every day is not how do I replace humans at economically valuable tasks, it's how do I ultimately build towards

12:32

like a universal teammate for every knowledge worker. And just like what keeps me going is the sheer amount of leverage we can give to humans on their time, if we had AI systems that you could ultimately end up delegating a large chunk of the execution of what you do every day to. And so my definition for AGI, which I think is very tractable and is very much focused on helping people, is

12:56

The first most important milestone that would lead me to say we're basically there is a model that can help a human do anything they want to do on a computer. I like that. That's actually more concrete and grounded than a lot of the stuff you hear. It also shows how different everyone feels about what AGI means. I was just on a press call with Sam Altman for the GPT-5 launch, and he was saying now he thinks of AGI as a model that can self-improve itself. And I guess maybe that's...

13:24

related to what you're saying, but you're grounding it more on the actual use case, it sounds like. Well, the way that I look at it is self-improvement is interesting, but to what end, right? Like, why do we as humans care if the AGI is self-improving itself? Like, I don't really care personally. I think it's cool from a scientist's perspective.

13:44

I think what's more interesting to me is how do I go build the most useful form of this super generalist technology and then be able to put that in everybody's hands? And I think the thing that gives people tremendous leverage is if I can teach this agent that we're training to handle

14:02

like any useful task that I need to get done on my computer, because so much of our lives these days is in the digital world, right? So I think it's very tractable. Going back to our discussion about benchmarking, right, the fact that the field cares so much about, you know, MMLU, MMLU-Pro, you know, Humanity's Last Exam, AMC 12, etc. Like, we don't have to live in that box of that's what AGI does for me.

14:31

I think it's way more interesting to look at the box of the space of all useful knowledge worker tasks. How many of them are doable on your machine? How can these agents do them for you? So it's safe to say that for Amazon, AGI means more than shopping for me, which is the cynical joke I was going to make about.

14:49

what AGI means for Amazon. I'd be curious to know when you joined and you were talking to the management team and Andy Jassy and still to this day, how do you guys talk about the strategic value of AGI as you define it for Amazon?

⁠¶ Strategic Value of Agents for Amazon

15:03

Broadly, because Amazon is a lot of things. It's really a constellation of companies that do a lot of different things, but this idea kind of cuts across all of that, right? I think that if you look at it from the perspective of computing, right? So far the building blocks of computing have been

15:23

Can I rent a server somewhere in the cloud? Can I rent some storage? Can I write some code to go hook all these things up and deliver something useful to a person? The building block of computing is changing, right? At this point, the code's written by an AI. Down the line, the actual intelligence and decision making is going to be done by an AI. And so then what happens to your building blocks, right? So in that world, it's super important for Amazon.

15:49

to specifically be good at the agents problem, because agents are going to be the atomic building block of computing. And when that is true, I think so much economic value will be unlocked as a result of that, and it really lines up well with the strengths that Amazon already has on the cloud side and putting together ridiculous amounts of infrastructure and all that. I see what you're saying. I think a lot of people listening to this, even people who work in tech,

⁠¶ The Promise and Reality of Agents

16:15

understand conceptually that agents are where the industry is headed. But I would venture to guess that the vast majority of the listeners to this conversation have either never used an agent or have tried one and it doesn't work. I would say that's...

16:28

pretty much the lay of the land right now. I'm not actually sure of like, what would you hold out as this is the best example of an agent? This is the best example of where things are headed and what you can expect. Is there something you can point to?

16:41

So I feel for all the people who have been told over and over again that agents are the future and then they go try the thing and it just doesn't work at all. So let me try to give an example of what the actual promise of agents is relative to how they're pitched to us today.

16:56

Right now, the way that they're pitched to us today is, for the most part, it's just a chatbot with extra steps, right? It's like, you know, Company X doesn't want to put a human customer service rep in front of me, so now I have to go talk to a chatbot and maybe behind the scenes it clicks a button.

17:10

or you've played with a product that does computer use or something like that that is supposed to help me with something on my browser, but in reality, it takes four times as long, and one out of three times it screws up. This is kind of the current landscape of agents. Let's take a concrete example of I want to do a particular drug discovery task where I know

17:30

there's a receptor that I need to be able to find something that ends up binding to this receptor. If you pull up ChatGPT today and you talk to it about this problem, it's going to go and find all the scientific research and write you a perfectly formatted piece of markdown of what the receptor does and maybe some things you want to try. But that's not an agent. An agent in my book is a model and a system.

17:55

that actually literally you can hook up to your wet lab and it's going to go and use every piece of scientific machinery you have in that lab, read all the literature, propose the right optimal next experiment, run that experiment, see the results, react to that, try again, et cetera, until it's actually achieved the goal for you. And the degree to which that gives you leverage is so, so, so much higher than what the field is currently able to do right now.

18:34

Support for this show comes from .tech domains. What's in a name? Quite a lot, actually, especially when you're starting a business. And you probably took the time to craft the perfect name that communicates your business idea clearly. But when it comes to checking the .com, you might find the names already taken, or at the very least, priced like rent in Palo Alto. And sure, you can settle for an odd spelling or extra numbers, but with .tech domain, you don't have to compromise.

19:02

Get the startup name you actually want on .tech. Absolutely no compromises. What's more, when you use .tech, you signal to your customers and investors that you're building tech with just your domain name. So if you've got a name in mind, search for it now with .tech on a trusted platform like GoDaddy or visit get.tech slash decoder to grab it. That's get.tech slash decoder.

19:34

Let's be honest. Are you happy with your job? Like really happy? The unfortunate fact is that a huge number of people can't say yes to that. Far too many of us are stuck in a job we've outgrown, or one we never wanted in the first place. But still, we stick it out, and we give reasons like, what if the next move is even worse?

19:56

I've already put years into this place, and maybe the most common one, isn't everyone kind of miserable at work? But there's a difference between reasons for staying and excuses for not leaving. It's time to get unstuck. It's time for Strawberry.me. They match you with a certified career coach who helps you go from where you are to where you actually want to be. Your coach helps you get clear on your goals, create a plan, build your confidence, and keeps you accountable along the way.

20:29

So don't leave your career to chance. Take action and own your future with a professional coach in your corner. Go to strawberry.me slash unstuck to claim a special offer. That's strawberry.me slash unstuck. Avoiding your unfinished home projects because you're not sure where to start? Thumbtack knows homes.

20:50

So you don't have to. Don't know the difference between matte paint finish and satin? Or what that clunking sound from your dryer is? With Thumbtack, you don't have to be a home pro. You just have to hire one. You can hire top-rated pros, see price estimates, and read reviews all on the app. Download today.

⁠¶ Overcoming LLM Limitations for Agents

21:19

Do you agree, though, that there's an inherent limitation in large language models and decision making and executing things, when I see how LLMs even still, you know, the frontier ones, still hallucinate, still make things up, confidently lie? It's terrifying to think of putting that technology in a construct where now I'm asking it to go do something in the real world, interact with my bank account, ship code,

21:44

work in a science lab. Like when ChatGPT can't spell right, that doesn't feel like the future that we're going to get. And so I'm wondering, are LLMs it or is there more to be done here? So we started with a topic of how these models are increasingly converging in capability. So while that's true for LLMs, I don't think that's been true to date for agents. And it's because...

22:13

The way that you should train an agent and the way that you train an LLM are actually quite different from each other. So LLMs, as we all know, the bulk of their training happens from doing next-token prediction, right? I've got a giant corpus of every article on the internet. Let me try to predict the next word. And if I get the next word right, then I get a positive reward. And if I get it wrong, then I'm penalized, right?
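[Editor's sketch] The next-word objective described above can be illustrated with a deliberately tiny stand-in for an LLM. Everything here is an assumption made for illustration: a bigram counter replaces the neural network, the corpus is invented, and names such as `train_bigram` and `next_token_loss` are hypothetical. What it shows is the "penalty" side of the description: a cross-entropy loss that is small when the model assigned high probability to the word that actually came next.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the follower counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}

def next_token_loss(counts, prev, actual_next):
    """Cross-entropy penalty for one prediction (lower is better)."""
    probs = next_token_probs(counts, prev)
    p = probs.get(actual_next, 1e-9)  # unseen continuations get a tiny floor
    return -math.log(p)

# Invented toy corpus: "the" is followed by "cat" twice and "mat" once.
corpus = "the cat sat on the mat and the cat ran".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
loss_common = next_token_loss(model, "the", "cat")
loss_rare = next_token_loss(model, "the", "mat")
```

The model only records which words tended to follow which; nothing in it represents why a continuation is correct, which is the imitation-learning point made in the conversation.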

22:36

In reality, what's actually happening, in the field we call it behavioral cloning or imitation learning. It's the same thing as cargo culting, right? The LLM never learns why the next word is the right answer. All it learns is that when I see something that is

22:53

similar to the previous set of words, I should go say this particular next word. The issue with this is this is great for chat. This is great for creative use cases, right? Where you want some of the chaos and randomness from hallucinations. But if you want it to be an actual successful decision-making agent, these models need to learn the true causal mechanism, right? It's not, you know, just cloning human behavior. It's actually learning: if I do X, the consequence of it is Y. And so

23:20

The question is, how do we train agents to be able to learn the consequences of their actions? And the answer obviously cannot be just doing more behavioral cloning and copying text, right? It has to be something that looks like actual trial and error in the real world.

⁠¶ Amazon's Self-Play Agent Training

23:37

That's basically the research roadmap for what we're doing in my group at Amazon. My friend Andrej Karpathy has a really good analogy here, which is, you know, imagine you have to train an agent to go play tennis, right? You wouldn't have it spend 99% of its time watching YouTube videos of tennis and then 1% of its time actually playing tennis. You would have something that's far more balanced between these two things. So what we're doing in our lab here at Amazon

24:01

is we're actually doing large-scale self-play. And so if you remember the concept of self-play, it was a technique that DeepMind really made popular in the mid-2010s when they beat humans at playing Go. For playing Go, what they did was they spun up a bajillion simulated Go

24:21

environments, right? And then they had the model play itself over and over and over again. Every time it found a strategy that was better at beating a previous version of itself, it would effectively get positive reward via reinforcement learning to go do more of that strategy in the future.
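[Editor's sketch] The self-play loop described above can be shown on a much smaller game than Go. This is not DeepMind's algorithm; it is a minimal tabular sketch under stated assumptions: the game is a one-pile take-away game (remove 1 to 3 stones, whoever takes the last stone wins), both players share one value table, and values are plain running averages of win/loss outcomes. All names (`self_play`, `best_move`, and so on) are hypothetical.

```python
import random

ACTIONS = (1, 2, 3)  # stones a player may remove on a turn

def legal_moves(pile):
    return [a for a in ACTIONS if a <= pile]

def choose(q, pile, epsilon, rng):
    """Epsilon-greedy: mostly play the best-known move, sometimes explore."""
    moves = legal_moves(pile)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda a: q.get((pile, a), 0.0))

def self_play(episodes=20000, max_pile=10, epsilon=0.2, seed=0):
    """Both players use the same table, so the policy trains against itself."""
    rng = random.Random(seed)
    q, visits = {}, {}
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        history = [[], []]  # (pile, move) pairs for player 0 and player 1
        player = 0
        while pile > 0:
            move = choose(q, pile, epsilon, rng)
            history[player].append((pile, move))
            pile -= move
            if pile == 0:
                winner = player  # whoever takes the last stone wins
            player = 1 - player
        for p in (0, 1):
            reward = 1.0 if p == winner else -1.0
            for key in history[p]:
                # Running average of outcomes seen after this (pile, move).
                visits[key] = visits.get(key, 0) + 1
                old = q.get(key, 0.0)
                q[key] = old + (reward - old) / visits[key]
    return q

def best_move(q, pile):
    return max(legal_moves(pile), key=lambda a: q.get((pile, a), 0.0))

q = self_play()
policy = {pile: best_move(q, pile) for pile in range(1, 8)}
```

No human games are involved: the only training signal is whether a move led to a win against the current version of the same policy, which is the property the conversation contrasts with behavioral cloning.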

24:35

They spent a lot of compute on this in the Go simulator and actually discovered superhuman strategies for how to play Go and then ended up, when they played the world champion, making moves that no human had ever seen before, and contributed to like the state of the art of that whole field. What we're doing is rather than doing more behavioral cloning or watching YouTube videos, what we're doing is we're creating a giant set of RL gyms.

24:59

And each one of these gyms, for example, is an environment that a knowledge worker might be working in to get something useful done. So here's a version of something that's like Salesforce. Here's a version of something that's like an ERP. Here's a CAD program. Here's an electronic medical record system. Here's accounting software. You know, every interesting domain of possible knowledge work is now a simulator. And now instead of training an LLM just to

25:25

do text stuff, we have the model actually propose a goal in every single one of these different simulators, try solving that problem, figure out if it's successfully solved it or not, and then get reward and feedback based on, you know, oh, did I do the depreciation correctly? Or did I correctly make this part in CAD? Or did I successfully book the flight, to choose a consumer analogy?

25:47

Every time it does this, it actually learns the consequences of its actions. And we believe that this is one of the big missing pieces left for actual AGI. And we're really scaling this recipe up at Amazon right now.
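[Editor's sketch] The propose, attempt, verify, reward loop described in this section can be sketched with a toy environment. Nothing here reflects Amazon's actual gyms: `AccountingGym`, `DepreciationTask`, and both agents are hypothetical, the task is straight-line depreciation chosen because "did I do the depreciation correctly?" is the example given, and for simplicity the environment rather than the model proposes the goal.

```python
import random
from dataclasses import dataclass

@dataclass
class DepreciationTask:
    cost: float     # purchase price of the asset
    salvage: float  # expected value at end of life
    years: int      # useful life in years

class AccountingGym:
    """Hypothetical stand-in for one simulated knowledge-work environment."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def propose_task(self):
        """Sample a goal: compute the annual straight-line depreciation."""
        cost = self.rng.randrange(10_000, 100_000, 1_000)
        salvage = self.rng.randrange(0, 5_000, 500)
        years = self.rng.randint(3, 10)
        return DepreciationTask(cost, salvage, years)

    def verify(self, task, answer):
        """Reward 1.0 only when the answer matches the checked result."""
        expected = (task.cost - task.salvage) / task.years
        return 1.0 if abs(answer - expected) < 0.01 else 0.0

def careless_agent(task):
    # Plausible-looking but wrong whenever salvage is nonzero.
    return task.cost / task.years

def careful_agent(task):
    return (task.cost - task.salvage) / task.years

def average_reward(agent, gym, episodes=100):
    """Run the propose -> attempt -> verify loop and average the reward."""
    total = 0.0
    for _ in range(episodes):
        task = gym.propose_task()
        total += gym.verify(task, agent(task))
    return total / episodes

careful_score = average_reward(careful_agent, AccountingGym(seed=1))
careless_score = average_reward(careless_agent, AccountingGym(seed=1))
```

The design point is that the reward comes from checking the outcome, not from resembling what a human wrote, so an agent trained against `verify` is pushed toward the answer that is actually correct. In a real training setup the scores would feed a reinforcement learning update rather than just being averaged.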

⁠¶ Uniqueness of Amazon's Agent Approach

26:01

How unique is this approach in the industry right now? Do you think the other labs are onto this as well? If you're talking about it, I would assume so. I think that what's interesting is, in this field, ultimately, you have to be able to do something like this, in my opinion, to be able to get beyond the fact that there's a limited amount of free-floating data on the internet that you can train your models on.

26:24

The thing we're doing at Amazon is because this came from what we did at Adept and Adept has been doing agents for so long, we just care about this problem way more than everybody else. And I think I've made a lot of progress towards this goal. You called these gyms, and I was thinking physical gyms for a second. Does this become physical gyms?

26:42

You have a background in robotics, right? I've also done robotics work before. Here we also have Pieter Abbeel, who came from Covariant and is a Berkeley professor that basically created, or his students ended up creating, the majority of the RL algorithms that work well today.

26:56

It's funny that you say gyms because we were trying to find an internal codename for the effort. We kicked around Equinox and Barry's Bootcamp and all this stuff. And I'm not sure everybody had the same sense of humor. But we call them gyms, actually, because at OpenAI, we had a very useful early project called OpenAI Gym. This was way before LLMs were a thing. And what OpenAI Gym was, was a collection of video game tasks

27:26

and robotics tasks. Like, can you balance a pole that's on a cart, and can you train an RL algorithm that can keep that thing perfectly centered, etc. What we were inspired to do is, now that these models are smart enough, why have toy tasks like that? Why not put in the actual useful tasks that humans do on their computer into these gyms and have the models learn from these environments? And I don't see why this wouldn't also generalize to robotics.

⁠¶ Agents Framework and AWS Deployment

27:50

And is the end state of this an agents framework system that gets deployed through AWS? The end state of all this is a model plus a system that is rock-solid reliable, like 99% reliable, at all sorts of valuable knowledge work tasks that are done on a computer. And this is going to be something that we think is going to be a service on AWS that's going to underpin effectively so many useful applications in the future.

⁠¶ Browser Agents and Product Form Factors

28:21

I did a recent episode with Aravind, the CEO of Perplexity, and his Comet browser. A lot of people on the consumer side think that the browser interface is actually going to be the way to get to agents at scale on the consumer side. I'm curious what you think of that, this idea that it's not enough to just have a chatbot. You really need to have ChatGPT or whatever model sit next to your browser, look at the webpage, act on it for you,

28:49

learn from that. Is that where all this is headed on the consumer side? I think chatbots are definitely not the long-term answer, or at least not chatbots in the way we think about it today, if you want to build systems that take actions for you. The best analogy I have for this is, so my dad is a very well-intentioned, smart guy, spent a lot of his career working in a factory, and he calls me all the time for tech support help.

29:14

He's like, David, something's wrong with my iPad. You got to help me with this. And we're just doing this over the phone. And I can't see what's on the screen for him. And so I'm trying to figure out, oh, you have the settings menu open. Have you clicked on this thing yet? What's going on with this toggle? Chat is such a low bandwidth interface. That is the chat experience for trying to get actions done with a very competent human on the other side trying to handle things for you.

29:37

So one of the big missing pieces, in my opinion, right now in AI is our lack of creativity with product form factors, frankly, right? We're so used to thinking that the right interface between humans and AIs is this like perpendicular one-on-one interaction where I'm delegating something or maybe it's giving me some news back,

29:57

or I'm asking you a question, et cetera. One of the real things we've always missed is this parallel interaction where both the user and the AI actually have a shared canvas that they're jointly collaborating on. I think if you really think about building a teammate for knowledge workers or even just the world's smartest personal assistant, you would want to live in a world where there's actually a shared collaborative canvas for the two of you.

⁠¶ Amazon AGI Lab's Independence and Impact

30:21

Speaking of collaboration, I'm really curious how your team works with the rest of Amazon. Are you pretty walled off from everything? Do you work on Nova, Amazon's foundational model? How do you interact with the rest of Amazon?

30:35

What Amazon's done a great job with for what we're doing here is we're allowed to run pretty independently. And I think there's a recognition that some of the startup DNA right now is really valuable for maximum speed. If you believe AGI, right, is two to five years away, some people are getting more bullish, some people are getting more bearish, it doesn't matter. That's not a lot of time in the grand

30:58

scheme of things, you need to move really, really fast. So we've been given a lot of independence. We've also taken the tech stack that we've built and contributed a lot of that upstream to the Nova foundation model as well. So does your work, for example, is it already impacting Alexa Plus or is that not something that you're part of in any way?

31:19

That's a good question. So Alexa Plus has the ability to, for example, if your toilet breaks, it's like, oh man, I really need a plumber. Alexa, can you get me a plumber? Then what happens is Alexa Plus spins up a remote browser powered by our technology, basically, that then goes and uses Thumbtack like a human to go get you a plumber to your house, which I think is really cool. It's the first production web agent that's been shipped, if I remember correctly.

31:46

Yeah. And, you know, the early reception to Alexa Plus has been that it's a dramatically... for Alexa, but still brittle. There's still moments where it's not reliable. And I'm wondering, is this the real gym? Is this the at-scale gym, where Alexa Plus is how your system gets more reliable much faster? You have to have this in production and deployed to, I mean, Alexa has millions and millions of devices that it's on. Is that the strategy or?

32:14

Because I'm sure you've seen there's the early reactions to Alexa Plus are it's better, but still not as reliable as people would like it to be.

⁠¶ Internal Agent Adoption and Nova Act

32:23

Alexa Plus is just one of many customers that we have. And what's really interesting about being within Amazon is, going back to what we were talking about earlier, web data is effectively running out, and it's not useful for training agents. What's actually useful for training agents is lots and lots of environments and lots and lots of people doing reliable multi-step

32:43

workflows. And so the interesting thing at Amazon is that in addition to Alexa Plus, basically every Fortune 500 business's operations are represented in some way by some internal Amazon team, right? Like there's One Medical, there's everything happening on supply chain and procurement on the retail side. There's all this developer-facing stuff on AWS.

33:02

And agents are going to require a lot of private data and private environments to be trained. And because we're in Amazon, that's all now 1P. So that's just one of many different ways in which we can get reliable workflow data to train the smarter agent. Are you doing this already through Amazon's logistics operations where you can do stuff in warehouses or the robotic stuff that Amazon is working on? Does that intersect with your work already?

33:28

Well, we're really close to Pieter Abbeel's group on the robotics side, which is awesome. Some of the other areas, we have a big push for internal adoption of agents within Amazon. And so a lot of those conversations or engagements are happening. I'm glad you brought that up. I was going to ask.

33:43

How are agents being used inside Amazon today? Again, as we were saying earlier, because Amazon sort of has an internal effort for almost every useful domain of knowledge work, there has been, you know, a lot of enthusiasm to pick up a lot of these systems. And we have this internal channel called, actually, I won't tell you what it's called, but, you know, codename-dash-interest, which is related to the product that we've been building.

34:10

And it's just been crazy to see teams from all over the world within Amazon, actually, because one of the main bottlenecks we've had is we didn't actually have availability outside of the US for quite a while. And it was crazy just how many international Amazon teams wanted to start picking this up and then using it themselves on various operations tasks that they had. This is just your agent framework that you're talking about. This is something you haven't released publicly yet.

34:34

We released Nova Act, which was the research preview that came out in March. But as you can imagine, we've added way more capabilities since then, and it's been really cool. The thing we always do is we first dogfood with internal teams. Yeah, your colleague, when you guys released Nova Act, said it was the most effortless way to build agents that can reliably use browsers. Since you've put that out, how are people using Nova Act? It's not something that...

35:00

you know, in my day-to-day I hear about, but I assume companies are using it. And I'd be curious to hear what the feedback is that you guys have gotten since you put it out. Yeah, so a wide range of enterprises and developers are using Nova Act. And the reason why it's not something that you hear about is because we're not a consumer product.

35:18

If anything, the whole Amazon agent strategy, including what I did before at Adept, is sort of doing normcore agents, not the super sexy stuff that works one out of three times, but super reliable, low-level workflows that work 99 plus percent of the time,

35:38

that's the target. Since Nova Act came out, we've actually had a bunch of different enterprises end up deploying with us where they're seeing 95 plus percent reliability, which is, as I'm sure you've seen from the coverage of, you know, other agent products out there is like

35:54

a material step up from the average 60% level of reliability that folks see with those systems. And I think that reliability bottleneck is why you don't see as much agent adoption overall in the field. And we've been having a lot of really good luck specifically by focusing

36:09

extreme amounts of effort on reliability. So we're now used for things like, for example, doctor and nurse registrations, right? Or we have another customer called Navan, which was formerly TripActions, which uses us basically to automate a lot of backend bookings for travel for their customers. We've got companies that basically have, like, 93-step QA workflows that they've automated with a single Act script, et cetera. So I think the early progress has been really cool. Now what's up ahead

⁠¶ Future of AI: The Agent S-Curve

36:38

is how do we do this extreme large-scale self-play on a bajillion gyms to get to something where there's a bit of a GPT for RL agents moment, and we're running as fast as we can towards that right now. Do you have a line of sight to that? Do you think we're two years from that? One year?

36:55

Honestly, I think we're sub one year. We have line of sight. We've built out teams for every step of that particular problem. And things are just starting to work. It's just really fun to go to work every day. And you realize that one of the teams has made a little

37:08

very useful breakthrough that particular day. And the whole cycle that we're doing for this training loop seems to be going a little bit faster every day. Going back to GPT-5, people have said, you know, does this portend a slowdown in AI progress? 100%, I think the answer is no, because when one S-curve peters out, right, the first one being pre-training, which I don't think has petered out, by the way, either, but it's definitely at this point, like...

37:34

less easy to get gains than before. And then you've got RL with verifiable rewards. But then every time one of these S-curves seems to slow down a little bit, there's another one coming up. And I think agents is the next S-curve, and the specific training recipe we were talking about earlier is one of the main ways of getting that next giant amount of acceleration.

38:06

This month on Explain It To Me, we're talking about all things wellness. We spend nearly $2 trillion on things that are supposed to make us well. Collagen smoothies and cold plunges, Pilates classes and fitness trackers. But what does it actually mean to be well? Why do we want that so badly? And is all this money really making us healthier and happier? That's this month on Explain It To Me, presented by Pure Leaf.

38:37

Hey, everybody. It's Andy Roddick, host of Served Podcast for your fix on all things tennis. The U.S. Open's coming up, and we're covering it on our show. Can someone knock off Alcaraz and Sinner? Can Coco Gauff win her second U.S. Open title? Can Swiatek win her second Grand Slam title in a row? Can Sabalenka break through and win her Grand Slam in 2025? You can watch our coverage of the U.S. Open on YouTube or listen wherever you get your podcasts. Brought to you in part by Amazon Prime.

39:07

Support for the show comes from Mercury. What if banking did more? Because to you it's more than an invoice. It's your hard work becoming revenue. It's more than a wire. It's payroll for your team. It's more than a deposit. It's landing your fundraise. The truth is, banking can do more. to do more for their business. Banking that does more.

⁠¶ Amazon's Leapfrog Strategy in AI

39:54

It sounds like you and your colleagues have identified the next turn that the industry is going to have. And that starts to put Nova as it exists today into more context for me, because Nova as an LLM, it's not an industry-leading LLM. I mean, it's not in the same conversation as Claude or GPT-5 or what have you or Gemini. Is Nova...

40:21

just not as important because what's really coming is what you're talking about with agents and that will make Nova more relevant? Or is it important that Nova is the best LLM in the world as well? Or is that not the right way to think about it? The right way to think about it is that every time you have a new upstart lab trying to join the frontier of the AI game, you need to bet on something that can really leapfrog, right?

40:47

What's interesting is every time there's a recipe change for how these models are trained, it creates a giant window of opportunity for someone new who's starting to come to the table with that new recipe. instead of trying to catch up on all the old recipes. Because the old recipes are actually baggage for the incumbents. So to give some examples of this, at OpenAI, of course, we basically pioneered...

41:13

giant models, right? The whole LLM thing came out of GPT-2 and then 3, of course. But those LLMs initially were text-only training recipes. And then we discovered RLHF, and then they started getting a lot of human data via RLHF. But then in the switch to

41:29

multimodal input, you kind of have to throw away a lot of the optimizations you did in the text-only world, and that gives time for other people to catch up. I think that was actually part of how Gemini was able to catch up, was that they bet on certain interesting ideas on native multimodal that turned out well for them, right? But then after that, with reasoning models, right, that gave another opportunity for people to catch up. That's why DeepSeek was able to surprise the world, because they

41:52

straight quantum tunneled to that instead of doing every stop along the way. And I think with the next turn being agents, especially agents without verifiable rewards, if at Amazon we can figure that recipe out earlier, faster, better than everybody else, with all the scale that we have as a company, it basically brings us to the frontier at that point. I haven't heard that articulated from Amazon before. That's really interesting. It makes a lot of sense. Let's end on...

⁠¶ AI Talent Market and Reverse Acquihires

42:19

the state of the talent market and startups and how you came to Amazon. Actually, I want to go back to Adept. So Adept, when you started it, was it the first startup really focusing on agents at the time? I don't think I had heard of agents until I saw Adept. Yeah, actually, we were the first startup focusing on agents, because when we were starting Adept, we saw that LLMs were really good at talking but could not take action. And I could not imagine a world in which that was not

42:49

a crucial problem to be solved. So we got everybody focused on solving that. But when we got started, the word "agent" as a product category wasn't even coined yet. So we were trying to find a good term. And we played with things like large action models and action transformer. So our first product was called Action Transformer. And then only after that did "agents" really start getting picked up as being the term.

43:11

Walk me through the decision to leave it behind and join Amazon with most of the technical team. Is that right? I have a phrase for this. It's a deal structure that has become common now with big tech and startups and AI, the reverse acquihire, where basically the core team, like yourself and your co-founders, they join, and the rest of the company still exists. But, you know, the technical team goes away and the acquirer, quote unquote, I know it's not an acquisition, but

43:39

pays a licensing fee or something and shareholders make money. But the startup is then kind of left to figure things out without its founding team in most cases. You know, the most recent example is Windsurf and Google. And then there was Scale AI and Meta before that. This is a topic we've been talking about on Decoder a lot. The listeners are familiar with it. But you were one of the first of these such reverse acquihires. Walk me through

44:02

when you decided to join Amazon and why? So I hope in 50 years, I'm remembered more as being an AI research innovator rather than a deal structure innovator. Well, first off, humanity's demand for intelligence, right, is way, way, way higher than the amount of supply. And so therefore, for us as a field to invest ridiculous amounts of money in...

44:25

building the world's biggest clusters and bringing the best talent together to drive those clusters is actually perfectly rational, right? Because if you can spend, you know, an extra X dollars to build a model that has plus 10 IQ points.

44:38

and can solve like a giant new concentric circle of useful tasks for humanity, that is a worthwhile trade that you should do any day of the week. And so I think it makes a lot of sense that all these companies are trying to put together critical mass on both talent and compute right now.

44:52

And from my perspective, for why join Amazon, it's because Amazon knows how important it is to win on the agent side in particular, and that agents are a crucial bet for Amazon to really build one of the best frontier labs possible. And to get to the level of scale, you're hearing all these CapEx numbers from the various hyperscalers. It's just completely mind-boggling, and it's all real. It's over $340 billion in CapEx this year alone, I think, from just...

45:22

the top hyperscalers. Yeah, it's an obscene number. That sounds about right. And Adept, you know, we raised 450 million, which at the time was a very large number. And then today is... It's chump change now. It's chump change. That's one researcher. Come on, David. That's one researcher, right?

45:40

It's one employee. So if that's the world that you live in, it's really important, I think, for us to partner with someone who's going to go fight all the way to the end. And that's why we came to Amazon. Did you foresee that consolidation and those numbers going up when you did

45:55

the deal with Amazon? You knew that it was going to just keep getting more expensive, not only on compute, but on talent. Yes. And why? What did you see coming that at the time was not obvious to everyone? Two things I saw coming. One, if you want to be at the frontier of intelligence, you have to be at the frontier of compute. And if you're not on the frontier of compute, then you have to pivot and go do something that is totally different.

46:21

And my whole career, all I want to do is build the smartest and most useful AI systems. So the idea of turning Adept into an enterprise company that only sells small models or turns into a place that does forward-deployed engineering to go help you deploy an agent on top of someone else's model, none of those things appeal to me. I want to figure out, here are the four crucial remaining research problems left to AGI.

46:46

How do we nail them? Every single one of them is going to require two-digit billion-dollar clusters to go run at. So how else am I, and this whole team that I've put together, who are all motivated by the same thing, going to have the opportunity to go do that? If antitrust scrutiny did not exist for big tech like it does, would Amazon have just acquired the company completely?

47:07

I can't speak to general motivations and deal structuring. Again, I'm an AI research innovator. You know I have to ask. Well, you know, OK, well, then maybe you can answer this. What are the second-order effects of these deals that are happening, and I think will continue to happen? What are the second-order effects on the research community, on the startup community?

47:30

I think it changes the calculus for someone joining a startup these days, knowing that these kind of deals happen and can happen and take away the founder or the founding team that you decided to join and bet your career on. That is a shift. That is a new thing for Silicon Valley in the last couple of years. There's two things I want to talk about. One is, honestly, the founder plays a really important role.

47:56

The founder has to want to really take care of the team and make sure that everybody is treated pro rata and equally, right? The second thing is, it's very counterintuitive in AI right now, because there's only a small number of people with a lot of experience. And because the next couple of years is going to move so fast,

48:19

and a lot of the value, the market positioning, et cetera, is going to be decided in the next couple of years, if you're sitting there responsible for one of these labs and you want to make sure that you have the best possible AI systems, you need to hire people who know what they're doing. And so the market demand, the pricing for these people, is actually totally rational, just solely because of how few of them there are.

48:43

But the counterintuitive thing is that it doesn't take that many years, actually, to find yourself at the frontier if you're a junior person. Some of the best people in the field were people who just started three or four years ago. And by working with the right people, focusing on the right problems, and working really, really, really hard, they found themselves at the frontier. Like, AI research is one of those areas where if you ask four or five questions,

49:06

you've already discovered a problem that nobody has the answer to. And then you can just focus on that and focus on how do I become the world expert in this particular subdomain. And so I find that really counterintuitive, that there's only very few people who really know what they're doing. And yet it's very easy in terms of the number of years to become someone who knows what they're doing. How many people actually know what they're doing in the world?

49:28

From your definition. This is a question I get asked a lot. I was literally just asked this on TV this morning. How many people are there? I think it depends on how generous or tight you want to be. Who can actually build and conceptualize training a frontier model holistically? The number of people who can, who I would trust with a giant dollar amount of compute to go do that, is probably sub 150.

49:56

150. Yes, but there are many more people, let's say another 500 people or so, that would be extremely valuable contributors to an effort that was populated by a certain critical mass of that 150 that really know what they're doing. But the total market, that's still less than a thousand people. I'd say it's probably less than a thousand people. But again, I don't want to trivialize.

50:21

I think junior talent is extremely important. And people who come from other domains like physics or quant finance or, you know, have just been doing undergrad research.

50:33

These people make a massive difference really, really, really fast. But you want to surround them with a couple of folks who have already learned all the lessons from previous training attempts in the past. Does the fact that this already very small group of elite people, does the fact that they're building something that inherently is designed to replace them, maybe you disagree with that, but I think superintelligence conceptually would make some of this

51:00

redundant? Does it mean there's actually fewer of them in the future making more money because you only need, you know, some orchestrators of other models to build more models or does the field expand? Do you think it's going to become thousands and thousands of people? The field's definitely going to expand. There's going to be more and more people who really learn the tricks that the field has developed so far and discover the next set of tricks and breakthroughs.

51:27

But I think one of the dynamics that's going to keep the field smaller than other fields like software is that unlike regular software engineering, foundation model training breaks so many of the rules that we think we should have. In software, let's say our job here is to build Microsoft Word.

51:48

I can say, hey, Alex, it's your job to make the save feature work. It's David's job to make sure that cloud storage works and someone else's job to make sure the UI looks good. You can factorize these problems pretty independently from each other.

52:01

The issue with foundation model training is that every decision you take interferes with every other decision, because there's only one deliverable at the end. The deliverable at the end is your frontier model. It's like one giant bag of weights, right? So what I do in pre-training, what this other person does in supervised fine-tuning, what this other person does in RL, and what this other person does to make the model run

52:19

fast, they all interact with each other in sometimes pretty unpredictable ways. So it has one of the worst diseconomies of scale with number of people of anything I've ever seen, except maybe even sports teams, right? Maybe that's the one other case where you don't want to have, like, 100 mid-level people, you want to have 10 of the best, right? And because of that, the number of people who are going to have a seat at the table at some of the

52:42

best funded efforts in the world, I think is actually going to be somewhat capped. Oh, so you think the elite elite stays relatively where it is, but the field around it, the people that support, the people that are very meaningful contributors expands. I think people who know how to do super meaningful work will definitely expand, but it will be still a little constrained by the fact that you cannot have too many people on any one of these projects at once.

53:08

What advice would you give someone who's either evaluating joining an AI startup or a lab or even an operation like yours in big tech on AI and their career path, how they should be thinking about navigating the next couple of years with all this change that we've been talking about? First off, tiny teams with lots of compute

53:30

is the correct recipe for building a frontier lab. That's what we're doing at Amazon with SFN and my team. It's really important that you have the opportunity to run your research ideas in a particular environment. If you go somewhere that already has 3,000 people, you're not really going to have a chance. There's so many senior people ahead that are all too ready to try their particular ideas. The second thing is I think people underestimate the co-design of, like, the product

53:58

and the user interface and the model. I think that's going to be the most important game that people are going to play in the next couple years. And so going somewhere that actually has very strong product sense and a vision for how users are actually going to deeply embed this into their own lives is going to be really important. And one of the best ways to tell is:

54:16

Are you just building another chatbot? Are you just trying to fight one more entrant in the coding assistant space? Those just happen to be two of the earliest product form factors that have product market fit and are growing like crazy. I bet when we fast forward five years and we look back on this period, there will be six to seven more of these crucial product form factors that will look obvious in hindsight, but no one's really solved today.

54:43

And if you really want to take an asymmetrical upside bet, I would try to spend some time and figure out what those are now. Spend some time in the gym. Thanks, David. I'll let you get back to your gyms. Cool. Nice. Thanks, guys. This was really fun. Thanks again to David Luan for joining the show and thank you for tuning in. If you'd like to let us know what you thought about this episode or what else you'd like us to cover, drop us a line. You can email us at decoder@theverge.com.

55:10

We also have a TikTok and an Instagram. Check those out at DecoderPod. If you like Decoder, please share it with your friends and subscribe wherever you get your podcasts. And if you haven't already, don't forget to subscribe to The Verge, which gets you access to all of our stories and newsletters, including the one I author called Command Line. Decoder is a production of The Verge and is part of the Vox Media Podcast Network.

55:31

Our producers are Kate Cox and Nick Statt. Our editor is Ursa Wright. The Decoder music is by Breakmaster Cylinder. See you next time.

✨ This transcript was generated by Metacast using AI and may contain inaccuracies. Learn more about transcripts.
