Agent Wars: The Hype, Hope, and Hidden Risks with Nate B. Jones

⁠¶ Welcome and Guest Introduction

00:06

Welcome, and thank you for joining us here today on AI Explained for Fiddler AI. I'm Josh Rubin, head of AI science at Fiddler, and I'm going to be the host today. I am super pumped to have Nate join us. I've been enjoying his content for several months, and I think he's incredibly insightful. He's one of the most thoughtful voices today on how enterprises are navigating agentic AI. His stuff ranges from high-level advice for organizations to some really

00:39

insightful technical ideas about things like prompting and agentic application architecture. So I'm thrilled to have him here and bounce some questions off of him. The topic today is Agent Wars: the hype, the hope, and the hidden risks. We'll talk about where agent adoption actually stands,

01:06

how architectural decisions are playing out, what it takes to build agents that hold up in production, and how your companies can get value out of them. We've got about 35 minutes to dive deep and riff with Nate on some of these topics.

01:18

There are 25 minutes after that for your live questions, so go ahead and put those in the chat; there's a box where you can enter them. We'll get to those at the end. We're also going to record the session and send it to all of the attendees.

⁠¶ Defining AI Agents and Core Elements

01:33

Let's kick things off. Maybe we start by talking about what agents are. What is an agentic application, anyway? A friend of mine wants to know. How do you think about it? I'll give you the usual one-sentence answer, and you and I both know we could really open that box up and go for two and a half hours. But I usually say an agent is a large language model, plus tools, plus guidance.

02:06

And if you have those three things together, you have the core ingredients for an agent. That's great; I like the succinctness. I heard one technical contributor from Anthropic recently whose take was that looping is important, and I presume he also means something like

02:29

conditional branching, where it doesn't always do the same thing every time. I thought that was one of the really complex things about agents. Right, and depending on the way you've architected the system, you may be in a position where you want consistency, but consistency within a set of policy guidelines.

02:51

Or you may want deliberate creativity, where you don't want consistency. You want a creative response from the agent because the agent's job is to come up with blog post ideas, say. That's part of what makes it hard to talk about. Do you think reasoning is an essential component? I hate to take your succinct version and throw all these other pieces into it, but it seems like a lot of these applications involve some sort of planning or reasoning.

03:18

And now I'm second-guessing myself, because I'm thinking through customer examples, and it does seem like a broad scope of things can happen. I think that's implicit in tool use: you need some kind of inference compute to use tools at all. Okay. Yeah, I think that's fair. Cool.
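A minimal sketch of the "LLM plus tools plus guidance" definition, including the loop raised above. It assumes a hypothetical `stub_model` function in place of a real LLM call so that it runs offline; `Agent`, `stub_model`, and the `lookup_weather` tool are illustrative names, not any framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-in for an LLM call (assumption: a real agent would send
# the guidance and history to a model API). It requests one tool call, then
# answers from the tool result. This stub ignores the guidance.
def stub_model(guidance: str, history: list[str]) -> dict:
    tool_results = [h.removeprefix("tool: ") for h in history if h.startswith("tool: ")]
    if not tool_results:
        return {"tool": "lookup_weather", "arg": "Denver"}
    return {"answer": f"Based on {tool_results[-1]}, dispatch is clear."}


@dataclass
class Agent:
    model: Callable[[str, list[str]], dict]  # the large language model
    tools: dict[str, Callable[[str], str]]   # the tools it may call
    guidance: str                            # the instructions that constrain it
    max_steps: int = 5                       # bound the loop so it cannot run forever
    history: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.history.append(f"user: {task}")
        for _ in range(self.max_steps):
            step = self.model(self.guidance, self.history)
            if "answer" in step:  # the model decided it is done
                return step["answer"]
            result = self.tools[step["tool"]](step["arg"])
            self.history.append(f"tool: {result}")
        raise RuntimeError("agent did not finish within max_steps")
```

With `tools={"lookup_weather": lambda city: f"{city}: clear skies"}`, `run()` makes one tool call and then answers from its result. The `max_steps` bound is the design choice worth noticing: once an agent loops, it needs an explicit stopping condition.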

⁠¶ Current State of Agent Adoption

03:41

So what are you hearing? Where are organizations in terms of adopting these things today? Are you seeing the whole spectrum? That's a good question. If I'm being really honest, when I talk to companies right now, we're in a bit of the trough of disillusionment with agents, on the famous adoption curve.

04:12

The hype was really big. Jensen Huang started everybody off in 2025 with his big speech: this is the year of AI agents. Venture capital firms have been big on this being the year of AI agents. We've had startups launching agents left and right; the last major agent launch I noticed was this morning, when Lovable launched agent mode in their system. So it's not that we're short of software launches, and it's not that we're short of hype.

04:38

It's that successfully implementing agents in current enterprise workflows takes a tremendous amount of skill and work. It's not plug-and-play yet, unless it's a very simple use case, and there are some plug-and-play tools for that. But there's that S-curve: you see how easy it is to get into the idea, you see how fun it looks in the

05:06

video demo on LinkedIn or wherever. And then you hit the part of the S-curve where it's, oh wow, this is really hard, this is not easy at all. I think that's where a lot of people are right now. They don't necessarily have the tools or the skill sets to get through that S-curve to mature adoption. And the organizations that are able to get there

05:29

are actually realizing ROI, and they're the reason everybody else is chasing this. But it's not easy right now. Interesting, interesting.

⁠¶ Challenges and Unrealistic Expectations

05:41

So I'm looking right now at our poll, which I guess everybody out there has voted on: where is your team in the agentic adoption journey? Just exploring is the dominant answer, and there's some prototyping and experimenting. So this is super surprising: 6% are running in production. I wonder if you feel like

06:03

one failure mode is assuming everything is vibe coding and just works, because in a way we've all been dazzled by how dynamic generative AI is when you sit down and have a conversation with ChatGPT. How much of the failure is unrealistic expectations, versus a general lack of experience building applications like this, versus projects that aren't fully thought out or spec'd out in terms of what kind of

06:43

ROI you're looking for, what the application is ultimately supposed to be and do, and how to measure its success? Is it all of those things?

⁠¶ AI's Workflow Impact and Difficulties

06:54

You know what's interesting? I go back to our own mental models of work and creativity before AI. The blank page is notoriously a hard place to be if you're a writer. The blank whiteboard is a hard place to be if you're an engineer: you have to architect the whole system from scratch. And if you don't have a blank whiteboard as an engineer, that's sometimes even worse, because I've been in the position where you have an

07:19

incredibly outdated stack, you fill the whiteboard with it, and now you have to add a new feature. What do you do? Again, you're facing a blank space where you have to architect something. What AI does is reverse the complexity. It's so easy to get started now. The blank page isn't a problem anymore; AI will give you an idea. And people think that because that was the hard part before, the rest will be easy. In fact,

07:47

it is still quite difficult to get to done. Part of why I'm bullish on the value of technical skills in the AI age is that, as much as I see tremendous progress in AI solving the initial get-started piece, helping us come up with wider ranges of ideas, and helping us do some of our data transfer

08:14

more broadly in ways we just didn't have time for before, like pulling technical details out of contracts and giving them to the engineering team, I am not seeing the intelligence gains translate into reliably getting stuff done as easily as we get started. That seems like a wicked problem.

⁠¶ Agent Behavior and Essential Guardrails

08:37

Yeah, yeah, yeah. That's a really thoughtful point. I like the idea of inverting the whiteboard. What we see with generative AI is that it fills whatever glass you put it in. Unlike staring at the empty glass in the old days of software engineering, where you had to put every piece in yourself, you give it a container and it will do its best to fill the container, unless you

09:05

figure out how to properly constrain the system to do exactly the right things. We were joking about this the other day. We had our hackathon two weeks ago (we do one about once a quarter), and one of my colleagues was describing how he had messed up the implementation and forgotten to wire

09:27

the LLM to the data source. It was a complete software failure: no information was going from the database into the LLM, and the LLM was just riffing. Unless you knew the topics it was supposed to be talking about, you wouldn't notice. It was happy to fill the gaps, pretend it was getting information, and make up answers as though it had received some. There was no

09:52

exception thrown anywhere; as far as the software was concerned, everything ran fine. But, as we were saying, the agent just filled the space with what it thought it was supposed to do.

10:05

Without the right guardrails, or the right constraints on how the agent is supposed to behave, it could be some time before you realize. That was an egregious problem, and ultimately pretty easy to catch. But you could imagine that in a complicated system with many components and many data sources, things that would typically have caused

10:30

an error in a traditional software workflow might just look like a system that's underperforming, or one that occasionally, in some weird corner case, produces an answer or takes an action that's not at all intended. I think that's scary, and it really emphasizes the change in headspace we need to make.
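The hackathon failure described here (no data reaching the model, no exception raised) suggests one cheap guardrail: make an empty retrieval fail loudly before a prompt is ever built. A minimal sketch; `build_prompt`, `retrieve`, and `EmptyContextError` are hypothetical names, and the "answer only from the context" instruction reduces, but does not eliminate, made-up answers.

```python
from typing import Callable


class EmptyContextError(RuntimeError):
    """Raised when retrieval returns nothing, instead of letting the model improvise."""


def build_prompt(question: str, retrieve: Callable[[str], list[str]]) -> str:
    docs = retrieve(question)
    if not docs:
        # A miswired data source returns nothing; an LLM would answer anyway,
        # so surface the failure here the way traditional software would.
        raise EmptyContextError(f"no context retrieved for: {question!r}")
    context = "\n".join(f"- {doc}" for doc in docs)
    return (
        "Answer ONLY from the context below. If the answer is not there, say you don't know.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The point is where the failure surfaces: as an exception you can alert on, rather than as a plausible answer that only looks like underperformance.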

⁠¶ AI Inference Versus Hard Knowledge

10:57

In order to get things working, I think one of the biggest questions in business right now, and it's a human question, is asking ourselves: where do we need AI to infer, and where do we need AI to know? Those are two very different things. And we're not very good right now at even instructing AI to follow one of those two pathways, let alone architecting systems that expect inference in particular contexts or expect

11:27

hard knowledge in other contexts. Yeah, yeah, totally. It looks like we have a request from the audience: your voice is a little low. My voice is a little low? Should I get closer to the mic? Is that working better? I think that was the request. Happy to get closer to the mic. Love it. Okay, awesome. So...

⁠¶ Architecting Successful AI Agent Systems

11:51

As we've said, there are a lot of different architectural approaches, lots of different ways you can build these things. Say I'm an enterprise and I'm trying to solve some sort of problem. What's the list of tools, and how do I start thinking about getting the right experience? What does the on-ramp look like to you? If you had to give a Nate Jones

12:19

simple recipe for success, starting from greenfield, what would it look like? It starts with problem framing, I think. So much of where agents go wrong is that you do have to give them guidance. You have to give them constraints and clarity.

12:36

But if the business problem doesn't have constraints and clarity, you're not going to get that for the agent. That's not happening. And so I challenge folks that I talk with to ask themselves if they really know what they want solved. And if it does get solved, is it going to be worth the effort it will take to install an agent? Because agents generate disproportionate payoff, but they require disproportionate investment at the moment to get to done.

13:05

And that disproportionality is bigger for organizations that are new to agents, which so many are these days. If you've never built one before, you're learning enough about AI to build an agent at the same time as you're building your first agent, and you have to factor all of that into the ROI calculation. So I ask, is this worth it to you? Do you have the impact assessment where you know there's a 10x yield on this

13:32

that would give you real value? If you do, that's great, and if you've constrained the business problem, that's a step forward. Then what I suggest is that you look at the data piece first. Let's leave AI for the next step.

13:46

You have to understand the kind of data and business operation you're trying to run. What business rules are you using? How is the data currently encoded? Is it unstructured or structured? What does it look like as it moves through the business? What decisions are made against it? Tell me the story of the data. If you really understand that, a lot of the architectural decisions fall out of that data story,

14:11

because you can then say, well, semantic schemas with RAG would work well here, or, wow, that's a really terrible idea; you don't want semantic association here because of the kinds of questions you're expecting to ask. People often get to that

14:25

architectural question too early. They say, should we use RAG? And I say, I don't know. What's your data? Tell me your data story. Then we'll get to whether you should use RAG, how an AI agent interacts with it, et cetera, et cetera.
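A toy comparison makes Nate's point concrete: semantic association can be "a really terrible idea" for some data, which ties back to the earlier infer-versus-know distinction. This sketch uses Python's `difflib` string similarity as a crude stand-in for embedding-based retrieval (an assumption for illustration, not how a real RAG system scores documents); `ORDERS` and both lookup functions are hypothetical.

```python
import difflib

# Hypothetical order table: exact identifiers with exact answers.
ORDERS = {"ORD-1001": "shipped", "ORD-1002": "pending", "ORD-2001": "refunded"}


def exact_lookup(order_id: str) -> str:
    """'Know' path: a missing key is an error, never a guess."""
    if order_id not in ORDERS:
        raise KeyError(order_id)
    return ORDERS[order_id]


def similarity_lookup(order_id: str) -> tuple[str, str]:
    """'Infer' path: returns the closest record, whether or not it is the right one."""
    best = difflib.get_close_matches(order_id, list(ORDERS), n=1, cutoff=0.0)[0]
    return best, ORDERS[best]
```

Asking about a nonexistent `ORD-1003` makes `exact_lookup` raise `KeyError`, while `similarity_lookup` confidently returns the status of a neighboring order. For identifiers you want the first behavior; for fuzzy questions over unstructured text, the second can be exactly what you want.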

⁠¶ Learning to Work with AI

14:38

Yeah, that's super interesting. We're at a point where basically no one has a lot of experience, right? As an industry, if not a civilization, we're learning a new way of thinking about interacting with machines. From the enterprise perspective,

15:07

I think when we chatted last week you mentioned a crawl, walk, run strategy. If that's not your perspective, I'm lifting it from somebody else. No, I think I did. Does that de-risk some of those decisions too? Is that another way to think about getting experience and real traction on implementation before you're overcommitted?

15:31

I think that principle really does hold in the world of AI. One piece that may be a little different from the traditional crawl, walk, run of management theory is that because AI is so experiential, because you have to learn it for yourself, in your organization and your context, to do it well,

15:51

you have to lean much harder on getting through the crawl phase to get anywhere else. It is harder to get through the crawl phase with AI, and the payoff is bigger if you can get to the walk and the run. In practice, I find a lot of organizations underestimate how challenging it is to do that initial prototyping properly. They often

16:13

try to shortcut it. They say, well, it's going to be fine, we're just going to start building the system. You can do that, but your batting average goes down; it's a much higher-risk proposition. So I challenge people to think of it less as a feature you're bolting onto a product in your business and more as a cultural change effort first, where you are adding an intelligence layer to the business and you have to think about

16:43

all the change components that go with that and rewire the business accordingly. If you're committed to change at that level, you're much more likely to realize value. Yeah, I think that's a super point. My feeling there is that, more than with most technologies I've experienced so far,

17:11

If you start with something simple and it's not solid, if you don't feel like this simple thing is robust, you really run into trouble when you start adding things on top of it.

17:24

Like you mentioned earlier, it can get out of hand very fast. Some of that is experience; some of it is the structure of what you build and its complexity level; and some of it is instrumentation and tooling. We can get into this later: what it takes to run these things in production

17:50

reliably and with the right telemetry in place. But I think all of those things fold together: if you don't have a lot of clarity around each of them, it can get away from you pretty fast. It's not just an application that throws an error when somebody visits the web page. It's an application that can potentially do misleading things, tell you misleading things, or take dangerous actions. Yeah.

⁠¶ Multi-Agent Systems Versus Simplicity

18:20

AI is an accelerant, so if you make bad initial decisions, they run faster. Maybe we can talk a bit more about that. You've mentioned a couple of patterns in the past: multi-agent systems versus a single agent with a lot of tools and a little more constraint. Could you talk about the difficulty trade-off? I think that's an interesting

18:45

level two of what we were just talking about with first steps. I do think people often jump to trying to use all the tools in the box before getting the simple ones crushed. Yes. I have actually heard people who have never built agents before ask, tell me how to build my multi-agent system.

19:13

Well, it's a big question, right? It feels like putting the cart before the horse. When I get asked what kind of agent architecture makes sense, I walk back to the problem. I haven't seen this articulated in many other places, but I walk back and ask:

19:34

how hard is this problem, and how token-fungible is it? Is this a problem where throwing more tokens at it will linearly make it more likely to be solved? Is it a business problem where you know that with 100,000 tokens of thinking time, you're 95% confident you'll get to a correct answer?

19:57

Or is it a situation where it doesn't really respond to more tokens, and whatever you get, you get, and you have to deal with it? You separate the problems out like that. Sometimes people have to practice, try it, and see; sometimes they use ChatGPT to do it, sometimes other things, sometimes they talk to an engineer and ask, is this token-fungible?
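One way to "practice and try and see" whether a problem is token-fungible is to run the same task at several token budgets and check whether the success rate climbs. A minimal sketch, assuming you already have pass/fail outcomes per budget from your own evaluation runs; the function names and the `min_gain` threshold are hypothetical choices, not a standard test.

```python
def success_rates(results: dict[int, list[bool]]) -> list[float]:
    """Fraction of successful runs at each token budget, ordered by budget."""
    rates = []
    for budget in sorted(results):
        runs = results[budget]
        if not runs:
            raise ValueError(f"no runs recorded for budget {budget}")
        rates.append(sum(runs) / len(runs))
    return rates


def looks_token_fungible(results: dict[int, list[bool]], min_gain: float = 0.2) -> bool:
    """Heuristic: success never drops as the budget grows, and the largest
    budget beats the smallest by at least `min_gain`."""
    rates = success_rates(results)
    if len(rates) < 2:
        raise ValueError("need at least two budgets to compare")
    non_decreasing = all(later >= earlier for earlier, later in zip(rates, rates[1:]))
    return non_decreasing and rates[-1] - rates[0] >= min_gain
```

With made-up outcomes such as `{1_000: [True, False, False, False], 10_000: [True, True, False, False], 100_000: [True, True, True, False]}` (success rates 0.25, 0.5, 0.75), the check passes; flat rates fail it. Real evaluations need far more runs per budget before a trend means anything, and the strict non-decreasing test is sensitive to noise.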

20:19

But once you have a sense of how likely it is that throwing more AI at it will help, you can decide what kind of architecture makes sense to get you those tokens. The reason I think that way: you know how you sometimes read a white paper and it lives rent-free in your head for a while because something about it resonates with you? One that did that for me is Anthropic's write-up on using multi-agent systems,

20:50

published in the middle of their clapback to Cognition. Cognition had released a white paper basically saying multi-agent systems are unreliable, they're brittle, they don't last well, and that's why we use one agent for Devin.

21:05

I don't know who runs PR at Anthropic, but someone decided they were going to clap back, and they almost immediately published a sharp white paper that said, this is how we do multi-agent systems at Anthropic, and this is why they work. What lives rent-free in my head about their white paper is not the details of architecting the agents. It's the simple assessment they made when they asked:

21:30

what is it about multi-agent systems that helps us solve problems? What they realized is that multi-agent systems are proxies for spending more tokens on a problem, and they explained roughly 80 to 90% of the system's value as a function of multiple agents spending more tokens on the problem.

21:48

That has been resonating with me, and I've been thinking about it a lot. Now I look at a problem and ask, is this one where spending more tokens matters? In those cases, it may be worth architecting a multi-agent system. Otherwise, you want to default to something simpler. That's super, super interesting. I wonder: does breaking the system up into multiple agents help us as designers more properly

22:29

partition the problem so that we can spend those tokens more efficiently? Are we solving a human problem by architecting it that way, versus feeding the things you might try to do in parallel into a single agent's prompt, or using some other mechanism for pulling in prompt and context? I don't know if you have a feel for that. Yeah, this is a really good question. I'm so glad you asked. This gets back

23:01

to engineering principles, separation of concerns, and how we architect systems that humans can maintain. One thing I haven't seen talked about anywhere is the idea that these agentic systems are software.

23:15

It's software we will have to maintain, and we have to think about how we maintain it over time. And as much as we can talk about what I just said, architecting the system so it solves these problems with more tokens,

23:29

in practice, I see more of what you describe as the rationale for multi-agent systems. I see people saying, well, I need an agent to go check the inventory, then I need a master agent that formulates the response back to the customer, and then I need an agent that

23:43

can check the refund policy, and they all need to report back to the master agent. So effectively, humans are articulating a separation of concerns that helps them think through the flows. Yeah, it kind of seems like that, right? I do wonder sometimes, because it's like, okay, we're going to call this thing a separate

24:09

agent, or a separate component in the system, but really it's maybe just a different prompt going to the same model on the back end, and we can just handle it better in our heads. Yeah. And this is the one caveat I tend to give when people do this. I say, that's great, I'm glad it works for you, but be open to the idea that it may not need to be an LLM. Oh, that's interesting. Yeah.

24:33

Because an inventory check, we've solved that. We had inventory checks long before LLMs, and I don't think LLMs are the best tool for that job. Just do it the regular way. Yeah, please don't complicate it. If you can bolt it down in a traditional way, please do it that way, unless you get something out of the agent. I totally, totally feel that.
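Nate's caveat, that an inventory check does not need to be an LLM, can be sketched as an orchestrator where the solved problems are plain functions and only the customer-facing wording is delegated to a model. The inventory table, refund window, and `compose_reply` callback are hypothetical; in a real system `compose_reply` might call an LLM, while here any function will do.

```python
from typing import Callable

INVENTORY = {"SKU-42": 3, "SKU-99": 0}  # hypothetical stock levels
REFUND_WINDOW_DAYS = 30                 # hypothetical refund policy


def check_inventory(sku: str) -> int:
    """Solved problem: a plain lookup, no LLM involved."""
    return INVENTORY.get(sku, 0)


def refund_allowed(days_since_purchase: int) -> bool:
    """Deterministic business rule that is easy to test and audit."""
    return days_since_purchase <= REFUND_WINDOW_DAYS


def handle_request(sku: str, days_since_purchase: int,
                   compose_reply: Callable[[dict], str]) -> str:
    """Gather facts deterministically; delegate only the wording."""
    facts = {
        "sku": sku,
        "in_stock": check_inventory(sku),
        "refund_allowed": refund_allowed(days_since_purchase),
    }
    return compose_reply(facts)
```

Passing a plain template as `compose_reply` makes the whole path testable without a model; swapping in an LLM later changes only the wording, not the facts it is allowed to state.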

⁠¶ AI Solutions: Build Versus Buy

24:54

What about build versus buy? Where do you think we are in terms of solutions? There are products now, like some of the RAG stuff,

25:07

companies you can just hand your customer support database to, and they'll digest it, index it, and make it available to some commodity LLM as a hosted solution. Which problems do you think are already pretty well solved by specialists? That's a complicated question, and I want to give you a split answer. I am very, very bullish on buying tools that enable the 47 million developers we have across the globe

25:43

to build AI systems in their companies. I think those are going to do very well. The classic example, of course, is Cursor; Cursor for this, Cursor for that is the whole story of YC this year. The idea that developers need to be reskilled and re-equipped at scale is monetizable. I think we are looking at a 10x increase in software costs per developer that people will happily pay.

26:11

There's a massive opportunity on the table there. I am much less bullish on finished tools that you can buy as a company, because I find that if you're in the agent space, you have really complex business context you're trying to process. That's why you want the LLM; that's where you see the value. And that's really hard to just stamp out. I'm aware that some of the value in

26:39

AI-powered SaaS and services is that you can extend software in ways I was taught never to do as a PM in the 2010s, where it's like, no, this is what you buy, right? We're not going to customize it for you. PMs say no. That's what we were all taught.

26:52

But now you can say yes. Now you can customize; now you can extend it. So I know these footprints are edging out, that SaaS businesses are getting smarter about customizing, and that they're getting better at domain expertise. But I think there's a difference between AI software and AI agents. With AI agents, I have yet to see a good domain-vertical example with enterprise-level complexity

27:20

where you can just say, here's your AI agent, you're buying it soup to nuts against all of your data, and it doesn't matter what your initial data setup looks like, we'll make it work. That's super interesting. Do you think that's because

27:35

of the differences between verticals? The naive part of me says that every company, even one that may not have any business trying to engineer a large-scale GenAI application, probably wants some internal tools. I guess you do see things like Microsoft Copilot and all of these internal tools that scrape your Slack and help you bring context in. So you think of those as AI-powered applications, and you don't think of them as...

28:11

Yeah. If we're really building an agent, to me it needs to be something that solves a really meaningful business problem. I think you're right that the simpler agents are going to get commoditized out really, really fast. Something that scrapes your Slack and delivers,

28:25

that's done, right? I'm sure you can buy it off the shelf, but your developer can also code it in an afternoon. There's just not going to be a ton of meat on the bone there. Whereas if you're building an agent designed to handle dispatch for your fleet of trucks, and the agent has to be aware of the weather in 15 different cities, the maintenance records of the trucks, and the drivers' schedules,

28:50

that's different. Either you're buying the SaaS application designed to do that, which is a lot of point-and-click and a lot of traditional software, or you're building an AI agent

29:00

that helps you do it autonomously. But I don't think you're buying the agent for that. The agent isn't a product. An agent is something that works under the hood. It's almost a new class of business asset: an entity you develop within the business that straddles the line between software and employee.

⁠¶ AI as Smart Glue

29:22

Yeah, I think that's really interesting. We tend to think about it as bespoke, internally built software, and in a way it reminds me more of, I want to say,

29:42

Excel macros or something like that. When properly used, it's this very powerful smart glue. Smart glue is such a great frame. I was reading a note this morning from Dan Shipper, who runs Every, and what he noted is that everyone in the business (they have a small team) is going to be able to use Claude Code to commit and build features on their products, even if they can't code.

30:15

Now, that's taken a fair bit of setup. They've had to set up the file structure and the system rules for that, and they have an engineering architect or someone on staff who helps keep everything roughly in order.

30:26

But that's an example of where you start to weave an intelligence layer across the business, and all of a sudden you unlock capabilities that would have been unthinkable two years ago. Yeah. Yeah. Yeah. I was listening to something on my commute yesterday, just a Claude Code tutorial, and it's super fascinating to me. They were talking about using Claude Code in a sort of fire-and-forget way, where

30:54

it basically becomes this super-intelligent Linux command-line tool. You put away all of the interactive conversation about the codebase and just say, of all the thousands of GNU tools that run at our command prompts, this is a magic tool that you just invoke,

31:18

and you can have it do something totally pedestrian or something sophisticated, but it has the brains to plan and solve any sort of complicated problem that exists within terminal space, with all of the rights and privileges the terminal has. That sort of blew my mind, right? This is the first smart command-line tool that

31:44

can do a very broad scope of things. Yeah, it's remarkable. And I think they sort of misnamed it by calling it Claude Code, because it's good for so much more than code. Yeah, for sure. I don't know if it was a stealth move to call it Claude Code. It's for sure fun to play with.

⁠¶ Observability in Agentic Applications

32:10

We're an observability company, so I'd be remiss if I didn't ask about your thoughts on instrumentation of agentic applications. What is necessary, from your perspective, to make sure that a generative AI solution is operating well and continues to do so on a production basis? I think that's one of the things companies tend to underinvest in, to be honest with you, because traditionally with software,

32:47

QA was the gate. You had the QA step prior to launch. If you look at the investment matrix, your 80/20 rule, the 80 is on making sure the software is right before you launch it, and the 20 is on observing the software and making sure bugs don't crop up in unforeseen ways, because you were making deterministic software.

33:06

Not anymore. Now you're effectively making probabilistic software. The AI agent will behave in ways you cannot predict. I was talking with a director at Microsoft a few months ago, and he was observing that it has entirely flipped how PMs do their work, because PMs have to discover the capabilities of the tool; they're not determining the capabilities of the tool.

33:27

Yeah. And so when you think about it that way, your 80-20 rule flips: you have to spend a little bit of time making sure the worst stuff isn't getting into production, but you have to spend a lot

33:40

on making sure that you have ongoing evaluation, ongoing observation of what is going on with your AI agents in actual production. And with most people, if I say, well, have you been sampling queries against your agent? Do you have a stable of queries whose behavior you observe? Do you have red lines, a regular update cadence for the prompt, versioning on your prompts? Can your agents actually be rolled back where necessary?

34:04

They look at me like I'm speaking Greek. Yeah. Like, no, no, no, this is really important. Just like we had to invent a language and a set of processes for QA, we have to do that for agents. We have to take it seriously that we are putting these into production. We have to care about them. Yeah, that's great. This idea of the 80-20 rule being inverted is a really interesting point. You've now got this sort of like...

34:31

sigmoid-like onboarding before you get to the large-returns part of the curve, at least while we're discovering what these things do. That's super powerful, because I think people are usually in the headspace that you can get a lot of value out of what you can do immediately. And we should all just plan for a low-rate-of-return warmup period for this stuff, at least until we know it better. And this is your last...

⁠¶ Discovering and Evolving AI Capabilities

34:58

comment raised so many things in my head. One of the things that really stuck out when I was listening to the Claude Code thing from Anthropic last night was that they were talking about it in terms of discovering what it can do rather than building it to do a thing, right? Claude Code is very much an exploration for Anthropic.

35:22

Even as much as it's a product or a set of features that they're designing. That's right. They don't even know all the things it's going to be able to do and how to make the most of that. And I think if that's what Anthropic is experiencing, we should all expect to have some experience like that. I think that gets at one of the core

35:47

attributes of this age we're in that we don't discuss enough. These labs that have produced AI models are almost without exception research labs. They came from these like PhD researchers who were just trying to figure out machine learning problems and stumbled upon something remarkable. And now they've let it out into the world. We're all using it.

36:10

But they are discovering it as they go, just as we are. And one of the things that keeps me humble, or keeps me reflecting on the unpredictability of the future, is that we genuinely don't know how much magic there is left in this incredible innovation around reinforcement learning and transformer architectures. We're still learning. We're still learning. So far, scaling laws seem to hold. But even if scaling laws hold...

36:37

We genuinely don't know what jagged intelligence futures look like. Are we going to keep getting smarter on very specific verticals very rapidly, but then we'll have these weird glue work areas where we're not getting as smart, which seems to be what's happening in 2025?

36:53

Or is it going to even out very rapidly, where we'll hit some emergent point and suddenly things will even out and we'll get a smooth intelligence curve? I'm a little skeptical of that one, but it's a possibility. I have to be humble about it. Yeah, I think it was one of your recent posts where you talked a little bit about the possibility that generative AI is so good at code partially because it's made by engineers, and that, you know...

37:21

I think you were talking a little bit about the announcements recently on the... Oh, the Math Olympiad. Yeah, the Math Olympiad result from OpenAI, where they sort of got a gold medal. Yeah. There is this really interesting hypothesis, I guess, that if you can bring in domain experts from different domains and figure out... I mean, it does help that we have GitHub out there with so much code that

37:48

is fertile training ground. But it'll be really interesting to find out whether it's a jagged intelligence or a smooth one, as you put it. Yeah. Yeah, no, and I think that's actually one of the reasons I feel lucky living through this moment: these stories will have endings we will see. We will know

38:09

whose bets were correct. We will know how these stories turn out, because everyone is making date-specific bets for the next couple of years. Yeah, yeah, yeah. Well, should we flip over and do some questions?
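Nate's checklist for running agents in production (a stable set of sample queries, red lines, versioned prompts, and the ability to roll back) can be sketched as a small regression harness. This is a minimal illustration under stated assumptions, not Fiddler's or anyone's actual tooling: `PromptRegistry`, `evaluate`, `release`, `stub_agent`, the fare example, and the $50 floor are all hypothetical, and a real system would call a model where `stub_agent` sits.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an agent maps (system prompt, user query) to a response,
# and a red line returns True when a response crosses it.
Agent = Callable[[str, str], str]
RedLine = Callable[[str, str], bool]


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store of versioned prompts with rollback."""
    versions: List[str] = field(default_factory=list)
    active: int = -1  # index of the version currently serving traffic

    def publish(self, prompt: str) -> int:
        self.versions.append(prompt)
        self.active = len(self.versions) - 1
        return self.active

    def rollback(self) -> int:
        if self.active <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.active -= 1
        return self.active

    @property
    def current(self) -> str:
        return self.versions[self.active]


def evaluate(agent: Agent, prompt: str, queries: List[str],
             red_lines: Dict[str, RedLine]) -> List[Tuple[str, str]]:
    """Run the stable query set and return (query, red-line name) per violation."""
    failures = []
    for query in queries:
        response = agent(prompt, query)
        for name, crossed in red_lines.items():
            if crossed(query, response):
                failures.append((query, name))
    return failures


def release(registry: PromptRegistry, candidate: str, agent: Agent,
            queries: List[str], red_lines: Dict[str, RedLine]) -> List[Tuple[str, str]]:
    """Publish a candidate prompt, then roll back if any red line is crossed."""
    registry.publish(candidate)
    failures = evaluate(agent, registry.current, queries, red_lines)
    if failures:
        registry.rollback()
    return failures


def fare_below_floor(query: str, response: str, floor: float = 50.0) -> bool:
    # Red line: any dollar amount quoted below a (hypothetical) minimum fare.
    amounts = [float(a) for a in re.findall(r"\$(\d+(?:\.\d+)?)", response)]
    return any(a < floor for a in amounts)


def stub_agent(prompt: str, query: str) -> str:
    # Stand-in for a real model call: the "lenient" prompt leaks a bogus fare.
    if "lenient" in prompt and "price" in query:
        return "Sure, that ticket is $2."
    return "Let me check the current fare for you."


red_lines = {"fare_below_floor": fare_below_floor}
queries = ["What is the price to Boston?", "Can I change my seat?"]

registry = PromptRegistry()
registry.publish("v1: quote fares only from the fare table")
failures = release(registry, "v2: be lenient and friendly", stub_agent, queries, red_lines)
print(failures)          # [('What is the price to Boston?', 'fare_below_floor')]
print(registry.current)  # v1: quote fares only from the fare table
```

The failed candidate stays in the version history for inspection while traffic returns to the last known-good prompt; how often to rerun the query set and what counts as a red line are product decisions this sketch leaves open.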

⁠¶ Digital Twins and Unstructured Data

38:22

It looks like we have a question on digital twins, which is a place I would love to take this conversation anyway. So let me read this. This is from Brad Derry out there: "Can you speak to ROI with digital twins?" I guess we'll have to introduce the concept a little bit, but: "Which sector has the lowest-hanging fruit, the quickest return, by sector? I mean medical, education, transport, et cetera."

38:47

Do we want to introduce digital twins first and then get into that? No, I think we should probably talk a little bit about it. Brad's clearly familiar with some of the stuff that you've been talking about recently. Yeah. Why don't you go ahead? I mean, I think the simplest way to talk about digital twins is you have this base idea of AI agents, and typically we assume they do things. But what if instead we assumed they modeled things?

39:10

And that's the fundamental difference. The value is not in the execution of a task. The value is in the ability to model multiple timelines and to explore multiple options for a future. And what's really interesting to me is that we are talking about this now in a software context, but I have managed and led PMs who come from advanced manufacturing contexts, and

39:38

They've been talking about it for longer. The idea that John Deere would have a digital twin for tooling in their factory is not particularly new. And so in that sense, I think some of the low-hanging fruit has been harvested in the advanced manufacturing and robotics areas for a while now, and so it's up to us to think about

40:00

this concept now: with LLMs involved, how can we start to model things that previously were not modelable? In the era when you were building a digital twin for your locomotive at Burlington Northern Santa Fe, which they also did, great, you didn't necessarily need an LLM to model that. It was still machine learning. It's still AI, but it's a different kind of AI. Well, now you have LLMs. You can model other classes of problem.

40:24

What problems are susceptible to modeling now? And so I think that we are dramatically under-invested in problems that use unstructured data, because LLMs are very good at unstructured data. So look around at sectors that have a lot of unstructured data and ask yourself: is there a way we could model different ways of attacking this problem? Yeah. Yeah, that's super good. I think we were talking a little bit, before we signed on officially, about the...

40:54

expectation that AI tools will solve a much larger domain of problems, as much more flexible problem solvers. And the drawback there is that as the domain gets bigger, the space of inputs and outputs becomes exponentially larger. And in order to make sure things are truly robust, and I think this is...

41:19

This is a lesson that we're learning from robotics, as you say, where people have done a lot of domain generalization using simulation. I think we're just realizing now that this is sort of our only opportunity to explore this really large domain of possibilities. I can tell you from our own work at Fiddler, as we're developing our tools for observability around agentic systems, you know, traces and spans and aggregate

41:46

calculations on performance of components: we're building example code to exercise all of this so we can see how the diagnostics behave. To do it right, it's forcing us to generate synthetic inputs that really widely explore the available space, and you try to find ways to throw it into domains that you didn't even necessarily think of as the engineer. You're depending to some level on the creativity of

42:20

the model. You give it some seeds, right? You say, like... You turn up the temperature. That's right, turn up the temperature. Imagine you're this thing in this situation, interacting with this other thing. Go. And hopefully you've thrown it far enough out into unfamiliar space that you're exploring the domain of things that, as an engineer, you wouldn't necessarily think to wrap a unit test around, or data that you wouldn't have in hand to get labeled to test with for an eval.
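The seeded, high-temperature generation Josh describes can be sketched as a combinatorial prompt builder. Assumptions to flag: `generate` stands in for whatever model client you use, `echo_model` is an offline stub, the persona, situation, and counterpart lists are made-up seeds, and 1.2 is an arbitrary illustrative temperature rather than a recommended setting.

```python
import itertools
import random
from typing import Callable, List, Optional

# Hypothetical signature for a model call: (prompt, temperature) -> generated text.
Generate = Callable[[str, float], str]


def synthetic_inputs(personas: List[str], situations: List[str], counterparts: List[str],
                     generate: Generate, temperature: float = 1.2,
                     limit: Optional[int] = None, seed: int = 0) -> List[str]:
    """Cross every seed combination, shuffle reproducibly, and ask the model to role-play each."""
    combos = list(itertools.product(personas, situations, counterparts))
    random.Random(seed).shuffle(combos)  # fixed seed so a test run can be replayed
    if limit is not None:
        combos = combos[:limit]
    inputs = []
    for persona, situation, counterpart in combos:
        prompt = (f"Imagine you are {persona}. Situation: {situation}. "
                  f"You are interacting with {counterpart}. Write your next message.")
        inputs.append(generate(prompt, temperature))
    return inputs


def echo_model(prompt: str, temperature: float) -> str:
    # Stand-in for a real model so the sketch runs offline.
    return f"[t={temperature}] {prompt}"


personas = ["a frustrated frequent flyer", "a first-time traveler", "a travel agent"]
situations = ["a cancelled connection", "a fare that looks too good to be true"]
counterparts = ["an airline support agent", "a refund chatbot"]

batch = synthetic_inputs(personas, situations, counterparts, echo_model, limit=5)
for line in batch:
    print(line)
```

Shuffling with a fixed seed keeps a capped run spread across the seed space while staying reproducible; the generated inputs would then feed an eval like the one Josh mentions, not serve as labels on their own.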

42:54

And that's turned out to be a really interesting dimension. And to my taste, and sorry to rant a little bit, I think ultimately that kind of stress test is, again, another one of these things that enterprises should be thinking about in terms of making sure that

43:14

They don't end up in the news having issued somebody an airplane ticket for $2 or given away a truck on a non-existent promotion. Yeah, and I think that people read the headline stories and they assume that stuff will be fixed deterministically in QA, because they don't understand how AI agents work. And what you're describing, generating synthetic data, throwing things at the model, simulating model responses,

43:43

is much closer to what is actually needed to make sure that you hedge that risk. Yeah, yeah, yeah. So there's a question here

⁠¶ Internal Models and Privacy Hurdles

43:55

Companies dealing with... So this is from Lisa: "My company has internal models, and we can't use open source tools. Any thoughts on how to work within a large company to deal with privacy hurdles?" Do you have any feelings about how to navigate that? Yeah, the most popular variant of this is, "I have to use Copilot, and I hear Copilot is terrible. What do I do?" And I actually wrote a whole guide for that, so if folks want to check that out, they can. But the long and the short of it is people

44:26

lean into big brand names on intelligence. And we forget that in the absence of those big brand names, in the absence of ChatGPT, for example, we would be over the moon about a product like Copilot. We would be so excited that it exists. So instead of focusing on what you miss out on and what you don't have, remember that most people underutilize the intelligence they already have on the table anyway.

44:53

And a well-executed sort of high-utility team with Copilot is going to beat a badly executed install of ChatGPT all day. So it's not about raw intelligence. It's about whether your team has figured out how to move from sort of individual productivity silos to working across multiple teams or working within the team more effectively. And so as an example,

45:21

A lot of people think about it like the classic example: the CEO writes and says, "Hey, today we've launched Copilot. Copilot is a great tool for enhancing your productivity. To get started, try writing an email with Copilot." I've seen that happen over and over again. And just that simple example can be flipped on its head; you can think about it differently. Instead, you could say we are going to have a team conversation as a sales team

45:46

about who is best at writing nurture emails that follow up on deals that are stuck. Jenny is fantastic at this, actually. Jenny, can you show a few of your templates? And then you start to pull out your copilots around the table. You start to feed those templates in. You start to compare and contrast with your own work. You start to learn. Your copilot starts to learn the styling that the team wants. And now you're looking at

46:08

team-level productivity gains, where you're actually lifting up the whole team and making the follow-up from sales more consistent, and you have those kinds of opportunities all across the business. So I don't think people are as stuck as they think. I think documentation has been a gap, and that's part of why I wrote up what I did, because I feel like a lot of people just say, well, I'll wash my hands of it, right?

46:32

You're using Copilot; there's nothing I can do. And that's just not helpful to anybody. Do you think measurement helps with that problem? This goes back to when we were talking about how you ensure that there's ROI. Do you think building in metrics to quantify the lift of these systems gets people out of the headspace of "we're very limited in what we can do because of our security posture"? Like if you were to just take the tools that

46:58

you know, matched whatever your CISO said was reasonable and secure for your org and your security posture. I don't know, is that an interesting question? I think what I have observed to be most meaningful is when organizations, at the team level, set specific goals that matter to them as a team to get better at, and then start to track against them.

47:26

Because what I've found is, if you try to do something across the org as a whole, you usually settle on hours saved. And hours saved is a super squidgy-widgy metric at the org level. I've talked to people who've done multi-thousand-person installs of a chatbot, and they do the little survey and everyone reports hours saved. In theory, they're saving 6,000 hours a week, an hour per person. Where is that time going?

47:52

Yeah. Nobody can say. Is it going to coffee? Is it going to other stuff that's higher value? Nobody knows. Sure. And so you end up with, yeah, we can measure it, but we have no idea what the measurement means. And so I think that having some team-level metrics

⁠¶ Measuring AI Impact and ROI

48:07

increases ownership and skin in the game and makes it much more useful. Gotcha, gotcha. Let me... here's, I think, sort of an interesting question. There's one from about 15 minutes ago, from Philip, that was, you know: how would you invest to implement a next step for an organization developing a sort of Java product on a legacy database, which will probably provide some metrics,

48:34

run by a team of 10 devs? They're currently using some limited AI models and loose optimization experiments, but not really thinking about RAG, agents, LangChain. He's asking how many resources you would allocate. But I think there's a question here about the role of generative AI in interfacing with more legacy systems. What do you see as greenfield versus something we already have, and how do we interoperate with it in the most efficient, effective way?

⁠¶ AI Integration in Legacy Systems

49:10

No, there are a lot of brownfield opportunities. And I actually think there's a lot of opportunity there, but it's not easy to uncover, and that's why there's margin in the businesses that can figure out how to do that. Yeah.

49:24

The most successful approaches I've seen look at the problem first as a talent problem and second as a technical problem. Because if the talent you have on the table doesn't know how to architect an AI system, you are unlikely, in a brownfield environment, to negotiate all the complexities and get to a really successful, high-ROI project.

49:51

It's not impossible; it's just lower probability. And so I would say you don't need to replace the team, but you need at least one person you can bring in who really knows what they're doing with AI engineering, who really knows what they're doing with architecting and building systems, and you bring them in.

50:12

And then they become the seed of a DNA change, of the talent upskilling on AI. Your first goal is just to get everyone to a level of comfort with AI engineering where you don't have to go back and learn it over again. So as much as you want to get started on the project, the project will work better if your talent upskills a little bit first.

50:34

And so I would allocate a little bit of time there. And then once you have a talent base that works, I think you can approach the problem again and say, okay, now that we all have some fluency here.

50:44

As we look at this problem with a fresh lens, you have the classic software engineering questions. Does it make sense to refactor this entirely? Is there a piece of the data and business operation that we can bite off in a silo and use as an AI test bed? Those are going to be unique questions. Nobody can answer them until they look at the particular software stack that you have.

51:11

Yeah. They're classical engineering questions with a layer of AI that the team can't answer fluently without that AI understanding. Yeah, I think that's great. I love this as both an engineering problem and a sort of human learning problem: figuring out what's possible for all of us. It's like some sort of new magic that's been uncovered, and we know it's good for some things, and there's this whole space of

51:39

How do we get better at working with it? It seems like there's a lot of engagement in our chat thread. We'd love to hear from you guys if we should try to twist Nate's arm into doing some sort of follow-up. This is certainly a fun conversation for me. Yeah, it's been fun. So do let us know if you want to hear more from Nate in the future. Let's see.

⁠¶ Future of AI Standards and Data

52:06

There was a question here about standards. I think we've all experienced this: we're in this sort of real-time hype field where there are new tools and frameworks coming out every week. Someone was asking specifically about MCP and A2A, like

52:28

Is your sense that we're going to converge on some set of standards for specific things, or do we end up in a very heterogeneous environment for a long time? And this is a real pain point for us at Fiddler, in that we're trying to develop tools that interoperate with the most commonly used standards and frameworks, and there's real work involved in, you know, not just the cognitive map of

52:52

what is the right representation of this information that spans all of this software, but also the real work of sitting down and implementing instrumentation that drops into LangGraph or whatever your framework of choice is. Where do you think we are on standardization?

53:15

Yeah, this was in I think the note I put on Substack yesterday, there was a little throwaway piece that I put in around sort of how data and privacy incentives are fighting with each other right now. And so from a technical perspective, I think we are missing a massive layer of data middleware that should be there to help us feed data to LLMs.

53:41

It is weird to me that we are still in a world where LLMs are so siloed versus data. It's not a technical issue. We can absolutely chunk and get the data ready. It's that the data that we would like to have available is so often locked down by boards or by leadership teams that say we want to protect our data.

54:02

We've been told to protect our data, and the first thing we're told is to protect the data from AI. And so those incentives collide. So I think one of the challenges right now is that we need to be in a position where we can articulate, from a data value perspective,

54:21

What is the incremental value of investing in additional integration, additional data access, given that data privacy landscape? And I think that one of the nice things about MCP is it gives us a protocol where you can sort of get a hold of data.

54:36

technically relatively easily, but, just like with APIs, it's also dependent on the ability of the other side to write the service well, right? The MCP server actually has to be useful, and not all of them are. And I think that's part of why Perplexity has been talking about agentic search, because they don't want to be bound to the MCP standard if they feel like they can get more data by using agents to search instead.
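For readers who haven't looked at MCP's wire format: it is JSON-RPC 2.0, and a tool invocation is a `tools/call` request naming a tool and its arguments. The sketch below builds that envelope and unpacks a text result; the tool name `search_contracts`, its arguments, and the canned reply are hypothetical, and transport (stdio or HTTP) is left out. It also illustrates Nate's point: the protocol fixes the envelope, but whether the content that comes back is useful depends entirely on the server.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


def tool_result_text(response_json: str) -> str:
    """Return the text content of a tools/call result, raising on protocol or tool errors."""
    message = json.loads(response_json)
    if "error" in message:  # JSON-RPC level failure (e.g. unknown tool)
        raise RuntimeError(message["error"].get("message", "JSON-RPC error"))
    result = message["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool error: " + " ".join(texts))
    return "\n".join(texts)


# Hypothetical tool and canned server reply, so the sketch runs without a server.
request = tools_call_request(1, "search_contracts", {"customer": "Acme", "limit": 3})
reply = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "2 contracts found for Acme"}],
               "isError": False},
})
print(request)
print(tool_result_text(reply))  # 2 contracts found for Acme
```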

55:03

So to me, what that suggests is that we are in for, sadly, more chaos before we all solidify, even though there are going to be frameworks that people are using, because we have this tectonic battle between the privacy incentives and the tech itself, which wants the data, is hungry for the data, and has the capability to chunk the data. Good answer.

55:28

Well, I think we're pretty close to time here, so we'll just outro. This has been a total pleasure for me, Nate. I've been totally digging your stuff. You can find Nate B. Jones on Substack and all over TikTok, I see. YouTube also. Is there any other place people can find you? I think those are the three. We can go with those three.

55:53

Thanks a lot, Nate. Thanks, everybody, for coming and listening. This has been a blast. I had so much fun, Josh. Thanks for chatting with me for a bit. All right. Well, you take care. Everybody out there, have a great day. Bye-bye, guys.

This transcript was generated by Metacast using AI and may contain inaccuracies.
