
The rapid experimentation of AI agents (w/ Yohei Nakajima)

Jun 09, 2024 · 46 min

Episode description

Yohei Nakajima is an investor by day and coder by night. In particular, one of his projects, an AI agent framework called BabyAGI that creates a plan-execute loop, got a ton of attention in the past year.

The truth is that AI agents are an extremely experimental space, and depending on how strict you want to be with your definition, there aren't a lot of production use cases today. 

Yohei discusses the current state of AI agents and where they might take us. 

For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com.

The Analytics Engineering Podcast is sponsored by dbt Labs.

Transcript

Welcome to The Analytics Engineering Podcast, featuring conversations with practitioners inventing the future of analytics engineering. Today, I had the pleasure of speaking with Yohei Nakajima. Yohei is an investor by day and coder by night. In his day job, he invests in early-stage companies as a GP at Untapped Capital. His hacking has been focused of late on applications of AI. In particular, one of his projects, called BabyAGI, got a ton of attention about a year ago.

BabyAGI is an AI agent framework that creates a plan-execute loop. If you give it a goal, it will create a plan to achieve that goal and then go execute on the plan. All of this plays out in a long chain of LLM API calls, and you can observe every step along the way. It's fascinating. In our conversation today, you'll hear us both just trying to figure things out. Yohei certainly knows a lot more than I do about the state of the art in AI agents.

But the truth is that this is an extremely experimental space. Depending on how strict you want to be with your definition, there aren't a lot of production use cases of AI agents to point to today. When you watch the demo videos on Yohei's Twitter feed, you can immediately see the promise. I know I learned a lot in this conversation, and hopefully you will too. Without further ado, let's get into it. Yohei, welcome to the Analytics Engineering Podcast. Hey, thanks for having me.

I want to start by having you give a little bit of background to our audience. The thing that is so surprising to me is that you spend most of your time as an investor, but most investors that I know don't have repositories with more GitHub stars than the companies they invest in. What do you spend your time on these days? VC by day, builder by night, is the tagline.

During the day, I run Untapped Capital, an early-stage VC firm, which means my days are Zoom meetings, emails, travel, decks, and whatnot. And then at night, as a hobby, I like to build. It's a hobby I've always had on the side. I've never been an engineer by trade, but I've been coding as a hobby since high school. I actually became more of a no-coder for a while, just because with limited time to build, no code was easier.

But when I started using AI, I realized that I could pump out some pretty cool code in a matter of two to three hours. So I switched back to code about a year after I started using OpenAI. So that's where all my GitHub projects come from. Very cool. I want to talk to you about agents because I know you've spent a lot of time in this space. But before we get there, maybe I could just ask you some high-level questions about what you're seeing. I think you invest mostly at the early stage.

Is that right? The earliest stage, pre-seed. And are you exclusively focused on AI companies? No. We're generalist investors, but I do go deep in sectors one at a time. So I've done no code, web3, climate. And right now, AI is definitely top of mind. So for every company we're looking at, we're thinking, how does AI impact this industry? But we're not exclusively AI. Got it. When you think about the AI-focused deals that you are seeing right now... I do some angel investing.

You see a lot more than I do, but maybe 18 months ago, something like that, I saw a bunch of AI infrastructure companies, like LangChain and that type of thing, and I haven't seen quite as much of that since. There's still been some, but it's clear that even over the past, let's say, year and a half of the real AI boom, there have already been these small waves that crest. What's your narrative for what entrepreneurs are focused on right now?

I mean, I think to some extent you're right, there are waves of AI companies that launch. I think part of that is that, as these models evolve, new things become possible that weren't possible before, which means we can build new stuff on top of them. So when GPT-4 came out, there was a whole slew of things that became possible that weren't before, and that's when we saw infrastructure that took advantage of GPT-4, and so on.

So are you saying new capabilities, like now we have Sora and you can create videos? Or are you saying the existing capabilities are better, and so they're relevant for more use cases? I think it's both, a little bit of both. On the earlier example, I remember with GPT-3, they went from DaVinci 2 to DaVinci 3 and suddenly it could write. But now we're talking about things like, now they can do good JSON output.

And if you can do JSON output, that means you can take the LLM output, call a tool, like a function, and then get the result. That wasn't really possible a year ago. So now, if you can call a skill or a tool or a function and make that relatively reliable, there's a whole slew of infrastructure you can build on top of that capability, which again wasn't possible a year ago, which is why you see these new waves.
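To make that concrete, here is a minimal sketch of the JSON-output-to-tool-call pattern being described. The `call_llm` helper, the tool names, and the prompt format are hypothetical stand-ins rather than code from any particular framework:

```python
# Minimal sketch: ask the model for JSON naming a tool, parse it, call the tool.
# call_llm is a hypothetical placeholder for any chat-completion API.
import json

def call_llm(prompt: str) -> str:
    """Send `prompt` to a model of your choice and return its text reply."""
    raise NotImplementedError

TOOLS = {
    "web_search": lambda query: f"(search results for {query!r})",
    "send_email": lambda to, body: f"(email sent to {to})",
}

def run_step(task: str) -> str:
    prompt = (
        "You can use these tools: web_search(query), send_email(to, body).\n"
        f"Task: {task}\n"
        'Reply ONLY with JSON like {"tool": "...", "args": {...}}.'
    )
    reply = json.loads(call_llm(prompt))   # reliable JSON output makes this parse safe
    tool = TOOLS[reply["tool"]]            # look up the function the model chose
    return tool(**reply["args"])           # execute it and hand the result back
```

The whole point is the last three lines: once the model's output is dependably JSON, dispatching to real functions is just ordinary code.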

Another example might be GPT-4V, the vision model. Before GPT-4V, it was a little trickier to do vision-related apps because you would usually have to do some sort of segmentation first to separate the objects, and then do recognition on an object-by-object basis.

Now I can build a mobile camera app that opens up the camera, takes a picture, sends it to GPT-4V, asks it a question in natural language, and gets a response back. So I can now build vision-based tools on top of GPT-4V. So it's both the model capability and also the new modalities that enable new apps.
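For the camera-app idea, the flow is roughly: encode the photo, send it with a natural-language question to a vision-capable model, read back the answer. A rough sketch using the OpenAI Python SDK follows; the model name and file path are assumptions, not a prescription:

```python
# Rough sketch of the camera-app flow: photo + question in, answer out.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_about_photo(image_path: str, question: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# e.g. ask_about_photo("shelf.jpg", "Which of these products is gluten free?")
```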

Do you want to take a minute to plug a company or two that you've invested in recently, just as an opportunity for the audience to see what you're seeing? One of the tools that I use every day, which is the easiest plug, is a company called Wokelo, Wokelo.ai. They do AI due diligence on companies or industries.

So when I'm interested in a company or an industry, on Wokelo.ai I can just type in the company name and select it, or type in an industry in natural language, and click generate a report. Thirty minutes later I get a 20-to-50-page report that includes market insights, recent news, competitors, management profiles, all that kind of stuff. It's really the kind of work you would ask an analyst to do, and it would take them two weeks.

But it's done in 30 minutes. Huh, that is incredible. It's funny, maybe five years ago I did a consulting project for a company that was effectively doing that, but it was crowdsourced. They had researchers on the team, they broke up any research request into like ten steps, and there were editors. And now it's the same thing, but it's a whole bunch of LLM calls and API calls to grab the data.

Right. That business, I believe, has been well and thoroughly disrupted by this point. Okay, that's neat. I've now got that up in a browser tab, and I don't see a pricing page, but I can see clear business value there. I'm going to check that out. Based on the deal flow that you're seeing, what do you think... I've done a decent number of interviews with mainstream business and tech reporters on what I think the future is going to look like in an AI-first world.

Everybody's got their thoughts on this, and what is pretty clear to me when I do these types of interviews is that mainstream journalists are only able to go so far down this rabbit hole. When the world is changing so much, if you go too far away from what things look like today, there's a risk that you start sounding crazy if you report on it in a super-wide mainstream publication.

But I think you're getting to see a lot further into the future. What do you think are some under-reported things that will be true about the world in two or three years that no one's paying attention to? That's a good question. I've done a couple of talks on autonomous agents and where we're going, and some of the stuff that gets more of a reaction, just because it hasn't been talked about as much, is the idea of an agent as the CEO, for example, or an agent managing people.

I think when people think about AI right now, the initial use case people think of is, oh, I am going to use AI. And so you imagine AI at the bottom rung. But an example I give is Amazon Mechanical Turk, or even Upwork. These are technology platforms that help you manage people.

Now imagine taking this capability of LLMs and rebuilding something like Amazon Mechanical Turk. You can imagine building an agent that can manage a college ambassador program, where you just have to find new college students every year and communicate with them to do the exact same thing. You can imagine an AI that can do that more efficiently than a human, in theory.

And if you extend that, then why can't you have a CEO who's available 24/7, has access to every single piece of data in the company, can take feedback from every single employee in parallel and synthesize it, and has bias that's at least measurable and transparent, so you can adjust it, versus wondering if they're being unfair. I love that. I'm ready. One of our values as a company is we move up the stack, and so it's this invitation to always replace yourself.

And if there is a very effective AI CEO, I'm ready. Bring it on. The initial idea around BabyAGI actually was to prototype an autonomous startup founder. I do think AI is far from being able to build an innovative startup that becomes a unicorn. But if we pick something that's a little bit more straightforward and purely digital, like an e-commerce drop-shipping business, I think it's within reach to build an AI system that can run and manage something like that.

I saw the video of you giving exactly this prompt to BabyAGI. I was. I think that was the first prompt. It was: come up with a sustainable business and just build a business around it. Okay, cool. So let's use this as an opportunity to go there. Talk about BabyAGI. What is it, and what have people built with it?

I think the reason BabyAGI went viral in April last year was that at that point, ChatGPT was still relatively new, so people were still really just building on top of that chat interface. BabyAGI was one of the first popular open-source projects to basically loop an LLM. But I added some capabilities, such as: when you give it an objective, I would first have a task list creation agent generate a task list based on that objective.

Then I would use code to parse that out and send each task one by one to an execution agent to execute it. And somewhere in there, I was also grabbing some of the output, embedding it, and storing it in Pinecone. So when it was generating new tasks, where I had a task prioritization agent that would review past results and update the task list, it would check for the most similar past tasks first, so that it would try to generate new kinds of tasks.

And what happens when you press run on something like "build a business" is it just comes up with things to do. At this point, it was just LLM calls; it was just generating tasks. But if you told it to start a business, it would go: I need a marketing plan, I need to build website copy, I need to come up with product ideas, and it just kept going one by one with new ideas and things to do around them. And BabyAGI itself is how many lines of code?

It was 105 lines of code, I think, 140 including comments. That's the part that blows me away. And that's what it was, right? It was the simplicity of it, the idea itself. People saw that, and it was the first time you could really just give it an objective and it would spit out infinite tokens. It was a great way to waste money. But it was so simple that I think it inspired people, like, oh wait, and everybody had a different idea of where you could take it.
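For readers who have not seen the repository, here is a heavily compressed sketch of the loop just described: a task creation call, an execution call per task, and a prioritization call that rewrites the remaining list. `llm` is a hypothetical helper around any chat-completion API, and the Pinecone embedding step is omitted; this is not the original 105 lines:

```python
# Heavily compressed sketch of a BabyAGI-style plan-execute loop.
from collections import deque

def llm(prompt: str) -> str:
    raise NotImplementedError  # call your model of choice here

def babyagi(objective: str, max_steps: int = 10) -> None:
    # task creation agent: turn the objective into an initial task list
    tasks = deque(llm(f"Objective: {objective}\nList the tasks, one per line.").splitlines())
    results = []
    for _ in range(max_steps):                 # the original simply looped until stopped
        if not tasks:
            break
        task = tasks.popleft()
        # execution agent: carry out one task at a time
        result = llm(f"Objective: {objective}\nComplete this task: {task}")
        results.append((task, result))
        # prioritization agent: rewrite the remaining list in light of the result
        updated = llm(
            f"Objective: {objective}\nLast result: {result}\n"
            "Remaining tasks:\n" + "\n".join(tasks) +
            "\nReturn an updated, prioritized task list, one per line."
        )
        tasks = deque(line for line in updated.splitlines() if line.strip())
    print(results)
```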

And I think that was always the most exciting thing about it. This has become commonly referred to as an agent: this idea that there's an AI that operates in a loop, it plans and it executes, and it can maybe, to a certain extent, take actions in the real world. I think this is not a new idea, but it is an idea whose time has maybe come. Is that a fair way to think about it? I agree. It's not a new idea to let a software program run autonomously.

I think with LLM capabilities, you could finally get it to reason and do similarity search, and do it in a way that was much more robust than a pure logic-based agent. I realize that this is work you do as your hobby, not your day job, maybe to a certain extent. But I invest in autonomous agent companies too, so it's very much a hobby that supports my day job. Got it.

Okay. Can you talk at all about agent architectures? There are research papers on this, and LangChain has written about it recently. Are there better and worse ways to construct agents? Yeah. I did a talk recently about how, when a new technology emerges, there's a period of rapid experimentation. If you look at old cars, there were three-wheeled cars, there were steam-engine cars.

But if you look today, a lot of the cars look the same. Same with phones: flip phones, brick phones, and today all the phones look the same. With autonomous agents, I think we're in the rapid experimentation phase today, where there's a whole bunch of different frameworks, a whole bunch of different architectures, a whole bunch of different approaches. It's hard to say which ones are going to stick, but ultimately I think there will be some consolidation.

Similar to cars, in a sense, right? There are a lot of different parts to an autonomous agent, so if you're thinking about architecture, I think it helps to look at them one by one. Task planning is probably one of the main components. And for task planning, I'd say the two major approaches are the ReAct style, which is do one thing at a time and reflect on it,

and then there's the BabyAGI style, which LangChain implemented as plan-and-execute, which is generate the task list first and then go through it. But of course, there's a fuzzy line, because you can generate a task list first and then reflect on each result to update the task list. So it's not one or the other; those are the two major approaches, and you can combine them.
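Reduced to their loop shapes, the two planning styles look roughly like this; `llm` is again a hypothetical chat helper, and neither snippet is taken from LangChain or BabyAGI:

```python
def react_style(objective: str, llm, max_steps: int = 10):
    """Do one thing at a time, look at the result, decide the next step."""
    history = []
    for _ in range(max_steps):
        step = llm(f"Objective: {objective}\nSo far: {history}\nWhat is the single next action?")
        result = llm(f"Carry out: {step}")
        history.append((step, result))
    return history

def plan_and_execute(objective: str, llm):
    """Generate the whole task list up front, then work through it."""
    plan = llm(f"Objective: {objective}\nList every task needed, one per line.").splitlines()
    return [(task, llm(f"Carry out: {task}")) for task in plan if task.strip()]
```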

Sure. Anytime you make analogies with the human brain and human thought, it's tenuous. But when I go to plan out a task list, I don't tend to generate 100 steps at once. I tend to think maybe three or four steps ahead. Is there an appropriate amount of ahead-thinking when you generate your task list? Probably the right answer isn't 100, and maybe it also isn't one. What tends to work best?

I think these are the questions that autonomous agent builders are asking themselves as they're designing the orchestration. Those are the questions I was asking myself as well. Actually, a funny story: I gave BabyAGI as a skill to BabyAGI. And the first time I ran it, it just recursively generated more BabyAGIs without ever actually completing the task, which I thought was pretty predictable, in a way. I think that's the perfect example of procrastination. We all do that.

Then some people said, oh, maybe you should just limit it to one layer, so you can only do up to one layer of BabyAGI. But that would be one limited solution, where you can generate maybe four steps and then each step can break down into up to another ten steps. I think one of the challenges with building these is that, at least in my mind, ideally it's extremely flexible, in that I'm not hard-coding how many layers there are or how many steps there can be.

Ideally, it can naturally decide: over time it figures out how many steps it should start with, how many subtasks it can create, and how many layers are appropriate. But that feels a lot more complicated. I'm interested in what you have seen this be good at so far. There's the example of starting a business, and you said that maybe it would be capable today of doing some kind of generic drop-shipping, sell-something business, and it would be good.

I said it's within reach. Got it. To me, there's this difference: it can construct a set of tasks that get you to a median output, but in many spaces, and capitalism is one of these spaces, median is actually not successful. You have to follow the idea chain and be the best, or in some way the best, at one particular thing. How should we think about where we will see agents impacting our lives first? The way I think about it is in three buckets.

I think of the first bucket as what I call handcrafted agents, where you are writing each prompt and chaining it together specifically with API calls, and it's a very specific flow. Some people actually would call that not autonomous, because it's a human generating the task list rather than the AI running it autonomously.

That being said, if I can give an AI a company and it's going to give me back a 30-to-50-page report, from a usage standpoint it feels very autonomous to me. Then there's what I call the specialized agent, which I think is where it becomes truly autonomous: it's dynamically generating its own task list or recursively figuring out what to do, but within a specific set of skills and tools.

So you can imagine a coding-specialized agent like Devin, or a specialized agent that just knows how to look up Crunchbase and your calendar, for example. And then there's the general-purpose, fully generalized autonomous agent, which you can ask anything of. Handcrafted agents are useful today.

Wokelo is one specific example, but I know many companies that are doing handcrafted agents, again, this is humans generating the task list, and they're charging lots of money, with low churn, and they're creating value. With specialized agents, what I'm seeing right now is pretty interesting demos with a lot of promise. Those companies are raising capital.

They're starting to have conversations with enterprises because the enterprises are interested, and they are starting to line up pilots because those companies are willing to test it. But I haven't seen anything that's blown me away in terms of reliability, value creation, charging millions of dollars. I personally haven't seen it. And when it comes to general autonomous agents, I haven't seen anything remotely close to reliable.

I don't think of them as three fully separate buckets, but more as starting here and slowly moving to there. And that's based on my building experience, because I started essentially at the very far end, the general autonomous agent side. And I've done a couple of mods of BabyAGI and continue to experiment with it.

One of the experiments that worked well was what I called the foxy method, which was my idea of how to do self-improvement. What I did was, when I ran an objective, it would generate a task list and there'd be a final output. I would have a reflection agent reflect on the output and the task list, write notes on how good it thought the task list was, and I stored those.

So as I use it, I have the stored reflection plus task list plus output plus objective, and I embedded each objective. Anytime I ran a new objective, I did a vector search against similar past objectives. It would look for similar objectives it had tried before, pull in the task list and the reflection on it, and then, based on that, generate a new task list, which means it would take into account its reflection on the past times it had done something.
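A sketch of that self-improvement idea follows, assuming hypothetical `embed` and `llm` helpers and a plain in-memory list in place of the Pinecone store used in the real project:

```python
# Sketch: store (objective, tasks, output, reflection), retrieve the most
# similar past objective on a new run, and let it seed the new task list.
import numpy as np

memory = []  # each entry: {"objective", "tasks", "output", "reflection", "vec"}

def embed(text: str) -> np.ndarray: ...   # hypothetical embedding helper
def llm(prompt: str) -> str: ...          # hypothetical chat helper

def remember(objective, tasks, output):
    reflection = llm(f"Objective: {objective}\nTasks: {tasks}\nOutput: {output}\n"
                     "Critique the task list: what worked, what should change?")
    memory.append({"objective": objective, "tasks": tasks, "output": output,
                   "reflection": reflection, "vec": embed(objective)})

def plan_with_memory(objective):
    vec = embed(objective)
    if memory:
        # dot product works as cosine similarity if the vectors are normalized
        best = max(memory, key=lambda m: float(np.dot(vec, m["vec"])))
        context = (f"A similar past objective used this task list:\n{best['tasks']}\n"
                   f"Reflection on it: {best['reflection']}\n")
    else:
        context = ""
    return llm(context + f"Objective: {objective}\nWrite a task list, one per line.").splitlines()
```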

I know this is getting a little long. No, it's good. One of the things I figured out, for example: when BabyAGI came out, I had a lot of companies reach out wanting to pick my brain and learn how they could use AI. I thought, this is great, I'll jump on these calls. And I started using BabyAGI to prep me for the calls. One of the things I liked doing was coming into the meeting with specific ideas on strategies they should consider that leverage large language models.

Now, I could just ask an AI, hey, come up with a whole bunch of LLM strategies for Nasdaq, or whatever company you pick. You could definitely do that. But in my case, I actually asked BabyAGI to research the company, research its business units, research its revenue drivers, and research its cost drivers, each as a separate research task, and then combine all that research into a comprehensive report.

And then, based on that, generate three strategies that leverage large language models and would be impactful to their bottom line, for every single business unit. It was a very specific flow that I liked. When I was first doing it, I had to type that flow in each time to get it to do it. When I added the self-improvement method, I only had to ask it to do that once, and then it would store that task list.

The next time, I could just not specify and say, hey, find me strategies that leverage large language models for Visa. What it would do is a vector search against the past objectives it had done that were most similar. It would grab one where I did specify the flow, see the task list, see that it was a good task list, and copy that task list from a previous run without my specifying it this time. That makes sense.

So the point of what I was trying to say is that building these handcrafted agents essentially produces data that you can feed to a specialized agent to help it be just as good as a handcrafted agent. That's why the more handcrafted agents you build, the better a specialized agent in that space is going to be. And once we build a whole bunch of really good specialized agents, we can roll up that knowledge to build a generalized agent.

To draw the parallel to human processes, the way that people have learned to interact with ChatGPT feels very foreign to the way that humans actually go about thinking through an idea. You try to think of all the different failure modes and incorporate them into the prompt, like, no, don't do this, make sure to do this, and you end up with this massive prompt. And that's actually not how my brain tends to process thoughts, at least.

You go down a path and you say, was that productive? Oh, I'm going to come back, I'm going to go down another path. You kind of spiral through the idea space because you can only hold so much stuff in your head at any given point in time. And the way that you're describing the process an agent goes through feels much more like the thought process that I would have in my head. And maybe we can go here.

My understanding is that in any given agent, you can use multiple different models, and to me, that feels a little bit like there being multiple different regions of your brain that are good at different things. Yeah. And you can use different system prompts, right? I think as these get more natural, there are going to be dynamic system prompts, where as you're conversing with it, it's going to better understand your intent.

And if it notices that your intent shifted, it would dynamically update the system prompt so it's taking into account the new intent it captured in an earlier part of the conversation. Can you talk at all about why agent designers switch between different models for different tasks? Yeah. Different models have different strengths; that's the shortest answer. Some models are better than others at writing code.

Some models are better than others at writing long pieces in an eloquent manner. With Claude, even before Opus, a lot of people were talking about Claude being a really good writer. I knew a lot of developers using Claude to write long pieces, articles, or blog posts. And it's constantly shifting, right? That was even before Claude Opus and Haiku came out. Now Opus and Haiku are out, and Haiku is really high quality and really fast and really cheap.

Opus is higher quality, but it's more expensive. So developers are looking at the cost, the speed, and the quality of each model, and when I say quality, I mean quality specific to the use case. With an agent system, you have parts that need to write code and parts that need to respond to the user, so depending on the specific need, you're swapping out models to see which one is optimal for your needs. And is it generally true that the planning stage requires the biggest model, the most capability to think?

Yeah. Task planning, again, depends on how you do it. In my case, I was tracking dependencies between tasks as well, and in at least an earlier version, I was even having it select the tool to use at the task creation stage. So that one required a lot of reasoning, and I always needed the most powerful model for it. But there are other ways to go about it, right? I just described generating the task list, the dependencies, and the skills in one call.

But as a developer, you can choose to break that up. You could say, hey, first generate a task list, then look through the task list and map out dependencies, then look through a skill list and assign a skill to each task. If you did it that way, because each call requires less reasoning, you might be able to move to a smaller model. Because you can use Haiku for all three, it might be cheaper than using GPT-4 to do all three combined.
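As a sketch of that split, assuming a hypothetical `llm(prompt, model)` helper, the three smaller calls might look like this, with each one simple enough for a cheaper model:

```python
# Sketch: split planning into three cheap calls instead of one expensive one.
def plan(objective: str, llm, model: str = "claude-3-haiku"):
    # 1. generate the task list
    tasks = llm(f"Objective: {objective}\nList the tasks, one per line.", model).splitlines()

    # 2. map dependencies between the numbered tasks
    deps = llm("For each task below, list the task numbers it depends on:\n"
               + "\n".join(f"{i+1}. {t}" for i, t in enumerate(tasks)), model)

    # 3. assign a skill to each task from a fixed skill list
    skills = llm("Available skills: web_search, write_text, run_code.\n"
                 "Assign one skill to each task:\n" + "\n".join(tasks), model)

    return tasks, deps, skills
```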

And again, those are the kinds of discussions that developers might have. Yeah. Can you give me a sense of what is the longest task chain you have ever triggered yourself, and what the compute bill was on that? What kind of order of magnitude are we talking about here? I mean, I haven't run anything that crazy. I think I probably had one that ran up to 30 or 40 bucks because I forgot to turn it off.

But when I was running my original BabyAGI, the original just looped until you stopped it, and it did start looping at some point in some cases, though there was that vector-store memory. So that was probably the longest-running one. I quickly switched over to objectives with a closed end, because I realized that we were far from "just do this and run forever." How long did it take to run up a $30 or $40 charge? I don't know, I was just throwing out a number.

I feel like the longest I've actually let it run was probably four to five hours or so. Yeah. But there's a very big difference between "I forgot to spin down my EC2 instance for a day" and "I left an agent running." The cost associated with recursively running against OpenAI's APIs, or anybody else's, is non-trivial. I think one of the things that needs to be true over time is that the cost curve needs to come down on this, and I think we're seeing progress there.

Can you comment on that? Yeah. I mean, the cost is definitely going down; it has continued to go down significantly, so that's been promising. I've heard a lot of people feeling pretty opinionated, saying cost is essentially going to become the cost of energy. I don't hold that opinion as strongly, but I see it as a possible future. Is that because of model design, because of hardware design, or just competitive pressures, or everything? Oh, I'm not sure

I'm the right person to answer that question. I can guess, is the short answer. Sure. But if what we're doing is thinking about the space of innovation here, we can take it for granted that this will continue to come down over the coming years. I think, for me, what makes sense to me is that this capability is so valuable that we will continue to invest in the infrastructure to provide more of it.

And as we invest in that infrastructure, the cost of serving this capability will continue to go down. If you follow that logic, it will get as low as we can possibly serve it for. Now, how much it costs end users is more of a business model decision, but if it's impactful enough, and what we're seeing is lots of companies launching, lots of companies trying to beat the models, if we continue with that competitive trend, then cost should continue to go down.

One of the things that I think is most fascinating about the agent space is when you start to plug the capability of taking actions into these reasoning loops where you plan and then you execute. A lot of the time the execute step is just: run some Google search, get some more information, and feed that back into the plan. Okay, cool, that's important.

But now that language models can write code and can formulate their outputs and inputs as JSON or other structured data, is there any particular limit on what types of actions they could take in the world? The first thing that popped into my mind is that anything that requires a physical body will be hard for a digital agent to do, unless you give it the robotic parts to do so. But again, I wouldn't say that's a limit.

We can build the robotic parts and we can give them vision capabilities. We can do those things. I think what's interesting to me is messaging. Are there agents in the world today that are connected to the Twilio API or to SMTP and sending emails? Definitely. I don't know how many. There are tons of emails that are automatically sent. And I think the question you're asking is, are there any that are fully, dynamically managing it on their own? Probably.

I would guess that there are a lot of email automations that leverage large language models. But I don't think there's an AI system, I mean, there might be, someone could have built it with just an objective for how to run the email address, where it just decides what to do. I don't know if we're there yet, but it doesn't seem that far off. No, it seems so adjacent to where we're at.

But as a person who receives many emails and is on way more email lists than I would like to be, I have not yet become inundated under a wall of AI-generated spam. Or at least I'm not aware that I have. I mean, the spam filters are still pretty good. That's just baked into our email systems, fortunately. It seems like at some point, if you have that kind of personalization capability, well, you were just saying how effective it is to generate these research reports.

Imagine that every time you sent any email, you first generated a research report on the person: their priorities at the company, what their job is. I mean, man, you could write a really freaking good outbound email. Yeah, you could. Actually, one of the automations I have is that every single email I receive is automatically summarized. And then I have a separate action that extracts bullet points of facts from the email.

Then I roll this up to the email address level. So for every email address I interact with, there's a cell in my Airtable that has a summary of interactions, which is: they emailed you on June 2nd asking for this, you replied on June 6th doing this. And then it rolls up to the domain level.

So for every single domain name I interact with, there's one Airtable cell that has a summary of interactions, which is basically one sentence on every email that's gone back and forth between me and that organization, as well as one cell with all the facts as a huge bullet-point list.
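A rough sketch of that rollup, with a hypothetical `llm` helper and plain dictionaries standing in for the Airtable cells:

```python
# Sketch: summarize each email, extract facts, roll up per address and per domain.
from collections import defaultdict

interaction_log = defaultdict(list)   # sender address -> one-line summaries
fact_sheet = defaultdict(list)        # domain -> accumulated bullet-point facts

def llm(prompt: str) -> str: ...      # hypothetical chat helper

def ingest_email(sender: str, date: str, body: str) -> None:
    summary = llm(f"Summarize this email in one sentence, noting the date {date}:\n{body}")
    facts = llm(f"Extract the factual claims from this email as bullet points:\n{body}")
    domain = sender.split("@")[-1]
    interaction_log[sender].append(summary)   # e.g. "On June 2nd they asked for X"
    fact_sheet[domain].append(facts)          # rolls up across everyone at that org
```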

And so when I am finalizing diligence on a company, part of my process is actually just to copy that bullet-point list of every fact the founder has sent me, ask ChatGPT to organize it for me, combine it with my notes, and again ask ChatGPT to organize it for me. If you like this podcast, you'll love Coalesce, the analytics engineering conference built by data people, for data people. Join data practitioners and leaders in Las Vegas this October.

Register now at coalesce.getdbt.com for early-bird tickets to save 50%. The sale ends June 17th, so don't miss out. That's coalesce.getdbt.com. We'll see you at Coalesce. When you think about the different professions and how impacted they can be by AI today, it is not totally surprising to me that VCs are so well augmented by AI; that makes a certain amount of sense. It's a lot of qualitative work and a lot of reasoning as a VC, and there's an infinite number of things to do.

So I think there's a massive opportunity to just do more as a VC, even if it's at a shallower level, but at a much more massive scale, by building automations. My brain automatically tends to go to marketing use cases for this stuff. So there's emailing, and it also feels like account-based marketing, using paid channels to very narrowly target individual messages to individuals or small groups of humans.

Have you found people trying to come up with innovative marketing strategies based on this type of thing? Yeah, I've definitely seen people do things like take a list of URLs, scrape them, and then write a summary of each company. And then, based on that summary, write an introduction sentence to that company selling your product, for example.

And then you end up with a spreadsheet where you have that opening line that's personalized to each company based on how they're communicating on their website. So it's kind of low-hanging fruit that you can just plug into your current mail-merge strategy. I think it's also great for enrichment, lead qualification, lead list building.
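A minimal sketch of that enrichment loop, using the `requests` package and a hypothetical `llm` helper; the column names and prompts are illustrative:

```python
# Sketch: scrape each site, summarize it, draft a personalized opening line.
import csv
import requests

def llm(prompt: str) -> str: ...   # hypothetical chat helper

def enrich(urls: list[str], out_path: str = "outreach.csv") -> None:
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "summary", "opening_line"])
        for url in urls:
            html = requests.get(url, timeout=10).text
            summary = llm(f"Summarize what this company does:\n{html[:8000]}")
            opener = llm("Write one personalized opening sentence pitching our product "
                         f"to this company:\n{summary}")
            writer.writerow([url, summary, opener])
```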

For us specifically, there are these big lists of family offices that float around in the venture world, like a $5,000 list of a thousand family office websites. We're not going to spam that many family offices, so what do I do with the list? I actually just had an agent loop through every single website, scrape it, and look for the terms VC, venture capital, startup.

And then it spit back out a shortlist of around 350 family offices that use those terms on their website, which for me is a much more targeted list as someone who's raising a fund. You described that entire thing in natural language, and it was able to do that for you, with a little back and forth on fixing errors?

Yeah, it wasn't one and done. But as an early agent builder and user, I think the best way to get a use case out of an agent is to realize that it sometimes fails and to try to figure out how to get it working, because that's also part of the build process. So it wasn't purely "here's the CSV, write it," but it did do it; there was a bit of back and forth to eventually get there.

And it eventually figured out how to loop through a CSV and spit out a new CSV with check marks next to all the family offices that should be on the shortlist. Just out of curiosity, from the beginning of that task to having a CSV with check marks in it, what was the total elapsed time? Probably about an hour or two. I remember it being about an hour or two, but the actual work itself was about 20 minutes. Hmm. So the rest of the time it was kind of doing its thing. Yeah, I didn't do parallel stuff.

I just set it up and ran it. It took me about 20 minutes of fiddling with the code, going back and forth, until it started running. I think the first two times it errored out, so I jumped back in and tweaked the code a little bit, and asked it to do some things differently once or twice, but then it eventually figured it out. And once it was working, I just had to remember the code to reuse it again.

So when I had a friend who was raising for a climate company, I just ran it through the same program, but this time I had it look for terms like climate and sustainability. And then I had a shortlist of family offices that used those terms on their own pages, which I could just share with our portfolio company.
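The shortlisting step itself is simple, since it is just a keyword check over scraped pages. A sketch, assuming a hypothetical `family_offices.csv` with a `website` column:

```python
# Sketch: scrape each family-office site and flag the ones that mention the terms.
import csv
import requests

TERMS = ("vc", "venture capital", "startup")   # swap for ("climate", "sustainability") etc.

with open("family_offices.csv") as src, open("shortlist.csv", "w", newline="") as dst:
    reader, writer = csv.DictReader(src), csv.writer(dst)
    writer.writerow(["website", "shortlisted"])
    for row in reader:
        try:
            text = requests.get(row["website"], timeout=10).text.lower()
        except requests.RequestException:
            text = ""
        writer.writerow([row["website"], "✓" if any(t in text for t in TERMS) else ""])
```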

That's a particularly funny use case to me, because one of the things my co-founder Drew and I were working on before starting dbt Labs was finding all of the e-commerce and SaaS businesses in the Alexa top 1 million. This is back in 2014 or something like that. We built a process where we fed in the Alexa 1 million, scraped all of the HTML for those pages, and then searched for particular terms and whatever. And that took, I don't know, a couple of weeks.

And here it took about 20 minutes. I mean, you start to get into the thought process that higher-order thinking is still very valuable, while a lot of the lower-skill work that maybe a college intern once did is more automatable. I think my most powerful workflow is when I think to myself, well, I was with a VC and he was like, oh, I had this cool idea and I just asked my portfolio company if they thought it was possible.

And for me, when I have an idea, I'm like, I wonder if an AI can do this. The first thing I do is ask ChatGPT, hey, is this possible? And it will explain why it's possible or why it might not be. Most of the time, it says, oh, that's a cool idea, here's how you might go about doing it. And then my next thing is, can you prototype it for me? And it spits out something I copy-paste into a template. I go back and forth and work out the errors.

And then I have a prototype of something interesting that I was curious whether it was possible. In some cases, it was possible and other people have done it. But if you do it enough, and in my case I did it 70 times before BabyAGI, you come out with something truly, truly unique. Even on a smaller scale, I think it's such a powerful workflow to just wonder whether something is possible and then ask an AI to try it for you.

You have very effectively wired your brain to work in this way. And my guess is, I don't know how old you are, but certainly you didn't start your career a year and a half ago. I'm finding myself sometimes, and I'll admit this in public here, feeling a little bit like what I imagine my parents felt like in the early days of the internet, where it's just not my default response to go Google something.

And I have to retrain my brain to get it to use this new set of tools and capabilities in a different way. Have you been aware of this tendency in your own brain? Or are you just always excited to do this? Yeah, I've noticed that before. There are plenty of tools I start using and it doesn't stick, and there are plenty of tools I start using and it's like, this is mind-blowing, I can't stop using it.

Yeah. I think I've always been somebody who jumped on and was always willing to try stuff, and I enjoy it. That's part of the reason I think I like being a VC; it pairs well with the job. But I do also make a proactive effort to keep using tools that I know are valuable, even when it might not stick on its own. I don't rely purely on my instincts and desires; there is some proactiveness around making sure I use the tools.

Using automations is one example of that, right? I quickly started setting up Zaps on Zapier that used OpenAI. Those are great because I can just sit down and spin up a Zap in 30 minutes, and depending on the Zap, it just runs all the time based on whatever triggers there are.

For example, we have a Zap where, if an organization is added to our Airtable and it has a website but does not have a description, that specific scenario triggers a Zap that goes and scrapes the website using ScrapingBee, which is baked into Zapier. It then sends the scraped website content to OpenAI, again through a Zapier integration, and then takes that summary and adds it to the description column in Airtable.

That's just four steps: a trigger, a ScrapingBee call, an OpenAI call, and then an update-Airtable call. That's probably 10 to 15 minutes to set up. Once you set it up, suddenly you always have a description on every single company in your CRM that has a URL.
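The Zap itself is no-code, but the same four steps sketched in Python look roughly like this; `scrape`, `fetch_new_orgs`, and `update_description` are hypothetical stand-ins for the ScrapingBee and Airtable integrations that Zapier provides out of the box:

```python
# Sketch of the four-step Zap: trigger, scrape, summarize, write back.
from openai import OpenAI

client = OpenAI()

def scrape(url: str) -> str: ...                        # ScrapingBee equivalent
def fetch_new_orgs() -> list[dict]: ...                 # Airtable rows with a website, no description
def update_description(record_id: str, text: str): ...  # write back to Airtable

def run_once() -> None:
    for org in fetch_new_orgs():                        # the Zap's trigger condition
        page = scrape(org["website"])
        summary = client.chat.completions.create(
            model="gpt-4o-mini",                        # any cheap model is fine here
            messages=[{"role": "user",
                       "content": f"Write a two-sentence description of this company:\n{page[:8000]}"}],
        ).choices[0].message.content
        update_description(org["id"], summary)
```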

It's so neat that architectures like Zapier's, where you get to configure all this stuff yourself, are so flexible for this type of thinking, this reimagining of workflows, whereas something like Salesforce requires a lot more work to quote-unquote integrate AI workflows into. What you're describing just feels incredibly natural. I mean, I could do all of those things tomorrow.

But if you're going to ask me to go in there and write Salesforce code to do this stuff, that's a pain in the ass. Yeah, Zapier is powerful. Honestly, I was already running, I think, 10,000 Zaps a month before AI. So I was a heavy no-coder with lots of automations, and I was very much inclined toward this kind of stuff.

What was great was that I could literally just add an OpenAI call into existing workflows, and suddenly my automations had AI capability. Initially I was using Zapier's webhook capability because there was no OpenAI integration, so I was just doing a webhook to the OpenAI API. But at some point, setting up the webhook became a really repetitive process because I was using it so many times.

And I realized that if I built an integration to OpenAI, I could skip that repetitive part, because the whole purpose of an official integration is that it has all the URLs baked into it. So I actually built the first unofficial OpenAI integration for Zapier, as an integration that anybody could use. I had the link published and was supporting it for a month before

ChatGPT came out. Then I reached out to Zapier and offered for them to make it an official one, which they did. So the current official OpenAI integration in Zapier is one that I initially built for myself and was just supporting as a fan. That's pretty cool. Maybe a last question for you. At dbt Labs, we have an interesting persona.

We call them analytics engineers. Very frequently, they start out as data analysts, and they're folks who want to bring some more technical capabilities into their traditional data analyst practice. Because of that, they can build much more mature pipelines and all the data practices around them. This hybrid persona, I think, is a really interesting one when things change in an ecosystem.

The fact that these folks are close to the business means they understand business value and what's actually going on, but maybe they don't need to be as technical. Anyway, let's bring it to AI and agents. Do you think that most AI workflows are going to be created by software engineers, the founders you're talking to?

Or, because the capabilities of AI are so powerful in the realm of writing code, are they actually going to be created by less technical folks who are closer to the business? Again, this is a bit more of a hypothesis; I don't know. I would guess, based on my experience building these, that you need a really good core framework, and that has to be built by engineers.

And the whole point of the framework should really be that a good agent is one where the more you use it, the better it gets. So I think in the future, when you ask who's building the task list, I don't think it's the engineers. It's the users, who are going to ask the AI to do something, and when the AI doesn't do it the right way, they'll give it guidance. It will remember how to do it that way and keep getting better.

So the engineers will essentially be building how the brain itself works, but just like how you and I have gotten better at things, the AI is going to get better by working for somebody and getting feedback from that person. So the interface that a given self-improvement loop goes through is going to be important, because that self-improvement loop will likely need to interact with somebody who's less technical and just needs to get something done. I think so.

I think it's a reasonable goal to build something that can learn. And honestly, if I can give guidance on how to do a task once and it will always do it that way from then on, that's extremely powerful. I don't need it to know how to do it before I explain it. This has been a fascinating conversation. Thank you so much for joining me. The question we use to wrap every episode is: what do you hope is true about the data, or in this case the AI, space in five years? This is a hope question.

You don't have to know that it's going to be true. I would hope that we'll see more use cases of AI with the goal of helping people better understand each other. This one's less monetizable, so it's less of an obvious, hey, I want to find the company and invest in it. But I think the opportunity is so immense. I would love to look back in five years and go, oh look, I'm so glad we're using AI this way.

And now people can better understand why they disagree with somebody else. This is about creating empathy between people who disagree. Huh, I love it. Is this a thing that you've experimented with, or written or spoken about? I actually did a TED talk on it. A big part of my TED talk was that idea, that AI can help us better understand ourselves and each other, both through using it and building it. All right. We'll link to it in the show notes.

Yohei, thank you so much for joining us. This has been a lot of fun. Thanks for having me. The Analytics Engineering Podcast is sponsored by dbt Labs. I'm your host, Tristan Handy. Email us at podcast@dbtlabs.com with comments and guest suggestions. Our producers are Jeff Fox and Dan Poppy. If you enjoyed the show, drop us a review or share it with a friend. Thanks for listening.

This transcript was generated by Metacast using AI and may contain inaccuracies. Learn more about transcripts.