¶ Intro / Opening
Recording is starting. Hello to everyone in the future. Cool. So I want to welcome everyone to our first ever live recording of the Supra Insider podcast that I co-host with Mark. Mark runs a product leader community called Supra. And I'm a member of Mark's community. That's how we got to know each other. And almost two years ago now, we started our podcast, and we're past like 80 episodes.
And it's been amazing. Lots of fun to run. And Jacob has been on the podcast before. And when we thought about diving into a new podcast topic with Jacob around hiring AI agents, we said, let's use this as an opportunity to test something we've been wanting to do for a while, which is live podcasts. So this should be fun. And I hope everyone enjoys
the conversation. It's going to be different than maybe some usual presentations that you've seen on Lightning Lessons on Maven, but we aim to learn and enjoy and just have a good conversation. And hopefully you all walk away a lot smarter on how you could leverage AI agents by hiring them. And it's kind of an interesting framework. I'm excited to dive into it with Jacob and Mark. Yeah, maybe anything else you want to kind of...
share about what you're doing right now in your current role, Jacob, as kind of like an intro before we start diving in? My day job is that I'm the founder of Relay.app, which is a platform for building AI workflows and agents. But don't worry, this is not an ad for Relay.app. Much of my work is actually teaching classes
on AI agents. This includes specific techniques like how to create voice agents or how to use MCP or how to use knowledge bases. It includes teaching classes on particular use cases like how to build a user research agent or how to build a meeting follow-up agent. And so I spend a lot of time teaching content, working with customers on AI agents. And so I'm hoping to share a snippet of the most valuable PM use cases I've seen here. And Jacob, I'm curious, how much time do you spend
on the teaching side versus you actually building workflows and tinkering yourself? How do you think about that time distribution? A lot of both. I'm the number two active user on our platform. The only person ahead of me is our head of product. So I spend, I would say, at least three hours a day, every day, building workflows for myself and then another probably two or three hours with customers. And then in the other 50% of my time, I'm teaching. Love it. Yeah. So.
You know where all the bodies are buried with the AI workflows. I've failed with many and succeeded with many. Cool. Well, we're excited to dive in together. The agenda of what we want to talk about today, we kind of wanted to break this down into a few sections. So first, why we think, and I think Jacob's really going to have a chance to get on his pedestal, uh, I'm sorry, his podium. Jacob, don't get on your pedestal. Get on your podium and tell everyone why,
in the future, people are going to have teams of AI agents. What's the world we're heading into? I think that's going to be really important. Why you're going to need to get really good at figuring out the job descriptions for these AI agents, almost like writing a job description for a human that you're going to hire, is going to be a really important skill to have in building these workforces. And ultimately,
we want people to finish the session walking away with some very concrete ideas for what kind of AI agents they might want to hire, what some of the maybe low-hanging fruit or the default standards are that you're seeing be most helpful for product people these days,
and that should hopefully give people a good starting point for what they can do next. So at this point, I'm going to stop sharing my screen, and we're going to shift into conversation mode. So yeah, let's start with... I think, Jacob, the thing I'm most excited about kicking off with is you've got, I think you had a post go viral around these org charts of AI agents that you have working for you at the moment.
It was a lot. How many AI agents do you have? And then how did you decide to have an org chart of agents? And why do you think that's important to have? So right now, our team is 10 people total, 10 human employees. Everyone on the team is a product builder. We have six engineers, two designers, a product manager, and myself. My background is also in product.
¶ Why this session focuses on "hiring AI agents like employees"
And we have well over 300 AI agents working for us. We have no full-time employees, or employees of any sort, in marketing, sales, customer support, customer success, user research, competitive research, data analysis, finance, operations, HR. All of those functions are staffed by AI agents that are either managed by me in most cases or managed by another member of the team.
And I know, you know, we've all heard that 2025 is the year of the AI agent and everyone was using the word agent to describe everything in their product, whether it was a chatbot or a copilot or a browser tool. And so I think we might have crested the max hype cycle of AI agents and are now down in the valley. But what I want to share with people, if you take nothing away from this call, it's that it's real.
I promise it's real. I promise it's real. Because if we were building an old style company, we would need to have at least two times as many people, maybe three times as many people. And for me, there was a huge shift in mental model that I had about nine months ago that unlocked all of this potential: to understand what an AI agent actually is, as opposed to other AI tools that we use,
how to think about what jobs are appropriate for an AI agent, and then how to actually train and build the AI agent to do that job. And I'm not saying every single AI agent I've built has worked, but a lot of them have, and it's really amazing. And one of the other things I wanted to note here is that I think product managers in particular, as a function, are way behind in AI agents, way behind. I'm seeing marketers and salespeople and support people adopt AI agents way faster.
I don't know exactly why that is, because I think product managers have been very fast at adopting other AI tools like chatbots or copilots. Like every product manager I know is using Gamma for making slide decks, or Lovable or v0 or Figma Make or Magic Patterns for making prototypes, and everyone's using ChatGPT and Claude and projects, and many people are using Cursor and Claude Code. So
I don't wanna say that product managers are behind on AI overall, far from it. I think product managers are leading the way in using AI tools, but are behind in this mental model shift to AI agents. And that's what I'm excited to talk about here, like why I think PMs are behind and then how we can catch up to the functions that are actually embracing AI agents faster than we are. Maybe to make this more concrete, Jacob, do you have maybe an example of like what an...
org chart might look like with AI agents, like maybe from your company, either research or marketing, just so we can be like, what is this? Let me just show you. I'll screen share quickly so people who are live can see it. I'll send this out afterwards for people who are just listening. But let me show you a few examples. The one that went most viral about four or five months ago is my marketing org chart.
My marketing org chart at the time had about 40 AI agents. Now it has about 70 AI agents. And it's going to be a little hard to see, but let me just walk you through it from side to side. I've divided our marketing org chart similarly to how many marketing organizations would divide their org chart: by channel. And so we have community, we have education, we have email marketing, we have our template gallery and user-generated content, we have social media marketing,
which I've split out into LinkedIn, Reddit, YouTube, and X. We have an active SEO motion, and I think there's one more on the right. Oh yeah, which is our live webinars, like this one. In each of these cases, I had a moment where I was thinking,
I really wish we were doing more on Reddit. I wish we were doing more marketing on Reddit. Reddit's so important for SEO, for AI engine optimization. It's one of the few places where people actually share their real feelings about products as opposed to like overly curated fake reviews. I wish I could hire a marketer to help me just with Reddit.
And then I thought about, okay, what would that person do for me? What would a Reddit-focused marketer do? They would scan the relevant set of subreddits for mentions of our product to see if people are saying good or bad things about our product. If they're saying good things, we probably want to upvote. If they're saying bad things, we probably want to respond,
clarify, and help the user out. I'd want the Reddit marketer to scan for mentions of other products in our space so they can see threads that are relevant and how people are reacting to our alternatives or our competitors. I'd want them to look in relevant subreddits for our ICP users who are expressing a need, like, ah,
I wish I could automate my podcast notes. And then maybe we want to chime in and say, oh, here's a template of how you can automate your podcast notes. And then maybe I'd want my Reddit marketer to draft original content on AI explainers. And I thought, oh, well, instead of actually
immediately reaching to go out and hire and interview a contractor for that, could I build an AI agent to do all of those things? And you're not going to be able to see it exactly on this chart. But what I ended up building was a set of workflows under this Reddit AI marketing agent that do exactly the four tasks that I mentioned. And it works incredibly well. And so that's kind of the mindset I brought to... Sorry?
Just to interrupt for a second. So like if we think about this Reddit use case, so how many boxes did you have under that Reddit in the org chart? So in the Reddit org chart, I had five boxes. And the way I think about it, and I've kind of adapted my mental model over time because there's all this confusion about what is a workflow and what is an agent and what are the levels at which you think about the work AI is doing for you. But the basic way I think about it is...
To create an AI agent, you start with a job description. And the job description of an AI agent is not exactly like the job description of a human. For those who have been managers and written job descriptions, you usually have to write stuff like, based in North America and has a Bachelor of Science and has six-plus years of C++ experience, like that.
A, I don't think those should be in human job descriptions either, really, because those are not really relevant to the work. But B, you definitely don't need those in AI agent job descriptions. AI agent job descriptions are very simple. They include a one-phrase, like two- or three-word title, like Reddit-focused marketer or...
In the context of PM, maybe you want to hire a user researcher or maybe you want to hire a data analyst or maybe you want to hire a competitive researcher or maybe you want to hire an executive assistant. So you write a phrase of, I wish I could hire an executive assistant or I wish I could hire a project manager.
Then you're going to write down the responsibilities of that agent that you're hiring. And when you write down a responsibility, the framework that I like to use for these responsibilities is what,
when, how. So for example, every PM that ever worked for my team when I worked at Google was like, hey, can I get an EA? I really wish I had an EA. I'm in meetings all the time. I wish I had an EA. And then I would ask, what would your EA do for you? Why do you think you need an EA? And the...
And even when I was talking to people who were thinking about working with a human employee, I would always say, what, when, how? So for example, the first thing everyone asks for their EA to do is schedule my meetings: when I get a new scheduling request from someone that I meet with, go look at my calendar to find available slots this week, and reply with a polite email
sharing a few slots. So what: schedule meetings. When: in response to scheduling requests coming in. How: look at my calendar and suggest good times. Another example: triage my emails. When I get a new incoming email, use AI to determine whether it's an important one I need to reply to, and if so, label it appropriately or maybe ping it to me in Slack to make sure I don't miss it. And so that's,
I mean, it sounds so stupidly simple when you put it that way, but it turns out that if you layer on these very simple task-oriented individual responsibilities and you have four or five of them, you end up with a pretty high-functioning EA, because you say it's going to schedule my meetings, it's going to triage my emails, it's going to brief me before meetings, it's going to write follow-ups after emails, it's going to remind me about prioritizing my tasks, and
pretty quickly, even though each of those individual workflows is so simple and can be articulated in one what, when, how sentence, you end up with a very powerful teammate-like agent that you've created. So if I think about that executive assistant use case that you just mentioned, there's a few different tasks, right? Like one is like the scheduling, one is like the email triage.
So in the org chart analogy, would each of those tasks get their own box or would the EA be a box that does multiple tasks? Yeah. And this is where I've gone back and forth. I originally really recommended people have task-specific agents that were workflows for specific tasks. So an agent in my world of six months ago did not correspond to the human employee, but corresponded to the specific responsibility of
replying to my emails or triaging my emails. Now the way I think about it is you have a top level agent that maps to the human role that exists within a company, and then you put workflows or responsibilities under that agent. And a typical agent might have anywhere between three and 10 responsibilities. Makes sense. Maybe now following almost the analogy of AI agents as human employees, it sounds like you've grown your agent team from like 40 to 70.
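One way to picture the structure described here (a top-level agent that maps to a human role, with what/when/how responsibilities under it) is as a small data structure. The sketch below is only an illustration: the class names `Responsibility` and `AgentJobDescription` are hypothetical, nothing here is Relay.app's API, and the example wording is paraphrased from the executive assistant discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class Responsibility:
    """One task for an agent, stated as what / when / how (hypothetical type)."""
    what: str  # the task itself
    when: str  # the trigger that wakes the agent up
    how: str   # the steps the agent should follow

    def sentence(self) -> str:
        # Collapse the three parts into one readable sentence.
        return f"{self.what}: {self.when}, {self.how}."


@dataclass
class AgentJobDescription:
    """A top-level agent that maps to a human role (hypothetical type)."""
    title: str
    responsibilities: list[Responsibility] = field(default_factory=list)

    def describe(self) -> str:
        # First line is the short title, then one bullet per responsibility.
        lines = [f"Role: {self.title}"]
        lines += [f"- {r.sentence()}" for r in self.responsibilities]
        return "\n".join(lines)


executive_assistant = AgentJobDescription(
    title="executive assistant",
    responsibilities=[
        Responsibility(
            what="Schedule my meetings",
            when="when a new scheduling request comes in",
            how="look at my calendar and reply with a few available slots",
        ),
        Responsibility(
            what="Triage my emails",
            when="when a new email arrives",
            how="decide whether it needs a reply and label it accordingly",
        ),
    ],
)

print(executive_assistant.describe())
```

Keeping each responsibility to a single what/when/how sentence mirrors the point made above: the individual workflows stay simple, and the agent is just the role that groups them.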
And the question that came to mind for me is like, how do you think about like performance management, if you will? Like, how often do you check not just if the task is done, but if it's actually reaching the outcome that you want it to, right? Like, for example,
you know, the EA is maybe saving you time, right? Or the Reddit one is, you know, driving more traffic to your website. How do you think about managing performance and how often does that happen? And how often are you firing agents that are not getting the job done? How do you think about that part of the job? Yeah, it's a very good point. So,
the overall number, and maybe I should be more careful about the phrasing here, because the overall number has grown from 40 to 70: those are the task-specific workflows, and maybe I've grouped them into like eight or nine top-level agents. So I'd say the net number has grown from 40 to 70.
But about half of the original 40 are no longer relevant, either because we stopped focusing in that area, like I turned down a bunch of my SEO workflows because we're not focused as much on SEO right now, or in some cases, I found that some weren't working as well as I anticipated, or some have evolved into other use cases. And so I'll give you some very concrete examples of the kinds of maintenance I do once I've created these things.
One of the first agents I created for myself was the executive assistant. One of the first tasks I gave it was to triage my emails and draft replies. I found, personally, that it was excellent at triaging the emails. Every day, the things in the "to reply" label were exactly the things I needed to look at first. The things that it gave the "to read FYI" label and snoozed till 6 p.m. were exactly the things I wanted to read at the end of the day.
And so it performed very well at the triaging task. I did not find the auto replies saved me time. By the time I would have to review the draft and edit it to be in my voice and pull in some context that the AI couldn't have possibly known that I knew, the AI drafting didn't save me time. And so that was an example where I tried to push the boundary and have
the AI agent do as much of the task as it could for me, as much as my email manager could. And then I had to pull back because I found that the fact that it was drafting replies wasn't actually saving me time. It was just adding friction. And this is a very, very common experience I've had, where I try to push to do the complete use case and I realize that it can't go all the way, and then I pull one step back. Another example would be, um,
I have an agent, and this is useful for not just marketers and founders, but also PMs. I have an agent that listens to the subreddit. Like I mentioned, it listens to subreddits where people are talking about automation. And if it finds a post where I might have something useful to say, I might want to respond. So the first thing I tried was to go all the way end to end. Detect the post. See if it's something I should reply to. Draft
a reply, and then post it automatically. And I just couldn't get those replies to meet my quality bar. And certainly, you know, on Reddit, if you've ever interacted on Reddit, the quality bar for commenting on Reddit, it's not like LinkedIn. The quality bar is super high. You will get banned from subreddits very quickly if people suspect it's like low-quality AI slop.
And so I decided to just pull back slightly and say that it was doing a great job of monitoring the subreddits. It was doing a great job of filtering down to the conversations that were interesting. It was doing a great job of suggesting a couple bullet points I might want to mention in my reply, but I stopped short of having it go all the way
to writing the reply. That's a very common scenario for me where I find that it can't get 100% of the way there. It might get 90 or 80 or 70% of the way there. And then I use myself as the human in the loop to complete the task. Got it. One thing you said earlier is it's real, which I think implies that there's a bunch of people, maybe the status quo belief is that this stuff is not real.
I just wanted to hang out on that for a sec. What do you think is the reason for this kind of skepticism that maybe necessitates the statement that this is real? Because no one has to say this is real about, I don't know, an iPhone app. Like, why would they have to say that about agents at the end of 2025? I think the biggest reason agents have been a difficult conversation or why there's skepticism about them is because...
There's no shared definition. Everyone has projected their hopes and dreams onto AI agents and people speak about them in very different ways. So for example, OpenAI launched a product called Agent, which took action synchronously in the browser based on a prompt and didn't really work for anything beyond the demo use cases that they showed. So that not only I think gave people a misleading view of what an agent was, it really
caused a lot of skepticism of what AI agents are capable of doing. And so what I've found to be most helpful is to define AI agents in relation to the other kinds of AI tools we're using successfully. Because it's really a huge mental model shift. This is another reason why I think people haven't gotten there yet.
I'm going to make sort of a bold statement here, which is, I don't think using ChatGPT is that big of a mental model shift for people. We've always had a website called Google that you go to and you type words into a text box and you get very good answers from the world. Now, that text box was super limited in the kinds of queries you can put in, and often the answers required you to go click on some blue links and do the research on your own.
But we fundamentally, when we have a question, know how to visit a website and type text into a box. And don't get me wrong. ChatGPT is an amazing box. It can do many, many, many things that Google couldn't do. It's expanded our minds into what we can do in a text box, chat-based interface. But we all have a mental model of going to visit a text-based research tool. Similarly, we all have a mental model of going into an IDE.
or going into a prototyping tool or going into a web building tool or going into a slide building tool. Like before we had Gamma, we had Google Slides. Before we had Cursor, we had, you know, Visual Studio or JetBrains or whatever. We know what it's like to go sit in a synchronous productivity tool and do something. And now, don't get me wrong again, the AI-empowered versions of these are amazing. Like they are a multiplicative impact on productivity,
but they still fit the same mental model. And I think that made them very easy for people to adopt. If you're a PM who is used to using Figma, it's super easy to adopt Figma Make. It's in the same place. You know, you already have a mental model: I go to
a tool to make websites or to make mocks or whatever it is, and now I just use natural language-based prompting instead of clicking around with buttons. Agents require a significant mental model leap that I don't think everyone has had exposure to, which is an AI agent is not, and this is where I think OpenAI did the world such a disservice by calling their thing an agent and why I think other products that are calling their chatbots agents are doing the world such a disservice, because
that's not an agent, to me, going to a place and chatting. No knock on Airtable. They have a little Airtable agent, which like makes changes to your database. That's not an employee. Like having to go to Airtable and talking to this little text box, that is not the mental model of an employee. That's what you'd call a copilot, right? That's a copilot.
That's a copilot. It's a super useful copilot. It's a natural language experience that helps me understand and manipulate an artifact within a specific tool. But an agent is a very different mental model. An agent is not a place where you go to do work. An AI agent does work on your behalf while you are sleeping, while you are awake. And so the three main characteristics of an AI agent are: it's proactive. It wakes up automatically on its own. It's deeply integrated into your tools.
It actually writes the emails, it reschedules the meetings, it updates the rows in Airtable. And then number three, and most important, it does tasks repeatedly over time and gets better at them. One of the areas where I've seen a lot of people fail with agents is they come to me and say, I really want an agent to help me with my product strategy for this quarter. That is not a task you need an agent for.
Go to a chatbot, go to ChatGPT, like go to ChatGPT and have a synchronous conversation about your strategy. Then go to Gamma and turn it into a slide deck, or Lovable and turn it into a prototype. You don't need an agent. You don't need a proactive, integrated, repeatedly improved assistant to do that. You need a one-off AI-powered tool. And so that's where the mental model of building an agent is not at all like a productivity tool.
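The three characteristics named above (proactive, integrated, repeated) can be read as a quick checklist for whether a task wants an agent or a one-off AI tool. The sketch below is a deliberate simplification under stated assumptions: the `Task` type and `recommend` function are hypothetical names, and treating each characteristic as a yes/no flag is an assumption made for illustration, since real tasks are rarely this clear-cut.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A candidate task, scored against the three characteristics (hypothetical type)."""
    name: str
    proactive: bool   # should it start on its own from a trigger?
    integrated: bool  # does it need to act directly inside your tools?
    repeated: bool    # does it recur, so it can get better over time?


def recommend(task: Task) -> str:
    """Suggest an agent only when all three characteristics hold."""
    if task.proactive and task.integrated and task.repeated:
        return "agent"
    return "one-off AI tool"


# The two examples from the conversation, with flags set as an assumption.
strategy = Task("Product strategy for this quarter", False, False, False)
triage = Task("Triage incoming email", True, True, True)

for task in (strategy, triage):
    print(f"{task.name}: {recommend(task)}")
```

Requiring all three flags is the strictest reading of the description; a looser rule would be just as defensible.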
You cannot think of an agent like a productivity tool. If you think about an agent like a productivity tool, you're going to end up in the old box of I go to a website and I type some stuff in and some stuff comes back on the screen. You really need to think of an AI agent more like an employee with very limited skills or an employee that can only do things in particular SaaS apps that it has access to. And that's when it kind of unlocks the responsibility. And so for example,
I've had a bunch of PMs. I know a ton of, like I said, PMs are using, I think, prototyping tools very effectively. PMs are using chatbot tools very effectively. But I bet you every single person who works in a large company is still going to their manager and saying,
I wish we had a UX resource on this project. I wish we had a project management resource on this project. Can we get 20% of an analyst on this project? Ooh, can I have an executive assistant? I just got promoted to senior PM, so I have a lot more meetings now, so I should probably have an executive assistant. And they're still reaching for the old bag of tricks, which is begging for headcount, begging for headcount, begging for headcount. And
I just want to say that the new bag of tricks is here. It's ready. You don't need to beg for a user researcher. You can build a very good user researcher on your own. Now, it'll be a different kind of user researcher. The AI agent that you build for user research, what can it do for you? The AI agent for user research will be very good at looking at all the support tickets that have come in and identifying usability issues in the product based on the support tickets.
The AI user researcher will be very good at looking at all of the existing customer and sales calls and analyzing the transcripts for patterns and top quotes. It'll be very good at writing a weekly consolidated voice of the customer report that includes both qualitative and quantitative. It is not going to do a live usability test. At least not yet. At least not yet. And so...
There are limitations that you have to understand about what AI agents are capable of and what they're not capable of. But for most PMs I've talked to, when they've talked about one of these roles, like 70 or 80% of what they want can be accomplished just by an AI agent doing knowledge work in your SaaS tools. So try that first before begging for headcount. Yeah.
I mean, I think another kind of failure state that I see is that you apply the AI agent to the wrong use case. Either it's something that, you know, it doesn't save that much time and you end up spending more time building the agent than
you know, the time that it actually saves you, right? Or maybe you're trying to have an agent access a tool where maybe the APIs are just not really well designed, or they're very protective of the data, and then you just give up. So I think that's something that I've... Here are a couple of super common failure modes I've seen. One case is it's a one-off task. You don't need an agent at all. Just go to ChatGPT. Another case is like,
¶ What tasks
yeah, it happens repeatedly, but the juice is not worth the squeeze. Your quarterly planning process just doesn't happen that often. It's so complex to build an agent for it that it's not worth it. Then the third is, AI agents are not magical. And so, for example, I was working with, this is not as relevant to PM, but I think the analogy will hold. I was talking to a customer who
sells to local restaurants, and they wanted to find the personal cell phone of the manager of a given local restaurant. And I was like, well, can you do that? Like, it's not on the website. Like the website has
the phone number of the restaurant, not the manager's personal cell phone. I'm like, if you, searching on the public web, cannot find the manager's cell phone number, like there's no way the AI agent is going to be able to do that either. Like it doesn't magically have access to every phone number in the world. So that's another failure mode where people think like,
Oh, I failed to do it. Therefore an agent might succeed. That's rarely the case. Usually you've got to prove that you can do it first before an agent will be able to do it. And then another super common failure mode. And this is the most subtle one.
And this, I think, Mark, we were talking about this use case yesterday, or Ben, we were talking about this use case yesterday, where you were like, I have 800 connection requests in LinkedIn. I need to figure out like who I want to accept and who I don't want to accept.
And it's a perfect use case for an AI agent. It's proactive. The AI agent can wake up whenever you get a new connection request. It's integrated. It can do the action directly in LinkedIn. It requires intelligence and repetition to get better over time. Wah, wah. LinkedIn does not allow
any automation of connection requests and DMs. And so you could build an AI agent that impersonates your credentials and does it for you, but you may get your account shut down, which is probably not worth it. And so that last one is super subtle, but typically only more advanced agent builders run into that. Yeah. And what I was going to say is like one of the things that I,
even though I know AI is not magic, some of the demos that I've seen around these agents are that they can take over your browser and they can just like do stuff on your browser on your behalf. So it's like to the point of like, you know, this is definitely something I'm curious about. I want to get smarter on this if you can school me on it for a sec. Why can I not use agents on my local browser
while I step away? Why can't I go walk my dog for 30 minutes and have an agent go through my connection requests at like lightning-fast speed? And then just compile a report for me at the end about basically summarizing who is trying to connect with me on LinkedIn by going into my already logged-in LinkedIn browser in Chrome. Like, why is that not possible? Yeah. So that particular experience,
I think, is possible today. Browser use, like the quality of AI models that are using browsers and using computers, that is a rapidly developing sub-area within AI. It's way better than it was a year ago. A year from now, it'll be even way better than it is now.
I think the simple task of load this exact tab on LinkedIn and then either press the checkmark button or the X button, depending on some criteria, that is totally technologically feasible. That's just a cat and mouse game because it's against the LinkedIn terms of service.
What if I didn't want it to take any action to accept or deny, but I wanted it to just basically go into the profile of every one of those people and then match their profile relative to some criteria that I provided? Again, this is an area where we're going to have to see how LinkedIn evolves their terms
of service over time. Right now, they have said it is against our terms of service to let an automation using your credentials scrape or interact with any data, both read-only and read-write. And so that is a thing that's like totally technologically feasible, and by the way, you could probably even program your agent to do things at the pace a human would and avoid detection, but that is explicitly not allowed
by the platform, if they find out. Man, I basically, I just hate this flight booking demo and the OpenTable reservation demo so much. I hate these demos so much. They're so the wrong use case. They're low value. They're infrequent. They're really hard to get right. They're very subtle. Like I never want to see another OpenTable browser use demo again. But I think a more realistic and very valuable
demo that I have not seen browser use models succeed at so far is this: let's say I want to prepare for a sales call. To do that, I want the AI to go to the right record within Salesforce. I want it to look at all the related records, like the company record and the deal record, I might want it to go search on LinkedIn for that person, and then I want it to look at my past emails. I have not found browser use models, controlling the mouse and the keyboard, to be nearly good enough at that task yet. That may change a year from now, but it turns out we have these magical things called APIs that the AI agent can use to talk directly to the data. And so that is a use case that is trivial to build with an API-based AI workflow agent: go talk to Salesforce, go talk to Gmail, go talk to my meeting recording system, write a briefing. But it's not good for the browser yet. And so my general policy is: if an API exists, use it. Definitely use it. APIs are built for computers to talk to things. If an API doesn't exist and the task is, quote, very simple,
try a browser automation. If the task is not, quote, very simple, it's probably not going to work. But I'm updating my perspective every day, because I used to even just say, if there's no API, don't even bother. And now I have seen browser use models be able to do very simple tasks: navigate to this thing, put in my username and password, pull down the first three articles. That level of thing is totally doable now. Nuanced search, pagination, going back and forth between multiple keywords or tags, that I haven't seen succeed yet, but I think we're months away, not years away, from that working. So it's a moving target.
Yeah, I think the best use case for browser agents is using them for QA: basically, every time I push code to production or to beta,
maybe use the Playwright MCP to test that feature and take screenshots. That's actually a great use case that works really well. But yeah, all the other ones you've described are a bit harder.
And that's actually a very good use case for an AI agent because it's proactive, integrated, and repeatable. Every time I push a new feature to staging, automatically wake up the browser actor, tell it to press these 11 buttons that are related to this feature and also 12 buttons that are not related to this feature, and write me a report in a Google Doc of what went well and what didn't, with screenshots. I don't know if that's 100% possible today because of the I/O between all those systems. It'll kind of depend on the details, but that is conceptually a great fit for an AI agent.
I was thinking maybe, Jacob, to make this more concrete, I think what would be really helpful for people is: let's pick one agent that you think every PM should have. Let's go through the process of writing the job description. Maybe, if we feel brave, build it out live using Relay. And then just see how...
See where that goes. Yeah, perfect. And if you want to just share your screen as you're running that. Awesome.
Okay, so this is what I mentioned before about how to write a job description. Super simple: a three-word name and then four bullet points of responsibilities. And each responsibility should be in the what, when, how framework. The reason I like what, when, how is that when we go to translate these into automated AI workflows later, it's going to be very natural to take the what, when, how and turn it into a trigger and then subsequent AI steps and actions. So here's an example responsibility. This is not an agent, but an example of a responsibility. What: write meeting follow-ups. When: a meeting ends. How: use AI to review the transcript and send a message in Slack. This is less relevant to PMs, so I'll skip it. Another example of a responsibility. What: analyze YouTube videos. When: whenever a competitor puts out a YouTube video. How: analyze it and send a summary to Slack. I expanded it from five to six; I can send out this deck afterwards. And then you can build those into AI agents. So the one I wanted to show was a competitive analyst. This is something that always kind of falls by the wayside because, of course, you don't want to be overly obsessed with your competitors. You want to focus on your users and your product.
But it sure is nice to know what's going on with them, and it can be pretty time-consuming to track. So here are the four things that I have my competitive analyst do. Number one: the what is that it tracks my competitors' pricing. The when is every month; I don't think they change prices more than once a month. So every month it checks their websites, pulls up their pricing pages, and analyzes their pricing changes. This has given me a bunch of insights into how people are making their free tier more generous or less generous, or raising the price of a basic tier, or collapsing other tiers. Second, every time your competitor launches something, you want to know about it. What was the idea behind that launch? How good is the feature? Do you want to go play around with it yourself? So every week, look at their major launches from their website and summarize them. Number three, and this is maybe the most important one of all of these: I want to get actual quotes of what people are saying about my competitors' products. Not what they're saying in their marketing materials, but what the users of their product are saying, because those are either things I can learn from, to match what my competitors are good at, or, if they're things they're not doing well, things I can emphasize in my marketing. So every week, go to wherever the most meaningful place is where competitive reviews are accessible. It could be G2, could be Reddit, could be some other forum, Trustpilot, whatever it might be. And then synthesize the reviews. I use Reddit for this mostly, because people are really active on n8n and Zapier on Reddit.
And then number four. This is kind of relevant to us, and maybe it's relevant to other PMs too. Because we're a horizontal product, it can be really hard for us to define a super crisp ICP, because we actually have many different kinds of target customers: different functions at different sizes and shapes of businesses. And so I'm always looking for the pockets of horizontal usage where our customers are succeeding most. So for example, one of the responsibilities of my competitive analyst now is: every month, look at the competitors' websites, see what reference customers or case studies they're showing, or what quotes they're including from their customers, and then send me a report like, oh, I think n8n is really focusing on marketing teams right now, because here's this case study they just did with the blah, blah, blah marketing team. So this is an example of why I think every single PM needs a competitive analyst. These might not be the exact four responsibilities that resonate with you, but I think they're pretty general. These are the kinds of things you can have an AI agent take off of your plate.
How do you decide where you want to receive the outputs from these?
For me, it's pretty much always the same. The asset that it creates is a Google Doc or Google Sheet, depending on whether it's more of a text-based report or a structured report. And then depending on the content, I usually get it sent to myself over email or to a shared Slack channel
with the rest of the team. So if it's just for me personally, I usually have it delivered to my personal email inbox, just because I use my email inbox as my to-do list. But if it's relevant to everyone, like competitor pricing, I post it to our competitor channel. Actually, we have a competitive research channel on Slack, and everything here goes to that channel. But for example, my executive assistant that's working mostly with me is mostly interacting with me in a private Slack channel that I have with just my executive assistant, or sending me emails, depending on the context. Like, I don't want to get my meeting follow-ups over Slack. I want those to be drafts directly in my Gmail so that I can just, in one click, go into Gmail and hit send. So whenever possible, if the AI agent is taking action on my behalf, I want it to create a draft or do an action directly in the underlying tool and then send me some confirmation. And if the AI agent is aggregating information and sharing it with me, it's pretty much always a Google Doc plus a summary in Slack. My super common pattern is: make it a Google Doc, also make a friendly five-bullet Slack summary, and then send me a Slack message with it.
Yeah. So it's almost like adapting to the way you work and where work happens. And I think that also probably makes it just easier for you to use, and helps with that adoption curve. Yeah.
I saw a really good question in the chat. Can I answer this one quickly? Because this is really important, which is that a lot of times AI agents need to keep track of state, like in the case...
Jacob, can you repeat the question?
Oh yeah, the question here is from Nils: how do you tell the agent to compare the previous version to the current version? And this is an example where you can really leverage the fact that AI agents can have state and memory. They can remember their past work. In most cases I use a spreadsheet for the memory of my AI agent. The spreadsheet basically has columns that represent the previous versions of the pricing. And each time I ask my AI agent to look at the pricing page, I say, compare this to the previous version and only alert me if there's been a material change since the last version.
Makes sense. And I'm really curious, Jacob, to maybe like
go deep into one of the responsibilities of, let's say, the competitive analysis agent. You could choose which one. For example, I'm curious, how specific do you get? Do you tell your agent, these are my competitors? Or do you just say, hey, you work for Relay.app, you have to figure out who my competitors are, and you also have to keep track of whether there's a new competitor and let me know?
That's a judgment call. I've seen people do it both ways. For me, if I don't think something is changing that often,
I don't make my AI do extra work. Extra work adds extra latency, extra cost, extra unreliability. So I've got my list of seven competitors, I just tell the AI to look up these seven, and if a new competitor comes up, I'll update those workflows accordingly. I basically have a table in Relay that captures my competitors, which I can easily update over time. But my general rule is that I've had the best results when I keep individual responsibilities very focused and simple. The more you try to chain together unreliable actions, the more your overall performance will suffer. So I've had by far the best results with a lot of very simple responsibilities.
The intern analogy, I guess, applies there too, because a human intern would have no problem taking seven companies, going to their pages, looking at stuff, and reporting back on it. But if that intern also had to figure out
if a new competitor even pops onto the scene, that sounds like a more complex task too, right? Because they have to define what a competitor means.
Exactly. So let me agree and disagree with you there. The thing I'm going to disagree with is the idea that AI is not smart enough to be more than an intern. I actually think in many cases these AI models have superhuman intelligence in certain areas. For example, I've built an AI sales coach.
I don't know anything about sales. I've never done sales before in my life. And I have my AI sales coach review all my transcripts and all my emails and give me pointers on how to do better. And that's not something you'd ask an intern to do. That's something where I'm clearly deferring to the superior wisdom and expertise of the AI.
The point I wildly agree with you on is that the more complex the task, the harder it is to give good instructions. Even if you're hiring a senior marketer on your team and you say, find all the competitors and look at the stuff, if they're new to your company, they're probably going to come up with a bunch of companies that you don't really want to track. In our case, they'll come up with OpenAI and Anthropic and Google and Glean and everyone who's ever put "agent" in any of their marketing materials, when in reality it's a much narrower set of products that I know to track, because I've already looked through all those other products and decided they're not worth looking at all the time. So one of the reasons I like these simple tasks is that it makes it way easier to give good instructions. When you're a manager, your primary responsibility becomes: how well do you articulate to your report the context they need to do their job and what success looks like? In the same way, those are your primary responsibilities for your AI agent. And so I just like to make my own job easier by making it simpler to communicate the context and what success looks like.
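The earlier point about chaining unreliable actions can be made concrete with a little arithmetic. This is only a sketch: it assumes each step succeeds independently with the same probability, and the 95% figure is an illustrative assumption, not a number from the conversation.

```python
def chain_success_rate(per_step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes the steps fail independently of each other, which is a
    simplification made for illustration only.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Hypothetical per-step reliability of 95%.
    for steps in (1, 3, 7, 15):
        rate = chain_success_rate(0.95, steps)
        print(f"{steps:>2} chained steps -> {rate:.0%} end-to-end success")
```

Under those assumptions, one long responsibility made of many chained steps fails noticeably more often than several short ones, which is the case for keeping each responsibility focused.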
I'm seeing another good point from Niels in the chat. You could totally add another agent that every month looks for new competitors and brings them to me and says, hey, new competitor Bricks just launched. Do you want me to add them to the competitor set for the future? Then I would go look at their website and say, oh yeah, they're relevant enough that they're worth adding, or, thanks for letting me know, let's keep an eye on them for the future, but I don't want to add them to every single competitive analysis right now.
Sorry, Mark, just one quick follow-up and then go ahead. Jacob, do agents in Relay have the ability to take your feedback to them
in the context of their job and then go and modify another agent's workflow? Like, does the one whose job it is to surface new competitors have the ability to go and basically tell the agent that's looking at the set of competitors you've already given it to add an eighth competitor to that list?
Yes. And the way they do that is they write to the same knowledge base. So it's not explicitly that agent one gives agent two new instructions. But agent two, the one that looks up the LinkedIn posts, for example, has a database of competitors and their LinkedIn profiles, and agent one can add a new competitor to that database. So it's implicit collaboration, because they work on the same artifacts and knowledge base, not explicit collaboration like, hey, change this thing about the way you're doing your job.
Got it. So basically, in the competitive analysis example, if there's a spreadsheet with a row for every competitor and a column for every snapshot in time when it runs the job, then you can have the agent whose job is to monitor new competitors add an eighth row. And then the other agent will just naturally run with it: if there's a row here, go do these jobs.
Exactly. That's a super common pattern.
Yeah, there was a question in the chat that I also really liked. That's like, how do you think about
agents improving over time, getting smarter, and reacting to feedback? Is it kind of a combination of the models getting better, so the agents get better, and also you updating the instructions and the scope of the task? How does that work?
I don't think any product, including Relay.app, has really figured this out very well yet. Right now, there are two ways an agent can get better over time. One is that you have some underlying knowledge base or instructions that you modify over time, like the example we talked about of adding a new competitor. The second is that you manually go and edit the prompt. This is super common. Every time Relay sends me a report in Slack, it has a link to the underlying workflow. And I'm like, oh man, that report was way too verbose. So I follow the link, I go to the copilot within Relay.app, and I say, hey, you gave me a report that's way too verbose. Can you make it less verbose in future? And then it'll go into the correct AI step, it'll go into the correct prompt, and it'll add something like, hey, make this no more than five bullet points.
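The knowledge-base route mentioned above, where a spreadsheet acts as the agent's memory and a second agent can add rows to it, might look roughly like this. It is a minimal sketch under stated assumptions: a plain dictionary stands in for the spreadsheet, the names `add_competitor` and `record_pricing` are hypothetical, and the 5% threshold for a "material change" is made up for illustration. None of this is Relay.app's actual API.

```python
from typing import Dict, List, Optional

# Stand-in for the spreadsheet: one "row" per competitor,
# one stored snapshot per run of the pricing job.
Memory = Dict[str, List[Dict[str, float]]]


def add_competitor(memory: Memory, name: str) -> None:
    """What a 'new competitor' agent would do: add a row if it is missing."""
    memory.setdefault(name, [])


def record_pricing(
    memory: Memory,
    name: str,
    prices: Dict[str, float],
    threshold: float = 0.05,  # hypothetical cutoff for a "material" change
) -> Optional[List[str]]:
    """Store the latest snapshot and return material changes, or None."""
    history = memory.setdefault(name, [])
    previous = history[-1] if history else None
    history.append(dict(prices))
    if previous is None:
        return None  # first run: nothing to compare against
    changes = []
    for tier in sorted(set(previous) | set(prices)):
        old, new = previous.get(tier), prices.get(tier)
        if old is None:
            changes.append(f"{tier}: new tier at {new}")
        elif new is None:
            changes.append(f"{tier}: tier removed (was {old})")
        elif old != new and (old == 0 or abs(new - old) / old > threshold):
            changes.append(f"{tier}: {old} -> {new}")
    return changes or None


if __name__ == "__main__":
    memory: Memory = {}
    add_competitor(memory, "ExampleCo")  # hypothetical competitor name
    record_pricing(memory, "ExampleCo", {"free": 0, "pro": 20})
    print(record_pricing(memory, "ExampleCo", {"free": 0, "pro": 25}))
```

The pricing job only needs to loop over whatever rows exist, so an agent that adds a row changes what the other agent does without ever sending it instructions.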
What I really want to be able to do, and what I'll hopefully be able to do soon, is just reply directly to that Slack message and be like, too long, make it shorter. And then that will kick off the cycle of it introspecting on the prompt that generated that output and then refining the prompt to make it... But that's a super cutting-edge problem. I don't think any product in our space does that really well yet.
Got it. By the way, there's another question in the chat that I want to make sure we cover before we go, because I know we dove deep into this competitive analyst use case. What are the other super common use cases that you're seeing for agents in the PM space right now? Because obviously PMs aren't generally going to build, like, an HR...
Let me just rattle off the six PM agents that I think everyone needs. I'll send this out in a deck later if you're interested.
¶ The six agents every PM should "hire" today
So number one, competitive analyst, which I already showed. Number two, user researcher: synthesize support tickets, summarize community posts, draft voice-of-the-user reports. Number three, data analyst: write weekly metrics reports, analyze key metrics changes, research benchmarks, identify trending use cases. Number four, social media marketer: identify posts with mentions of you or your product, summarize sentiment and key quotes, find opportunities to engage with your users, and suggest original content to post to promote your product. Number five, project manager: send regular summaries of open tasks, notify individuals about overdue tasks, produce burn-down charts and progress reports, and send out executive summary emails. And then number six, the executive assistant: triage your emails, draft your meeting briefings and follow-ups, send meeting reminders, and capture your personal tasks and remind you about them. I think those are the six that apply to basically every single PM.
Any agents that you should not have? Like the opposite: what are some things that you think PMs should never delegate to AI agents, because that's just core to the job?
Yeah, I don't think it's so much about being core to the job, because I think we should be using AI to help us even with things that are core to the job. But yeah, don't do it for the one-offs. Don't do it for real... Like, don't ask your AI agent to take your client out to lunch, obviously. Here are a couple of things that people think would work, but don't. AI is not good at list building. If you ask it, give me a list of all the companies that are Series B with marketing leads, AI is lazy. AI will say, here's 10. And you're like, no, no, no, give me all of them. And it'll say, here's 12. And you say, no, give me all of them. Then it'll say, here's 14. So I have not found AI to succeed in list-building use cases. The way you do list building in practice is you have some larger data set that you've purchased from somewhere or aggregated from somewhere, and then AI is very good at filtering down an existing list: given this set of 10,000 companies, figure out which ones meet these criteria. So list building is one where people fail a lot. And then functional imagery, imagery that combines text and pictures. Nano Banana is better than any model has been before; I can do some basic webinar posters with it. But to anyone who's saying you can just use AI out of the box to fully generate a production-quality mock or ad: I don't think so. I think you need a specialized tool for that and a bit of human in the loop. But in general, like,
I hope that when we look back at this recording a year from now, I'll sound stupid saying those things. Because those...
Yeah, but there will still be an element of human in the loop a year from now, though, right? Like, it's hard for me to imagine personally...
Yeah, human in the loop will always exist, for high-stakes tasks and for tasks that AI isn't perfect at. I don't think human in the loop is ever going to go away. But I hope that when we have this conversation a year from now, those things where I said don't even bother with AI yet won't be true. AI will still give you a useful starting point, and then you might need a human in the loop to refine it further.
Yeah. There's actually one more question that I want to get to. We are going to do a little bit more Q&A for those here. If you have to drop, you'll get the recording. Mark Jacob and I are going to... I just realized, Mark Jacobs. That's funny. Mark, Jacob, and I are going to stick around a little bit longer and do Q&A. But before that, one question from the chat, Jacob, that I wanted to make sure we cover. The question was that there are some deterministic
automations that meet the requirements that you outlined, which are proactive, integrated, repeated. At least I think that's what the question is asking. How do you decide when, quote unquote, deterministic traditional automation is the right solution for one of these versus an AI-powered agent?
Yeah. So for me, there's a spectrum between deterministic automations and AI workflows. All of these count in my definition of an employee-like experience, because even if you have the simplest Zapier automation in the world that takes in a Typeform and sticks it in your CRM, if that Zap wasn't doing it, you'd have to hire a person to do it, because it's proactive, it's integrated, and it's repeated.
I actually don't think it needs to be that intelligent to fit this mental model of human-quality work that happens as you sleep. I think there's a spectrum of how much intelligence a task requires. There can be deterministic workflows that require a lot of intelligence. It could look very simple on the front, like: PDF comes in, analyze PDF, write summary. But there's a lot of intelligence that goes into writing that summary. So that's one dimension. The other dimension is how much autonomy you want to give the AI over how it goes about the task. This is the classic workflow versus agentic debate. In a truly agentic experience, you would tell the AI: your goal is to look at Reddit and monitor my competitors, the tools you have available are that you can search Reddit and write Google Docs, go figure it out. A more deterministic workflow would say: here are the seven competitors; for each competitor, run this search on Reddit; aggregate the results of all those searches; then write a report following this format. And in my personal experience with thousands of these workflows and agents, I recommend being more deterministic whenever possible. If you know the steps that are required to produce a good output, why would you make the AI invent those every time from first principles? You're adding risk, you're adding latency, you're adding cost, you're adding unpredictability.
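The deterministic version of the Reddit example could be sketched like this. It is an illustration under assumptions, not how Relay.app or any other tool implements it: `search` and `summarize` are hypothetical callables that you would back with a real Reddit search and a real model call, and the stubs in the demo exist only so the sketch runs on its own.

```python
from typing import Callable, List


def run_competitor_report(
    competitors: List[str],
    search: Callable[[str], List[str]],
    summarize: Callable[[str, List[str]], str],
) -> str:
    """Fixed control flow: search each competitor, summarize, assemble a report."""
    sections = []
    for name in competitors:  # the same steps run in the same order every time
        posts = search(name)  # integration step, e.g. a Reddit search
        if not posts:
            sections.append(f"## {name}\nNo new posts found.")
            continue
        # The only step that needs a model: intelligence inside one step,
        # not in deciding what the steps are.
        sections.append(f"## {name}\n{summarize(name, posts)}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Stub implementations, purely for demonstration.
    def fake_search(name: str) -> List[str]:
        return [f"A post mentioning {name}"] if name != "QuietCo" else []

    def fake_summarize(name: str, posts: List[str]) -> str:
        return f"{len(posts)} post(s) reviewed for {name}."

    print(run_competitor_report(["ExampleCo", "QuietCo"], fake_search, fake_summarize))
```

The agentic alternative would hand the model the goal and the two tools and let it choose the loop itself; here the loop is written down once, and the model is only asked to summarize.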
There are cases where you do need that open-ended autonomy because you don't know how to do the task, typically open-ended research. I don't know how many Google searches I'm going to need to do to figure out what Supra is. That would be very hard to encode as a deterministic workflow. So in open-ended research, it's often valuable to use truly agentic tool calling in a loop. But again, in almost all cases, if you can represent it as a deterministic automation, please do. You're going to have way more reliable and better results.
Yeah, it seems like the 80-20 is just using deterministic automations or workflows.
No, it's strictly dominant. It's faster, better, cheaper. Now, it can sometimes be harder to articulate, and so there's a trade-off in creation time there. But again, if you know that every time a lead comes in, I want to look up their LinkedIn profile, I want to look at their website, I want to analyze those according to some criteria, don't make the AI invent that every time. AI is smart. It'll probably invent it right most of the time. But it's just totally unnecessary cost, latency, and risk. Yeah.
It's almost like, yeah, you probably want to use it if you've never done the task before. Like if you're a PM and you're learning about marketing. But it actually might be better for you to find someone who works in marketing.
Yeah. And what I would say there is: use AI to help you craft a good workflow. But then once the good workflow is crafted and you know the best-practice steps, just run those steps. And that's not to say the workflow is not intelligent. There will be intelligence in many places within the workflow. But that intelligence is prescribed for specific tasks that require it, not for reinventing the whole flow control of the process.
Yeah, which in a way is kind of similar to management principles, right? You're probably going to be a better manager if you've done the job before and you know what great looks like. And maybe the only exception is when you're managing senior execs, or you're the CEO and you're hiring people that are better than you.
It's exactly like the concept of task-relevant maturity in management, and levels of delegation. If you're hiring someone who's never done a task before, you tell them what the goal is, then you tell them exactly how to do it step by step, and you tell them to check in with you at every step. That's like level one delegation. Then there's level four delegation, which is like, here's the goal, here are a couple of steps, you figure out the rest. And then there's level 10 delegation, which is like, figure out the goal and what to do about it. Especially as you're just starting with AI agents and just getting comfortable with them, I would stay in level one, level two delegation mode. Give very precise instructions. Tell it exactly what good looks like. Give it feedback at every step. And then you loosen the guardrails over time as it gets better and better at the task, or as you get better and better at giving instructions about the task.
Another question I want to make sure we hit, because I think it just comes up very often: let's say you work at a very big company where you have a lot of IT restrictions on what tools you can use and also what data you can access and those tools can access.
How do you maximize the value of these workflows, AI tools, et cetera, when everything is just so tightly controlled and so locked down?
I'll give a facetious answer and then my real answer. My facetious answer is that companies who are overly restrictive about what AI tools they're willing to adopt and give data to are digging their own graves. It's just going to be a massive competitive liability. If all of your competitors are using AI to make their sales calls better and you won't give AI access to your CRM, you're going to lose. So that would be the high-level argument I would make to the CIO or whoever, which is that the risk-reward is out of proportion here. You've got to use AI tools. I know that many IC PMs can't go to their CIO and say, I should be able to give any AI tool I want access to my email. So what I practically recommend in that case is to start with AI agents that can make use of public data: competitive analyst, user researcher. Anyone can look at Reddit. Anyone can look at LinkedIn. Anyone can do Google searches. Anyone can look at G2. So I recommend proving out value with public data and then, one by one, getting access to internal data sources, from least sensitive to most sensitive.
Cool. There are more questions we'll cover. But first, we want to let folks know a little bit more about how they can follow us and support us and go deeper if they found any of this interesting. So I'll start by
encouraging folks to go and follow us. If you want to see what we're up to and thinking about on LinkedIn, our handles are Jacob Bank, Ben Ares, Mark Baselga. Very simple for all of you, just first and last names. Go check those out if you're interested. And then Jacob, you want to maybe do a little plug? This is definitely the part where it's okay to be promotional and tell people what they should know and whether they should go check out Relay.
Yeah, so if you do want to build AI agents, Relay.app is the tool that I work on. If you've heard of n8n or Zapier or Lindy or Gumloop, we're in the same category. I claim that Relay is the easiest tool for non-technical users to get started with building AI agents, but you try it out and let me know if you agree.
And you also have a course.
Yeah, I'm doing my first Maven class. Actually, the first cohort just sold out for December way faster than I expected. I think I priced it too low, but I haven't changed the price yet. So the next cohort is coming up in January. It's going to be a three-week curriculum that's going to walk you through principles of designing your AI agent org chart and techniques for building AI agents, getting really into the weeds of how to set up the trigger, how to set up the action, how to set up the AI step, how to test, how to iterate, how to write your prompts better over time. And I hope it'll be really valuable. So yeah, Maven's running this right now. So check it out. And I think that's going to be relevant for Ben's course too, which you're about to talk about.
Yeah, but before I talk about my course, I want to plug something that Mark and I have been working on together called Insider Loops.
These are guides that we're putting together by talking to product managers who are calibrated interviewers, hiring managers, but also recently successful candidates at top companies. The first four that we put together are Stripe, Figma, Uber, and DoorDash. You can buy our guides by going to insiderloops.com, where you can preview the table of contents for each of the guides to see what you're interested in. And if you don't feel like it's worth the purchase, we have a very generous refund policy: get your money back within 14 days. So we've just tried to de-risk this. The way to think about these is: if the company took the same amount of care with equipping candidates with everything they need to know to be successful in their interviews, from the outside, these artifacts are what they would end up providing. Mark, anything else you want to say about Insider Loops?
No, I mean, I think I'm a little bit biased, but I think it's probably the most up to date and probably has on-the-ground intel on what's happening in these interviews and in these companies. We talk to candidates and hiring managers on a weekly basis. This is really up to date and super recent. I don't think there's anything like it out there.
I'm a big fan. We are biased, but we've created the resource we wish we had if we were going through the interviews right now with these companies.
And then Mark, do you want to maybe talk a little bit about the podcast?
Yeah. So as Ben mentioned in the beginning, we met through Supra, and in one of our happy hours we decided that it would be fun to do this podcast together. Two years later, we're getting close to 100 episodes. We publish an episode once a week. We try to talk to either people who are in the product leadership seat, or founders like Jacob, or just experts who we think have interesting things to say. So if you enjoyed this conversation, that's basically what we do every week. Just search anywhere for Supra Insider and subscribe, and you'll get it every week.
Yeah, this conversation is a good sampling, I think, of what you can expect. It's basically the exact same vibe that we go for on the podcast.
This is the part where I'll plug my course quickly. So I recently overhauled my Maven course and made it self-paced because I realized a lot of people's interview timelines did not cleanly map to my live cohort timelines and my live cohort schedules. So now if you're getting ready for a Meta-style product sense or analytical thinking interview, or both, at companies like OpenAI, Stripe, DoorDash, obviously Meta, you can go and get access to over 30 hours of recorded content from my course. So it keeps you on a happy path to know exactly what to do in the right order. So do thing one, two, three, four; you're not kind of left to figure it out on your own. You do get one-on-one access to me if you enroll in the course, not for many hours, but I will make sure that you at least are not confused about any of the concepts. And you also get access to my AI copilot, which you can see if you go to benares.com forward slash copilot; that $500 copilot is bundled in the price of the course.
And my course is also part of this promotion that Maven is running right now, so you can get 25% off if you enroll in the next week. So, yeah. There are cohort dates listed here, December 1st to 19th, but once you enroll, you get access to the content and you can use it anytime in the future. So you get lifetime access. And Mark, you want to also just tell them about Supra a little bit more? Yeah. So Supra is my main thing.
It's a private community for product leaders, so if you're a super IC, you know, a principal-level PM, or a product leader with a team of PMs: group PM, director, VP. And basically, when you join the community, we match you with a group of eight other product leaders. And this is kind of a peer group that can help you navigate the most thorny topics that you're dealing with
and kind of get an unbiased perspective on how to go about it or just talk to people that have solved that problem before. And in addition to these small group sessions that we do, we also do in-person events, we do speaker series, and we have a very active Slack community. I cannot speak highly enough.
of the Supra community that Mark has built, and we would not have met if it wasn't for it. So thanks for everything you're doing with that, Mark. And with that said, I'm going to stop screen sharing now and we'll turn our attention to the chat, if folks have questions they'd like for us to tackle.
And there was a question around getting the presentation. Jacob, you can send me any links you want me to include in the follow-up email and everyone will. Yeah, I'll have the presentation ready to go by end of day so we can send it out with the follow-up. Amazing.
And this will be recorded, so folks will be able to get access to this. You'll get notified over email, and if any of your friends or colleagues would benefit from what we just talked about, you'll be able to link them to this, and they can just put their email in and watch the recording live, too. Cool. What questions do people have? Because I've got some. Jacob, I think that the thing we didn't show yet, which...
I wanted to spend more time on, because we ran out of time when we were doing our pre-recording chat, is that human language builder. So the belief that you have and that you've baked into your product around how, if someone wants to build one of these agents,
they can start with a text box and literally watch it come to life right in front of them, which I think is a really cool part of your product. Can you just say a little bit more about the philosophy guiding that workflow? Yeah, basically the philosophy guiding this is, can you see my screen again? Yes. I want people to be able to go from job description, role, and then responsibilities. And then if they write a good responsibility with a what, when, how.
I want them to be able to paste that into Relay and get a reliably working workflow immediately. And so let me show you live and see if that works. So I'm going to pull up Relay. Demo. And what I'm going to do is I'm just going to copy this first one. When I receive an email, analyze whether I need to reply, label accordingly. And I'm going to paste this into our natural language copilot that is meant for building.
AI workflows. What it's going to do is try to figure out, based on my one sentence description, what the trigger is, what AI steps are needed, what flow control, if any, is needed, and then what actions are needed. In this case, it's going to trigger when a new email is received, then it's going to use AI to see if it requires a reply.
Then if a reply is needed, it'll add an appropriate label, and if no reply is needed, it'll add an appropriate label. So you can see here that it created a trigger automatically: when every email is received.
Okay, so it says, I've added steps. Next, I want to label the email as no reply needed, reply needed. Note, these labels don't currently exist in your Gmail. Do you want me to create these labels? And I'm going to say, use "to reply" if reply is needed, no label if no reply needed. Yeah, for those listening, by the way, what's going on is there's like a copilot on the left.
And then on the main screen, you can see like something happened automatically. You can see like, you know, the trigger and the action now is.
I gave it a little clarification saying, actually, if it's the no reply needed case, I don't need any label, so only apply the label in the reply needed case. And if it is the reply needed case, the specific label I want to use is called "to reply". And you can see that the AI copilot has now made this change to the workflow, where the AI is going to spit out reply needed or no reply needed.
Then, do you see this path? This is a branching conditional. And so if no reply is needed, it's going to do nothing. If a reply is needed, it's going to automatically add the label "to reply". And so I don't think it's realistic to go from a one sentence prompt
to a completely articulated workflow, because again, the one sentence prompt didn't have that nugget of information that my label is specifically called "to reply". But I do think that a one sentence prompt plus some back and forth refinement with the copilot can get you there. And so that's the goal:
First, give you the mental model of I'm hiring an AI teammate. Then give you the techniques you need to write the right job description. Then when you've written the right responsibilities, you paste those in and the AI should help you take it from there to turn it into like a fully functioning AI workflow. Once you've built all four of the AI workflows for each responsibility, then you have your complete AI agent.
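For readers following along without the screen share, the workflow described above (trigger on each new email, an AI step, a branching conditional, and a label action) can be sketched roughly like this. This is a minimal illustration, not Relay's actual workflow format or API; the `Email` class, `classify_stub`, and `run_workflow` are hypothetical names, and the AI step is replaced by a trivial stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Email:
    sender: str
    subject: str
    body: str
    labels: List[str] = field(default_factory=list)


def classify_stub(email: Email) -> str:
    """Stand-in for the AI step. A real workflow would call an LLM here."""
    return "reply needed" if "?" in email.body else "no reply needed"


def run_workflow(email: Email,
                 classify: Callable[[Email], str] = classify_stub) -> Email:
    # Trigger: this function is assumed to be called once per received email.
    decision = classify(email)            # AI step
    if decision == "reply needed":        # branching conditional
        email.labels.append("to reply")   # action: add the label
    # "no reply needed" path: do nothing
    return email
```

Passing in a different `classify` function is how a real model call would be swapped in for the stub.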
Yeah, that was super fast, though. It was really, really cool. That was super fast. That thing has been a journey to build. Anyone who says building AI copilots is easy is misleading you. I mean, you took the hardest parts of building, like, visual workflow configurators and chat interfaces and copilots, and just integration. Yeah, I mean, the counterintuitive thing is that building a copilot to create, like, a domain specific workflow language is way harder than building a copilot to create,
like, websites. There's like a million example websites out there. Like, JavaScript is well known, well defined. Like, there's good documentation. To get something that creates a workflow in our domain specific language is not easy. There's an additional question here from Greg that says, hey, can you recommend some best practices for quality management for these agents or workflows that you're sharing?
Yeah, so a couple of best practices I can maybe show you in this example. Yeah. A couple of best practices are, sorry, the Zoom window is kind of annoying, blocking my screen. What I always recommend doing is a lot of testing. Do your own sort of mini eval. And so the way you can do this is we have this experience where you can start a test run.
And that's going to find, oh, hey, I have some new Maven signups. Maybe some of you. What you can do is you can look at a past email and then make sure the AI is functioning appropriately. So let me start a run for this recent Maven email. And it says it analyzed whether a reply was needed: no reply was needed. And so what I typically do is I'll sanity check it on five or six emails for an easy task like this one.
That's really all it takes, like a quick sanity check with some testing. For more difficult use cases, I'll typically have a human in the loop approval step for like the first 10 or 15 or 20 times that it runs. I'll modify the prompt each time. I'll do my own sort of mini, mini live reinforcement learning. And then eventually I'll take myself out of the loop and go from there.
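The "mini eval" idea, running the classifier over a handful of past emails and looking at the misses, can be sketched as follows. This is only an illustration under assumptions: `mini_eval`, `keyword_classifier`, and the sample emails are made up for the example, and the keyword rule stands in for the real AI step.

```python
from typing import Callable, List, Tuple


def mini_eval(classify: Callable[[str], str],
              cases: List[Tuple[str, str]]):
    """Run the classifier on (email_text, expected_label) pairs.

    Returns the accuracy and the list of failures to review by hand.
    """
    failures = []
    for text, expected in cases:
        got = classify(text)
        if got != expected:
            failures.append((text, expected, got))
    accuracy = (len(cases) - len(failures)) / len(cases)
    return accuracy, failures


def keyword_classifier(text: str) -> str:
    # Hypothetical stand-in for the AI step: any question mark means reply.
    return "reply needed" if "?" in text else "no reply needed"


# Five hand-labeled sample emails (invented for illustration).
CASES = [
    ("Can you review the PRD by Friday?", "reply needed"),
    ("Your receipt for November is attached.", "no reply needed"),
    ("New signup for your Maven course.", "no reply needed"),
    ("Are you free for a call tomorrow?", "reply needed"),
    ("Want to save $5,000 a month? Reply now!", "no reply needed"),
]

accuracy, failures = mini_eval(keyword_classifier, CASES)
```

The failures list is the useful part: each miss is a concrete email to look at before adjusting the prompt and running the same cases again.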
For me, because again, I recommend using really simple workflows, that has been sufficient. But what we're working on is adding, like, more advanced evals into the product so that you can feel confident that your AI agents won't need that sort of manual reinforcement learning. Got it. There was another question around the amount of AI credits that get used for agents.
I also have been wondering about that. Just like we think about hiring people, we think about the cost of hiring people. So can you maybe just comment a little bit, Jacob, on how you think about almost like budgeting for hiring agents? This is going to sound self-serving, but I promise I mean this genuinely. If you're building an AI agent for something valuable, ignore the cost.
If it's something valuable that's saving you many hours of time per week or saving you from hiring a multi-thousand dollar employee that you can't afford to hire, it doesn't really matter if your token costs are 41 cents or 49 cents. I had a customer email me a couple weeks ago and they were like, hey, our competitive analysis report, it's taking 30 AI credits a week, or 300 AI credits. That sounds like a lot.
And I was like, well, 300 AI credits is about 30 cents. It wrote you a 15 page report. If anyone at your company finds that valuable at all, it's like the greatest bargain in the history of the world. Intelligence is so cheap. He's like, oh yeah, it's silly to optimize beyond 30 cents when an agency would have charged me multiple thousand dollars for that. And so
in general, for me, it's binary. If I'm asking the agent to do something valuable, like it's saving me significant time or I would have had to hire a person to do it, I don't worry about cost at all, because, like,
these things cost like a dollar or $2 a month. It's not like thousands of dollars a month. And so it's just not even worth the time to optimize it. And if you find yourself quibbling over AI credits, the juice is not worth the squeeze almost by definition. So it's almost like the canary in the coal mine.
If you're worried about how much it's costing you, it's probably not that valuable. It means it's not valuable enough. Exactly. Yeah. Although I'll say that some people are kind of, what you were saying before, right? It's like you have to shift the paradigm, right? Of like what something costs as well.
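The back-of-the-envelope math in that answer can be written down as a tiny calculation. The per-credit rate below is only inferred from the anecdote (roughly 300 credits for about 30 cents) and is not a published price; the function names and the hourly value used in the example are assumptions for illustration.

```python
# Rate inferred from the anecdote: about 300 credits for about 30 cents.
USD_PER_CREDIT = 0.30 / 300


def weekly_cost_usd(credits_per_week: float) -> float:
    """Approximate weekly spend for an agent at the inferred rate."""
    return credits_per_week * USD_PER_CREDIT


def is_worth_it(credits_per_week: float,
                hours_saved_per_week: float,
                hourly_value_usd: float) -> bool:
    """True if the value of the time saved exceeds the credit cost."""
    value = hours_saved_per_week * hourly_value_usd
    return value > weekly_cost_usd(credits_per_week)
```

For example, `is_worth_it(300, 1, 50)` compares about 30 cents of credits against one saved hour valued at a hypothetical $50.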
This is a product design mistake that we have made that we now need to unmake, because we represent usage in our product as steps and AI credits. If you show me a step and AI credit counter and tell me that's what I'm going to be charged based on, of course I'm going to try to make those numbers go up slower. What we need to do instead is change the packaging to say, let's say you're a PM and you pay Relay 100 bucks a month and you get these six AI agents.
And what we should do is make sure that we have a package that's like, yup. You're going to get these six AI agents for a hundred bucks a month. And yeah, if you do something totally abusive or you put in a thousand competitors, maybe you'll have to pay an overage fee, but we want to get to a point where a PM can say, yes, this is my hundred dollar a month package.
It's clearly justified because it's giving me these six roles that would each cost thousands of dollars to hire for. And so I'm not going to worry about the individual little AI credits because I've already prepaid for plenty. That's the model we want to move towards. What's your most expensive agent, Jacob? I mean, just curious, ballpark. Yeah, I can show you actually. Or I guess your most expensive AI employee is what I'm going to start. Yeah, so you can see a lot cheaper.
Customer success and marketing are pretty neck and neck, like using a couple hundred thousand steps a month and a couple hundred thousand AI credits. Just for reference, this is like hundreds of dollars a month because we use our product super intensely. And I think the most expensive single one is our customer health score. Because what we do is every week we go through every single customer in our corpus.
And we compute a health score that's based on their usage metrics, their recent engagement, the depth of their workflows, et cetera. And so you can see that computing that once a week costs us like a couple hundred bucks. Best couple hundred dollars a month we ever spent, because anytime a support ticket comes in, we know how healthy the customer is. We have, like, proactive
interventions to schedule time or send them resources if their health scores are low. So, like, we get much, much more than $200 a month of enterprise value for that one. But that one's expensive because we have quite a lot of customers now, and we go through all those customers every week. But again,
imagine you were asking a human to do that, like to go through every customer every week and calculate, like look manually at each of the workflows and see how well they were doing. It'd be crazy. You could never do it. Yeah. There's also a couple of questions here, one around, like,
what are other ways you can give feedback to agents? Like, for example, in Relay, do you have a way to maybe just, you know, give thumbs up and thumbs down? And if not, like, why is that? Or yeah, like, how do you? Yeah, like I said, we haven't really cracked this yet. We haven't cracked it yet. I think the right answer
is to move as close as possible to the mental model we have with human employees, which is like, hey, Mark, if you and I are working together and you send me a PRD, I would write a comment at the top of that PRD saying, hey, this section wasn't quite good enough, or this section wasn't quite good enough, or I love this section.
If you send me a Slack message that says, I just did this competitive analysis, how does it look? I'd say, ah, can you focus more on, you know, Lindy's voice agents? Can you focus more on, like, Gumloop's MCP server? That's the model we want to get to, where because the AI agent
is making docs, writing Slack messages, sending you emails, you can just respond to those like you would to a person, and then it'll adapt and get better the next time. We haven't cracked it yet though. I don't think just the thumbs up, thumbs down is going to do it in this case. Yeah. It's too big, in my opinion. Especially for the thumbs down use case. The other question that we have... Imagine a junior PM
on your team sent you a PRD and you just said thumbs down. That wouldn't be so helpful to help them improve at PRD writing. Yeah, I don't think they're going to get better very fast with that type of feedback. And also they'll quit the company, but luckily the AI agents won't quit the company. Yeah, they'll be like, what's wrong with this person? How did I get so lucky with my manager?
There's another question here on, like, do you see AI agents drift or hallucinate like LLMs do? And I'm sure that probably goes back to what we were talking about in the beginning, of, like, how specific you get with the steps, and the more entropy you add to the process, the more likely it is to hallucinate. But yeah, how do you think about that? Yeah, exactly. I don't really think of it as hallucination.
The more vague you are with the AI agent about what tools it should call and when, you may find it calling tools way too much or not enough, like looking at way too many websites or not enough websites, and you solve that with more precise instructions. I personally have not had a big problem with LLM hallucination in over a year. Whenever an LLM hallucinates, for me, it's because I've asked them something unknowable.
And so as long as I get, you know, in my case, I'll typically say, given the transcript of this meeting, was pricing mentioned? And so the LLM is not going to hallucinate because I've given it the transcript of the meeting and I've asked it to tell me if it knows whether pricing was mentioned. And so for me, hallucination.
is actually not a model problem. It typically means you haven't given the right combination of context and instructions for the AI to know what to do if it doesn't have the right information. Yeah. But for example, let's go back to the agent that you just built with the co-pilot. Like the, does this email require a reply, right? Like if you click, yeah, can we click in a second in that like GPT 5.1 prompt?
Yeah, that one. So, for example, like that one, like I could see how it could, you know, quote unquote, hallucinate or drift by maybe picking an answer that's not correct. And it can be wrong. It can be wrong. Like, it could think I need to reply to something I don't need to reply to. So here's how I would modify this. Right now it's just spitting out a text reply. What I actually want
is I want two things. I want to say, please output two things. One, reply needed or no reply needed. Two, rationale for your decision. This is a technique I use all the time. And you can see that as I write the prompt, the output it's going to give me back changes. And so what I do when I'm testing is, whenever the AI gets it wrong, I look at its rationale. And so for example, this prompt is going to tell me I need to reply to a bunch of cold sales emails. I know because I've tried before. And so
when I get a cold sales email, it's going to say, hey, this looks like Mark is asking you a direct question to reply to. And then I would go back to the prompt and I'd say, like, make sure to look out for cold sales email patterns like XYZ. And then it'll know next time to take that into account. And so this is the process I go through, where I use the rationale of why the agent made a decision to make it better for next time.
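The "decision plus rationale" technique can be sketched in code as a prompt template and a small parser. This is a hedged illustration, not the prompt shown on screen: asking for JSON output is an assumption made here so the two fields are easy to separate, and `PROMPT_TEMPLATE`, `build_prompt`, and `parse_response` are hypothetical names.

```python
import json

# Prompt asking for two outputs: a decision and the reason behind it.
# JSON is an assumption for this sketch; the demo used a plain text prompt.
PROMPT_TEMPLATE = """Decide whether this email requires a reply from me.
Output JSON with two fields:
  "decision": "reply needed" or "no reply needed"
  "rationale": one or two sentences explaining your decision
Watch for cold sales email patterns; those do not need a reply.

Email:
{email}
"""

VALID_DECISIONS = {"reply needed", "no reply needed"}


def build_prompt(email_text: str) -> str:
    return PROMPT_TEMPLATE.format(email=email_text)


def parse_response(raw: str):
    """Split the model's JSON output into (decision, rationale)."""
    data = json.loads(raw)
    decision = data["decision"].strip().lower()
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unexpected decision: {decision!r}")
    return decision, data.get("rationale", "")
```

Logging the rationale next to each decision is what makes the review loop possible: when a decision is wrong, the rationale usually points at the instruction that was missing from the prompt.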
And this is kind of clunky, right? I have to come back into the prompt. I have to type this stuff. What I really want to be able to do is just, like, inline within Gmail, say, like, nope, no reply needed, and here's why. Yeah, but that's really smart, because you almost kind of get, like, a window into the model's brain, because otherwise you have no idea why it's, like, reply needed or no reply needed. And so
you're just, like, chasing in the dark, trying to update your prompt. Whereas very often the rationale will have some very, again, it's like, it's not because the AI is dumb. It's because you haven't given the AI clear enough instructions. The AI will say, hey, well,
It looked like it asked Jacob to reply to the email. So based on my first read, I think he should reply. They told me they would save me $5,000 a month. Yeah, exactly. Why wouldn't Jacob want that? They said they only have room for three more clients this quarter and I should reply right away.
Yeah. But, you know, I was just trying to think of, like, how would I explain to someone what actually requires a reply? And I think a piece of it is, is there a question in the content that requires a response, right? So there's like a content-based thing. And then there's a thing that's more about the asker, right? And it's like, if this person who I've never met who's trying to sell me software is asking me a question, label that a salesperson. But if there's someone who
I literally have, like, lots of calendar meetings with, label that a collaborator or something. Right. And so do you recommend that people almost, like, create this, like, knowledge graph or something that allows them to tell the AI agents to look at important people and only say reply needed if it needs a response and it's from important people?
Only when necessary. So let's say, for example, I add this bit of the prompt and it's still not working. It's not looking for cold sales emails. Let's say I want to give this AI the ability to look at my previous emails to see if I've corresponded with this person before. I'm going to add a tool and I'm going to tell the AI, you can find previous emails. So I'm going to say, if you're not sure whether it's a cold email, check to see if I've ever exchanged emails with this person before.
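The idea of giving the agent a lookup tool for prior correspondence can be sketched like this. It is a simplified, deterministic stand-in, not how Relay exposes tools to a model: `has_prior_correspondence` and `needs_reply` are hypothetical, the question-mark check replaces the AI's judgment, and `sent_to` is assumed to be a list of addresses previously emailed.

```python
from typing import Dict, List


def has_prior_correspondence(sender: str, sent_to: List[str]) -> bool:
    """Hypothetical tool: True if I have ever emailed this address."""
    return sender.lower() in {addr.lower() for addr in sent_to}


def needs_reply(email: Dict[str, str], sent_to: List[str]) -> str:
    # Content check (stand-in for the AI step): is a question being asked?
    if "?" not in email["body"]:
        return "no reply needed"
    # Asker check: consult the tool to separate known contacts from cold email.
    if has_prior_correspondence(email["from"], sent_to):
        return "reply needed"
    return "no reply needed"
```

In an agent setup the model would decide when to call the tool; here the call is hard-wired only to show what extra context the tool contributes.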
And so now I've given the AI the additional context it needs via this tool. It can go look up emails. And so, again, if necessary, you can give it access to your past emails. You can give it access to calendar events. You can
create a running knowledge... You could give it access to your customer database, so it can look up domains where you have a customer. Like, you could make this arbitrarily more intelligent and context rich as you need to. In my experience, for this particular use case, a simple text-based prompt was good enough, and so I decided to keep it simple. But if it was a very... And for me, also, another thing that is at play here is, like,
how high stakes is it when the AI gets it wrong? If the AI labels a cold sales email occasionally as "to reply" and then it's wrong, like, it's not the end of the world. It's still way better than the hundred cold sales emails I had yesterday. And so, to the point about, like, is the juice worth the squeeze,
I'm always evaluating, like, how does it get to a good enough point? It's the same as when you have a human employee. Like, you know, we've all had junior coworkers and they're on their fourth draft of something. And you're like, ah,
I know the fifth draft will be better, but the fourth draft is good enough. We're just going to roll with the fourth draft. You have to use the same mentality with your AI agents where it's like at some point it's good enough and I need to go on to other priorities. Yep. Cool. Well, I know we are, man, hour and a half in. Yeah, we're a little bit over time. Okay.
Sorry, I can keep nerding out about this stuff for another hour or two, and we'll have to do this again because I think this space is evolving. It's amazing to be able to sit down with you, Jacob, who is like... Three hours a day on these tools is probably more than any of us are reasonably ever going to spend. If you had a day job, how could you have time for that? Right. So being able to get your learnings about where the state of the
technology is today, and be able to do this again maybe in like three months or something, and be able to kind of... I'm sure there's gonna be a bunch of stuff that just works. It'll be totally different. Totally different. Yep. So thank you so much for that. And Mark, I'm happy we were able to... I feel like this was a really fun
first live recording of the podcast, so I'm excited to publish this. Yeah, thanks everyone for joining. I hope it was really valuable. Thanks for having me, Ben and Mark. For sure. I'm gonna stop the recording. That is a wrap. If you enjoyed this conversation, please share it with someone you think would benefit from it as well. We really appreciate it. We'd also love a follow or a rating on Substack,
Spotify, or YouTube. That's going to let other people find us. And if you have any topic recommendations for a future episode, please send myself or Mark a DM on LinkedIn. We'd love to hear from you. Thanks.
