
OpenClaw: Why the Internet Isn't Built for AI Agents

Mar 19, 2026 · 47 min · Ep. 90

Summary

The discussion centers on OpenClaw, an open-source personal AI assistant that showcases both the immense potential and significant security challenges of AI agents. The hosts delve into issues like the difficulty of integration with existing services, the risks of agents requesting broad access (e.g., domain-wide email), and how current internet infrastructure is not built for AI. They also consider the need for new business models and security paradigms, including fine-grained access controls and agent-specific interfaces, to unlock mass adoption and manage risk effectively.

Episode description

Yoko Li, Guido Appenzeller, and Joel de la Garza discuss OpenClaw, the open source personal AI assistant that's forcing a rethink of how identity, permissions, and security work on the internet. They cover why setting up Gmail integration took seven hours, what happens when an agent asks for domain-wide access to every email in your company, and why consumer websites like DoorDash and Amazon have no incentive to make their services agent-friendly.

 

Resources:

Follow Yoko Li on X:  https://twitter.com/stuffyokodraws

Follow Guido Appenzeller on X:  https://twitter.com/appenz

Follow Joel de la Garza on LinkedIn:  https://www.linkedin.com/in/3448827723723234/

Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.

 

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.


Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Transcript

AI Agents: Capabilities and Risks

As a developer, I can totally build this, but I'm not gonna build all the long tail integrations. Just the fact that we're gonna go through this exercise of fundamentally rethinking what the product experience is for this stuff is just incredibly exciting. And now it's just sort of natural language expression of what you want and the machine fulfills it. My curiosity becomes, what is the future? Will the big incumbents catch up

and offer their functionality for agents? Or do we actually need new companies that cater to agents specifically? Security is always a game of defense in depth, and when you hit CAPTCHA and you hit the front end bot detection stuff, that's like the tip of the spear. There's this concept in defense called the redoubt, like you retreat back to the wall inside. And I think that's what we're gonna see for a lot of these perimeter controls because of agents. What's super

fascinating to me is that this is one of the first times we have a technology where what it can do is not limited by its abilities, but by how I can make it secure and stop it from doing certain things. We have this genie in a bottle and it's amazing, but how do I contain this? OpenClaw is an open source personal AI assistant. It can message on your behalf, check your calendar, manage your email, and extend itself by writing new integrations on the fly.

Setting up Gmail integration takes seven hours. The agent will ask for domain-wide access to every email account in your company. Consumer websites like DoorDash and Amazon have no APIs for agents. And if you're not careful, you can create something that can be socially engineered into access it was never supposed to have. This is a technology where the limiting factor isn't capability. The genie is in the bottle. The question is how to keep it there.

Introducing OpenClaw: An AI Assistant

Hello everyone. So we're here today to talk about OpenClaw, which is currently one of the hottest, most controversial, most interesting, most dangerous, I think, technologies here in Silicon Valley. Yoko, you want to kick it off? What is OpenClaw? So OpenClaw is this very cool personal assistant that's open source, built on top of another very cool coding agent called Pi. I think the repo's name was Pi Mono. It's a very minimal but very extensible coding agent

that can run the loop and update its own config. And OpenClaw is built on top of it, built around all the sessions and state management for Pi, but it also added a long tail of integrations. So you can now talk to your personal assistant on WhatsApp, Telegram, like a phone number, iMessage, and everything else you can think of, and use 1Password. It's not yet able to place the order on DoorDash. We'll chat more about that later.

But the whole ecosystem is really booming around what we can use a long-running agent in the sandbox for. So we all built some interesting use cases. One of the first use cases I've explored is how I can have OpenClaw consistently check my cat's location via the AirTag API, since for AirTags the location is only updated once you are active in the user session in the browser. So that has been useful. So I'm curious what you guys built with it recently.

CISO's Perspective: Agents Reading the World

As a former CISO, you must just love it. Well, and currently acting CISO. Acting CISO, never mind. Current CISO. I've been using it for a while now. I think it's incredibly awesome because it lets you see the contours of the future. This is the first time where we can see what these agents are going to do. And the firm is built around Marc's famous "software is eating the world" piece.

And this is the first time where you can see these agents reading the world. Like it gives them true agency in the world to do things. And so of course the first couple of use cases I did were very security focused. I really enjoyed just trying to get things to work; as you guys know, the experience is not simple.

I think part of the reason why, as a CISO, I'm not super concerned yet about people here using it is that only a smaller handful of people can get this thing working, I think, than with typical other tools. It's so hard. That's a feature here.

Exactly. People are like asking us, What's homebrew? How do I get it on my computer? You're like, Okay, we're good for now. But you can see as these things become more consumery, become easier to use, like these things are gonna take off. This is gonna be an incredible wave and And building these tools has been incredibly fun. So Wait, I'm curious. I mean, normal people like us, we use it to check our cat's location, check calendar, take notes. What are the security use cases? So

Hacking Techniques and Out-of-Control Agents

And it varies by model. So the models all have very different capabilities. And so the first thing I started doing was giving it impossible tasks. So I need you to do this thing, but you only have access to these two tools. And some of the other models would kind of give up and say, sorry, it doesn't work or do something like that. Or they'd try to like write some code or do something kind of interesting. But like some of the more advanced models actually started using like hacking techniques.

Or they'd be like, hey, I found an AWS key on your device and maybe I'll try it. Right. And so those were kind of the first sets of use cases was basically let's get it running, let's add some basic tools and tasks. And then let's start asking it to do impossible things and see where it goes. And you could very quickly see how these things would get out of control in a really interesting, but also very sophisticated way.
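The "I found an AWS key on your device" moment suggests checking what an agent can see before handing it a workspace. A minimal sketch, assuming the common `AKIA`-prefixed, 20-character access key ID format; the function name `find_aws_key_ids` is hypothetical and this is not how OpenClaw itself behaves:

```python
import re

# Long-term AWS access key IDs are 20 characters and start with "AKIA".
# This only finds key IDs, not the secret keys that accompany them.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(text: str) -> list[str]:
    """Return every substring of `text` that looks like an AWS access key ID."""
    return AWS_KEY_ID.findall(text)


# AWS's documented placeholder key ID, used here as harmless sample data.
sample = "export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\nnote: nothing else here"
print(find_aws_key_ids(sample))
```

Running a scan like this over the files and environment an agent can reach is one cheap way to learn what it might "find" before it does.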

Gmail Integration and Domain-Wide Dangers

The security aspect of OpenClaw just went completely crazy, right? So I connected mine to Gmail, which took me, I want to say, about seven hours or so. It's unbelievably hard still, right? And it's like figuring out the account setup, figuring out the authentication models, getting all the polling right and so on, lots of debugging steps. Meanwhile, Telegram works out of the box.

Here we go. But the most interesting thing during the process was that when I basically asked it how we set this up, it started coding and implementing things. The first time it didn't quite work, the second try it did work, and at some point it was like, okay, now I need an authentication token, right? And it gave me instructions for how to set it up and basically said, look, create a service account and then give me this token with a domain-wide scope.

And you were like, wait a second, domain-wide scope, what exactly does this mean? What it was suggesting is that I should give it a token, not for its own email account, right? I mean, usually the way you run OpenClaw is that you try to segregate it very well from everything else, with its own

email account, own Apple account, own credit card if you want to give it a credit card, or debit card if you want to give it a debit card. We saw one of our startups actually putting it on a separate desk, which I found just super funny. It's like absolute separation, a separate hardware gap, right? It's okay. But even a desk gap, right?

Yeah, so basically what it was suggesting to me is to give it a token that would give it full access to every single email account in the entire company. Right? Which is crazy. And with read/write permissions to do everything. Exactly. Exactly. But the other thing is that it actually would've worked, right? Give me all the permissions, enable me to do it. Exactly.
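The token request described above can be reviewed mechanically before a human says yes. A minimal sketch, assuming the published Gmail OAuth scope strings; the "broad" classification and the function name `review_token_request` are this example's own assumptions, not part of any Google or OpenClaw API:

```python
# Gmail OAuth scope strings as published by Google.
# Which ones count as "broad" is a judgment made for this sketch.
BROAD_SCOPES = {
    "https://mail.google.com/",                      # full mailbox access
    "https://www.googleapis.com/auth/gmail.modify",  # read and write access
}


def review_token_request(scopes, domain_wide_delegation):
    """Return a list of human-readable objections to an agent's token request."""
    objections = []
    if domain_wide_delegation:
        objections.append(
            "domain-wide delegation lets the agent act as any user in the domain"
        )
    for scope in sorted(set(scopes) & BROAD_SCOPES):
        objections.append(f"broad scope requested: {scope}")
    return objections


# The request from the story: a service account token with domain-wide scope.
for objection in review_token_request(
    ["https://mail.google.com/"], domain_wide_delegation=True
):
    print(objection)
```

An empty list does not mean the request is safe; it only means none of the red flags named in this sketch were present.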

The New Era of Social Engineering

And so basically understanding this and then reading up on it, I mean, Google's security model on email I think is absolutely horrible, right? Right. For a service account right now, we can only give domain-wide access. You don't want that. What you want instead is something more specific, and then you need to go via OAuth and things get complicated, right? But going through all of this, I think it really, really shows how,

if you're not very, very careful, you can create something which can extend itself and can be socially engineered. I think it's a new thing. We've never had it before that you have a complex software system which you can actually influence with social engineering, right? Subject to influence. Exactly. And it's very, very easy for even a somewhat sophisticated user to set this up in a way that can do a massive amount of damage.

Why OpenClaw's Accessibility is Key

One prompt for the group is that we've seen this pattern of putting a long-running agent in a sandbox for a long time now, since I would say six months to a year ago. So why did OpenClaw take off? And what's so special about it? Curious about your view. So I found it relatively easy to set up and get going. And I think that there was enough documentation and support that I didn't have to spend seven hours to just do the Telegram use case and start playing with it.

And then it led to other use cases, and then eventually I got blocked 'cause I didn't have seven hours to spend figuring out how to provision accounts properly. And so I just think it's that level of accessibility to users who are maybe not living in a codebase day to day. Yeah. Whereas I know you guys probably spend a lot more time in code than I do.

And I'm probably the world's worst coder, but this was accessible to me. So reasonably technical, understand core principles. I do have Homebrew on my laptop so I can get stuff working. But you know, the other agent frameworks were pretty difficult to use and incredibly flaky. I didn't really want to spend a lot of time debugging someone else's stuff. So I think that was a big part of it.

Another major part of this is that it can extend itself. I think it's the first agent I've seen where I can say, you know, I want an integration with something, and it says, well, I've never seen this before, there's no package for that, but let me try to put something together. It fires up a coding assistant and tries to extend itself. Right. I think that's new.

There is definitely the long-running nature of it. Like you leave it running for a night and you're like, keep working on this until you finish. I mean, Cursor could do this too, but I think the difference is that they expose the visibility to the end user, so that you can keep checking on it from your phone or on the dashboard.

You, hopefully securely, expose how many tokens it's generating, how fast it's completing the task. So the visibility part is interesting. Another interesting part is the more prosumer and consumer integrations. As a developer I can totally build this, but I'm not gonna build all the long tail integrations. Like I'm not gonna

hook it up to Gmail or 1Password. I don't want to touch the 1Password CLI to kind of give it to the agent as an MCP or skill. So the MCP layer is also very critical there.

Practical Agent Use Cases and Limits

It is interesting what people are using it for. I mean, Guido was talking about one use case where you were trying to hook up your 3D printer. Yes. It actually doesn't work yet, but I think we'll get it to work over the weekend. I think we're trying to figure out the boundaries. Like we can now connect

much more complex systems to it, because it can extend itself, which is a really new property. If there's some documentation somewhere on the web or some API, it can probably figure something out. Which integrations are useful, which are not, right? Actually that's a good prompt. What integrations do y'all actually use day to day on OpenClaw? I, uh, honestly, right now I'm still in the experimentation phase. I don't use it day to day.

I don't let it run unsupervised. It doesn't run overnight. I am there watching this thing. There's a couple of use cases I've explored 'cause I really want to just set it free on the Mac Mini and then not monitor it for a long time. The first integration was actually, so we have a portfolio company called Quiver. They do SVG generation.

So I got very curious. I'm like, what if I just give the API to OpenClaw and have it run overnight to generate some gaming assets for me? And then only generate in a certain style, and then it can use an LLM to QA it. So what I did is I gave OpenClaw a Mintlify doc on Quiver. I don't wanna explain how it works. I'm like, build the thing. First build a Quiver MCP. Test it with OpenCode and Cursor to make sure that you have an instance that actually works with the MCP.

And then once it works, generate a hundred gaming assets for me. So I'm building a game on the side. You know, SVG happens to be a great composable layer of it. It actually did that and sent me a huge zip. In the morning I opened it, and there are some assets that are just not great, but there's like sixty percent of it that's very usable. Yeah. That's awesome.
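The overnight "generate, then QA, keep what passes" loop described here can be sketched in a few lines. This is an illustration only: `generate` and `passes_qa` are hypothetical callables standing in for the SVG generation call and the LLM-based check, neither of which is specified in the conversation.

```python
from typing import Callable, TypeVar

Asset = TypeVar("Asset")


def generate_until(
    target: int,
    generate: Callable[[int], Asset],
    passes_qa: Callable[[Asset], bool],
    max_attempts: int,
) -> tuple[list[Asset], int]:
    """Keep generating until `target` assets pass QA or the attempt budget runs out."""
    accepted: list[Asset] = []
    attempts = 0
    while len(accepted) < target and attempts < max_attempts:
        asset = generate(attempts)  # hypothetical: call the generation API
        attempts += 1
        if passes_qa(asset):        # hypothetical: ask a model to review the asset
            accepted.append(asset)
    return accepted, attempts


# Stand-ins: integers as "assets", with a QA rule that accepts 3 out of every 5.
accepted, attempts = generate_until(
    target=6,
    generate=lambda i: i,
    passes_qa=lambda asset: asset % 5 < 3,
    max_attempts=100,
)
print(accepted, attempts)
```

The `max_attempts` budget matters for an unattended run: without it, a QA check that never passes would spend tokens all night.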

And then I'm like, well, these are the simple tasks I wouldn't want to do myself, but because you have something so long-running and resumable, you could do it easily in a box. That makes a lot of sense. I mean, I'm still using it very little, frankly, right? It's not part of my daily routine. There's a few cases which I like. One is if you have an email and you want to look something up related to that email, right? It's really nice. So, you know, somebody

sends me, say, Guido, can we meet at XYZ? So I can just forward it and say, can you figure out what the driving time to this will be at the time when the meeting is suggested, right? And something comes back. Or even nicer, you can do something like,

let's say we want to meet at some cafe and you ask, where is it? And you can just be like, Claw, can you just attach a map link to it or something like that. So I think for me, once we've got this a little more secure, email is gonna be the first killer use case. Being able to say, look through my email, delete all the spam, everything.

All the meetings for my conference next week, just put them in my calendar, or double check that they're there and make sure there are no conflicts, or tell me which conflicts there are. Right. So going through these things, right, that is super powerful. I did get an email from Guido's OpenClaw yesterday.

The funny thing is, the OpenClaw asked me, do you want to order boba? If you want to order boba tea, go ping Guido, he'll place your order. We're still working on the automation. You're getting more work from it. It's the opposite of what you want from automation. But ordering stuff is still hard.

Oh, it's so hard. So before this podcast we actually tried to see if we can order fills in real time and get it delivered. It turns out that with Uber Eats and DoorDash, if you don't already have an account for OpenClaw, there's some bot detection. Sometimes that ordering experience just fails, even if you give it a guest checkout link.

Unlocking Mass Adoption: Identity and Security

Which leads me to my next prompt for the group. What do you think will unlock the next wave of adoption for OpenClaw? What is missing? A binary you double click, install, and get running, right? Like I think there's sort of the, for the sort of home. Isn't that mutually exclusive with self-extending? Yeah. Well, no, but I mean just to get people up and running.

Like I think the current installation paths, I know they exist, but I think a slickly packaged software bundle of this stuff that maybe, I'd say, my dad could download and install. Would you in that case just make it a service? Yeah, you could make it a service. Claw as a service. Probably. Well, that would then solve a lot of the security problems, right? If you could contain it.

I think you need to turn it into a SaaS service for people. I think you need to change the security model, and I'm actually not quite sure how. Actually, it was. Right. Account management paradigm. Like we both had to spend hours setting up all the accounts just for OpenClaw. That's as if OpenClaw is a person. Yeah. Right? There's no agent concept. Right, exactly. Yes, exactly. What does that look like? I mean, Joel, you're the expert on like Okta and the world when I came, you know, to the SaaS world years ago.

I think right now, so security is always a laggard, it's always reactive. As OpenClaw itself is demonstrating, it's never front of mind. And so you've gotta start thinking through, I mean, to your point, what does identity mean in this world? And I think you have this constellation of identities that have to interplay with each other. So you have the identity of the user that's orchestrating the OpenClaw. You have the

identities of all the services that it has access to. And then you have the identities of the agents that launch themselves. And I think you end up in this world, and this is where I'm actually quite hopeful about a lot of security problems getting solved. You have this world in which, I mean, think of how hard it's been for us to get just normal users to use two-factor authentication. Coming from Yubico, right? It was just like... Sorry about it.

Like I have this thing that prevents cancer and people are still like, no, cancer's not that bad. It's literally like that. This more or less, you know, takes phishing attacks to zero and nobody deploys it. Yeah, yeah.

Like the threshold of tolerance for people, just humans in general, is incredibly low when it comes to stuff like that. These agents don't care, right? And so I think it's the opportunity where we could probably start to put in

things that would annoy a human and that a human would never do, but these agents will probably do. So you can start to look at, maybe there are legitimate uses of, I know I'm gonna say PKI and probably get laughed out of the room, but maybe PKI finds an application in this world. Well, yeah. The agents deal with it. It's not exposed to the humans, right? Like

Things like that start to make a lot more sense, right? You can get people to start effectively using vaulting. You can get away from passwords that need to be memorable. You can get to this point where identities can step up and step down in their authorization scope and frameworks.

And you come into a world where all the things that we've always been saying from first principles are the things you need to do have been blocked by humans' lack of desire to suffer through them, gets alleviated, right? So like I I think maybe we can fix a lot of stuff.

Authorization Limits and Business Model Impacts

So the authentication, the identity problem, will be a huge issue. I think there are two more. There's a question of authorization limits and monitoring, right? Then there's one of business models for some of the current websites. So let's start with the authorization. Really what I'd like to have is not

giving the agent access to all of my mail, because that creates a huge blast radius, right? If this thing gets compromised, right now that's everything I've ever said. Right. So instead, for example, how about this thing can only access my inbox, right? That would be useful. I know. Or only access emails in my inbox labeled something.

Or all that, right, exactly. Right. And right now Google has zero fine-grained access controls for that, there's absolutely nothing. You know, until last year you couldn't even have fine-grained access controls in Drive at a folder level, right? You got an access token for all of Drive, right? Which is ridiculous to some degree. For Drive we've now got service accounts that you can share directories with. So you need something

probably even much more fine-grained than that for email. And then we want the next thing on Amazon. What are my spend limits? What can it buy and so on, right? So, I mean, there's a huge, huge infrastructure

And the way this always works with security is that the first thing that goes is a proxy. Yep. And so you know that there's going to be some sort of proxy and some sort of broker for that access. Yeah. And and at some point, what always ends up happening is the service provider themselves might add some of those features. But there might be a long enough tail there that you do get a a proxying infrastructure for agents to access these.
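The proxy or broker idea, combined with the wish for label-scoped mail access and spend limits, can be sketched as a thin policy layer that sits between the agent and the real service. Everything here is hypothetical and in-memory: `AgentPolicy`, `PolicyProxy`, the `agent-ok` label, and the mailbox shape are illustrative assumptions, not an existing product or API.

```python
from dataclasses import dataclass


@dataclass
class AgentPolicy:
    allowed_labels: set[str]   # the agent may only see mail carrying one of these labels
    spend_limit_cents: int     # total the agent may spend through this proxy
    spent_cents: int = 0


class PolicyProxy:
    """Stands between the agent and the real services and enforces the policy."""

    def __init__(self, policy: AgentPolicy, mailbox: list[dict]):
        self.policy = policy
        self.mailbox = mailbox  # stand-in for a real mail backend

    def list_mail(self) -> list[dict]:
        # Only messages with at least one allowed label are ever returned.
        return [
            message
            for message in self.mailbox
            if self.policy.allowed_labels & set(message["labels"])
        ]

    def purchase(self, item: str, price_cents: int) -> str:
        # Refuse any order that would push total spend past the limit.
        if self.policy.spent_cents + price_cents > self.policy.spend_limit_cents:
            raise PermissionError(f"spend limit exceeded for {item!r}")
        self.policy.spent_cents += price_cents
        return f"ordered {item}"


mailbox = [
    {"subject": "Meet at the cafe?", "labels": ["agent-ok"]},
    {"subject": "Payroll export", "labels": ["finance"]},
]
proxy = PolicyProxy(AgentPolicy({"agent-ok"}, spend_limit_cents=2000), mailbox)
print([message["subject"] for message in proxy.list_mail()])
print(proxy.purchase("boba tea", 700))
```

The point of the design is blast radius: if the agent is compromised, it can only reach what the proxy was willing to show it, regardless of how broad the underlying token is.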

So two observations. One is I think there's a huge opportunity for startups here to create these proxies, right? If somebody would give me, you know, a scoped Gmail, I would adopt that today, right? But the second one, and that's the last of my three points, I think is a business one, right? Because

there are websites today where the majority of the revenue, and certainly the majority of profits, come from cross-selling. If this website is suddenly only used by agents, that doesn't work anymore, right? They're basically going out of business.

So today Amazon doesn't have an API, at least for consumers, right? DoorDash doesn't have an API, right? All of these large consumer sites are like, no, no, we don't want this. I want to be the, what was it, DoubleDash or something? You know, like why don't you also buy XYZ? Here are some recommendations, right? They don't want agents, essentially. So I think one interesting question here is: will the big incumbents catch up and offer their functionality for agents?

Or do we actually need new companies that cater to agents specifically? And then you may say, Guido, this is crazy, right? Why would Amazon not also be the number one agent vendor? Let's look at search for agents. Right? You would be like, well, of course, Google is the number one search, so they're gonna be the number one search with agents.

That's absolutely not the case today, right? I don't think they have an agent search project anymore. We have instead Exa and Brave and a bunch of other companies doing this. So do we actually need to replace some of the big sort of SaaS building blocks of e-commerce, of online services, and redo them for agents?

Redesigning the Internet for Agents

What are the areas where we think there's an agent-specific service that needs to be built yesterday? Exactly. Or, I mean, why does Google not have an agent search? Maybe it's just innovator's dilemma. I don't know, right? But it's kind of... It sounds like it, yeah. Your business model is so much tied in a particular way to your service that you can't make the jump. I think some of it may have been this sort of head fake around browser use.

Like there was sort of a belief that, well, these things will just use browsers and so they can navigate the web like a human. And they can to some extent today, but I don't think the whole website environment is friendly to bots. There are some vendors I've come across recently that turned off bot detection because of that. Yeah, that makes sense. Which makes total sense. But then it also opens up the doors for abusers and all kinds of agents.

What does that look like? I mean, today if I go to DoorDash, sometimes they'll ask, are you a bot? Like, as a human? And you have to solve very complex puzzles. I ran into this when I was trying to create a net new login for my OpenClaw on GitHub. I had to solve six puzzles. That's really hard. Yeah, the drag and drop ones, right?

The drag and drop one, and I'm like, this is actually the next level now. But then what does it look like if I today open up OpenClaw and just say, go get five accounts without human intervention, and here's one credential I can give you? What does that look like? And what if I just don't have to spend hours trying to get it into all these accounts?

Yeah. I mean, I think for a lot of these companies, to Guido's point about the business model, they're gonna have to refigure how that stack works and they're gonna have to move. Security is always a game of defense in depth, and when you hit CAPTCHA and you hit the front end bot detection stuff, that's like the tip of the spear. You're kind of just hitting that layer.

There's this concept in defense called the redoubt, like you retreat back to the wall inside. And I think what we're gonna see for a lot of these perimeter controls, because of agents,

is that they have to move to more of the backend systems. And you have to build a more sophisticated understanding of the way your business operates. You're going to want bots to register. You're going to want bots and agents to sign up. What you have to do is protect the things inside the system where there could be issues of abuse or exploitation or fraud and stuff, right?

Instead of bot detection, what they should have, I don't know, is a "bots are welcome" banner, right? If you are a bot, click here. User API. Just like, here's the API, and please sign up as a bot, and when you sign up as a bot maybe state who your master is or something like that. Yeah, yeah. Register them. Give them APIs.

One example of this, which is a read-only use case: Mintlify actually does it really well. If it's a coding agent accessing the website, it will prompt the coding agent to use llms.txt instead of viewing the web page, 'cause it's just much slower to have bounding boxes. Right. And you want a compact text blob to send back to the agent. I mean, that's a read-only use case. So I do wonder what write use cases will look like on the web for the agent.
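For a sense of what such a "compact text blob" can look like, here is a sketch that renders a small index in the layout of the public llms.txt proposal as I understand it (a title heading, a short blockquote summary, then sections of annotated links). The function name, site, and URLs are made up for illustration, and this is not a description of how Mintlify implements it.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render a compact, agent-readable index of a site's documentation."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, using the reserved example.com domain.
text = render_llms_txt(
    "Example Docs",
    "API documentation for a hypothetical service.",
    {
        "Guides": [
            ("Quickstart", "https://example.com/quickstart.md", "Install and first call"),
            ("Auth", "https://example.com/auth.md", "Tokens and scopes"),
        ]
    },
)
print(text)
```

A plain-text index like this is cheap for an agent to read compared with rendering a page and reasoning over screenshots.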

It's not, I mean, it could be an API, but the agent still needs to account for identity in an API, and so on and so forth. It could be something between a CLI and the API, yeah. It could be an API, it's just that you need to issue a token first. So to issue a token you need an account. To get an account you need a human. And I don't want to be in the loop. Let's say I give my bot an email address or a Telegram or whatever it is, right? That's some kind of account.

You could say, look, you're you hey hello bot, you need to register with some kind of account. Yes. That UI where GitHub will ask you, are you a bot? Solve these puzzles. No, no, what what I mean is Front page, you know, bot's welcome, click here, right? Or you know, and then there's like here's the bot API, here's the register bot function, right? And then here you you once you have a token, then here's all the the following functions. That that would make sense.
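The "bots welcome, click here, register, get a token, then call the functions" flow can be sketched as a tiny in-memory registry. All names here (`BotRegistry`, `register_bot`, `authorize`, the scope strings) are hypothetical; no existing site is known to expose this interface.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class BotRecord:
    bot_name: str
    owner_contact: str    # who is responsible for this bot
    scopes: frozenset     # what the issued token is allowed to do


class BotRegistry:
    def __init__(self):
        self._by_token = {}

    def register_bot(self, bot_name: str, owner_contact: str, scopes) -> str:
        """Register a bot on behalf of a named owner and return its API token."""
        if not owner_contact:
            raise ValueError("a bot must state who it acts for")
        token = secrets.token_urlsafe(32)  # random, unguessable bearer token
        self._by_token[token] = BotRecord(bot_name, owner_contact, frozenset(scopes))
        return token

    def authorize(self, token: str, scope: str) -> bool:
        """Check whether `token` belongs to a registered bot holding `scope`."""
        record = self._by_token.get(token)
        return record is not None and scope in record.scopes


registry = BotRegistry()
token = registry.register_bot("claw", "owner@example.com", ["orders:read"])
print(registry.authorize(token, "orders:read"))
print(registry.authorize(token, "orders:write"))
```

Requiring an owner at registration keeps a human accountable without putting that human in the loop for every call.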

The Evolving Human-Agent Interface

That UI does remind me of something else, which is the automation UI. UI has evolved so much with OpenClaw. I remember using these RPA tools maybe a couple of years ago. It was a lot of drag and drop. I connect the dots from this UI box to another UI box. Now it's so much more about describing the outcome and asking the bot to keep spinning until it gets this right, to kinda leverage test-time compute to the maximum. And I don't care how many tokens I'm spinning out.

So my curiosity becomes, what does the future of this UI layer look like? How do you interact with your RPA tools, your personal assistant? Is it a prompt? Is it something else? I mean, that's the truly exciting part. You know, CISOs in general are people you should never take product advice from. We are the worst product thinkers you've ever met.

But just the fact that we're gonna go through this exercise of fundamentally rethinking what the product experience is for this stuff is just incredibly exciting, right? It's these moments where you see the transition between ways of thinking about the world, going from sort of that,

That RPA drag and drop, right? Remember pseudocode, right? And then drag and drop and all these sorts of things. And now it's just sort of natural language expression of what you want and the machine fulfills it. Which just drives a completely different user experience, right? And a user interface just disappears. Um so yeah, I mean I I don't know and I'm the last person that should probably dedicate on it. Really? I think so. Y you're

Managing Agent Autonomy and Oversight

You obviously know you you define your tasks at a much higher level, right? But I still want to be kept in the loop how the task is being executed. Usually when I specify tasks, I'm never precise enough that I basically all the possible trade offs and and design choices and these things are are clearly specified, right?

So whenever one of these these things happens, either I wanna it should be Guido, what should I do here? Or at least it should be Guido, I decided to do X. So you probably still want some kind of user interface, right? I mean it looks very different, don't get me wrong. I mean, I think you probably live on the far right side of the distribution for users of this stuff. What can you think of that? The left side is like the total vibe code, like total like اشتركوا في القناة

"Give me an app to plan my wedding" versus "I want step-by-step instructions on architecture choices." There's a spectrum there, and I think most people land in the middle of that. I think you probably wanna get pinged on stuff where it's a big deal or something fails. But I don't know about progress. I mean, like I said, I'm the worst person to get product advice from.

Okay, I buy the progress part. Just give me the answers. But if there's meaningful choices, right? Yeah. But you would probably get that up front. An app to plan my wedding: does it involve travel, right? That may change things. But you would probably have some iterative process with the... Yeah, yeah, yeah, that's you. Well, yeah. I mean, I guess it would be, yeah.

I mean, maybe. Show me a flow chart or something, or show me concepts. I think that there's still some aspect there. Maybe it's all just text with images. One way OpenClaw has evolved the UI a little bit, which is very clear in their app, is that it abstracted away cron jobs. As a developer, obviously I used to handwrite the cron job schedule. I always have to look it up. Terrible.

It defines the schedule the same way. But now you don't really care about it anymore. I was investigating with OpenClaw, like, "Why didn't you notify me five minutes ago about something?" And it's like, "Let me take a look. Okay, here's my cron job. So how the cron job works is that it will wake up, it will ping me, and I will wake up and then I'll process it and then I'll ping you."

So that's how it works now. I don't care about when the schedule wakes up at a system level. It's more that there's an LLM taking care of all the systems and orchestrating all of them for me.
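For readers who have never handwritten one, here is a minimal sketch of the kind of schedule check that is being abstracted away. It assumes the standard five-field cron layout (minute, hour, day of month, month, day of week) and supports only `*`, `*/n`, plain numbers, and comma lists. The function names are illustrative, not anything from OpenClaw, and real cron has extra rules (ranges, names, day-of-month versus day-of-week handling) that this skips.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Return True if a single cron field matches the given value.

    Simplified on purpose: only "*", "*/n", plain numbers, and comma lists.
    """
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Check whether a five-field cron expression fires at the given minute."""
    minute, hour, day, month, weekday = expr.split()
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )
```

With this sketch, `cron_matches("0 * * * *", now)` would be the "wake up every hour" schedule, which is exactly the kind of string people tend to look up each time.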

I think this is interesting. To some degree, I think what OpenClaw has done is it's taken all of this autonomy that we had before for software development, and now it starts applying that a little bit at a systems level. Yeah. Right? It's no longer about just the code itself, but all the things around it: the integrations, the cron jobs, the operating system, the ports. And when you think about it, email is the queue infra for humans.

And the cron job is the queue infra for agents. Now you just get to abstract away all of that and give all the queues to the agent, and they can just process. But sometimes they do need to wake up and then use a very expensive function call, which is to ask a human to do something. Yeah. In the future they have a token budget and a human interaction budget. We need to figure out our token thresholds as humans. Yeah.
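As a rough illustration of the "token budget and human interaction budget" idea, here is a minimal sketch. The class and method names are hypothetical and do not come from OpenClaw; the only point is that asking a human is treated as a scarce resource that is metered separately from tokens.

```python
from dataclasses import dataclass


@dataclass
class AgentBudget:
    """Hypothetical per-task budget: tokens and human interruptions."""

    tokens_left: int
    human_asks_left: int

    def spend_tokens(self, n: int) -> bool:
        """Deduct n tokens if available; return False when over budget."""
        if n > self.tokens_left:
            return False
        self.tokens_left -= n
        return True

    def ask_human(self, question: str) -> bool:
        """Use one human interaction if any remain; return False otherwise."""
        if self.human_asks_left <= 0:
            return False
        self.human_asks_left -= 1
        # A real agent would deliver `question` to the user here.
        return True
```

An agent loop built on this would check `ask_human` before pinging the user and fall back to deciding on its own, or stopping, once the interruptions run out.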

Future Integrations and Advanced Security

For OpenClaw, I guess, what are the extensions that you all are most excited about that don't yet exist? Or what are the system improvements you want to see? I think my number one thing would be various consumer sites, which currently are incredibly hard to integrate. Consumer websites like DoorDash, like travel booking and all these sites. Well, they mean

We need better... what is it? AI agent interfaces? Well, we don't have a term for that. UI was user interfaces, right? We need the equivalent for claws and agents, so that they can talk to these services. Right now you basically have to implement them, typically via browser use, and it's super brittle. Right. Right. That doesn't work well.

As a security nerd, I'm gonna say the security tools. Their integrations with password managers are pretty cool. Yeah. And they work incredibly well. And it's really funny because

You know, password managers are one of those things where it's not security best practice, but it's certainly better than what most people do. And so it's a net improvement. Maybe you can't do diet and exercise, but if you can get diet right, maybe that helps. So as it starts to add these security tools, you could just have these agents that kind of look over your shoulder and make sure you're not doing anything stupid.

The frontier models are incredibly good at spotting phishing and fraud, and maybe if you have them working through your email inbox, they can help remove and flag some of this stuff in a way that the traditional controls don't. As you write code or you use services or maybe you create some sort of infrastructure, they make sure that you don't over-provision. Right. So I can't run Wiz as a home user, but maybe I do need something that makes sure that I don't set the permissions wrong in an S3 bucket. So stuff like that is incredibly powerful, I think. But again, I'm on the other side of the distribution on this.

Will there be an agent-specific vault? I mean, I used to work at HashiCorp. I love Vault, the open source tool. It's so useful. It's generation-defining. So now the question becomes, the workloads are a little different. Is there an agent-specific vault for the OpenClaws of the world? Does that look different?

I mean, I kind of just use 1Password. And 1Password has lots of flaws, right? I'm probably very happy. I'm happy with a security model. I think I would not necessarily recommend it. But you can basically just create a new vault, get a token, give that to the agent, and then the agent can access everything that's in that particular vault, right? It doesn't rotate the token, which is what Vault could do. I mean...

Possibly, yes. Yes. But the problem with rotating... so let's define token. Rotating the token to access the vault, it's not clear to me what that gains necessarily because, you know. It'd be a

But you can monitor where the vault is accessed from. So okay, maybe, right? But I think the more important thing would be all the tokens that are in the vault. I want to rotate those from time to time, because those I cannot monitor. But the problem is

Those are often consumer sites, right? And consumer sites have zero functionality for rotating tokens, other than going into some crappy UI and doing it there, right? I mean, cookies in the browser are a form of token rotation. Because it updates once in a while, and then what a lot of the agents do is they take the cookie token and then they refresh it once in a while to read. You know what I mean? Very hacky way to do it.

The first sketchy thing my agent did was start looking for cookies, and I was just like, I didn't ask you to do that. It did ask me, so when I was trying to place the fills order on DoorDash, it's like, "I can't get through the bot detection thing, but you can give me your username and password. Not recommended, but that will work." Why not give it a separate account? I could give it a separate account, just need to create it. And I think to me,

I think that's important. I think in the future agents should have separate accounts, absolutely. Right? They should never share with you, because you want to just keep a separate trust domain there. You probably want to link the accounts, right? But give them virtual API keys, virtual credit cards, so that everything at the end of the day has a layer of indirection in between that you can separate. Yeah.

My wish list for OpenClaw is actually more of a multi-threading model. Today it's very single-threaded, which is great for single tasks, and you can create new sessions. But it kind of breaks when you have like five tasks running in parallel, which is pretty common for these personal assistant agents. So for example, I wanted to generate the gaming assets on one thread, but at the same time I wanted to go code up something, use the coding tools.

When that happens, it actually became really slow when it switched between the tasks, so the context between the sessions is not managed perfectly today. And it's very slow. I don't know if it's because the models are slow or it's just that the UI is slower than, say, a Pi War II. Use deep.

Yeah, very much. I mean, look, it hangs often. When I installed it, memory by default was broken. The first time I asked it to use iMessage, it for some reason didn't use the... Yeah. Love that. So it's like, why are you doing this? "Oh yeah, we could also use a standard integration. That's probably faster." I was like, okay, stop and then do that instead.

I do wonder if the build versus buy choices from the agents, OpenClaw agents, follow the distribution of build versus buy choices by the model. So for example, if you prompt Codex, would it choose to build everything, or is OpenClaw choosing to build everything because of some system engineering? Probably works like a typical enterprise, where it's arbitrary. Why'd you build it? 'Cause we did. It's a coin flip. Yeah, a coin flip.

Enterprise Deployment and Risk Mitigation

So what's the next set of things you guys plan to experiment with on OpenClaw? I mean, this is the big thing for a lot of IT organizations and a lot of companies right now: figuring out how do you run these things. And just like

I remember when I started, I was thinking, oh well, you can run it in a container, spin something up and load that. And then it was like, well, these things write code and they're pretty clever and they can probably escape containers. And there's a lot of reasons why you wouldn't want to do that. Maybe it's a VM.

And I started looking down that road, and then it's like, well, you're already in for a penny, might as well go in for a pound and just buy a Mac mini, right? And so I think the default motion for this now is sort of like, let's just run them on Mac minis. Good luck finding a Mac mini right now. So it's become a dedicated hardware thing. And so the question ultimately in my mind is, what does the stack in which you execute these things look like?

How do you actually bring this to an employee's desktop without putting your firm at risk? Yeah. That sort of stuff, I think, are really difficult unsolved problems. I'm still not sure. I think we're still quite a bit away from this becoming part of my daily mainline workflow. On the fringes, it can pick up a couple of tasks. But say, working at Andreessen Horowitz, right? What is the point where I would say, just give this access to

you know, or a pre preterum due diligence folder or something like that, right? That is a pretty big leap. I think we're pretty far away from that. I wouldn't find a scope permission. So I could see it getting, with the model you described, to a point where I forward an email and say, "Do something, analyze, I don't know, look at the data in here." And even like

Even a simple use case here like ordering us boba for our team meeting, right? I think it's still hard to make that work within a corporate IT environment in a safe way, unless you do dedicated hardware. Do you think... I mean, do you think there's the risk of escape? Yeah. You wouldn't. We have a Mac mini inside of our office that just runs OpenClaw but doesn't give it... I mean, I think that's what we're gonna have.

I think that's exactly what we have. But that doesn't scale, right? You've got six hundred to a thousand people and it's sort of like, well, I can't buy a thousand Mac minis.

Look, I think we can get there with VMs. If you say you have a dedicated host that runs a dozen or so VMs for a dozen employees, it's like, okay, blast radius is probably okay. But there's still the issue: what if this downloads the latest integration it found on some OpenClaw bulletin board?

Yeah, yeah. Yeah, exactly. So be careful. And then you wanna restrict the blast radius somehow, right? So what I thought about is, could you do something where, for example,

I give it access to, say, certain documents or certain emails, right? And I sort of have to do it in an explicit way. Maybe I can say, my inbox for today, you have access, or something like that. But then every night at midnight it resets. It would make it feel a little bit better. So somebody can compromise a day's worth of stuff. That's what we do with Kubernetes, right, in our container infrastructure. Yeah, exactly.

Exactly. So occasionally you just reset state, and that sort of makes it a little bit easier. And then if you have that plus separate accounts for everything... I don't think you should ever use my account for anything, honestly. I think it should be separate. It should probably never run locally on your machine.
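The "explicit grant that resets every night at midnight" idea can be sketched in a few lines. This is an illustration under stated assumptions, not how OpenClaw or any particular product works: the `DailyGrants` class and its resource names are hypothetical, and a real system would enforce this at the mail or document provider rather than in the agent's own process.

```python
from datetime import datetime, timedelta


def next_midnight(now: datetime) -> datetime:
    """Return the first midnight strictly after `now`."""
    tomorrow = now + timedelta(days=1)
    return tomorrow.replace(hour=0, minute=0, second=0, microsecond=0)


class DailyGrants:
    """Hypothetical access list where every grant expires at midnight."""

    def __init__(self) -> None:
        self._expiry: dict[str, datetime] = {}

    def grant(self, resource: str, now: datetime) -> None:
        """Explicitly allow access to a resource until the next midnight."""
        self._expiry[resource] = next_midnight(now)

    def allowed(self, resource: str, now: datetime) -> bool:
        """True only if the resource was granted and has not yet expired."""
        expiry = self._expiry.get(resource)
        return expiry is not None and now < expiry
```

The design choice is that access is denied by default and silently lapses, so a compromised agent holds at most one day's worth of grants, which is the blast-radius argument made above.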

It's a different trust domain. Yeah. Today I think it's pretty safe for the transient kind of cron job: wake up, look at something, but do not remember it. So, for example, maybe every hour wake up, look at my calendar, see whether I'm busy or not, and if I'm not gonna be home for dinner, tell my husband.

So that would be a use case I'm pretty comfortable with. If you look at the distribution of app usage on your personal laptop, there's only a couple. There's Slack: we talk to each other all the time. There's email, which is where most of the time is spent. There's all the coding tools, that's something else.

There's calendar. So if you can just streamline certain tasks on email and calendar, that's actually a huge win for a personal assistant. And there's a long tail of, like, I write this thing in Notion, but in this case, for the agents it's just Markdown. And then you can persist it anywhere. It doesn't really matter what it looks like.

It is really interesting when I think about what the future of note-taking will look like for agents, right? Today we kind of default to Markdown, but there could be stuff that's executable inside of Markdown. There could be blocks, there could be charts.

So Markdown just seems very limiting as a format. So I do wonder if there's like a Markdown Plus Plus, where the agent can have runnable things that it remembers as part of the... You can do charts in Markdown with Mermaid, or like these are the same. You could, but I meant charts like Hex-like charts. Yeah. Oh, I see. So it's like a, okay, Jupyter notebook, Python.

Exactly. Like code that's runnable and that's part of the source of truth when you take notes. 'Cause it's not just words, it's also... that you create along the way. There seems to be a whole trend at the moment of expressing graphs as code. Yes. Putting all of this together, I think what's super fascinating to me is this is one of the first times where I have a technology

where what it can do is not limited by its abilities, but limited by how I can make it secure and stop it from doing certain things. Right? We have this genie in a bottle and it's amazing, but how do I contain this? So has that ever happened before? I mean, security has always come at the end. Like it's never

Embracing the AI Agent Revolution

I think it's just that we've solved the coding side of this, the writing code side, and now it's more of a systems engineering problem. These are all fundamentally just systems and architecture problems. It's not necessarily security issues. Social engineering to some extent is, but that's... The problem is you're commingling risks across different trust domains with this. So you have

You have the trust and safety and alignment issues with your underlying foundation models. You have the systems architecture and execution around how OpenClaw does things on your local machine. And then you have the traditional hacking, sort of prompt injection type stuff. Like, malicious people want to rob you.

We're not stopping there. Because yeah, even if everything is perfect, you still may not want to have certain information bleed over. You have all the sharp edges that are left over from a world that was built for humans. Yeah. Batches. It's it's okay for humans. Those poor agents. You can fire a human, right? I mean, it's like, Yeah, YOLO.

If I dare to put it in a two-by-two in a very VC way: there's low security risk and high security risk, and there's low-value tasks and high-value tasks. So what is something that's low security risk but high value? Probably the example of emailing your husband that you're gonna be late to dinner. I mean, yeah, I guess that's one I could... Yeah. Yeah. And I mean, it's just sort of

I mean, the issue with these things is always the escalation of privileges and the escape out of the environment they're in. And so you can see where these things would jump into doing something that's actually high risk. I think one category that I would put in there is

You can just use something like OpenClaw as a really smart UI to your LLM in a sense, right? And basically say, let's forget memory, forget state: give it a task, and when the task is done, it resets all state. That makes it a lot more secure. Right now, let's assume I have a PDF in an email and I'd like an LLM to look at the PDF. It's still kind of cumbersome. I have to save this thing, right? And then go to the LLM and import it and do the analysis and then export and so on.
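The "give it a task, and when the task is done, it resets all state" pattern can be sketched with a throwaway working directory. This is only an illustration of the idea: `run_stateless` and `count_bytes` are hypothetical names, the task here is a stand-in for an LLM analysis, and deleting a scratch directory does not by itself cover memory, logs, or anything the task sends elsewhere.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_stateless(task: Callable[[Path], str], attachment: bytes) -> str:
    """Run one task in a scratch directory that is deleted afterwards.

    Only the returned string survives; files the task wrote are discarded.
    """
    with tempfile.TemporaryDirectory() as scratch:
        workdir = Path(scratch)
        (workdir / "input.bin").write_bytes(attachment)
        result = task(workdir)
    # The scratch directory and everything in it is gone at this point.
    return result


def count_bytes(workdir: Path) -> str:
    """Stand-in for a real analysis step, such as an LLM reading a PDF."""
    size = len((workdir / "input.bin").read_bytes())
    return f"attachment is {size} bytes"
```

Calling `run_stateless(count_bytes, pdf_bytes)` returns just the analysis text, which could then be emailed back while nothing from the run is kept on disk.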

Just being able to say, "Hey claw, look at this thing, give me this analysis," right? And back comes an email with this data. And afterwards the claw discards all state, right? That, I think, we could get to pretty quick. Oh, I'm excited to use OpenClaw for my taxes. Yeah. Yeah. I'm biting my tongue. If we don't tell this to the IRS then. The IRS is probably using OpenClaw to review other prices.

You never know. What is something that's for a company, not in a personal setting, that's high security risk but very high value? Like you want to automate it yesterday using OpenClaw, but it's risky. Taxes.

Anything financial. For a company owned. Anything but accounts payable. Like accounts payable, vendor review, third-party assessments, all this stuff where we have actual humans that spend a tremendous amount of time validating that the vendor exists, making sure the instructions for payment are correct, making sure it's the right PO and not someone doing some sort of social engineering attack.

And there's just a whole lot of stuff around vendor management in the enterprise, I think, where these solutions could really increase efficiency a lot. But if they go sideways, you start writing checks to the wrong people. Yeah, exactly. So what's your advice for the corporate managers and executives who are OpenClaw-curious? Ha ha ha.

I think this is one of those things where... I mean, I'm a profound believer that if you don't feel uncomfortable, you're not growing. And this is one of those times when you're gonna feel very uncomfortable, but you need to lean into this. And I think, to Guido's point and to a lot of the points we've made,

I can't see these as doing anything other than creating a lot more jobs. Like there's just so much more stuff that needs to get built, needs to get managed. And it's like if you want to be part of that wave, you gotta lean into it. And it's

The same thing happened with cloud, right? When cloud came around, I remember sitting in my big corporate job thinking half of these people will be gone in five years. Cloud infrastructure will just become commodity service abstracted away and we won't have tech people. And then lo and behold,

Ten years later, twenty years later, like the IT organizations are bigger than they were then and they're spending even more money. And so like I just think that there's just so much opportunity with this stuff. That you just have to lean into it and you have to get comfortable with being uncomfortable and try to take smart risks.

I think actually a good analogy is the early days of the web and the internet. Right? Back in those days, some companies banned the web browser, right? It's like, "Oh, the web browser is insecure." It's like, well, yes, it is insecure, but missing out on the internet revolution was a far larger risk, right? And you get Barnes and Noble'd if you're not careful. I mean city good. Right.

Trying to ignore this new technology and waiting for it to go away usually doesn't work. If you want to retire, that's a great strategy. There we go. Thanks for listening to this episode of the a16z Podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or a review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at a16z, and subscribe to our Substack at a16z.substack.com.

Thanks again for listening. See you in the next episode. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund.

Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.

This transcript was generated by Metacast using AI and may contain inaccuracies.