#19 – Brooklyn Zelenka: UCAN, Beehive, Beelay | localfirst.fm podcast

⁠¶ Intro

00:00

We've restricted ourselves down to making things look like access control lists on the outside. And so it should feel very, very similar to doing things with role-based access control using, say, OAuth. That should all feel totally normal. You shouldn't really have to think about it in any special way.

00:20

In the same way that, you know, if you have a sync server, other than having to set up the sync server, or maybe you point it at an existing one, knowing that it's there doesn't mean that you have to, like, design it from first principles. Welcome to the localfirst.fm podcast. I'm your host, Johannes Schickling, and I'm a web developer, a startup founder, and love the craft of software engineering.

00:39

For the past few years, I've been on a journey to build a modern, high quality music app using web technologies. And in doing so, I've fallen down the rabbit hole of local-first software. This podcast is your invitation to join me on that journey. In this episode, I'm speaking to Brooklyn Zelenka, a local-first researcher and creator of various projects, including UCAN and Beehive.

01:01

In this conversation, we go deep on authorization and access control in a local-first decentralized environment and explore this topic by learning about UCAN and Beehive. Later, we also dive into Beelay, a new generic sync server implementation developed by Ink & Switch. Before getting started, also a big thank you to Convex and ElectricSQL for supporting this podcast. And now my interview with Brooklyn. Hey Brooke, so nice to have you on the show. How are you doing? I'm doing great.

01:32

Super excited to be here. I'm glad that we made this happen. Thanks so much for having me. I was really looking forward to this episode and honestly, I was quite nervous because this is certainly bringing me to an aspect of local-first where I have much less first-hand experience myself. I think overall local-first is a big frontier of pushing the boundaries of what's possible technologically, et cetera. And you're pushing forward an even further frontier here, all around local-first auth.

02:03

So the people in the audience who are already familiar with your work, I'm sure they're very thrilled for you to be here, but for the folks who don't know who you are, would you mind giving a brief background? Yeah, absolutely. I'll maybe do it in slightly reverse chronological order. So, these days I'm working on an auth system for local-first, mostly focused on Automerge, called Beehive, which does both read controls with encryption and mutation controls with something called capabilities.

02:33

I'm sure we'll get into that. Prior to this, for a little over five years, I was the CTO at a company called Fission. So in 2019, we started doing local-first there.

02:44

And we worked on the stack we always called auth, data, and compute. And so we ranged out way ahead on a variety of things, trying local-first, you know, encrypted-at-rest databases, a file system, an auth system that has gotten some adoption called UCAN, and a compute layer, IPVM. And prior to that, I did a lot of web and, temporarily, did work with the Ethereum core development community, mostly working on the Ethereum virtual machine. That is super impressive.

03:13

I am very curious to dig into all of the parts really around auth, data, and compute. However, in this episode, I think we should keep it a bit more focused, particularly on auth. Maybe towards the end, we can also talk a bit more about compute. Most of the episodes we've done so far have been very centric around data.

03:34

Only a few have also explored what auth in a local-first setting could look like, but I think there is no better person in the local-first space to really go deep on all things auth. So through your work on Fission, and previous backgrounds, et cetera, you've participated in, contributed to, and started a whole myriad of different projects, which are now really at the forefront of those various fields. One of them is UCAN.

04:04

You've also mentioned Beehive at Ink & Switch. Maybe starting with UCAN, for those of us who have no idea what UCAN, that four letter acronym, stands for and what it means, could you give us an introduction?

⁠¶ UCAN

04:18

Yeah, absolutely. So UCAN, U C A N, User Controlled Authorization Networks, is a way of doing authorization, so granting the ability to somebody else to perform some action on a resource, in a totally peer-to-peer, local-first way. It uses a model called capabilities. So instead of having a database that lists all of the users and what they can do, you get certificates that are cryptographically provable.

04:48

And so if I wanted to give you access to some resource I controlled, I would sign a certificate to you. And then if you wanted to give access to someone else, you would sign a certificate to them. And then when it came back to me, I could check that that whole chain was correct. And so people have used this to do all kinds of things. So at Fission, we were using it for CRDTs.
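To make the chain idea concrete, here is a minimal sketch in Python. It is not the UCAN token format: the `Delegation` type and `chain_is_valid` function are hypothetical names, and signature verification is left out entirely (a real verifier would also check each certificate's signature against its issuer's key). It only shows the structural rule described above, that each certificate must be issued by whoever received the previous one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """One certificate in a chain (hypothetical shape, signatures omitted)."""
    issuer: str    # who signed this certificate
    audience: str  # who receives the capability
    resource: str  # what the capability is about
    action: str    # what may be done with the resource


def chain_is_valid(chain, root_owner, invoker):
    """Check that the delegations link root_owner to invoker without gaps."""
    if not chain:
        return False
    # The chain must start at the resource owner and end at the requester.
    if chain[0].issuer != root_owner or chain[-1].audience != invoker:
        return False
    for parent, child in zip(chain, chain[1:]):
        # Each link must be issued by the party that received the previous one.
        if child.issuer != parent.audience:
            return False
        # In this simplified sketch every link names the same resource and action.
        if (child.resource, child.action) != (parent.resource, parent.action):
            return False
    return True


chain = [
    Delegation("brooke", "johannes", "doc:notes", "write"),
    Delegation("johannes", "alice", "doc:notes", "write"),
]
print(chain_is_valid(chain, root_owner="brooke", invoker="alice"))
```

Note that the check needs only the chain itself and knowledge of the root owner; there is no user database to consult.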

05:08

For example, there's a CRDT-based file system that we had developed, to guard whether or not you were allowed to write into it. There's a bunch of teams now using it for managing resources. So, storage quotas. How much are you allowed to store inside of some data volume? And for them, it's really helpful because then they can say, okay.

05:26

Here's a certificate from us to, you know, say a developer, and then they can portion that out to all of their users without having to always register all of their users back to the storage company. And so it can both lower the amount of interaction that they have to do with, you know, registering all of these different people, but it also means that they can scale up their service really nicely, so long as they know about the root signature.

05:50

They can scale horizontally very, very easily or interact with other teams very easily by just issuing them certificates. So, like, people are doing that kind of thing. So, you've mentioned the term capabilities before, and I think that's also a central part of UCAN.

06:04

I'm most familiar with, from my more traditional background of like building more centralized server applications, et cetera, and how you implement auth is always very, very dependent on the kind of application that you want to build.

06:17

If you want to start out a bit more easily, then you could maybe lean on some of the primitives that a certain technology or platform is giving you, maybe using Postgres and use sort of like the role-based access control patterns that you have in Postgres, or maybe something even as off the shelf as Firebase. Is this sort of like a useful mental model to think about it, that UCAN gives me similar building blocks? Or how much more fine-grained can I get with what UCAN offers to me?

06:48

Yes, it's a great question. So, in role-based access control or any of these access control list based systems, right, you put in a database that has, you know, a list of users and what they're able to do. So often their role: are they an admin? Are they a writer? Are they a reader? You know, all of these things. And to update that list, you have to go to that database, update that database, and on every request that you make, you have to check the list.

07:20

So sometimes we say it's like having a bouncer at a club. You know, you show up, you show them your ID. They check, are you on the VIP list? And then you're allowed into the club or not. And what those rules are, are set by that, you know, by that bouncer, right? These are the only rules, no others. In a capabilities world, the analogy is often to having like a ticket to go see a movie. So, this last weekend, I went to go see Wicked, it was awesome.

07:45

But I bought my ticket online, it showed up in my email, they didn't ID me on the way in, I just showed them my ticket and they're like, oh, great, yeah, theater 4, you can go in. So as long as I had that proof with me, I'm allowed in. They didn't have to check a list. There was no central place to look. Capabilities are not a new model. They've existed for some time. In fact, a big part of the internet infrastructure runs on top of capabilities as well, or a subset of them.

08:15

But it hasn't found its way as much into applications because we're so used to access control lists. The granularity that you mentioned before is really interesting because, in the capability system, anytime I make that delegation to somebody else, I say, you're allowed to use this thing, or then you go to somebody else and say, you can also use this thing. You can grant them the ability to see or to use that, or fewer capabilities.

08:38

So if it was like, here's a terabyte of storage, you could turn around and say, well, here's only 50 MB to somebody. And so you can get as granular as you want with it. And there's never any confusion about who's acting in what way, right? So in a traditional system, with access control lists, if you ran a service between the user and me, and they made a request to you, well, they only have a link to you and you only have a link to me.

09:08

So when you'd make the request to me, you'd be using your terabyte of storage. And so there are some cases where that can confuse the resource. So it's like, oh yeah, you can totally store it, you know, use a terabyte of storage, even though the actual user shouldn't be able to do that. With capabilities, we get rid of that completely. We have this entire chain of custody, basically, of this. As granular as you want to get, it's very clear on every request, what that request is allowed to do.
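The "terabyte down to 50 MB" example can be sketched as a rule over the chain: each delegation may keep or shrink the quota it received, never grow it. The sketch below is illustrative only. `final_quota` is a hypothetical helper, not part of any UCAN library, and it uses decimal units (1 TB = 10^12 bytes) as an assumption.

```python
TB = 10**12  # decimal terabyte, assumed for illustration
MB = 10**6   # decimal megabyte


def final_quota(quotas):
    """Return the quota of the last link, rejecting any link that escalates.

    `quotas` lists the byte limit granted at each step of the chain,
    starting with the root grant.
    """
    if not quotas:
        raise ValueError("empty delegation chain")
    for parent, child in zip(quotas, quotas[1:]):
        # A delegate can pass on at most what it was given.
        if child < 0 or child > parent:
            raise ValueError("delegation grants more than its parent allows")
    return quotas[-1]


# Storage company -> developer (1 TB) -> end user (50 MB).
print(final_quota([1 * TB, 50 * MB]))
```

Because the limit travels with the request, the service in the middle cannot accidentally spend its own terabyte on behalf of a user who was only granted 50 MB.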

09:34

So I think this is going to become really important for things like LLMs and other sorts of automated agents, where you can tell it, hey, go do things for me, but not with all of my rights, not as sudo. Only in this scenario, for the next five minutes, these things are what you're allowed to do. And even if it hallucinates some other intention, those are the only things it's able to do. Yeah, I think this is such an important aspect.

09:59

Since I think you don't even need to reach as far as giving agency to an AI agent, but even if you want to go a bit more dumb and a bit more traditional, if you want to use some off-the-shelf SaaS service, and maybe that thing integrates with your Google account, then you also need to give the thing access somehow.

10:23

So you do like the OAuth flow with Google and then it asks you like, hey, is it okay that we have access to all of those things, that we can do all of those things? And even though Google already offers some pretty fine-grained things there, often I feel like, oh, actually I want to make it even more fine-grained. Wait, you're going to have like access to all of my emails. Can I maybe just give you access to my invoice emails if this is an invoicing thing?

10:50

So I feel like it's a bit overwhelming to make all of those decisions upfront, like what should be allowed, both from an application end user perspective, me using the thing, but then particularly also from an application developer perspective. And, yeah, it feels like a really, really important aspect of using the app and building, designing the app. And if that is not intuitive and ergonomic, then I feel everyone's going to suffer.

11:19

The application developer, they're probably just going to wing it, and that will probably mean too coarse of a granularity for application users, etc. So I'm really excited that you're pushing forward on this. Maybe also to draw the analogy between more traditional OAuth flows and what UCAN is providing: should I think about UCAN as a replacement for OAuth, both from an end user perspective as well as from an application developer perspective? Yeah, exactly.

11:52

So the underlying mechanism is different, but we really wanted it to feel as familiar as possible. So even the early versions of UCAN used the same token format and things like this. We've since switched over to some more modern formats. There are problems with JWTs. But yeah, exactly. You can think of it as local-first OAuth, is one way of thinking about it, exactly. Right. So as an application developer, I need to make up my mind once to say like, this is what's possible.

12:23

This is what is allowed, and like define that, and then the system enforces those rules. But often I, as an application developer, get it wrong and I need to either make the rules more permissive or less permissive over time.

12:40

And similar to how I might get a database schema wrong and then later need to do those dreaded database schema migrations, what is the equivalent of a schema migration, but for UCAN capability definitions, etc.? So all of the information that you need to fulfill a request in UCAN is contained in the token itself. So, these days we have a little policy language, think of it a little bit like, like SAML, inside the token.

13:08

And it says, okay, when you go to actually do something with this token, the action has to match the following criteria. Say you're sending an email. So the "to" field has to only be to people inside of the company. Or, you can only send newsletters on Mondays, or whatever it is. Right. And you can scope that down arbitrarily, syntactically. So updating those policies is just issuing a new certificate to say this is what you're allowed to do now.
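A rough sketch of that kind of check, using the two example rules from the conversation (recipients inside the company, sending only on Mondays). This is not UCAN's actual policy language: the `policy` dictionary, its keys, the `example.com` domain, and `email_allowed` are all made up for illustration.

```python
from datetime import date

# Hypothetical policy carried inside a token.
policy = {
    "to_domain": "example.com",  # recipients must be inside the company
    "weekdays": {0},             # 0 = Monday in date.weekday()
}


def email_allowed(policy, recipients, send_date):
    """Return True only if the requested action matches every policy rule."""
    if not recipients:
        return False
    suffix = "@" + policy["to_domain"]
    domain_ok = all(r.lower().endswith(suffix) for r in recipients)
    day_ok = send_date.weekday() in policy["weekdays"]
    return domain_ok and day_ok


monday = date(2024, 12, 2)
print(email_allowed(policy, ["alice@example.com"], monday))
```

Changing what is allowed means issuing a token with a different `policy`; nothing about the checking code or a central database has to change.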

13:36

And, you know, you can revoke the old ones if that's needed. But I think the more interesting part of this actually is on the far other end. So we were talking about, you know, the developer sets these policies. And that's true, I would say, the majority of the time. But it doesn't respect user agency, right? You're giving the developer all of the agency, but the user's the one who owns whatever, let's say that it's a text editing app, right? You know, so they own the document.

14:04

Why can't they decide, you know, when they share with somebody else, what they should be able to do with that document? So in, say, you know, Google Docs, you've got that little share button in the top corner and then it says, you know, invite people, and then you can say, well, they're an editor and this person's, you know, another admin and this is another viewer. This person can only comment. I think the UI,

14:22

you know, will usually stay like that, but you could add whatever options you wanted in there, right? Why not? So when we were doing the file system work back at Fission, you could scope down to say, like, well, you're allowed to write into only this directory, for example, and that was very, very flexible. Or, you're allowed to write files under a certain size limit, right? And so the user now can make these decisions of like, I'm giving you access to my file system.

14:48

I only want you, you know, maybe I'm, you know, I'm thinking back to my school days, you know, a teacher, and they're having students submit assignments to them. Well, you can only submit them to this one directory and I don't want you filling up my entire disk. So they have to be under a gigabyte or whatever, right? And so you can imagine scenarios like this, where we're now inviting the end user to participate in what the policy should be. It's not all set completely.
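The teacher scenario above (one directory, files under a gigabyte) could be checked roughly like this. It is a sketch under assumptions: the capability is a plain dictionary with made-up keys `directory` and `max_bytes`, and `write_allowed` is a hypothetical helper, not the WNFS or UCAN API.

```python
import posixpath

GIGABYTE = 10**9  # decimal gigabyte, assumed for illustration


def write_allowed(cap, path, size_bytes):
    """Allow a write only inside cap['directory'] and within cap['max_bytes']."""
    root = posixpath.normpath(cap["directory"])
    # Normalize so that '..' segments cannot escape the granted directory.
    target = posixpath.normpath(posixpath.join("/", path))
    inside = target == root or target.startswith(root.rstrip("/") + "/")
    return inside and 0 <= size_bytes <= cap["max_bytes"]


# What a teacher might hand to a student.
cap = {"directory": "/assignments", "max_bytes": GIGABYTE}

print(write_allowed(cap, "/assignments/essay.pdf", 5_000_000))
print(write_allowed(cap, "/assignments/../private/grades.csv", 1_000))
```

The point is that these limits are chosen by the user doing the sharing, as a refinement of whatever the developer allowed in the first place.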

15:13

The developer can absolutely set it in advance, but you can also then refine it further and further for the user's intention. Right. I love that. Since particularly now with like LLMs and AIs in general, a non-technical user can now, just in the way they would say it to another person, say like, hey, I want to give Alice access to this file, but Alice is only allowed to read the first page here. The second two pages, those are like my private notes. Please don't give anyone access to this.

15:44

You know what? Like actually Alice is allowed to also comment on it. Just from a very colloquial sentence like that, a computer can now derive those capabilities very accurately and represent them to the user, like, hey, does this look right to you? And that levels up the entire application user experience.

16:04

So it's very reassuring to me that all of this is built on top of very sound cryptography. However, even though I've studied computer science and I have done my cryptography classes, that's not my day-to-day thing. And as an application developer, I'm trying to steer away from low-level cryptography things as much as possible, just because I don't consider myself an expert in this.

16:31

So it's very good to know that everything is built on top of very solid cryptography, but as an application developer, how much do I need to deal with like signing things, et cetera, or how much of that is abstracted away from what I'm dealing with? Yeah. So I would say that there's two layers here that people correctly find scary, myself included, right? Cryptography and auth in general, both super scary topics.

16:59

I remember, you know, as a web dev, whatever, 10 years ago, adding, in a web app, you know, the auth plugin and kind of going, and if I don't touch it, hopefully it'll work, right? Really the goal with all these projects was to hide as much of the scary complexity in there as possible. So we handle all of the encryption and signing and all of this stuff in a way that should make it, if we do our job well, completely invisible to the developer.

17:27

So even, you know, we haven't talked about Beehive very much. Beehive, which is this project I'm doing at Ink & Switch to add access control to Automerge, has both an encryption side, so that's read controls, and then capabilities for these mutations, or write controls. And for encryption, there's a bunch of things that have to happen. We have to serialize things in an efficient way. We have to chunk them up.

17:52

We have to make sure that we share the encryption key with everyone who should have it, and nobody else, right? And that could be thousands of people, potentially, and we've set ourselves these goals of, you know, you should be able to run this inside of a large organization or a medium-sized organization. How do you do all that stuff efficiently? And our goal is you should be able to say, add these people, and it just works.

18:16

You do all your normal Automerge stuff, and, you know, when you persist to disk, or when you send it out to the network, then it gets encrypted, then it gets secured, then it gets signed, all of this stuff. And you don't have to worry about any of it. When you set up Beehive, it generates keys, it does all the key management for you, it does all of the key rotation, all of this stuff. So, again, it's one of these things where it's like, I'm really excited about this.

18:40

And it's like super cool to get to work on. And there's a lot of interesting detail on the inside, but in an ideal world, nobody has to think about this other than, I want to grant these rights to these people, and everything else is taken care of automatically. I love that. So you've mentioned initially that UCAN happened as a project while you were working on various projects at Fission, and right now you're mostly focused on Beehive.

19:08

So can you share a bit more, what was the impetus for Beehive coming into existence and then going into what Beehive is exactly?

⁠¶ Beehive

19:19

Absolutely. So, you know, we started UCAN very, very early in 2020. It came out of normal, regular product requirements of like, oh, well, we probably want everyone to read this document. How do we do that? Or I don't want somebody to fill up my entire disk. How do we prevent that? And that went through a bunch of iterations and we had a lot of learnings come out of that.

19:42

I'd say that really the big one was: in a traditional app stack, you have data at the bottom, you know, you have, say, Postgres and that's your source of truth. And then above that, you have some compute, maybe you're running, whatever, Express.js, or Rails, or Phoenix, or you know, one of these. And then on top of that, you put in an auth plugin, right, that uses all the facilities of everything below it.

20:04

But that requires that you have a database that has all this information in it, that lives at a location. We call this, internally at Ink & Switch, "auth as place". Right? Because your auth goes to somewhere, right? And on every request, you present your ID, they go, okay, sure, you know, here's a temporary token, then you hand that to the application, the application checks with the auth, you know, server again, and you do this whole loop.

20:28

And that has, you know, problems with latency, if you go offline, this doesn't work, and it doesn't scale very well, right? Like, even Google ran into problems with this and started adjusting their auth system. We found at Fission, and I think this very much holds true, like we just kept learning this over and over again, that you can't rely on that system. In fact, auth has to go at the bottom of the stack.

20:50

Your auth logic, the thing that actually does the guarding of your data, has to move with the data itself. So we call this "auth as data". So for read control, it's no longer, oh, I'm making a request to a web server and they may or may not send something to me. It's, I've encrypted it. Do you have the key? Yes or no. If you have the key, you can read it. If you don't, you can't. And it doesn't matter where you are. You could be on a plane disconnected from the internet.

21:16

You can decrypt the data, right? So we developed these ideas with UCAN and the Web Native File System (WNFS) in particular. Fission unfortunately didn't make it earlier this year, or, I'm not sure when this will be released, in early 2024. And Ink & Switch reached out.

21:32

So we've known those folks for a while, because we've been, you know, obviously working in the same space for a while, and PVH, the lab director, was actually an advisor at Fission and said, hey, we have a bunch of people that are interested in getting auth for Automerge in particular. Could you apply UCAN and WNFS to Automerge? And I said, I don't see why not. Right.

21:56

And so we looked at it a little bit deeper and went, well, yes, like we could use these things directly, but they're tuned for slightly different use cases. UCAN is extremely powerful. It's very flexible. And it has a bunch of stuff in it for this, you know, network layer, in addition to CRDTs. You pay for that in space, right? The certificates get a little bit bigger. And so we said, well, okay, maybe, you know, we want these documents to be as small as possible.

22:23

You know, there's been a lot of work in Automerge to do compression, right? Really, really, really good compression on them. So the documents are tiny and, you know, you're not going to get that with UCAN. So could we take the principles and the learnings from UCAN and WNFS and apply them, to Automerge? And so ultimately that's what we've done. And there are a couple of different requirements that have come out of it as well. So it's tuned for a slightly different thing.

22:46

But essentially, Beehive says, what if we had end-to-end encryption? So in the same way that, you know, say, Signal end-to-end encrypts your chats, what if I had end-to-end encrypted documents that only certain people could write into, and I can control who can write into them? Has there been any prior art in regards to CRDTs to fulfill those sort of end-user-driven authentication and authorization requirements? There's some nearer-term stuff that was also exploring things with CRDTs.

23:19

But, you know, if you go really, you know, further back, there's the Tahoe Least-Authority File System, for example, which was, you know, this encrypted-at-rest file system, capabilities model, you know, whole thing. Mark Miller was doing capabilities-based auth going back into, you know, the late 90s. There's capability stuff that goes even further back, but he, you know, really did the work that everybody points at in this stuff.

23:44

But for CRDTs and for a local-first context where we don't assume at all, like there's no server in the middle whatsoever, we may have been the first to do this at Fission. It's possible. I mean, when we got started, the local-first essay hadn't even been published, right? We were doing local-first without the term. But there was a bunch of others in the space.

24:04

So, Serenity Notes has done related work, Matrix, Signal obviously has done a bunch of the end-to-end encryption stuff, and local-first auth is a project that has also worked with Automerge to do similar things. So most of these projects showed up after the fact. But yeah, so we're drawing from, in fact, we've talked to all these people, and all of the fantastic work that they've done over the past few years, and collected the learnings from them into Beehive.

24:31

That's awesome. I would love to get a better feeling for what it would mean to build an app with Beehive. My understanding is that Beehive right now is very centric around Automerge. However, it is designed in a way that over time, other CRDT systems, other sync engines, et cetera could actually embrace it and integrate it into their specific system.

24:54

I would like to get into that in a moment as well, but zooming into the Automerge use case right now, let's say I have already built a little side project with Automerge. I have some Automerge documents that are happily syncing the data between my different apps. So far, maybe I don't even have any auth fences around it at all. Hopefully no one knows the endpoint where all of my data lives. And if so, okay, it's not very sensitive data.

25:26

Or maybe I'm running all of that behind like a Tailscale network or something like that, which I think in a lot of use cases, simpler use cases, can also be a very pragmatic approach, by the way. When you can run the entire thing already in a fully secured frame of like a guarded network, and you're just going to run this for yourself or in your home network or for your family, and you're all on the same Tailscale WireGuard network.

25:54

I think that's also a very pragmatic approach. But let's say I want to build an app that I can share more publicly on the internet, where maybe I want to build a tldraw-like thing where I can send over a link where people can read it, but they need to have special permissions to actually also write something into it. I want to build the thing with Automerge. What does my experience look like? Yeah. There are, I would say, two parts to that question, right? One is, I have an existing document.

26:25

How do I migrate it in? And, you know, could I use it with something, you know, you alluded to other systems in the future. And, what does the actual experience of building something with Beehive look like? So Beehive is still in progress. We're planning to have a first release of it in Q1. And, you know, we're currently going at this with the viewpoint that adding any auth is better than not having auth right now.

26:50

So there's definitely further work where we want to really polish off the edges of this thing, but getting anything into people's hands is better than not having it, right? And there are some changes that we need to make to Automerge because, as I mentioned before, you know, auth lives at the bottom of the stack, so anything above in the stack needs to know something about the things below.

27:12

Auth being at the bottom means that if you want to do, in particular, mutation control, Automerge needs to know about how to ingest that mutation. So we do need to make some small changes to Automerge to make this work. But the actual experience is, we're bundling it directly into Automerge, or the current plan at least is we're bundling it directly into the Automerge Wasm, and then exposing a handful of functions on that, which is: add member at a certain authority level, remove member.

27:41

And that's it. So your experience will be, we're going to do all the key management for you, behind the scenes, under the hood. If you have an existing document, it'll get serialized and encrypted and put, you know, into storage. And you can add other people to the document by inviting them using add member, or remove a member from that document. Maybe also worth noting, this gives you a couple of extra concepts to work with.

28:08

So today we have documents, and you can have a whole bunch of them, and they're really independent pieces, right? And maybe they can refer to each other by, you know, an Automerge URL. Instead, or in addition, I should say, not instead, you want to be able to say, I'm building a file system. If I give you access to the root of the file system, you should have access to the entire file system. I don't want to have to share with you every individual thing. So we have this concept of a group.

28:34

So you have your individual device, you have groups, and you have documents. Each individual device has its own key, under the hood; you don't have to worry about this specific detail. So it's uniquely identifiable. Somebody steals your phone, you can kick your phone out of the group, right? Or out of the document, and that's fine. Then we have groups. So let's say that I have a group for everyone at Ink & Switch.

28:59

And then you can add everybody to that, but it doesn't have a document associated with it. It's purely just a way of managing people and saying, I want to add everybody in this group to this document. Right? And so you can have groups contain users and other groups. Then you have documents, which are groups that have some content associated with them. So I say on this document, here's who's allowed to see it. So it could be individuals or other groups or other documents.

29:25

Other documents is interesting because I can then say you have access to this document, and this document represents a directory. And so you also have access to all of its children, right? In a file system, you can do things like this. So add member, remove member becomes very, very powerful because now you can have groups and, you know, set up these hierarchies of, here's all of my devices. All of my devices sit in a group of Brooke's devices.

29:49

All of Brooke's devices should be added to Ink & Switch, and Ink & Switch has the following documents. And then, you know, whenever one of my contracts finishes and I get kicked out of Ink & Switch, then they can kick all of my devices out by revoking that group, right? So using Beehive is going to feel like that. It's going to say, yeah, I know about the ID for Brooke's devices. Please add her or, you know, contract finishes, please remove her.
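The device, group, and document hierarchy can be sketched as a small graph. To be clear about assumptions: this is not the Beehive API, which had not been released at the time of recording. The `Membership` class, its method names, and the string IDs are hypothetical, authority levels are left out, and real Beehive backs membership with keys and certificates rather than an in-memory table.

```python
class Membership:
    """Toy model: groups and documents each hold a set of direct members."""

    def __init__(self):
        self._members = {}  # group or document id -> set of direct member ids

    def add_member(self, group, member):
        self._members.setdefault(group, set()).add(member)

    def remove_member(self, group, member):
        self._members.get(group, set()).discard(member)

    def has_access(self, target, device):
        """True if `device` is reachable from `target` via nested membership."""
        seen, stack = set(), [target]
        while stack:
            current = stack.pop()
            if current == device:
                return True
            if current in seen:
                continue  # guards against groups that contain each other
            seen.add(current)
            stack.extend(self._members.get(current, ()))
        return False


m = Membership()
m.add_member("brookes-devices", "brooke-laptop")
m.add_member("brookes-devices", "brooke-phone")
m.add_member("ink-and-switch", "brookes-devices")
m.add_member("doc:roadmap", "ink-and-switch")

print(m.has_access("doc:roadmap", "brooke-phone"))

# Contract finishes: one removal revokes every device in the group.
m.remove_member("ink-and-switch", "brookes-devices")
print(m.has_access("doc:roadmap", "brooke-phone"))
```

The design choice worth noticing is that revoking the group edge removes all of its devices at once, while removing a single stolen phone from `brookes-devices` leaves the other devices untouched.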

30:15

All of the rest of the stuff should be completely invisible to you. So when you persist things to disk or you send them to a sync server, that all gets encrypted first. And even the sync servers have permissions. There's a permission level in here of, you're allowed to ask for the bytes from another node. And they can prove that because you have these certificates under the hood, right? Because, and this is an uncomfortable truth, all cryptography is breakable.

30:44

So in 10 years, maybe they break all of our current ciphers. Right? It could happen. In fact, older ciphers are already, you know, broken. Or maybe quantum computing gets very, very advanced, and it becomes practical to break keys, right? Whatever it is. Or there's an advancement in the discrete log problem, or whatever the thing is, right? You know, we have some mathematical advance, and it gets broken. The best thing to do, then, is to just not make those bytes available.

31:10

Make the encrypted content only pullable by people that you trust. And yes, somebody could break into the sync server, let's say, and download everything. But that's a much higher bar than anybody can download. Anybody on the internet can download whatever chunk they want, right? But all of that is handled really for the developer to say, this is the sync server, sync server has the ability to pull down these documents.

31:30

Or even the user could say, I want to sync to this sync server, I'm going to grant that sync server access to my documents to replicate them. But really, we're trying to keep the top level API for this as boring as possible, right? That is a top line goal. Add member, remove member, and the sync server is just another member in the system. Got it. So in terms of the auth as data, that, that mental model, that's very intuitive.

31:58

And, as you're like rewiring your brain as an application developer, like how data flows through the system, now to understand that, like everything that's necessary to make those auth decisions, should someone have access to, to read this, to like write this, et cetera, that this is just data that's also being synchronized, across the different nodes. That is very intuitive.

32:22

is this something that in this particular case, at least with Beehive and Automerge, is this purely an implementation detail? And this is like your internal mental model of this data, or is this actually data that is available somehow to the application developer that the application developer would work with that as they work with the normal Automerge documents? Yeah. So, Again, we're trying to hide these details as much as possible.

32:48

So, you'll hear me talking about things like add member or groups, right? And that sounds very access control list like. Capabilities are, like there's a formal proof of this, more powerful. Like they can express more things than access control lists. So at least for this first revision, we've restricted ourselves down to making things look like access control lists, on the outside. And so it should feel very, very similar to doing things with role based access control using, say, OAuth.

33:20

That should all feel totally normal. You shouldn't really have to think about it in any special way. In the same way that, you know, if you have a sync server, other than having to set up the sync server, or maybe you pointed at an existing one, knowing that it's there doesn't mean that you have to, like, design it from first principles. Or, you know, same thing with Automerge. Technically, you have access to all of the events.

33:41

But really you're going to materialize a view and treat it like it's JSON. And so we're saying the same thing here with Beehive: you will automatically get only the data that you can decrypt and that you're allowed to receive from others. So, essentially, Beehive takes things off the wire, decrypts it, and hands it to Automerge, and then Automerge does its normal Automerge stuff.

34:03

The one wrinkle is if an old write has been revoked, so it turns out that somebody was, like, defacing the document and doing all this horrible stuff, and we had to kick them out, we have to send to Automerge, hey, ignore this run of changes. And then it has to recalculate. So that's the one change that we have to make inside of Automerge. But really you will use Automerge as normal. You will have an extra API that is add this person to this document or to this group, and remove them, right?

34:28

As needed. And you shouldn't have to think about any of these other parts, even the sync server. Like, Alex Good, who's the main maintainer of Automerge, has been working on sync and improving sync. And that project started around the same time as Beehive and we realized, oh, there's actually this challenge because we're, you know, on the security side, trying to hide as much information from the network as possible, including from the sync server, right?

34:52

Sync server shouldn't be able to read your documents. To do efficient sync, you want to have like a lot of information about the structure of the thing that you're syncing so that you have no redundancy. Right? And you can do it in a few round trips, all of this stuff. So we ended up having to co design and essentially, like, negotiate between the two systems, like, how, how much information can we reveal, and still have it be secure?

35:11

And given that you can't read inside the documents, like, how do we package things up in an efficient way? But again, none of that information should be a concern for a developer in the same way that the sync system right now, you don't really interact with the sync system, other than you say, that's my sync server over there and the bytes go over there. There's an extra layer now of, it gets encrypted first before it goes over the wire. That makes sense.

35:33

I think as an application developer, there's typically sort of this two pronged approach. On the one hand, you ideally want to embrace that things are hidden from you. That you don't need to understand them to use it correctly, et cetera. But particularly if something's new, maybe you're like an early adopter of the technology, you would like to figure out like, what are the worst case scenarios? Maybe the thing is no longer being developed.

35:59

Could I take it over and like, can I become a contributor or maintainer of that, or you'd still like to understand it for the sake of like really understanding, is this the thing that I want? And just by like understanding how it works, you can come to the right conclusion, like, is this for me or not, particularly if it's not yet as well documented, et cetera. So channeling our like inner understanding application developer.

36:27

I'd like to understand a bit better of like how, Beehive and in that regard, also the sync server works under the hood. Like, it's hard enough to build a syncing system. and now, you build an authorization layer on top of it. What sort of implications does this have for the sync server? And my understanding is that Alex Good is working on this and I think this has been semi public so far.

36:52

And that there's like a, you know, like a sibling product or a sibling project, next to Beehive called Beelay, which I guess like relays messages in the Beehive system. And I think that's a step towards what eventually, we're all dreaming about as like a generic sync server that ideally is compatible with like as many things as possible, I guess, at the beginning for Automerge, but also beyond that. So what is Beelay? What are its design goals and how does it work?

⁠¶ Beelay

37:25

So Beelay has a requirement that it has to work with encrypted chunks. So, you know, we do this compression and then encryption on top of it, and then send that to the sync server. The sync server can see, because it has to know who it can send these chunks around to, the membership. So the sync server does have access to the membership of each doc, but not the content of the document.

37:47

so if you make a request, it checks, you know, okay, are you somebody that, has the, the rights to, to have this sent to you, yes or no, and then it'll send it to you or not. And this isn't only for sync servers, you know, if you connect to somebody, you know, directly over Bluetooth, you know, you'd do the same thing, right? Even if, you know, you can both see the document. There's nothing special here about sync servers. To do this sync, well, we're no longer syncing individual ops, right?
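As a rough sketch of that check: a relay that stores only ciphertext, knows the membership of each document, and refuses to hand bytes to anyone else. The `Relay` class and its method names are hypothetical, not Beelay's interface, and the proof step is deliberately left out (in the real system the requester would prove control of a member key with a signature, whereas here the requester's key is simply trusted).

```python
from dataclasses import dataclass, field


@dataclass
class Relay:
    """Stores opaque encrypted chunks plus per-document membership."""
    membership: dict = field(default_factory=dict)  # doc_id -> set of public keys
    chunks: dict = field(default_factory=dict)      # doc_id -> list of ciphertexts

    def grant(self, doc_id, public_key):
        self.membership.setdefault(doc_id, set()).add(public_key)

    def store(self, doc_id, ciphertext):
        # The relay never decrypts; it only keeps the bytes around.
        self.chunks.setdefault(doc_id, []).append(ciphertext)

    def pull(self, doc_id, requester_key):
        """Return the stored ciphertexts only if the requester is a member."""
        if requester_key not in self.membership.get(doc_id, set()):
            raise PermissionError(f"{requester_key} may not pull {doc_id}")
        return list(self.chunks.get(doc_id, []))


relay = Relay()
relay.grant("doc-1", "alice-key")
relay.store("doc-1", b"\x8f\x02...opaque ciphertext...")

allowed = relay.pull("doc-1", "alice-key")
try:
    relay.pull("doc-1", "mallory-key")
    denied = False
except PermissionError:
    denied = True
```

The same `pull` check would apply between two peers on a direct connection, since nothing in it is specific to a server.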

38:10

Like, we could do that, but then we lose the compression. It's not great, right? And ideally, we don't want people to know, you know, if somebody were to break into your server, hey, here's how everything's related to each other, right? Like, that compression and encryption, you know, also hides a little bit more of this data. We do show the links between these, you know, compressed chunks, but we'll, we'll get to that in a second.

38:32

Essentially what we want to do is chunk up the documents in such a way where there's the fewest number of chunks to get synced, and the longer ranges that we have of, you know, Automerge ops that get compressed before we encrypt it, right? On the, I'll call it client. It's not really a client in a local-first setting, right? But, like, not on the sync server, when you're sending it to it. The more stuff that you have, the better the compression is.

38:58

And chunking up the document here means basically, you're really chunking up the history of operations that then get internally rolled up into one snapshot of the document. And that could be very long. And, there's room for optimization. That is like the, the compression here, where if you set a ton of times, like, Hey, the name of the document is Peter. And later you say like, no, it's Brooke. And later you say, no, it's Peter. No, it's Johannes.

39:28

Then you, you can like compress it into, for example, just the latest operation. Yeah, exactly. So, you know, if you want to think about how this, you know, to get, to get more concrete, you know, if you take this slider all the way to one end and you take the entire history and run-length encode it, you know, do this Automerge compression, you get very, very good compression. If we take it to the far other end, we go really granular.

39:50

Every op doesn't get compressed, you know, so it's just each individual op, so you don't get compression. So there's something in between here of like, how can we chop up the history in a way where I get a nice balance between these two? When Automerge receives new ops, it has to know where in the history to place them. So you have this partial order, you know, you have this, you know, typical CRDT lattice. And then, we put that, or it puts it into a strict order.
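The "slider" between one big compressed chunk and individually sent ops can be seen with any general-purpose compressor. The sketch below uses `zlib` purely as a stand-in for Automerge's own columnar and run-length compression, and the op shape is made up for the example.

```python
import zlib


def make_ops(n):
    """Build n small, repetitive JSON-like ops (an invented shape)."""
    return [f'{{"op":"set","key":"name","value":"v{i}"}}'.encode() for i in range(n)]


def compressed_size(ops, chunk_len):
    """Total bytes after compressing the history in runs of chunk_len ops."""
    total = 0
    for start in range(0, len(ops), chunk_len):
        run = b"".join(ops[start:start + chunk_len])
        total += len(zlib.compress(run))
    return total


ops = make_ops(1024)
raw_size = sum(len(op) for op in ops)        # far end: every op sent as-is
whole_size = compressed_size(ops, len(ops))  # near end: the entire history at once
middle_size = compressed_size(ops, 16)       # somewhere in between

print(raw_size, middle_size, whole_size)
```

Longer runs compress better because the compressor sees more repetition, while shorter runs are cheaper to re-send when only the tail of the history changes.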

40:18

It orders all the events and then plays over them like a log. And this new event that you get, maybe it becomes the first event. Like you could go way to the beginning of history, right? Like you, you don't know because everything's eventually consistent. So if you do that linearization first and then chop up the documents, you have this problem where, if I do this chunking, or you do this chunking, well, it really depends on what history we have, right?

40:41

And so it makes it very, very difficult to have a small amount of redundancy. So we found two techniques helped us with this. One was, we take some particular operation as a head and we say, ignore everything else. Only give me the history for this operation. Only its strict ancestors. So even if there's something concurrent, forget about all of that stuff. So that gets us something stable relative to a certain head. And then to know where the chunk boundaries are, we run a hash hardness metric.
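The first technique, taking only the strict ancestors of a chosen head, is a plain graph traversal. A minimal sketch, assuming the history is available as a mapping from each op to its parents (the mapping and the op names are invented for the example):

```python
def strict_ancestors(parents, head):
    """Return every op reachable from head through parent links, excluding head."""
    seen = set()
    stack = list(parents.get(head, []))
    while stack:
        op = stack.pop()
        if op not in seen:
            seen.add(op)
            stack.extend(parents.get(op, []))
    return seen


# A tiny history: b and c are concurrent children of a; d builds on b, e on c.
history = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "d": ["b"],
    "e": ["c"],
}

relative_to_d = strict_ancestors(history, "d")
```

Here `relative_to_d` contains only `a` and `b`. The concurrent branch (`c`, `e`) is ignored, so two peers that pick the same head get the same set regardless of what else they have seen.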

41:15

So, the number of zeros at the end of the hash for each op gives you, you know, you can basically say, you know, for each individual op, zero 0s, so I'm happy with anything. Or if I want it to be a range of, you know, 4, then give me two 0s at the end, because that will be, you know, 2 to the power of 2 is 4, so I'll chunk it up into 4s, and you make this as big or as small as you want, right?

41:38

So now you have some way of probabilistically chunking up the documents, relative to some head. And you can say how big you want that to be based on this hash hardness metric. The advantage of this is even if we're doing things relative to different heads, now we're going to hit the same boundaries for these different hash hardness metrics. So now we're sharing how we're chunking up the document.
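A minimal sketch of the hash hardness idea, under several stated assumptions: zeros are counted as trailing zero bits, SHA-256 from the standard library stands in for the BLAKE3 hashing mentioned later in the conversation, and the history is treated as a flat list rather than a graph. The function names are invented for the example.

```python
import hashlib


def trailing_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the end of a digest."""
    n = int.from_bytes(digest, "big")
    if n == 0:
        return len(digest) * 8
    return (n & -n).bit_length() - 1  # index of the lowest set bit


def is_boundary(op_bytes: bytes, level: int) -> bool:
    """An op closes a chunk if its hash ends in at least `level` zero bits."""
    return trailing_zero_bits(hashlib.sha256(op_bytes).digest()) >= level


def chunk(ops, level):
    """Split ops into chunks that end at boundary ops (about 2**level ops each)."""
    chunks, current = [], []
    for op in ops:
        current.append(op)
        if is_boundary(op, level):
            chunks.append(current)
            current = []
    if current:  # trailing ops that have not hit a boundary yet
        chunks.append(current)
    return chunks


ops = [f"op-{i}".encode() for i in range(200)]
mine = chunk(ops[:100], level=2)    # a peer that has seen 100 ops
theirs = chunk(ops[:150], level=2)  # a peer that has seen 150 ops
```

Because a boundary depends only on the op's own hash, every chunk in `mine` except the still-open last one also appears in `theirs`, even though the two peers hold different amounts of history.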

41:59

And we assume that on average, not all the time, but like on average, older operations will have been seen by more people. So, or, you know, more and more peers. So, you're going to be appending things really to the end of the document, right? So you, you will less frequently have something concurrent with the first operation using this system. That means that we can get really good compression on older operations.

42:28

Let's take, I'm just picking numbers out of the air here, but let's take the first two thirds of the document, which are relatively stable, compress those, we get really good compression. And then encrypt it and send it to the server. And then for the next, you know, of the remaining third, let's take the first two thirds of that and compress them and send them to the server. And then at some point we get each individual op. This means that as the, the document grows and changes.

42:52

We can take these smaller chunks and as that gets pushed further and further into history, whoever can actually read them can recompress those ranges. So, Alex has this, I think, really fantastic name for this, which is Sedimentree, because it's almost acting in sedimentary layers, but it's sedimen-tree because you get a tree of these layers. Yeah, it's cute, right?
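The layering can be made concrete with the same out-of-the-air numbers used above (two thirds of whatever remains per layer). This is only arithmetic on chunk sizes under that assumption; in the design described here the real boundaries come from the hash hardness metric, and the function names and the `min_chunk` cutoff are invented for the sketch.

```python
def layer_sizes(total_ops, min_chunk=4):
    """Split a history into shrinking chunks: 2/3 of what remains, then single ops."""
    sizes = []
    remaining = total_ops
    while remaining > min_chunk:
        take = max(1, remaining * 2 // 3)
        sizes.append(take)
        remaining -= take
    sizes.extend([1] * remaining)  # the most recent ops stay individual
    return sizes


def chunks_to_fetch(sizes, ops_already_seen):
    """Return (start, end) ranges of the chunks a peer still needs."""
    needed, start = [], 0
    for size in sizes:
        end = start + size
        if end > ops_already_seen:  # chunk is not fully covered by what the peer has
            needed.append((start, end))
        start = end
    return needed


sizes = layer_sizes(90)
fresh_sync = chunks_to_fetch(sizes, 0)     # never seen the document
partial_sync = chunks_to_fetch(sizes, 60)  # already holds the oldest chunk
```

With 90 ops the layers come out as 60, 20, 6 and four single ops. A fresh peer fetches all seven chunks, while a peer that already holds the first 60 ops skips the big chunk and fetches only the six newer ones.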

43:15

And so if you want to do a sync, like let's say you're doing a sync of like completely fresh, you've never seen the document before. You will get the really big chunk, and then you'll move up a layer, and you'll get the next biggest chunk of history, and then you move up a layer, and then eventually get like the last couple of ops. So we can get you really good compression, but again, it's this balance of these two forces.

43:35

Or, if you've already seen the first half of the document, you never have to sync that chunk again. You only need to get these higher layers of the Sedimentree sync. So that's how we chunk up the document. Additionally, and I'm not at all going to go into how this thing works, but if people are into sync systems, this is like a pretty cool paper. It's called Practically Rateless Set Reconciliation is the name of the paper.

43:57

And it does really interesting things with, compressing how, all the information you need to know what the other side has. So in half a round trip, so in one direction on average, you can get all the information you need to know what the delta is between your two sets. Literally, what are, what's the handful of ops that we've diverged by without having to send all of the hashes? so if people are into that stuff, go check out that paper. It's pretty cool.

44:23

but there's a lot of detail in there that we're not, we're not going to cover on this podcast. Thanks a lot for explaining. I suppose it's like, Just a tip of the iceberg of like how Beelay works, but I think it's important to get a feeling for like, this is a new world in a way where it's decentralized, it is encrypted, et cetera.

44:42

There's like really hard constraints on what certain things can do, since you could say like in your traditional development mindset, you would just say like, yeah, let's treat the client like it's just like a, like a Kindle, with like no CPU in it, let's have the server do as much of the heavy lifting as possible. I think that's like the muscle that we're used to so far.

45:04

But in this case, the server, even if it has a super beefy machine, et cetera, it can't really do that because it doesn't have access to do all of this work. So the clients need to do it. And when the clients independently do so, they need to eventually end up in the same spot. Otherwise the entire system falls over or it gets very inefficient. So that sounds like a really elegant system that, that you're like working on in that regard.

45:32

So with Beehive overall, like again, you're starting out here with Automerge as the driving system that drives the requirements, et cetera. But I think your, bigger ambition here, your bigger goals, is that this actually becomes a system that is, that at some point goes beyond just applying to Automerge, and that being a system that applies to many more other local-first technologies in the space.

46:01

If there are application framework authors or like, like other people building a sync system, et cetera, and they'd be interested in seeing like, Hmm, instead of like us trying to come up with our own, research here for like what it means to do, authentication authorization for our sync system, particularly if you're doing it in a decentralized way. What would be a good way for those frameworks, those technologies to jump on the, the Beehive wagon.

46:33

so if they're already using Automerge, I think that'll be pretty straightforward, right? You'll have bindings, it'll just work. but Beehive doesn't have a hard dependency on Automerge at all. because it lives at this layer below and we, Early on, we're like, well, should we just weld it directly into Automerge? Or like, you know, how much does it really need to know about it?

46:55

and where we landed on this was you just need to have some kind of way of saying, here's the partial order between these events. and then everything works. So, as, just as a intuition. You could put Git inside of, Beehive, and it would work, I don't think GitHub's gonna adopt this anytime soon, but like, if you had your own Git syncing system, like, you, you could do this, and, and it would work. you just need to have some way of ordering, events next to each other.

47:22

And yes, then you have to get a little bit more into slightly lower level APIs. So I, when I build stuff, I tend to work in layers of like, here's the very low level primitives, and then here's a slightly higher level, and a slightly higher level, and a slightly lower level. So people using it from Automerge will just have add member, remove member, and like, everything works. To go down one layer, you have to wire into it, here's how to do ordering. And that's it.

47:48

And then everything else should, should wire all the way through. And you have to be able to pass it, serialized bytes. So, like, Beehive doesn't know anything about this compression that we were just talking about that Automerge does. But you tell it, hey, this is, you know, this is some batch, this is some, like, archive that I want to do. It starts at this timestamp and ends at that timestamp, or, you know, logical clock. please encrypt this for me. And it goes, sure, here you go. Encrypted.

48:11

And, you know, off it goes. So it has very, very few assumptions. That's certainly something that I might also pick up a bit further down the road myself for, for LiveStore, where the underlying substrate to sync data around is like an ordered event log. And if I'm encrypting those events, then I think that fulfills perfectly the requirements that you've listed, which are very few, for Beehive. So I'm really looking forward to once that gets further along.

48:40

So speaking of like, where is Beehive right now? I've seen the, lab notebooks from what you have been working on at Ink & Switch. can I get my hands on Beehive already right now? Where is it at? what are the plans for the coming years? So at the time that we're recording this, at least, which is in early December, there's unfortunately not, not a publicly available version of it.

49:02

I really hoped we'd have it ready by now, but unfortunately we're still wrapping up the last few items in there. But Q1, we plan to have a release. As I mentioned before, there are some changes required to Automerge to consume it, specifically to manage revocation history. So somebody got kicked out, but we're still in this eventually consistent world. Automerge needs to know how to manage that.

49:24

But managing things, sync, encryption, all of that stuff, we hope to have in, I'm not going to commit the team to any particular timeframe here, but like, we'll say in the coming weeks. Right now the team is myself,

49:39

John Mumm, who joined a couple months into the project, and has been focused primarily on BeeKEM, which is, again, I'm just going to throw out words here for people that are interested in this stuff, related to TreeKEM, but we made it concurrent, which is based on MLS, or one of the primitives for Messaging Layer Security. He's been doing great work there.

49:58

And Alex, amongst the many, many things that Alex Good does between writing the sync system and maintaining Automerge and all of this, you know, community stuff that he does, has also been lending a hand. So I'm sure for Beehive in a way you're just scratching the surface and there's probably enough work here to fill like another few years, maybe even decades worth of ambitious work.

50:24

Can you paint a picture of like, what are some of like the, like right now you're probably working through the kind of POC or just the table stakes things. What are some of like the, way more ambitious longterm things that you would like to see in under the umbrella of Beehive? Yeah. So, There's a few. Yes. and we have this running list internally of like, what would a V2 look like? So, one is, adding a little policy language.

50:48

I think it's just like the bang for the buck that you get on having something like UCAN's policy language. It's just so high. It just gives you so much flexibility. Hiding the membership from even the sync server is possible. It just requires more engineering. So there are many, many places in here where zero knowledge proofs, I think, would be very useful, for people who know what those are.

51:09

Essentially it would let the sync server say, yes, I can send you bytes without knowing anything about you. Right, but it would still deny others. And right now it basically needs to run more logic to actually enforce those auth rules. Yeah. So today you have to sign a message that says, I signed this with the same private key that you know about the public key for in this membership. We can hide the entire membership from the sync server and still do this.

51:39

Without revealing even who's making the request, right? Like, that would be awesome. in fact, and this is a bit of a tangent, I think there's a number of places where, that class of technology would be really helpful. Even for things like, in CRDTs, there's this challenge where you have to keep all the history for all time.

51:55

and I think with zero knowledge proofs, we can actually, like, this, this would very much be a research project, but I, I think it's possible to delete history, but still maintain cryptographic proofs, that things were done correctly and compress that down to, you know, a couple bytes, basically, but that's a bit of a tangent.

52:10

I would love to work on that at some point in the future, but for, for Beehive, yeah, hiding more metadata, hiding, you know, the membership from, from the group, making all the signatures post quantum. Even the main recommendations from NIST, the U.S. government agency that handles these things, only just came out. So, you know, we're still kind of waiting for good libraries on it and, you know, all, all of this stuff and what have you.

52:36

But yeah, making it post quantum, or fully, big chunks of it are already post quantum, but making it fully post quantum would be great. And then yeah, adding all kinds of bells and whistles and features, you know, making it faster, adding, it's not going to have its own compression, because it relies so heavily on cryptography, so it doesn't compress super well, right? So we're going to need to figure out our own version of, you know, Automerge has run length encoding.

52:59

What is our version of that, given that we can't run length encode easily, encrypted things, right? Or, or signatures or, you know, all, all of this. so there's a lot of stuff, down, down in the plumbing. Plus I think this policy language would be really, really helpful. That sounds awesome.

53:12

Both in terms of new features, capabilities, no pun intended, being added here, but also in terms of just removing overhead from the system and like simplifying the surface area by doing more of like clever work internally, which simplifies the system overall. That sounds very intriguing. The other thing worth noting with this, just, I think both to point a way into the future and then also draw a boundary over what Beehive does and doesn't do, is identity.

53:41

So Beehive only knows about public keys because those are universal. They work everywhere. They don't require a naming system, any of this stuff. We have lots of ideas and opinions on how to do a naming system. But you know, if you look at, for example, Bluesky, under the hood, all of the accounts are managed with public keys, and then you map a name to them using DNS. So either you're using, you know, myname.bluesky.

54:07

social, or you have your own domain name like I'm expede.wtf on Bluesky, for example, right? Because I own that domain name and I can edit the text record. And that's great and it definitely gives users a lot of agency over how to name themselves, right? Or, you know, there are other related systems. But it's not local-first because it relies on DNS.

54:28

So, like, how could I invite you to a group without having to know your public key? We're probably going to ship, I would say, just because it's like relatively easy to do, a system called Edge Names, based on pet names, where basically I say, here's my contact book. I invited you. And at the time I invited you, I named you Johannes, right? And I named Peter, Peter, and so on and so forth, but there's no way to prove that, that's just my name for them. Right.

54:54

And for these people, and having a more universal system where I could invite somebody by like their email address, for example, I think would be really interesting.

55:03

Back at Fission, Blaine Cook, who's also done a bunch of stuff with Ink & Switch in the past, had proposed this system, the NameName system, that would give you local-first names that were rooted in things like email, so you could invite somebody with their email address and a local-first system could validate that that person actually had control over that email. It was a very interesting system. So there's a lot of work to be done in identity as separate from authorization. Right, yeah.

55:30

I feel like there just always, There's so much interesting stuff happening across the entire spectrum from, like, the world that we're currently in, which is mostly centralized, for just in terms of, like, that things work at all, and even there, it's hard to keep things up to date and, like, working, et cetera, but we want to aim higher.

55:54

And one way to improve things a lot is like by going more decentralized but there's like so many hard problems to tame and like, we're starting to just peel off like the layers from the onion here. And, Automerge I think is a, is a great, canonical case study there, like it has started with the data and now things are around, authorization, et cetera. And like, then authentication, identity there, we probably have enough research work ahead of us for, for the coming decades to come.

56:25

And super, super cool to see that so many bright minds are working on it. maybe one last question in regards to Beehive. When there's a lot of cryptography involved, that also means there's even more CPU cycles that need to be spent to make stuff work.

56:43

have you been looking into some, performance benchmarks, when you, let's say you want to synchronize a certain, history of Automerge for some Automerge documents, with Beehive disabled and with Beehive enabled, do you see like a certain factor of like how much it gets slower with, Beehive and sort of the authorization rules applied both on the client as well as on the server?

⁠¶ Performance benchmarks

57:10

Yeah. So, it's a great question. so obviously there's different dimensions in, in Beehive, right? So for encryption, which is where I would say most people would expect there to be the, the performance overhead. There's absolutely overhead there. You're, you're doing decryption, but we're using algorithms that decrypt on the order of like multiple gigabytes a second. So it's fine, basically.

57:32

and that's also part of why we wanted to chunk things up in this way, because when we get good compression, you know, all, all of this stuff. So if you're doing like a total, you know, first time you've seen this document, you've got to pull everything and decrypt everything and hand it off to Automerge. the, the encryption's not. going to be the bottleneck.

57:48

And then on like a rolling basis, like, you know, per keystroke, yes, there's absolutely overhead there, but remember this is relative to latency. So if you have 200 milliseconds of latency, that's your bottleneck. It's not going to be the five milliseconds of encryption that we're doing or signatures or whatever it is. There's a space cost because now we have to keep public keys, which are 32 bytes, and signatures, which are 64 bytes. So there is some overhead in space

58:22

that happens. But for the most part we've chosen algorithms that are known to be very, very fast. They're sort of like the best in class. So I'll just rattle down a list for the people that are interested. So we're using EdDSA Edwards keys for signatures and key exchange, ChaCha for encryption, and BLAKE3 for hashing. BLAKE3 is very interesting in what you can do. Things like verifiable streams.

58:47

So like as you're streaming the data in, you can start hashing even parts of it as you're going along. The really big bottleneck, the, like, the heaviest part of the system, or, sorry, the part that we were least happy with our original design on, that we then ended up doing a bunch of research on, was doing key agreement.

59:06

So if I have whatever, a thousand people in a company, and they're all, you know, working on this document, I don't want to have to send a thousand messages every time I change the key, which will be rotated every message, let's say, or you know, once a day, if we're being, you know, more conservative with it. And that's a lot of data and a lot of just like latency on this and just a lot of network. So we switched to, instead of it being linear, we found a way of doing it in logarithmic time.
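To get a feel for the difference, compare sending the new key to every member with a tree-shaped scheme in the TreeKEM style, where an update touches one path from a leaf to the root of a binary tree. This is back-of-the-envelope arithmetic only, assuming a balanced binary tree; it does not model BeeKEM's actual messages or its handling of concurrent updates.

```python
def linear_cost(members: int) -> int:
    """One message per member when the new key is sent to everyone individually."""
    return members


def tree_cost(members: int) -> int:
    """Height of a balanced binary tree with `members` leaves: ceil(log2(members))."""
    return (members - 1).bit_length()


for n in (128, 1_000, 50_000):
    print(n, linear_cost(n), tree_cost(n))
```

For a thousand members the per-rotation work drops from about a thousand units to about ten, which is the gap between linear and logarithmic cost described here.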

59:35

So we can now do key rotations concurrently, like totally eventually consistently, in log n time. And a lot of research happened in there, but then that let us scale up much, much, much more. So the prior algorithm that we were using off the shelf from a paper scaled up to, in the paper, they say about like 128 people, right? It's sort of like your upper bound and we're like, uh, you know, we had set ourselves these higher levels that we actually want to work with.

01:00:02

And so now we can scale into the thousands. When you get up to 50,000 people, yeah, it starts to slow down. You start to get into, you know, closer to a second if you're doing very, very concurrent, you know, uh, 40,000 of the 50,000 people are doing concurrent key rotations. Doesn't happen very often, but like it could happen. If one person's doing an update, then it'll happen in, like, you won't even notice it. Right. So it depends on how heavily concurrent your document is.

01:00:26

Do you have 40,000 people writing to your document? Yeah. You're going to see it slow down a little bit. It's so amazing to see that. I mean, in academia, there is so much progress in those various fields.

01:00:36

And I feel like in local-first, we actually get to benefit and like directly apply a lot of like those, those great achievements from other places, where like, it makes a big difference for the applications that we'll be using, whether there is a cryptographic breakthrough in efficiency or being more long term secure, et cetera.

01:00:57

And I fully agree that latency is probably by far the most important one when it comes to whether it makes a difference or not, but battery usage, et cetera, is another one.

01:01:08

And if I synchronize data a lot, maybe I open a lot of documents just once, because maybe I'm reviewing documents a lot and someone sends them to me, or maybe I'm an executive and I get to review a lot of documents. I don't really amortize the cost of syncing those documents, because I don't reuse them on a day-to-day basis. So I think that initial sync also tends to matter quite a bit. But it's great to hear that efficiency seems to be already very well under control.

01:01:39

So maybe rounding this out: you've been at Fission, and you've been seeing the innovation around local-first in, like, three buckets: auth, data, and compute. As mentioned before, on this podcast we've mostly been exploring the data aspect. Now we went quite deep on some of your work in regards to auth. We don't have too much time to spend on something else, but I'm curious whether you can just seed some ideas in regards to where compute fits in this new local-first world.

01:02:15

Like, if you could fork yourself and do a lot more work, what would you be doing in regards to that compute bucket?

⁠¶ The "compute" role in local-first

01:02:23

Yeah. So we had a project related to compute at Fission, right at the end. And I'm very fortunate that I actually have some grants to continue that work after I finish with Beehive. I'll switch to that and then, after that project, see what else is interesting kicking around. But essentially the motivation is, all the compute for local-first stuff happens completely locally today, or you're talking to some cloud service, right? Like maybe you're using an LLM.

01:02:48

So you go and, you know, use the OpenAI APIs, that kind of thing. But what if you're on a very low-powered device and you're on a plane? Right. You still need to be able to do some compute some of the time. So the trade-off that we're trying to strike in these kinds of projects is, what if I can always run it, even slowly?

01:03:08

So let's say I'm rendering a 3D scene and it's going to take a minute to paint, versus I have a desktop computer nearby and I can farm that job out to that machine, because it's nearby in latency and it has more compute resources. Or maybe I need to send email to a mail server that only exists in one place. Like, how can I do this compute dynamically, where I can always run my jobs or my resource management whenever possible?

01:03:40

An email server is a case where you can't always do this, right? But when somebody else could run it, maybe I can farm that out to them instead. So there's a lot of interest, I think, in how we bridge between what is sometimes called, in the Bluesky world, big world versus small world, right? So I have my local stuff. I'm doing things entirely on my own. I'm completely offline. And that is the baseline. But when I am online, how much more powerful can it get?

01:04:06

Can I, you know... I'm not going to ingest the entire Bluesky firehose myself. I'm going to leave that to an indexer to do for me. So when I'm online, maybe I can get better search, right? Things like this. Or maybe if I'm rendering PDFs, maybe I want to farm that out to some server somewhere rather than doing that with Wasm in my browser. So kind of progressively enhancing the app.
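The "always runnable locally, faster when something better is reachable" idea can be sketched as a tiny scheduling rule. This is only an illustration of the trade-off described here, not code from the Fission project or any real library. The `Executor` type, its fields, and `pick_executor` are hypothetical names, and the time estimates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Executor:
    name: str
    reachable: bool      # can we talk to this machine right now?
    est_seconds: float   # rough guess at how long the job would take there


def pick_executor(local: Executor, remotes: list[Executor]) -> Executor:
    """The local device is always a candidate (the offline baseline);
    reachable remotes are only used when they are expected to be faster."""
    candidates = [local] + [r for r in remotes if r.reachable]
    return min(candidates, key=lambda e: e.est_seconds)


laptop = Executor("laptop", reachable=True, est_seconds=60.0)

# On a plane: the nearby desktop is unreachable, so the job still runs, just slowly.
offline = pick_executor(laptop, [Executor("nearby-desktop", False, 5.0)])
print(offline.name)

# Back at the desk: the same job is farmed out to the faster machine.
online = pick_executor(laptop, [Executor("nearby-desktop", True, 5.0)])
print(online.name)
```

Jobs that can only run in one place, such as handing mail to a specific mail server, do not fit this rule, because they have no local fallback.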

01:04:28

And I think with AI this is particularly relevant, because now suddenly we get a lot of work to be done that massively benefits from a lot of compute. And with AI in particular, I think we're now in this tricky spot.

01:04:49

Either we already get to live in the future, but that typically means all of our AI intelligence is coming from some very beefy servers in some data centers, and the way I get those enhancements is by just sending all of my context data over to those servers. Well, I guess you could also get those beefy servers next to your desk, but that is very expensive and, I think, not very practical.

01:05:17

I guess step by step, the newest MacBooks, et cetera, are already very capable of running things locally, but there will always be a reason that you want to fan things out a bit more, while doing so in a way that preserves your privacy around your data, et cetera, and leverages your resources properly. Like, if I'm just looking around myself, I have an iPad over here which sits entirely idle, et cetera.

01:05:44

So, as with most things in regards to application developers, if it's the right thing, it should be easy, and doing compute in a distributed way is by far not easy. So I'm very excited to hear that you want to explore this more. Yeah. Well, and you know, especially with things like AI, the principle always is that I should never be cut off from performing actions when possible. Sometimes something lives at a particular place and I'm not connected to it.

01:06:15

Fine, right? Email being, you know, the canonical example here. The mail server lives in one place. Okay, fine. But why not with an LLM? Like, maybe I run a smaller, simpler LLM locally. And then again, when I'm connected and I'm online, I just get better results. I get better answers. So I'm never totally cut off. I mean, there's plenty of research on distributed machine learning and all of this stuff, but that's, I would say, in the future.

01:06:41

Just to kind of put an arc on all of this stuff, and everybody who's seen my talks before has probably heard me give this short spiel once or twice. But you know, in the nineties, when we were developing the web, as opposed to the internet, the assumption was that you had a computer under your desk. It was a beige box that you would turn on, and you would turn it off sometimes. Right. When was the last time you actually turned off your laptop, or your phone for that matter?

01:07:04

And when you wanted to connect to the internet, you'd tie up your phone line. That's no good. So you would rent from somebody else something that was always online with a lot of power. And we now live in a different world, but the centralized, or rather the cloud, systems all still have this assumption of, well, we have more power and we're more online and better connected than you. Okay. That's true, but how many things does that actually matter for?

01:07:32

And with systems like Automerge and, you know, local-first things developing, it's like, actually, you know what, my machines are fast enough now that I can keep the entire log of the entire history. And it's fine, because we can compress it down to a couple hundred K and it's okay. And I'm fast enough to play over the whole log. And we can do all of this eventually consistent stuff and it doesn't completely, you know, hurt the performance of my application.
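As a toy illustration of "keep the whole log, compress it, and play it back", the sketch below stores every operation, compresses the log with `zlib`, and rebuilds the current state by replaying it from the start. This is not Automerge's storage format or merge logic. The operation shape (`set` / `del` dictionaries) and the `replay` helper are assumptions made up for this example, and operations are simply applied in log order.

```python
import json
import zlib


def replay(log):
    """Rebuild the current state by applying every operation in log order."""
    state = {}
    for op in log:
        if op["op"] == "set":
            state[op["key"]] = op["value"]
        elif op["op"] == "del":
            state.pop(op["key"], None)
    return state


# The full history is kept, including values that were later overwritten or deleted.
log = [
    {"op": "set", "key": "title", "value": "Draft"},
    {"op": "set", "key": "title", "value": "Final"},
    {"op": "set", "key": "author", "value": "Brooke"},
    {"op": "del", "key": "author"},
]

# Compress the log for storage or sync, then restore and replay it.
packed = zlib.compress(json.dumps(log).encode("utf-8"))
restored = json.loads(zlib.decompress(packed).decode("utf-8"))

print(replay(restored))
```

The design point is that the state is derived rather than stored: as long as replay is fast enough, the log alone is the source of truth.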

01:07:56

It's massively simplifying the architecture. Things have gotten out of hand. So there is this dividing line. The cloud isn't completely the enemy; it does have some advantages, right? But not everything needs to live there. And so we're moving into this world of, like, how much can we pull back down into our individual devices and get control over them?

⁠¶ Outro

01:08:20

Yeah, I love that. I think that very neatly summarizes a huge aspect of why local-first speaks to so many of us. So I've learned a lot in this conversation and I'm really excited to get my hands on Beehive as it becomes more publicly available, hopefully already a lot closer to the time when this episode comes out.

01:08:45

In the meantime, if someone got really excited to get their hands dirty and dig into some of the knowledge that you've shared here, I certainly recommend checking out your amazing talks. I still have a lot of them on my watch list, and I think there are many shared interests that we didn't go into in this episode. Like, you're also a lot into functional programming, et cetera. And I think you're going really deep on Rust as well.

01:09:13

So lots for me to learn. But if you can't wait to get your hands on Beehive, I think it's also very practical to play around with UCAN. I think there are a bunch of implementations for various language stacks, and that is something that you can already build things with today. And I think it's not like Beehive will fully replace UCAN or the other way around.

01:09:36

I think there will be use cases where you can use both, but this way you can already get into the right mental model and be Beehive-ready when it becomes available. So that's certainly what I would recommend folks check out. Is there anything else you would like the audience to do, look up, or watch? Yeah, so definitely keep an eye on the Ink & Switch webpage. We have lab notes. At the time of this recording,

01:10:03

There's just the one note up there, but I have a whole bunch of them, like many, in draft that I just need to clean up and publish. We'll also be releasing an essay, an Ink & Switch-style essay, on this whole project in the new year. And yeah, keep an eye out for when this all gets released.

01:10:20

There's a bunch of stuff coming in Automerge in the new year. I can't remember if it's Automerge V2 or V3, but there's, you know, some branding with it of, like, much faster, lower memory footprint, better sync, and security. All of these sort of big headline features. So definitely keep an eye on all the stuff happening in Automerge. That's awesome. Brooke, thank you so much for taking the time and sharing all of this knowledge with us.

01:10:44

Super appreciated. Thank you. Thank you so much for having me. Thank you for listening to the localfirst.fm podcast. If you've enjoyed this episode and haven't done so already, please subscribe and leave a review. Please also share this episode with your friends and colleagues. Spreading the word about the podcast is a great way to support it and to help me keep it going. A special thanks again to Convex and ElectricSQL for supporting this podcast. See you next time.

Transcript source: Provided by creator in RSS feed