#14 – Matthew Weidner: Architectures for Central Server Collaboration - podcast episode cover

#14 – Matthew Weidner: Architectures for Central Server Collaboration

Sep 03, 202457 minEp. 14
--:--
--:--
Download Metacast podcast app
Listen to this episode in Metacast mobile app
Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episode description

The guest of this episode is Matthew Weidner, a computer science PhD student at Carnegie Mellon University focussing on distributed systems and local-first software. Matthew has recently published an extensive blog post about architectures for central server collaboration which is explored in depth in this conversation comparing different approaches such as CRDTs and event sourcing.

Mentioned in podcast

Links:

Thank you to Expo and Rocicorp for supporting the podcast.

Transcript

localfirst.fm #14 – Matthew Weidner: Architectures for Central Server Collaboration

And this also feeds into features that you might want to give to your users, especially in productivity apps. You want to have that change history where you can see what was everyone doing. You also want to have undo's. basically what you can do for undo is when you create this action or this mutation describing, the high level intent, you can also tag along with it, a mutation saying, here's how to undo this operation later.

And then you store that somewhere and then you just have a queue somewhere in your app that's like the action queue. You can go through that and undo things in a nice way. Hopefully, users will be happier with this than if you just, you know, revert states exactly, ignoring collaborators updates. Welcome to the local-first FM podcast. I'm your host, Johannes Schickling, and I'm a web developer, a startup founder, and love the craft of software engineering.

For the past few years, I've been on a journey to build a modern, high quality music app using web technologies. And in doing so, I've been falling down the rabbit hole of local-first software. This podcast is your invitation to join me on that journey. In this episode, I'm speaking to Matthew Weidner, a computer science PhD student at Carnegie Mellon University, focusing on distributed systems and local-first software.

Matthew has recently published an extensive blog post about architectures for central server collaboration, which we explore in depth in this conversation, comparing different approaches, such as CRDTs and event sourcing. Before getting started, also a big thank you to Rocicorp and Expo for supporting this podcast. And now my interview with Matthew. Hey, Matthew. Thank you so much for coming to the show. How are you doing? I'm good. Yeah. Thanks for inviting me.

Yeah. Super excited to, to have you here. I think, our shared friend, Geoffrey Litt introduced us and he and, Matt Wondlaw and a few others have, when you were writing this blog post, the architectures for central collaboration, all of my friends shared this blog post with me. And it has since, like, served as a really, really reliable and good foundation to just provide an orientation around, yeah, how do syncing systems, et cetera, how do they work?

So this has been the, the initial touch point for me, but would you mind briefly introducing yourself?

Background: PhD on collaborative apps

sure. Yeah. So I'm Matthew. I'm a researcher and developer. I've been thinking about, local-first software, more generally the problem of how do we make collaborative software easier to program. So that's been, I guess, five years of PhD work and now working full time on a collaborative app, at a small company.

And yeah, the, the question for me has always been, how can we make building a collaborative app in the style of Google Docs or Figma as easy as making a smartphone app or a local only desktop app? Amazing. I'm curious, what led you, like when you say five years ago, you started working on this, what led you to, to that point? What motivated you to, to look into this? yeah, so it actually started a little earlier. So six years ago, I was doing a master's degree at the University of Cambridge.

I had to pick a master's thesis project, and some of the Ph. D. students talked about what their lab group was doing, the TrueData group, where they were working on an end to end encrypted version of Google Docs. The idea is that some professions, like lawyers or journalists, they want the collaboration of Google Docs, but they don't trust their data to a third party. where the, you know, the employees can look at it or it's on someone else's servers.

So they wanted this end to end encryption where you say only you and your collaborators can read the unencrypted data. So I thought this sounded like a really interesting project. I just joined them for my master's thesis. Turned out to be working with Alastair Beresford and Martin Kleppmann, um, mostly on the cryptography side. Then after that, I decided that actually the collaboration side sounded more interesting, and I wanted to work on that for my PhD. Very interesting.

What did the technology landscape at that point look like? I mean, today there's like Automerge and quite a few other technologies that already try to attempt this. what did the technology landscape back then look like? So this was before the local-first essay. I think I actually saw a draft of the local-first essay that year, now as a master's student. Automerge I believe had started, YJS had started, but I hadn't heard of people using it yet.

but yes, people were just getting started to use this idea of. collaborative data structures for the web, not necessarily with central servers like these CRDT libraries were just getting started, and I don't know if the local-first world had really even started yet at that point. Right. Yeah. I think there are so many people who thought about similar problems over like decades before then.

There was like CouchDB and PouchDB and like a lot of great minds already thought about this, but I feel like the real momentum started with the local-first essay. So I'm curious, take me through a little bit of like the, the five years working on that. What were some of the milestones? How did you go about starting this in the first place? Sure. So the, the main things I was coming at it from a more academic perspective, like I really have a theory math background.

So I was looking at the, the theory of CRDTs, these conflict free replicated data types. Which, sort of, the idea is that it's a data structure that's copied on multiple devices. You put your data in it, like your app's data, and then one user can change their copy of the data whenever they want. At some point later, you'll sync up in the background and come to a convergent copy where everyone's looking at the same document again.

This is really designed for the sort of peer to peer model where you don't necessarily have central authority, it's just everyone updating their own data. and also this local-first spirit, where you always update the local copy of your data first, and then you talk to everyone else and say, here's my changes. So I spent the first year really just reading the papers in that field.

So there's a classic paper by Mark Shapiro right now for 2011, a lot of papers by Carlos Vaccaro and his collaborators, yeah, just trying to learn what are these data structures, what can we do with them. Got it. And so after that, you started your own implementations of CRDTs. And was there any sort of reference app that you oriented this around? Not really. So there's actually, there's a reference CRDT.

So we started with this paper, which is very theoretical about this way that you could maybe combine two CRDTs. So the example we use, which is a bit silly, is if you have. a number that you can add things to, like maybe a bank account balance you can add to, you can also multiply to if you're applying the interest. How do you combine these two operations in a single CRDT that can be updated with either add or multiply? So then my advisor had this idea, let's implement this in a library.

and there that already set some sort of unique design principles, which is that we're going to assume you're making your own CRDTs. It's not just a collection of CRDTs we give to you, like map, list, et cetera, actually going to be whatever, and then some way to combine them together. So that was really the starting point, is that we want to make a place where you can make your own CRDTs and compose them. I don't think we really had a specific application in mind at the beginning.

Was that technology ever released or open source or talked about in some way?

Collabs

So we did make a open source library about it. It's called Collabs. So it's written in TypeScript. we have a documentation site. I think it's collabs.readthedocs.Io. it's definitely still an academic project. So it's really about, here are these data structures that you can play with and you can make your own things. we do have some basic demo apps, like your basic, uh, You know, text editor. there's a to do list sort of thing somewhere.

And then there is an archive paper about it that you can read, which goes into more detail about the system design and why we did things the way we did. Got it. And so it sounds like you've really gone super deep on this, mostly oriented from the CRDT side of things. But, as you read the papers, as you were working on this, you also got a better understanding of the larger space and the other approaches.

And I think you got more curious about the other approaches and this is what you've laid out so clearly and brilliantly in this blog post that will be linked in the, in the show notes. And I highly recommend anyone who's listening to read it in depth, if you're curious about those topics.

so the, the blog post called Architectures for Central Server Collaboration, and it provides a really nice way to think about this, like provides of like a. Hierarchical structure of what are the design decisions? What are the trade offs? What are the concerns about the different approaches. And so I've, I'd love to just go through that step by step. maybe you want to walk us through it.

Architectures for Central Server Collaboration

Sure. Let's see. Yeah. So the, the idea of this blog post is we're thinking about. Real time collaborative apps. So these are apps like Google Docs, Figma, Notion, that sort of thing. And sort of the distinguishing feature of these apps compared to more traditional web apps is that, you know, when you make a change, it updates your local copy immediately. It's not just click a button, go back to the server, get a new web page and show it to you.

It's click a button and something updates on your own screen immediately and eventually it'll tell the server what you did. So this blog post was trying to think about, in general, with these real time collaborative apps, like, what are we doing in a semantic sense? Like, what does it mean to be real time collaborative? And then, what sort of, you know, the high level of how you can implement that in the most flexible way possible.

And so you've derived a couple of like really nice ways to, to think about that, like in terms of dimensions and later on you, you can nicely summarize it in a nice overview table. would you mind motivating some of the dimensions that you come up with here? Sure. Let's see. So I guess just for context, my own background is, as I said, thinking about it from a CRDT perspective.

This is very much the perspective if you have some data structures, which are usually pretty low level, like maps and lists, and you have some prescribed operations that you can perform on them, and then it'll sync it for you under the hood. And then also in the CRDT model, it's usually not really assuming a central server, where the central server is doing basically the same thing as the clients.

So what the dimensions are thinking about is, okay, what can we do that's different from just the CRDT model? And there is Yeah, there's really three dimensions. I guess maybe the most interesting one is the is how you describe operations on the collaborative state. So you have sort of the, the database or key value store model, which is, you have these low level state changes.

Like when I check a box in to do list, that's creating a row in a database that says, you know, to do list checked, true, that sort of thing. And then there's also this opposite model, which is sort of the more event sourcing approach where you have these high level operations, sometimes called mutations. And this is where, when you change the data, you're actually telling the server exactly what the user's intent was.

You say, the user wants to check this box and make it true, and then you broadcast that high level intent back to the other users. And tell them what to do and how to update their state. And I think this is also like this distinction between the intent of a mutation and the, the change more directly. I think this can be, a little bit of a subtle difference for people who haven't built something with either approaches yet.

But, uh, I think to draw an analogy from the web world, When you're working with something like Redux, this is where I'm not sure whether you ever built some, some front end apps with Redux, but this is where you have, for example, if I remember correctly, the, the concept of an, of an action, which is basically the idea of an event where you declaratively say like, okay, there is an action or there is an event for, someone wants to complete this to do.

Then further down the road, there's like a reducer, which then in, for example, maintains a list of to-dos and maybe kicks it out or maybe, overrides a property in the to-dos array and says something is done as opposed to the other approach where you directly mutate the, the state.

Which is, for example, in the web world, we're using something like MobX, etc. And so now we're talking here about the equivalence for distributed states, and where CRDTs, I think, give us more the analogy, this might be a stretch, but give us more of like an equivalent of like something like MobX, where you mutate the state more directly. And the CRDT underpinnings nicely make that principled constrain you in the in the right way and then also distribute the state.

Did I summarize this in the right way? Yes, good description. Maybe another way to think about it that's in more illustrative than to do list is to think about like the the video game example. So for example in a video game if you press an arrow key on your keyboard you can do sort of the high level intent is I want my character to move forward. And then your game server will interpret that intent. It'll try to move your character forward, but if there's a wall in the way, it'll stop you.

And if you step on a pressure plate, it'll do something. and then ultimately compute the actual state changes, which are the low level things of like, what coordinates are my player at now? what is the state of the world in terms of, you know, doors that are open or closed? And it'll send those low level state changes back to clients. So that's another example of this distinction between high level versus low level intent.

Right, and I think this is now also a really important distinction because in the Redux or MobX example, it's, like, all of that is happening on the local device. There's no cheating in that regard, but when you're talking about games, they can actually be cheating. And how do you prevent that particularly in a multiplayer context? And this is where you, what do you do on the client and what do you do on the server, maybe need to be different things where the server acts more than authority.

And the client rather provides, instructions as opposed to providing the authoritative source of truth for the actual state of a world. And so this is where the intent is not equal to the reality that is coming out of it. And I think this is nicely illustrated in your article through this game example, where you can basically send to the server, like, Hey, I want to move forward. The server knows where you were before, and the server tells you afterwards, like, now you're here.

The client locally can probably, if everything is in an okay state, has probably already arrived at the same conclusion, but, at least this way the client can't override to say the player position is somewhere in an illegal state.

Server-side rebasing

Maybe this sort of transitions into the next point or another dimension in the article. Which is, what does the server actually do when it receives an operation, in particular an operation that's out of date? So the classic example is if you have a like counter, like a post has some number of likes on it, if it has six likes and I send a command to the server that says, I like it, change the number of likes to seven.

But what if someone else also liked the post in the meantime, and their like made it to the server first? So now the like count's already seven, I don't want to set it to seven again, I want to increase it to eight. And there's a, yeah, so basically there's a few philosophies in how the server should process this operation so that it still makes sense. I mean, technically it's legal to keep the original operation as just set the count to 7, but that's not really what the users expect.

So the one philosophy, sort of the CRDT way, is to say, I'm going to phrase my operations in such a way that the server will know what I want it to do, And it'll do the correct thing. So for a light counter, the classic way is you say, increase the light count by one. The server can get that, and even if the count has gone up since what you originally thought it was, it's still going to add one and do the proper thing. So you're going to end up with eight lights instead of seven.

And sort of the other spirit is the operational transformation spirit. So this is an older technique for collaborative apps that's used by Google Docs and was developed in the 90s for the Jupyter collaboration system. And here the spirit is, the server is going to look at your operation, it's going to look at all the intervening operations that you didn't know about but the server has received already, and it's going to use those to sort of compute what your new intent is supposed to be.

So this example, you would tell the server, change the like count to seven, but the server would see that there was an intervening change the like count operation already. It's going to rewrite your operation as change the like count to eight, and actually apply that to its state and send that operation to the other users. Got it. So, and this is basically about the, the convergence aspect And I suppose where this code is running, this can equally work on the client as well as on the server.

So this is sort of orthogonal to the, the game example case that we talked about, which is more about the authority. Yeah. Yeah. So this isn't about how does the server. interpret operations from, like, a correctness permissions perspective. It's just how does the server handle operations that are sort of stale, in the sense that the client originally applied them one state, but by the time they arrived at the server, the state had updated because other people were doing things.

Now the server has to figure out what to do. Yes, this is the server side rebasing. This is where the server has to rebase your operation, or the incoming operations, on top of whatever its new state is. And sort of the analogy is to git rebasing, where you might try to apply a commit on top of some new commits that weren't there when you first tried it. Got it. Okay, so that is one dimension that you've nicely dissected here in this, in this blog post. what is the next one?

Optimistic Local Updates

So the next one is the the optimistic local updates on the client. So now if we assume there's an central server, everyone's taking these updates, they're sending these operations to the server, the server knows what the state's supposed to be. And what you could say is just the traditional, web app model. If I submit an operation to the server, it processes it, it sends back, sends me back the result, and now I get to see it.

So if you think like, um, you know, traditional HTML form, you submit your operation to the server, it gives you a new page back saying what it is. But with modern apps, we want to do better than that. We want to say that when I perform an operation on the client, it's going to update my own state immediately. And that's an optimistic update because I'm sort of optimistically assuming that the server is actually going to receive my update. It's going to process it in the way I expected.

No one else is going to interfere. this is just a nice property in terms of making the app feel more responsive. You want to see your key presses immediately. You want to see that button get checked immediately. So the question is then, how do we actually do that? Or, I guess the first question is even, what is the correct answer? What does it mean to optimistically update my state?

And I guess, yeah, sort of the conclusion I came to that, you know, people have come to in computer games as well, is that you want to take the latest state you've received from the server, plus your own optimistic local operations on top of that. And that's always what the correct state is. And even as you receive or perform new operations, you're just maintaining that state.

Like from your first dimension, which is about server side rebasing, now it's a lot of the same ideas, but applied on the client where you need to make the same trade off decisions again, you might come up with different conclusions based on the server and based on the client, depending on your use cases. So that, that is the second dimension. And, then you're, you talk about the, the form of operations. So how, a state is changing based on mutations, based on state changes.

Can you go a little bit more into, into detail here?

Form of operations

Sure. Yes. This is what we were talking about at the beginning, where when you, you check a box in a to do list, you want to say, Am I updating a row in a database that doesn't know anything about to do lists, or am I sending a high level mutation that says, like, this user wants to check the to do list and, you know, do that action or maybe do something else if that's not valid anymore. So here we get to choose which form of operations we want.

We want to send these high or low level from the client to the server. Then once the server updates its state, does it want to send high or low level changes back to the clients? yeah, so the video game example is an interesting one where you actually make different choices usually. So usually you'll send the high level operations from clients to the server. You say, I want to move forward, I want to shoot my crossbow.

And then on the way back from the server to the client, usually it won't send those actual actions. It'll just send the results, which are changes to some basic key value store. But you can also make different choices, like you can say, you know, Git is an example where it's sort of high level mutations. You're saying, like, I want to, you know, change this text paragraph in a specific file, and Git will send those exact operations to every client.

It's not going to interpret them at all on the server and change them into a low level change. Whereas if you use something like the Firebase database, that's all low level. You send low level changes to Google servers. Where you say, I want to, you know, set this key to this value or I want to delete this object in the database. And it's going to send that change back to clients without having any idea what the keys and values actually represent. That makes sense.

And so I think this is also nicely drawing a boundary between the more declarative approaches that you have in mutations that you can reason more clearly about, like in the context of your domain. But it also only makes sense in the context of your domain. Whereas with state changes, this is the appeal of CRDTs.

This is you just mutate a document and, the, the underlying mechanics, make sure that the state changes are behaving in, in a useful way since I, I suppose like listening to the state changes yourself in your app, that's no fun. So you really want, a system like CRDTs to make sense of that

. So now with those three dimensions and I go through them again, the server side rebasing, the optimistic updates and the form of operations like declarative versus state based, now you've combined all of that in a really nice, classification table where we get a whole bunch of like matrix cells here with different technologies.

So, Again, highly recommend, actually reading this and looking at the beautiful table for yourself, but in the different cells, you've also filled in a couple of existing technologies and see where they slot in. So would you mind going through the different technologies and maybe sharing what's interesting about it?

Classification table

Sure. So I guess first I can talk about the one cell just near the bottom, right in the table, if you're looking at it. Which is the CRDTxCRDT cell. So this is basically the place where I spent my most time reading about CRDTs, working on this academic open source library.

And that's where the operations that users send are really these low level state changes to some sort of magical replicated database, where you update the database, like normally on your local device, and it promises to do this synchronization in the background and make sure that everyone converges to the same state immediately without really caring about what specifically your data or operations are. So that some prominent examples.

So Firebase Realtime Database, I think of as an example, also the CRDT ish libraries, like YJS. also, yeah, Triplit, InstantDB, those are all sort of in this quadrant or in this cell thing that we're going to replicate low level changes for you, just like as they are. another cell on this table, which is sort of near the bottom left, we mentioned in the computer game example.

In a computer game, you're going to send these high level actions to the server, which is going to figure out what to do with them, and then communicate the state changes back to clients. that's another interesting cell, both because it's sort of old, like, you know, this is, starts with the Half Life game engine in the 1990s, so people have been using this technique forever, just not in web apps, it's in computer games.

But more recently, Replicache implements this model as a data sync layer for web applications, which I know a number of companies are using. and I found that really inspirational reading about how Replicache works. I'm glad to have learned about it. Right. And I love like how you compare those technologies. Both technologies.

I love, love like the Half Life game engine spent way too much time, playing various Half Life game engine games, where it's very, very intuitive that if you play, press the W key, which moves you forward. That's like communicating the intent. To the server, you don't tell the server like, Oh, I'm at these coordinates. You just give it like a history of like which keys you pressed and therefore like how you moved. and it does some validation of like whether all of that is okay.

And it sends you back the location. And it's the same about Replicache where you send it a few mutations And on the Replicache server, it interprets all of that and sends back to you the state using the server side knowledge, which might be different than the client side implementation, so it's the authority.

So that is very clear and very nicely laid out here, where you send the intent, you send the declarative mutations, and the server sends you back some state changes, as opposed to what you before mentioned, with a CRDT times CRDT, where Both on the client, on the server, you run the same CRDT convergence. And, uh, so those two, those two cells are very clear. Yes, exactly.

And then, yeah, so I guess the remaining cells of the table, they mostly, they either use state changes in both directions or they use high level mutations in both directions. So, let's see. Two interesting ones. Automerge in ShareDB. They're both doing a similar idea to the CRDT libraries, like YJS, where they're sending these low level state changes around and making sure everyone converges to the same state, but they have a different way of doing this internally.

So with Automerge, what you're actually doing is you're performing these state based Automerges that a library is basically a JSON CRDT, but the way it works is more of an, like an event sourcing model, where you have this total order of CRDT style operations. All clients are going to make sure that they eventually confer to the same total order.

So everyone will agree what operation 1, operation 2, operation 3, etc. The state is the result of applying all of these operations in that fixed order. And if, you know, people do operations concurrently on their different devices because the network's not working, then we'll just sort those operations into some order later, make sure everyone agrees on the same order, and that's giving you your state.

Interesting. So given that Yjs and Automerge, which I think are in the web ecosystem, the, the two most popular CRDT implementations, they actually do differ in this dimension of like how state changes are implemented. again, Firebase, as well as Yjs. following more strictly the CRDT approach and Automerge using server reconciliation. is there an example that comes to mind where this, in a example app use case would differ and where you would use Automerge or Yjs, intentionally because of this?

I think in terms of the, the external. the API, or what you see as a user of these libraries, it doesn't really differ. It's more just in terms of the implementation, I guess, in, in this totally ordered model like Automerge uses, you don't have to worry as much about getting the math exactly right. Like, am I sure that these two operations actually do the same thing if I apply them in different orders, which is this mathematical requirement that you have to satisfy for CRDTs.

So that makes it a bit easier on the, to like the correctness and sureness of the implementation. Whereas with the YJS or CRDT style, if I'm just going to apply my operations directly, in principle that can be a bit faster because you don't have to worry about rewinding your total order of operations and then applying a new thing and walking it forward again.

That said, usually if you're making a collaborative application with CRDTs, you don't really need to process more than a handful of operations every second, so it doesn't matter if it takes a little bit longer. Got it. Okay. That, that makes sense. So in the CRDT approach, wherever I am currently in my state, I can just apply on top the existing or the new events.

And, with a server side reconciliation approach, this is where depending on what the new events are, where they sit in terms of the timeline. I might need to, uh, Wind back, apply them, and that might take a little bit longer, but possibly also makes the implementation a bit easier. Yeah, I guess just one note. So, you've been saying server side reconciliation. Automerge does not actually require a server. It's a completely decentralized model.

The name is just sort of by analogy to what you would do if you would have a server. You would put all the things in the order that the server receives them. Automerge instead infers a sort of order in a decentralized way. That makes sense. So, we've now mostly talked about the state changes side of it. And, we talked about how our optimistic, locally, how are the state changes applied.

But we didn't talk too much about the mutations times mutations quadrant, which also has couple of, like, Subsections. So let's dig a little bit into this one.

Event sourcing

Yeah, so this, this mutations, mutations quadrant, this is sort of the event sourcing idea where instead of sending around low level changes, we're going to send around the actual user actions, both from users to the server and from the server back to other users. So an example would be like, if you do a find and replace operation, or maybe you rename a variable in VS code, the operation that you're going to send to the server actually says, you know, rename this variable from foo to bar.

As opposed to a bunch of low level edits where you go through and change the actual characters, F O O to B A R, in every place they happen to exist. So this quadrant is interesting because it gives you a lot more flexibility in terms of what You can communicate this really high level intent, like code refactors or actions in a computer game, and then the server can interpret that intent in a reasonable way. You know, applying permissions, maybe you can see that someone else has also been.

you know, added a new reference to that variable. So it's going to rename that reference as well. and you can do this a lot more flexibly as opposed to if you just see the low level intent and have to sort of, or the low level operations and sort of have to guess what intent that corresponded to. So there's a few systems along these lines. So one of them, which I link here, which is not as well known is called Actyx.

It's actually a company in Europe, which does, Like iot, coordination in factories. So if you have some, you know, robots moving around a factory floor, they're talking to each other over the local network and they might say things like, oh, someone needs to go pick up this box and move it from point A to point B. one of the robots can say, okay, I'm going to go pick up, pick up this box and move it. And that way the other robots know not to move it themselves.

And these, these actions or messages, they just get put into a log that all the devices in the factory see. And that way they sort of know what's going on, what tasks are outstanding, that sort of thing. Right, and I think one very nice benefit of that as well, is that if there's some real world stuff happening, and whether in a factory a robot has moved, or you've now like manufactured a new part, or destroyed a certain thing. Now you have like a real log of those events.

So in case something goes wrong or in case there's an audit, now you have some hard facts that you can look at. So it's not just useful for an app and a machine, but it's also useful for human purposes to understand what has happened. Exactly. Yeah. And this really feeds into the idea of business logic. You know, in a lot of applications, we have this. Business logic that we want to do in terms of, you know, what happens when a user clicks this button.

And it can often be more complicated than you can express with simple database changes. And keeping these actions around gets you really first look at what the, the business logic was supposed to do and also have the server customize its response. Like you can check permissions at a very fine grained level. You can make decisions about, you know, bank balances going below zero and that sort of thing.

yeah, sort of tossing to some of Pat Helland's articles, if you've seen like building on quicksand or, immutability changes everything, this idea of, you know, accountants don't use erasers, all those ideas. Yeah, exactly. And I think for web developers, this is also very intuitive, where if you build a React app, for example, and you have Some complex state that you express in react use state.

And now you try to somehow do the right thing based on how the state changes using some react use effect, for example. They're like, you should use better, better mechanisms and better foundations for that, for example, using XState for like some, some state machines, et cetera. This is where you.

Very explicitly and declaratively deal with the state changes as opposed to like, trying to somehow, reinterpret how some, like, nitty gritty state things have changed, whereas, like, if you just have a beautiful, simple event that is easy to understand, okay. That thing has changed. The robot has entered this room. that's much easier to understand than interpreting the coordinates of a certain thing.

And this also feeds into features that you might want to give to your users, especially in productivity apps. You want to have that change history where you can see what was everyone doing. You also want to have undo's. basically what you can do for undo is when you create this action or this mutation describing, the high level intent, you can also tag along with it, a mutation saying, here's how to undo this operation later.

And then you store that somewhere and then you just have a queue somewhere in your app that's like the action queue. You can go through that and undo things in a nice way. Hopefully, you know, the users will be happier with this than if you just, you know, revert states exactly, ignoring collaborators updates.

Text & list editing

Right. So, in this quadrant of the event sourcing quadrant here, there's still a couple of like sub cells, um, how the mutations are applied, namely the serializable, CRDT ish, and OT ish. Can you give a little bit of an intuition how they differ in the implementation and when you would choose one or the other? Yeah. So the examples here mostly concern text editing, which is not a coincidence.

So in text editing, when you're doing any sort of collaborative text editing, like in Google Docs, you have this problem that your operation might say, I want to type, you know, the word hello. After, you know, maybe I want to type the word world after hello. So the, this message that you're going to send to the server might say something like insert world at index five, because you know, hello is five characters long, but someone else might also edit this world. Hello, or this word. Hello.

Before your change makes it to the server. So maybe now it's like, hello there world. It's what you want to happen. But your edit is still trying to target index 5, so it's going to go in sort of the wrong place. You want it to shift over to accommodate edits that have been before yours in the document. And of course, this gets worse if you're, like, editing the bottom of the document, someone else is editing the paragraph on top.

All of your array indices are going to get horribly messed up by the time they reach the server. Like, they're not going to be accurate anymore. So the three choices here are basically different ways to patch up those indices so that they make sense again. That makes sense. And, I think this is a common theme for local-first software is that there are a couple of like special buckets that deserve special treatment, namely text editing and also lists.

And those, the, the latter two are, I think also like closely related. So on that note, the article, you also went, went a bit more in depth. on possible approaches to tame lists in this distributed setting. Will you mind sharing a little more context about that?

Sure. Yeah, so if the list, as you said, it's hard and it's hard specifically because of this index problem where your obvious choice for what operations you're going to send over the network often don't make sense anymore by the time they reach the server. Um, and the solutions really fall into two camps.

There's the operational transformation camp, which is used by Google Docs Which is where you're going to send, you know, index five, that sort of thing, a raw number, and the server is going to look at these, this index. It's going to look at all the intervening operations that arrived that you didn't know about, but have already reached the server. And it's going to sort of like walk through those one by one to try to figure out what index you actually meant.

Because it's going to see, okay, if you inserted something at index five and three other characters have been inserted before that, I'm going to change it from five to eight, just adding five and three. Got it. So a very common app use case for this is, let's imagine Notion where on the left sidebar, you can have your, your favorite, pages pinned and those you control the order. So you can move them around or also on a Notion page, all the blocks you can reorder yourself.

And a very naive approach would be, whenever you reordered something, you send to the server a full copy of the entire document, and that contains the order. But that is not very useful in the collaborative setting where now the merge radius of the entire thing is the document. And it doesn't really allow for collaboration on a per block level. And the another naive approach would be to send the block and say like, oh, now I'm at position three.

But something else might've already, moved and it's no longer in reality position three. So this is what this is all about and, uh, the different approaches for this. Figma has written also a really nice blog post about this, how they, tamed this problem, where I think they call it fractional indexing. And I think you connected the dots here.

can you, draw a line between the different approaches here, the CRDTish approach and the OT ish approach, and how that relates to the, the list indexing problem?

Fractional indexing

Yeah. So the, the OT ish approach, that's what I was describing with, you know, you send index five to the server, but the server's going to rewrite it to index eight. So this is really this idea that the server is going to. Mutate your operation to try to make it still make sense.

Then the CRDT ish approach, which is used by fractional indexing and YDS and those sort of things, is actually the clients, instead of sending, you know, index 5 to the server, they're going to rewrite this message in a way so that it still makes sense, even if it reaches the server a little bit late.

So, for example, you could have, in fractional indexing, you might label your characters with these decimal numbers instead, where you say, like, the characters are at 0.1.2.3, etc. And then if you want to add a new character in between 0. 4 and 0. 5, you give it the label 0. 45. So this isn't really a list index, it's what they call a fractional index. And the idea is that this will still go in between the characters at 0. 4 and 0. 5, even if some other changes happen elsewhere in the list.

Because those other changes don't actually change your fractional index. You're keeping the characters at the same 0.4.5.6, etc. Right. And now the 0. 45, this is what you use to derive the, the real. Integer indexes from by lexicographically ordering it. Got it. Yeah. So I'm using the same mechanism inspired by the ideas of like the, the Figma blog posts, et cetera, for Overtone.

And I'm even using it before I started implementing syncing, just because I found it to be the Easiest way to keeping a list ordered in an event source system, since this is what I'm also already using to circumvent schema migrations for the, the app I'm building. So it's, I think it's actually a very simple self contained concept that can be applied even outside of the scope of a full blown local-first data stack. Yes, exactly. Yeah. It turns out so what.

Text editing CRDTs are doing is very similar to factional indexing, just with some extra changes to solve some bugs, basically, like what happens if two people try to insert a character at the same place. Factional indexing breaks down, CRDTs just have the smallest change needed to make this not break down. I agree with your point that this isn't really a collaborative thing. This is just a general data structures thing.

It's like the way we describe text and as an array is sort of flawed because array indexes are changing all the time, even though the character is staying the same and staying in the same place. Intuitive sense.

So what we really want is an abstraction where the characters keep the same identifier at all times, whether that's a fractional index or whether it's part of the list CRDT internals, and then that's how we should represent sequences that can move around, which is basically any list in a GUI where you can drag something in between two existing elements.

Combining approaches

That makes a lot of sense. And what's also so cool about like seeing all of the different options in this classification table is that you don't have to choose exactly one for your app. what I'm planning to do for, for Overtone is mostly follow the event sourcing idea for collaborative state. However, in the places where I have.

complex, particular problems such as a description text or like a document text, this is where I most likely will resort to something like Automerge or Yjs to let those technologies deal with the text editing, the collaborative text editing use case.

But, and with that, I'm gonna I think I get the best of both worlds where I get all the benefits from event sourcing for the, the more high level data structure of my app and for the specificness of the text editing, I embed a little CRDT use case in the broader document use case that I tame with event sourcing. Do you think that general approach makes sense? Yes, that's exactly the way to do it.

Yeah, if you look, there's a lot of, you know, blog posts saying about how CRDTs are complicated or they're hard to implement. Usually these blog posts are talking specifically about the text editing part. That's sort of the hard part where you want to let someone else do it and have their nice battle tested, fuzz tested implementation.

But for other data structures, like if you have, you know, sort of a database table sort of structure or a map structure, it's easier to make your own sync engine for that and just drop in an existing library to handle the lists and text editing. Right. So it's funny that you came from like going super deep on CRDTs of like spanning this, broader table of possibilities. And it seems like now you're actually much more drawn.

So the first quadrant around event sourcing, what, led to, to this interest for you? Let's see. So it might just be, you know, the grass is greener on the other side. I haven't tried to make an app or a library using the event sourcing approach yet. So maybe I just don't know what's wrong with it. but it really started out about a year ago. I was thinking about version control. This was around the same time that Ink and Switch was thinking about version control with their, upwelling essay.

and the idea was like, what if we could do this Git style model where you make changes to an app, like a text document or a spreadsheet. We just put these into linear branches, and then when we merge them, you copy from one branch to another.

And originally, the idea is we're going to put CRDT operations in these branches, because that's what I'm familiar with, but I eventually realized like, actually, because the branches put the operations in a total order anyway, we don't care about the CRDT correctness properties that say that you can apply operations in different orders. So we might as well just use arbitrary operations.

And that unlocks a whole lot of possibilities that would have been hard to do in a CRDT system, like you can do these rename variable or find and replace operations, maybe even like a change tone with AI operation. Just put these in a log, have the log be in a fixed order, and run the operations in that order. That makes sense.

so aside from the versioning use case, can you think how, using a CRDT approach versus an event sourcing approach might be a good or a bad fit for different categories of apps that you can think of? Sure. Yeah, so I think the advantages of a CRDT approach, well first off, you can do this more database model. If I'm going to put my data in a magic box that says database, and it's going to synchronize it for me, I don't have to worry about it.

Whereas you're doing an event sourcing approach, you have to think more carefully about what are my mutations that I'm sending around? How do I process them? How do I make sure that they still make sense, even if someone else's mutation reach the server first? So that's a bit harder. the other advantage of CRDTs is the efficiency perspective.

You can have, the CRDTs can implement operations in a very efficient way so that you're not going to accidentally say, you know, I'm sending this mutation to the server that's going to take. an entire second to process is going to slow everyone down. It's sort of the, the general trade offs that CRDTs behave more like a database. They, they just work and they're optimized to be fast. Which, with an event sourcing model, you get flexibility.

You can send arbitrary mutations around, you can have arbitrary business logic on the server, it can even differ from the logic on the clients. Just coming back to the video game example, you have a lot of logic that the server needs to step through, checking permissions, checking collisions, that sort of thing. Which would be hard to do with a CRDT or with a database model.

Event sourcing challenges

So you mentioned that you haven't yet built larger systems with the event sourcing approach, but I think you've still done a little bit of research on what might await you in the event sourcing world. So could you outline a little bit of like the potential concerns you see on the horizon when going all in on event sourcing? Yeah, so I guess the main concern always is if you're Sending around this log of events to clients.

And if you're storing this as your single source of truth, then storing all these events forever, it might take up a lot of space. If you could imagine a text document, if each text character corresponds to 100 bytes of JSON, then the history of all the events is going to be a hundred times bigger than the actual text document. Even if you've since cleared out the entire text document, now it's empty. You still have all this state.

So that's the main challenge is just how do we store the events efficiently, how do we maybe compact them, say I don't need these events anymore, I'm going to throw them away and replace the state, while still making that play nicely with, you know, clients who have been offline for a month, that sort of thing. Which sort of mechanisms do you think will mostly help to overcome some of those issues?

I'm hoping the main mechanism is just To give up, basically say text is very small for any, the main sources of lots of data in your app are blobs like images or videos, which you can put somewhere else anyway. And then for the actual event describing the fine grained changes, just store them all and it's only going to be a few megabytes per document anyway. Got it. Yeah. And I think on top of that, there's also the compaction use case.

Now that I have a little bit more, insight on, on that approach with building Overtone. for example, given that everything you do within Overtone, whether it's playing a track, whether it's navigating within the app, whether it's adding a track to your playlist or follow an artist, all of those are an event and Adding a track to a playlist, there you do a lot less of those than, for example, in the background, the app auto playing the next track, which is also an event.

And another kind of event is if the app tries to authenticate with a music service such as Spotify to exchange tokens, which it needs to do at least Once an hour. So it does so a little bit ahead of time. So, also when you reload the app, it needs to do that. So just by the fact by, the app running in the background over time, it Racks up quite a lot of different events.

And I think they're the interesting part is the nature of the events and the nature of those events also allows for different trade offs. So me putting a track into a playlist, A, there's going to be like way fewer events of those. and it's fine to keep the entire history of this around. What's so cool about this also, the fact. That, I have this event allows me to trivially implement a feature like that.

I can hover over the track and I see the information when was it added by whom was it added to, to the playlist. It also makes implementing things such as undo much easier, but the other kind of events, which might be implicit or which might just be a lot more, higher quantity, what I've seen is that, it's not as crucial to keep those events around for eternity, but some of those events are then also made irrelevant by follow up events of the, the same type.

So for example, if your app has authenticated and overrides sort of like an off state into the database, and. two hours later, it has already done so 10 more times. I don't need to keep the entire history before that, maybe besides auditing reasons, so I can just at some point remove the old events, which keeps an otherwise always growing event log at a, for this given event type at a much more like constant size, which makes it much more feasible.

Another thing that I, started thinking about is like, what if you have not just like one event log, but what if you have multiple event logs? And what if you have, a hierarchy of event logs? This is something that I also want to think a little bit more about, Let's say you have a, a tree of, playlists, like a, a folder of playlists. So you have a, a playlist. And that playlist could also, possibly be a folder of other playlists. So now what does the event log exist for?

Does it exist for like, everything in my library? Does it exist for a broken down to. only giving information about which playlists I have, and then I need to subscribe to another playlist, but what if that playlist is a folder? So this hierarchical aspect of it, I think this will keep me busy for, for a little bit as well. Do you have thoughts on those problems? Yeah, I mean, this, the, what you're saying is really interesting. It makes me think of the problem of ephemeral presence.

So, you know, in Figma, when your collaborators are moving their mouse cursors around, you can see where they're at it every time. I would imagine Figma is not actually persisting those mouse movements, it's just sending them over the usual channels so that you can see them live, but then you forget about these events because they don't matter anymore. So I wonder if you could maybe do that for a lot of the events that don't matter as much, or even in a text editor.

So one thing that's really hard with a collaborative text editor is you'd like it so that whenever you press a key, that key is immediately sent to your collaborators. But if that actually creates an event that's persisted in the log, then you have this issue of, you know, 100 times as much storage as key presses, but maybe what you could say is when you press a key, that's like an ephemeral presence message. It's not actually stored, it's just sent over the same channel as the mouse movements.

And this is sort of like an ephemeral mini log that's stacked on top of the actual event log, and then every 10 seconds or so you send a compacted version of like the entire sentence that the person typed as a single event, and that's what's actually stored on the backend. I wonder if that could help at all, or if this is even possible to implement. Right. I've actually implemented a small version of that already, which I call local only events.

The idea of that is that, there's kind of like hierarchies of syncing as well. There's like syncing, just from the main thread to the workers thread, which is responsible for persisting the data, but also from one tab to another tab. And, those two tabs should in some regards, Converge, and in some regards, allow divergence.

so for example, if you have Notion open in two tabs, you want to be able to navigate to different documents and those different tabs, but if you're in the same document, you probably want to see the same thing. So it's the same that applies to a music app. Maybe in one tab you want to have. The playback of one track and the another one, you want to not have the same playback, otherwise you hear it twice. but you want to maybe work on a playlist.

And so keeping things in sync is important, but I don't want to, constantly as the playback progresses, have persistent events for this. So I try to A, have like, very Deliberately small events. And the other thing is where I have events that are broadcasted around. But, if the app reloads, it doesn't rehydrate from those. It either catches them midway or it's not important enough. that it shows it so very similar to the presence feature in Figma.

So I have implemented a first version of this, but I think there can be use cases where you might want to keep them around for like 10 minutes or 10 seconds, like you say, and then have a version of compaction. I think that that's really interesting. What you're describing sounds really cool. I'll be interested to see this code someday. I'm planning to open source it a little bit further down the road.

Local-first ideal are still hard to reach

So you've now been in the local-first space for over five years, and I'm sure you've seen many technologies come along over time. I'm curious whether you have certain strong opinions about the local-first space or the web ecosystem more broadly. Yes, I guess one. Well, this isn't really an opinion, but just I'll make an observation that the local-first movement has really exploded just within the past 12 or 18 months.

Like, starting out five years ago reading CRDT papers and going to CRDT conferences, it was much more, you know, mellow academic atmosphere. But now there's just so many tools popping up, I can't keep track of them in my browser tabs. you know, the local-first discord, all that stuff. Just a lot more activity. So it's both exciting and also a bit scary, because now I can't read all the papers that come out anymore.

yeah, in terms of opinions, I guess the The strong opinion I've had in the past year or so is that the local-first ideal, I think, is too hard right now. There's just too many problems we'd have to solve to actually make like a local-first app where the hosting provider can go away and you'll still be able to collaborate and keep your data.

So the problem that I've been focusing on for the past year is the narrow goal, like the baby step, of how do we make traditional central server SaaS collaboration easier to implement, and maybe a bit easier to deploy. So that's working on primitives like what you were describing with LiveStore. We want some way to have events that you send around and persist IndexedDB.

broadcast channel between different tabs and then eventually send it to a server that stores them and broadcasts them back to the client. Just make some really good implementation of that that people can reuse so they don't have to reinvent it every time. and I think that'll be. Both useful for, you know, developers and also a good stepping stone towards the eventual goal of we want to get rid of this server and have our, have our data forever. I love that observation, and that opinion.

I think that's also one of my key takeaways from talking to many folks at the local-first conference we had this year in Berlin, where Everyone gets excited about all the goals and all the ideals of local-first, but going after a few of those already is technically very complicated. And then going like all the way to making sure that the software still works if the vendor goes away, etc. That is, I think, right now achieved by only a very, very few set of products and technologies.

I hope that in five years from now, it will be table stakes. But, I think it's a little bit like Maslow's hierarchy of needs. And like we, here we have like the hierarchy of ideals and we haven't, Yet quite made it as easy to achieve all of it, hopefully we'll, we'll get closer to that over the next couple of years. So those technologies that you've, now mentioned, is there anything that you're working on that can be looked at by other people?

List-positions

Let's see. So the main project I've had recently is it's a library called list-positions. So you can read about it on my blog post or look at the docs on GitHub. But it's basically trying to solve this fractional index generalization problem. You can think of it like a fractional index library that also implements the extra features that CRDTs have to prevent some bugs. The idea is that you can use this as a drop in part to do just the text and

list collaboration in some arbitrary data structure . So I built examples on top of Triplit, Electric SQL, Replicache. So these are our collaborative data stores that don't talk about lists or texts at all. They're basically syncing maps or database tables. And I said here, if we just stick these souped up fractional indices on top, we can actually do text to rich text collaboration. So that's, that's been my focus.

Outro

Very interesting. I will check this out. Maybe I can use it for Overtone. Maybe I could even integrate it with LiveStore. I will certainly check this out and we'll put the link in the show notes. Great. Matthew, is there anything else you want to share with the audience? No, I don't think so. It's been a really good chat. Thank you so much for sharing all of your knowledge about different approaches to syncing state.

I think this is the most in depth we've gone on those topics so far, and it provided a brilliant overview for future conversations. Has helped me a ton to, to better understand this, both your blog posts as well as this conversation. So thank you so much for taking time today and coming on to chat. Yeah, thanks so much for having me. Thank you for listening to the local-first FM podcast.

If you've enjoyed this episode and haven't done so already, please subscribe and leave a review wherever you're listening. Please also share this episode with others. Spreading the word about the podcast is a great way to support it and to keep it going. A special thanks again to Rocicorp and Expo for supporting this podcast. See you next time.

Transcript source: Provided by creator in RSS feed: download file
For the best experience, listen in Metacast app for iOS or Android