Rethinking Database Management: Automation, Innovation, and Changing Roles in DevOps - DevOps 196 | Adventures in DevOps podcast

00:14

What's up, y'all? This is the Adventures in DevOps podcast. I'm your host, Will Button, and I have the co-host with the most, Warren Parad. What's up, Warren? Hey. I liked how you had to look to the side to remember your name before. Right, I'm so focused on remembering all the other stuff I have to do for the intro that, like, my own name is like, no, brain's got no room for

00:39

that. Yeah, I'm Warren. I'm the CTO at Authress. You know, I have to be careful when I'm talking at conferences because I'll often look at my computer even if I have no reason to do that, and like, I know what the words are that I have to say, but it's just so distracting to have something else there at the same time. For sure. All right, joining us today in the studio: Tyson Trautman, VP of Engineering over at Fauna, former GM of AWS CodePipeline and CodeDeploy,

01:14

senior engineering manager at Riot Games, and so on and so forth. What's going on, Tyson? Hey, good morning. Super excited to be here. Dude, I'm excited to have you here. We've had a theme going recently, not planned or intentional, but it definitely has turned into a theme, where we've been talking in the last couple of episodes about databases. And so you're going to talk with us today about the bottlenecks and challenges that developers have

01:42

with databases. And one of the common things we've talked about in the last few episodes is how there used to be this career path called a DBA or database administrator, and that just seems to have mostly disappeared. So give us a little bit about your background and your thoughts on the state of the database today. Yeah, for sure. I mean I'll say, like you know,

02:07

DBAs are certainly still out there. They're still filling a very important function at a lot of different companies. Back when I was leading the Infrastructure Platform group at Riot Games, you know, we were kind of a SQL shop and had a lot of DBAs as part of the broader group just managing our database instances. But it's definitely a trend

02:30

that's changing, for sure. Even at that time at Riot, we were thinking very much about moving towards automation, using things like declarative schema and some of the things that I think we'll talk about here, and bringing continuous delivery to the database. That kind of removes the need for a specific individual that's, you know, laser focused on the database, running scripts manually

02:57

on a specific database instance, those types of things. It kind of gets into this moving towards the, you know, cattle-not-pets kind of mantra that's been more commonplace in other areas of software engineering, for sure. Yeah, because when it comes to the world of pets, databases were the most prized pet for

03:19

a long time, and still are in a lot of places. Yeah, yeah. Speaking of which, since we're live, by the way, mine may, you know, come bursting through the door at any minute. Excellent. Literal pets, not the database

03:31

pets. Right on, right on. That works. Yeah. So, you know, I think that whole trend towards automation has made it a lot easier to interact with the database. I know that ORMs are very contested and hotly debated, but I gotta admit, having something that generates roll-forward and rollback statements for my database is kind of nice, even if I don't get the opportunity to specifically tune and test that query. For the most part, it tends to do the right thing

04:14

in most cases for me. Yeah, it depends. I mean, there's no question that if you want to bring the best practices of continuous delivery to the database, there are a few specific pieces that you're going to need. We've been spending a lot of time thinking about this and how it applies to Fauna specifically, but more generally, think about what you're trying to do in the database when you apply

04:43

these practices of continuous delivery. First of all, when you make schema changes, you have to be able to validate those schema changes against your existing data and against services that call the database. Then you have to be able to replicate the transition process to make sure there are no issues getting from A to B. And you have to be able to do those things in all of your different environments, right, ranging from local development

05:10

all the way to production. And so the capability you were just talking about, which is some kind of mechanism to go and programmatically apply schema changes, is absolutely a critical requirement. There are others as well; we could get into that. But for sure, whether you're manually issuing queries, you know, through an ORM to do things like manipulate data, maybe potentially manipulate schema, or whether

05:39

you're using something like, in Fauna terms, what we've recently launched, which is what we call the Fauna Schema Language, a declarative language for defining schema and getting to a known state, you know,

05:51

declarative, I'll say. I'll put in a little plug: declarative is typically where it's at for these types of applications, because there are a lot of ways when you're imperatively describing infrastructure that you can get yourself in trouble or get into these little edge cases, and so there are some issues that you can run

06:14

into there. But that's just a little plug. Yeah. And so, just to clarify there: by declarative, you mean you're just saying this is what I want the end state to be? Like, I don't really care about what happens in between, as long as it ends up like this?

06:30

That's right, exactly. In Fauna terms, I want these databases, these collections with this field-level schema, and then even getting into more exotic concepts: we support the notion of user-defined functions and check constraints and all these other kind of cool things that we can get into.
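The declarative idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Fauna Schema Language and not how Fauna plans changes: the function name `plan_changes`, the dictionary layout (collection name mapped to field names and types), and the wording of the steps are all hypothetical. The point is that the caller states the end state and the tool derives the steps.

```python
# Hypothetical illustration of declarative schema management: the caller
# describes the desired end state, and the tool works out the steps.
# The names and data layout here are assumptions made for this sketch.

Schema = dict[str, dict[str, str]]  # collection name -> {field name: field type}


def plan_changes(current: Schema, desired: Schema) -> list[str]:
    """Return the operations needed to move `current` to `desired`."""
    steps: list[str] = []
    # Collections that exist only in the desired state must be created.
    for name in sorted(desired.keys() - current.keys()):
        steps.append(f"create collection {name}")
    # Collections that exist only in the current state must be dropped.
    for name in sorted(current.keys() - desired.keys()):
        steps.append(f"drop collection {name}")
    # Collections present in both are compared field by field.
    for name in sorted(desired.keys() & current.keys()):
        old, new = current[name], desired[name]
        for field in sorted(new.keys() - old.keys()):
            steps.append(f"add field {name}.{field}: {new[field]}")
        for field in sorted(old.keys() - new.keys()):
            steps.append(f"remove field {name}.{field}")
        for field in sorted(new.keys() & old.keys()):
            if old[field] != new[field]:
                steps.append(
                    f"change field {name}.{field}: {old[field]} -> {new[field]}"
                )
    return steps


current_state: Schema = {
    "users": {"name": "String"},
    "legacy_logs": {"line": "String"},
}
desired_state: Schema = {
    "users": {"name": "String", "email": "String"},
    "orders": {"total": "Number"},
}

for step in plan_changes(current_state, desired_state):
    print(step)
```

The author of `desired_state` never writes the individual steps; running the same plan against an environment that already matches yields an empty list, which is what makes the approach repeatable across environments.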

06:51

But yeah, the ability to define that whole data model, the schema for that data model, and know that you're going to get into that state without necessarily having to describe, like, here are fifty different transformation steps I'm going to run, and I hope that it all works out given the data in the database, to get into that state. For sure. Yeah, or learning the specific syntax of

07:17

whatever variant of SQL your database platform happens to be using and hoping that it either does it correctly or errors out, not something in between, right? Yeah, yeah, for sure. Cool. So it sounds like with Fauna you're setting the stage for people to move, because I feel like there was this trend about a decade or so ago where we put all of the business logic in the code, like your constraints and your validations and

07:50

things like that, whereas prior to that, you know, that was the role of the DBA, and databases have really powerful tools for managing that type of stuff quickly and efficiently. But it sounds like you're giving that control back to the database, but in a way that makes it easier for developers to leverage the things that databases do well. Yeah, I think that's

08:13

a fair assessment, for sure. I mean, we try not to be too prescriptive. We want to meet developers where they are, right? But certainly bringing those capabilities to the database so that, if you want, you can put some of that logic with your data. There are very powerful benefits that come with that too, right? So for example, for modern kind

08:39

of responsive applications, performance is a huge deal. So the ability to go and run a bunch of compute over your data on the database is potentially a game changer from a performance standpoint versus, you know, doing a whole bunch of round trips and running that compute on the client, right? So for sure, having those capabilities available. And then another thing is being

09:07

able to run that compute as part of a transaction on the database. That's an even next-level kind of benefit once you get into these real operational use cases, right? So for sure, having those capabilities there so that folks can reach for them if they want them, when

09:22

their application calls for it, is absolutely something that we're focused on. And so I think there's probably a pretty large educational component to that, because unless you have experience or a lot of background expertise in database servers, these are all things that you potentially didn't know database servers were capable of doing. So how has that shaped how you approach this problem?

09:52

Yeah, yeah. I mean, what you said is true, although in some ways, you know, everything old is new again, right? People have used stored procedures since the beginning of time, and I think there are different kinds of ebbs and flows in terms of the hot application architecture of the day. Sometimes with these trends you look back and you say, hey, we

10:16

didn't have it so wrong with some of these things that were going on previously. But for sure. One of the things that's been interesting about my journey at Fauna is, you know, you can build a really amazing product, and as a consumer, I feel that, using the product. There's no question that the database we're building is the database I would

10:39

use if I left to start a company tomorrow. But you still have to tell that story to the market and help people understand what's compelling about it. This cool stuff that we're doing, if you're doing it in a vacuum, doesn't matter. What matters is how it maps to actual requirements of software developers and moves the needle: makes them more productive, makes their life easier, makes the applications that they're building, you know,

11:03

more secure, more available, more performant. And so yeah, there's a big educational component. A lot of that is in docs. I think when you're building these types of apps, the quality of your docs is a huge deal. Examples, all of those types of things are a big area of investment, no question. I feel like you said something super controversial, and I don't know if I want to let you jump over that.

11:28

What is old is new again? I feel like there's a whole swath of engineers out there at orgs and companies who would swear that stored procedures have always been wrong and will always be wrong into the future. I don't think there's a particularly right answer here, but maybe there's some sort of spectrum: this is what belongs at the database layer, and this is what belongs more at the domain or business logic

11:54

outside, in your application. Any thoughts on where that line actually needs to be drawn? Yeah, I mean, it's a great point. I try to stay away from some of the more zealous, religious debates like that. To me, Warren, it really comes down to the use case and what you're trying to do with the requirements of your application, for sure. And we could get into some kind of

12:22

specific examples there. But I think one of the problems with stored procedures in the past is that it was this entirely separate code, application code that sat kind of outside your app, and it had a

12:39

separate life cycle from your application. There was no single life cycle in terms of continuous delivery and how you get that application into production, or a way to think about things like versioning between client and server. And so a big part of what we've been focused on with our recent launches, with Fauna Schema Language, is solving that problem by putting

13:03

that logic together. So the typical pattern when you're using FSL to describe your data is that the code lives in your repo with your application code. It's versioned together. That's what we do internally. We take a heavy dependency on Fauna. We eat our own dog food, drink our own champagne, however you want to put it, and that's certainly the best practice that

13:24

we follow as well: everything together. So I think part of the answer is it depends on the application and the specific requirements of the application. But another part of it is that the problems people had at that point in time can be solved through better products and better tooling around

13:39

those products as well. No, I'm totally with you. I remember, a non-trivial number of times, writing some giant framework to manage getting stored procedures written in your Git repository into MS SQL at the time, and making sure they were executing in the right order, intertwined with whatever schema changes you also had.
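The ordering problem mentioned here, running stored procedure scripts and schema changes from a repository in the right sequence, is often handled with a naming convention. The sketch below assumes a hypothetical convention where each script name starts with a numeric prefix followed by an underscore; it is not a description of the framework referred to in the conversation, and the file names are invented.

```python
import re

# Hypothetical convention: every script name begins with a sequence number,
# e.g. "001_create_users.sql". The number alone decides execution order.
_PREFIX = re.compile(r"^(\d+)_")


def ordered_migrations(filenames: list[str]) -> list[str]:
    """Return script names sorted by their leading numeric prefix."""

    def sort_key(name: str) -> tuple[int, str]:
        match = _PREFIX.match(name)
        if match is None:
            raise ValueError(f"missing numeric prefix: {name}")
        # Compare numerically so that "2_" runs before "010_".
        return (int(match.group(1)), name)

    return sorted(filenames, key=sort_key)


scripts = ["010_proc_order_totals.sql", "2_add_orders.sql", "001_create_users.sql"]
for script in ordered_migrations(scripts):
    print(script)
```

Comparing the prefixes as integers rather than strings avoids the classic mistake of running `010_` before `2_`.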

14:00

And then there's no way to really validate that live without running it for real and making sure that it hits some arbitrary query plan so that it's optimized, or even that it's using real fields; a couple of types just aren't there in most databases when you think about it. Yeah, for sure, you nailed it. And that's why, to me, what good looks like here is, again,

14:26

you need the declarative tools that you can use in any environment. So in Fauna terms, you can spin up an instance of Fauna locally in our Docker container and you can use FSL to apply schema there, so you can use this even for local integration testing. So you have the ability to apply declarative schema, and it works in any environment.

14:46

And then you have tools to load data as well, because like we mentioned earlier, you need to validate schema changes against existing data. We have two kinds of mechanisms there: a backup and restore, backup and copy tool in the live environment, and then we also have a data import tool where you can sort

15:07

of load arbitrary data into a database. But then you start wiring this thing up in automation as part of your CD pipeline, and all of a sudden these tests are running locally, again against the Docker container or something, or against the live service if you prefer, in your integration environments, where you're seeing actual consuming services under potentially real-world load, with those schema changes, with

15:33

the new version of your UDF or your stored procedure as that's going to production. And then you can even safely apply those changes using that same tooling all the way out in production, which is what we do.
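The step of validating a schema change against existing data before it reaches production can be sketched as a small pipeline gate. Everything here is a hypothetical illustration: the check names, the sample documents, and the `find_violations` helper are assumptions for this sketch and do not represent Fauna's tooling or its check-constraint syntax.

```python
from typing import Any, Callable

Document = dict[str, Any]
Check = Callable[[Document], bool]


def find_violations(
    documents: list[Document], checks: dict[str, Check]
) -> list[tuple[int, str]]:
    """Return (document index, check name) for every failed check."""
    violations: list[tuple[int, str]] = []
    for index, doc in enumerate(documents):
        for name, check in checks.items():
            if not check(doc):
                violations.append((index, name))
    return violations


# Hypothetical constraints that a proposed schema change would introduce.
proposed_checks: dict[str, Check] = {
    "email_is_string": lambda doc: isinstance(doc.get("email"), str),
    "age_non_negative": lambda doc: doc.get("age", 0) >= 0,
}

# Stand-in for data restored or imported into a test environment.
existing_documents: list[Document] = [
    {"email": "a@example.com", "age": 30},
    {"email": None, "age": 41},
    {"email": "c@example.com", "age": -1},
]

problems = find_violations(existing_documents, proposed_checks)
if problems:
    print(f"schema change blocked, {len(problems)} violation(s): {problems}")
else:
    print("schema change is safe to apply")
```

Run in a CD pipeline against a copy of real data, a non-empty result would fail the build before the change is applied anywhere that matters.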

15:43

So, super powerful concepts. So I'm going to ask a really dumb question and hope that the listeners of the podcast who might actually have the same question can appreciate the fact that I'm taking one for the team here: is Fauna its own database engine, or is it something that sits on top of other database engines? It's its own database engine. It's a net-new, novel database that was built effectively from the ground

16:14

up. It was deeply inspired by an interesting research paper that came out of Yale years ago by Daniel Abadi, who's an advisor at Fauna, and a few others. That was kind of an interesting spin on how to run distributed transactions in a very different model than the typical two-phase commit model that Spanner and other truly distributed databases use. So that's kind of at the heart of Fauna. That's what we call the distributed transaction

16:45

engine. It's a big component of Fauna. It handles multi-region replication with the strongest possible level of consistency guarantee. So that's kind of the secret sauce in the core, the guts of the product. Again, we don't like abstract nerdery; we like to tie back to real-world application requirements. But the reason that matters is, number one, people want multi-region replication for

17:12

performance. If I can do fast reads from the local region, that's a really big deal for a lot of modern applications, right? It's also the best possible DR story if you literally have multiple regions running in an active-active configuration. We see examples where our good friend us-east-1 is having issues, and you just keep cruising right along.

17:40

But then doing all that with strong consistency is really the huge deal, because as developers, it's very expensive to reason about these different edge cases that you can hit in your application code, right? And to me, it's a bit like when people talk about, oh, you know, we have a strong-ish level of consistency guarantee. Well, I either have to reason about these funky edge cases that I can get into or I don't, right? Yeah. And with Fauna, you don't.
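One of the "funky edge cases" that weaker consistency forces application code to reason about is the stale read. The toy model below is a deliberately simplified sketch, assuming a single primary and one asynchronously updated replica held in memory; the class name `LaggingStore` and its methods are hypothetical and do not model Fauna or any real database.

```python
from typing import Any


class LaggingStore:
    """Toy primary/replica pair with asynchronous replication."""

    def __init__(self) -> None:
        self.primary: dict[str, Any] = {}
        self.replica: dict[str, Any] = {}
        self._pending: list[tuple[str, Any]] = []

    def write(self, key: str, value: Any) -> None:
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_local(self, key: str) -> Any:
        # A local read is fast but may not reflect recent writes.
        return self.replica.get(key)

    def replicate(self) -> None:
        # Replication happens some time later, in write order.
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


store = LaggingStore()
store.write("balance", 100)
store.replicate()

store.write("balance", 40)                    # money was just spent
stale_balance = store.read_local("balance")   # replica has not caught up
store.replicate()
fresh_balance = store.read_local("balance")

print(stale_balance, fresh_balance)
```

Between the second write and the next replication pass, a reader on the replica still sees the old balance, so any decision made on that read has to be defended in application code. With a strongly consistent system that window does not exist from the application's point of view.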

18:06

So that's kind of the guts of it. When you talk database engine, that's some of the secret-sauce innovation that we've been focused on. That's cool. So it almost sounds like rethinking the database based on considerations that are relevant now but weren't whenever most of our common database platforms

18:27

were created. Yeah, I mean, for sure, the considerations that weren't relevant when folks were running, you know, single-instance Postgres or primary-secondary type configurations, which are really still there with some of the

18:48

bigger players that folks are using today. For some apps, that's fine. There are some applications where you're not going to hit this insane scale where you're suddenly going to have to start thinking about partitioning. And then you've got to start thinking about, oh, well, I was using Postgres because of the strong transaction support, but now I can't do cross-partition transactions. How

19:12

do I solve that? And there's a ton of interesting innovation there. I don't want to put those down; they're very cool tools for specific kinds of jobs. But for applications where scale and potentially multi-region replication are a strong requirement right out of the gate, I think Fauna is in pretty unique air up there with a few other offerings that are really ground-up, like, let's think about how to

19:36

do distributed transactions in a first-class way. That's a hard thing; you can't just take that and bolt it on. You brought up documentation before, and I feel like there's a direct link

19:47

here. Actually, something on my mind a lot is I feel like Authress has done something similar in the security domain, and the novel aspects that we've added to hopefully remove confusion for users sometimes get in their own way. They have expectations that a database requires a commit-transaction step, or that replicating across regions with some sort of replication agent is there by design, and some of the more experienced engineers get caught up with expecting those things to be

20:18

in place. Do you see that a similar problem exists for Fauna? You

20:23

know, I don't. I think we've built it in a way that feels pretty intuitive. With traditional databases, whether that's, let's say, a SQL-based database or something like Mongo, your transaction model is kind of tightly coupled to the session, right? And so, to your point, Warren, that's the

20:51

mental model, what you're used to. You have to do things like session management, et cetera. With Fauna, you're just sending us HTTP requests. Basically, an HTTP request essentially contains some code that you run as part of a transaction. And so I guess my experience in this particular domain sounds a little bit different: the way we've implemented it is pretty intuitive. So folks get there and they're like, hey, this is cool. I don't have to, you know, I

21:15

don't have to think about things like session management. I don't have to think about whether this code is running in the context of a transaction, or when I effectively commit that transaction. It's just: I'm sending a transaction, I'm getting a response back, and I can get that in the shape that my application expects, and that feels pretty intuitive. That, I think, has been my experience watching folks when they first

21:38

pick up the product. No, the reason I asked is because a lifetime ago there was an instance where I was working with an engineer on GCP's Firebase, and there's a bunch of promises that Firebase makes with instantaneous commits and not having to worry about concurrency or anything like that. And it really felt counterintuitive that these databases that are on opposite sides of the world won't have any sort of race condition. And I guess that's not.

22:07

Like, the way you've designed Fauna is much more seamless than that, in that it makes it obvious when there is potentially going to be a problem, and

22:15

when there's not, and no one has to worry about anything. Yeah, I mean, to be clear, there's no free lunch in engineering, right? So we try not to promise something that doesn't exist. And the trade-off that you're making when you're effectively running this type of active-active configuration, where you have multi-region replication with strong consistency, is that you can't cheat the speed of light, so there, you

22:37

know, there are performance trade-offs to that. What we've done is try to optimize to that frontier of what's possible, so that if you're doing writes through the transaction pipeline in the US in Fauna, you're talking mid-double-digit milliseconds of replication time, and reads can be served from the local region, so very fast, like low single-digit milliseconds. And in practice, that trade-off

23:07

is very attractive for most applications, especially when you can avoid these multi-round-trip, multi-transaction scenarios where you have application code running on a client. But I just want to be clear, since you mentioned the Firestore case: for sure, if for whatever reason you think your application is fine with eventual consistency, that's a different

23:30

model and you can achieve a different performance profile where warranted. I'd say buyer beware, because a lot of times you think eventual consistency is an acceptable model for your app, and then you end up finding out it's not. But yeah, I just want to be honest there. No, it makes a lot of sense, really. So you said that interacting with Fauna is

23:55

an HTTP API. Is that right? That's right. Yeah, we've done a lot of interesting work there. I don't know if we want to go there, but there's a lot of innovation in Fauna in how we route to the database as well, which has been

24:07

a super interesting set of challenges. It's typically kind of an afterthought for a lot of database providers, but because we're so performance focused, thinking about that trip from the application to the database is something that we've focused on quite a bit. Right. Yeah, that makes sense, because using that pattern, you end up being able to take advantage of a lot

24:34

of work that other people are doing in that same space instead of reinventing that work. Yeah, absolutely, for sure. Piggybacking on top of HTTP is so much easier than raw TCP packets or QUIC, or even implementing your own QUIC communication protocol. Mm, yeah, no question. You know,

24:53

I actually took a look at Fauna before this. I was curious, and I'm looking at that and then I'm looking back at the cloud providers, and I'm wondering, when is it going to be that one of the cloud providers offers me a database that actually achieves as much as these external providers do? I just see such a huge discrepancy, and I'm surprised that they haven't made a lot of progress

25:15

in this area. Yeah, I mean, first of all, I'll say that effectively all of the hyperscalers are fantastic partners. Today we're in the AWS marketplace, we're in the GCP marketplace, and we'll be in Azure down the road as well. And Fauna, by definition, has what we call our public region groups, called region groups as we replicate

25:37

across regions by default. But you can also deploy something called Virtual Private Fauna, which is still serverless but single tenant, with VM-level isolation, and you can pick your cloud providers and your footprint. So it's a pretty cool offering for

25:51

larger enterprises that need more isolation. But yeah, I think the reality is, with those large providers, if I take my former employer, AWS, and I'm saying this as somebody who's no longer with AWS, they have very successful offerings, DynamoDB, RDS, et cetera, that are out there, and those things have a lot of inertia, and they continue to do work to make those things

26:17

better. But those are multi-billion dollar businesses in and of themselves, and so it gets harder to go and compete with a startup, in a weird way, because you don't have any competing interests at a startup. We're just setting out to try to build the best possible database for modern application developers, whereas I think at some of those, you know, anyone who's operating in the context

26:44

of a larger company knows that once you have those revenue streams, there are different kinds of goals and you have to cater to that existing audience, right? Mongo, which is a fantastic product, obviously, has been around for a long time, been very successful, still

27:00

growing by leaps and bounds. They can't throw out the on-prem business, right? You can't throw out the baby with the bathwater. And so you have these existing businesses that are very successful, which is a great thing for those enterprises that they have to serve and focus on. But to your point, I think it makes it a little bit harder to go and build something that's kind of a generational

27:23

leap, right, that's significantly better for application developers. But in fairness, we do see that there's a lot of hunger from those companies to go and partner, and they want to drive the best outcome for their customers. We have a close running conversation with the Lambda team at AWS, for example, because there's a lot of synergy between Fauna and Lambda, where the

27:48

data is at the edge where you can run your Lambda functions. You don't have to maintain sessions, which is fantastic for those kinds of ephemeral environments that can just go away. So yeah, they're good partners for us, and like I said, we're right there in their marketplace, so you can buy us alongside the first-party offerings. It sounds like, you know, a bit of innovator's dilemma, but also innovation is coming if you pick the right

28:11

providers. Yeah, yeah, for sure, absolutely. What kind of design changes or design approaches require a mental shift to take on something like Fauna, where you've got a serverless database with your data sitting on the edge? So when you say design shift, you mean from

28:36

the consuming application? Yeah. Like, if you're architecting an application, are there any considerations, are there things where you may make it down the road and say, oh wow, if I'd done this, I would have been a lot happier? Yeah, good question. Just thinking. There are these very modern application architectures that Fauna in a way is sort of uniquely suited to serve.

29:06

But you don't have to be using one of those architectures to take advantage of a database like Fauna or some of these other very modern databases. We have plenty of customers that are running a more traditional application architecture where they have a single-region app server sitting in, you know, pick a region, that's accessing Fauna, and they're not necessarily taking advantage of multi-region replication.

29:33

Other than the fact that it's there as part of a DR story. So their app's in one region, but if their app was up and, for some reason, some service dependency brought us down in a region, that traffic could fail over and route

29:48

to another region. But no, I think the answer is that we work just fine with those types of more traditional architectures. Or if you're some new startup that's very forward thinking, you're building something greenfield, and you want to take advantage of all these new awesome offerings and run compute

30:10

out at the edge, you can do that, and Fauna pairs very nicely with those types of architectures as well, right? Because we're going to have your data there, serverless; you don't need to think about scaling, managing instances, et cetera, those types of concerns. So I can't think of any specific gotchas or something that you should have

30:32

in mind as you adopt these kinds of modern databases. I think they fit pretty well in the existing paradigm. But then as you get towards these newer architectures, they can really shine.

30:42

So if someone spent the last twenty years building on MS SQL and has now shifted to RDS, it still sort of works pretty much the same if they were to switch to Fauna, other than they get all the benefits, and there aren't a lot of potential pitfalls with these different database engines that are out there today? That's right. Yeah, I mean, the downside is you're going to have to migrate your application, for sure, and your data,

31:12

you know, I mean you may have to. We have some tools so you can export data, import data, do some different things. I think we'll continue to get better there. It hasn't been a very top-of-stack focus for us; we've been more focused on innovating the core product. But yeah, for sure, I think if you bite the bullet on that migration, there's a whole bunch of benefits that come with

31:34

it. For sure. I mean, we have existing customers that have come from, you know, you name it. I'd say recently, more of the folks that are coming to Fauna are people who jumped from relational databases to document databases because they believe in the document model. They're not a fan of ORMs. They don't like this document, you know,

31:53

relational impedance mismatch. But then all of a sudden they're running on Google Firestore, whatever, Dynamo, Cosmos DB, and they miss some of the relational capabilities they had with their relational database. They're not ready to leave document, but they want these relational capabilities. Fauna fundamentally is what we call document-relational. It's this kind of new paradigm where we store data as documents on disk, but it

32:22

has the capabilities that you typically expect from a relational database. So, for example, the ability to define first-class relationships across documents and access those documents in the ways that you would expect with a relational database. So yeah, there's a lot of excitement from those folks that see this as like, all right,

32:44

kind of the best of both worlds. I've jumped into the document model, but I get all these capabilities that were the good part of my Postgres database. And it's not like PartiQL on top of DynamoDB? That's right, that's right. Yeah, yeah, cool. So one of the big challenges, I think, with a lot of larger databases is you always end up in a situation where you're doing ETL. You know, it's pretty common to ETL your data out for analytics.

33:19

How did y'all approach that? Yeah, we've got a few options there. First of all, there's a Fauna Airbyte connector that some of our customers use. Airbyte is an open source product, and they have a managed offering, so you can use Airbyte and the connectors they have to pump data from Fauna to your analytical database of choice. We also support

33:47

streaming, so you can stream changes to documents or sets in Fauna. So it's a different mental model. Airbyte is kind of the more traditional, like, I want to push batch updates to my analytical database. Streaming is more commonly used for these kind of, you know, responsive

34:07

apps where you want to push changes to a client or something. But we do have some customers that for these different use cases actually want to leverage streaming to push changes to sets to a different database. So those are kind of the most common patterns. I'd say, you know, it's not there today, but we're thinking a lot about it. I felt like one of the buzzwords at re:Invent this year was like

34:31

zero-ETL. You know, Amazon was announcing zero-ETL for this database and that database, which really, in my mental model, you're still doing ETL. I think it's more just kind of managed ETL, these higher-level connectors between databases where more of that is managed behind the scenes and it just works. You don't have to think about it; you don't have to necessarily

34:59

bring in a third vendor or manage something yourself. And so that's something that we're thinking a lot about now, is what that would look like, you know, to create these kind of more managed ETL connections to some cloud providers.
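
To make the two mental models above concrete, here is a minimal sketch contrasting a periodic batch copy with applying changes as they stream in. It is a toy in-memory model under stated assumptions: `OperationalStore`, `batch_sync`, and `stream_sync` are hypothetical names invented for illustration and are not Fauna's or Airbyte's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Change:
    """One document change; data is None when the document was deleted."""
    doc_id: str
    data: Optional[dict]


class OperationalStore:
    """Toy in-memory stand-in for an operational database (hypothetical)."""

    def __init__(self) -> None:
        self.docs: dict = {}
        self._subscribers: list = []

    def subscribe(self, callback: Callable[[Change], None]) -> None:
        # Register a listener that is called on every write.
        self._subscribers.append(callback)

    def write(self, doc_id: str, data: dict) -> None:
        self.docs[doc_id] = data
        for callback in self._subscribers:
            callback(Change(doc_id, data))

    def snapshot(self) -> dict:
        return dict(self.docs)


def batch_sync(store: OperationalStore, warehouse: dict) -> None:
    """Batch style: replace the analytical copy with a full snapshot."""
    warehouse.clear()
    warehouse.update(store.snapshot())


def stream_sync(store: OperationalStore, warehouse: dict) -> None:
    """Streaming style: apply each change to the copy as it happens."""
    def apply(change: Change) -> None:
        if change.data is None:
            warehouse.pop(change.doc_id, None)
        else:
            warehouse[change.doc_id] = change.data
    store.subscribe(apply)


store = OperationalStore()
batch_copy, stream_copy = {}, {}
stream_sync(store, stream_copy)
store.write("order-1", {"total": 10})
batch_sync(store, batch_copy)          # the scheduled batch job runs here
store.write("order-2", {"total": 25})  # arrives after the batch job
```

In this sketch the streamed copy sees both orders right away, while the batch copy only catches up on its next run. That gap is the trade-off between the two approaches described above.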

35:10

Nothing there that's in product today, but it's certainly on our radar screen. Like, we're talking about streaming to, say, BigQuery or Redshift, something like that, or to Snowflake, without the customer needing to do anything extra? That's right. What if you could just do some kind of an auth handshake to that provider and, you know, maybe give us a little bit of metadata about, I guess you're getting

35:32

into describing transform. So there's, you know, sane sets of defaults where things just work, right? I'll say Amazon went way down the path of a specific database for every possible scenario, and then quickly realized, well, that creates a big headache when you're thinking about moving data around. And so they have these different offerings like Glue and other things to move data between services. But I

35:58

think there are a lot of folks that aren't content with that situation. Yeah, you got us in this situation; now help me move data around more seamlessly. I don't want teams of engineers that are just doing undifferentiated heavy lifting, building tools to move data from A to B. So yeah, I think that's part of what created this problem that's now resulting

36:22

in, like, zero-ETL. I'll say, by the way, too, we're more bullish on innovation at the database layer that can unlock a more generally suitable database. Right, so with Fauna, we even see people starting to push the bounds where they're issuing, you know, what

36:38

traditionally looks more like analytical queries on top of Fauna. I know we're not optimized for those use cases, but I'll say new databases like Fauna are very powerful, and I think we're getting away to some extent from that mold of like, oh, I need seven different databases for all these specific use cases, and I have to think about replicating data between them, I have to think about keeping data in sync,

37:01

you know, et cetera. Oh, for sure. So rather than ETLing the data at all, just build the database so that it can meet those needs in the same platform. Yeah, yeah, depending. I mean, again, I don't want to be, you know, a used car salesman here. I'm not saying there aren't architectural trade-offs you make when you build a database that will make it more suited

37:23

for a specific set of use cases. But it's very clear that customers don't want a whole database portfolio, right? And you see this going the other way too. I mean, you see Snowflake, for example, launch their Unistore offering, where they sort of try to bolt the transaction pipeline onto your analytical database. I haven't played with it. I've heard there are performance issues, things like that, et cetera. But for

37:54

some people it's compelling. If your primary workloads are analytical and you need transactions for some specific use cases, that can be compelling. And so we're certainly not the only ones. There are a lot of vendors thinking about this: how do I bring more functionality into fewer offerings to make

38:13

life easier for customers. I think that's super important because, at least historically with Snowflake (I haven't looked at it recently, to be honest), getting the data in was sort of a challenge. So offering more functionality doesn't necessarily change that. The zero-ETL connection on the actual database application side is really valuable. Yeah, yeah, I agree. I guess you could say databases aren't Pokémon; you don't have to collect them all. True.

38:44

I agree. I really got to get the applause plugin working on this because it's just awkward. I should have brought my snare drum and my crash. It's downstairs, though,

38:58

so we can't make that happen. I mean, I feel like there's a duality there though, because the more sort of fundamental engine aspects that a database engine offers, it has to be trading off something right, Like I do have to pay more because the data has to be replicated very quickly to a different format, or you pay some sort of latency on processing in order

39:20

so that it can actually figure out what the optimal execution is. Like, we have this problem: we primarily use DynamoDB a lot, but we also have some graph needs, and I still haven't found a good, like, I-love-it graph database for us to bet on. I'm not sure that's something that Fauna does, but a lot of them don't do that. It's not an easy thing. Yeah, I mean, you're absolutely right.

39:45

I think the question is, there's going to be, you know, the frontier that you can get to, right, in terms of how much you can do based on the architectural decisions that you've made. For sure. The question is, can you get eighty percent of the way there in the same database, where for specific use cases it's good enough? I mean, you mentioned graphs; it's an interesting example we could dive into.

40:07

You know, Fauna is not Neo4j, right? Like, we don't have a first-class concept of edges between the documents or entities in our database. You could model that in documents in Fauna, because we support this ref field type where you can effectively

40:28

uh, you know, denormalize as part of a query across documents. With these relational capabilities we're talking about, you can model a lot of these graph use cases, and people do this, right? Like, we see an interesting customer that's using relationships that you would think about as more of a graph-type use case to fetch related entities to go and inject into an LLM as part of an AI app. Right, like, you know, so

40:51

there are things you can do there, you know. Now, we made a different set of architectural decisions than Neo4j, so are we ever going to be as good for the P99, P100

41:04

graph use case? No, that's not our goal. But if your data is already in Fauna, and you like our capabilities as a general-purpose operational database, and then all of a sudden you have these graph use cases, are your requirements that P99, P100 use case where you need to go and ETL data into a purpose-built database? For some people the answer is yes, and that's totally cool. We want to support those use cases and have convenient tools available.
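
As a rough illustration of modeling graph-style relationships with reference fields instead of first-class edges, the sketch below walks references between plain documents. It is a toy in-memory model, not Fauna's query language: the `docs` layout, the `knows` field, and the `related_ids` helper are all hypothetical names chosen for this example.

```python
from collections import deque


def related_ids(docs, start_id, ref_field, max_depth):
    """Breadth-first walk over a reference field, up to max_depth hops.

    docs maps a document id to a dict; docs[id][ref_field] is a list of
    ids of other documents (the "references").
    """
    seen = {start_id}
    queue = deque([(start_id, 0)])
    found = []
    while queue:
        doc_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow references past the hop limit
        for ref in docs.get(doc_id, {}).get(ref_field, []):
            if ref not in seen:
                seen.add(ref)
                found.append(ref)
                queue.append((ref, depth + 1))
    return found


# Hypothetical documents: each one references others through "knows".
docs = {
    "alice": {"name": "Alice", "knows": ["bob", "carol"]},
    "bob": {"name": "Bob", "knows": ["dave"]},
    "carol": {"name": "Carol", "knows": ["alice"]},
    "dave": {"name": "Dave", "knows": []},
}

one_hop = related_ids(docs, "alice", "knows", 1)
two_hops = related_ids(docs, "alice", "knows", 2)
```

Every extra hop here is another round of lookups over ordinary documents. That is workable for shallow relationship queries like the "fetch related entities" case described above, and it is also the reason a purpose-built graph engine can still win at the extreme end.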

41:29

But for a lot of folks the answer is no. And if we have the right set of capabilities, we can unlock those use cases for them without adding the management overhead of another offering. That's a really good point. And I just want to set the record straight: we don't use Java

41:45

for ours. But no, I mean, there's some joke here about, like, yeah, for sure, the Pareto curve. And I even feel like graph databases are much too slow for the usage of graphs that most people have, right? Like, even if it is relationship-based, you probably can achieve much faster results using Fauna than pulling out the P100 graph database solution and putting your data in that. It's just usually not as fast in most cases. Yep, yeah,

42:17

definitely true. So Warren, question over to you: what were the deciding factors that pushed you to choose Dynamo for your data? You know, I feel like this comes up a lot, and primarily I think the sort of data that we're saving has sort of a relationship aspect. But we store the roles and identities for individual users and the permissions that are associated with them, and

42:44

that lends itself a lot to a key-value store. There is a relationship aspect there, but as I said, it is sort of this eighty-twenty principle where most of the data does not benefit from being highly relational. Also, ease of use. Something that came up previously, and I want to ask Tyson here, is: when do you hire a DBA? Like, for whatever words you associate with the acronym DBA, when do you hire one? We don't have one. I don't want to get into the business

43:15

of having to get one. I don't think it scales for us. And so things like DynamoDB, Fauna, these NoSQL or pseudo-SQL document stores that offer key-value aspects, I feel like have a much lower complexity, a much lower barrier to entry for application engineers than, say, trying to spin up an RDS, which also doesn't replicate well. Dynamo offers global tables. We offer global solutions as a global company, and having multi-region setups that are active

43:45

active configurations is hugely important for us. Yeah, I'll say, I mean, there are a few interesting ways to go there. First off, on Dynamo, because I thought this was a fun question to chase down a little bit. You're absolutely right, Dynamo is a great tool for super fast KV access, predictable latency. Even at several nines, you're still getting very predictable

44:14

latency. It's interesting because, having been at Amazon, where most apps internally now are being built on Dynamo, there's this almost religion to it where you're saying, again, we can't build a performant enough, scalable enough general-purpose database, so we have to fall back to a tool like Dynamo, which is effectively at the end of the day KV access, and then build a lot of complex application logic to go and support that, right? Because we have kind

44:45

of a lesser consistency model, you know, limited access patterns. You mentioned global tables for multi-region replication, but when you go to global tables there are a lot of gotchas. I mean, they're expensive, you know, eventual consistency, again even more limited in terms of access patterns and what you can do. And so, for sure, I think Dynamo is an amazing tool for what it is, for those specific use cases.
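
The "limited access patterns" point can be sketched with a toy key-value table: lookups that follow the key design are direct, while a question the keys were not designed for degrades into reading everything. This is an in-memory illustration only; `KeyValueTable` and its methods are hypothetical and are not the DynamoDB API.

```python
class KeyValueTable:
    """Toy table addressed by (partition key, sort key); hypothetical."""

    def __init__(self):
        self._partitions = {}  # partition key -> {sort key -> item}

    def put(self, pk, sk, item):
        self._partitions.setdefault(pk, {})[sk] = item

    def get(self, pk, sk):
        # Direct lookup: the access pattern the keys were designed for.
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk):
        # All items under one partition key: also a designed-for pattern.
        return list(self._partitions.get(pk, {}).values())

    def scan(self, predicate):
        # Anything else has to examine every item in the table.
        examined = 0
        matches = []
        for items in self._partitions.values():
            for item in items.values():
                examined += 1
                if predicate(item):
                    matches.append(item)
        return matches, examined


table = KeyValueTable()
table.put("user#1", "role#admin", {"user": "1", "role": "admin"})
table.put("user#1", "role#billing", {"user": "1", "role": "billing"})
table.put("user#2", "role#viewer", {"user": "2", "role": "viewer"})
table.put("user#3", "role#admin", {"user": "3", "role": "admin"})

# Designed-for question: which roles does user 1 have?
roles_for_user_1 = table.query("user#1")

# Unplanned question: which users are admins? Every item gets examined.
admins, examined = table.scan(lambda item: item["role"] == "admin")
```

In a real key-value service the usual answer to the second question is an extra index keyed by role, which is exactly the kind of up-front access-pattern planning being discussed here.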

45:08

But we see a lot of interest from customers that want that predictable KV access, which other tools like Fauna provide. But then they don't want to have to think about transactions in their application code. They want to support very flexible access patterns, where a lot of times, you know, in Dynamo, even kind

45:31

of the adopted mantra of Dynamo, things like single-table design, you really have to understand your access patterns up front. And so maybe you come up with your data model and things work fine for that initial use case, but then you go to build the next feature and you're like, oh crap, I'm screwed. You know, it's not gonna support this. So there are a lot of gotchas there. You know, I think great

46:01

service for what it is, for sure. I can say firsthand, as somebody who used to run AWS services, they're always going to get a lot of uplift because they're integrated with IAM, CloudWatch, CloudTrail; they're right

46:12

there in the ecosystem. But I do think it's worth it for a lot of folks that are just diving in to take a peek at some of the more modern offerings that are out there, like Fauna. Because not only are they going to work for that initial use case, even if it's just KV storage providing super low latency KV, but they're going to scale to those access patterns without,

46:35

you know, requiring you to understand all those access patterns up front. So that's my little spiel. Warren, you'd gone into a separate question, which is about hiring DBAs. I don't know. We could go ahead. I don't know about you either. I mean, you mentioned a lot of

46:45

really interesting things there. I think what I'll say is we definitely have a bimodal distribution, where it's very easy with things like DynamoDB for a very simple app to get up and running without the complexity, and then there's this

46:59

lull in the middle that you don't want to be in. And we're at the point on the other side where we sort of fundamentally know how DynamoDB works, as though we had implemented it at the service level, because we know where the edge cases are and we're figuring out how to get around them, or understand how partitioning even works, because we're at that scale and we have tons of

47:21

application code that wraps it to make it as effective as possible. So yeah, I mean, at the second mode, at the long tail, for sure, if you're using DynamoDB at scale, you will know how it works for real, and you will have a lot of application code to deal with that. Yeah, absolutely, there's no question. And then once you're in that state, you know, we talk to a lot of companies that are there and say, we built up expertise,

47:44

we have a lot of application code around it. You know, it's working; there's for sure pain. The question is, when does that pain get too high? We talked to a company that was a big Dynamo customer, and it was taking them, I think, literally days to build new GSIs. They were running into GSI limits, et cetera. And so they were at the point where, you know, it's like, okay, this is

48:05

literally no longer tenable. We have to explore other options because it's not working for our use case. But that's pretty extreme, you know, And there are a lot of companies that are in the state that you're in where Dynamo is working great. It's a very powerful tool. They've built up internal expertise and they've built these kind of this layer of their application on top of that.

48:23

That's necessary, you know, for their specific use case. And that may be fine. That could be the right spot for them to be in. I guess my point was more that there are a lot of people that dive in because they're in the AWS ecosystem and it's a convenient tool, without necessarily understanding that they're going to have to go and do what you guys did, which is a lot of application code, you know, or a lot of investment

48:50

around the database to make it work for your specific use case. And you may not need to do that. There may be other, more modern databases out there that handle some of that, what I'd consider to be undifferentiated heavy lifting, for some use cases, not for all use cases, on your behalf. So that was more what I was pushing. Yeah, I totally agree. I do want to dig into the DBA thing

49:15

though. What do you think the role of the DBA is moving forward? Like from here moving forward? Do you think it still exists? Is there

49:23

a particular place where it makes sense? Yeah, I think, I mean, the DBA as traditionally understood, which is a human thinking about pets on a very one-off basis, thinking about database instances, that will exist for a while, essentially to service these legacy applications that are written on top of single-instance MySQL, Postgres, whatever, you know,

49:55

SQL Server instances that will be around for a long time, so there will be demand for that. But looking forward, I think what you're going to see is software engineers with some domain expertise in terms of data and data modeling, because that's a big field and nobody can be, uh, you know,

50:21

a software engineer with general expertise across every area. But you're going to see software engineers with that kind of expertise who are helping enterprises at scale think about their data architecture and how data is consumed across a complex SOA, or whatever that architecture looks like, right? But they're going to be doing it in scalable ways.

50:44

They're going to be taking advantage of things like the Fauna capabilities that I referenced earlier, where it's going to feel more like writing application code traditionally has. They'll be using all these practices like, you know, what we

50:57

call schema as code, more similar to infrastructure as code. They'll be bringing that cattle-not-pets kind of mindset to the database, so there's going to be more and more of a need for folks in that kind of role, which looks more like a software engineer with some expertise in data and data modeling. You're integrating with the database layer more than you are figuring out how to optimize one. I mean, obviously, you know,

51:24

I think that's a great answer. I mean, the cloud providers or the faunas of the world, right, if you want to be a DBA, maybe that's where you go, right to these products that are offering this primarily at scale, so that companies don't have to think about what their engineers are going to have to know in order to put data in a database somewhere. Oh yeah, for sure. I mean, but the people that we're hiring, you know, like the it's actually it's hard to go and recruit

51:50

for these roles. But, you know, folks with deep database subject-matter expertise that are actually going to get in and think about improving performance of the transaction pipeline, the storage engine. For sure, there's a limited pool of people that have spent time solving those challenges in the real

52:13

world. They're very highly sought after, and we would rather hire those people, uh, you know, so that you don't have to think about those

52:20

kinds of hard challenges, for sure. Is the way that we think about it, Like, we want to take as much of that sort of complexity as as reasonably possible and solve those challenges in a common way so that you know, more and more, I mean, our the primary goal of what we're doing is trying to allow uh, these application development teams to go and innovate and focus on their application and not you know, again not do this

52:43

you know, undifferentiated heavy lifting. I mean, you know, one of the things I saw in a past life when I was a leading the code

52:51

pipeline and CodeDeploy teams at Amazon. It was interesting because at that time we had built an interesting offering at kind of a lower layer than the GitHubs and the GitLabs of the world, with things like Actions that were launched, and that was compelling for, like, the P100 of enterprises that wanted

53:12

to go and innovate at that layer, kind of like what we were talking about on top of Dynamo, right, where it actually matters and you need to go and build these specific release capabilities in your application. But the bulk of people were coming to us and they're saying, hey,

53:23

make this really easy. Make it easier for me to release safely like Amazon does, but for my enterprise, without me having to staff an internal team of eight people to build this layer on top of, say, CodePipeline or, you know, Jenkins,

53:37

whatever the tools that they're using. I heard that repeatedly from every enterprise that we talked to, and you can think about what we're doing in some ways as kind of an analog in the database layer. I like that you drew a parallel between CodePipeline and Jenkins, because those are at the same level for me. They're different, for sure. But yeah, we could go down a

53:59

whole rabbit hole there for sure. But I would say the common thing is, when I was leading the infrastructure group at Riot, we had innovated a lot on top of Jenkins, for reasons. So I guess the parallel is that those lower-level, less opinionated offerings are very powerful, but in practice you end up needing to staff a team or a group of people to think about how to take that

54:23

and apply it to your specific domain. And you end up building these layers on top or in your application to take that and integrate it into your use case. That's the common thread I was trying to draw. I'm totally with you there.

54:39

Actually, it was my job at a previous company to migrate the organization to start using Git, and along with that was turning off Jenkins and actually starting to use GitLab at the time. There were a whole number of challenges with that, but for sure, no one was going to do that work if it required building on top of the equivalent of CodePipeline. Yeah,

55:06

for sure. I mean, also in fairness, it's interesting: internally, I think Amazon spends more time thinking about things like release safety than any company on the planet. I feel fairly confident saying that. And CodePipeline has interesting capabilities inspired

55:22

by that internal learning that I think made it the best offering. If you were a large enterprise with unique release requirements, meaning you need to release across regions, and you know, you need to think about let's say four plus nines of availability, et cetera. You know, I think we had optimized in some ways for those sets of capabilities, more so than just like you

55:45

know, it's the typical developer that's using GitHub actions. For example, it's totally fine to spin up containers to run you know, release pipelines that actually have no knowledge of other executions in the pipeline because they're only releasing once at a time and maybe once a day, you know, or whatever. It's

56:04

not, there's not this complex sequencing. So we were thinking about those kinds of use cases that are super interesting but more relevant to the P100 of release automation. And again, getting back to the no-free-lunch theme, you make trade-offs to do that, and ease of use is one of the babies that can get sacrificed sometimes as part of those trade-offs. No, I think you're absolutely right there. I think that highlights one of

56:34

the needs of using managed providers. Like, with a managed provider, I'm looking not just for management of the infrastructure, but also some guidance on how to effectively use the product. Mm hmm, yep. Yeah, I agree. I think there are sort of tiers to what you mean by managed. I mean, this is true of serverless too, right? People use the term serverless to describe just about everything today, so it's become a little

57:07

bit whitewashed. Yeah. You know, we say all the time that we see APIs as kind of the highest possible layer of serverless, because literally it's request-response, specified input and output, and

57:21

you're not thinking about any of those lower-level concerns. But what you're getting to is an interesting point, Will, which is you can even go higher than that, thinking about things like documentation, instruction, and learning as part of that core offering as well, to take the abstraction even higher, which I think is a good point. Yeah, to use the, I think I used this quote

57:40

in last week's episode too, from Jurassic Park: you spent so much time thinking about if you could, you never stopped to think if you should. And I think that's one of the opportunities managed providers have, to answer that or provide some insight into whether or not you should be taking this approach. Yeah, when you said Jurassic Park, I'm a little disappointed. I thought you were going to go Samuel L. Jackson and say hold on

58:04

to your butts, we're rebooting the servers, right? Yeah, I mean, there's a limited number of Samuel L. Jackson quotes you can use on the podcast. I'm a big fan of all of them. Uh, where are we at there? Yeah, well, you know, we're coming up on an hour here, so how about a quick summary on who the ideal user of Fauna is and what they can

58:46

expect by trying out the service. Yeah, I mean, really I'd say the ideal Fauna user is someone who's building modern applications that store operational data, right? That's pretty general. There are certainly specific verticals where we see more adoption, if you want to talk about those, but it's a pretty generally applicable product for these different

59:15

transactional workloads. Particularly applications that have very high performance requirements, responsive applications, et cetera, and applications that have high availability and security requirements, where, because we manage things for you, typically we're going to be more available than if you go and try to manage your own database. I think those are a few areas where we really shine. Right. And when you say high performance,

59:47

you're talking like Nasdaq level high performance. Would you go that high? Yeah? I mean we have we have folks that have built you know, different financial kind of you know, like cryptocurrency for example. We have folks that have built like trading applications, uh, you know, streaming streaming data

01:00:06

using our event streaming. But again, the main focus there is that because of the way that we do multi-region replication and the way that we have that data there, and even the optimizations we've done at the routing layer, we can just get you to that data very quickly to do things, to make decisions based on that data, or to render that data. So

01:00:27

yeah, you know, latency-sensitive applications. I mean, well, from the knowledge that I have about this, a lot of the order books are usually kept in memory anyway, so if you're persisting them to a database, you could pick anything. It seems like Fauna would be just as good a choice as anything else, whatever makes sense for the actual application. Gotcha, right on. Cool. Well, before we move on to picks, Tyson: if people want to learn more about you, interact

01:00:54

with you, find out more about Fauna, what's the best way for them to do that? Yeah, for Fauna, I mean, fauna.com. You know, reach out to us, interact with us on social media, email, drop me a line, whatever. I guess I'm less interesting, but I'm around occasionally. That mostly works. I'm not much of a social media guy, but yeah, I'm around on the platforms, so feel free to drop me a line. All right, awesome, let's do some picks. Warren, you want to kick us off? He always likes picking

01:01:27

on me to go first. So that's that's really why he needed another host in here, so that start the time. Yeah. Yeah, So mine's going to be something for my personal life. I know that as a DevOps advocate, optimization uh is such a huge part of what we do, and especially automation as well, but optimization primarily, and so this is about picking

01:01:53

socks. I have to say that if you're someone that loses some socks every once in a while to the washing machine or dryer monster, the next time you're out and you need some more socks, just get like twenty pairs of whatever it is. You'll never have to think about it again. And when you lose one of them, you won't even know and it won't even matter. One of them gets a hole? Doesn't matter, just throw it away. So, especially favorite socks, I gotta say, just absolutely fantastic. Has

01:02:22

changed my life. Right on. It's a great pick. I do that for socks, and I, like, order the same kind in bulk from Amazon, and every time they start getting a little shabby, I chuck them out. Yeah, that's definitely the thing to do. I tried doing it with shoes one time. That was a mistake. I would not recommend buying any more than one of the same kind of shoe. But they also deteriorate over time. So if you really do like something you've got and you know it's gonna get worn out in the

01:02:51

future, get it. Get multiple extra pairs. Right on, fair advice. Tyson, what you got for a pick? Well, I'm told I can go pretty broad here. Yeah, for sure. This is a bit out there, but I'm a wine guy. My biggest hobby when I'm not making software: we

01:03:14

actually make a bit of wine. I love it. There's something I think very rewarding, when your day job is thinking about software and something that's intangible for pretty much all of your work, in making something that, you know, you pour into a glass and enjoy at the end of the day. So my pick, I'm gonna give a shout out. I live in the state of Washington. There's some pretty amazing wine being made up here, and

01:03:35

so if you want to try some good Washington wine, I'll give a shout out to a friend of mine who makes wine at a winery called the Bapiano Silver in Walla Walla. Does a great Cab and a great Syrah. So if you're looking to experiment with some really good Washington wine, you can

01:03:50

go in and check them out. Right on. Awesome. Cool. For me, my pick: I live on the edge of the country, and so I go for long runs out in the countryside pretty often, and there's a lot of people who have dogs out there and no fences and no leashes, and most of the time the dogs are cool, but occasionally you

01:04:14

get some dogs that are pretty territorial. And so I recently picked up the Viper Tech heavy duty stun gun, and the thing is just so cool because you know, like I'm going to try to avoid, you know, becoming

01:04:30

the target of the dog's aggression. And if a dog does come after me, I don't want to hurt it, but you know, at the same time, I don't want to get hurt either. And so I did a lot of research and a few people suggested stun guns, you know, a taser, number one because you can just click it and the noise is enough to deter most dogs, but then if it escalates, you know, you've

01:04:51

got the ability to tase them as well. And so yeah, I picked up the Viper Tech, just ordered it on Amazon, and it's pretty cool. I haven't actually used it yet, haven't had to use it yet, other than just, you know, seeing a few dogs, and sure enough, you know, you click it and they're like, ah, yeah, okay, we're done. So how heavy is it? It's super light. Like, carrying it

01:05:16

in my hand doesn't even register. Got it. Cool. So yeah, I was worried about that, you know, worried about carrying it and having it just become cumbersome. But it hasn't been, by any means. So there you go. Got socks, wine, and a stun gun from today's episode. What could go wrong with all these things together, right? Awesome. Well, Tyson, thanks for joining us. It's been super insightful and pretty entertaining, so appreciate it. Yeah, it was my pleasure. Thank you for

01:05:51

having me. Yeah, it was a lot of fun. Cool. Right on. Warren, thanks for joining me once again. Thanks for holding up the co-host role and showing up. Appreciate it. And to all the listeners, thank y'all for listening, and we will see y'all next week.

Transcript source: Provided by creator in RSS feed
