Hello, and welcome to the Let's Talk Azure Podcast with your hosts, Sam Foote and Alan Armstrong.
If you're new here, we're a pair of Azure and Microsoft 365 focused IT security professionals. It's episode 17 of season four. Alan and I recently had a discussion around Cosmos DB, a globally available, scalable and flexible NoSQL and relational database system. Here are some of the things that we discussed: What is Cosmos DB and how is it different from traditional database management systems? What APIs are supported and how do you choose between them? The operational considerations of Cosmos DB and how it's licensed. We've noticed a large number of you aren't subscribed. If you do enjoy our podcast, please do consider subscribing. It would mean a lot to us for you to show your support for the show. It's a really great episode, so let's jump in. Hey, Alan, how are you doing this week?
Hey, Sam. Not doing too bad. How are you? Yep, good, thank you. Ignite in person sold out. Is that a first? Has that happened before? I think it's sold out before, but I don't think as quickly. Yeah, it's only been announced, what, a couple of weeks? Three weeks max, I think. Yeah, I don't know. I agree with you. It feels pretty rapid. Is that a sign of the world being back to normal, or a smaller event? Yeah, we don't really know at this stage.
Well, last year was the first in person since the pandemic, wasn't it? So people were probably a bit wary, to be fair, and I had attended, so it definitely felt like it was the first one back kind of thing. But looking at some of the stuff that was put on the list for what you get in person, it sounds like it's scaled up again, or at least getting there kind of thing. So no wonder people sort of decided to go for it again.
Yeah, 100%. And it sounds to me like there's a lot of buzz around Ignite. Feels like there's going to be lots of big announcements. More Copilots, do you think? More AI? More AI everywhere. Something like that, yeah.
Maybe, I've not seen too much, but I bet some of the Copilots we've heard about will probably start coming into public view, or maybe going GA. I mean, I don't know which ones will do that, but I expect some of them will start to sort of go down that route. We do see a lot of those announcements of public preview or private preview sort of capability then becoming GA.
Good to see. I hope it is as big and sort of interactive in person as it once was pre-pandemic, if that makes sense. So it's good to see it selling out.
Yeah. And I suppose the other thing with last year's one was that they had loads of little, I say small, but regional Ignite centres, didn't they? There was like one up in Manchester in the UK, and there were some around Europe and things like that, wasn't there? So I suppose maybe the people count spread across those sort of events, so people could experience it locally without having to travel. Now Microsoft's brought it back to the central location. Yeah, exactly. Yeah, definitely.
Okay, so Sam, what are we talking about this week? Yep. So this week, it's Cosmos DB. I'm going to say it's a relatively large and unique database proposition from Microsoft. We use Cosmos in one of its flavors quite extensively, so, yeah, I thought it was worth doing an episode on, for sure. Yeah, definitely. And yeah, like you said, we've only used probably a small chunk of its capability in our sort of day to day. Let's crack on. So I guess start off with: what is Cosmos DB?
Okay, so if we talk about more traditional databases to start off with, that's probably a good way to begin, because we've got to really understand why Cosmos DB exists and what challenges it's aiming to solve. Traditionally you would have a database server that might sit in the corner of your office or in your data center, and that would be a physical server or computer. And traditionally it would be a relational database, so something like SQL Server, or maybe an open source flavor like MySQL or Postgres, something like that. And obviously we've had an overall shift into cloud workloads, and that's obviously been driven by cost reduction and management overhead reduction. So it's possible to spin up relational databases in Azure. There's nothing stopping you provisioning a virtual machine, installing SQL Server on it, and connecting to it, well, remotely, like you've done before. But modern applications and platforms need to scale, sometimes beyond that. So there's an element of vertical scaling with SQL Server: you can give it more resources, more powerful processors, more memory, but you get to a scaling limit at that point. You then look to things like clustering to effectively pool resources across multiple machines. And for instance, you can do that with SQL Server, and you can do that with most modern and mature database management systems. But what they're, I suppose, not great at is sort of geographically remote clustering, where you effectively have nodes maybe even spread around the world. A lot of technology now needs to be globally distributed and globally available. If you've got a bunch of users in Australia as an example, and your databases and even your web servers are hosted, say, in the UK, you've just got a physical transfer delay there, just literally going halfway around the world. Distributing binary content like images and files has been handled by content delivery networks, effectively servers that sit out near your ISP, which distribute your blobs, binary large objects, out to the edge. But databases, where all of your, well, your data is stored, scaling those out and making them globally available has always been a challenge.
So along comes Cosmos DB. And what Cosmos DB is wrapping up and packaging for you is a fully managed database solution. And the idea is that it is geographically scalable and also instantly scalable as well. So it might not be that you need things spread out over long distances; you may need incredibly high throughputs. So imagine things like social media sites where users are submitting thousands or tens of thousands of, say, posts per second. You can imagine everybody on, I was going to say Twitter, X, posting replies to other users. The amount of data that you have to ingest and store, you need high levels of scalability there. Now ultimately you've got web servers and proxies that sit on the edge, and load balancers, which help to funnel that traffic into your sort of inner platform and solution. But you have to store that data somehow. At some point, somebody wants to be able to go back and read their tweets or their responses or browse to somebody else's. So we need to be able to store those. And traditionally you would start off with a relational database, and that's effectively where you've got tables of data that have relationships between them.
So, for instance, a post may have many replies, and they are two tables, and they are related together through that parent-child relationship of one post can have many replies, as a basic example. We've now got a newer way of thinking about storing data, which is called NoSQL. And it's effectively the ability to store data, usually JSON blobs of data, but with very little relationship to other streams, collections or tables of data. So where we would have traditionally normalized data, split it all apart across multiple tables and multiple collections, NoSQL basically flips that on its head and stores everything in blobs, effectively. I've probably just butchered that; we could have a whole episode on NoSQL and relational databases and how they work. But essentially relational databases are more traditional in approach, still very relevant. I'm not saying that they're legacy and outdated, I think that would be unfair to say. And then you've got NoSQL, which is a newer technology, but it approaches storing data in a different way. So you've got these two different types of data storage mechanisms, and in some applications you have a mix of both, because some are better in one area than another.
So Cosmos DB is really built from the ground up to be a sort of cloud-first and modern approach to a database platform, because in Cosmos DB you can actually pick from different types of database technology and effectively you can mix and match them. It's kind of a bit bizarre. So I will go through each of the APIs later, but imagine that you create a Cosmos DB account and then you can fire up any different version of an API independently of another. So it's quite weird in the sense of, if you just brought up a SQL Server previously, you would just get an Azure SQL Server. But a Cosmos DB, it could be NoSQL, it could be table storage, it could be Postgres. There's different APIs that are available. And, you know, traditionally you might say, oh, well, there should be a Cosmos DB Postgres edition and it'd be a completely separate sort of resource. But this really sort of groups them together and has them under one account. Which can make it a little bit confusing really, because two people that are managing Cosmos DB accounts could be using completely different technologies, right? And I believe what Microsoft's trying to do here is to give development teams flexibility. So they haven't approached it in the way of: here's Cosmos DB, this is something that you have to learn, understand and skill up in in order to get the benefits of this global availability and scalable platform that we've built. You can use our, we'll call it, newer bespoke approach, our NoSQL implementation, or you can also use one of the APIs that you currently use. So from a migration perspective, that's really good for application development teams, because in theory they can reuse the SDKs they currently use. So if they were currently using MongoDB as an example, and maybe they were hosting it themselves and they've realized that they've got to the edge of what they can do with high availability and making it globally available, in theory, and don't quote me on this, you could effectively point your application at a MongoDB account with Cosmos DB and it would just work. Obviously, you might need to pre-fill it with some data, or migrate some data, or start completely from fresh, but in theory it would work. So you've got the API perspective, which is really important.
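To make that document-versus-tables point concrete, here's a rough sketch, not from the episode, of how the post-and-replies example might be shaped as a single JSON document in a NoSQL container, with the equivalent relational layout noted in comments. The type names and fields are made up for illustration.

```typescript
// Relational shape (illustrative): two tables joined by a foreign key.
//   posts(id, author, body, createdAt)
//   replies(id, postId -> posts.id, author, body, createdAt)
//
// NoSQL shape: one denormalized document per post, with its replies embedded,
// stored as JSON in a single container.

interface Reply {
  id: string;
  author: string;
  body: string;
  createdAt: string; // ISO 8601 timestamp
}

interface Post {
  id: string;
  postId: string;     // also used as the partition key in this sketch
  author: string;
  body: string;
  createdAt: string;
  replies: Reply[];   // the one-to-many relationship lives inside the document
}

const examplePost: Post = {
  id: "post-001",
  postId: "post-001",
  author: "sam",
  body: "Cosmos DB episode is live!",
  createdAt: "2023-10-30T09:00:00Z",
  replies: [
    { id: "r-1", author: "alan", body: "Nice one.", createdAt: "2023-10-30T09:05:00Z" },
  ],
};

console.log(JSON.stringify(examplePost, null, 2));
```

Embedding the replies works well when they're always read together with their post; an unbounded or very large reply list would usually be split out into its own documents instead.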
But really the main benefit, the key benefit, is the scalability. So, being able to have multiple nodes, geographically spaced, that are synchronized together in a fully managed service. In a more traditional setup, saying that I want my primary region to be the UK and I also want a read replica in America, or a live-live replica in America, and being able to, you know, enable that with a couple of clicks takes away a huge amount of infrastructure overhead, and that's what you're getting there. You also know the throughput and the speed element is being handled by Microsoft as well. So provisioning of resource, scaling that resource to meet your demand, we'll talk about billing as well, but the level of scalability that you can get to in throughput is just insane. And also, because it's got this sort of multi-node architecture, this scalable architecture, it's got one of the highest business continuity and SLA availability levels of anything in Azure. It's got 99.999%, which we call five nines, of SLA and availability, which for cloud, well, I'll call it for cloud, is a very high benchmark. I'm not sure there's much else that's higher than it. I think Azure DNS might be higher, because I think that's supposed to be completely fault tolerant. So it's really high up there, basically. And really as well, because it's based on sort of a consumption model, we will talk about how you license it and how throughput is defined, but you've got the consumption scalability element of it. You can start with the free tier, which gives you a lot of benefit, and then you can scale up and consume as you go. So you've got the ability to react to spiky and scalable traffic, but also manage your costs and have flexibility in the levels that you want to go to.
Wow, okay. That's definitely a lot of information there to digest. You've got these other sorts of databases, APIs, sort of baked into it. Do you kind of feel like the Cosmos DB account is kind of like the management plane to help you do that? Like, you've got your management plane and then, hey, right now I've got a database, whichever sort of type, and it's now in that ecosystem, that management plane, so then you can do what you need with it, like you said. Does it kind of feel like that?
Yeah, the way that I sort of think about it is that Cosmos DB is the wrapper; it's also the platform that you build on top of. Fabric, yeah, that's probably a good word for it, the fabric that you sit on top of. It's giving you the ability to scale horizontally, globally, and also vertically as well. It's handling all of that for you, and you're plonking your API or APIs that you want to run on top of it.
Yeah. Okay, cool. Okay, so can you talk us through the different API options then?
Yeah, so the first one is the API for NoSQL, which effectively is Microsoft's own approach to NoSQL. So there are other NoSQL technologies, and we'll talk about some of those as we go through, but effectively it's a document store, so you load and you retrieve documents. But what is kind of interesting about the NoSQL API in Cosmos DB is that you can actually write SQL. It's a bit weird, even though it's called NoSQL. I don't really want to delve into it too much, but one of the sort of complexities about moving to a NoSQL system is the way that you filter and retrieve data. It's different from a relational, more traditional database management system. In a traditional database management system, you would write SQL and you could filter your results, you could mutate the data, you could join the data; you can do very different things with it. So when you move to a NoSQL system, you sometimes lose the ability to do that. But what Microsoft did is they allowed you to write SQL queries. It's not full SQL, you can't do everything, but it gives you some real power, especially if you come from a SQL background. So they added a NoSQL API, but they didn't forget about the types of enterprises and SMEs that they've been training for years that use SQL Server, right? There is sort of a stepping stone to pure NoSQL, but I think it's actually better than pure NoSQL personally, for me, but that's because I know those tools. The other part of it as well is that you can do what's called stored procedures. So a stored procedure, in the more traditional SQL Server sense, was a T-SQL script, I suppose is the right thing to say, which allowed you to do many different SQL operations one after another, sort of compute a result and then have an output. They don't use T-SQL in Cosmos DB, they use JavaScript. Because you're working on JSON documents, it does make sense to use JavaScript there, because the interaction between JSON and JavaScript is, like, first party, so it makes a lot of sense. So there's a lot that you can do in those stored procedures. We use them to mutate data, because it's very easy in JavaScript to mutate and transpose data, especially if you're using low or no code solutions. So, API for NoSQL: great if you're looking for a NoSQL data store but you still want some of those niceties that SQL Server gave you; it's a really great option for you.
The next is MongoDB. So MongoDB is, I believe, open source, don't quote me on that, I think it's open source, I don't know what the license is like, I haven't done MongoDB for years. But MongoDB is a NoSQL database storage mechanism. It doesn't store in JSON, it stores in BSON, but MongoDB is a completely independent database system. Now what's interesting here is this is where you're starting to get these third-party APIs that you can integrate into your previous application. If you ran a MongoDB instance and you needed scalability and you wanted to take away the ops and all of that sort of stuff, ops being operations, you could effectively add MongoDB to your Cosmos DB account and you could connect to it and use it in effectively the same way that you did before. You just don't have to worry about scaling, high availability, geo-replication, setting up multiple write locations, management of shards. There's lots of technology that's going on and being handled for you if you currently use MongoDB. So something to definitely look at. PostgreSQL is the next element, obviously a massively popular database management system.
And Postgres is quite interesting because it is a relational database management system, but it does actually have its own NoSQL-style column type built in. So it's a bit of a jack of all trades; I'm not going to say master of none, because it is a master of pretty much everything that it does. But again, like MongoDB, if you're running Postgres, then it may make sense to hand off the management of it. It's using, I think it's called Citus, an open-source wrapper on top, and Citus, I believe, gives Postgres the ability to horizontally scale and distribute. So this is again Microsoft packaging that open source functionality and allowing you to integrate with it. The next two I have absolutely no experience with, because they are not database management systems I've ever even tested. Actually, that's a lie, I did once try to run Apache Cassandra and I couldn't get it to work, so I just noped out of it one weekend. But if Apache Cassandra or Gremlin are your thing, in your bag, and you don't want to manage them, then Cosmos DB supports both of those. And I believe that Gremlin is a graph database API. We will not talk about graph databases and what they are in this; that is another episode in itself. But if those types of technologies are what you know, and in your tool bag, in your kit bag, then you have the ability to use them there. The last API is a more basic NoSQL one. They call it the Table API, and it's basically for key-value storage. But what is cool about the Table API is I believe its API and SDK are compatible with the table storage that's inside of storage accounts. So if you currently use table storage in an Azure storage account, then you could in theory migrate your data to Cosmos and use the same connector, but get scalability that is bigger than one storage account.
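As a rough illustration of the SQL-style querying over JSON documents that Sam describes for the API for NoSQL, here's a minimal sketch using the JavaScript/TypeScript SDK (@azure/cosmos). The endpoint, key, and the appdb/posts names are placeholders, not anything from the episode.

```typescript
import { CosmosClient } from "@azure/cosmos";

async function findPostsByAuthor(): Promise<void> {
  // Endpoint, key, database and container names are placeholders you would supply.
  const client = new CosmosClient({
    endpoint: "https://<your-account>.documents.azure.com:443/",
    key: process.env.COSMOS_KEY ?? "<primary-key>",
  });

  const container = client.database("appdb").container("posts");

  // A SQL-style query over JSON documents; parameters avoid string concatenation.
  const querySpec = {
    query:
      "SELECT c.id, c.author, c.body FROM c WHERE c.author = @author ORDER BY c.createdAt DESC",
    parameters: [{ name: "@author", value: "sam" }],
  };

  const { resources } = await container.items.query(querySpec).fetchAll();
  for (const post of resources) {
    console.log(`${post.id}: ${post.body}`);
  }
}

findPostsByAuthor().catch(console.error);
```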
Okay, cool. I guess my kind of statement before you sort of dived into them kind of rings true again: it makes more sense that Cosmos DB is that fabric to help you do that scaling, and almost like your broker in some form to your multiple database capability. And you're right, being able to take those database types or APIs, that you may have on-prem or may have in Azure on their own, and when you need to do that scaling, in effect somewhat migrate them in some form to Cosmos DB to give you that global flexibility or scalability, seems insane if you think about it.
Right? The actual thing that Cosmos DB is bringing you, the feature that it's giving you, is the scalability and the high availability, as an example, right? And we know that those two things, and that's intermingled with the performance, those are three really hard things to get right at scale, right? So Microsoft approached it in the way of, and there are other clouds and other technologies, right, but I think what Microsoft have done is they've said, right, we'll build that fabric, that base, that foundation for you, and all of this functionality. But then they thought, what's the highest barrier to entry? Oh, somebody, you know, learning our technology, right? So they went with NoSQL, but kind of made it similar to Azure SQL. It's not the same thing, but there's some of the same DNA and elements there. And there's, what, four different open source database management systems, some of them relational as well, like Postgres. So just having those four covers a massive amount of the overall market, right? So if you've got your site reliability engineers and infrastructure engineers going, we need to move to a managed platform, and they're talking to their application development teams going, we want to use X technology, it's not a fight at that point of, oh, we've got to reskill and we've got to learn all these new technologies. If you're using Mongo or using Postgres, in theory, and big asterisks, right, because this is just a podcast you're listening to and we don't know anything about your application, but in theory you're connecting to something that's very similar to what you do today. So it's not a reimagining of your toolkit and all of those things. I think that's really smart. But they have built their own thing, NoSQL, and also Table as well. So if you're building a completely new app and you're happy with vendor lock-in, with locking into Cosmos, because Azure is your world and you're not going anywhere else, they've got you covered there from a completely new, I say new, but a more modern approach. But then you've also got these other scenarios. If you use Postgres, there's no vendor lock-in. You can come and use Cosmos and you can get the benefit from it. And I know this is an Azure focused podcast, but if you don't like it, there's absolutely no lock-in for you. So they bring the platform, you bring your IP and your smarts, combine them together, and they'll happily take a fee every month from you for giving you that flexibility.
And you're right as well around the migration to it. Because if it was just NoSQL and you want that scalability, that reliability, you'd have to re-engineer or remodel your data to fit into JSON, et cetera, or your tables, if that's all that was available. Whereas with this, like you said, it's maybe a slight API change, because it may be a little bit different, but accessing your data, the format of it and your data model stay the same; you just plow it into Cosmos DB in your MongoDB database.
Yeah, exactly. Okay, so we kind of talked about scaling and things like that, about how it can do it, but how does it actually handle it?
Okay, so it's probably worth talking about throughput and how that's handled as a starting point. This does depend on the APIs that you're using, so I'll talk about both, because, kind of like databases, there are two ways that throughput and resources are allocated. Let's talk about vCore because that's probably the most simplistic to start off with: you effectively provision virtual cores and you can scale those virtual cores up and down to meet your requirements. The other avenue is what's called Request Units, or RUs. And that's really where you define, it's very similar to DTUs in the database world; if you're using Azure SQL, you'll probably know about DTUs. But effectively a Request Unit is a representation of the resource required to do various operations. So I'll give you an example. A read may be just one RU. If you do an insert, that might be two, because you're inserting and then you're reading something back, returning something back; an upsert, the same; a delete, maybe; and then a query. So if you're querying the data in your data store, it could be a variable number of Request Units, because you might bring back multiple results in one go, you might be modifying the data in some way. And effectively you can work in a number of different ways. You can provision your throughput, if that's what you effectively want to do. So you can do it in increments of 100 RUs and you can effectively just keep notching that up and just have it at a flat scale. So if you know that your application uses 100 RUs and it never changes, let's say you've got some internal, I don't know, batch processing, something that's just consistent. In my opinion, I've never seen that before, so it's not a thing that I've ever had to deal with. Any traffic or throughput that I've seen has always been highly variable, so I've never had the luxury of doing that. But you can effectively set the amount of RUs. And then there's the serverless mode. You don't assign any throughput and effectively it just scales up and down to meet the demand that's asked of it. Serverless is great for that flexibility of scaling up, scaling down rapidly. But a word of caution there: when you have, quote, infinite scalability, you'll meet any demand that is asked of you, if that makes sense, right? I'm thinking denial of service attacks, X, Y and Z, things like that. And then there's also an autoscaling mode as well, so you can sort of get a blend of both of those and just notch up and down as you go at different times of the day and bits like that. Some things you need to think about with RUs is sort of the items that you're querying. Like, the size of the items increases the number of RUs that are consumed, whether the items are indexed; there's a whole host of different metrics that go into the calculation of how many RUs are consumed. RUs are also provisioned on a per-region basis. So what you can do in Cosmos is you can have two regions that mirror each other, effectively an active-active failover, or you could have a read-write replica. There's different ways of handling that, but you can effectively provision, it's not IOPS, sorry, RUs in those different regions. I won't go into it in depth, but you can define your consistency model, like how multiple nodes become eventually consistent.
You write to one node, and how that then propagates over to another node, depending on how you have that set up, can increase or decrease the amount of RUs that are required. I'm not going to go into it because this is just a podcast. There is also a free tier. So everybody, I believe, gets the first 1,000 RUs per second. Yeah, that's probably worth saying: RUs per second is the metric. So you get your first thousand RUs per second and 25GB of storage for free. And that is enough, because I think the minimum provisioned throughput for a container of data is 400. Don't quote me on that, but I believe it's 400. Yeah, I've seen 200, but I'm sure it's 400. So it basically gets you at least two containers, and the wording might be different depending on the API, et cetera. So you basically get a really good start, especially if you're using, like, NoSQL, because you can store different types of data in a single container, and there's many different things going on there. But yeah, you're going to get that. And I believe it's per subscription, I think, if I remember rightly.
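Because Request Units are the currency for both throughput and billing, it can be useful to see what individual operations actually cost; the SDKs report this as a request charge on each response. Here's a minimal sketch with @azure/cosmos, reusing the hypothetical appdb/posts names from the earlier sketches. The exact charges depend on item size, indexing and query shape, so treat the numbers it prints as examples only.

```typescript
import { CosmosClient } from "@azure/cosmos";

async function showRequestCharges(): Promise<void> {
  const client = new CosmosClient({
    endpoint: "https://<your-account>.documents.azure.com:443/",
    key: process.env.COSMOS_KEY ?? "<primary-key>",
  });
  const container = client.database("appdb").container("posts");

  // A write: the response reports how many RUs the operation consumed.
  const created = await container.items.create({
    id: "post-002",
    postId: "post-002",
    author: "alan",
    body: "Testing request charges",
  });
  console.log(`create consumed ${created.requestCharge} RUs`);

  // A point read by id plus partition key is typically the cheapest operation.
  const read = await container.item("post-002", "post-002").read();
  console.log(`point read consumed ${read.requestCharge} RUs`);

  // A query's charge varies with the data scanned and returned.
  const queried = await container.items
    .query("SELECT * FROM c WHERE c.author = 'alan'")
    .fetchAll();
  console.log(`query consumed ${queried.requestCharge} RUs for ${queried.resources.length} items`);
}

showRequestCharges().catch(console.error);
```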
I was just about to ask. I think it's per subscription, so it's not like you can go free everywhere kind of thing. It is per subscription.
Yeah, otherwise everybody would just have one container and then have, like, 1,000 accounts, right? And what you can also do is limit the total amount of throughput for a whole account, and you can set it at the per-database and container level. So you can be very granular about which databases and which containers of data have the amount of resource they do, and you can set a maximum limit as well. There is a calculation to convert vCores to RUs. I can't say that I've ever used it, because everything I've ever really done in Cosmos DB has been quite low throughput. I've never got to thousands or even tens of thousands of requests per second; that's just my experience with Cosmos, I've just never got to that. That's really how throughput works, and that's really all you tweak in the portal. You basically say where you want your throughput and provisioned resource to be. Maybe you want it in one place, maybe you want it in two places. You tell it how it's consistent and how they talk to each other, and then you don't do anything. You wait for the portal to create your resources and then you can start building on it. Everything under the hood is completely handled for you. What's quite interesting about that is, for NoSQL databases, horizontal scaling and replication is built into a lot of those technologies, because they're more modern technologies. But I think what is particularly impressive is Microsoft's wrapping of Postgres and relational databases and making them globally distributed, because that is not a simple task. In traditional relational databases there's a thing called ACID guarantees, don't quote me on the acronym, but it basically guarantees that your transactions and your writes and reads from the database won't clash with other people's as you go to effectively update and read data. So having ACID compliance globally is a very complicated thing, in my opinion. So the fact that you can just provision it and get cracking is really powerful.
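To illustrate the "you provision it and then just build on it" point, here's a sketch of creating a database and a container with a fixed amount of standard provisioned throughput from code, again with @azure/cosmos and made-up names; 400 RU/s is the minimum figure discussed above.

```typescript
import { CosmosClient } from "@azure/cosmos";

async function provision(): Promise<void> {
  const client = new CosmosClient({
    endpoint: "https://<your-account>.documents.azure.com:443/",
    key: process.env.COSMOS_KEY ?? "<primary-key>",
  });

  // Create the database if it doesn't already exist.
  const { database } = await client.databases.createIfNotExists({ id: "appdb" });

  // Create a container with a partition key and 400 RU/s of standard
  // provisioned throughput (the minimum discussed in the episode).
  const { container } = await database.containers.createIfNotExists(
    { id: "posts", partitionKey: { paths: ["/postId"] } },
    { offerThroughput: 400 }
  );

  console.log(`container ready: ${container.id}`);
}

provision().catch(console.error);
```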
Wow, okay. And I think, if I remember rightly, to get that SLA, the five nines, you have to have it in at least two different regions, isn't it? I think that's what you need, and then it's that across scaling out. So I think a single region isn't too bad either, is it? It's not the worst.
Yeah, I don't know. Is it four nines? I think it's four. I can't remember, it's either three and a half nines or four nines. I can't remember what it is. But I believe you have to have a secondary write region, I believe, not just a read replica, if I remember rightly. But I must admit, you know that Word document that has the SLAs in it, I can't remember the name of it, but every single time, if you search for Azure SLAs, there's SLA Charts.com or something like that, which is a really great graphical interface for it. But there's also this Word document which is like the worldwide SLA doc, and I have to download it like twice a week and it doesn't update that often. I just don't keep a copy of it, I just go to the website again. But that's got a really good breakdown of what you need to get the different levels and it does explain it quite well. I'm sure the docs do as well, but I just go to the actual SLA docs to get that information.
Okay, cool. So we know how to scale and the units kind of thing. So how is it billed? How much does it potentially cost?
Okay, so, like Cosmos itself, there are quite a few options here. What you can do is you can use Request Units for NoSQL, Mongo, Cassandra, Gremlin and Table, and you can do vCore for Postgres and MongoDB. So Postgres is only vCore, and as we know about vCore in other areas, you're going to have quite a high barrier to entry with vCore. It's probably worth just calling out right now, what's the best way of me explaining this on a podcast? I'm not going to go into the detail, I don't think, because it's a matrix of options here. Let me go through the high-level areas that you need to think about. You need to think about whether you're going to autoscale your provisioned throughput, where you let it ratchet up and down but have provisioned sort of levels; whether you're just going to have standard provisioned throughput, like just, say, 1,000 RUs per second and it's not changing; whether you're going to go serverless and not have to worry about any of that; whether you're Postgres and you're paying for vCore per node; and also whether you're MongoDB and you're paying vCore per node. But I'll just give you some quoted figures for the standard provisioned throughput, because that's relatively easy to understand. The price per 100 RUs per second in a single region is $5.84 a month. And I think we said that you had to have a minimum of 400, right, to have provisioned throughput, so you're looking at about 20-ish, maybe $25 a month, something like that. But what's good is when you start to look at serverless provisioning, because when you go to serverless you effectively pay $0.30 per 1 million RUs that you consume. So there's no minimum limit, there's just what you consume. But my assumption would be, because I've never hit it at the high levels, that you would be paying more for serverless, for the flexibility, at high throughputs, like we see in all other serverless technologies; when you hammer Function Apps or Logic Apps, it is more expensive. But the massive benefit is that at lower scales you have a much lower barrier to entry, effectively. And there's also a cost per gigabyte. And I think it's worth saying that all of these prices are per region or per node. If you move into that scenario that Alan talked about, having multiple nodes to get your high availability, you're going to be paying this in each region. There's also autoscaling provisioned throughput, and this is quite a nice way to go as well, because what you can basically do is, I think the minimum that you've got to assign is 1,000 RUs, but it can scale down to 10% of that. So it can scale down to 100 RUs, which is lower than the 400 minimum of the standard provisioned throughput. So if you provision 1,000 RUs, in a single region that's going to cost you around $60 a month. But if you're not using 1,000 RUs, it will autoscale down to 100 RUs and it will manage that for you, and you'll only be paying about $6 a month for it, but it will also give you the flexibility to scale up at any time.
So that sounds like you have your sort of max, what you think your peak throughput is going to be, and then all you do is, when it's low, you pay in effect 10% of it at its lowest point. Exactly. If you had 2,000 RUs, then you'd pay a minimum of 200 RUs, sort of thing. Yeah.
Because you can have as many containers and accounts as you want and they're all, like, $6 a month, but you've also got the ability to scale up, right? So that's a great starting point. We've just talked about individual regions, but there's also a difference in price in terms of how you set them up. So for instance, if you have a single region, right, but with data distributed across multiple regions, you're paying that price per region. But it's probably worth calling out, if you have multi-region write, which used to be called multi-master, where you're potentially writing to get better throughput globally, you can allow people to write from different nodes, basically, and it'll sort everything out and make it consistent. You're paying twice as much per RU to have it in write mode, but for all regions. So if you want the flexibility and you want five nines, you're paying twice as much per RU. So instead of, for a 100 RU scale-down, you're not paying $6 a month, you're paying $12 a month, but you are getting a globally distributed system that has multi-write support and five nines of uptime. So it seems like quite a low barrier to entry. If you're the type of organization that needs that scalability and needs that availability, that is not a high barrier to entry at all for those types of organization. And just quickly to talk about vCore per node: it is different between MongoDB and Postgres, just because the resources are different, but for Postgres, as an example, a single vCore that's burstable with 2GB of memory is $17 per month. That's a burstable instance; it does jump up quite a lot if you don't go for a burstable one. If you want, like, a straight two vCores with eight gig of RAM as an example, you're looking at $186 a month for Postgres, but compared to the cost of hosting it in a virtual machine, that isn't, you know, crazy. But that is single node, right? Multi-node requires you to have a bigger node size, so you have to go to four vCores per node, and you have to have a coordinator as well. And a multi-node setup starts at $460 per month. So if you do want that global ACID guarantee and that relational side of things, your barrier to entry is going to be a lot higher. But if you're the type of organization that needs that, again, building that sort of platform yourself is going to be expensive, and people cost a lot of money as well. So there's all of that to put into the mix.
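To pin down the arithmetic behind the figures in the last couple of answers, here's a small sketch that turns the prices quoted in the episode (roughly $5.84 per 100 RU/s per month for standard provisioned throughput in a single region, doubled per RU for multi-region write, and about $0.30 per million RUs for serverless) into monthly estimates. These are the figures as quoted on the show, so treat them as indicative rather than current list prices, and note that storage per gigabyte is billed separately.

```typescript
// Indicative prices as quoted in the episode; check the Azure pricing page
// for current figures. Storage (per GB) is billed separately.
const PRICE_PER_100_RUS_PER_MONTH = 5.84; // standard provisioned, per region
const SERVERLESS_PRICE_PER_MILLION_RUS = 0.3;

// Standard provisioned throughput: cost scales with RU/s and regions, and
// doubles per RU if multi-region write is enabled.
function provisionedMonthlyCost(
  rusPerSecond: number,
  regions: number,
  multiRegionWrite: boolean
): number {
  const writeMultiplier = multiRegionWrite ? 2 : 1;
  return (rusPerSecond / 100) * PRICE_PER_100_RUS_PER_MONTH * regions * writeMultiplier;
}

// Serverless: you only pay for the RUs you actually consume.
function serverlessMonthlyCost(millionRusConsumed: number): number {
  return millionRusConsumed * SERVERLESS_PRICE_PER_MILLION_RUS;
}

// 400 RU/s in one region: roughly the "20-ish, maybe $25 a month" mentioned.
console.log(provisionedMonthlyCost(400, 1, false).toFixed(2));  // ~23.36

// 1,000 RU/s in one region: roughly the "$60 a month" figure.
console.log(provisionedMonthlyCost(1000, 1, false).toFixed(2)); // ~58.40

// 1,000 RU/s, two regions, multi-region write: twice as much per RU, per region.
console.log(provisionedMonthlyCost(1000, 2, true).toFixed(2));  // ~233.60

// 10 million RUs consumed in a month on serverless.
console.log(serverlessMonthlyCost(10).toFixed(2));              // 3.00
```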
Okay, cool. Yeah, there's definitely a lot of options there, and like you said, some that are a low barrier to entry and some that are quite high once you start getting into the bigger stuff kind of thing. Okay, is there anything else that you think you might have missed, or are there any other sort of episodes that we've done previously that might sort of tie in?
I don't think there's anything else that I want to cover. My only sort of takeaway is go and have a look at it. It might seem scary. It was definitely scary to me when I started to look at it, because I was thinking, this is going to be a whole new thing and it's got all these buzzwords and it can do all these things and it's going to add a lot of complexity. But actually, when you peel it back, and I think you've been through this more recently, Alan, once you get into it, actually, there's not that much you can configure in Cosmos, is there? You build it and then you use it. It's as simple as that. It's quite boring.
Like you said, the capability looks really scary, but then when you get in there, it's like a few clicks and bang, I've got a NoSQL database and I can start sending stuff to it.
Yeah, exactly. Yeah, definitely. I don't have a previous episode call-out because we haven't done any database ones. The only one I thought of maybe was Chaos Studio, because if you're building these fault-tolerant, high-availability apps, you want to make sure that, touch wood, it doesn't happen, but if Cosmos DB does go down and the five nines of SLA that you've paid for isn't kicking in, you'll want Chaos Studio. And what Chaos Studio can do is test how fault tolerant your app is by injecting sort of fake infrastructure and platform related events, to introduce chaos but manage it in a controlled way. So, yeah, check out that episode, season four, episode eleven. Alan, your episode next week. What are you covering?
Yes, so I'm going to be covering Microsoft Entra External ID. So I think there's some enhanced features that have come in with this version, because previously this was called Azure AD B2C, business to customer. And I think with the sort of Entra rebrand and some of the other stuff that's come out recently, External ID has now had sort of a feature enrichment, and yeah, we'll just go through that. It's not something I use every day, but, yeah, we'll definitely dive into it and continue our tour of Microsoft Entra. Definitely.
Okay. Yeah, sounds great. I do love B2C, it's a great product. It's not called B2C anymore, I should use its new name, but yeah, it's good. It's going to be a great episode. Cool. Okay. So did you enjoy this episode? If so, please do consider leaving us a review on Apple or Spotify. This really helps us reach people just like you. If you have any specific feedback or suggestions on episodes, we have a link in our show notes to get in contact with us.
And if you've made it this far, thanks for listening and we'll catch you on the next one. Yeah. Thanks, all.