Snapshots

Michael

00:00

Hello and welcome to Postgres.FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard, and as usual, this is my co-host Nikolay, founder of Postgres.AI. Hey Nikolay, what are we talking about today?

Nikolay

00:10

Hi Michael. We are talking about snapshots and not only snapshots, but backups of large databases and RPO, RTO. More actually RTO. RTO means how much time it would take for us to recover from some disaster from backups. So, is it minutes, hours, days? For example, let's take some hypothetical database of 10 terabytes and consider what we would expect from good backup system in various managed Postgres situations and also self-managed in cloud maybe bare metal as well okay pardon dirt sounds

Michael

01:00

it's a nice it's a nice sound

Nikolay

01:02

it's spring I've got to close the door and they decided to have a nest again here. This is what they try to do every time.

Michael

01:11

So yeah, good topic. Why is it on your mind at the moment?

Nikolay

01:15

Well yeah, it's a good question. I just observe larger and larger databases and I also observe different situations of managed services recently and I see that it's really a big pain when you achieve some level of scale, maybe not 10 terabytes, maybe 20 or 50 terabytes. And at this level, it becomes a big pain to do various operations, including backup, I mean, recovery, first of all, backups as well.

01:47

We will talk about them as well, but also provisioning of new nodes, provisioning of replicas or clones. It becomes more and more difficult. And the only solution I see is snapshots. And we have 2 big, I mean, 2 most popular backup systems right now in Postgres ecosystem. They are pgBackRest and WAL-G, in my opinion. All others became less popular. This is my perception. I might be wrong. I have only anecdotal data to prove it.

02:23

But it's time for both of them to seriously think about cloud snapshots in cloud environments. Snapshots of When I say cloud snapshots, let's clarify, I mean snapshots of EBS volumes in AWS, of persistent disks, PD, SSD, or others, Hyperdisk in Google Cloud, and alternatives in other clouds. I have less experience with Azure, so let's cover only all 3, But again, I have much less experience with Azure. My main experience is with GCP and AWS.

03:01

So these snapshots, I think, must be used by those who create large database systems, maintain them, and especially if you are a managed Postgres provider, it's inevitable that you should rely on these snapshots of disk and restoration. And, of course, I think AWS RDS Postgres and Google Cloud SQL, they already do it. I suspect Azure also should do it, but we have many more managed Postgres providers, right?

Michael

03:39

So many now, yeah.

Nikolay

03:40

Right, and also we have some people who self-manage their Postgres clusters. And also we have, of course, many users who use RDS or Cloud SQL or Azure. And I think it's good to understand some details about how backups of large databases are done and the role of snapshots in those backups.

Michael

04:03

So yeah, so I understand the attraction for huge, huge databases in terms of recovery time, but are there also benefits for smaller databases as well? Or is it, are we just fine just recovering from a normal backup like a pgBackRest style or WAL-G as you say I know their speed advantages or smaller sizes as well

Nikolay

04:29

that's a good question well yeah let's let's just let me walk us through the steps of backup and recovery.

Michael

04:41

Good idea.

Nikolay

04:42

With WAL-G or pgBackRest, Let's forget about Delta backups. Let's consider like the basic the most fundamental Process. So first of all, we need to take a copy of data directory Bring it to archive to backup archive. It's usually in object store like AWS S3 or Google Cloud, GCS. Right. And in Azure, it's blob storage or how they call it. And this copy of data directory is not consistent by default. And it's okay. We can have inconsistent copy.

05:25

We can copy data directory manually with rsync, with cp. It doesn't matter. It will take time to copy. And usually roughly like 4 terabytes, it's roughly like 1 hour or half an hour. If it's faster, you're lucky, you have good infrastructure. If it's slower than a 1 hour, something should be tuned.

05:46

But if you have 10 terabytes, so we're ready to talk about 5 to 10 hours If you have 50 terabytes, it's like more than 1 day right to call and It's inconsistent, but it's fine because we do it between wrapping this process up with 2 calls, pg_start_backup and pg_stop_backup. These 2 important calls tell Postgres that it should keep WALs that will be used in recovery. It's understood that the directory copy is not consistent, but we have enough WALs to reach consistency point.

06:26

So when we will recover this backup, full backup, it will be copy back to some normal disk, to another machine, right? And enough WALs and replace some WALs to reach consistency point. And here's some problem.

06:46

When you recover, sometimes you think, oh, how long it will take to reach consistency point and I have a recipe in my how to set of how to articles how to understand current position in terms of LSN and how long it is left to reach consistency point because while those WALs are applied Any connection attempt will be rejected right and we need Postgres needs to reach consistency point to open the gates and accept connections. And I think there was some work to improve logging of this process.

07:28

And maybe It will happen in Postgres 18. I saw some discussions and some patches, I think. But before that, while we still don't have transparent logging of that process, I have some recipes, so you can use them to be less anxious, because sometimes many, many minutes you wait and don't understand and it's not it's not fun right and that's it this is full backup but it's only part 1 because it will reach consistency point and it will represent backup for some time in the past.

08:07

But we also need to have to reach the latest state of database. And for that, we need a second part of backups of archives is WAL archive. Write-ahead log files, by default they are 16 MB, so on RDS they are tuned to 64 MB. Most clusters are default 16 MB, so this small file is also compressed in 1 way or another. So it takes not 16, but say 7 or 10 or 5 MB, depending on the nature of data inside it. And then when we recover, we reach the consistency point, and we start replaying WALs.

08:56

Postgres starts replaying WALs according to restore command configuration, which just tells fetch next WAL, and we will apply it, right? Postgres will apply it. And there's prefetching, because, of course, object storage is good with parallelization. It's not a good idea to use just a single process to fetch a file sequentially.

09:21

We can ask to fetch 10 files in parallel to move faster, because in a heavily loaded system sometimes we see the source, the production, can generate many, many WALs per second. And recovery, it can be like dozens of WALs per second, depending on hardware and so on. So it's worth fetching very fast. And this process can take a long, of course, like after we did full backup, many WALs can be archived until the latest point. And you can restore to the latest point or to some point in time.

09:56

It's called point-in-time recovery, right? You can configure, I need this timestamp or this LSN, and you need to wait. So, of course, it's good if you have quite frequent full backups. It means that WAL replay, this additional WAL replay, because there is WAL replay to reach consistency point, but there is additional WAL replay to reach the latest or desired point in time. It will be much less. But usually, usually like to replay whole day of WAL, it takes not a whole day, definitely.

10:29

It takes like maybe like an hour or 2 hours, it depends a lot on a particular case. But usually it's much faster than to generate those WALs. And to replay 7 days of WALs, Well, it might take up to 1 day, maybe less, it depends again. So people need to configure the frequency of full backups and understand that up to the whole period between 2 backups, we might need to replay this amount of WALs. Right. This affects recovery time as well. RTO, recovery time objective.

11:12

Right. And To improve that situation, both pgBackRest and WAL-G support Delta backups. Instead of copying whole 10 terabytes or 50 terabytes to GCS or S3, they copy only delta compared to previous backup. And then recovery looks like restore full backup and apply delta backups. It's just some diffs of file basically, which is supposed to be faster than WAL replay.

11:46

And also much cheaper because you don't need to keep, for example, for 7 days, you don't need to keep 7 days of full backups, but maybe you keep only 1 full backup plus deltas only. It's already optimization of budget and also optimization of our RTO, because replaying WAL is slower than applying those deltas. However, we still need to start and have first full backup fetched and applied. And it's definitely slower than if we had this full backup made recently, so we just apply this and so on.

12:27

Anyway, if you use pgBackRest or WAL-G, they basically ignore the existence of another alternative. Instead of copying data directly to object storage, you can use cloud-native capabilities and snapshot disk. Snapshotting disk is also bringing data to object storage in the background, but it's on the shoulders of cloud provider, AWS or GCP or Azure. And they have this automation, they use it in many processes, and these days it's already quite reliable in most cases.

13:08

Of course, this is, by the way, an objection I hear. Like copy whole directory to object storage, it's reliable. What will happen with those snapshots, who knows? Well, yes, okay. But a lot of mission-critical systems already use them for many years, right? And it's also copied to object storage but done in the background. And it's done... It's like... It's basically... You create a snapshot, it takes only minutes for a large volume, like 10 terabytes, 50 terabytes.

13:46

And it also is like, there's also a concept of delta snapshots. Like it's, it's incremental. So there's a 1 snapshot and then additional snapshots are done faster and take less space and object storage. So you pay, you pay less. And again, this is a responsibility of a cloud provider like AWS or GCP. It works. Then you can recover, restore, snapshot, restore takes only a few minutes. This is the magic.

14:17

And restore takes only a few minutes, even for like 50 terabytes, it can take like 10-15 minutes. Because I'm speaking about GCP and AWS here, There is a concept of lazy load. So it looks like disk already exists and data is already there, but in background it's still pooling data from object storage. And that's why it's good, recovery is fast, but It's slow. I mean, you try to SELECT from table, but it's slow.

14:48

And this concept is very well known for people who use RDS, for example, or work with EBS volume snapshots. It might be bad in some cases, and there is trade-off. You may say, okay, I have a recovery time of a few minutes only. RTO is perfect in our case, but we will be slow originally. We will need to warm up the disks. Or you can say, OK, recovery time RTO is bad, but we are fast from the very beginning. Well, we need to warm up buffer pool and page cache anyway, but it's a different story.

15:27

It usually takes only minutes, not hours definitely. So that's the story about snapshots. And I think that I just observe people have tendency to like snapshots and prefer them both for DR and just copy, cloning of nodes for purposes of creating replicas, standbys, or forking, anything. Once they reach level like 10-20 terabytes. Before that, it's not that obvious that snapshots are beneficial. Because, okay, we can wait a few hours.

16:10

If you have 5 terabytes, okay, we can wait 5 hours or 2 and a half hours for backup. We can wait to recover to a couple of hours, not big deal. But once we have 10, 20, 30, 50 terabytes, it already exceeds 1 day. It exceeds 8 hours of work, working day, right? So it already becomes a two-day job for someone. Yeah. To control this process, right? This is, I think, psychological threshold here.

Michael

16:44

But there are, I'm just thinking there are a couple of other use cases where that time does matter like if you decide that you don't want to go down the logical replication route for major version upgrades for example and you're willing to take a certain amount of downtime one of the steps is to clone your database right do the upgrade on the clone and then switch over as a as on a cloud like a lot of cloud providers that would be quicker or less time

Nikolay

17:12

why do you need a clone

Michael

17:15

I think if you do it in place then you're down the whole time. Whereas if you do it on a clone and then transfer any changes across that customers have made in the meantime, you might be able to minimize downtime.

Nikolay

17:26

How do you transfer changes?

Michael

17:28

Script. Let's say it's not super busy.

Nikolay

17:33

It's an interesting approach. Yeah. Sounds like reinventing the wheel of logical replication so you need if you have complex system how will you understand the diff the changes I'm

Michael

17:48

only thinking I'm thinking mostly about quiet systems where it's unlikely that there are that many changes or maybe 0. Like if you're doing a quiet time and it's like an internal system.

Nikolay

17:59

Well, for me there is in place, there is in place upgrade and there is 0 downtime involved. And logical has variations, actually. But what you propose, it's kind of like... It breaks my brain a little bit.

Michael

18:15

Well, all I'm thinking is, I can imagine cases where Being able to do take a clone in a minute or 2 Instead of it being 10 or 20 minutes is actually quite attractive for the same reason you mentioned about 8 hours It's a different I would stop doing another time. If something's gonna take 20 minutes, I'd move on to a different task. But if it's only gonna take 1 or 2 minutes, I might only check my emails, you know, like there's a very different

Nikolay

18:40

It might be you need some to do some research on the fork on clone I call it clone some people call it fork you you need to verify for example you want to verify how long will it take to upgrade, in place upgrade. For testing, it's good to have identical copy of the same machine and just run all the detail like and see the log and how else can you check it right and for if forking takes a day

Michael

19:12

well

Nikolay

19:13

and it's frustrating yeah it's frustrating But if it takes only dozens of minutes, it's good. The only catch here again, there will be lazy load. Lazy load, right? So you need to understand like you okay, you have a fork it pretends it's up and running. It's up and running, but it's slow. Does that

Michael

19:35

invalidate the test then?

Nikolay

19:37

Yeah, some tests will be invalidated for sure. Maybe not upgrade. For upgrade, what I would do for test of major upgrade in place with hard links, I would just in this case warm up only the system catalog part. Because this is what pg_upgrade does, it dumps and restores system catalogs. Then I would ignore the remainder of data knowing that it will be handled by hard links. So this is my warm up. It's super fast. So this is a perfect example then. But we need to understand details.

20:18

So we forked during 10-20 minutes, I mean snapshot, we restore snapshot, we have up and running Postgres, we dump schema couple of times, well 1 time is enough actually. And then we run pg_upgrade -k or --links, and see what happens, how long it takes for us. And We are not worried about data actually still being in GCS or S3 or blob storage. We know it doesn't matter in this particular case.

20:56

But if it's something, for example, you want to check how long it will take to run pg_repack or VACUUM FULL or pg_squeeze on a large table without additional traffic. It's also useful data. You know that additional traffic will affect timing in a negative way, but you want to measure at least in an empty space.

Michael

21:18

Or even adding a new index. Like,

Nikolay

21:21

how long would that take? Yeah, In this case, the fact that we have a lazy load problem with Snapshot Restore, well, it will affect your experiment for sure. You will have bad timing in production, you will have good timing, and you will regret you did experiment at all. You will stop trusting those experiments if you don't understand lazy load, this problem. If you understand lazy load, you first warm up particular table, just doing, I don't know, SELECT everything from it, right?

21:52

And then create index and measure, right?

Michael

21:56

So yeah, just to get an idea, how much slower are we talking? Is there some rough... How much slow? Let's say you SELECT *, how much slower would you expect it to be while it's lazy loading still versus on production?

Nikolay

22:11

Oh, it's a tough question. I don't have good examples, recent examples. First time I heard about Lazy Load was 2015 or something. This was the first time I was working with RDS. This is when importance of working with clones started to form in my mind.

22:32

I was super impressed, like how fast we can clone a terabyte Database and start working and experimenting with it but I was upset why it's slow so I reached out to support and they explained lazy load and their articles by the way AWS explains this quite well lazy load. There is explanation in various levels, I think, both at EBS, so basic level EBS volumes, and also RDS. But I must admit, Google Cloud doesn't do a good job here in terms of documenting this. I know it exists.

23:06

I know they admit it exists, but it's not well documented. There's a good opportunity to catch up here in terms of documentation and understanding. As for a question of duration, I don't know. Definitely it can be frustrating. SELECT, for example, count star from a table, if it's a sequential scan, it will have the same effect for warming up, right? It will not spam your output.

23:33

Well, there is a different way not to spam your output is just SELECT from table without specifying columns and maybe also somehow so that the race

Michael

23:44

EXPLAIN ANALYZE

Nikolay

23:47

yeah by the way yes yes yes so, so I your timing will be really high in that EXPLAIN ANALYZE if it's enabled I hope it's enabled so I think if like SELECT count star is super slow in Postgres, we know it, because it's raw storage and all the things, MVCC and so on. But here it can be order of magnitude slower.

Michael

24:14

Makes sense.

Nikolay

24:15

Maybe 2 orders of magnitude. I don't have numbers here in my head.

Michael

24:19

Yeah, I was just trying to understand if it was 10% or double or an order of magnitude. No, no, no. It's really

Nikolay

24:24

fetching data from object storage in the ground. So it can be frustrating, this is I can tell you. But you can, again, you can, if you aim to warm up a node just restored from a snapshot, a Postgres node, you can again benefit from the fact that object storage is good in terms of parallelization of operations. You can warm it up, not just single connection to Postgres. You can run 10 or 20 parallel warming up queries. And This is a recommendation on how to warm up because of lazy load.

25:05

By the way, AWS also offers a few years ago, they implemented it. I think this is called Fast Restore, Fast Recovery, something. Fast snapshot recovery, FSR or something. I might be mistaken. But they, if my memory is right, they support up to 50 snapshots marked as available for fast restore in the region, I think, for a particular account. So you can mark some snapshots and lazy load problem will disappear for those snapshots. This is a good thing to have.

25:41

And I'm not aware of similar thing for Google Cloud or Azure. But this makes, like, once per, I don't know, week, you have a good snapshot which you can recover very fast and then replay WALs. But you you have super fast recovery for experiments. It's already super cool super cool Right.

Michael

26:06

What about for the actual record like the RTO discussion? What are you seeing people with dozens of terabytes opt for in terms of that trade-off? Are they happy to accept things will be slower, it's kind of like a brownout situation for a while, but they're back online at least? Or do they prefer longer RTO but everything works exactly as expected once it is back online?

Nikolay

26:30

Well I think nobody is happy when recovery takes 10, 20, 30 hours. Nobody. Of course. Right? So this becomes a problem and the characteristics of the system become very bad. I mean, this is not... It becomes not acceptable for business also to understand that we will be down a whole day or 2 Just to recover from backups But before business realizes it operation engineers realize because it's really hard to experiment.

Michael

27:03

So, well, and the cases where this happens are super rare, right? Like we're talking, we're not talking about failover, like with all of these setups would generally be like high availability anyway, right? So we're talking only a very specific class of recovery where you have to recover from backups, like there's no failover situation, which I'm guessing is pretty rare. People have to think about it, but they don't face it very often.

Nikolay

27:28

10 plus years Cloud environments were super unreliable and failovers happened very often. Right now, again, at least for Google Cloud AWS, I see nodes can be running for years without disruption. So it's much better right now in terms of lower risks. Well, again, we have a habit to consider cloud environments as not super reliable. And for example, It also depends on particular instance classes and the sizes, I think.

28:06

If you have small, very popular instance family, very small, migration might happen often. For example, we were down because we are on GCP, I mean Postgres.AI AI. We were down because Kubernetes cluster was migrated again, like all nodes were migrated. And then we had a problem, let me admit.

28:29

We had a problem with backups because after reconfiguration half a year ago another dot history file for some timeline in future which was in the past but for us currently was in future was left in pg_wal directory sub-directory And after such migration caused by migration of VMs in Kubernetes, we couldn't start. Because promotion happened, timeline changed, and there was basically a rogue file from previous life corresponding to the same timeline number. It was a nasty incident.

29:08

I think we were down like an hour until we figured out, unfortunately, misconfiguration. Yeah, it's a bad problem. But it was good that it happened only on Saturday, so not many users noticed for our systems. And I must thank my team for a couple of folks who jumped very quickly to Zoom and helped to diagnose and fix. So yeah, back to questions. In my opinion, again, snapshots are inevitable to use if you reach dozens of terabytes scale. And again, There are several levels of trust here.

29:47

First level, are they consistent? Answer is we don't care because we have pg_start_backup, pg_stop_backup. Second question, are they reliable and what about possible corruption and so on? Answer is test them and test backups. You must test backups anyway, because otherwise it's a Schrodinger backup system. So we don't know if it will work. We must test. And once enough number of tests show positive results, trust will be built. Right? And I have several...

30:23

You know, in the beginning of the year I announced that every managed Postgres provider must consider, or highly recommend, to consider pg_wait_sampling and pg_stat_kcache as very important pieces of observability to extend what we have with pg_stat_statements. Even if you are RDS and have performance insights, still consider adding pg_wait_sampling because this will add very flexible tooling for your customers. Now I say for backups of course these Big guys like RDS already use snapshots 100%.

31:04

We know it indirectly from the suffer of lazy load of myself in 2015 or 2016. And now all competitors of RDS or alternatives to RDS, please consider snapshots, right? Because I know you don't yet so you you it's it's worth because your customers grow and once they reach 10 20 30 40 50 terabytes they will go to RDS or self-managed and stop paying 100% because it's a big pain. We just described this pain. And now those who self-manage also consider snapshots, right?

31:46

But most importantly, developers of WAL-G and pgBackRest, consider supporting snapshots, cloud disk snapshots, as an alternative, as an official alternative to full backups and delta backups you already have. In this case it would become native. And yeah, so I think it's worth doing this. Maybe other types of snapshots. So like external external tool for snapshotting, If it exists in infrastructure and natively supported by a backup tool, I think it's a good idea.

32:21

So a backup tool would perform all the orchestration and skip some of full backups. It can be a mix. Ideal backup system for me would be a mix of occasional full backups, continuous WAL stream, and snapshots as the main tool for provision nodes and the main strategy for DR. But not the only 1. So full backups would exist separately, occasionally again, like I would have full backups.

32:49

By the way, both snapshots and backups, full backups must be done, and I see it on some systems, they do it on the primary. It must be done on replicas. They must be done on replicas because it's a big stress for primary. Even snapshot is a big stress for primary, as I learned. So, yeah. It's good to do it on a replica. And pg_start_backup, pg_stop_backup, they support so-called non-exclusive backups, so it can be done on replicas, on physical replicas.

33:21

And even if they are lagging for some time, it doesn't matter because even lag for some minutes nobody will notice because we have continuous WAL stream and replaying few minutes of WAL, it's very fast.

Michael

33:36

Yeah, I saw in RDS's documentation that for multi-availability zone deployments, they take the backups from the standby.

Nikolay

33:45

That's absolutely all, like, this is what should happen. Yeah. Yeah and snapshots also should be done from there I saw well, maybe it was specific cases, but I saw some cases when snapshot of disk of primary affected performance Because in background data is being sent to object storage and it's definitely generating some disk, read disk IO for our disks.

Michael

34:16

So I'm curious though, why have, you mentioned in an ideal world you have a mix, what's the benefit of having both? Like once you move to snapshots, what's the benefit of also having traditional pgBackRest style backups as well?

Nikolay

34:34

It's a good question. I remember reading Google Cloud documentation many years ago and it said, when we create snapshot of disk, this snapshot is not guaranteed to be consistent when if application is running and writing and then suddenly the sentence was removed from Google Cloud documentation It's it's a matter of trust, right? So Now I trust Google Cloud snapshot Snapshots and a BS volume snapshots in the edible. Yes quite a lot.

35:03

I have trust but still some part of me is Like It's paranoid, you know, so so I would keep full backups not very frequent. I would keep them just Just to be on safe safe side But maybe over time trust will be complete like if you have a lot of data proving snapshot creation and recovery is very reliable. Well, consistency, by the way, doesn't matter.

35:35

The problem I mentioned in documentation, It's not about Snapshots are consistent or no consistent or no Honestly, I don't know even I don't care because I have to just start back up and stop back up and I'm protected from inconsistent state Postgres will take care of it But it's a matter of trust like lack of transparency in documentation, understanding what's happening and so on.

35:57

So only obtaining experience over time can bring some trust and understanding the system works fine and Regular testing testing should be automated recovery testing of backups and and so on so now I think It doesn't matter if they are if they are persistent Consistent or no It matters if they work, right? If they If we can recover and then reach consistency point by our means, I mean, by what Postgres has, And that's it. But still, what if I see snapshots created, but I cannot use them?

36:39

And also bugs happen, people change systems. I mean, cloud providers also are being developed all the time and new types of disks and other changes are introduced all the time. So what if snapshots will not work? If I have very reliable testing process, if I test maybe, for example, all snapshots, I restore from them automatically all the time and I see that over the last few months Every day we created snapshot or maybe every few hours.

37:10

We created snapshot and we tested them all Maybe at some point I will say okay Full backups by a pgBackRest or WAL-G are not needed anymore. But for now, my mind is in a state of paranoia.

Michael

37:26

Very reasonable. Sounds good?

Nikolay

37:28

Yeah. 1 interesting thing here is that if you use snapshots, there is a catch for large databases. EBS volumes and persistent disks in Google Cloud, they are limited by 64 terabytes. I think some types of EBS volumes are limited by 128 terabytes, if I'm not mistaken, so the limit is increased. But 64 terabytes, if you experience problems being at level of 10, 20, 30, 50 terabytes, 64 terabytes is not far already.

Michael

38:04

Yeah, that's true.

Nikolay

38:05

And then the question will be what to do, because if you use regular backup, full backup by means of pgBackRest and WAL-G. You can have multiple disks and again, it doesn't matter. Thanks to pgBackRest, pg_start_backup, pg_stop_backup, You can copy data from multiple disks, right? It can be a LVM and multiple disks organized to give you more disk space, right?

Michael

38:43

Like table spaces?

Nikolay

38:46

No, no, no, not table space. Table space is a different thing, it's a Postgres level. If you need to have, you can combine multiple disks using LVM2. And have 1 big, basically, logical volume for Postgres.

39:07

And for regular backups, when you copy data, pgBackRest or WAL-G copies data to object storage, it doesn't matter that the underlying many disks contribute to build this big logical volume because we have only files up to 1 gigabyte in Postgres so each huge table it's shrink to 1 gigabyte files so they are copied compressed and copied by our backup tool. But if you start using snapshots, it's interesting.

Michael

39:38

Well, also people at that scale, people are thinking of the companies we've talked to at that, we had a good hundredth episode, didn't we? And a lot of them are considering sharding by that point anyway. So I guess in some cases that...

Nikolay

39:53

Or deleting data, right? That's my favorite advice. Just delete something. And keep sleep well. Yeah. But Well, so the problem is that it's not a problem, again, it's not a problem that you will create multiple snapshots. If those snapshots are created between pg_start_backup and pg_stop_backup, you will be able to recover from them. The question will be only, will LVM survive this?

40:18

And this is question not about Postgres, it's a question about the cloud provider and how like snapshot orchestration. And this topic already goes beyond my current understanding of things here. I can just suggest, I can recommend, okay, use snapshots, but if you reach a limit and you need to start using LVM, it's a different story. So you need to check how LVM will survive, snapshot restore,

Michael

40:45

and test.

Nikolay

40:48

I think it's definitely solvable because like RDS did solve it, right? So

Michael

40:55

yeah, good point.

Nikolay

40:57

Right.

Michael

40:57

Well, that's actually that's a really good probably place to... I don't know how you're doing for time, but it makes me think about the likes of Aurora or people that are doing like innovations at the storage layer where this kind of problem just doesn't doesn't exist anymore, you know, they've already got multiple, well I guess we do need to still have a disaster recovery, you know, whatever, all of Aurora's down. We need a way of recovering.

41:27

But they are promising to handle that for us in this distributed world, right? That's part of the promise of keeping copies at the storage layer.

Nikolay

41:38

Yeah, well, it's definitely new approaches where data is stored on object storage originally, not only backups, but it becomes the primary storage originally, and then there is a bigger orchestration to bring data closer to compute, But it's all hidden from user. It's interesting approach. But there are also doubts this approach will win in terms of the market share. Right, so it's interesting. Definitely it's where I don't know many things.

42:18

Maybe our listeners will share some interesting ideas and we can explore them. And maybe we should invite someone who can talk about this more.

Michael

42:28

Yeah, sounds good.

Nikolay

42:29

Yeah, yeah. But anyway, I think snapshots are great. I think they can be great for smaller databases as well. Like, small, I mean 1TB still, because instead of 1 hour, it will be a minute. Right?

Michael

42:42

I think that might be... I'm curious about that area, because I think there are a lot more databases in that range of hundreds of gigabytes to a couple of terabytes than there are companies having to deal with dozens of terabytes. But if we can make their developer experiences much better...

Nikolay

43:03

I agree. A few terabytes, I would say, it's already middle market in terms of database size. So it's very common to have 1, 2 terabytes. And snapshots can be beneficial because operations speed up significantly and RTO is improved. If you, if you remember about lazy load and know how to mitigate it or accept it.

Michael

43:31

Yeah.

Nikolay

43:31

Good. Good.

Michael

43:33

Nice one, Nikolay.

Nikolay

43:34

Thank you for listening. Again, if you have ideas in this area I'm very... I will be happy to learn more because I'm learning all the time myself and I think Michael also in same shoes, right?

Michael

43:47

Yeah, I find this stuff interesting I have to deal with it a lot less than you which I'm grateful for But yeah, definitely let us know I guess via YouTube comments or on various social media things.

Nikolay

43:59

Yeah good. Michael: Nice one. We'll catch you next week, Nikolay. Bye-bye.

Transcript source: Provided by creator in RSS feed: download file

Episode description

Transcript