Hello, hello, this is Postgres.FM. I'm Nik, Postgres.AI, and Michael, pgMustard. Hi, Michael.
Hello, Nik.
So today we talk about locks, heavy locks, because in Postgres there is also a concept of lightweight locks in some other systems called latches. And if we say lock without heavy, it means heavy by default. Right?
I think so, yeah. I think this would be a very tough discussion or a very shallow discussion if we were trying to cover everything that uses the word lock in Postgres. But yeah, LW locks are the kind of latches, the lightweight locks, right?
So database objects and row level locks. This is what we are going to focus on today. Perfect. Yeah, more like closer to application development, actually, rather than to internals. And this is one of the very popular topics we discuss when we talk with clients and solve problems because many incidents are caused by lack of understanding when people don't realize how locking works, don't go into details there and they have storms of so-called lock contention, right?
Spikes of active sessions, waiting on lock acquisition, and sometimes database can be down, even. This is the problem we can...
Yeah, well, it came up in our recent episode, didn't it? On the 10 dangerous issues.
Oh yeah, exactly. So it's one of the 10 dangers we talked about. And where to start? Where to start? I will start with simple, fundamental idea, which should be written everywhere in bold you know. Lock cannot be released until the very end of transaction. Heavy lock can be released only at commit or rollback time, not earlier.
Yeah, that's, I like that.
Yeah, it's a good place to start. Any lock, regardless of any heavy lock, again, don't touch lightweight locks. Any heavy lock can be released only when transaction ends. This should be...
Once you understand that, the other issues start to become
more complex. Yeah, yeah, yeah. And that's why we don't want, additionally, don't want long-running transactions. And this is regardless of transaction isolation level, right? So it doesn't matter if you acquired a lock to be held until the very end, until commit or rollback. So yeah, even, even very gentle lock, like access share lock can be harmful. Right. So you should be, you should don't, don't hold locks. You don't need to do long. It's like simple advice, but this is like understanding.
It sounds super obvious, super obvious, but I wonder how come these super obvious things become obvious only became in my particular career, they became obvious only in the middle of it. Like maybe first time I heard it from somebody else was maybe like 10 years ago or so, not 20. It should be 20 years ago when I started working with Postgres and databases like deeper.
It's super obvious, but somehow you spend like 10 years of your career thinking you're expert, database expert, not realizing this obvious thing. Maybe feeling it some partially, but not realizing it. Maybe it's not delivered properly in documentation and educational courses or something. It should be delivered.
I think the documentation does lay it out quite nicely, but it doesn't give you practical tips.
As usual, we have no documentation. Lack of practical tips, yes. But this is super important practical tip. Don't expect you will release it earlier. It's super basic, super obvious to, I think, most of our listeners right now. I'm pretty sure like 90% know this, but I wanted to spend some time here because I know new people need it.
Yeah, well, I mean, it's also kind of maybe another case of that, you know, the midwit meme with the kind of the beginner has a certain opinion. As you get more experience in something, you start to think you need to do all these different things. And then as you get more expert, maybe in a lot of cases, you go back to that early opinion.
You start appreciating basics more and more because you also think about other people and you want them to be on the same page with you if you work in teams, for example, right?
Yeah, you value different things perhaps like reliability and not having weird edge case bugs and things. So yeah, actually in terms of where to start, we also have, we had a listener request for this and they asked, and I think they phrased it in an interesting way because I think it's not quite how I'd like to attack the problem. Yeah. They've said, could you explain various kinds of locks Postgres has and what to do to avoid them? You cannot avoid locks.
Yeah, but then what so what is the question? I think it's more around understanding when they're necessary and what to do to reduce the severity. So how heavy a lock or what else is allowed concurrently at the same time. Yeah. So and tips and tricks around that basically.
Right, so we cannot avoid locks. Locks are needed to work in multi-user environment. They are mechanism to handle concurrency, like how many users can work with the same thing at the same time somehow and not interfere with each other to avoid inconsistency or too many errors or something. And you cannot avoid locks. Even if you read something, you're already locking something. And what we want to avoid... So I will tell you what you wanted, right?
So right question is... I hate this term, honestly. I hate this phrase, but this is exactly when it's needed. The right question is how to avoid waiting on lock acquisition. This is what we want to reduce. Again, completely, you cannot get rid of it completely, but you want to reduce it as much as possible because this is when database cannot do work and you just have wasted time.
And eventually, again, like I just said, if you have a lot of active sessions waiting, it can even cause downtime, cause outage, right? So we want to avoid too much of waiting and too much again, remembering our very first episode, what is too much in OLTP context? 100 milliseconds is already quite slow SQL query. If we wait 100 milliseconds, 100, 200 milliseconds, people already will notice because human perception is 200 milliseconds roughly, plus minus, like depending on various people.
I'm like we are not cats. Cats have much better reaction. Imagine if everyone will be a cat. I'm glad we're
not making apps for cats.
Yeah, we would need to optimize much harder. Because they would recognize slowness much earlier. So higher standards. Anyway, this defines how long we can allow ourselves to wait, and also we need to include the actual work time, right? So ideally we should wait very like, okay, close to 0, or not wait at all. If we do things properly, we can almost avoid waiting time. So this is it. This is the right question, how to avoid waiting on lock acquisition. Agreed?
Yes, yes, I like that a lot. And it's not just about how to avoid waiting, it's also techniques to bail out. So I think maybe it's the same thing, but I think lock timeouts, for example, is not just about shortening the wait time, right? It's about bailing out and saying, actually, we don't need to do this right now, let's wait for a better time.
Great point. So you're talking about the same thing, but from different end. It's like, we want to avoid, I mean, we, like causing some queries, transactions, sending them to database, we want to avoid waiting. For example, if we say for select for bed skip locked, we won't be waiting or no wait to have an error sooner. But if we do something, it's good to think about others and don't let them wait on us too much. That's why I walk them out, right? So it's a two-sided problem.
We don't wait and others don't wait too much. I like this. Perfect. This completes the picture.
Yeah. Nice. So in terms of the first part of the question, though, kind of various kinds of locks. Should we start on the table level ones? Do you reckon?
Right, but before we go there, honestly, like, I don't keep this in my mind all the time.
Yeah, good.
Yeah, this baggage, like, I just want, like, this is simple, simple way to think about it. There are exclusive locks and share locks. There are row-level locks and table-level locks, database object-level locks. This is enough for understanding. Then you can use documentation as reference, or there was a great Citus blog post, several actually blog posts, but I especially like the old 1 by Marco Slot, Postgres rocks except when it blocks, understanding locks.
I like it because it translates the table of conflicts the documentation has, it translates it to some more practical form. What kind of operation can be blocked by what kind? So instead of, it shifts language from locks to operations we have, like selects, alter table, and so on. So I don't keep this reference style in my head almost never.
I know something experience gives me, But understanding this, there can be share lock, exclusive lock, row level, table level, it's already good enough, you know?
Yeah, because it makes you think, before I do this, what kind of lock? And then you can look that up, you can look up for the specific command or the specific thing we want to be able to do, what does the documentation say? Or, and this is maybe a good time to bring up, there's a new-ish tool, I think it came out maybe a year or 2 ago, from Hussein Nasser, a friend of the show, called... is it pglocks.org?
Yes. It's a very good attempt to translate what documentation has to some better form to consume with easier I think actually I think actually documentation could could just have it as is it would be great Maybe
that would be awesome. Yeah, but in in addition I actually thought it was a relatively good, like, it's very browsable. And I think it's quite a good educational tool. Like, oh, I didn't realize this could be done at the same time as this.
Or this couldn't be this would take this kind of level of lock so it was quite a nice way of browsing as well I thought for me personally from some commands I hadn't considered like I hadn't really considered for example the locks that for example that like vacuum I had thought about but analyze I hadn't thought about the kind of analyze from a locking perspective.
So it's really interesting seeing that in some of the lists and thinking, oh, that means you couldn't, you know, run 2 at the same time, or some, there were some other interesting things and I had never considered before.
Yeah yeah yeah so you analyze like everything is locking everything yeah even selects are locking and not only like selects if you have planning time by default you have in queries it locks all all indexes as well with share lock, right? Share lock. And we talked about it. This can be bottleneck if you need to run a lot of these queries per second, like thousands of them per second, it can be bottleneck that you need to lock a lot of queries.
There we, this topic has a bridge to lightweight lock discussion, but it starts with heavy locks. So the need to have many heavy locks can lead to lightweight lock contention. So yeah, and this means that understanding what's locking, what's blocking others is great. But again, at a very basic level, if you change something, you need to have exclusive lock. For example, if you update some rows, you need an exclusive lock on these rows.
And this means it will conflict with others who want to change it.
Yeah, so exclusive is you can only have 1 at a time. Exactly. Share means you can have multiple at the same time.
If you read rows, Well, for rows we don't have shared locks, but with table level, if you change table schema, again, it's table level exclusive lock. If you just deal with table, including selects, you need shared lock on that table, right? At table level. So of course, database level, object level locks and row level locks, they are kind of different beasts, right? Different beasts, because their implementation is different and so on. Behavior is different.
But still, the rule can be applied directly. If we have multiple reading operations that don't conflict, and they, by default, they are not blocked by others. But they can be blocked indirectly. And this is my favorite example, when we have long-lasting transaction which just read something from a table, maybe 0 rows, no rows read, like just select from table, limit 0, right? But it already acquired a share lock on this table. Then some other table comes, it waits on us.
And then we have a lot of selects after it coming to this table and they cannot read because they are waiting on this man in the middle, transaction in the middle. So yeah, we have a chain or queue of locking issues and this is a nasty situation which is inevitable if you don't have lock timeout and retries. Right? And this feels like we have a select which is blocking other selects. How come? Okay, because there is an alter table in the middle. That's why. Okay.
But yeah, I like that description because it hammers home a few things we've talked about in the past. Like, this is a reason not to have excessive long running transactions because it can be that 1 blocking or it can get in the way of a misguided migration or table or something. But yeah, then there's...
But it's not enough.
Yes, but on its own it's not enough, because we also need the lock timeout and retries for the migrations. And maybe 1 on their own would actually be fine. If you only implemented 1 or the other for a while, you might be okay. But then
the vacuum comes,
right? And
the vacuum always is blocking, but it has a detection mechanism. It's blocking always, but if it's not running in transaction ID wraparound prevention mode, it will kill itself. But if it is running in transaction ID wraparound it will not kill itself and this is terrible because like you don't have long running transactions but you still have this problem I think having low lock_timeout and retries for DDL is inevitable. It's really needed for everyone.
It's a good point though, is vacuum the only thing that will take a lock and kill itself if something important comes along?
I don't know. I don't remember either. Yeah, maybe, yeah.
Yeah. I mean, it makes sense because it's kind of like background work that can be done at a later point. There isn't much else that fits that bill. But yeah, cool.
Good. So what else I wanted to say, it's it feels like monitoring systems don't care about real problems. They show you this is number of share locks, this is number of exclusive locks, bye bye. Datadog does it, for example. It's super popular, right? And people come to us saying, oh, we have issues with locks, and they show us like big spike of locks, like access share locks. Okay, you had some intensive reading happening, so why should we care about this at all, right?
And so such thing as good lock dashboard doesn't exist yet. I haven't seen it yet. There are good attempts, but the problem also lies in the fact that it's really hard to understand who is waiting on what. So it's really easy to understand, but to reconstruct the whole chain of or tree of blocking events like processes, backends, right? 1 backend can be waiting on other, that 1 can be waiting on another, and so on. It can be a tree.
And actually, it can be a forest of trees, because there might be several roots and you have multiple trees. And in this case, what helps is a function called... Well, let's step back. So, I recommend to everyone to enable log_lock_waits, because by default it's not enabled, it's off. Everyone should have it on. In this case, you will see every time some session waits for 1 second, not being able to acquire a lock. This situation will be logged. Actually, not 1 second.
More precisely, deadlock_timeout, right? Because after deadlock_timeout, some checks are happening. This is exactly this logging can happen in Postgres.
Yet another parameter that's being used for 2 things.
Yeah, well it's indirect, it's not straightforward, and it is what it is, right. But, and also some people change it, sometimes it's 2 seconds or 5 seconds. Some people try to postpone that lock detection, thinking maybe it will be resolved.
Resolve itself.
Not resolve itself.
Well, but the problem is you might want different settings.
I would tune it in a different direction, resolve earlier maybe. It's an interesting topic, right? So maybe we should talk about deadlocks in a few minutes. So yeah, I wanted to like, speaking of observability, how to deal with locks, it's really not easy because you enable this, It's already something, it's great. You see process ID of victim, let's use this word, why not?
And- Waiter.
Yeah, offenders. Sometimes maybe multiple, right? And then you see, since it's your session who is victim, well, not your session, but it's the session for which this analysis was performed, so we see the text of the query for the session who is waiting, but we don't see details for transactions. There might be actually an idle transaction. There might be no query, although in pg_stat_activity we would see the last query executed in that transaction.
It would probably help, but sometimes it's active state, state equals active in participant activity, and if we were able to see the query, it would help. Unfortunately, in logging, you cannot. This is, this is like a little bit annoying because if you need to do post-mortem analysis like root cause analysis for something in recent past you see only 1 side query but you don't see another side and it takes time to understand like actually it's impossible sometimes to understand what caused it.
So, and I remember we even implemented our own mechanism to log additional information in such cases. So using just raise notice in PL/pgSQL, but it's not fun, honestly, to do this. And in some cases, it's reasonable to do this if you, for example, have very strong tooling for log analysis, like enterprise level. In this case, I would do more logging using PL/pgSQL, for example, and let the tool visualize the number of errors we have and provide details and maybe notifications and so on.
So in this case, the question is how to implement good monitoring for this. And good monitoring for analysis, we have a query, right? Among our how-tos I wrote, there is a huge query which had big evolution. I took it from some people, they took it from other people. So some evolution happened there. And it's great. I think 100 plus lines of query, which provides all the details. And you see the whole forest of trees, including all states, weight events, sometimes it matters, and also queries.
The only 1 problem with this, because it requires pg blocking pids call. This function gives you the whole list of process IDs which is blocking this process ID. And the problem with it, it's quite like sometimes expensive. It's not for free.
So limit yourself with some statement timeout, not to cause observer effect because when there is a huge acute spike or not acute just spike of some like storm of lock acquisition weight events in this case this function can cause additional trouble sometimes.
So is the, is, does that mean that the times it would be most useful is the most likely for it to time out?
Well, yeah, yeah. In general, in most cases, I, I warned about this, but it's not, It doesn't mean that it always happens. It happens rarely, this thing. But it's quite wise to limit yourself, I don't know, like half a second, maybe up to 1 second.
Okay, yeah, that's
quite. And not to call it too frequently, like not every 100 milliseconds, like I do sometimes with some queries, not with this. Yeah. During observing something, like manually, like I just see like almost animated state of some results. But in this case, it's better to do it less frequently with statement timeout. But once you do this, you bring this to monitoring. It's beautiful. It's great.
Yeah. With monitoring, it feels like sometimes we want to catch every example of an issue for some types of issue. But sometimes it's OK if we're just sampling and if we catch that sometimes it's an issue, that's a sign that there's a problem in our code or a problem in our team and that we don't have the education around these types of problems.
And I understand that this might be 1 of those ones where you do want to catch as many of them as you can, but I think it's more important that we realize that we even have any migrations that don't have this, or if we are doing updates in an order or transactions in an order that could end up with deadlocks or complex locking trees. I don't
know, I would prefer to have exact information here in the void sample, but I live in a world where max connections doesn't equal 10,000. If you're on RDS you can have like multiple thousand max connections, multiple thousand. In this case, you have a storm of active sessions, many thousands of active sessions. And then they like to, of course, in this case, it's a different situation. But for me, it's not a healthy situation.
I would prefer to have like we have number of cores multiplied maximum by 10, maximum. This should be your absolute maximum for max connections. In this case you cannot have too many records to analyze, right? And in this case it's good to have exact information because if you start sampling you might miss something because 1 line can matter a lot in this analysis, you know? Yeah. Yeah. So of course You have like, I don't know, 96 cores, your max connections should be 400.
In the worst case, we will have 400 backends to analyze. It's a lot, but It will be only during incident. Hopefully it's not every day. If you have incident every day, you know what to do. Ask people who can help, right? Anyway, 5,000, I agree, like 5, 000 of same thing, actually. They all are waiting on the same thing. Why should we log them all? Or why should we bring them all to monitoring? Of course, it doesn't make sense. But if it's just a few hundreds, I don't know. Should be fine.
Should be fine. Again, under normal circumstances with 96 cores, you should have up to, I'd say, 50 active sessions. And most of them are not waiting, so these analysis won't bring anything. And if your system is healthy, in most cases, this query will produce empty results. Empty. That's great. So normal state is 0 rows in the result set.
Well, I guess normal state could still be, like, normal state is still up to a few rows, but the main point is that they shouldn't have been waiting for too long. Is that fair?
Yeah, yeah, yeah. Well, you can wait a little bit, but sometimes you have spikes, some small spikes, but in healthy systems, we just see that 000, even if the system is very heavily loaded. And then some small spike and then 000, like this. Yeah. So, yeah, also wanted to mention that, again, connection to indirect connection to lightweight locks. So wait event when some backend is waiting on another to acquire a lock, heavy lock, you will see wait event equals lock, right?
Or wait event type lock. So this means we are waiting. And again, lock means heavy lock. What else?
Well, is it a good time to go to, I mean, you mentioned that these lock trees could get complex, but they could also be loops, right? In the case of deadlocks, is that is?
Oh, that's interesting. Yeah. Yeah. So it's like the only solution to deadlock somebody should die.
Well, yeah. But now I'm talking more about prevention, right? Oh, yeah. I know it can get complex, but I feel like most cases of deadlocks I've heard about have been what I would class as kind of poor design on the application side.
Exactly.
So very avoidable with the right consideration around locking and around ordering of what you do in which order.
Yeah, actually, let me correct myself a few phrases ago. I said wait event, it should be wait event type equals lock and there are several additional wait events in this category or under this type, which can help understand at which level, for example, we have a lock we are trying to acquire. Is it relation or row-level tuple or something else, like maybe advisory locks also. Yeah. You remember this? Like user-defined mechanism. So yeah.
And speaking of deadlocks, it's always Here exactly we can apply this anti-methodology by Brendan Gregg. Blame somebody else. It's always a problem with, unfortunately, it's always a problem with applications, with workloads, maybe users. So it's not a database problem. So the sequence of events is not good, and it should be changed. It should be redesigned so people don't conflict in this way.
And in many cases, I remember dealing with deadlocks was really challenging because if you if it's not you who wrote this code of And you need to explain they need to change in many cases. It's easier like to just to accept the loss You know if it's just a couple of deadlocks per day, it's okay. If you have a heavily loaded system, it's not that bad.
Well, and then often the solution is to make sure that transactions can at least be retried, right?
Because they- Definitely so, 100%. Of course, there are cases where having deadlock is a serious problem, because if money is involved, then it's a problem, because you can have some issues. And definitely, application... It's a right point. Maybe instead of trying to completely avoid and prevent deadlocks, maybe it's better to design and, okay, we have a couple of deadlocks among...
We have a billion transactions per day with 2 deadlocks, but we have retry logic, so nothing is lost and no users are complaining. That's it. It's smart. So, yeah, I agree with this.
Nice. Do you want to touch on advisory locks quickly? Or not really?
Oh, yes. We had a case recently. It was an interesting case. Obviously engineers were very smart engineers and they also read documentation a lot and so on.
Nice.
And that like quite advanced code is written and so on, but they had issues with heavy lock contention. And turned out it was storms of advisory lock acquisition wait events and we just talked through this and my advice was quite simple. Let's get rid of advisory locks because we have data in table, this data is organized as rows, we can lock rows.
And we talked about reasons why we should lock, and it was like, there are reasons originally, but when we just think, could we switch to regular row level locks? And the question was, yes. There's no big reason not to use it. And once we switch, everything becomes quite clear how to resolve contention completely. You just start, SELECT FOR UPDATE. Before you update or delete row, you just SELECT FOR UPDATE and then SKIP LOCKED or NOWAIT.
Plus retry, so if you want to fail immediately and then retry, it's NOWAIT. If you want to, depending on application, depending on logic and data and so on, sometimes you can just take next batch of rows to process. In this case, SKIP LOCKED. If you must work on the same batch, okay, come later. In this case, NOWAIT and just retry. And that's it, it's an easy solution.
Well, advisory locks can still be helpful in some cases, but working with data in general, I think it's some mechanism which feels like kind of extra and not needed. Right, if you just need to process some data in many parallel sessions, just work with row level locks.
Yeah, I feel like it's a sledgehammer and sometimes sometimes you need a sledgehammer, right? Yeah, if you think about it, we have in the UK, we keep our garden tools in a shed. Normally, not many people have a sledgehammer in their shed, like you don't need it that often, right?
I agree.
Yeah, well anyway, but the point is, every now and again if you're doing like a remodeling and you want to get rid of a wall, maybe you need to bring in the big guns and actually do something more heavy.
I use this word, I apply this word, maybe you remember, to materialized views in the past. This definitely feels like, just you solve your problem with performance but it's so like massive it always needs to be refreshed fully. So same feeling, like it's a tool, it's reasonable in some cases, but if you overuse it, you cannot do precise work like you lose some precision, right? So maybe, maybe.
Well, and going back to the original question is what can we do to avoid these locks? And we were saying well actually what can we do to avoid long waits? And what can we do to make locks lighter in general or take less time or retry when needed. This is a perfect example. Is there a way of taking less extreme locks or releasing them quicker? Or, you know, so it feels like another example of how to minimize locking or at least shorten its impact.
Yeah, so yeah short transactions don't hold locks too long. If you wait give up, give up sooner, don't wait too much because others can wait behind you. What else? Use skip locked or no wait. Select for update, skip locked, no wait. This is major mechanism. Finally MySQL even has it, right? So this is great to have it and use it
and I guess the more obvious
with only 1 comment for select for update and select for share select for share can lead to multixact SLRU issues. I barely remember already, didn't touch it for a while, but there is an interesting effect with select for update. If you have foreign keys, select for update, and you use sub-transactions, you can have issues with multixact IDs somehow indirectly and unexpectedly. So select for Update can feel like differently if you have sub-transactions.
So, again, like I'm a big fan of sub-transactions, so to speak. I would not use Select or Update and sub-transactions at the same time. This can be a problem, as I remember. But in general, select for update is great. In general.
And up to, I mean, you're talking mostly about very heavily loaded projects. And this problem of locking is a problem even, in my opinion, is a problem way before that. It can be a problem in much, much, much smaller projects.
I agree. It can be, even in a small project, it can be very annoying to have contention on heavy locks. Right.
Well and can cause downtime still.
Mm-hmm. Yeah. 100%.
Yeah. So yeah I think that's great. Anything else you wanted to make sure we covered?
No, just that's it. Like no long transactions, no long waits and you'll be fine.
Yeah. Actually I had 1, I had 1 more but I think it's almost at the risk of being too obvious. I think we had a whole episode actually on 0 downtime migrations, but there are schema changes you can make in different ways to avoid the heaviest of lock types. As simple as, you know, the create index concurrently type things, or we had a whole episode on pg_squeeze and things. I feel like that's another topic of avoiding a certain type of lock in favor of different types of lock.
You can avoid lock contention issues doing DDL but you need to sacrifice transactional capability.
Yes, yes.
This is the key, unfortunately. So you cannot have transactional and like atomic steps and 0 risk of having a lock contention unfortunately at the same time so yeah and creating this concurrently is a great example here or attaching detaching partitions concurrently and so on yeah good okay
