Hello, hello, this is PostgresFM, podcast about PostgreSQL. Do I pronounce it right, Michael?
I think so.
Yeah. And hi, Michael. Michael from pgMustard.
Hello, Nikolay.
Yeah, and I'm Nikolay, Postgres.AI. And let's... We don't have guests today, unfortunately. We have good guests coming soon, but not this time. My idea was, you know how much I don't like to deal with security. I just need to do it. Let's talk about policies, create policy. Why is it create policy, by the way? It's not create RLS policy, it's just create policy. Policy is a very general word, right? It can be not only row level security, maybe some other policies.
Do you think the decision was made to... I think this is not standard. I'm not sure because I just checked with some AI and it says SQL standard doesn't have RLS.
I looked up where did the name come from because it wasn't the title of the page in the Postgres documentation. And the time it's used is in the alter table statement. So this is a table level setting that you first enable. And only once you've enabled it on the table, then you set policies. So it's alter table, and then enable row level security, and then once that's enabled, you need to create at least 1 policy so that people, or non-superusers, can read anything.
Right, but still, it confused me from the very beginning. I think it was implemented originally very long ago, right, before Postgres 10 or after, I don't remember.
I didn't actually check this time.
Yeah, but I think it's quite old feature relatively. It definitely wasn't in Postgres 9.0, but possibly it's already present in...
I found it in 9.5.
9.5, yeah. It means 2015 or something, right? So let's talk about the problem it's solving or tries to solve. Problem is in the era of SaaS, software as a service, We have a lot of cases when 1 system running in cloud operates with many, many different users which ideally should not put at risk each other or something. There should be some segregation. For example, if we have, say, like Twitter style, right? We have users, we have... No, Twitter is not good.
Let's think about something more business-oriented. It will be more...
Or e-commerce, being able to see somebody else's orders or being able to see their address or something like that.
Let's find some B2B example because this is where it feels much more sound, so to speak. So, for example, let's take Slack and we try to implement Slack, right? So we have workspaces, many of them, and channels inside and people communicate, but each workspace belongs to some customer, some organization which uses our service. And it's SaaS, right? It's in cloud.
And is it a good idea to keep all messages, for example, in a single, maybe partition, but we forget about performance for a while, in a single table called like messages for all customers? Maybe it's a good idea, maybe not, right? But from security point of view, it's terrible idea because if selecting our messages, we have some bug or some kind of hacker attack, we can select other organization messages, right? And that's bad.
So, or maybe just developer forgot to put some condition, again, a bug or something like unintentional. And suddenly 1 organization deals with messages from other organizations. It sounds super scary, right? For platform development, I mean for the SaaS service developers, it's bad. So my point is, some SaaS systems still have this situation. Old ones definitely have this situation unresolved. And
most. I think
it
might be most, yeah.
Yeah, ultimate, so it's called multi-tenant system, right? Ultimate way is like you create a database or maybe it's like whole cluster for each, at least big customer and that's it, right? And full segregation, full separation of data, and that's great, but it's super expensive. Even if you put it in the same database, in the same schema, but in different tables, it's also quite expensive. Although we know Postgres can live with millions of tables.
I can send you a new Presentation we just recently discussed it wasn't probably Presented in Brazil. I don't remember again like hundred million tables. Is it possible? Well, it's possible, but it's difficult. So the idea of RLS comes from this problem. We want to have guaranteed protection that 1 customer sees only their data, not other customers' data. And it can be achieved by defining special rule that additional condition will be always present in selects right or updates deletes inserts
yeah and I think I think would you say it's fair that most, when I said most don't have this, I mean most don't have it at the database level. I think most people put a lot of effort into making sure this couldn't happen at the application level, but most don't also add an additional level, like layer of defense at the database level. And this is a database level, so any application sitting on top of it, or any client sitting on top of it, gets that by default.
Yeah, that's why I always say we cannot trust the application layer, because what if next year your organization decides to try a new framework or something and you have 2 applications and they behave differently or you need to deal with implementing the same rules in 2 systems and basically copy-pasting. Sometimes in different languages it's super hard to maintain. What if you have some specific interfaces connected to database and people just work with this data directly, right?
They bypass your application logic completely. This is good to have inside the database, closer to data, and applied in the same manner to all. I wanted to say this problem can be solved using views. So you can, for example, have a view which will deal only with data dynamically for a particular customer only, not seeing anything else. You just put something to the rule definition, which will involve some current setting again.
And in the current setting you have your customer ID, for example, and you say, okay, this view is selecting rows from messages table, where, and we have the check that customer, like project ID, organization ID, doesn't matter, it belongs to this customer ID, that's it. Always. It can be joined maybe. And in this case, this view can be even writable, so inserts, deletes, updates could work. And this check is automatically verified. Although maybe it's not verified at insert time.
I don't remember exactly. I think there is some caveat here. But if we think about only selects at least, it's definitely working. If you make your application work only with this view, not with the original underlying table, in this case you can be 100% sure that they will see only data they can see, not anything else. This additional filter will be always—planner will be always using it. But this is not RLS, it's an alternative approach. And I think inserts will be a problem.
I don't remember 100%, but I remember I was, I got bitten by this problem a few times in my experience, and we needed to implement triggers to double-check that inserts are protected and the same rule is applied the third time. So maybe this is like downside of this approach. But create policy statement is more like natural way to protect data for these purposes, right? So we define a policy and we alter a table saying that row level security is enabled for this table.
By default, it's not, right?
Well, yeah, in Postgres, yeah.
Yeah, and then this policy is supposed to protect us in terms of multi-tenant situations, so customers deal only with the data they can deal with, they don't see each other. Do you think it's a good protection for multi-tenant situation when we keep everything in 1 table?
I have mixed feelings about role of security. I think this is a security question in general. How much are the downsides worth it? How much is the extra complexity worth it for the additional security and yeah I think in environments where security is of paramount importance it makes sense to add security at every level including this
1 this 1 I mean you mean database level
yeah I don't see Postgres very level security in use that much so most people seem to be making the trade-off of not using it in favor well not in favor of I suspect most people using row level Security also have other measures in place. But I've seen a growing number of performance related questions around Rail-Level Security in various communities. And I think it's due to the popularity of PostgREST and the automatic creating.
Also, my theory is because people are creating these RESTful APIs directly interfacing with the database, it becomes super important to have security at the database level because people can change the parameters of the URL. And otherwise, they could just view other people's data immediately. So I think it makes sense that it's growing in popularity. But I don't think it's been growing in popularity as an additional layer. I think it's been growing in popularity as the only layer.
But I could be wrong.
Yeah, that's interesting, because exactly this alternative approach, which I explained using views, it's coming. Actually, my bachelor thesis was about updatable views, and master thesis was about updatable XML views. So it was like 20 years plus ago. But Postgres particularly provokes you to use views over tables and views. You just create a view in, say, by default, schema v1, and this is your v1 version 1 API. By creating views and controlling who can see what, you can already limit access.
You can hide some columns, just not taking them to views, and so on. And that's great, but maybe you are right, because I did see some popularity of low-level security growing from that side. Because people need more and more and more logic on database side, which is related to data, because there is no middleware. It's only this high-skill application called Postgres. And all the data-related logic is supposed to be in database, including everything related to permissions.
And all UI logic is on FAT client, right? It's like React or something. And, Yeah, I see your point. Exactly. If you have middleware, Ruby, Python, Java, anything, then people tend to implement all the security checks right there, right? And maintain them in a unified way and so on. Yeah, I agree. There's no application, middleware application here. So it goes to database or where else, right? Because it cannot go to UI. But interesting, again, views are there as well, but it's not enough.
It doesn't feel enough. And sometimes we need to deal with data from multiple clients. Sometimes, it depends, right? Because, I mean, admin mode or something, right? Yeah, in this case, ROLL security provides this bypass RLS flag for when you create a role or user, you can specify this flag, it's kind of super user, not super user, but it bypasses these security checks, right? Yeah, I agree.
And then, but my point stands as well, like old SaaS systems, they tend to not using RLS, but there are many other fresh, like last few years, a lot of SaaS systems are created, have been created, and still created, and also with AI, it's also very much standard situation sometimes think about charge BT as well It's also like kind of many messages from these chats. It can be also single table and it requires some security as well, right?
And my point is that new SaaS systems, they tend to have it even without Postgres or Hasura or something like that. Even if they use regular Django or Java or anything, they sometimes, I just observed it, they sometimes implement RLS because they think, okay, we need better protection for our SaaS system. Multi-tenant system.
Cool, I didn't realize that, that's great.
Yeah, I just see it, and then they grow and start hitting performance issues. And this is natural, And it looks like the first, let's talk about performance, if you don't mind.
No, I think the most interesting part of this
is performance. And the first thing they bump into sometimes, not everyone, is that having some condition present in the RLS definition, this filtering, doesn't mean that the planner will see it. You can have an index on a column. For example, it's a simple condition. Some column equals some constant. Forget about current setting or some stable functions for a while. Some constant. And we have an index on this column, but we don't have this condition used in the where clause.
This will lead to sequential scan, right? Because planner doesn't see what row level security is going to, which filter is going to apply, right? And this is surprise.
Yeah, sequential scan, but like with a filter so each 1 will be checked against the policy
Just use it
You might even yes, even if you have an index on that. So yeah, that's super unintuitive, I think, for people. And it tripped me up when I was reading about this. I thought, I wondered if it could use an index on a simple policy, on a column, like a user ID or something like that. And I even saw guides, in fact, there's a really good guide on row level security by the team at Supabase who heavily encourage the use of row level security as part of their offering because they're using Postgres.
But yeah, they say you can think of policies as adding a where clause to every query. But that tripped me up When it came to performance because it's not like a where clause from a performance perspective.
It's like a where clause from like a user experience perspective like from a from what you see at the end of the Data coming back might might look the same as if you'd feel today using a where clause But performance like the planner doesn't have access to that to choose to do a different type of scan at the beginning. So yeah, super important distinction and 1 of their tips on the performance side is therefore to always include all the filters you want anyway in the queries.
Yeah Yeah, that's interesting. But it's easy to solve, I think. Just add it there, into the where class, and that's it. Let's talk about, step back maybe a little bit about not performance, because I will forget about this, but I wanted to share some feature stuff. First of all, I saw the case that if we have foreign keys and so on, referential integrity checks are not covered, right? This is like, like, the integrity is more important than role level security, right?
Did you see any interesting cases related to this?
Well, I only read about, I thought it was a really fascinating concept that by enforcing referential integrity you could be leaking security information like if you...
Or inserting something which is not...
Yeah exactly if you're 1 user trying to insert something and you find out that it that it fails to be inserted then you know that thing exists but from some other user.
Insert is a bad example, because foreign key doesn't help with insert. It helps with, for example, deletes, right? Like, SK deletes. So you, for example, can delete what you cannot see or something like that.
Yeah, Yeah, sorry, good point. Maybe it's not relevant then, but I did read in the documentation that you need to be careful with leaking information by checking...
Ah, I see how. For example, if you try to delete something without cascade and it says it cannot be deleted because there is a row in different table which we reference to, it reveals the presence of this row, but maybe you are not supposed to see it. Yeah, you like, it's not your role. It's somebody else's role. Something like this, interesting. Yeah, and then like from error, you can conclude that there is some data there, right? Maybe you can understand the values. Yeah, it's interesting.
But I never saw such situations in production. Another thing I had experience with is that pg_dump. Yeah, you mentioned also before we started recording that it's bad for backups.
No, it's not bad, but you just have to test your backups.
But as I said, the dumps are not backups. I'm still on this idea that by default backup is physical backup. If we say logical backups, okay. It's good, it's bad for, I mean not bad, exactly. It can be tricky and surprising for logical backups. For example, you dump your table but you see only rows you can see. Surprise.
Yeah, well, exactly. I was just imagining using a service like Supabase and wanting to keep a semi-regular dump of the data somewhere else just for my own peace of mind.
No errors, just lack of a lot of data.
It was more that if you're not super careful about which user you're using, like making sure that's from a super user or some role that can see all the data, it'd be very easy to think I've got like I've backed this up and then if you actually ever need it you're in real trouble you've lost a lot of data.
Yeah you can have it unnoticed.
Yeah a credit to the Postgres documentation They called that out as something to be aware of.
In the same area of dumps, a pg_dump doesn't have skip policies, or pg_restore doesn't have skip policies flag, as I remember. And this was a problem because some users of our Database Lab Engine, DBLab Engine, they... This is how actually I noticed that more and more people start using row level security. I just like They start asking questions, okay, we have some low-level security here, but we restore it under, like in this non-production environment for dev test activities.
And we don't need row level security at all here, right? How to skip it? We just want to check various stuff. And pg_restore has a lot of flags, to skip this, to skip that, many things. But as for policies, skip policies, it's lacking.
And We needed to deal with feature supports, hyphen l, small l, lowercase l and uppercase L, to have a list of objects present in the dump, and then just have filter added, like it's a dictionary of everything that was dumped, and we just removed policies and then restored without them. And we automated this in the Database Lab Engine. So basically, the idea is, if I had a little bit more time, I would probably code it and pg_restore could have it.
It feels like some good addition for future Postgres versions.
I understand the attraction, but I'm also kind of against it. Like, partly because we're about to discuss performance issues, right? I'd want to catch those earlier. I'd want to know. I'd probably want to still have row level security on in my dev environment.
Good point, actually. Yeah. Removing policies, this is what we didn't think about. We just received idea, let's remove it, and we implemented it. Now I think, oh, indeed, you're right. You can... You don't see the whole... Not plan, but all the problems that policies can bring and we will discuss in a minute Maybe 1 of the biggest right about stable function and so on Yeah, so it's a good point But it means also that if you want to have policies, you want to reconstruct your users.
Well, this is a good idea as well. So I have 2 do ideas, 2 to do ideas. Anyway, skip policies is valid flag for pg_dump. In some cases, it's up to the user, right? It should be possible to decide. Because it supports the removal of skipping of many other things. True. Like permissions, for example, ACL, right? It's booming, it's skipped. But ownership, but RLS now.
But I have to do for Database Lab Engine as well, like we should find a path to simplify restoration of users and to preserve all policies so we see performance. I mean, this is 1 of the purposes of the Database Lab Engine, to experiment with plans, right? And see how indexes help or not help, and so on. Yeah. Good. Let's talk about performance, back to performance. So we discussed that you must put this filter from policies.
You must duplicate filtering in the where clause to achieve good performance. Another thing is that it's natural to use function current_setting, for example, or something like this. Some function which current_setting is just getting the value of a variable. It can be standard GUC, grand unified configuration, or GUC variables, right? Let's call them variables. Or it can be user defined variable. It has to have namespace or blah, blah, dot blah, blah, right?
And you can use in set or set_config function. You can set it to something. And then you can use it, you can access it either using show, but you cannot use show inside SQL query. To access it inside SQL query, you can use function current_setting. And naturally for Postgres, Postgres, and I think for Hasura as well, many things are coming inside these functions already preset by this thin layer which provides API. For example, some headers are there.
And in headers, and also like processing of JWT tokens, basically, authentication, authorization is done through this mechanism. And it's natural for you to include it into queries and say, okay, current_setting and blah, blah, blah, and we see which user is working right now. And Again, it's natural to put the current setting into definition of your level security policies.
And you say, okay, I don't like customer ID equals what you have in some headers or something coming from JWT token, anywhere. And you just use current setting to identify your user and Apply the security rule to filter out on the rows. This user can see only right So you have current setting used in row level security And here we come to 2 problems. First problem is 2 separate problems, actually, but they are connected.
First big problem is that if you have select count dealing with many, many rows, like a million, 10 million, or it may be another aggregate. We know Postgres has slow count, right? But if you have row level security, slow count is order of magnitude slower, even worse, because row level security check is applied to each row. This is terrible.
We already have very slow aggregates, which makes you think even more about something like materialized views or continuous aggregates in TimescaleDB or some other stuff, like other kind of pre-calculation, like denormalization with pre-calculation. I don't know, like something like this. So not providing actual values in real time, because It's a huge penalty, additional check, even if it's just a simple check with constants and that's it. This is the number 1 problem.
It feels terrible and I don't have a solution. We don't have good solutions, actually, honestly, to slow count in Postgres because it's a row store. It's really expensive to calculate the sum of 100 million values. But now we all additionally need to check every row. Can we deal with it? Can we deal with it? Oh, it's not our row. Wow, it's super slow.
But additionally, another layer of penalty comes if you use a function which is not immutable, which is stable, for example, or volatile, which is worse. But current setting is stable function. It means that Postgres also, even with its current setting and just some variable which is just set and that's it, it will call it for each row. Boom, boom, boom. Right? And this is insanely slow already. Insanely slow. So people can think Postgres has slow count.
People with RLS can think, oh, it's very slow count. People with RLS and current setting inside RLS think, okay, it's terribly slow. Absolutely terribly slow, right? Yeah, but for this last problem with current_setting, there is a few ways to avoid this behavior. I like the way just to put it inside parentheses and write SELECT word. So Postgres Planner moves current_setting call into, it's called init plan, right? Another node in the plan, and it will be called once, and then just used.
That's great. So this is easy solution. Just surround by parentheses and... Just surrounded by parentheses won't solve it. You need the word SELECT. So it becomes sub SELECT that goes to a new plan. This is also interesting behavior. Also quite surprising to some folks. But anyway, there is some solution here. However, problem like we need to check every row in real time, it cannot go away. This is actually the number 1, maybe the only 1 reason I think RLS, row level security is painful.
Like how to solve it?
I was reading the, there's a really good gist put together by I think 1 of the team at Supabase and or at least they reference it in their docs. And they do mention a kind of a trick where if as long as the function isn't leaking any information, you could bypass RLS. So you can, I think it's called security definer? You can set it up in a way that the function can then bypass row level security. But yeah,
I- But it feels like we're turning off. Why? I can say, OK, let's run all massive aggregate calls, like counts, sums, anything. Let's just run them using a User which bypasses RLS. That's it. If you want to present some big count somewhere, just do this. But if it's global it makes sense, I would say okay here we need to count everything bypass RLS, we would do it anyway. But if it's inside single organization it feels like breaking a hole in our wall we just built.
I don't understand it enough to understand why that might be okay and I think you would have to be careful about not leaking information but because of that first tip where we're still providing all the where clauses we need to only be accessing that organization at the query level. We've still got 1 level of security there. So I can see why it's attractive, but that's the only solution I saw to avoid doing it on every, to avoid that check.
It's logical problem. If we must definitely check, and we do it after the player already did everything, and basically after the last stage of execution, right? It's going to be super expensive if you have a lot of rows before producing the final result. But I feel kind of like Postgres could say, okay, we have it in the workloads, we already applied this rule, let's skip it. This could be some optimization, but it needs to be done inside Postgres somehow. I'm not sure if it was discussed.
Well, like all planner things, it would work, even if it was implemented for simple conditions, it wouldn't work for some more complex. There would be a never-ending list of quick, because these policies are so flexible. That's 1 of their strengths, right? You can set pretty much anything as a policy. You can do it at the operation level.
You can do it at a role level, you do so many different levels, and then you can pretty much, I think you can use any SQL query to set up exactly the policy you want.
Yeah.
Then you've got a whole new planning issue, like the planners are already pretty complicated, So I get why they haven't.
I agree. For example, you know, in the check constraints, you cannot refer to other tables. They are simple. But here in these policies, like rules, we can do whatever we want. This means it's super, like what I just proposed, probably it's impossible to implement, right? If it was just this table, no reference to other table, Maybe current setting can be used and the seat maybe there would be a way to Connect to the planner and see okay.
It's already applied this rule that we can skip it Skip it in a safe manner 100% safe man reliable manner, but if we can refer to any table write anything there it's impossible
or much more
so many like predicates like yeah so I agree with you here
there's for people that want that I will share I'll share a link to that gist there's also a really good talk by Paul Copplestone, CEO of Supabase from the last POSETTE conference I can share as well does a beginner's guide to RLS but also covers some quite in-depth manages to in about 20 minutes cover some quite in-depth performance tips I think most of which we've covered now, but it's visual and you can see him going through them, which is nice.
Yeah. Well, I'm quite sure, like Supabase and other systems which work with Postgres or Hasura or something, these guys are at the frontier of RLS usage, right? And for sure a lot of experience is in the discussions there, which is great. But I just see, still I feel like we already had bad Aggregates you you made them worse Starting starting using RLS. So yeah So By the
way, I had a question for you. Did you know how many times we've had this topic requested for the podcast?
I have no idea.
4 times. 4 different people have requested something. That's by far the most we've ever had for a single topic.
Okay, in this case we should have a lot of feedback, I guess. Well,
yeah. There are also a couple of people asking about, like, when asked for more information, like what exactly they wanted us to talk about. They asked about considerations around performance, like how to then diagnose performance issues, but I don't think there's anything unique about it, right? Like mostly it's still... Whoa, it's tricky,
it's tricky, yeah. My, like, general rule is just to compare with and without RLS For example, you can use user spaces. Yeah. Yeah, because it can be tricky I of course you like filter and so on but I feel like in the plans or maybe I'm missing something but I feel in the plans where like we can start guessing how much time do we spend on an RLS check.
I well I think I think personally if you're if you've already gone with RLS if you've already decided that and you're well into your project, I think the RLS checks don't matter that much. It's more indexing. I still see more people with indexing issues than RLS performance issues. They look like RLS issues because people might think that... I agree,
I agree. But I'm talking about this massive count, for example, like, you have a count on them and how much of that time, like, how much, how big is the overhead from our list? It's quite tricky without like comparing 2 cases. I wish the EXPLAIN ANALYZE show this. By the way, We just have BUFFERS by default committed, right? We forgot to celebrate a little bit. Maybe it should be a separate episode about this. I don't
know. I think I want to wait until Postgres 18 to celebrate that, or at least far into the future to release candidates.
I cannot believe this. It's super great, right?
Yeah, a couple of things people to thank on that front. I think David Rowley did a really good job and also Guillaume Lellouche did a lot of the heavy lifting as well. So those 2 in particular, there was a really good conversation on the mailing list that I think you kicked off actually.
I didn't expect at this time it will go through. I was discussing things near this topic, right? But I'm happy to see it and I guess I will need to find another topic to be upset with. Because it was more than 2 years I was talking about how I'm upset that BUFFERS are not default. And it's going to be over. I feel happy and sad at the same time Another thing was UUID version 7 finally Right, and you know what like my perception was we were waiting until full finalization of RFC, right?
Oh, I forgot to say too.
Yeah, because everyone...
Is it not being finalized?
It was developed before Postgres 17 released and then the decision was made not to take it into 17 because RFC is not fully finalized. At the same time, everyone already started releasing UUID support like Node.js, Go, like Google decided to release it, everyone already, but Postgres decided to stay conservative. The thing is that, unless I'm fully mistaken, I see RFC is not fully finalized yet.
Amazing. Still. So maybe it will be before 18's released?
Andrey told me not to shout about this.
Probably sensible. Andrey and I have similar philosophies then, wait until it's actually released before celebrating.
Yeah I bet we can talk about these things here because hackers don't listen to us right so.
I think a couple days.
They won't go and say let's revert it I hope. Anyway congrats to Andrey because this is who coded it, This is him who coded this during our Postgres TV live coding, hacking sessions with Kirk and Andrey. So yeah, this week was, to commit, surprised me.
Both things I'm like very passionate about, like buffers by default in Explain, Analyze, and UID version 7, I think, again, like people who listen to us, don't wait until the release of those functions, UUIDv7, start using it right now because it's just, basically, it's synthetic sugar. Not fully, but you can start using UUID data type and insert UUID version 7 generated on application side, or using some SQL function, PGSQL function. I have examples in my how-tos.
And when this feature becomes native, switch to native. But it's hard to overestimate the importance of this support because you know I can convince a couple of guys to start using right now. Some guys can guess but 90% will follow default path right?
Same with buffers.
Exactly.
People can start using it now, there's no reason they can't use it now, it's just more will once it's there.
We will finally analyze plans with buffers always and let's like to wrap it up, when you see overhead from RLS, it's not about buffers, usually. Unless it's a lack of index. But if this is like current setting, we talked about current setting on each row, it's called. Buffers are like tiny, but timing is awful. And yeah, this is 1 of the rare cases where buffers... Well, buffers don't lie, right? They don't lie. They show there is no I/O Here. And it's already useful information. But...
But you need both.
You need both, right. And you know, sometimes I say, most of the time we optimize targeting buffers. In this case, we don't because it's just not about I/O at all. But seeing that I/O is low, it already gives us idea. Maybe it's RLS, right.
Nice 1, Nikolay. Well, thanks again and catch you soon.
See you later, bye bye, thank you.
