Hello, hello, this is Postgres.FM. I'm Nikolay, Postgres.AI. As usual, my co-host is Michael, pgMustard. Hi, Michael. How was your week?
Hi, Nikolay. I'm good, thank you. How was yours?
Perfect. Very active and a lot of stuff is happening. So we needed to miss last week because I had even more stuff last week, but I'm happy we continue, right? We don't stop.
Oh, yeah.
Yeah. I remember in the beginning I was always against skipping any week, because for me it would be a sign that we would probably stop, which I don't want. But by now we have already proved that, over a couple of years... almost how many years?
Three, maybe.
Almost three. This July it will be three years, and I already proved to myself, we proved to ourselves, that if we skip one or two weeks it's not game over.
Yeah, this is me as the European convincing you it's okay to have a week off every now and again.
Yeah, exactly, exactly. Okay, if we stop, that's it. I don't want it. Yeah. Good. And today, this was my choice, and the topic is less technical, although we will talk about technical stuff as well. The topic is how managed Postgres services help us or don't help us. Customers, I mean. I'm in a different situation probably, but of course, sometimes I'm just a customer, or I'm on the customer side.
And there's a problem: when we cannot have access to the cluster and we have some issue, there's a whole big class of problems around how to deal with it. And maybe we should create some best practices for how to deal with support engineers from RDS, Cloud SQL, I don't know, and all the others, right? And let me start from this. I learned an important lesson, I think in 2015 or 2016, when I first tried RDS. I liked it a lot because of the ability to experiment a lot.
Before cloud, it was really difficult to experiment because for experiments you need machines of the same size usually for full-fledged experiments for a very limited amount of time. 15 years ago or so, we were buying servers and putting them to data centers and experiments were super limited. Cloud brought us this capability, great. And with RDS I quickly learned how cool it is to just create a clone, check everything, how it works, throw it out, and then rinse and repeat many times.
And then when you deploy, you have already studied all the behavior. And I remember I was creating a clone, and it was so slow. An RDS clone, I think it was 2016, maybe 2015. Why is it slow? Okay, the cluster was maybe 100 gigabytes. Today it's a tiny cluster, well, not tiny, small. But back in those days it was already quite a big one. And I restored it, and somehow it took forever to run some SELECT. Experienced AWS users know this phenomenon very well.
It's called lazy load, because the data is still on S3, and you have EBS volume which only pretends to have data, but data is still there, lazy loading in the background. And I reached support because we had good support. And the engineer said, oh, let's diagnose, it's some kind of issue. So it was hard to understand what's happening and so on. And I spent maybe an hour or so with that engineer, support engineer, who was not really helpful.
Right. And then, I don't know, maybe thanks to my experience of managing people by that time (I had already created three companies in the past, so I had learned something about psychology and so on), what I did was just close the ticket and open another one. Although usually any support would hate it, like, don't duplicate, right? But this helped me solve the problem in a few minutes, because another engineer told me, oh, that's just lazy load.
And I googled it, I quickly educated myself. Okay, what to do? Oh, just SELECT * from your table to warm it up. Okay. And since then I have a rule, and I share it with my customers all the time. If you are on a managed Postgres service and you need to deal with support sometimes, it's like roulette, right? It's 50/50. It can be helpful, or it can be not.
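A minimal sketch of the warm-up idea described here, assuming all you want is one full-table read per table after a restore. The helper names and the table names are hypothetical, and the script only builds SQL text; running it against a database is left to whatever client you use. A plain `SELECT *` reads the table's own blocks, so indexes would not be warmed by this.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def warmup_statements(tables: list[tuple[str, str]]) -> list[str]:
    """Build one full-table read per (schema, table) pair."""
    return [
        f"SELECT * FROM {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    for statement in warmup_statements([("public", "orders"), ("public", "users")]):
        print(statement)
```

Generating the statements separately makes it easy to review them, or to spread them across several sessions, before touching a freshly restored instance.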
If it's not helpful, don't spend more than 10 minutes. Just close the ticket, say thank you, and open another one, because if it's a big company with a big support team, you will probably find another engineer who will be more helpful. Actually, I use this rule in other areas of my life as well, for example, talking to support people at a bank, right? Credit cards, debit cards, anything. It's not helpful?
Okay, thank you, and you can just call again and another person will probably help you much faster. What do you think about this problem?
Yeah, I think you must have different banking services to us, because if we need to call the bank, you're guaranteed to be waiting 20 minutes on hold. So...
Oh, yes, it's terrible. It can be hours. I think we'll have a day when someone will create an AI assistant serving on the human side, not on the company side.
Oh, interesting. Yeah.
Yeah. So they should wait on that line and ask me to join only if everything is ready and small details already negotiated, some approval is needed and that's it.
Yeah, so
maybe one day we will have such systems.
Yeah, I think at big companies that makes a lot of sense, at smaller ones much less so. I think there are some smaller managed services out there. But yeah, maybe this problem happens less. I was gonna ask, because sometimes they have the ability to escalate, right? Do you have any tips? So let's say you've got a support engineer that wasn't able to work out the issue.
Do you have any tips for getting them to escalate the problem to a second tier? Or do you always go to, like, let's open another ticket and hope that they escalate it?
That's a great, great question. And I think we're in a position where... I don't know about RDS, by the way, but what I see in many cases is that there is no such ladder built yet. In the case of big corporations, banks and so on, there is such an option. You can ask for, like, a senior manager, blah, blah, blah. Especially if you go offline, it's definitely always an option. Right. So, please let me speak to another person, and you escalate and so on.
But what I observe, and what happened recently: we had a client who experienced some weird incidents. And those incidents require you to have low-level access, which you don't have on RDS. You need to see where Postgres spends time, for example, using perf or something. But you cannot connect, so it's all in their hands. And you also need to grant them approval to connect to your box and so on. So a lot of bureaucracy here.
And I told them, you need to escalate. And of course, it's normal, but I don't see this option working. If you say escalate, it looks like they don't understand what's happening here, right?
Really?
Well, you can try. You can bring some difficult problem and try to escalate. Will it work? Is there any official option? Because if it's not official and it works sometimes, it's okay. Again, it's like gambling. Like I said, it's similar to closing and reopening the issue and hoping the next engineer will be more helpful. Escalation is also not guaranteed. In many cases, it's good, right?
Because they will probably try to solve it. Well, I also have several recent cases, very interesting. I cannot share all of them, but let me share another one. Another company, they are on a different platform, not RDS, not CloudSQL, and they had issues, a bunch of them, like 10 issues of different kinds. One issue was eventually identified, with mutual effort, as: don't run backup push, or however you call it, on the primary. If the system is loaded, do it on replicas.
We talk about it from time to time when we touch backups, right? And this was an issue on that platform. But what I observed is, like, trying to work with engineers, support engineers, and also ultimate escalation if you go to CTO or CEO level and say, oh, look, like, you know, CTOs are talking, right? And this is ultimate escalation. And it's also not helpful sometimes, right? In that case, it's like there was some chunk of disappointment, what I observed, like this was feedback I heard.
Right, so escalation is interesting, but my point is like, we probably need to learn about escalation ladder and practices from other businesses, obviously, right? And I still think it's not fair that customer pays bigger price and doesn't have control.
Yeah, sure. Well, actually, on this topic, I was going to ask, do you think this is less of an issue for the... There are managed service providers that give more access. We had an episode on superuser, for example, and it's come up a few times. Obviously that's still not like what you're talking about, running perf for example, but I'm guessing a whole category of issues just doesn't exist if you've got superuser access. So is it less of an issue on those?
I will tell you a funny story. It was with CrunchyBridge. I respect CrunchyBridge for two reasons. It used to be one, now it's two. One is superuser. I don't know any other managed service yet which provides you superuser. It's amazing. You can shoot yourself in the foot very quickly if you want. It's freedom, right? And another thing is that they provide access to physical backups, which is also nice. This is true freedom and honoring the ownership of the database and so on.
Because without it, maybe you own your data, but not the database. You can dump, but you cannot access PGDATA or a physical backup, nothing. And also, you own your data only conditionally, because if bugs happen, you cannot even dump. Right? And that sucks completely. And I'm talking about everyone except CrunchyBridge. All managed services, they all, like, steal ownership from you. That sucks. So, the funny story...
I think there is at least one other, but I think they're quite small. I think maybe Tembo gives superuser access. I haven't actually checked.
Oh, maybe, yeah, maybe. Yeah, apologies if I missed something.
Yeah, let us know.
Of course, I work with a lot of customers and expanding my vision all the time, but of course, it's not 100% coverage.
Yeah, of course.
Definitely not.
And definitely the big ones don't.
Right, exactly. And they say this is for your own good, but it's not. So let me talk a little bit about CrunchyBridge. It was super funny. We needed to help one customer and reboot a standby node. And it turned out CrunchyBridge doesn't support restarting Postgres on standby nodes. They support it on the primary or for the whole cluster, but not for a specific standby node. It was very weird. I think it's because they just didn't get to it somehow. It should be done, it should be provided.
But we could not afford restarting the whole cluster. We needed just one replica. And then I said, OK, we have superuser.
Yeah.
What can we do? COPY FROM PROGRAM, right?
So you crashed the server.
Not crashed, why crash? pg_ctl restart, it's all good. Just with -m fast. All good, all good. Yeah, there are some nuances there.
But on that, let's go back to the topic briefly, because it's relevant.
Let me finish. COPY FROM PROGRAM doesn't work on replicas because it's a writing operation. Oh!
So you had to contact support, right? That's where I was going with this.
Well, support says this feature is not working. I mean, it's not supported.
But they could do it for you, no?
No, no, no. I needed it as part of the automation we were building. It was part of a bigger picture. And we needed this ability. So what we ended up doing is COPY TO PROGRAM, writing to
a log.
And this worked on the replica, but we were a little bit blind. But then I talked to the developers and realized we had an easier path in our hands. It's PL/Python U. Anyway, if you have superuser, you can hack things yourself a little bit. It's your own right. If you broke something, don't do it.
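To make the COPY TO PROGRAM trick concrete, here is a rough sketch that only builds the SQL text. It assumes a role that is allowed to run server-side programs (superuser, or membership in `pg_execute_server_program`); the helper names, the shell command, and the log path are all hypothetical. Because `COPY ... TO` only reads, this form is accepted on a standby, while `COPY ... FROM` has to write into a table.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def copy_to_program(command: str) -> str:
    """Build a COPY ... TO PROGRAM statement that runs `command` on the server.

    The query result (a single row) is piped to the command's stdin;
    the command is free to ignore it.
    """
    return f"COPY (SELECT 1) TO PROGRAM {sql_literal(command)};"


if __name__ == "__main__":
    # Hypothetical command: append a timestamp to a log file on the server.
    print(copy_to_program("date >> /tmp/copy_to_program.log"))
```

Redirecting the command's output to a file is one way to be a little less blind, since COPY TO PROGRAM gives nothing back to the SQL session apart from success or failure.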
Yeah, it's a really good point. So that was kind of my question. If you've got more access, I presume there are fewer issues that you need support for. But that does raise a good question, because there are kind of three times you need to contact support, right? We've got an issue right now, maybe urgent, maybe not. I've got a question, how does something work? And then the third category is feature requests. Like, I'd like to be able to do this, which we can't currently do.
My experience of feature requests, or of looking at the forums where different managed service providers ask people to go to request and vote on features, is that it looks a little hit and miss. Do you have any advice in terms of how to do that?
We have two paths here. Advice to whom, to users or to platforms?
Users. I'm thinking of people listening, mostly users.
Well it's a bad state right now. Again, I think managed services should stop hiding access. They should like, they build everything on top of open source. And they charge for operations and for like support, good, good, good. But hiding access to purely open source pieces, it's like, it sounds bullshit to me. Complete bullshit. Actually, it makes me angry even. So amazing, like yesterday I saw an article from Yugabyte.
Yugabyte suddenly... I feel it's like Tembo, which actually released a DBA AI, going outside of their platform. And Yugabyte did a similar thing. They went outside of their Metaverse product and platform, and they started offering a tool for zero-downtime upgrades, compatible with Postgres running on many managed service providers like RDS, CloudSQL, Supabase, CrunchyBridge and so on. And that's great. That's great.
They did wrong a little bit because they called things like blue-green deployments while it's not... They did similar mistake as RDS did, we discussed it, right? They, this...
I, yeah, but I saw your tweet about this and I'm going to defend them, because I don't think it's their fault. I think the problem is RDS broke people's understanding. Look, I'm...
Wait a little bit. I'm going there, exactly. I'm going exactly there. So blue-green deployments, according to Martin Fowler, who published an article 15 years ago, by nature must be symmetric.
We did an episode, remember?
Yes, exactly, criticizing the RDS implementation. And Postgres definitely supports it. We implemented this, and some customers use it. That's great. And my point is, probably Yugabyte hit the same limitations we hit. On RDS you cannot change things, it's not available. And since you don't have low-level access, you cannot change many things. And this limits you so drastically. And it feels like some weird pendulum key. If you want RDS, okay, good, I understand.
But you cannot engineer the best approach for upgrades. And you need to wait how many years? Okay, blue-green deployments, they released. I see better path for blue-green deployments. And it's my database and I cannot do it, I need to go out of RDS. At the same time, if they provided access, more access, opening gates for additional pieces of changes, it would be possible to engineer blue-green deployments for me or for third parties.
Like, okay, Yugabyte is this third party. They want to offer or sell some product or tool compatible with RDS, but since they don't have access to recovery target LSN and so on, they are very limited, right?
Yeah, but it might be exactly for that reason. If we're talking about the reasons for needing it, and one of the reasons is migrating off, migrating out, then you can see the incentives.
And for upgrades, things are becoming much better in PostgreSQL 17. And blue-green deployments, it's kind of not only for upgrades. If we eliminate the upgrade idea, we can implement blue-green deployments on any platform right now. Because you can skip many LSNs in the slot and just... How is it called? Not promote, because promote is different. I forgot.
Like shift position of logical slot and synchronize its LSN position with the position we need, and then from there we can already perform this dance with blue-green deployments. It's doable. But if you want upgrades, OK, we need to wait until 17, because there is low risk of corruption. You mean 18? 17. 17 has pg_createsubscriber CLI tool. And it also officially supports major upgrades on replicas, logical replicas.
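The function being reached for is not named in the conversation. One candidate, and this is an assumption on the editor's side, is `pg_replication_slot_advance(slot_name, upto_lsn)`, which moves a replication slot forward to a given LSN. The sketch below parses LSNs in their usual `hi/lo` hexadecimal text form, refuses to move a slot backwards, and builds the SQL text. The helper names, the slot name, and the LSN values are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN like '16/B374D848' into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def advance_slot_sql(slot_name: str, current_lsn: str, target_lsn: str) -> str:
    """Build the statement that moves a slot forward to target_lsn."""
    if parse_lsn(target_lsn) <= parse_lsn(current_lsn):
        raise ValueError("target LSN must be ahead of the slot's current position")
    return (
        "SELECT * FROM pg_replication_slot_advance("
        f"{sql_literal(slot_name)}, {sql_literal(target_lsn)});"
    )


if __name__ == "__main__":
    # Hypothetical slot name and LSNs, for illustration only.
    print(advance_slot_sql("upgrade_slot", "0/16B3748", "0/3000000"))
```

Comparing LSNs as integers rather than as strings matters, because the hexadecimal parts are not zero-padded in the text form.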
So yeah, these two powerful things give us a great path to upgrading really huge clusters using a zero-downtime approach. Well, near-zero downtime, unless you have PgBouncer. If you have PgBouncer, you have PAUSE/RESUME, and then it's purely zero downtime. Anyway, my point is, since they partially perform this vendor lock-in, they hesitate to open the gates.
Customers cannot diagnose incidents, and they also cannot build tools. And third parties like Yugabyte or, for example, Postgres.AI would probably also build some tools compatible with many other platforms. Not other, we don't have a platform, right? We help customers regardless of the location of their Postgres database. So if it's RDS, okay, CloudSQL, okay. But building tools for them is very limited right now, because we don't have access to many things and we don't have superuser and so on.
So yeah, that's bad. But back to support, if, like, my main advice is just gambling advice. Just gamble, guys.
Well, I have some. I think a lot of people have very high trust when they request features, or a very high belief that people will understand why they're asking for it. I think a lot of people don't include context when they're asking. They don't include why they want the feature, or what it's preventing them from doing, or what it might cause them to do if they can't get it, or what their alternatives are going to be.
So I think sometimes when you make products, people just ask for features and you have to ask them, why do you want this? What are you trying to do? Because without that context it's really hard to know which of your potential solutions could be worth it, or if it's worth doing at all. But most vendors I've seen just don't ask that question. People ask for a feature or, you know, a new extension to be supported or something, and even if that extension has multiple use cases, there's no question back as to why they want that feature. Like it's so...
Value, right? Goals.
Yeah, well exactly. And sometimes 5 people could want the same feature but it's all for different reasons and that's like...
Which shows bigger value if there are many different reasons.
Yeah, or maybe an issue. Like maybe it's actually less of a good idea because they're actually going to want different things from it. Like it's going to be harder to implement, unless it's an extension and you get them all straight away. But I think in terms of customers asking for things, I've not seen this work from managed service providers specifically, but for products in general, I think it is helpful to give the context as to why you're asking for something.
The only other thing I had to add from my side was if and when you are considering migrating to a managed service provider, so either at the beginning or when you've got a project up and running.
I see quite a few people on Reddit and other places at the moment looking at moving self-hosted things to managed service providers, you know, as they're gaining a little bit of traction. I've seen at least one case go badly wrong when the person didn't contact support at the beginning of the process. You know, they tried to do everything self-service, and actually it would have been helpful for them to contact support earlier. I think there are two good reasons for that.
One is to make sure the migration goes smoothly, but the second is to test the support out. How does it work for you? Is it responsive? What kind of answers do you get? Is it helpful, that kind of thing?
Yeah, we need to write some automation to periodically test all the supports using LLMs. I'm joking, of course. But I just know, it's your database. Even if you consider them cattle, like microservices (it's not a pet, it's cattle), you are still, being maybe a DBA, DBRE, SRE, back-end engineer, doesn't matter, very interested in taking proper care of the database and so on. And for support, you're just one of many. Your database is one of many.
And they also have their own KPIs. Like, your question is closed, okay, goodbye. And also like, okay, do this, and so on. And since we don't have access and so on, I just feel this is a big imbalance. If you ask support something, they can help. I saw many helpful support attempts, very helpful, very careful, but it's rare, right? And Postgres experts are also rare. Not many.
Right, yeah.
And this, like, closing off the ability for third parties to help. For example, if somebody involves us, we immediately say, okay, here you need to put pressure on their support, we cannot help.
Okay so what do you mean by putting pressure do you mean like following up regularly what do you mean by putting pressure on?
Opening, escalating and so on, explaining why. For example, a big company can have various support engineers. Yeah. And for example, if there is a hanging query on RDS, it's a recent little story. And they suddenly say, okay, we solved it. The query is not hanging. And I wonder, how come? It was hanging because it could not intercept the signal, blah, blah, blah. It was hanging for many hours. How did you manage it? OK, RDS support said, we managed to. OK. Did a restart happen?
Yes, it did. And in the logs, we see the signs of kill -9. So this is what the support engineer did. This support engineer should be fired, right? In the RDS team. This is my opinion, but I'm just saying it's hard to build a super strong support team, and it will always be lacking, and it would be great if the company allowed third-party people to help. If you check other aspects of our life, for example, if you have a car, or, recently I replaced the tankless heater in my house.
If you go to the vendor, sometimes the vendor doesn't exist. For example, my solar is very old, but anyway, you can find a variety of service people who can help and do maintenance. If a company, even RDS, is limiting maintenance to only their own staff, it will always be very limited, because Postgres expertise is limited on the market. They should find a way to open the gates. This is my... It's already a message to platform builders. What?
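As an aside on the "signs of kill -9" in the logs from the RDS story: when a backend is killed by a signal, the Postgres server log records the child process exit together with the signal number. Below is a small sketch that scans log lines for such entries. The sample line is illustrative (made-up PID), not taken from the incident, and the function name is hypothetical.

```python
import re

# Matches the part of the log message that reports a child process
# killed by a signal, capturing the PID and the signal number.
SIGNAL_EXIT = re.compile(r"\(PID (\d+)\) was terminated by signal (\d+)")


def find_signal_kills(lines: list[str], signal: int = 9) -> list[int]:
    """Return PIDs of server processes terminated by the given signal."""
    pids = []
    for line in lines:
        match = SIGNAL_EXIT.search(line)
        if match and int(match.group(2)) == signal:
            pids.append(int(match.group(1)))
    return pids


if __name__ == "__main__":
    # Illustrative log lines with made-up PIDs.
    sample = [
        "LOG:  server process (PID 12345) was terminated by signal 9: Killed",
        "LOG:  terminating any other active server processes",
    ]
    print(find_signal_kills(sample))
```

A SIGKILL of one backend makes the postmaster terminate the other sessions and go through crash recovery, which is why a restart showed up alongside it.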
Well, I mean, I understand where you're coming from as a...
I'm coming to, it's not from, it's to, it's the future.
I mean, I understand where you're coming from, that they can't hire all of them, and actually there's benefit in terms of other people being able to provide support. But if Postgres expertise is so limited, where is everyone else going to get their support from? It's not...
It's open market and competition.
Yeah, exactly. So you're saying there is plenty of Postgres expertise?
Well, the company should only benefit if they open the gates and allow other people to help while customers are still on the same platform. Because otherwise, concern and the level of disappointment about support can rise to the point where they leave. Which is actually probably not a bad idea. I also believe that slowly our segment of the market will start to realize that there is self-managed, there is managed, but probably there is something in between.
And I know some work on this is happening that I cannot share. So something in between, where you truly own it but still have the benefits of managed services. This should happen, and I think multiple companies are going in this direction.
Or, and I'm seeing this more from smaller companies, quite established in terms of their database and team but not necessarily brand new startups, moving to services while factoring in support as one of the main things they're looking for in a service provider. I think in the past people would look at a lot of things, right? Price, ease of use, region. Yeah, they look for a bunch of features but don't always factor in support as one of those key factors, and I like to see when people do factor that in and take it seriously. So that's the alternative, right? Pick your managed service provider partly based on how good their support is.
I'm talking about an absolutely new approach, where a service is not self-managed, not managed, but very, very well automated, and if you're not satisfied with the company that helps you maintain it, you can switch the provider of this maintenance work, right? This should be
like co-managed.
Yeah. Co-managed. Yes, exactly. It's great, because the market is growing and competition is growing, and we see, like I just provided a few examples about several managed services, we see bad examples all the time, and the problem is systematic. It's not just that some company is bad and others are good, or vice versa. It's a systematic problem rooted in the decision to close the gates and not allow others to look inside.
I also think providing good support is expensive. Deep Postgres expertise is expensive. I'm a bit surprised by your experience with escalation. Most companies I see do have escalation paths, but I don't deal with Postgres managed service providers support that often. So I'm surprised to hear they don't have good escalation paths. But yeah, if that's the case, I feel like there must be opportunity for people. And I know some do really.
I have a question about this area. You're also running something on cloud, GCP, right? Yeah. Cloud. Do you have Kubernetes? Yeah. You use it, okay. So you use GKE, right? Yeah. Google Cloud Engine or Cloud Kubernetes Engine, right? Yeah. So if you go to Compute Engine, they call it Compute Engine, right? Where you can see VMs. Yeah. Do you see the VMs where this Kubernetes cluster is running? I guess yes.
See the pods and the, yeah, see the pods.
No, not the pods, the VMs. Can you SSH to those VMs?
Oh, I have a shell, yeah.
So Google provides Kubernetes Engine, automation, everything, and you still have SSH access. Yeah. Why can't the same thing be done for managed Postgres?
Oh, okay. Yeah. Well, good question.
If you have SSH access, well, you can break things. Well, okay, I know. I know. If I open my car, I can break things there as well. This is interesting, right? I know companies who provide services to tune, maintain Kubernetes clusters. And this is a perfect example, because for them, There is great automation from Google. Everything is automated.
But if customers have specific needs, and Google cannot meet those needs because they still have a limited number of hands, right, and attention, and so on, the company can hire another company who are experts in this particular topic. They can go in, and they have everything, and they have SSH access to this fully automated thing. Interesting, right?
Yeah. Well, any last advice for people, like actual users?
Well, yeah, I know I'm biased towards platform builders because I'm upset and angry, and I hope I explained the origins of my anger. But yeah, put as much pressure on support as possible. Politely, but very firmly, and explaining. You had a great point that reasons and final goals need to be explained, right? And also risks, like what will happen if we don't achieve this? Sometimes up to, okay, we'll consider switching to a different approach, provider or something.
Yeah, I think just like people should be more like detailed and putting more pressure to support to squeeze details from them. In this case, I'm very interested because many like managed Postgres users come to us more and more recently and they ask for help and if support is doing their job great, it helps us as well because yeah, it's like beneficial for all because we help to level up health of Postgres clusters, get rid of bloat, add some automation, tuning and so on.
But if support does a poor job, well, the customer starts looking in a different direction, at where to migrate, right? And yeah, so my advice to users: pressure, details and so on, towards support.
Is there anything to be gained in the cases where they give exceptional support? You mentioned rare cases where the support is very good. Is there anything that we can do in those cases to, like, not just say thank you, but say this was really good, or give feedback that this is what I liked?
What I liked a lot is when support engineers formatted responses very well, and I knew it was not an LLM, actually, or maybe partially, but there was a human behind it for sure, because I saw it. Well, actually, who knows these days, right? Yeah, and in this case I would say thank you for the well-formatted, well-explained, well-structured response and so on. Definitely. So you try to find good things and mitigate my anger, calm me down. Thank you so much. Thank you for everything.
Well, it's good. It's interesting. I found this one interesting. Thank you.
Yeah, less technical discussion today, but I hope it provokes some thoughts a little bit. I think changes are inevitable. I'm very curious in which direction the whole market will go eventually, but let's see.
Me too.
Good.
Well, have a good week and catch you next time.
Thank you. See you.
