PostgreSQL Server Programming - podcast episode cover

PostgreSQL Server Programming

Jul 23, 202533 min
--:--
--:--
Download Metacast podcast app
Listen to this episode in Metacast mobile app
Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episode description

Focuses on advanced programming within the PostgreSQL database server itself, moving beyond simple SQL queries. The text covers various topics, including writing functions in different procedural languages like PL/pgSQL and PL/Python, creating and managing triggers, and debugging techniques. Additionally, it explores extending PostgreSQL with C functions and scaling databases using PL/Proxy for data partitioning, highlighting the advantages of server-side programming such as performance, ease of maintenance, and security. The excerpts also provide details about the authors and reviewer, along with information on Packt Publishing's customer support and philosophy.

You can listen and download our episodes for free on more than 10 different platforms:
https://linktr.ee/cyber_security_summary

Get the Book now from Amazon:
https://www.amazon.com/PostgreSQL-Server-Programming-Hannu-Krosing/dp/1849516987?&linkCode=ll1&tag=cvthunderx-20&linkId=3c63881fdf24c1d94c6bdcf795d51989&language=en_US&ref_=as_li_ss_tl


Discover our free courses in tech and cybersecurity, Start learning today:
https://linktr.ee/cybercode_academy

Transcript

Speaker 1

Okay, let's unpack this. If you think of Postgres School server is just you know, place to store data like a big passive bucket, you're missing out on a tremendous amount of power. Today we're diving into Postgres School server programming. We're drawing from some really insightful excerpts from the technical book Postgreschool Server Programming. Our mission really is to show you how Postgres School can actually be seen as an application development framework.

Speaker 2

Now.

Speaker 1

Pushing logic directly to the database layer offers well shortcuts to building faster, more secure, and highly maintainable applications. Yeah.

Speaker 2

And what's fascinating here is we're not just talking about database administration and the traditional sense, not at all. We're exploring how the database itself can become like a proactive, intelligent part of your application's core logic.

Speaker 1

Right.

Speaker 2

It stops being just passive storage and starts being an active participant, your single source of truth for business rules. You know, hashtag tag the transformative power of server programming.

Speaker 1

So let's challenge that whole data bucket philosophy. Yeah, imagine a common situation like a bank transfer. Say Bob wants to send I don't know fourteen dollars to Mary h The typical way you might write separate sequl commands from your application, right, want to take money from Bob a debit, another to give to Mary a credit? But I mean think about the problems there at first security, Do your app really need to know Bob's balance or just can't he send the money?

Speaker 2

That's a big one.

Speaker 1

Then speed, you've got multiple round trips to the database slows things down. And the big one integrity. What if the system crashes halfway through, Bob's money is gone, Mary never gets it.

Speaker 2

Nightmare scenario exactly, And that's precisely why server programming is such a well a game changer. By executing that transfer logic directly on the database server, you're centralizing it. It's not just tidy code. It's a fundamental shift. You're putting the rules in the most robust place, the database itself. It becomes this active.

Speaker 1

Partner active partner.

Speaker 2

I like that, and it cuts out all those network round trips, so it's way faster. You write a function. Maybe in plpg's it does all the checks, does Bob exist enough funds, is Merry valid? And it just returns okay, or you know, not enough funds. The client app doesn't need the details, just the success or failure. No sensitive data leakage.

Speaker 1

That is incredibly powerful for security, for consistency, yeah yeah, But what if you want the database to react automatically like inventory management, stock levels, change offers, need updating instantly, no app involved triggers.

Speaker 2

That's exactly what triggers are for magic pretty much. They let operations in the database automatically cause other things to happen. So say you've got your fruit sinstock and your fruit offer tables, A trigger can make sure that when you update an offer amount for apples, uh huh, the reserved count in fruits in stock just adjusts automatically. It stops you over reserving, stops you selling stock that's already spoken for, even across different tables. It enforces that link.

Speaker 1

Okay, that's clever.

Speaker 2

And building on that automatic reaction idea, triggers are fantastic for auditing, indispensable. You can set one up to log every single action on important tables like imagine a salary's table. An audit trigger can grab the username, the timestamp, the operation type, insert, update, delete, and even capture the old salary and the new salary. Maybe store the whole old and new row as JSON for a complete record. It's

a rock solid audit trail. It's automatic, totally automatic. Plus triggers can enforce data consistency, simple things like automatically converting an employee name to uppercase whenever it's inserted or updated. Ensures uniformity. You know, no need to rely on every client app remembering the rule.

Speaker 1

So functions, centralized logic, triggers automate reactions. But what about teaching postcress new tricks entirely? What does extensibility really mean here? Is it just more functions?

Speaker 2

Oh, It's much more profound than just functions. Extensibility means you can actually extend postgres school itself. It's language, it's capabilities. You can define your own custom data types like uh fruity maybe that understands bushels. And you can teach it custom operators like how to compare apples and oranges directly in a query? H seriously, well metaphorically maybe, But you can define operators for your custom types. And yes, user

define functions. UDFs are a huge part, and these functions can get sophisticated. They can handle complex arguments like XML or JSON or return results in those formats too great for modern architectures like SOA or micro services.

Speaker 1

Right makes sense, and you mentioned performance earlier. Extensibility ties into that too, especially server side caching.

Speaker 2

Absolutely, server side casing is a classic pattern enabled by functions. The logic is simple. Really, inside your function, first check if the value you need is already cashed, maybe in a table. Is it there, is it recent enough? If yes, great, return it. If not, compute it, store it in the cash, then retun it.

Speaker 1

So next time it's fast.

Speaker 2

Exactly. Think about calculating total company sales that might involve querying thousands of rows cash the result. There's this great story about a Java e email system. It was sending maybe a few hundred emails a second, okay, but not amazing. Then someone rewrote the email generation part as ap peel Peril function inside.

Speaker 1

Postgres school, inside the database.

Speaker 2

Inside the database, suddenly it wasn't sending hundreds, It was spewing out tens of thousands of emails per second.

Speaker 1

Wow, that's that's incredible.

Speaker 2

It was all because the computation happened near data minimal network latency. It went from like a two point two millisecond query from the client to zero point one to two milliseconds inside the database. Huge difference.

Speaker 1

That is a massive jump, and you can see how that ties into maintenance and security too. If all your core data logic is in database functions or views, upgrading is just running a script, a detail.

Speaker 2

Script, YEP, data definition language. You update the function in the database.

Speaker 1

No downtime, no complex roll out across all your different.

Speaker 2

Client apps, exactly. All clients just start using the new logic immediately. It simplifies maintenance enormously. It's security big win there too. If you grant database users permission to execute only specific functions, not direct access to tables, then even if an attacker compromises your web server, say, they can't just run select from users and dump all your data.

Speaker 1

They can only run the functions you've a looked.

Speaker 2

What you've designed to be safe right to only return specific non sensitive info or perform validated actions. It drastically reduces the attack surface hashtag tag tech trag y. Postgress School the environment advantage.

Speaker 1

So, okay, putting lodging in the database is powerful, we get that, but why postgres school specifically, what makes it the go to platform for this kind of server programming.

Speaker 2

Well, first off, the cost of acquisition.

Speaker 1

It's zero. Can't be free, right.

Speaker 2

Compare that to commercial databases SEQL server Oracle. You're talking potentially thousands, even tens of thousands in licensing fees. Postgress will costs nothing to download, nothing to use, and this directly leads to a low cost of failure. When you're starting a project things are uncertain, you might experiment change direction. Being able to try things out, even scrap an idea without being locked into expensive long term licenses, that's a huge advantage. It frees you up to.

Speaker 1

Innovate definitely, and that ties into licensing more broadly, doesn't it. I remember the unease when Oracle bought my sequel that vast and thunderous sucking sound.

Speaker 2

Does someone put it huh, yes, exactly. There is a lot of uncertainty about the future of my squels licensing. Postgres School's license, though, is incredibly liberal bs D style. Basically, you can have the source code, do anything with it that you want, redistribute it however you jolly well please, and those rights extend indefinitely.

Speaker 1

Wow.

Speaker 2

For companies, that removes massive legal risk. You know, the terms won't suddenly change. You know, you can build products on it without unexpected fees popping up.

Speaker 1

That's huge peace of mind.

Speaker 2

Then there's predictability. Postgreds School is known for sticking really closely to SQL standards. Now, sometimes that might feel a bit stricter, maybe less forgiving than other databases if you write slightly nonstandard sequel, but the upside is fewer unintended behaviors, fewer surprises. You know it's going to work the way the documentation says it works. That means less chance of your application breaking in subtle ways down the road because

of some weird database quirk. It provides a really stable foundation. You can trust it.

Speaker 1

Okay, stability, predictability, Yeah, what else? You mentioned something about the community earlier.

Speaker 2

Ah, yeah, the community aspect is actually really interesting and often overlooked. With commercial databases, trying to talk to the actual core developers, good luck, you're usually dealing with layers of.

Speaker 1

Support stat right ticketing systems.

Speaker 2

Exactly with postgres School, many of the core developers hang out on IRC channels, they go to conferences, you can actually talk to them.

Speaker 1

And they like beer apparently.

Speaker 2

Well the book mentions that, But the point is there's this direct connection. It fosters a real sense of shared ownership and concern for the project's health. Expert help is often just a question of a way. It's a very vibrant ecosystem.

Speaker 1

That's a really different feel.

Speaker 2

Definitely, and technically a huge advantage is its support for multiple procedural languages. It's not just plpgcquall the built in one. You can write server side functions in Python, Perl, TCL, even C.

Speaker 1

And you can add or remove these languages on.

Speaker 2

A running server, no downtime needed to install, Say peel Python support and think back to our earlier example. The bad way of doing things maybe with PHP connecting, pulling data, figuring things out client side, then sending an update back.

Speaker 1

Right, the multiple roundtrips, the potential integrity issues exactly.

Speaker 2

That leads to terrible scalability. The right way is pushing that logic into a database function using one of these powerful languages. A single efficient.

Speaker 1

Call makes sense. What about transactions? How does post grace handle those?

Speaker 2

It's very sensible, very safe. By default. The standard isolation level is read committed me, meaning your transaction will only ever see data that has been fully committed, fully saved by other transactions. You don't get dirty reads seeing incomplete changes that might get rolled back later. That avoids a whole class of bugs.

Speaker 1

That sounds safer it is.

Speaker 2

And by default, each individual seql statement you send acts as its own little transaction. It commits immediately, though client tools like fusequal often wrap things in a transaction block automatically. If you're typing multiple commands okay, and one more thing on control postgrescool gives you more control over how your functions execute, specifically around security, their security and voker. That's the default. The function runs with the permissions of the

user calling it, but there's also security definer. This is really useful. The function runs with the permissions of the user who created it.

Speaker 1

Ah, so you could let a less privileged user run a function that needs higher privileges for a very specific controlled.

Speaker 2

Task, precisely temporary safe privilege escalation. They can't access the underlying table directly, but they can call the function which performs a specific safe action on their behalf. And you can also define a cost for a function. This is like giving a hint to the query planner, the.

Speaker 1

Part of the database that figures out how to run queries.

Speaker 2

Yeah, you can tell it. Hey, this function is really expensive to run, so the planner will try to avoid calling it too often if it can find a cheaper way to get the result. Helps optimize overall performance. Hashtag hashtag deep dive into language capabilities and advanced techniques.

Speaker 1

Okay, let's dive into some of the actual coding. Then the warkhorse so PLPG SQUL you called it the workhoorse. It's built in, no extra setup needed. What makes it so fundamental for server side logic and postgress.

Speaker 2

It's just incredibly well suited for combining SQL with procedural logic, you know, if statements, loops, variables. It's great for encapsulating those multi step operations we talked about, like the bank transfer. You can access function arguments easily, either by their position or by a name. You've got your standard of penols, your case. Statements for conditional logic loops are there too. But a really important warning. Looping through query results inside

plpg'sl Generally that's considered doing it wrong. It's usually very inefficient, high processor cost, high memory use. SQL is designed for set based operations. You should almost always try to find a way to express your logic using SQL sets rather than row biro looping in plpg sql let the database engine optimize the set operation.

Speaker 1

Good tip. Okay, what about getting results out of functions?

Speaker 2

Right? You use perform if you just want to execute a query for its side effects like an insert or update, and you don't care about the result. If you do need the result, you use select into to put it into a variable.

Speaker 1

Makes sense. And you can return more than just single values right like lists or tables?

Speaker 2

Oh yeah, absolutely, you can return a set of integer like generating the Fibonacci sequence. You can return set of table name, which effectively returns rows matching that table structure. You can define functions with out parameters which become part of the result row, or use the very clear returns table syntax to define the output columns right there.

Speaker 1

What if the structure isn't known beforehand?

Speaker 2

Good question. You can use return set off record, but then you have to provide a column definition list when you call the function. More flexibly, you can use polymorphic types like set off an element or set off any compatible for functions that can work with or return different but related types. And there's variatic arguments, which lets a function accept an arbitrary number of arguments of a certain type past as an array super flexible.

Speaker 1

You also mentioned cursors briefly before for large results.

Speaker 2

Right reef Cursors specifically, cursors are internal structures that let you fetch query results incrementally lazily, instead of the database computing the entire results set and sending it all at once, which could take ages and lots of memory. For huge queries, you get a pointer kinda yeah, a reference. Then your application can fetch rows from that cursor as needed. It's great if you only need the first few rows quickly, or if you want to return multiple independent result sets

from a single function. Call PLPGCQ trigger functions Automated guardians no.

Speaker 1

Get back to triggers. These automatic guardians you said, they get old and NW records. How does that actually work in the code.

Speaker 2

So when you write a trigger function one that returns trigger postgresscool automatically makes special variables available inside it. Old is a record variable holding the row values before the operation for update and delete. New holds the values after the operation for insert and update. And you also get tg variables like TGP tells you if it was an insert, update or delete tg table name gives you the table name and so on. Lots of context, so.

Speaker 1

That audit trigger example, it uses those exactly.

Speaker 2

It would check TGOP. If it's an insert, it logs the new record maybe converted to Jason. If delete, it logs old. If update, it logs both old and new, along with the username from current user timestamp now table name and operation type creates that perfect audit law.

Speaker 1

Very cool, and you mentioned preventing actions.

Speaker 2

Yeah, you could have a simple trigger function that just raises an exception if t GOP is delete or truncate on certain critical tables, just stops the operation cold or more subtly. In a before trigger one that runs before the change is actually made, you can modify the new w.

Speaker 1

Record uh like the upper case name example.

Speaker 2

Precisely we're setting timestamps. A common pattern is a function that sets create at and created by only on insert, and updated at and updated by on update, ensuring those created fields are immutable after the initial insert. But again, the reminder triggers are powerful, but use them judiciously. For complex application logic, they can make debugging harder because the logic isn't always obvious in the main application code. Keep them focused.

Speaker 1

If you can seeing inside.

Speaker 2

That debugging point is key. If you've got complex logic hidden away and functions and triggers, how do you troubleshoot when things go wrong. It's not like setting a break point in your Java or Python code.

Speaker 1

Is it. It's definitely different. The simplest approach is what you might call manual debugging. Using raise notice, you just sprinkle raise notice reached point a with value my variable throughout your code. The messages show up in your client like psdes col like print statements.

Speaker 2

Basic.

Speaker 1

It's exactly simple, no installation needed. Great for checking values in regression test scripts. The downside is it can clutter your output if you leave them in. For actual error handling, you use reez exception. You can provide a custom error message and even a specific SQL state code which client applications can catch and react to programmatically. Okay, And if you want messages to go to the server's log file instead of the client maybe for background tasks or just less noise, you use.

Speaker 2

Raise luck right, So print debugging essentially, what about stepping through code?

Speaker 1

For that? You need the visual PLPG SQL debugger usually integrates with GUI tools like PG admin once it's set up, and the setup can be a bit of a pain. Fair warning is pretty powerful. You can set break points in your plpg sql.

Speaker 2

Code like in a normal id YEP, step.

Speaker 1

Through line by line, step into, step over, inspective values of variables, even change variable values on the fly to test different scenarios. You can also set global breakpoints like stop whenever any code calls. This specific function really handy for tracking down unexpected.

Speaker 2

Calls and the advantages.

Speaker 1

The big advantage over raised notice is it doesn't use server resources when you're not actively debugging, and you don't need to modify your function code with print statements. The disadvantage that installation process can be fiddly. Using unrestricted languages expanding horizons PL Python.

Speaker 2

Okay, let's shift gears beyond PLTG squel. You mentioned unrestricted or untrusted languages like pl python. That untrusted label, why is it there? What's the deal? Right?

Speaker 1

The untrusted language is pl python, PEEL, perileu, PLTCLU, peel, JAVAU. They're called that because they break out of the database sandbox. They can by default access the server's file system, make network connections, send signals to other processes, basically do anything the underlying language, Python, Perol, etc. Could do on the host operating system.

Speaker 2

Ahotential security risk.

Speaker 1

Huge potential security risk. That's why you absolutely should not grant permission to create functions in these languages to arbitrary database users. You don't fully trust. They could potentially read sensitive files or disrupt the server.

Speaker 2

Got it, so why use them?

Speaker 1

Power and convenience, Python, for example, has vast libraries for everything imaginable. It's often much easier and requires less boilerplate code to do complex text processing, interact with web services, or use external libraries and peel Python that it would be in plpg sql or especially C plus. Many developers are already comfortable with Python and peel Python handles the data type conversions between postcrescool types and Python types pretty transparently for you.

Speaker 2

Okay, so if I'm writing a pel Python function, how do I run SQL queries from within that Python code.

Speaker 1

Peel python provides a dedicated module usually accessed as plea high. It has functions like peelpi dot execute to run arbitrary SQL queries. It also has peelpi dot prepare, which lets you create a prepared statement plan. This is really important for performing. So if you're running the same kind of query multiple times, maybe just with different parameters.

Speaker 2

Prepare ones execute many times exactly.

Speaker 1

Preparing can be costly, so caching that plan makes subsequent executions much faster. And when you're writing a trigger function in peel Python, instead of old and new to W variables, you get a Python dictionary usually called TD, which contains all the trigger context td ol, td new, tdvn td, win, et cetera. Right, what about dynamic sqel building queris on the fly? Huge risk of seql injection There absolutely critical point.

You should never just concatenate strings together with user input to build SQL queries in any language, including PL Python. Plupy provides functions specifically for safe quoting Peelpi dot quote item for table or column names, peelpi dot quote literal for string values, and peelpi dot quote nullable which handles nls correctly. Always use these when incorporating external data into dynamic queries. It's your primary defense. Good advice. How does

error handling work within PL Python? Does it interact with database transactions? You use standard Python t dot accept blocks to catch errors that might occur within your Python code or from SQL queries executed via PLI. For transactional control within your function, pl Python provides PLPI dot subtransaction. This

lets you create save points. You can wrap a block of code and a subtransaction context manager, and if an exception occurs inside that block, only the database changes made within that block are rolled back ah.

Speaker 2

So you can contain failures exactly.

Speaker 1

It's important to note, though, that this only rolls back database changes. Any changes to your Python variables within that block will persist unless you explicitly handle them in your accept clause. Python's atomicity is separate from the databases.

Speaker 2

Right and debugging pl pythons still no visible debugger I guess.

Speaker 1

Not built in no, so you fall back to similar techniques as with plpg sqel, but using pythons tools dot PLIPI dot notice, PLPI dot warning, pelpi dot error to send messages standard Python A certain statements are useful too. You can also redirect pythonsys dot sit out and syst dot sitch a dor within your function and to send output to the postgrec will server log file, which could

be helpful. But the real power here is doing things you just wouldn't normally think of doing inside a database, generating image thumbnails using Python's PIL library, sending emails using SENTIMENTPLIP, Interacting with external APIs all possible directly from database functions. With pl python. It really encourages thinking outside the SQL box, writing advanced functions and c raw power.

Speaker 2

Okay, peel python opens up a lot, But then there's C writing functions directly and C that sounds like the ultimate level of integration and probably complexity. When you absolutely need C, you go to C.

Speaker 1

When you need the absolute bare metal performance, or what do you need to interact with Postgrescol's internal APIs at a level that interprets languages just can't reach. Think extremely computationally intensive algorithms or functions that need to process millions of rows per second with minimal overhead, or maybe tightly integrate with operating.

Speaker 2

System features, and how does it work? Is it complicated?

Speaker 1

There's definitely more ceremony involved than with plpg sql or peel Python. You need to include specific postgress will header files postgress dot HfN gr dot H. You need a special macro pg modumi magic to identify the compiled object as compatible. Each C function needs to be declared with PGF function in ov one and to use specific C macros like pg targin thirty two to get arguments and

pg turn in thirty two to return values. Some more boilerplate, A bit more, yes, but the built process is actually quite streamlined now thanks to pdxs posts wll's extension building infrastructure. You typically just write a simple make file, run make and then pseudo make install and postcress will knows how to compile and link your C code correctly. Then you just create function and SQL to link it.

Speaker 2

Okay, not too bad. What about advanced features in C functions? Handling NL's, For.

Speaker 1

Instance, C requires explicit handling of nls. Unlike plpg sql, you use the pgr nl macro to check if a specific argument is NLL before you try to access its value with pg turner, and if you need to return NL you use pgr turnal forgetting. This as a common source of crashes, C is less forgiving.

Speaker 2

What about very attic arguments.

Speaker 1

Those arrive in your CE function as a Postgres school array type. You need to use specific C functions to deconstruct that array, getting its dimensions, checking its element type, and then iterating through the elements. A bit more work than in popg sqel.

Speaker 2

Memory management must be crucial in SEA leaks.

Speaker 1

Crashes absolutely vital. You must use postgrescols memory management functions primarily pallic and free, not standard Sea mallek and free. Palic allocates memory within postgres School's current memory context. The beauty is that memory allocated in shorter lived contexts like the protuple or perfunction call context, is automatically freed when that context is destroyed. This prevents many common memory leaks.

Ah that's clever it is and always zero fyll allocated structures using memset to avoid subtle bugs from uninitialized memory. For air reporting, you don't just print. You use postgres School's air function dot airport is very structured. You specify at the severity level error warning notice, log et cetera, and soul state code, primary message detail hint. It handles localization and routes the message correctly to client or log

error level aborts the current transaction. Importantly, unlike exceptions in plpg sql, air report messages are generally sent to the client immediately, not buffered until the end of the function.

Speaker 2

Okay, and how do these C functions run SQL queries back into.

Speaker 1

The database that's done using the SBI, the server programming interface. Inside your C function, you call SBI connect that you can execute queries using functions like spy ASEC or SPIKESIGI plan for prepared statements. You process the results, and critically, you must call SBI finish before your C function returns. SBI handers transaction context correctly, but understanding data visibility rules.

What changes your function can see from other transactions or even earlier commands within the same transaction is important for complex scenarios, so.

Speaker 2

C offers the ultimate power what other advanced things become possible.

Speaker 1

You can do some really sophistic icated stuff. Write extremely high performance triggers in C for logging changes with minimal overhead. Register C functions to be called automatically on command or on rollback of a transaction. Great for cleanup or post commit actions. You can even interact directly with postcres schools shared memory and use lightweight locks for synchronization between different

back end processes, enabling complex interprocess communication managed by the database. Honestly, the best way to learn advanced SEA extension techniques.

Speaker 2

Is let me guess.

Speaker 1

Use the source looke aha exactly. Look at the code for Postcrescol's own contrip modules. They cover a huge range of examples, from custom data types to procedural languages. It's all there, open source, ready to be studied, hashtag tag tax stealing your database with po proxy and polishing extensions.

Speaker 2

Okay, let's shift focus again. We've built these powerful functions, maybe even in c our single server is humming. But what happens when the load gets truly massive? How do you scale postcrescool horizontally?

Speaker 1

Right? So you've optimized your single server, added indexes, tune queries, maybe even us some of those c functions, but your database is still creaking under the load. Maybe CPU is pegged at eighty percent plus four hours a day. Traditional scaling options have limits buying bigger server Vertical scaling gets expensive. Fast master slave replication helps with read loads, but not write loads. Multi master replication adds complexity for really heavy workloads,

especially right heavy ones. Horizontal partitioning or scharding becomes the logical step. You split your data across multiple independent postgress wile servers partitions.

Speaker 2

How do you manage that split? Doesn't that make the application logic way more complex?

Speaker 1

It can if you're not careful that's where proxy databases come in. The application talks to a proxy layer which knows how to route queries to the correct data partitions. This keeps the partitioning logic transparent to the application itself.

Speaker 2

Okay, let's ground this. You mentioned a chat application example earlier. Yah, how did optimization play out before even thinking about charting?

Speaker 1

Yeah, the example started with simple user info and message table and basic functions user log in, send message, et cetera. Initially, fetching just seven new messages took something like seven or and sixty milliseconds pretty slow. But then just by adding the right indexes on foreign keys and message time stamps, and crucially setting the fill factor on the message table to allow for hot t.

Speaker 2

Updates, stay shot TA updates, keep.

Speaker 1

Only touple updates. It's a postgres will optimization where if an update doesn't change any index columns, postcres can sometimes update the road directly in place on a data page without needing a new index entry. It massively reduces right overhead for certain update patterns. Just those changes brought the query time down from seven to sixty meters to under three meters, and then with further tuning down to about one point nine meters, a huge improvement without adding any servers.

Always optimize first.

Speaker 2

Good point. But okay, say we do need to shard. How do we split the data?

Speaker 1

You need a partitioning key. If you have integer primary keys, a simple modul operation like user in three bitwise and d with three equivalent to modulo four for positive numbers can distribute users across four partitions. If your key is text like a username, you typically hash it first hashtext username and then apply the modular to the resulting integer hash for even distribution. And to manage the routing from the proxy to these partitions, postgrescool has a specialized tool, peel proxy.

Speaker 2

Another procedural language YEP, but this one isn't for general logic.

Speaker 1

It's specifically designed for database charding. You install like any other language, create extension pal proxy.

Speaker 2

How does it syntax work? How do you tell it where to send in the queries?

Speaker 1

It's quite declarative. Inside a peel proxy function, you use connect to specify the connection string for a remote database partition. You can group these connections into named clusters. Then the core is the r on clause you tell it which partitions to run the remote function on. This could be based on a routing function partition, part funk user, or you can specify any run on any single partition, maybe for reads that could be anywhere or all, run the

function in parallel on all partitions. Perhaps for broadcasting a command. You also specify the remote function call using select or target.

Speaker 2

Okay, it seems logical. You mentioned parallel execution with run on all.

Speaker 1

Yeah. And there's also the spli statement, which is really powerful for distributing bulk operations. Imagine a function create users that takes a raise of usernames, passwords, and emails. You can write a pl proxy function that uses SPLT to take these arrays, figure out which partition each new user belongs to based on their username hash, and then efficiently send batches of users to the correct partitions in parallel. Very slick.

Speaker 2

How does pl proxy know about the cluster members?

Speaker 1

Two main ways. The older way uses special configuration functions like plproxy dot get cluster partitions that you define to return the connection strengths. The more modern and generally recommended way is to use sql MED, the standard SQL management

of external data features. You create foreign data wrapper pull proxy, then create server for each partition and create user mapping to store credentials securely pl proxy, then queries these standard caps objects define the cluster configuration right.

Speaker 2

And migrating existing data.

Speaker 1

Usually involve setting up the schemas on the partitions, then carefully copying data row by row or chunk by chunk, using the partitioning key to determine the destination. It could be complex for live systems. It's careful planning.

Speaker 2

Okay, so we've staled out. Now let's talk about sharing all this cool stuff. We might have built custom functions, types, operators. Why package them as a postcrischool extension. What's the benefit over just shipping seql scripts?

Speaker 1

So extensions are a massive improvement over the bad old days of just providing dishquare yell files like the old contrip modules.

Speaker 2

Why were those bad?

Speaker 1

They had no concept of dependencies, no versioning, no clean installation, and crucially no upgrade path. If you install version one of a script and a version two came out that changed a function signature or a table definition, you often had to manually drop everything from version one, losing data, and then load version two. Painful yikes. And extension solves all this. It's a formal package containing all related sequl objects, functions, tables, views, types,

operators can figuration parameters. To install it with single command create extension my extension. Postgrescool tracks the installed objects as belonging to that extension and the killer feature upgrades. If you release version two point oh you provide an extension one point two point oh dot s well update script. Users just run ulter extension my extension update and postcrescool

applies the necessary changes cleanly, preserving data. Extensions can also declare dependencies on other extensions.

Speaker 2

That sounds much much better. How do you build one again?

Speaker 1

You typically use the PGXS build system with a simple makefile. It handles compiling any c code and packaging the SQL files correctly. Pseudo make install puts it where postgrescool can find.

Speaker 2

It and sharing it.

Speaker 1

You publish it to the Postgres School Extension Network PGXN PGXN dot org. There are tools like pgx and utails to help package your extension correctly. You need a metada adjacent file describing your extension, its licensed dependencies, and crucially it's version number. Using semantic versioning, this allows users to easily search for, download and install all extensions using clients like PGX and client. It really makes postgress will incredibly

extensible and inviting for the open source community. Compared to other databases, this standardized, robust extension system is well pretty revolutionary. It makes it easy for anyone to extend postgreds to do almost anything they want. Hashtag taghtag outro eerro.

Speaker 2

Wow, we've covered a ton today. We really have taken a deep dive into POSTGRESSWIL server programming. It's clear it transforms the database from just you know, storage into a really dynamic application development framework. From the basics of peelpgseql, through the power and risks of Peel, Python and see

and even scaling out horizontally with Peel proxy. The common thread is pushing logic closer to the data, and that leads to apps that are faster, definitely more secure, and easier to maintain.

Speaker 1

Absolutely, and we just sort of zoom out for a final thought. In a world that's increasingly driven by data, really understanding and using these service side capabilities isn't just another technical skill. It's more like paradigm shift, you know, a different way of thinking about where your critical logic should live, which maybe raises a question for you listening

right now. What untapped potential, what hidden power are you overlooking in your current database solutions by treating them as just simple data buckets

Transcript source: Provided by creator in RSS feed: download file
For the best experience, listen in Metacast app for iOS or Android