jOOQ

Michael

00:00

Hello, hello and welcome to Postgres.FM, weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. I'm joined as usual by Nikolay, founder of Postgres.AI. Hey Nikolay.

Nikolay

00:08

Hey Michael. How are you?

Michael

00:10

Nope, not falling for that. And we are joined by a wonderful guest, Lukas Eder, founder of Data Geekery and the creator of jOOQ, a very popular framework. Thank you so much for joining us, Lukas.

Lukas

00:23

Thank you for having me, Michael.

Michael

00:25

It's our pleasure. So, how do you describe jOOQ and what's it helping folks with?

Lukas

00:31

So, jOOQ is an internal domain-specific language written in Java where it models the SQL language directly as a Java API. This helps Java developers to write SQL queries more efficiently and more correctly within their Java applications. There's a code generator that generates the entire schema as Java objects and you can interact with those objects and create type safe queries. That's the base of it.

Michael

00:55

Yeah, it's super cool. But I'm super impressed with how much you focus on developer experience in general. I hear quite a few people say they really love jOOQ and they give about 15 different examples of things they love about it. But what are the main things you think that people particularly like about it?

Lukas

01:11

Well, the first thing is, and I've studied the market extensively before making jOOQ and the first thing is it really looks as much as SQL as it's possible to do when you model an internal DSL API in Java. So, of course, there's some limitations. But when you have in mind there's a group by clause, you just start typing group by where you expect it, and then it just starts compiling. So there's no surprises. There's almost no surprises in terms of SQL syntax.

01:38

So if you know SQL, then you immediately know the jOOQ API. You don't really have to learn it. I guess that's 1 of the most popular things, But other than that, it's also very nicely integrated into the Java language or Kotlin and Scala, that works as well. And if you've ever coded PL SQL in Oracle or PL/pgSQL in Postgres, You kind of like the way how the SQL language is embedded in the procedural language.

02:04

So if you loop, if you iterate over a result set in those procedural languages, you have the same kind of type safety and you have the possibility, for instance, to insert bind variables into your statements at the right spot, and you don't have to worry about the infrastructure and the logistics of connecting to the database, and it just feels embedded.

02:24

And I think that's what a lot of people really want to do, even more than having the type safety, is this feeling of embeddedness into the target language. And when you map your result sets to Java objects, that also feels very native. So it feels like the database is part of the Java of the JVM.

Michael

02:42

Yeah. Even to the point of getting like auto-complete suggestions in IDEs, but like that is super nice.

Lukas

02:50

Yeah. There are little details like this. So when you say autocomplete, there's also possibility to comment on your database tables and comments and that translates to Java doc. So whatever you comment directly in the database, it's translated to documentation inside of your job program you hardly ever have to go to the database to see what's there so.

Michael

03:10

Nice.

Lukas

03:11

Yeah.

Michael

03:12

In terms of the history how like I did look it up in or at least the GitHub history How long have you been working on it and how does that feel?

Lukas

03:20

I think the first public version started in 2009. There were prototypes before that in 2008 I think but 2009 was the first public version, not on GitHub then, it wasn't first port. I don't think GitHub was very popular at the time. And then Subversion. That's how old it is already. I had the idea already back during my university studies at EPFL in Lausanne. There was a professor who said you could model any kind of programming language as an internal DSL. I'm not sure if you use that term yet.

03:53

I think DSL was more coined by... More recently, but he said you could model any language in terms of an API if you just have Java interfaces that return interfaces that are return interfaces. And this kind of idea stuck with me for all this time, but I didn't have a use case back then. And when I started working in the industry, in 2006 was my first Java job. I did some PHP work before, but Java started in 2006.

04:21

I've seen that all the companies that I've applied to, and even the 1 that I worked at, they implemented dynamic SQL libraries because this was, this was something that everyone needs, right? So everyone has dynamic SQL queries. When you have a UI with various input boxes and they're all optional, so you have to compose your SQL queries in a dynamic way. And everyone wants to avoid string concatenation because it's not safe and it's boring.

04:46

So they did that for both, they had one for SQL and they had one for Hibernate query language, which is essentially the same thing, just more limited, but at the same concept. And no one, no one actually thought about making this a true DSL. It's well, it was always just a query builder with some weird API. So you had Java style methods where you add clauses to the query object. And it didn't feel like SQL.

05:12

It felt like a tool to solve exactly this dynamic SQL problem, but it didn't feel like you didn't enjoy using this thing. And I think JPA still has this thing with their criteria API, which is still very useful if you want to do dynamic JPQL. But I've never heard anyone enjoy using that API because it's just, you have to learn it. So you have to learn one more thing and it only serves this one purpose or at least that's how, what it feels like. It serves this purpose of dynamic JPQL.

05:43

And that's when I started prototyping it. And in the very beginning, I had to first implement the infrastructure wasn't very diesel style API either so I created the query builder that could do some select from where very simple conditions But then I started to really implement this idea because I remembered, okay, this professor, he had this idea and I'm going to try this. Is it really possible? And it turns out it was.

06:10

And it's really crazy how many SQL statements today we have in Juke that are all using this kind of approach where you just start with the same object and then you start typing and you auto complete your SQL syntax and it's all one huge graph of methods that helps you construct your SQL queries. So Juke has come a long way since 2009 I'd say.

Michael

06:34

Yeah, right. I was looking in your documentation about how many different, how many different, you call them families of SQL dialects that you support. So how many different databases, but also the different versions of different databases that may or may not support different syntax and SQL features. How do you maintain that?

Lukas

06:56

Well, first off, luckily the database products are all very, very backwards compatible, If we take one or two of them, we don't really care as much, but most of them really value backwards compatibility a lot. So supporting new versions is just looking at the new features and adding support for them, but the old stuff still works. So at least that's already covered there, more or less.

07:17

But other than that, I mean, supporting so many dialect per se is, yeah, it's a lot of work and a lot of testing, of course, a lot of automated regression testing to make sure nothing breaks. And also a Juke really values backwards compatibility a lot, just like the database products. So when you embed the joke query into your database application, you don't want stuff to break. So if I'm not allowed to break stuff, I'm probably not breaking stuff.

07:42

So this kind of simplifies maintenance as well, because stuff that once works is probably not going to break just like that. I mean there's some internal refactorings but there are no really major, how to put it, major paradigm shifts where everything works in a different way now and I was breaking everything. So with automated regression tests, it's not... It's hard, but it's not that hard, I think.

08:10

The hard part is just to find out some weird syntax that implements a standard feature on this particular dialect. And that's quite creative at times, especially with the newer dialects that have just been released recently that are not that mature yet, or Some dialects are very opinionated. So you kind of have to think in terms of why did they make those decisions and how can I map standard SQL onto this database?

08:40

But I think that's 1 of the things that people really like when they work, especially when they work with different database products. So if you have a company that works with maybe Postgres and SQL server, you don't actually have to think about the differences all the time. So you can just write substring in the jOOQ API and you don't care if it's subster or inster or whatever the database product calls it. It's just top string and jOOQ will translate it for you.

Nikolay

09:07

But there are differences as well, right? And JSON is a big topic, right?

Lukas

09:12

Like huge. JSON?

Nikolay

09:14

Yeah. Yeah. Those functions and yeah.

Lukas

09:17

It's crazy. I mean, Postgres was 1 of the first to actually support JSON, but it didn't actually

Nikolay

09:22

before standardizing

Lukas

09:25

stuff. So someone has standardized it, Oracle did a couple of years ago, and now Postgres started implementing those standards from what I've seen. I've still not engaged with that part yet, but you think now that there's a standard, things have stabilized, but still every Database product does it differently. It's crazy. The devil is in the details.

Nikolay

09:46

Yeah. And have you, like, do you need sometimes to check the standard or like you don't need it at all?

Lukas

09:52

I do it

Nikolay

09:53

all the time.

Lukas

09:53

Yeah. Okay. I'm trying to, the Juke API, if it's not about a vendor-specific feature, the Juke API really tries to follow the SQL standard in terms of syntax. So a vendor-specific feature would be Postgres on conflict clause, which I think was a mistake. They should have done merge from the beginning and not invent something new, but now we have both. And in that case, Juke also models the on-conflict part like Postgres did, and then SQLite copied it and a couple of others, I think.

10:23

But for instance, merge is a standard SQL feature and many database products have some extensions which I then study, but I always first look at the standard. What does the standard do? Because that's a thing that's going to still be there in 50 years. And different implementations might have their quirks, which sometimes I hope they deprecate eventually and move towards the standard. Because usually I think the standard is quite nicely...

Nikolay

10:49

Well, they behave differently, so many people got used to on conflict.

Lukas

10:56

Yeah, I get the point of doing on conflict. I mean, it's simpler for 80% of the use cases, obviously.

Nikolay

11:03

Yeah, also merge only recently received support of returning clause, which is not standard at all, right? As I remember.

Lukas

11:11

The SQL standard has the table, the data changed out of the table, which was implemented by DB2. And then some minor database products like H2 implemented as well. And it's not as powerful as returning, especially because it only allows you to fetch either the data before or after the updates for update statements, which is also part of merge. You maybe want to return both versions to the version before and after the update. With returning in principle, that's, that's a feasible.

11:44

So Oracle implemented this recently. I'm not in follow. Did Postgres allow for accessing both versions of the row?

Nikolay

11:54

I don't remember as well. I'm only like returning, well, if you DELETE, definitely you have ability to return old thing you deleted.

Lukas

12:03

Well, there's only the old thing. Old thing is deleted.

Nikolay

12:06

Let me quickly check.

Lukas

12:07

Why not? But with updates, I mean, SQL Server always supported both versions. So before and after. And DB2 with the standard syntax supports only, you have to choose which one, but you can't return both. And Oracle 23 AI now supports both as well, like SQL Server. Oracle uses the Postgres syntax, the non-standard syntax also with the returning keyword, so maybe you can start from there if you don't do it yet in Postgres.

Nikolay

12:36

I think in Postgres for updates we cannot return old data. We have access to it in triggers but it's different, right? So only the new value can be seen there, but for deletes, it's

Lukas

12:53

probably more useful.

Nikolay

12:54

Well, I can imagine some cases when we want to report, for example, what happened.

Lukas

13:00

Yeah. When you do auditing, you want to have both.

Nikolay

13:03

Yeah. Maybe let's add it. Because

Lukas

13:04

standard supports this data change delta table, and it accepts a merge statement as well. So in principle, in the standard, you could have returning with a different.

Nikolay

13:15

But it feels like more like a heavier approach in terms of manipulation. Returning is just like one line, right? Return a star, that's it.

Lukas

13:24

Yes, yes. Super easy. It seems easier, yeah. Also, there are tons of limitations. I mean, you put your data change delta table inside of a SELECT query, but then you can hardly do anything with that SELECT query for instance, I'm not sure what exactly what is forbidden, but unions I think are forbidden and joins are forbidden, so maybe even aggregations. I'm not sure anymore, but you're very limited with what you can do.

13:50

So I'm not sure if there's any benefit with allowing that in the SELECT statement as opposed to just returning. But ultimately it's kind of the same thing as what Postgres does when you put the ML statements in, in with. So that that kind of behaves the same way, at least from a user perspective.

Nikolay

14:07

Speaking of with, I know Juke supports CTEs. Does it support recursive CTEs? Yeah, yeah. Yeah. So everything, lateral join and so on, everything. Yeah. Cool.

Lukas

14:19

There's a high chance if SQL supports something and it's not really very funky like the match_recognize clause, then Juke will support it as well. So match_recognize could be supported as well in Juke, but it's such an edge case and still only supported in Oracle and some very esoteric database products that Juke doesn't support yet.

14:38

So, so I'm skipping this for now, but with is, is everyone uses with, so Juke supported as well, including recursive with, I mean, there's some table valued functions that are like generate series in Postgres, which have to be emulated elsewhere. So if you want to ever, probably hardly anyone does that migrate from Postgres to whatever, and you want to translate your old generate series queries, then Juke will translate that to a recursive with.

Michael

15:06

Cool. Nice. Nice. In fact, I feel like you skipped over that a little bit, but that's such a cool feature that basically... Translation? Yeah, exactly. We can get access to standard features that Postgres hasn't implemented, or that whatever database we're using hasn't implemented yet because you do the transformation, I guess. How do you, translation did you say?

Lukas

15:32

Yeah, I call it translation. So there's even, you can use Juke as a purely translate product. So some customers actually do that. Mostly when they migrate from Oracle to Postgres, you can just embed Juke as a JDBC driver into, into your Java application. And you don't actually have to use Juke directly.

15:48

So let's say this is a legacy application and uses JDBC or something JDBC based like Hibernate or MyBatis, and you just put Juke in the middle as a translating layer and probably 80% of your queries will work directly on Postgres. Of course, there's always edge cases and you have to do testing, but at least your migration project will go much, much faster because you can directly use Oracle queries on those other database products.

16:13

Like for instance, Oracle has connect by and that can be translated to with queries with recursive. That's very hard. So some cases don't work, but maybe many cases do. And you don't have to actually look at those anymore. Or Oracle special outer join syntax can be translated to left joins in, in, in standard SQL. So your migration just goes much faster.

16:37

You can also use it not as an embedded library, but just as the web, the website is free or you use it locally as a CLI tool, which is to have files, you have input files and Juke translates those directly. And you can manually check if it's correct. So translation is a, is a, I'd say minor use case. So a lot of companies work with only 1 database product and maybe 10 years, they, they might reconsider their choices and migrate.

17:03

But even then most, most companies stick with the database product they started with. And so many Juke users don't actually use this feature, but some do.

Nikolay

17:14

May I ask slightly off topic question about JDBC since you spent a lot of time with Java, obviously. There is such thing as extra float digits. Right?

Lukas

17:24

Extra what?

Nikolay

17:25

Extra float digits. Okay. There is such setting, and by default it's 0 in Postgres. Or 1. Yeah, no, 1. It's 1.

Lukas

17:34

I've never heard of it.

Nikolay

17:36

Okay, but yeah, I remember some bugs in my code, and I usually try to use pure SQL and psql, and you know, if I can. I Remember that Java developers implemented the very same thing, same queries, behavior was very different. And then I noticed that the DBeaver, who else, other Java-based IDEs also behaved the same way Java application did. It was not the same as I saw in psql. It happened with these extra float digits. If you didn't notice this, this is maybe some very edge corner case.

18:17

But I wonder like, why did the JDBC keep different default is like no layers of some defaults. This can be very annoying. But okay.

Lukas

18:28

Well, the main thing that you always see The main difference when you work with JDBC is, as opposed to psql is, in psql you usually don't work with bind variables, so you don't have the whole kind of effect that you get when you have bind variables, although in Postgres This might not be that big of a difference, but when you work with JDBC, you always have prepared statements and bind variables. And let's say my background is mostly Oracle.

18:57

So with bind variables, you have an execution plan cache and the cache will store some assumptions about your bind variables, like a normalized distribution and expectation regarding cardinalities, et cetera. So your actual bind variable might be, if your dataset is skewed and your actual bind variable may be completely different from what Oracle expects. And, and then the execution plan will be bad. But from what I take, Postgres doesn't have this problem because it doesn't have this feature.

19:24

So this would be 1 of the main differences. But you were referring to some logical error, not...

Nikolay

19:31

Yeah, Default and values returned are different from what I see in PC. Yeah, but

Lukas

19:36

I've never heard of this.

Nikolay

19:38

Well, okay, maybe it's already fixed by the way. It was like 5 years ago or so I saw it. So, yeah. Well, on the topic of JDBC,

Michael

19:47

I mean, do you have any issues with it? It seems like it's a remarkable piece of software that's just chugging along, but equally it doesn't get much love. It doesn't get much love. Like I only, I generally only hear negative things about it, but it must be amazing given how much it's used.

Lukas

20:03

I think it's the best API that Java has, even better than collections and everything else. It's really, I mean, okay, maybe they stole it from Microsoft from ODBC, it's kind of the same thing, but It's really the best thing that Java has because it's such a good standard. I mean, everything can build on top of it and you don't even have to write SQL queries.

20:23

So there's, for instance, Neo4j, which is a NoSQL database with their own query language and you can use JDBC with that database as well as long as you wanna have some tabular results And it's a very, very good abstraction of the network protocol layer. And the reason why people hate it, it's not targeted at developers. So you can use it directly if you want, But it's a bit hard to use. You have a lot of objects.

20:48

You have the connection object and you have to create a statement object and you have to remember to close it. And you have to remember to create a result set object and close that as well. And it's not integrated with the rest of the JDK libraries. Like for instance, when you iterate the result set, you have your own API methods instead of just a list or iterable. So you can't use the for each loop on a result set, for instance. So the ergonomics of JDBC is not, not up to date.

21:13

I once tried to convince the spec expert group, it's a mailing list to, to update their API. There's a couple of things that I think in, in at the time it was Java 8, but even now you could improve a couple of things without changing anything fundamental in JDBC. For instance, you can skip the statement part. A lot of times you don't actually have to think about preparing the statement explicitly. You have to do it when you want to reuse the prepared statement.

21:40

So you can save some time in case there's a resource or some cache connection directly. But a lot of times that's not the case in some database products and a lot of times it doesn't even matter at all. So you could just have a connection and run the query. So that's a minor improvement that would just reduce 1 step for each statement you write. Yeah, but Oracle couldn't allocate any resources to such a project, even if the people on the list agreed that many ideas were reasonable.

22:12

So I guess that's it. It's an amazing piece of software for integration products like, like Juke or Hibernate or MyBatis and everything else to build on top of and all the drivers they just work. So a database company can just publish a driver and it will work in all kind of software. For instance, if you, if you work with DBeaver, it just works with all the database products, right?

22:33

So you don't have to think about how do you, how do I connect to this new database thing, like the cows or, or DuckDB or whatever, it just works. All these database products, they just work because they all use the same API in Java. So it's really a wonderful piece of technology, I think. So I totally agree.

Nikolay

22:51

Yeah, this is exactly where I saw this problem, comparing to, I mean in DBeaver, comparing to Java. And I realized, okay, it's also Java application, so yeah. Good.

Lukas

23:04

Yeah, but that was probably just the driver bug. I don't think it has to do with the database as an API. I mean, obviously the drivers can have bugs.

Nikolay

23:13

Right, right. So You see bugs often, right? I mean, in this...

Lukas

23:18

Yeah, not necessarily in the drivers. I mean, the drivers, they really solve a very basic problem. So it's mostly about network connections and these kind of things that no 1 actually cares about, unless you have to optimize something, like multiplexing or whatever. But I think this stuff kind of just works. And when I find bugs, it's more SQL related. So it's inside of the database and I do find a ton of them very rarely in Postgres.

23:45

I have to say, I think in my career I found 2 or 3 bugs in Postgres only, as opposed to hundreds.

Michael

23:54

We didn't pay you to say that.

Lukas

23:56

No. It's really very good. It's surprisingly good. I mean, also, absolutely no regressions at all. So if there's a bug, it's an edge case that no one thought about, then it's really a very weird kind of thing.

Nikolay

24:10

Yeah. By the way, how do you check regression? Just conversion from...

Lukas

24:15

In the database products?

Nikolay

24:17

Yeah. For Juke, for example, some code is written and then you just check the conversion to SQL or you check results as well.

Lukas

24:26

No, you can't just search the SQL statement. I mean, Juke's integration tasks have a standard database, which has about 5 tables. It's very simple. So there's a bookstore with books and authors, and it's not a big database. So it has 4 books and 2 authors, 4 languages, this kind of thing. But then I just know there are 4 books and they have these titles and every kind of query has to return exactly the same result.

24:54

So there are so many queries being run in each database product, maybe about 10, 000 queries, I think, and also update statements and all kinds of statements that make the assumption that if you have this kind of input database, then this must be the output. Or also, if I create a table with Juke, then it has to have these properties and throughout the API, it has to be the same thing for every database product.

Nikolay

25:20

Right. I just wonder, earlier you mentioned that usually database systems maintain reverse compatibility, but sometimes it's broken. For example, in Postgres 12, CTEs, before they were materialized, like optimization fans. So every step is materialized before 12, but in 12 default behavior changed. Right. So, so is it something that Juke should care about or no?

Lukas

25:46

I don't think that's a logical regression. I mean, it was a conscious decision and it's only affects the performance. There's no logical difference of the result. So I mean, databases are allowed to do these kinds of things. I re I recall there was once a logical change in the UPDATE statement in Postgres, which was a incompatible change when you UPDATE a row.

26:12

So you had used the row syntax, you use parentheses And you UPDATE 2 Columns at once and suddenly both required the row keyword, but only there. This was a very weird.

Nikolay

26:23

1 more recently, 1 more change related to subqueries. I don't remember exactly. It was a demonstration included in the generate series, I remember. And yeah, maybe it was like up to 5 years ago, but there was some change which was logical as well. But I understand, you're like SQL standard, don't care about Indexes, care only about results. Yeah. Right, I see, I see.

Lukas

26:49

No, I mean, a juke doesn't make any guarantees at all with respect to performance either. So, so from a Juke perspective, Juke doesn't try to fix your SQL. If you write a really crap query with, I mean, crap, what does it even mean? So if you write a bad query with a lot of nested subqueries, et cetera, et cetera, in principle, there's nothing wrong with it. Right?

27:07

So there's, you will find hundreds of blog posts that, especially from, from old times when optimizers were really bad still, where people advised against using derived tables, then they advised against using correlated subqueries, and then, but there's nothing logically wrong with these things, right? So you can nest as many levels as you want, as long as it's correct, it's correct, and the optimizer should figure it out. And Who is Juke to judge you for your query style?

27:34

So Juke doesn't fix these things or even assert anything, and it would be completely wrong for Juke to do these kind of things, because you want to write exactly this SQL, and Juke should render exactly this SQL. You have your reasons, right? So...

Nikolay

27:47

I understand. And my question is, I'm curious, how do people usually approach performance? For example, if some query is written using Juke with a chain of SELECTs from blah blah, And then it turns out it's slow. And then we find the fix, and we need to adjust something in the query if it was a rock SQL. Is it usually simple to adjust a Juke version of.

Lukas

28:17

Yeah. I mean, there's no difference. I mean, you're probably going to use the, I'm not, I'm not sure anymore what the Postgres version is called, but you have your statistics views in your database, your performance schema, and you query those to see what's, what went wrong and then you make assumptions, you rewrite the query and you test again.

28:36

Maybe even using a benchmark, of course then the benchmark is written in Java, but I don't see a difference when you write jOOQ query or native SQL query, It's the same process.

Michael

28:46

I was watching a really good video by Kevin Davin, who did a talk on jOOQ, but 1 of the things that he really liked about the product was it logs out to the console, to the actual SQL statement as well that it ran.

Lukas

29:00

When you do a debug level.

Michael

29:02

Yeah, which is super helpful for performance. So if that was slow, you can then run it in your favorite editor in PSQL wherever with EXPLAIN, ANALYZE, et cetera, et cetera. BUFFERS. Yeah, always BUFFERS, hot topic this week. But yeah, so that seems like a really developer friendly way of getting back to the SQL and then diagnosing the performance issue.

Lukas

29:24

This was a feature from day 1. My assumption was that when you develop, you have debug mode logging. And when you go to production, you switch that to info or even warning. So you're not going to have the performance penalty of re-rendering the query. It's even formatted, so you can actually see the query. It's not just 1 line, which is when you execute the query, it's just 1 line of SQL string, so to have it more compact. But when you debug log it, the version contains the bind variables.

29:51

So you can just copy paste the whole query with the bind variables in a formatted way and study it. I thought that was what everyone wanted. And Also you get the first 5 records of the result set also in the debug log, because that's probably also what you want while you develop at least.

Michael

30:07

Yeah. In a pretty little table.

Nikolay

30:09

It's nice.

Lukas

30:10

So you can immediately see what's going on. And even then, I mean, 1 of the things, 1 of the reasons why I implemented it this way is when you have a huge result set with a lot of columns, that bothers you while developing. So you kind of start thinking, do I really actually need all these columns? Because if you don't, then you have a better debug log. So I kind of think you have to punish developers as early as possible for their performance problems. I mean, jOOQ doesn't judge you.

30:37

It just gives you some tools.

Nikolay

30:41

Just giving you some convenience looking at logs. Okay. Yeah, that's interesting. So, reduce number of columns. Good.

Michael

30:51

Changing tact a little bit, I think you've got a really interesting perspective on Postgres from a broader landscape of SQL dialects and databases in general. What kind of things do you see that are relatively unique to Postgres? Or do you see people in the Postgres world particularly using or liking?

Lukas

31:09

I mean, the 1 thing that is very unique to Postgres, there are 2 things from my perspective. The first 1 is it's very developer-centric. So you can see that with other database products like Oracle, DB2 or SQL Server, they're very production centric, very operation centric. They used to base their entire sales on these kinds of things. So this means you have a lot of production tools as well, which are great actually.

31:32

So I'm still waiting for the like of Oracle Enterprise Manager, if you know it, for Postgres where you can analyze production workloads and, and, and query what, what kind of workload you had 5 months ago, and it's still there. It is still happy. I mean, you have to pay for these kinds of extras, but it's still there. You can still analyze anomalies in terms of performance, what happened 5 months ago.

31:53

So these kinds of tools are a bit lacking, but on the other hand, because it's so developer focused, you have this whole extension system, which is unprecedented. And I haven't seen it since either. So anyone can extend anything in any layer of the, of the database product.

32:11

So you have these wonderful extensions like PostGIS for instance, which is, It's not really an extension because it's also part of the SQL standard, but it has been implemented as an extension just, I guess, to show what is possible and such a huge additional product can be made as an extension. And you have indexes and data types and all kinds of things that other vendors have to either offer out of the box or they don't have it. You can't extend Oracle really.

32:38

So that's really something very, very amazing. But as I said, from the other perspective, from an operations perspective, this is something that personally, I think Postgres is still very much behind Oracle, for instance. So I'm an Oracle guy. I can't really comment on SQL Server or others, but the Oracle database where I worked at before I made jOOQ, it could easily handle very, very complex queries that produced 500 lines of execution plan on billions of rows, and it would run in milliseconds.

33:09

And I don't think you can do that with Postgres. I wouldn't risk it. So we all remember the days when there was a hard limit of number of joins in Postgres, from which, starting from where, you didn't have any smart optimizer anymore to reorder the joins in the right order. I forgot what the limit was, but I think you had to be very careful if you have more than 10 joins or something like that, because then it would just join from left to right syntactically, and this is horrible.

Nikolay

33:37

Join collapse limit should be 8.

Lukas

33:39

Yeah, that's the join collapse limit. So, once you know these kind of things, obviously, make the right choices, and you probably won't run into big issues, but it's just, it's a very unique focus that I often missed in the past from Oracle, for instance, or completely ignored developer experience for decades. It was like, you don't even have a Boolean type, right? So this is hard for me to understand. It's such a simple thing. You think it's such a simple thing.

34:06

So very, very useful, but Oracle wouldn't have implemented it until recently. So this is a focus that Postgres has and few of the others had, even MySQL wasn't very developer friendly from that perspective. So this is very unique. If developers were the only ones to choose the database product, it's always Postgres, because I mean, you're just gonna implement everything with Postgres. You can write stored procedures, you can have your own indexes, you can extend it yourself.

34:36

Probably you don't have to because someone already wrote an extension for whatever you need. Oh, the second thing is, it's amazing how standards compliant it is. So a lot of times, Postgres really waits until something is standardized before they implement it instead of innovating.

Nikolay

34:51

Right now we are waiting on UID version 7 because of standards. Like it didn't get into version 17 of Postgres because RFC was not finalized. Every library already started to support it, but Postgres decided to wait on RFC, which is like very conservative decision, right?

Lukas

35:09

Most of the times. Yes. So I think that's a good thing in the end, because the standard, in my opinion, in most parts is well-written. It's a very complicated document, of course, but I think in terms of syntax, it's kind of everything feels like SQL. So some database products think they have to invent some syntax And it just doesn't look like SQL anymore. It's just very weird, funky thing that doesn't fit the rest of the language.

35:37

And the standard is very, very consistent, I think, in most parts. And to wait for the standard to appear is a good thing. So sometimes there's an exception to the rule, as I said, on conflict was 1 of these exceptions. JSON as well, in case of which I think it was something everyone needed and the standard was severely lacking. And Postgres just did their own thing. So I kind of think a lot of people kind of regret the heterogeneity of the JSON API.

36:06

So you have to always consult the docs to see if something uses a JSON path, which is a standard thing in the JSON world, or if it's just an array of path elements, or if this function returns a table or a set of something, or whatever data structure. So I think now with the JSON standard, this is going to be much cleaner once it's completely rolled out. So I'm not up to date where the Postgres stands.

36:34

I'm still not supporting that syntax, but from what I've seen, it's going into the right direction.

Nikolay

36:39

Yeah, some parts support it, some parts I think are still bending. Yeah. Yeah, because I

Michael

36:44

think A lot went in in 17.

Nikolay

36:47

Right.

Lukas

36:48

Yeah. And speaking of this, I'm going to be really curious. I saw a commit recently regarding temporal primary keys. Is that correct? And it is absolutely not the standard. So this is very curious. I'm very curious about this. Are you aware of this commit?

Michael

37:03

I'm not.

Lukas

37:04

It's still on a branch. I think it's not, it's not going to be merged, but I mean, Postgres has these range types. Right.

Nikolay

37:11

Right.

Lukas

37:11

And they're useful obviously, but the SQL standard since 2011 defined how you should write temporal databases, both by temporal. So you have system versioning and then logical versioning. And this is something that is still lacking in Postgres. And I think a lot of people would like to see that.

Nikolay

37:29

Yeah. In Postgres, there is an opinion that it's in the past, because it was

Lukas

37:34

in the past,

Nikolay

37:34

it was removed.

Michael

37:36

So that's a nice temporal joke.

Lukas

37:39

An extension I think?

Michael

37:41

It was in Postgres, the original Berkeley version.

Nikolay

37:46

This is how MVCC inherited from these versions of tuples. So your time travel basically was supported very, very long ago, but then it was removed. So like interesting idea.

Michael

37:58

I think for performance reasons.

Nikolay

38:01

Yeah, So I think so. But yeah, this is a super interesting, like specific topic. Sometimes we do need to implement something with triggers and so on, like, which is very

Lukas

38:11

Yeah. Yeah. It's very hard to do manually. I think the standard helps a lot with, with, I mean, you can write an update statement and it's automatically transformed into a delete and insert. So if you have to split the record logically, this is really a pain to do manually, right? So it's

Nikolay

38:27

not super pain because you can have, for example, Postgres, you know, like I remember your very old post, everything is a table. Or maybe it was a post of some 10 tips or something, and 1 of the tips was, consider everything as a table. In Postgres, you can collapse all columns into a single record type and insert. For example, in JSON form right now, I would insert it and I would have a shadow table. I implemented this several times.

38:52

You have a shadow table which serves as like a guy for all tables, which has, has trigger to track changes. Right. And then you, you just track timestamp action, like it was delete or update, who did it, and then whole row you pack it in a single column, breaking first normal form.

Lukas

39:10

But then you normalize in a very bad way, and secondly, this sounds like a lot of work.

Nikolay

39:16

It's not a lot of work, it's like 3, 5 lines of code, that's it. But it's not possible in jOOQ 3 version because triggers are not supported, right? Triggers are supported only in Enterprise version.

Lukas

39:28

I mean, if you want to write the trigger with jOOQ, yes, then you have to use the commercial additions, but hardly anyone writes triggers with jOOQ. I mean, this is only useful when you have to have, when you support 5 database products and you want to have, you want to write the trigger only once you write it with jOOQ. But I mean, jOOQ doesn't care if there's a trigger on the database table, right? So if there's 1, so it's transparent to jOOQ. Yeah, okay.

39:56

You can't really say jOOQ doesn't support them. It's just, you can't create them with jOOQ.

Nikolay

40:01

Great answer. Free version, you mean like an

Lukas

40:03

enterprise version? Free version, yeah.

Nikolay

40:04

Yeah. I'm very curious. I'm not against this. It's good. I mean, it's great that you build both like open source product on Apache tool license and business. I was just curious, like, have you ever had thoughts about going full closed source or fully open source? Well,

Lukas

40:26

it was fully open source in the beginning. So when I started this, I had no idea this could be a business. I mean, I just had to try it and then put it out there for people to see and give me feedback. Is this a good idea? Should I do it like that? And there was a very active early community of people who worked with the all open source jOOQ. And then I started seeing even more and more banks and insurance companies using it, but never contributing as you know, the story.

40:51

And I thought it was my fault.

Nikolay

40:53

I get it. Pong Jones, let's, let's generalize Oracle users.

Lukas

40:57

Well, that was the choice I made, but, but it doesn't matter. You know, It doesn't matter what they're using in the end. I just never got any... I mean, I got contributions, but I didn't care about code. I cared about money much more than code. So, to make them a sustainable business, in the end, I thought... There are models where you pay for support, but in my opinion, if you want to make money with support, then you have to earn your money by working for it.

41:25

So I earn money when I work, and if I have a licensed model, then I earn money when customers work, right? So I thought this was going to scale much better.

Michael

41:33

I like that a lot. I also think if you charge for support, you're incentivized to hurt the experience.

Lukas

41:40

And I think, but you're going to say that, but that's how I think.

Michael

41:43

But I think, I think it's not surprising that as somebody that focuses so much on developer experience and wants to keep pushing that forward and making that good, it's unsurprising that you don't want to be incentivized to make it bad like that. It makes perfect sense to me.

Lukas

41:59

Yeah. I mean, that's a, that's my thought as well. I, I don't want to name companies or names, but I've heard some open source products are very, very unusable and you really need the support and not all of them. Of course. I mean, Postgres and Linux are examples, which aren't, which this isn't the case, but there are some products that are super complicated and they're just pushed out there and you're free to use it, but you won't figure it out.

42:25

So you have to call someone even to install it and that's what I'm thinking. So in those cases, you need a sales representative that pushes the product down someone's throat and they make a lot of money with the support contract. And I think it's less honest than licensing, which people think about, okay, is this a good product?

42:45

Should I really spend money on it because I have to pay it up front right from the day 1 I have to pay money to use this product and they're gonna test it really really really well and there's a lot of competition so I mean people could use hibernate or whatever but people choose juke despite the cost which in the end is very cheap for a company.

43:04

So I think cost is never the issue, but, they're still going to evaluate it because they have to still go through purchasing and make it a business case for, for, for the purchase. So I really want to have a very, very good experience. And 1 part of this is I also always answered all the questions on Stack Overflow within a short period of time, because if I answer it, it's gonna be authoritative and I'm gonna answer with all the details and all the caveats and all the edge cases.

43:34

So anyone else who finds the question, we'll find this question and the answer. And they're not going to ask me again. Right. So Google is my friend and now Chat GPT is my friend as well, because Chat GPT can answer jOOQ questions very well because of all the answers I've given on Stack Overflow.

Nikolay

43:49

Yeah, but open knowledge, right?

Lukas

43:51

Yeah. I don't want to earn money with knowledge. I mean, I could sell books and all these kind of things, but it's just the wrong kind of effort, I think.

Nikolay

44:00

Destruction, right? Yeah. From the core. I wonder for enterprise, like paid versions, is source available or it's like kind of

Lukas

44:12

yeah you get the source code

Nikolay

44:13

well that's great

Lukas

44:14

the right to modify so you can if you have a bargain and

Nikolay

44:17

you can fix it yourself and propose a fix.

Lukas

44:20

Or maybe it's not even a bug, but you think it's a bug, but you disagree, you can fix it. And some companies do that. So this

Nikolay

44:28

is what we like in commercial software, like full transparency and see how it works

Lukas

44:34

inside. I mean, most people don't actually fix stuff, but maybe they want to debug stuff to better understand it. And if you've ever worked with a commercial JDBC driver, like I do, with the Oracle driver, it's always like, Oh my God, there's a null pointer exception in the middle of the driver and I have no idea what I'm doing wrong from an API perspective. So I have to reverse engineer it.

44:55

Not, not with the code, but just try an error with API calls until I find out why the exception, why the bug inside of the driver happens. And with source code, it would be much easier, right? So bugs happen, but if you know exactly why the bug happens, then you have a much easier life than if you have to guess.

Nikolay

45:13

Yeah. Last question from me. What do you think about nulls in general?

Lukas

45:18

Nulls? Nulls, yeah. Or they're a blessing in a curse. I mean, I understand why they have been introduced, because it was easy to do, right? And I mean, There are 2 takes here. So nulls in most languages are mostly a pain because they lead to exceptions. And null in SQL is mostly a pain because it doesn't lead to exceptions. It's just a value like many others, but it's a special value and it just propagates. It's these kind of philosophies and both are right and wrong.

45:51

But what else would you want to do, right? I mean, you kind of need an absent value, right? I'm not sure if the idea of the unknown value is so useful, but an absent value is very useful. I mean, sometimes you just, you can't normalize everything to 6 normal form to avoid NULLs, right? So it's very pragmatic.

Nikolay

46:12

It's very pragmatic indeed. Yeah. But in SQL, we have so many places where people make mistakes all the time.

Lukas

46:20

Yeah, SQL is weird about it. I mean, this unknown value, I used to do a SQL training and it was about 30 minutes about it was just NULLs and how it's different here from there. For instance, 2 NULLs are not distinct, right?

Nikolay

46:34

Yes.

Lukas

46:35

But they're not the same either. So there are different terms in the SQL standard, same and distinctness. And then even operators are inconsistent. So personally, I think Oracle did the pragmatic thing when they concatenate a NULL value to a string, you probably don't want the result to be NULL, I think. You would just want to ignore the concatenation. This is pragmatic, but it's very unlogical and leads to a ton of side effects, which probably means that it was a bad idea to start with.

Nikolay

47:05

And there are NULLs in other languages, I mean, in regular application languages. And I remember how I struggled trying to explain this 3 value logic and like, observing what people do in code, in regular application code with NULLs. They do things I would never do like in SQL, right? So different philosophies, right?

Lukas

47:25

Yeah, there are different philosophies. But I'm not sure if the SQL philosophy is really useful. I mean, I'm trying to explain how it works by explaining NOT IN in the NOT IN predicate. When you have a NULL in NOT IN, everything falls apart. I mean, it's very logical and consistent. You can explain it with 3 value logic, but it's never useful, right? This kind of invalidates the whole predicate.

Nikolay

47:48

Yeah, it's a bomb.

Lukas

47:50

It's a bomb? And I mean, usually people don't put a NULL value in a NOT IN list, but what about a subquery? So NOT IN SELECT something, and then you have a NULL value in there, and you don't get results. And it's not obvious from just reading the code or the query. there, and you don't get results. Why? It's horrible. Horrible. We

Nikolay

48:05

had the whole episode about no, so if Michael long ago, yeah, yeah, let's stop because definitely it's not, it's not going to go away. And I, I'm quite sure I will make more mistakes with nulls in my life.

Lukas

48:19

But maybe the 1 thing I really think SQL should have done differently, and there are some database products that actually do it differently, is the default data type should be not null, right? So when you create a table, When you don't write anything at all, then it should be not null. The fact that the default is nullable is wrong.

Nikolay

48:36

This forces you to define default. Or, well, no.

Michael

48:42

The default word is confusing. You mean, if I create id int8, I shouldn't have to say not null.

Lukas

48:51

Yes, you should have to say I want this to be nullable, because then it's a conscious decision and they know what you're doing. But the fact that maybe by accident you have something not null, something nullable. Yeah. It's just wrong.

Nikolay

49:04

And this isn't standard, right? Or if you remember?

Lukas

49:09

I don't remember. I think maybe the standard says it's implementation specific.

Nikolay

49:13

If it's a standard, We definitely

Michael

49:15

can't change it now though.

Lukas

49:18

No, you can't. But I remember Sybase is 1 of the database parts that does it differently and ClickHouse as well. Interesting. Another 1 I forgot.

Nikolay

49:30

So in ClickHouse every column is by default not null? Yes. That's cool. Okay.

Lukas

49:35

Although they messed it up as well. So you can still insert null in a not null column. You just get a 0. What? It makes it worse.

Nikolay

49:43

Okay. Okay.

Lukas

49:44

That's terrible. Amazing. I think this is a performance idea. You actually, if you just ignore the check, you kind of don't check for NULLs, then you're faster, but whatever.

Michael

49:58

Lukas, I'm super conscious of your time. Is there any last things you wanted to say?

Lukas

50:03

I hope there are many many more exciting Postgres releases in the near future. I'm looking forward to all the good stuff that's coming. Especially if it's ever coming, temporal tables from SQL 2011, that will be very exciting.

Michael

50:16

Nice, okay cool.

Nikolay

50:16

Thank you for coming, I enjoyed.

Lukas

50:16

Thank you for having me.

Nikolay

50:16

Thank you.

Michael

50:16

Yeah, very much so, and catch you next week Nikolay.

Nikolay

50:16

Bye bye, have a great week.

Transcript source: Provided by creator in RSS feed: download file

Episode description

Transcript