Persistence in PHP with the Doctrine ORM

Speaker 1

00:00

Okay, let's unpack this. Imagine writing, like, ten thousand lines of pristine, beautifully architected, object-oriented PHP code.

Speaker 2

00:09

Oh, that's the dream, right?

Speaker 1

00:11

You have your classes, your properties, your inheritance trees, all your objects are just neatly interacting with one another.

Speaker 2

00:16

Yeah, a perfect domain model exactly.

Speaker 1

00:19

And then, after all that elegant design, you have to just, you know, smash that entire architecture into a flat, rigid, two-dimensional grid of database tables.

Speaker 2

00:29

It really is the ultimate architectural heartbreak. I mean it's a huge problem.

Speaker 1

00:33

It totally is, because your PHP application speaks this rich language of objects, but your database only understands tables, columns, rows, and foreign keys.

Speaker 2

00:44

Right. They are two completely different paradigms.

Speaker 1

00:47

And bridging that gap manually usually means writing endless, repetitive SQL queries just to save and fetch your data. It completely bogs down your whole workflow.

Speaker 2

00:56

I mean, that fundamental disconnect is honestly one of the biggest hurdles in application development. You spend all this energy building a domain model that represents your business logic, only to have to write dozens of mapping functions.

Speaker 1

01:08

You have to tear those objects apart.

Speaker 2

01:10

Exactly, you tear them apart just to fit them into a relational database structure. It completely disrupts your focus.

Speaker 1

01:17

You stop thinking about how your app should behave and you start agonizing over like how to keep your memory state synchronized with a hard drive.

Speaker 2

01:25

Which is exactly what we want to avoid.

Speaker 1

01:27

So for today's deep dive, we're looking at the solution to that disconnect. We are talking about the Doctrine ORM, or object-relational mapper.

Speaker 2

01:36

A lifesaver for PHP developers, truly.

Speaker 1

01:40

We're going to decode how Doctrine acts as the ultimate, highly skilled UN translator, working flawlessly between those two entirely different languages.

Speaker 2

01:49

I like that analogy, a UN translator.

Speaker 1

01:52

And we'll use a standard blog engine as our guiding example today to see how it takes all that heavy lifting off your plate.

Speaker 2

01:57

Sounds perfect.

Speaker 1

01:58

So to understand how Doctrine pulls this off, we have to start with the brain of the operation, which is the entity manager.

Speaker 2

02:04

Yeah, the core of Doctrine.

Speaker 1

02:06

Now, in Doctrine, the actual data objects, like your blog posts or user profiles, are called entities. But these entities are just plain old PHP objects.

Speaker 2

02:18

Right, they don't extend some massive database library or anything.

Speaker 1

02:21

No, they have absolutely no idea that a database even exists.

Speaker 2

02:25

And you know, that is very much by design. It relies on a specific architectural pattern called the Data Mapper.

Speaker 1

02:31

Okay, Data Mapper. How is that different from what other frameworks do?

Speaker 2

02:35

Well, in some other frameworks, you might see the Active Record pattern.

Speaker 1

02:37

Oh right, where the object itself has a save method.

Speaker 2

02:41

Exactly, it talks directly to the database, but Doctrine deliberately separates those concerns.

Speaker 1

02:46

So the entity's only job is to just hold your data.

Speaker 2

02:48

Yes, and the entity manager is a completely separate service that handles the really complex job of mapping that data and saving it.

Speaker 1

02:56

Using something called the unit of work.

Speaker 2

02:57

Right, yes, the unit of work. It's a crucial concept here.

Speaker 1

02:59

The unit of work concept is honestly brilliant. I like to think of the entity manager as a highly experienced waiter taking your order at a really busy restaurant.

Speaker 2

03:08

Okay, walk me through that.

Speaker 1

03:10

Well, in the application, you're making changes. You tell the waiter you want a steak, then you change your mind to chicken. Then you add a side of fries, then maybe cancel a drink order. That waiter doesn't sprint back to the kitchen, which is our database in this case, every single time you open your mouth.

Speaker 2

03:28

No, if the waiter did that, the kitchen would be completely overwhelmed, which is exactly what happens when an application fires off a new SQL query for every single tiny variable change.

Speaker 1

03:39

It's chaos. So instead, the waiter stays at your table, writing everything down on their notepad.

Speaker 2

03:45

And that notepad is the unit of work.

Speaker 1

03:47

Spot on. It tracks all the modifications, the new additions, the items you want removed. And it is only when you are completely finished, and the waiter walks over to the kitchen and hands in that final consolidated ticket, that the database actually gets to work.

Speaker 2

04:01

And in Doctrine, handing in that ticket is done by calling a specific method called flush. And what's fascinating here is the optimization that happens when you call that flush method. It's really the core power of Doctrine.

Speaker 1

04:14

Can you give me an example of that optimization?

Speaker 2

04:15

Well, let's say you retrieve a blog post entity from the database. In your code, you execute a change, like you set the entity's name to "My Name". Then maybe immediately after, a different function changes that name again to, let's say, Kevin.

Speaker 1

04:30

So it changed twice in memory.

Speaker 2

04:32

Right. Doctrine's unit of work is tracking both of those changes on its notepad. But when you finally call flush, Doctrine doesn't waste time running two separate update queries.

Speaker 1

04:42

Wait, really? It just skips the first one?

Speaker 2

04:45

Exactly. It compares the original state of the object to its final state, realizes only the final name matters, and issues a single SQL query. Just one: UPDATE my_entity SET name = 'Kevin' WHERE id = 1312, using prepared statements.
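
The change tracking described here can be sketched in plain PHP. This is a toy model under stated assumptions, not Doctrine's actual implementation: `ToyUnitOfWork`, `MyEntity`, and the `my_entity` table name are hypothetical, and the entity uses a public property only so the tracker can read it.

```php
<?php
declare(strict_types=1);

// Hypothetical entity with one public field, so the toy tracker can read it.
final class MyEntity
{
    public string $name = '';
}

// Toy stand-in for the "notepad". This is NOT Doctrine's UnitOfWork class.
final class ToyUnitOfWork
{
    /** @var array<int, array{entity: object, original: array<string, mixed>}> */
    private array $tracked = [];

    // Start watching an entity and snapshot its current public field values.
    public function track(int $id, object $entity): void
    {
        $this->tracked[$id] = [
            'entity' => $entity,
            'original' => get_object_vars($entity),
        ];
    }

    /**
     * Compare each snapshot with the entity's final state and build
     * at most one parameterized UPDATE per changed entity.
     *
     * @return list<array{sql: string, params: list<mixed>}>
     */
    public function flush(string $table): array
    {
        $statements = [];
        foreach ($this->tracked as $id => $entry) {
            $current = get_object_vars($entry['entity']);
            $changes = [];
            foreach ($current as $field => $value) {
                if ($value !== $entry['original'][$field]) {
                    $changes[$field] = $value;
                }
            }
            if ($changes === []) {
                continue; // nothing to write for this entity
            }
            $assignments = implode(', ', array_map(
                static fn (string $field): string => $field . ' = ?',
                array_keys($changes)
            ));
            $statements[] = [
                'sql' => "UPDATE {$table} SET {$assignments} WHERE id = ?",
                'params' => [...array_values($changes), $id],
            ];
            // The final state becomes the new baseline.
            $this->tracked[$id]['original'] = $current;
        }
        return $statements;
    }
}

$entity = new MyEntity();
$entity->name = 'Old name';

$unitOfWork = new ToyUnitOfWork();
$unitOfWork->track(1312, $entity);

$entity->name = 'My Name'; // first change, in memory only
$entity->name = 'Kevin';   // second change, still in memory only

$statements = $unitOfWork->flush('my_entity');
```

Both assignments happen before the flush, so only the difference between the snapshot and the final value ends up in the single statement.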

Speaker 1

05:00

That is incredibly efficient.

Speaker 2

05:02

It is, and it handles the other entity manager operations just as seamlessly, like find to retrieve an object, or, if you create a brand new entity, you use persist.

Speaker 1

05:12

Which basically introduces it to the entity manager, right? Telling Doctrine, like, hey, watch this new object.

Speaker 2

05:16

Exactly, and if you want to delete something, you call remove.

Speaker 1

05:19

So you just schedule it for deletion. Doctrine notes it on the notepad and waits for the flush.

Speaker 2

05:24

Yes, and by syncing everything in one massive batch during that flush call, Doctrine can wrap all those changes in a single database transaction.

Speaker 1

05:34

Okay, why is that transaction part so important?

Speaker 2

05:36

Because if anything goes wrong during the flush, say you're saving a new user, generating their profile, and creating a welcome post all at once, which is a lot of inserts, and the very last insertion fails because of a network hiccup, the database automatically rolls back to its exact previous state.
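
A short sketch of that scenario, using the `persist` and `flush` method names from the episode. The function name `registerUser` and its arguments are hypothetical, and `$em` is typed as a plain `object` only so the snippet runs without Doctrine installed; in a real project it would be the entity manager.

```php
<?php
declare(strict_types=1);

/**
 * $em is assumed to be a Doctrine entity manager. It is typed as `object`
 * here only so the sketch can run without the library.
 */
function registerUser(object $em, object $user, object $profile, object $welcomePost): bool
{
    // Nothing is written yet: persist() only schedules the inserts.
    $em->persist($user);
    $em->persist($profile);
    $em->persist($welcomePost);

    try {
        // All scheduled inserts are sent together during this one call.
        $em->flush();
        return true;
    } catch (\Throwable $e) {
        // As described in the episode, a failure here means the whole
        // batch is rolled back, so no half-saved user is left behind.
        return false;
    }
}
```

Returning a boolean is just one way to surface the failure; rethrowing or logging the exception would work equally well.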

Speaker 1

05:54

Oh that's huge.

Speaker 2

05:55

Yeah, it keeps data perfectly consistent. You are completely protected from ending up with half-saved data.

Speaker 1

06:00

Which could corrupt your entire system. Okay, so the entity manager is our waiter, perfectly batching our changes. But it can only do that if it has a map. It has to know that the title property in our PHP object corresponds to a specific text column in our SQL database. But we just established that our entities are plain PHP objects with no database logic. So where does this map actually live?

Speaker 2

06:29

So Doctrine uses docblock annotations to build this map. These are specially formatted comments placed directly inside your PHP code, right above your classes and properties.

Speaker 1

06:41

Wait, like literally in the code comments?

Speaker 2

06:44

Exactly. You use specific tags like @Entity or @Table above your class to declare it as a database table, and tags like @Column, @Id, or @GeneratedValue above your properties to define exactly how they map to the columns.
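
As a rough illustration, a blog post entity mapped with those tags might look like the sketch below. The `posts` table name, the field names, and the length of 50 are assumptions made for the example. The `ORM\` prefix is the import alias commonly used with Doctrine annotations; newer Doctrine releases favor PHP attributes over docblock annotations, but the episode discusses the annotation style.

```php
<?php
declare(strict_types=1);

use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity
 * @ORM\Table(name="posts")
 */
class Post
{
    /**
     * Primary key, generated by the database.
     *
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private ?int $id = null;

    /**
     * Short string column, limited to 50 characters in the schema.
     *
     * @ORM\Column(type="string", length=50)
     */
    private string $title = '';

    /**
     * Long text column for the post body.
     *
     * @ORM\Column(type="text")
     */
    private string $body = '';

    public function getId(): ?int
    {
        return $this->id;
    }

    public function getTitle(): string
    {
        return $this->title;
    }

    public function setTitle(string $title): void
    {
        $this->title = $title;
    }

    public function getBody(): string
    {
        return $this->body;
    }

    public function setBody(string $body): void
    {
        $this->body = $body;
    }
}
```

Because the mapping lives entirely in comments, the class is still a plain PHP object: it can be instantiated and used without any database connection.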

Speaker 1

06:58

Here's where it gets really interesting, though. Putting database configuration right inside PHP comments, doesn't that clutter up the code? I mean, my entity is suddenly deeply aware of my database schema. We're usually taught to keep those things strictly separated, right?

Speaker 2

07:14

Yeah, I mean, that's a very common point of friction for developers transitioning to Doctrine. The purist view is that your domain objects should be completely oblivious to how they're stored, maybe using separate XML or YAML configuration files. But practically speaking, keeping the mapping information right next to the code vastly improves readability and maintenance.

Speaker 1

07:36

Because it's right there in front of you.

Speaker 2

07:37

Exactly. When you look at a property, you instantly know how it behaves in the database. You don't have to hunt through massive external files to see if a string can be null.

Speaker 1

07:47

I guess I could see the practical trade-off there. And the way Doctrine handles mapping types is pretty cool too.

Speaker 2

07:52

Oh, the type casting is a massive timesaver.

Speaker 1

07:55

Because Doctrine types aren't purely PHP, and they aren't purely SQL.

Speaker 2

08:00

Right. They act as a transparent bridge. For instance, Doctrine's text type becomes a standard string in your PHP app.

Speaker 1

08:06

But in the SQL database?

Speaker 2

08:09

It's automatically stored as a CLOB, a character large object, suitable for massive blocks of text.

Speaker 1

08:15

That's so seamless.

Speaker 2

08:18

But we do need to point out a really critical limitation here. While Doctrine handles the mapping, it does not do data validation. If your annotation says a column has a max length of fifty characters, Doctrine builds the database column to that exact spec. But if a user submits one hundred characters, Doctrine won't stop them. It will try to save it, and the database will throw a fatal error. You still have to validate user input separately.
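
A minimal sketch of that separate validation step, in plain PHP. The function name `validateTitle` and the constant `TITLE_MAX_LENGTH` are hypothetical, and the limit of 50 is assumed to mirror a `length=50` column mapping.

```php
<?php
declare(strict_types=1);

// Assumed to mirror a column mapped with length=50.
const TITLE_MAX_LENGTH = 50;

/**
 * Check user input before the entity is ever handed to the ORM.
 *
 * @return list<string> human-readable error messages, empty when valid
 */
function validateTitle(string $title): array
{
    $errors = [];

    if (trim($title) === '') {
        $errors[] = 'Title must not be empty.';
    }

    // strlen() counts bytes, which equals characters only for single-byte
    // text; UTF-8 input would need a multibyte-aware length check.
    if (strlen($title) > TITLE_MAX_LENGTH) {
        $errors[] = sprintf('Title must be at most %d characters.', TITLE_MAX_LENGTH);
    }

    return $errors;
}

$errors = validateTitle(str_repeat('a', 100));
```

Only when the returned list is empty would the application go on to set the title on the entity and persist it.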

Speaker 1

08:44

So it's a mapper, not a bouncer. Good to know. But honestly, the absolute best part of defining all those annotations is the command line magic.

Speaker 2

08:53

Oh, the schema tool.

Speaker 1

08:54

Yeah, by running, I think it's orm:schema-tool:create, Doctrine reads all those annotations and automatically generates the entire underlying database schema.

Speaker 2

09:06

Tables, columns, primary keys.

Speaker 1

09:07

Without the developer writing a single line of SQL. It completely removes the busywork.

Speaker 2

09:13

It's incredible for bootstrapping a new project.

Speaker 1

09:15

Okay, so mapping a single blog post is straightforward, but things get complicated fast when we add comments and tags.

Speaker 2

09:21

Right. Enterprise apps aren't just isolated tables.

Speaker 1

09:24

Relational databases use foreign keys and join tables for this. How does Doctrine handle these complex webs?

Speaker 2

09:31

Well, it manages them through association types, also defined using annotations, like @OneToMany for a post-to-comments relationship.

Speaker 1

09:40

Or @ManyToMany for posts to tags.

Speaker 2

09:43

But to make these relationships work, you have to understand a concept that trips up almost everyone at first.

Speaker 1

09:49

Let me guess: the owning side versus the inverse side.

Speaker 2

09:53

Yes, using the inversedBy and mappedBy attributes. It is a notorious stumbling block.

Speaker 1

09:58

It totally is. I like to use a dog leash analogy for this one.

Speaker 2

10:01

Okay, let's hear it.

Speaker 1

10:02

So the owning side is like holding the leash of a dog. The dog, which is the inverse side, might know it belongs to you. But if you don't actually grab the leash, meaning set the owning side in the code, Doctrine won't save the connection.

Speaker 2

10:16

That's a great way to put it, because the crucial rule is that Doctrine only manages the owning side of an association. So if you have a method like addComment on your Post entity, you can't just push the comment into an array. That method must include a line like $comment->setPost($this).
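
The rule can be sketched with two small classes. This is an illustration under assumptions: identifiers and other columns are omitted, the method names follow the ones mentioned in the conversation, and the inverse side uses a plain PHP array only to keep the snippet free of library dependencies (a real Doctrine entity would hold a Doctrine collection there).

```php
<?php
declare(strict_types=1);

use Doctrine\ORM\Mapping as ORM;

/** @ORM\Entity */
class Comment
{
    /**
     * Owning side: this property corresponds to the foreign key column,
     * so this is the reference Doctrine looks at when saving.
     *
     * @ORM\ManyToOne(targetEntity="Post", inversedBy="comments")
     */
    private ?Post $post = null;

    public function setPost(Post $post): void
    {
        $this->post = $post;
    }

    public function getPost(): ?Post
    {
        return $this->post;
    }
}

/** @ORM\Entity */
class Post
{
    /**
     * Inverse side: mappedBy names the owning property on Comment.
     * Plain array used here only to keep the sketch library-free.
     *
     * @ORM\OneToMany(targetEntity="Comment", mappedBy="post")
     * @var Comment[]
     */
    private array $comments = [];

    public function addComment(Comment $comment): void
    {
        $this->comments[] = $comment;
        // "Grab the leash": without this line only the inverse side changes.
        $comment->setPost($this);
    }

    /** @return Comment[] */
    public function getComments(): array
    {
        return $this->comments;
    }
}

$post = new Post();
$comment = new Comment();
$post->addComment($comment);
```

Keeping both sides in sync inside `addComment` means calling code never has to remember which side is the owning one.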

Speaker 1

10:32

You have to explicitly grab the leash.

Speaker 2

10:36

If you don't, the relationship evaporates when you flush.

Speaker 1

10:39

And speaking of arrays, Doctrine doesn't actually use standard PHP arrays for these relationships, right?

Speaker 2

10:45

No, it uses a special class called ArrayCollection.

Speaker 1

10:48

Why is that? If it's just a list of comments, why not just use a normal array?

Speaker 2

10:52

Because a normal array is completely passive. Doctrine needs to track the internal state of that collection.

Speaker 1

10:59

Oh, to know what to sync with the database.

Speaker 2

11:02

Right, it acts like a standard PHP array, but provides the hidden hooks Doctrine needs to figure out what to update or delete.

Speaker 1

11:09

And that enables some really cool features, like orphanRemoval equals true.

Speaker 2

11:12

Yes, automatically deleting tags that are no longer linked to any post.

Speaker 1

11:17

Or cascade equals persist.

Speaker 2

11:18

Where saving a post automatically saves all the brand new comments attached to it.

Speaker 1

11:22

But if we connect this to the bigger picture, all these webs of objects must impact performance, right?

Speaker 2

11:28

Massively, especially regarding how the data is loaded. By default, Doctrine uses lazy loading.

Speaker 1

11:35

Meaning it only fetches the relations when you explicitly ask for them.

Speaker 2

11:38

Right. If you ask for a post, it leaves the comments behind. If your code asks for them later, it quietly fires off a second query to fetch them.

Speaker 1

11:47

Which saves memory initially. But what if I know I need all the comments? Isn't that second query wasteful?

Speaker 2

11:52

It is, and when it happens for every post in a list, that's the classic N+1 query problem. To fix it, you switch to eager loading using fetch equals EAGER in your annotations.
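
The association options mentioned in this part of the conversation all sit on the same annotation. The sketch below shows where they go; the entity and property names are assumptions, the plain array again stands in for a Doctrine collection, and how Doctrine actually loads an eager association can depend on the version and on how the entity is queried.

```php
<?php
declare(strict_types=1);

use Doctrine\ORM\Mapping as ORM;

/** @ORM\Entity */
class Post
{
    /**
     * fetch="EAGER": load the comments along with the post rather than
     * waiting for the first access.
     * cascade={"persist"}: persisting the post also persists new comments
     * found in the collection.
     * orphanRemoval=true: a comment removed from the collection is deleted
     * from the database on flush.
     *
     * @ORM\OneToMany(
     *     targetEntity="Comment",
     *     mappedBy="post",
     *     fetch="EAGER",
     *     cascade={"persist"},
     *     orphanRemoval=true
     * )
     * @var object[]
     */
    private array $comments = [];

    /** @return object[] */
    public function getComments(): array
    {
        return $this->comments;
    }
}
```

Leaving out the `fetch` option keeps the default lazy behavior described above.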

Speaker 1

12:00

So it fetches them immediately in one big join. Okay, so we can successfully save these interconnected webs of objects, but how do we get them back out efficiently without resorting to raw SQL?

Speaker 2

12:11

We need a new way to ask for our data, and that is Doctrine Query Language, or DQL. And the massive paradigm shift here is that DQL queries the object model's entities, not the database tables.

Speaker 1

12:23

Right, so you're not writing SELECT * FROM posts LEFT JOIN comments ON and so on.

Speaker 2

12:28

No, you write something like SELECT p, c FROM Blog\Entity\Post p LEFT JOIN p.comments c.
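
In PHP, running that DQL might look like the sketch below. The function name `findPostsWithComments` is hypothetical, and `$em` is typed as `object` only so the snippet runs without Doctrine installed; it is assumed to be the entity manager, whose `createQuery` method accepts a DQL string.

```php
<?php
declare(strict_types=1);

// DQL names the entity class and its properties, not tables and columns.
const POSTS_WITH_COMMENTS_DQL =
    'SELECT p, c FROM Blog\Entity\Post p LEFT JOIN p.comments c';

/**
 * $em is assumed to be a Doctrine entity manager.
 *
 * @return array<int, object> Post entities with their comments loaded
 */
function findPostsWithComments(object $em): array
{
    return $em->createQuery(POSTS_WITH_COMMENTS_DQL)->getResult();
}
```

Note that `p.comments` is the PHP property on the entity; the join condition comes from the association mapping rather than from an ON clause.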

Speaker 1

12:36

That is wild. You just use the properties.

Speaker 2

12:39

It's extremely powerful. And to keep these organized, Doctrine uses entity repositories.

Speaker 1

12:44

Which is the Table Data Gateway pattern.

Speaker 2

12:47

Exactly. The base repositories give you magic methods like findByTitle, where you don't even write query logic.

Speaker 1

12:52

But for complex stuff, we use custom repositories and the query builder.

Speaker 2

12:55

Right. Yes, you build a query by calling methods like createQueryBuilder, leftJoin, addSelect. It translates seamlessly into SQL.

Speaker 1

13:04

So what does this all mean for security? Because I'm thinking about SQL injection. Like, if someone types ' OR 'a'='a.

Speaker 2

13:12

Ah, the classic exploit.

Speaker 1

13:14

Right, does the query builder stop that?

Speaker 2

13:16

Absolutely. The query builder's setParameter method acts as a built-in security guard. It binds the input as a query parameter instead of pasting it into the query text, so malicious code can't sneak through.
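
A sketch of such a query with the query builder. The function name `findPostsByTitle` and the `title` field are assumptions, and `$em` is typed as `object` only so the snippet runs without Doctrine; the chained methods are the query builder calls named in the conversation, plus `select`, `from`, `where`, `getQuery`, and `getResult`.

```php
<?php
declare(strict_types=1);

/**
 * $em is assumed to be a Doctrine entity manager.
 *
 * @return array<int, object> matching Post entities with comments joined
 */
function findPostsByTitle(object $em, string $title): array
{
    return $em->createQueryBuilder()
        ->select('p')
        ->from('Blog\Entity\Post', 'p')
        ->leftJoin('p.comments', 'c')
        ->addSelect('c')
        // The query text contains only the :title placeholder...
        ->where('p.title = :title')
        // ...and the user input travels separately as a bound value.
        ->setParameter('title', $title)
        ->getQuery()
        ->getResult();
}
```

Even with an input such as `' OR 'a'='a`, the string is treated as a title to compare against and never becomes part of the query itself.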

Speaker 1

13:26

Oh, that's such a relief.

Speaker 2

13:27

And to boost performance, Doctrine caches these generated DQL queries. It parses it once, caches the SQL, and just swaps the parameters next time.

Speaker 1

13:35

Wow.

Speaker 2

13:36

You can even use aggregate functions like COUNT(c.id) to return custom two-dimensional arrays, like getting a post with its total comment count.

Speaker 1

13:44

Okay, so we've mastered standard relations, but let's push the limits. What happens when our data model hits a classic OOP concept that relational databases despise?

Speaker 2

13:54

You mean inheritance. Yeah, relational databases really hate inheritance.

Speaker 1

13:59

Right. So if I have a PostAuthor and a CommentAuthor that both extend an abstract Author class, how does Doctrine handle that?

Speaker 2

14:07

It offers three inheritance strategies. The first is mapped superclasses.

Speaker 1

14:11

Okay, what's that?

Speaker 2

14:13

It's good for sharing properties in your PHP code, but at the database level, the child classes get completely separate tables.

Speaker 1

14:20

So no real relation in the DB. What's the second?

Speaker 2

14:23

Single table inheritance. It puts everyone in one massive table, using a discriminator column, usually called dtype, to tell them apart.

Speaker 1

14:31

I love my closet analogy for this one. Single table inheritance is like throwing everyone's clothes into one giant closet, but slapping a sticky note, like the dtype, on them.

Speaker 2

14:42

Highly performant, but a bit messy.

Speaker 1

14:44

Because you have a lot of empty space for properties that don't apply to everyone. But the third strategy, class table inheritance, that's like giving everyone their own separate closet.

Speaker 2

14:54

Each class gets its own table.

Speaker 1

14:56

But if you want to put together an outfit, you have to open all the closets.

Speaker 2

15:00

Which means your queries require massive JOINs. It's flexible, but much slower.

Speaker 1

15:05

This raises an important question: which one should I actually use?

Speaker 2

15:09

Well, it's about architectural trade-offs. Single table is usually best for performance. Class table is really reserved for highly complex models where performance isn't the primary bottleneck.
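
For the author example from this discussion, a single table mapping could be sketched as follows. The discriminator values `post` and `comment` and the `bio` and `email` fields are assumptions made for illustration; the class names follow the ones used in the conversation.

```php
<?php
declare(strict_types=1);

use Doctrine\ORM\Mapping as ORM;

/**
 * Single table inheritance: every Author subtype shares one table, and
 * the "dtype" column records which class each row belongs to.
 * Using "JOINED" instead of "SINGLE_TABLE" would select class table
 * inheritance, with one table per class.
 *
 * @ORM\Entity
 * @ORM\InheritanceType("SINGLE_TABLE")
 * @ORM\DiscriminatorColumn(name="dtype", type="string")
 * @ORM\DiscriminatorMap({"post" = "PostAuthor", "comment" = "CommentAuthor"})
 */
abstract class Author
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    protected ?int $id = null;

    /** @ORM\Column(type="string") */
    protected string $name = '';

    public function getName(): string
    {
        return $this->name;
    }

    public function setName(string $name): void
    {
        $this->name = $name;
    }
}

/** @ORM\Entity */
class PostAuthor extends Author
{
    /**
     * Only meaningful for post authors; the column stays empty on
     * CommentAuthor rows, which is the "empty space" trade-off.
     *
     * @ORM\Column(type="text", nullable=true)
     */
    private ?string $bio = null;
}

/** @ORM\Entity */
class CommentAuthor extends Author
{
    /** @ORM\Column(type="string", nullable=true) */
    private ?string $email = null;
}
```

Subclass-specific columns are marked nullable because rows of the other subtype share the same table and have nothing to store there.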

Speaker 1

15:19

Makes sense. Now, real quick, before we wrap, we have to mention Doctrine's event system.

Speaker 2

15:24

Ah, lifecycle events.

Speaker 1

15:27

Right, like using @HasLifecycleCallbacks and @PrePersist to automatically stamp a publication date right before insertion.

Speaker 2

15:34

It's very handy, but be careful. More complex logic, like an insult listener that scans for bad words, or emailing an author, doesn't belong in the entity. Keep entities clean. That logic belongs in external event subscribers.
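
The publication date case is small enough to live in the entity. A sketch, assuming a `publishedAt` field and a callback method named `stampPublicationDate` (both hypothetical names):

```php
<?php
declare(strict_types=1);

use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity
 * @ORM\HasLifecycleCallbacks
 */
class Post
{
    /** @ORM\Column(type="datetime_immutable", nullable=true) */
    private ?\DateTimeImmutable $publishedAt = null;

    /**
     * Doctrine is expected to call this right before the first insert.
     * It only touches the entity's own state, so it is safe to keep here.
     *
     * @ORM\PrePersist
     */
    public function stampPublicationDate(): void
    {
        if ($this->publishedAt === null) {
            $this->publishedAt = new \DateTimeImmutable();
        }
    }

    public function getPublishedAt(): ?\DateTimeImmutable
    {
        return $this->publishedAt;
    }
}

// Outside Doctrine the callback is just a method, so it can be called directly.
$post = new Post();
$post->stampPublicationDate();
```

The null check keeps a date that was set explicitly, so the callback only fills in a missing value.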

Speaker 1

15:48

Good rule of thumb. Wow, okay, we've journeyed from the raw connection of PHP to the elegant abstraction of Doctrine.

Speaker 2

15:54

Today we covered a lot of ground.

Speaker 1

15:56

We did. We saw how the entity manager, associations, DQL, and advanced inheritance make handling data an object-oriented dream.

Speaker 2

16:04

And for you listening, this means faster development, cleaner code bases, and letting the framework handle the heavy lifting.

Speaker 1

16:11

Yeah, database synchronization, security, you name it. You can just focus on building features rather than writing boilerplate SQL.

Speaker 2

16:18

But consider this: Doctrine does provide native queries and DBAL access for when you need raw, database-specific power.

Speaker 1

16:27

Which makes me think: as ORMs become more magical and do more of the thinking for us, do we risk losing our fundamental understanding of what makes a database actually performant under the hood?

Speaker 2

16:37

It's a valid concern.

Speaker 1

16:38

Does the ultimate abstraction eventually abstract away our core engineering skills? Something for you to ponder before your next deep dive.

Transcript source: Provided by creator in RSS feed: download file
