#108 Modeling Sports & Extracting Player Values, with Paul Sabin

Jun 14, 2024 · 1 hr 18 min · Season 1 · Ep. 108

Episode description

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!


Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!

Visit our Patreon page to unlock exclusive Bayesian swag ;)

Takeaways

  • Convincing non-stats stakeholders in sports analytics can be challenging, but building trust and confirming their prior beliefs can help in gaining acceptance.
  • Combining subjective beliefs with objective data in Bayesian analysis leads to more accurate forecasts.
  • The availability of massive data sets has revolutionized sports analytics, allowing for more complex and accurate models.
  • Sports analytics models should consider factors like rest, travel, and altitude to capture the full picture of team performance.
  • The impact of budget on team performance in American sports and the use of plus-minus models in basketball and American football are important considerations in sports analytics.
  • The future of sports analytics lies in making analysis more accessible and digestible for everyday fans.
  • There is a need for more focus on estimating distributions and variance around estimates in sports analytics.
  • AI tools can empower analysts to do their own analysis and make better decisions, but it's important to ensure they understand the assumptions and structure of the data.
  • Measuring the value of certain positions, such as midfielders in soccer, is a challenging problem in sports analytics.
  • Game theory plays a significant role in sports strategies, and optimal strategies can change over time as the game evolves.

Chapters

00:00 Introduction and Overview

09:27 The Power of Bayesian Analysis in Sports Modeling

16:28 The Revolution of Massive Data Sets in Sports Analytics

31:03 The Impact of Budget in Sports Analytics

39:35 Introduction to Sports Analytics

52:22 Plus-Minus Models in American Football

01:04:11 The Future of Sports Analytics

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi...

Transcript

Folks, you may know it by now, I am a huge sports fan. So needless to say that this episode was like being in a candy store for me. Well, more appropriately, in a chocolate store. Paul Sabin is so knowledgeable that this conversation was an absolute blast for me. In it, Paul discusses his experience with non-stats stakeholders in sports analytics and the challenges of convincing them to adopt evidence-based decisions.

He also explains his soccer power ratings and projections model, which uses a Bayesian approach and expected goals, as well as the importance of understanding player value in difficult-to-measure positions and the need for more accessible and digestible sports analytics for fans. We also touch on the impact of budget on team performance in American sports and the use of plus-minus models in basketball and American football.

Paul is a senior fellow at the Wharton Sports Analytics and Business Initiative and a lecturer in the Department of Statistics and Data Science at the Wharton School of the University of Pennsylvania. He has spent his entire career as a sports analytics professional, teaching and leading sports analytics research projects. This is Learning Bayesian Statistics, episode 108, recorded April 11, 2024.

Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country, for any info about the show. LearnBayesStats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's LearnBayesStats.com.

If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. Welcome to Learning Bayesian Statistics... a full conversation in French, as we just had before recording. Well done. It used to be, though. Go back two to three hundred years. Maybe you just don't go to Africa enough. That's where French is spoken a lot now too.

Exactly. But other than that, you can see French used to be a very international language because in my travels, almost all the time people tell me, yeah, I studied French in high school. And the only thing they can say is just a few words. Which is normal, like if you don't use it, right? But yeah, you can see that because French is still, or was still taught in high school and now less and less. So yeah, so well done Paul for that. I know, I don't think French is an easy language to learn.

What has been your experience? I'm actually very curious. You know, it's hard to say, so this is a statistics pod or data science podcast. So I guess I can't really, I can't really compare it to anything else. That's the only other language I've learned besides my native English. So, you know, I guess, you know, one sample size for me, I took it in high school as well. I hated it.

I had, so, you know, coming from America, you know, so the reason I chose, you know, seventh grade is when I had to choose whether I was taking French or Spanish. And I'm the youngest of four kids in my family growing up. And my older siblings told me that the Spanish teacher was really mean. And that's originally why I took took French. and then I took it for the required two to three years. And then I was done.

I had in high school, I had this teacher from Belgium and I still remember her name, Madame Vendon Plus, and I couldn't stand her, but come to find out, looking back in life, that she was actually a really nice person. She was just Belgian. And the cultural thing, you know, like Americans think they're the best, and the French, in Europe, people also think they're the best because they ruled the world in the 1700s and 1800s, and America felt like they've ruled the world for the last 100 years.

And so when you get into a room together and you think both of your cultures are superior, you know, that doesn't go well together. But actually, so after that, I didn't... speak French at all. And then I did church service for my church for two years and I lived in Montreal, I lived in Quebec, not actually in the city, I lived in a lot of rural small town. And so I studied French really hard. I had to learn the very strong Quebecois accent.

And then when I went back to school, it's when I really honed my French. I was very conversational, could speak very fluently in Quebec, but then, you know, I had to learn the grammar a little bit more in depth. So then I studied French at university as well. So, you know, immersing yourself is actually how you learn languages, because when I learned it in school, it never made sense to me.

But when I studied it on my own and I studied conjugation and all these things, it became kind of like a math problem. And so when I would speak a sentence in my head, I'd always be like, I need a subject. I need to conjugate the verb. And then I need to say like what I'm, you know, just do an adverb or an adjective after it. And like it made sense in my head, but that's not how I was taught in school. I was taught, I had to memorize all these words, like everything in the kitchen.

How do you say dishwasher? How do you say refrigerator? How do you say fork? How do you say spoon? I couldn't learn like that, but at like living and like thinking about French as a math equation, it made sense in my head and I was able to pick it up. You know, sure. I made tons of mistakes and embarrassed myself, but it wasn't too bad. And that's how you learn. Yeah. So I'm guessing. Like from that answer, I'm guessing people already know why I invited you on the podcast.

Very nerdy answer, your take on languages, that's perfect. Thanks a lot. And yeah, I completely relate actually. I learned English and German in high school and yeah, kind of the same. I always hated formal language learning. And like in the end I learned these languages, and Spanish, that was the same, and Italian, that was the same, just going to the country basically. And yeah, as you were saying, I think also what it adds is you've got skin in the game.

You're in the country, you're having a conversation with someone. If you're not able to talk, you look extremely stupid. So it's a very good incentive for the brain to step up and learn. And that's really awesome. And then when you are in the situation that you... don't know what to say, you remember that. And then when you learn, this is what I should have said, it sticks with you because it has an emotional attachment to it. Yeah. Yeah. No, exactly.

And I mean, and that's going to be a good segue to my first question to you, but I think it's also one of the situations in life, where you can really, feel and see your brain learning. So that's why I also really love learning new languages and going to countries to do that because. Like you arrive in the country, you don't know how to say anything.

And in just a few weeks, your brain starts picking up stuff and you can really, really feel your brain doing its amazing work that it's been like conditioned to do from years of evolution. And to me, that's just absolutely incredible that the brain is able to do that. Even when you're like in your thirties and beyond, you can do that. And it's just, I found that absolutely incredible. And that's kind of like a Bayesian.

neural network, you know, so I mean, see that segue, I should definitely have a podcast. So, actually, talking about Bayes. Yeah, I invited you on the podcast because you do absolutely awesome work on sports modeling. And people know that I'm a big fan of a lot of sports. I love modeling sports and so on. So I'm super happy to have you here. And I have a list of questions that is embarrassingly long.

But maybe can you tell us if you are actually yourself using some Bayesian methods, if you're familiar with those or not? And yeah, in general, what does that look like in your work? Yeah. So yeah, I mean, just a quick background about myself, right? I've worked in sports, what we call sports analytics, for almost 10 years now.

Actually, I was getting my PhD in statistics, and there was this job opportunity at ESPN, you know, which is a sports broadcasting television channel in the US and a few other countries. And, you know, I got the job offer to work on their sports analytics team, where essentially what the team there does is make forecasts so that, you know, they can show on TV, you know, on the bottom line, like who's expected to win, or we will run simulations on

you know, who's likely to win the championship, you know, all throughout the season. And so, you know, you can tell stories with that saying, you know, the team was just like the beginning of the season. No one thought they were going to be any good, but just look how it, you know, they got better or the opposite. Like they were supposed to be really good and everything just went wrong.

And so in my field, in sports modeling, I would think actually you can't do it without being Bayesian. And so when I would interview people, I'd always focus on those. So as people coming out of school, sometimes they don't always learn Bayesian methods very well. And the reason is in sports, sample sizes are very small and you have to make forecasts with very limited data. And the great thing about Bayesian statistics is that you actually have more data.

You just haven't observed it. You have expertise or you have opinions, but those opinions actually matter. And so maybe we'll get into this, but I'm actually a very strong advocate, because of my field, of subjective Bayesian analysis. It's okay to insert some information into your models and it usually makes them better. Yeah. Well, awesome. I couldn't have dreamt of a better answer, and I have to be fully transparent:

I didn't know Paul was going to answer that because that's not really, I haven't seen that in your, you know, on your website or else, So before, while preparing the episode, I didn't know if you were already using Bayesian methods or else. But definitely, definitely happy to hear that. And so that people know that was not a conspiracy. I didn't know anything that Paul was going to say. OK, so that's awesome.

So I'm an open source developer, so I'm always very curious about the stack you're using. What are you using actually when you're doing Bayesian analysis of a sports model? So in my career, I almost always use R and Stan. So if I'm doing Bayes analysis, I write a lot of Stan code. It's gotten easier with ChatGPT. It doesn't do it all the way, right? But if it's like, hey, I want to build this kind of model, it'll at least give me a good framework.

And then I can adjust it and edit it as I want from there. Yeah. Yeah. And I mean, for sure, you cannot go wrong with R and Stan. So yeah, definitely. And we've had one of the creators of Stan, Andrew Gelman, back on the podcast a few weeks ago. It was not released yet, but through time travel, it's gonna have been released when your episode is out. So folks, you can go back to - Right, because I am definitely a lesser draw than Andrew Gelman is, but that's great.

No, yeah, so if people are curious about what Andrew has been up to lately, it's the third time he's been on the show and he just released a new book, Active Statistics, that I definitely recommend. It's really fun to read.

It's like, it's how to teach statistics with stories, which actually relates to something you just said, Paul, about the, like, cool and fun way to relate statistics to... non-stats people was to be able to tell stories about a team's probability of winning or any forecast like that. So that's definitely interesting to hear you talk about that. And actually I'm curious because I've been following that field of sports analytics for a few years and I've seen it personally mature

quite a lot and evolve quite a lot when it comes to the technology and the data availability. So I'm curious what an expert like you thinks about that evolution of technology and data availability and how that changed the landscape of sports analytics. Yeah, I mean, it's exploded in the last 10 to 15 years. So I mean, if people are familiar with the book slash movie Moneyball, the book is about 20 years old now. The movie is about 12, 13 years old now.

You know, back then in baseball, baseball was the sport that sort of took off in sports analytics, for a couple of reasons. One, the game is very discrete, there are start and stopping points, so you can measure discrete events very well in baseball. But two, they're the only sport that actually had a really long-running data set.

And that went back, they've been keeping statistics in baseball and you can actually go back to the 1800s and find out how people were playing baseball in 1895. No other sport has that. So that's probably the reason why baseball took off. But since then, you know, for a while after that, every sport had what we call play by play data, which is like, this is what happens. Soccer had a version that was called event data, where people would

watch a game and every time someone touched the ball or made a pass, they would mark, the ball was touched here on the field and it was passed to there or they dribbled from here to there. So it was, they kind of were discretizing soccer in a way to make it a similar format. But then about 10 years ago, we started getting this player tracking data, which is the location of everybody and the ball or the puck on the field, you know, depending on the sport, 10 to 25 times per second.

And that's drastically changed. the methodologies and things that are used. So, I mean, Bayesian analysis was great for this play by play data or even, you know, game by game data and measuring how, how players or teams performed. And then now we've started getting such huge data sets that, you know, more of the computer science world, neural networks, things like that started becoming much more prevalent in sports analysis just because the data sets were so massive.

Not that statistics doesn't play a role. It still does. And I think people sometimes overly rely on these black box methods. They don't think about the implications or the biases in the data, which are still important. But we have these huge amounts of data now and it's just exploded to like, you know, if you want all the data in a season in the NFL, it's like over one terabyte of locations of everybody on the field, every play, 25 times a second. It's just massive. Right.

So it's really changed the way people have done things, right? And we started going from really simple questions to huge, big questions. And the funny thing is now, I actually think with the data being so large, people are now actually going back to answering more simple questions. Like, we're not trying to measure everything all at once; let's try to measure very specific things that we weren't able to measure before. Hmm. Yeah, that is definitely interesting. And so, first:

Is that availability of data, massive availability of data, the case in all the sports industry? Or is it more, well, the most historical ones, as you were saying, maybe more baseball. I know the data sets are more massive there, and maybe in other sports like soccer the data sets are less prevalent, less massive, or is that a uniform trend? First question. And then second question is: where does that data live? Is that mostly open source or is that still quite closed-source data?

Yeah. So I mean, baseball is usually like the cutting edge of everything because they had a head start. And basketball and then like kind of American football, international soccer football and hockey kind of trail behind. But the data sets now in all those sports are very massive. Hockey, the NHL, just got their player and puck tracking data a couple of years ago. Now baseball and basketball have moved on beyond just knowing where players are on the field.

They actually have data of what's called pose data. So they know where different joints and their arms and the legs are of every player on the field or on the court. So that data is massive. It's massive everywhere. There's companies that are trying to collect new data based on video, so they're using computer vision algorithms to do that, but largely to answer your second question. This is not open source data. So the old school data, the play by play data is open source.

You can find that for every sport pretty much via an open source mechanism now. But these huge data sets of the tracking of the players, you know, 10 to 25 times per second, it's usually all closed source. There are a few releases of that here and there, you know, the NFL does a competition where they release some of that data each year, like a very small set, and a few other leagues have done something similar as well. That kind of gives you a taste.

if you have money, there are companies that try to create that data themselves and they'll sell it to you. But you know, that's usually pretty expensive for an individual person to buy. So again, just that. I see. Okay. Yeah, interesting. Definitely. Because like data is kind of oil in our industry, right? So it's definitely interesting to know what's the state of the supply of oil in a way.

Maybe for people who are less versed in in sports modeling, can you give us an example of how analytical insights have directly influenced team strategy or player selection in one of your consulting roles. Yeah. So I mean, I'll just kind of talk broadly at first. I mean, so sometimes it's just the most basic things, right?

So like in basketball, people shoot three-pointers more because all they did is figure out the expected value was larger for a three-point shot than it was for most two-point shots. Not those layups and the dunks, right? Those are very high percentages. So the expected value of a high percentage times two is, you know, pretty good. But even if the percentage drops off a lot, when you multiply it by three to get the expected value of a three-point shot,

you know, it's also pretty good. So that means basketball has changed drastically because of that. And in my roles, I guess, you know, I think in a lot of sports there have just been a lot of open questions, people kind of move one way. And then I think actually sports analysts do a really good job of tackling very easy problems first.
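
To make that arithmetic concrete, here is the expected-value comparison Paul is describing, with illustrative shooting percentages (the numbers here are ours, not his):

$$
E[\text{two-pointer}] = 0.52 \times 2 = 1.04 \text{ points}, \qquad E[\text{three-pointer}] = 0.36 \times 3 = 1.08 \text{ points}
$$

So a 36% three-point shooter produces slightly more points per attempt than a 52% two-point shooter, which is why offenses shifted toward the three, with high-percentage layups and dunks as the exception.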

But then I think there's actually a tendency for the analysts themselves to be overconfident in their analysis and they're not factoring in all of the sources of variation that might be there. And something I'm also very curious about is: what's your experience with non-stats stakeholders? So coaches, scouts, players, how do they typically respond to the analytics and the insights you provide, and are there differences in reception across sports, maybe across roles?

Yeah. So, I mean, it really does vary as in all things, there's variance. There are some typically younger, you know, coaches or scouts that are a little bit more receptive than people who have been doing something for a long time. And I think that's just human nature. You're used to doing things a certain way. You don't like. You know, to stereotype, you don't like some young person coming and telling you how to do your job. Right. So you have to be really careful about that.

and the, and the funny thing is, you know, everything that I have learned or, you know, I believe in, in terms of making data driven decisions and don't overestimate based on small sample sizes goes out the window when I'm trying to convince a stakeholder of something. So for example, If I have a model and I want them to use it, and I think it's going to help them.

Of course, I've done the analysis to say, you know, what over the long run, how it would improve our efficiency, or if we make a decision in this way, it'd be better process, et cetera. I've done that analysis and I've done it over a larger sample size. But when I, when I tell them what they want to know is they want confirmation bias, right? They love confirming their beliefs.

So in order to get them to agree with what you're saying, this works so much better than saying, you know, out of the thousand players that I did this on, you were only correct 60% of the time, but my model would have been correct 70%. Like, they don't want to hear that. Essentially it's, well, you love this player? So does my model. I find the one guy, even if it's literally only one person, and they're like, yeah.

If your model can see that, then it must be doing something right. And then it's like, then they start to trust you a little bit. And over time you give them little pieces, little crumbs of a cookie that they can help, you know, get confidence in. And then, you know, then is when you share with them, okay, well, but it's also suggesting this, which is different than what you've been doing in the past. Right? So you don't ever start with, you know, trust me.

because you might be wrong, because you're a human. I mean, like, you know, humans always make mistakes, but we usually don't think we make as many mistakes as we do. And so I found just over time is if you get people to trust you by confirming their prior held beliefs, right? It's another Bayesian concepts. If you can confirm their prior beliefs, they're going to accept your future recommendations or future things that the model might suggest more than if you start with.

the differences upfront. And so that's like a little bit of human bias, right? That you have just learned over time. And some things are just really hard for people to accept, but over time, if you get people to trust you and you build that relationship, there's a lot of human elements here, and they trust your work by confirming their prior held beliefs, then they'll open up and be a little bit more open-minded about other things as well.

Because then like, okay, well, I know you're not an idiot. Like you could speak my language some. now I might be more open to learning a little bit of your language. And that's just sort of a human relationship thing that you have to always work on. Yeah, that is very interesting. And I'm very, yeah, I'm always very interested to hear about that because I also face clients daily and have to explain models to them.

And so as you were saying, the reactions to the model definitely vary a lot. But that bit of wisdom of maybe indulging the confirmation bias at the beginning and then slowly going towards a bit more of speaking the truth, it's very interesting. I had not thought of that, but yeah, definitely I can see that being a valid strategy when you are in front of someone who doesn't really understand the value of the modeling, I would say.

Whereas when I encounter clients who are already convinced of what the models can do for them. They are usually looking for contradicting what they already think. And that's when they find the model interesting. So I find that really, really cool to see. The contradictions are really where there's value, right? But there's no value in a model if no one uses it, right? Even if the model is really good, if no one uses it, it has zero value.

If they use it, the contradictions are valuable if they're right, correct? So in soccer analysis, you know, I've spent my career doing lots of different sports, but there's this sort of, this applies to every sport. In basketball, we can call it the LeBron test, and in soccer, we'll call it the Messi test, where it's essentially, if you build a model and it's trying to evaluate players and Messi is not like one of the top players in your model, then

you're not going to share it with anybody because no one's going to believe you. Right. That's like the first thing everyone does, is like, okay, well, is Messi up top? And if Messi is near the top, then like people, at least they'll listen to you a little bit longer. Right. But they're not going to listen to you at all if you're like, yeah, Messi is an okay player. Right. Like, I don't care what your model says. Right. That's wrong. That's what people believe.

So it's like a little bit of, I need to feed you: no, no, no, I'm taking a different approach than what you do, but you know, my approach also thinks that Messi is the best. Right. And then it's like, okay, yeah, we agree. He is really good. Yeah, it's like a sniff test, right? And in a way it's saying, well, I have a very strong prior that Messi is really good.

To convince me, otherwise you're going to need really, really good data. It's like, well, the earth is very probably somewhat round. It's going to be very hard for you to... move that prior from me and telling me it's not, in a way. Yeah. And in sports, people have really strong priors, right? So, you know, those sniff tests do really matter. And as a modeler, even for myself, like, I'm a human. So like, I do the same thing. If I'm building a model, I always want to see the results.

And it's like, I don't look at the median, like I do, but I don't look at who the median result is in my model half the time. I usually look at the best and I look at the worst. And if I don't understand it, then I'm like, maybe my model is doing something wrong. And I'm all like, gonna, I'm going to dive in a little bit more. If it like confirms my prior held beliefs, I'm like, it's probably correct. Right. And even as a modeler, right, you have to be careful of that.

But at the same time in sports, you know, like I said, subjective analysis can be helpful, because in people's subjective views there's wisdom. Coaches have been playing a game for 20, 30 years, or coaching a game for 20 or 30 years; to think that they don't have something to offer a model is kind of crazy in my opinion. They might have biases, and of course they do, but the information that they can provide is useful. Yeah, definitely.

And that's where we go back to what we were talking about at the beginning and the value of Bayesian inference in that context. Because if you can leverage that deep and hard-earned knowledge from the coaches, from the scouts, and add that to your model, it's like getting the best of both worlds. And that can make your analysis extremely powerful and useful, as you were saying. Yeah. And people have done studies like this, I've done studies like this.

If you build a model just on the data and ignore the human element, right? Or if you build a model just on human and scouting analysis and ignore the other data. Right. Neither one of those is going to do as well as when you combine both. And that's really, that's what, you know, that's Bayesian analysis is you're combining subjective belief with objective data and then making forecasts based on them.

And we know that if you have priors that are not really, really bad, a subjective Bayesian forecast is going to have smaller error than a data-only, you know, what we call maximum likelihood forecast, right, in stats terms. Or, you know, just the human one, the no-data, you know, feelings forecast as well, right? So the combination of the two always does better. Yeah. Yeah. Preaching to the choir here for sure.
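
For readers who want the formula behind "combining subjective belief with objective data": in the simplest conjugate normal case, the posterior mean is a precision-weighted average of the prior mean (the expert belief) and the maximum likelihood estimate (the data). This is a textbook identity, not something specific to Paul's models:

$$
\hat{\theta}_{\text{post}} = \frac{\frac{1}{\tau^2}\,\mu_0 + \frac{n}{\sigma^2}\,\bar{y}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}}
$$

where $\mu_0$ and $\tau^2$ are the prior mean and variance, and $\bar{y}$ is the mean of $n$ observations with noise variance $\sigma^2$. With little data the estimate leans on the prior; with lots of data it converges to the MLE, which is why a reasonable prior rarely hurts and usually helps.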

And actually, I think that's a good time now in the episode to get a bit more nerdy, if we can, because I've seen you, so you've obviously worked extensively with. soccer analytics and you have an interesting soccer power ratings and projections on your website that I'm gonna link to in the show notes but can you tell us about it and what makes these projections unique in your perspective in evaluating team and player performance and don't be afraid to dig into the nerdy details because...

My audience definitely liked that. Yes. Sure. I'll dig in. So what's on my website is... Sorry if you can hear my dog there. What's on my website is perhaps the most simple power ratings forecast that I've ever done. So I say that, not that it's like stupid or anything. So when I was at ESPN, I build power ratings in American football, both professional and collegiate, and basketball, professional and collegiate. and hockey, I mean, like almost every sport, right?

So what's on my website, I'll explain the model very simply is it's a Bayesian model where you have an effect for each team, right? And the response variable is the expected goals for each team. So usually when we do a power ratings and we're trying to estimate for a team, you know, there's two sort of.

things that we're trying to estimate their offensive ability and their defensive ability and then you assume essentially that their overall team ability, you know, if it's a linear model, right is the combination of their offense and their defensive abilities. Okay, so you so essentially in each match, right?

You have essentially two rows of data where you have the expected goals for the one team and then the expected goals for the other and the reason we use expected goals, although I actually have lot of issues with the expected goals. They are a better indicator of how, how good the team performed on offense than just the raw number of goals. And right. I don't need to go into details, right?

It's essentially an expected value as opposed to an observation from a Poisson distribution, which soccer scores roughly reflect, a Poisson or pretty close to a Poisson distribution, right? The expected goals is that expectation. And so essentially I have a hierarchical Bayesian model where I do a few things. So I actually assume the expected goals is the mean of a Poisson distribution. The observed goals is the actual outcome of the Poisson distribution.

And then I fit a linear model essentially where I look, okay, team A was on offense, team B was the opponent, and this was team A's expected goals. And I'm essentially fitting a regression model, right? A Bayesian regression model where I have individual team effects. I have a prior on each team's offense and each team's defense. And for that prior, you know, I don't have to get too crazy.

You know, I just use a normal distribution, and, you know, sometimes when I code in Stan I actually like using a distribution with a little bit thicker tails. But I think for this model I was just trying to go simple: a normal distribution prior with a mean, you know, for essentially each team's expected goals per game, on offense versus

defense, right? And for the defensive value, I usually do the subtraction, so it's the offensive team minus the defensive team, and that way the defensive team's value is higher if they're a good defense. So essentially, if team A's expected goals in a game against an average opponent is like 1.5, and the defense's average expected goals allowed in a game is 1.4, then you would say the difference is like 0.1, okay?
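
For readers who want to see what a model along these lines can look like in code, here is a minimal Stan sketch: one attack and one defence effect per team with normal priors, the attacking team's expected goals (xG) as the response, and the home-field indicator Paul mentions next. This is our reconstruction from the conversation, not Paul's actual code; it leaves out the Poisson layer for observed goals as well as the rest, travel, and altitude effects he discusses, and all variable names are ours.

```stan
// Minimal power-rating sketch: two rows per match, one per attacking team.
data {
  int<lower=1> N;                        // team-match rows
  int<lower=1> T;                        // number of teams
  array[N] int<lower=1, upper=T> att;    // attacking team index
  array[N] int<lower=1, upper=T> def;    // defending (opponent) team index
  vector[N] home;                        // 1 if the attacking team is at home, 0 otherwise
  vector<lower=0>[N] xg;                 // expected goals created by the attacking team (assumed > 0)
}
transformed data {
  vector[N] log_xg = log(xg);            // model xG on the log scale
}
parameters {
  real mu;                               // league-average log xG
  vector[T] attack;                      // offensive rating per team
  vector[T] defend;                      // defensive rating per team (higher = better defence)
  real home_adv;                         // home-field effect
  real<lower=0> sigma_team;              // spread of team ratings
  real<lower=0> sigma_obs;               // residual noise in log xG
}
model {
  mu ~ normal(0, 1);
  attack ~ normal(0, sigma_team);        // hierarchical normal priors on team effects
  defend ~ normal(0, sigma_team);
  home_adv ~ normal(0, 0.5);
  sigma_team ~ normal(0, 1);
  sigma_obs ~ normal(0, 1);
  // attacking team's offence minus the opponent's defence, plus home advantage
  log_xg ~ normal(mu + attack[att] - defend[def] + home_adv * home, sigma_obs);
}
generated quantities {
  // overall rating: a team's expected log-xG edge against an average opponent
  vector[T] rating = attack + defend;
}
```

A team's projected expected goals against any opponent then comes from pushing the posterior draws of `mu + attack[i] - defend[j]` back through `exp()`, which is the kind of quantity a power-ratings dashboard would be built on.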

I also include effects for being at home in this model. I think, actually, I think that's all I do. But in other models I've done, you can look at things such as how much rest they've had since their last match. You can look at the difference between each team's rest. And those are not linear effects, right? You have to do some sort of nonlinear effects for that, right?

Because, like, the difference between two days of rest and one day of rest is very different than seven days versus eight days of rest, right? Seven and eight days of rest are pretty much the same thing, but two and one is very different, right? Like, a much bigger effect for having two days of rest than just one day of rest. And so you can do things like that, or how far away they had to travel, those sorts of things.

Now in European soccer, that's not a huge deal, because especially in the competitions within each country, no team is traveling that far. But in American sports, it is a pretty big deal. Like, you know, you, you have to fly five, six hours across the country on short notice. Like that can, that can really affect performance.
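
One simple way to encode the diminishing-returns pattern Paul describes for rest days (an extra day matters a lot when you've only had one, and barely at all when you've had seven) is a saturating functional form. This is just an illustrative choice, not his parameterization:

$$
f(r) = \beta \left(1 - e^{-r/\lambda}\right)
$$

where $r$ is days of rest, $\beta$ is the maximum benefit of being fully rested, and $\lambda$ controls how quickly the effect levels off. A log transform such as $\beta \log(1 + r)$, or a spline over $r$, gives the same qualitative shape.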

And other things, like I said, I don't have this in the soccer model, but if anyone's interested in modeling sports outcomes, something that people typically tend to overlook, and that I've always been a big proponent of, is elevation, meaning that there are certain sports where there are certain teams that play at higher altitudes. And if you're not used to playing at higher altitudes, it's actually a very noticeable effect in a model that you're going to have a lower offensive output and

you'll actually allow more points on the other end due to fatigue. And so the United States, it's the teams that are playing in Colorado and in Utah. But in Europe, it could be the teams that have to go to Switzerland or the teams that have to go to some of these alpine regions that are higher up in altitude. In Mexico, if you have to go to Mexico City, it's extremely high. Or Colombia, right?

I mean, depending on what you're doing, these are very high altitude places that have been shown to have a measurable impact on an opponent's performance. Yeah, that's very fun. My God, I love those kinds of models. That's so much fun. And I would also guess that, I mean, at least my prior would be that there is a reverse mechanism also for teams who are used to playing at altitude. Do they get a boost of performance when they play closer to sea level?

Because they could have adaptations that make them better when they go to sea level. Yeah. I mean, I think there's certainly science behind that. I've found that it is a lot harder to show in a model than the reverse. Not that it might not be there, but I think the effect size, if it is there, is definitely smaller than the reverse. Yeah. That's what I would expect too. I think the effect is there mainly because, well, I've seen it.

Like, it seems to be pretty well established in the scientific literature, but that doesn't mean the effect is big. So yeah. Yeah. I mean, I'm a runner and I know that all of the distance runners that are training for marathons, that are elites and professionals, they all train at higher altitudes, right? For the... six weeks leading up to a competition, and then they travel to the competition at a lower altitude. And, you know, they think they have an oxygen performance boost due to that.

Yeah. Yeah. Kind of like legal oxygen doping, legal blood doping. Yeah. Yeah, exactly. Yeah. Yeah. I mean, I think it seems to be pretty much proven.

I would say maybe it has more of an impact on individual sports like marathon running, because it's more like, you know, even if you're gaining just a few tenths of a second, well, it can help you have a better time in the end because, well, at this level, just having the smallest increase in performance could be the difference between first and second place.

But maybe that's harder to see, such a small effect in a collective sport, a collective game, because, well, maybe there are some... maybe it's just not additive. Maybe the effects actually cancel out, so in the end you don't really see a big effect. But that would be... yeah, I'd love to do an experiment on that. Like an RCT. That would be so much fun. Yeah. Well, good luck trying to do experiments in sports. It's hard. Yeah, I know. I know. But, I mean, if the multiverse exists...

Then there is a universe where we can do that kind of experiment. And my god, those scientists must have so much fun. And yeah, so thanks a lot, first, for detailing the model that clearly and in so much detail. That's super cool. So the results of the model are in a cool dashboard on your website. Do you have the model and data available freely, maybe on your GitHub, that we can put in the show notes? Yeah, I'm not sure. I don't know if the model is on my GitHub.

It's on GitHub, I don't know if it's private or not, but I can let you know. You know, I actually use open source data for that. So let me double check, I can double check and get back to you after the show on whether I can have it in my public GitHub or not. So, yeah. But essentially, there's a package called worldfootballR, and it uses data from there to build the model. So some of that data is scraped from, like, Transfermarkt.

So, I didn't really talk about how I set the prior means for each of the teams, but a very, very simple hierarchical model is essentially just to use the expenditures of the club and use that as a prior mean for how good the club will be going into the season. And unlike some other sports, in soccer, world football, how much a club spends is very highly correlated with how successful they are, which makes sense, but it's not necessarily true in, like, baseball.
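
In the notation of the earlier Stan sketch, the expenditure prior Paul describes amounts to replacing the zero-centred prior on team quality with a mean driven by club spending, something like

$$
\text{attack}_t \sim \mathcal{N}\!\left(\gamma \cdot \log(\text{spend}_t),\ \sigma_{\text{team}}^2\right)
$$

where $\text{spend}_t$ is the club's expenditure and $\gamma$ measures how strongly spending translates into quality. The exact functional form (log spend, a regression coefficient $\gamma$) is our illustrative assumption; the idea from the episode is simply that a richer club starts the season with a higher prior mean.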

So, do you see these effects of budget? So, yeah, first, before I go on a follow up question, yeah, for sure. Get back to me after the show. And if that's possible, we'll put that in the show notes because I'm sure. A lot of listeners will be interested in checking that out. I personally will be very interested in checking that out, definitely. So that'd be awesome. And second, that effect of budget that you see on the performance of a team.

And so I guess in football, performance means the expected number of games won. Do you see that occur? Do you see that much of an effect also in a closed league system like the MLS? Because my prior would be that the effect of budget would be even stronger in open leagues like we have in Europe, because there is no compensation mechanism, right? Clubs can go down, and usually in Europe the strongest clubs are the historical clubs,

or the new clubs are just the ones that were lucky enough to be bought by very, very wealthy shareholders. And like, there is not a lot of switching of the hierarchy and changing of the hierarchy, mainly because of budget, as you were saying. But I would think that maybe the effect of budget is less strong in a closed league like the MLS. Is that true? Is that something you see, or is it something that's still in the air?

Yes. So I haven't looked specifically at the MLS, but in general in American sports, which all have closed leagues, the budget, well, for various reasons, the budget effects are not super strong. So, you know, in American baseball, there is no spending limit. So in some American sports, like the NFL and football, like there's a salary cap, meaning you can't spend more than a certain amount.

So there is no relationship between overall spending and winning because everyone has to spend a minimum and there's a maximum. In baseball, there is no limit. There's a tax. If you spend too much money, they do tax you. But there's still not a huge correlation. And then in MLS, like I said, I'm not entirely sure. Most of the clubs, they are constrained about how much they can spend. And so there isn't as much variance also in spending.

So like, you know, Messi going to Inter Miami, it wasn't that Inter Miami could pay him a lot of money. They actually, you know, there's a couple of exemptions that an MLS club could use to pay an international player. They have, they're called, you know, a couple of exemption players they have.

And that originally started when David Beckham went to Los Angeles, and they kind of made that rule essentially just so they could afford paying him what he was used to, or close to what he was used to being paid in Europe. And in the MLS that's still kind of the case. You have one or two players you're allowed to have on these exemptions. The way Messi was able to make it work is he's getting paid by Apple, because Apple is broadcasting the MLS games.

So they're paying him essentially to play in the MLS because they're hoping more people are going to watch their broadcasts and pay them, and so they're going to give him a percentage of that. And that's where actually a lot of his salary, or like his earnings, is coming from: a deal with Apple, versus the actual MLS club in Miami, which can only pay him so much.

So my guess is, my prior is, I haven't looked specifically at the MLS with this, but my prior is yes, that there isn't a huge relationship in the MLS between winning and spending just because there's not much of a variance. In order to see those correlations, you have to have a large enough variance in the spending to notice the relationship, right? So. Yeah, definitely interesting.

I mean, I love also looking at these, you know, the... how the structure of a league impacts the show and the wins is extremely interesting. That can seem very nerdy and I think that's my political science training that kicks back here, but really how you structure the game also makes the game what it is and the results and the show you're going to get. I find that extremely interesting to see how the American games, the US games are structured.

Because ironically, it's a system where there is much more social transfers, if you want, like we have in Europe for social security and health and education. American sports are socialist, and European sports are capitalist. But typically, we consider Americans to be more capitalist and the Europeans to be more socialist. So it's an interesting inversion. Yeah. No, definitely. And I mean, I think...

Honestly, that's going to be interesting in the coming years to see what's happening on the European side because there are more and more debates about whether we should have a closed European-wide league, which would basically be an extension of the current Champions League. And honestly, I think it's going to take that road because more and more championships, at least all the championships, I would say, with the exception of the Premier League, get more and more concentrated on just a few clubs.

And just from time to time, you have one club that bumps onto the top, like Leverkusen this year in Germany, Monaco in France a few years ago, Montpellier. But that's like really exceptions. And in the end, you almost always get the same clubs that win all the time. And so the idea of open leagues is not really true for the top of the leagues. It's definitely true for the bottom, but the big clubs never go down.

And... And so I think at some point, this illusion of the open leagues is going to disappear and probably we'll get a European wide championship where like basically the leagues are going to get a bit more even because I think it's better for the show and that's going to make more money. And in the end, I think that's what the question is also. Yeah, you might be right, but I hope, I hope not.

I really, as an American, always have dreamed of Americans doing relegation and promotion just because... You know, in America, we have this problem where we call it tanking, right? Because we have the socialist draft system where if the worst teams are incentivized to lose because they know they're not going to win. So they want to get the best possible players in the draft the next season. And so they're incentivized, you know, to, to lose a little bit more.

And so that really does kind of, you know... the promotion-relegation is nice because it solves that. You know, if you keep losing, you lose a lot of money because you get sent down, so everyone's motivated, even at the bottom of each league, to keep winning games, right? As much as possible. Otherwise they lose a lot of money. And in American leagues with the closed system, it's like, well, hey, you know, we talk about it being cyclical. And one thing that sports analytics folks have done is essentially say, it's really hard in an American sport to go from being an average team to a really good team. And the reason

He, and one thing that sports analytics analytics have done is essentially say, it's really hard to go from an American sport being an average team to a really good team. And the reason is. is the draft system. So in the draft system, people are always overconfident in how good the players are, but there's really thick right tails of how good a player can be.

So when you get a new player who's young and you can draft them at the top of the draft, they might not pan out, but they also have a really thick right tail, meaning that if they do pan out, you could go from being one of the worst teams to one of the best teams really quickly.

And so, you know, there's this other analysis of like, well, if you don't ever have an opportunity to draft someone in a position where there's that right tail, where, you know, once out of every five years you get a player who transcends everyone else that comes in, then you can't move up from average to really good, but you can go from being bad to really good. So often teams, the smarter teams, if they're really good, they stay really good.

But once they start noticing the players are getting older, they just trade everybody away. They get rid of all their best players and they just stink for a year or two and hopefully they can get some good draft. They get a lot of draft picks. Essentially. They try to trade their players away, get more draft picks, and then it becomes a sample size problem. And it says, well, if we have more draft picks, our probability of getting someone on the right tail goes up.

And so that's all we're going to do is we're just going to increase our odds of getting that right tail player. And if we get that player, then we'll be good again. Yeah. Yeah. It's like. buying a lot of lottery tickets. Yeah, that's what they're doing. Yeah, now that's fascinating. Yeah, I wasn't aware of these effects. That's super interesting. Because basically, what you're saying is there is an incentive to be extreme, basically.

Either you want to be among the top ones or you want to be among the worst ones. But being in the middle is the worst, actually. It is the worst. Yeah. Yeah. That is extremely interesting. And that's... Yeah, I mean, I actually don't know which system I prefer.

Honestly, I'm just saying I think Europe is getting, is going there because we have more and more basically concentration of the wealth at the very top of the leagues and that's going to make the national leagues less and less interesting basically. But I don't know either if I prefer the European wide championship. Well, I think I would prefer European wide championship. for sure, but I think it would be great to have it still open.

So where you could have, you know, like basically countries would become regions and then you get from like, if you, if you're in the best in France, basically in one year, then you get to the highest level, which is the European one. And then if you're among the worst, you get down to your country the next year.

I think that would be very fun because, like, especially now that players can be traded very easily within continental Europe, because it's basically the same country legally, it also makes sense that the teams, you know... basically a PSG versus Barcelona matchup is much more tight than PSG versus literally any team in France. So yeah, that's going to be very interesting. But at the same time, yeah, I love hearing about the wrong incentives

that come with the closed system. So thanks a lot for that. That's food for thought. And that's again, like, that's very close to elections, actually: how you count the votes impacts the winner. And so here, really, in sports too, how you structure your game has an impact on the winners. And I think it's extremely important to keep in mind because in the end, like, the organizations, so the MLS in the US or UEFA in Europe, actually have huge power over the game.

Well, thanks for that political science parenthesis. I wasn't expecting that, but that's definitely super interesting. To get back to the modeling, because time is running by, I definitely want to ask you about the plus-minus models, because you're using those also to estimate player value in American football. So I'm curious about that. What is that kind of model? Is that mainly for American football, or are you using that also for other sports? And if it's only for American football, why is it particularly tailored to that sport?

Or if it's only for American football, why is that particularly tailored to that sport? Yeah. So plus minus models actually are originated in basketball and they're, they work the best in basketball. They're not perfect. And that sort of the concept in basketball is you have 10 players on the court at each. at each moment and they substitute in and out. But while those 10 players are on the court, you know how many points are scored for each team, right?

So, you know, five players on the offense side and five players on defensive side. There's essentially just a big linear model and you look at and you want to adjust for how long they're on the court or how many possessions they were on the court for. So you can say, okay, these 10 players are on the court for two and a half minutes. And in those two and a half minutes, this team scored six points and their team scored four points.

And essentially what you're doing then is a plus minus model, essentially. So sometimes you might see in a, in a statistic after the game, like the total difference in the net points for the team when a player was on the court versus when they're not. Well, that's not too useful because there's a lot of correlations, right? You're playing with someone else a lot.

So what we call an adjusted plus-minus model, right, is a linear model that then tries to fit those player effects: you know, you get a one when you're on the court on offense and a negative one when you're on defense. And we look at your team's efficiency, right? Your points divided by some denominator, whether it's minutes or possessions. Okay. And that's sort of the basketball thing. Over time, they realized, okay, well, there's so much correlation between who is playing together, we need to adjust for that. So they used ridge regression. And so that would divvy up the credit a little bit better. And you know, ridge regression is very good when there's

We need to adjust for that. So they used ridge regression. And so that would divvy up the credit a little bit better. And you know, ridge regression is very good at when there's A lot of multicollinearity or correlation between two effects, right? And on the basketball team or all basketball players, you have teammates that play a lot together and they don't play with other people a lot.

But Ridge Regression has done a decently good job in basketball over a big sample of estimating how effective players are. And if you look at these things, you'll see, we talked about the sniff test. In 2012, LeBron is the number one player. And he's the number one player for a lot of the years, not so much anymore because he's older, et cetera. Right. But that's sort of those sniff tests that we get.

Well, some people in basketball, and I'm a proponent of this, and you know this is a Bayesian podcast: ridge regression, for those unfamiliar, is a frequentist way to write a very specific Bayesian model where you have a normal prior on each player with a mean of zero. Okay, and that's ridge regression. So we think about it from that perspective with adjusted plus-minus models.

What happens when you have a normal prior with mean zero is that when you have players that play less, we shrink more towards the prior mean. And it's only when we have more data for players that we can deviate from that prior mean. Well, one thing we know about sports is if you're not playing as much, that actually is pretty useful information. And what does that tell us? You're not very good. Because if you're good, you're going to play more. And if you're bad, you play less.

So other people have come around and, you know, in the last 10, 15 years and said, okay, well, instead of a ridge regression model for basketball, we should do a Bayesian regression model. And instead of having a mean zero for a player, we should have a mean of something else. So there's a few different versions that people have done. One thing, a very simple version is say just everybody has a mean prior mean of, you know, what we call a replacement player.

Okay. Someone that doesn't play very much. If you're really good and you play a lot. It doesn't matter what the prior mean is too much because the data is going to overwhelm the prior. But if you don't play very much, we're going to stick with that sort of negative prior mean because it means you're below average. And so that's one thing you can do.

A more sophisticated thing sometimes people will do is they'll build a hierarchical model where you have essentially a prior mean that is based on other statistics that we observe, so how many points you score or how many assists you have. And that's called a box score prior mean, or a box score plus-minus. So that's the "what" of plus-minus models, and that's sort of the basketball approach.
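
Here is a compact Stan sketch of the adjusted plus-minus idea as just described: stint-level scoring efficiency regressed on a plus/minus-one design over players, with a normal prior per player. Setting `prior_mean` to zero reproduces ridge regression; filling it with a replacement-level value or a box-score-based estimate gives the Bayesian variants Paul mentions. This is a generic reconstruction with made-up variable names, not his code.

```stan
// Adjusted plus-minus: each row is a "stint" with a fixed set of players on the floor.
data {
  int<lower=1> S;                            // number of stints
  int<lower=1> P;                            // number of players
  matrix[S, P] X;                            // +1 on offense, -1 on defense, 0 off the floor
  vector[S] y;                               // points per possession scored by the offense in the stint
  vector<lower=0>[S] poss;                   // possessions in the stint
  vector[P] prior_mean;                      // 0 = ridge; replacement level or box-score prior otherwise
}
parameters {
  real intercept;                            // league-average efficiency
  vector[P] beta;                            // player effects (the adjusted plus-minus values)
  real<lower=0> tau;                         // prior sd, playing the role of the ridge penalty
  real<lower=0> sigma;                       // residual sd
}
model {
  intercept ~ normal(0, 1);
  tau ~ normal(0, 1);
  sigma ~ normal(0, 1);
  beta ~ normal(prior_mean, tau);            // shrink each player toward the chosen prior mean
  // longer stints carry more information, so scale the noise by possessions
  y ~ normal(intercept + X * beta, sigma * inv_sqrt(poss));
}
```

The point Paul makes about low-minute players falls straight out of this structure: with few rows in `X`, a player's posterior stays close to `prior_mean`, so choosing that mean (zero versus replacement level versus a box-score estimate) is what separates the different flavors of the model.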

Now, basketball is really nice because you have lots of games in the NBA. You play every team at least twice, you substitute a lot, and there's lots of scoring. Now, my work in American football tried to address a lot of these issues. In American football, you don't play every team, you don't substitute very much, and if you do play, you only play with certain people, like, all the time. And then there's not a lot of scoring compared to basketball.

There's some scoring, but, you know, American football point scoring is unique, right? You get six or seven points for a touchdown, you get three points for a field goal, you know, and then on more rare occasions, you get these two-point safeties. Yeah. So there are roughly maybe 10 scoring events in an American football game, versus in basketball, where you have, you know, a hundred-plus points.

Versus in basketball, where each score is about two to three points, you have, you know, 80 to 120 scoring events in a game. So these models work a lot better there. My work in American football has been to ask: how do we take the basketball model and make some modifications so we can do a football model? And one of the things that is tricky in football is that certain positions never get substituted out.

So on offense, the quarterback plays every single play unless they're hurt, or they stink and they get benched. Well, the quarterback also always plays with the same offensive line, as long as they're healthy, and they don't get substituted out. So how does a model separate credit when the same players are on the field all the time?

And so my work there was to use Bayesian statistics and take the Bayesian regression model where we had a prior mean — I used some information to inform the prior mean for each player — but I also did this unique thing with the shrinkage. The prior variance is a function: there's one prior variance for all players, and then it's multiplied by another parameter, which is unique to the position they play.

So quarterbacks have a different shrinkage parameter, essentially, or prior variance, than a different position. And then instead of just looking at scoring plays in football, we have what we call expected points added. At each play, we look at, on average, how many points are you going to score if you have the ball in this position? And I look at the difference between two plays, right? That tells you essentially how much value you got from the result of the play.
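
Here is a hedged sketch of that position-dependent shrinkage idea, with a zero prior mean for simplicity (the actual model described above used informed prior means) and toy stand-in data for the EPA outcomes:

```python
# Hedged sketch of position-dependent shrinkage on play-by-play EPA (toy data).
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
n_plays, n_players, n_positions = 2000, 100, 5
X = rng.choice([-1, 0, 1], size=(n_plays, n_players), p=[0.1, 0.8, 0.1])
position = rng.integers(n_positions, size=n_players)   # toy position label per player
# Toy stand-in for expected points added: EP(after the play) - EP(before the play).
epa = rng.normal(0, 1.5, size=n_plays)

with pm.Model():
    global_scale = pm.HalfNormal("global_scale", 1.0)
    pos_factor = pm.HalfNormal("pos_factor", 1.0, shape=n_positions)
    # Larger pos_factor -> larger prior variance -> less shrinkage, so the model
    # can learn which positions explain more of the play-to-play variance.
    skill = pm.Normal("skill", 0.0, global_scale * pos_factor[position], shape=n_players)
    sigma = pm.HalfNormal("sigma", 2.0)
    pm.Normal("epa_obs", mu=pm.math.dot(X, skill), sigma=sigma, observed=epa)
    idata = pm.sample()
```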

So instead of using only scoring plays, I use every single play in football, and I do this unique shrinkage dependent on position. It's a huge model. I did this in college football, which has way too many parameters because there are like 16,000 kids. But even in the NFL I've done this, and you get interesting results. Sometimes they match up with what you think, sometimes they don't.

But the interesting thing is you can actually estimate how much you should shrink each position. And so the model is nice because it essentially tells you how much of the variance in the outcome of the play is dependent on how good players are across different positions. So in football, we all know that quarterbacks are the most impactful position in the game.

And I did give somewhat subjective priors, but I still left a lot of uncertainty around them, and the model could very well see and estimate that quarterbacks are in fact the most important position, because they're shrunk the least — they have the largest prior variance. So you can look at that: if you look at the most impactful players in football, it should be a quarterback.

But by the same measure, the worst players in football are also quarterbacks, because you can only really hurt your team a lot if you're a quarterback, compared to other positions. I mean, at every position you can hurt your team, but no one can hurt a team as much as a bad quarterback, just like no one can help a team as much as a good quarterback. So that's sort of a rough overview of my plus-minus modeling in football.

When I wrote the paper, I had a version of that written in Stan. The data set itself was not public, but I did have a version of the Stan model written and uploaded on my GitHub that you can look at. It's pretty massive. In recent years, I've tried to expand it to do a state-space model type version, so I have effects for each player for each season over time. Yeah, that was exactly what I meant. Computationally, that gets a little bit trickier.
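
For the state-space extension mentioned here, this is a hedged sketch of one way to let each player's effect drift from season to season — a simple random-walk prior and toy data standing in for whatever structure the actual Stan model uses:

```python
# Hedged sketch of a state-space plus-minus: player effects follow a random walk
# across seasons, so ability can drift over time and thin seasons borrow strength
# from neighboring seasons. All data below is toy.
import numpy as np
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(3)
n_plays, n_players, n_seasons = 3000, 80, 4
season = rng.integers(n_seasons, size=n_plays)              # season of each play
X = rng.choice([-1, 0, 1], size=(n_plays, n_players), p=[0.1, 0.8, 0.1])
epa = rng.normal(0, 1.5, size=n_plays)

with pm.Model():
    init = pm.Normal("init", 0.0, 1.0, shape=n_players)      # season-0 ability
    drift = pm.Normal("drift", 0.0, 0.3, shape=(n_seasons - 1, n_players))
    # Cumulative drift gives an (n_seasons, n_players) matrix of abilities.
    skill = pm.Deterministic(
        "skill",
        pt.concatenate([init[None, :], init[None, :] + pt.cumsum(drift, axis=0)]),
    )
    sigma = pm.HalfNormal("sigma", 2.0)
    mu = pt.sum(X * skill[season], axis=1)                    # use each play's season
    pm.Normal("epa_obs", mu=mu, sigma=sigma, observed=epa)
    idata = pm.sample()
```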

And my dataset — actually, I was able to scrape some data for that, and then I can't anymore; the NFL just stopped releasing it. So that work is on hold for now, but I probably need to find a graduate student who can help me finish it. Yeah, definitely we should put that in the show notes, that's super interesting — your paper in the... and the link to the GitHub repo, for sure.

And that makes me think of a recent episode I did, and also a recent interest of mine: I started contributing to that package called BayesFlow, and that's precisely what could be useful in your case here, because your model structure doesn't change. If I understand correctly, once you have the model structure, it's kind of like a physics model — it's not going to change when you have new data, but the data sets do change. So you have new data sets coming in.

And so that's where using this kind of inference, called amortized Bayesian inference, could be extremely useful, because the computational bottleneck would just happen once: that would be when you train the deep neural network to learn the posterior structure and parameters. So instead of MCMC, you're using the deep neural network to learn the posterior.

But then once you have trained the deep neural network, doing posterior inference is trivial. And so for that kind of model, where you have a lot of data but the model is the same, that's a very good use case for amortized Bayesian inference. So that could be something very interesting here. Yeah. Happy to tell you more about that afterwards if you're interested. But yeah, I've started digging into that, and it's super fun for sure.
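
To illustrate only the amortization idea — this toy sketch is not the BayesFlow workflow, uses a made-up simulator, and recovers just a point estimate rather than a full posterior — the expensive simulation and training happen once, and every new data set afterwards is a cheap forward pass:

```python
# Toy illustration of amortized inference: simulate (parameter, data) pairs once,
# train a network to map data summaries to parameter estimates, then reuse it
# on any new data set without refitting or running MCMC.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)

def simulate(theta, n_games=30):
    # Hypothetical simulator: "team strength" theta -> a season of goal margins.
    return rng.normal(loc=theta, scale=1.0, size=n_games)

def summarize(data):
    return [data.mean(), data.std()]

# 1. Expensive step, done once: training set drawn from the prior + simulator.
thetas = rng.normal(0.0, 1.0, size=5000)
summaries = np.array([summarize(simulate(t)) for t in thetas])

# 2. Train the "amortizer" (a plain MLP standing in for the deep network).
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(summaries, thetas)

# 3. Cheap step, repeated for every new data set: a single prediction call.
new_season = simulate(theta=0.7)
print(net.predict([summarize(new_season)]))   # approximate estimate for theta
```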

So yeah, I think this is a cool use case. Awesome. Well, I still have a few questions, but we are getting short on time, so can I keep you a bit longer? Yeah, just a few more minutes. Sure. Okay, awesome. So actually, I'd like to pick your brain now, talking a bit more about the future. I'm curious — let me fuse two questions. First, I'm curious where you see the field of sports analytics heading in the next five to 10 years.

And the sub-question is: are there specific sports where you see significant potential for growth in analytics? Yeah, those are good questions. I think they go kind of hand in hand. You know, it's hard — if I could predict the future, right, I'd probably have a different job, I'd probably be retired. But I think a lot of the future is going to be sports like soccer, American football, and hockey catching up.

And I think a lot of the growth is actually going to be making sports analytics more digestible for everyday people — so the fans, right? And that's happened over time. If you watched a broadcast of a soccer game 20 years ago, no one talked about expected goals. Now most broadcasts will show it. They might not always talk about it, but they'll show it. Like I said, expected goals is better than just showing the score, but there's a lot left to be done.

I think, to date, a lot of sports analytics has been very much focused on expected values, and not enough has been focused on distributions and variance around estimates. So I think that's one place it's going to have to end up going. And part of the reason is, right, we talk about neural networks — neural networks are very good at expected values with really large data sets. It's a lot harder, right?

Modeling variance is a lot harder, in anything, than modeling an expectation. So I think catching up on some of those things. And also, like I said, taking a step back — there's been a lot of good work done, but I think we're going to find a few things where, hey, maybe we were a little bit overconfident, right? And with everything in sports, it's always about game theory.

So even if something is optimal today, that strategy is not always going to be optimal in the future. Take basketball for a second — we talked about three-pointers. Of course, three-pointers are really good right now because they have higher expected value, but defensively, players are learning to play against the three-pointer better than they used to. Or in American football, the numbers have said you should pass the ball more.

Well, now the defenses are learning how to defend it better, and so running is going to be more important than it used to be, right? These things are always going to change. And so in five to 10 years, I don't know exactly what it's going to be, but I think in some ways you might find some analytics person in 10 years giving the exact opposite advice of what we're seeing now, just because the game has evolved. The game has changed, and so now you should do something else, right?

To get an edge. And so I think the growth is twofold: always staying on the cutting edge of what's next — sometimes that's going back to where you were — and, like I said, making the numbers more digestible for the everyday consumer. You know, it's one thing for you and me to talk about models. I had to do this at ESPN all the time: I can't talk about prior distributions on TV, right? So how do we explain these things?

And I think what's really going to be key — this has happened already, but it's going to keep happening — is that the analysts themselves are going to be much more data literate than they have been in the past. Not just because they have more people working with them or because they're younger; the analysts of the future are also going to be able to use AI to do their own analysis. And that could be scary, because they might make some bad assumptions.

But they're also going to be more data savvy, and they can load up a data set and use an AI tool, even if they can't code, to get insights that I used to have to write code to get — and now they can just do it themselves, right? So that's, I think, somewhere else where teams and coaches are going to be able to do more analysis on their own. And it's not that the data people aren't needed.

In fact, they're going to be needed even more, to make sure the coach isn't missing an assumption he needs to be thinking about in the structure of the data. Because he might just be like, great, now I can run a regression, I don't even need to know how to code it, right? That's great — but are you thinking about this? So there's going to be a lot of education about using some of these tools better, but everyone's going to have access to them.

Right. It's going to be so much more accessible in the future than it has been in the past. Yeah, for sure, completely agree with that, and that's also something I'm very passionate about. That's also what this show is here for, right? It's to make the bridge between the modelers and the non-stats people easier, in a way. And that's something I really love doing in my job too, basically being that bridge between the really nitty-gritty details of the model.

And then, OK, now that we have the model, how do we explain to the people who are actually going to consume the model results what the model can do, what it cannot do, and how we can make decisions based on it — decisions that hopefully are going to be better than the ones we used to make? And also, how do we update our decisions? Because, well, the game changes, as you said so well. So yeah, for sure, all that stuff is absolutely crucial. And I like using the metaphor of the engine and the car, right?

It's like building the model is the engine of the car. Surely you want the best engine possible, but you also need a very cool car, because otherwise nobody's going to want your engine. And so building all the communication around the model — the visualizations, things like that — is extremely important, because in the end, as you were saying at the beginning of the show, if the model isn't used, well, that's not a very good investment.

Yeah. I literally would have a lot more questions — they are on my list — but we're going to call it a show, Paul, because I don't want to keep you three hours; you've already been very generous with your time. You can come back to the show anytime you want, if you have a cool new project you want to talk about, for sure. Yeah, maybe we can record the French version of the podcast sometime, you know. Yeah, yeah, I'd definitely be down for that.

You know, someone who will be very happy is my mother. She's always asking me, so when are you going to do the French version of your courses and your podcast? I'm like, that's not going to happen, mom. Maybe that's what moms are for, though. Exactly. Before letting you go, Paul, I'm going to ask you the last two questions I ask every guest at the end of the show — because it's a Bayesian show, what counts is not the individual point estimate, but the distribution of the responses.

First question: if you had unlimited time and resources, which problem would you try to solve? That's a good question. You sent me this ahead of time, and I spent a couple of seconds and was like, man, I don't know. It's tough — there are so many questions in sports. Yeah, I know. I mean, one of my passions is American football and I just keep going back to it. So I can tell, I love American football and I love soccer — international football, right.

And in both of those games, there are certain positions where it's just really hard to understand how valuable they are. In soccer, it's the midfield — we know you need a good midfielder, but how do you measure that? That's a really hard problem. And in American football, there are a lot of positions like that as well. So I'd probably go somewhere along those lines.

I want to discover and measure the value in these really hard-to-measure traits and positions in these two sports. Yeah, I definitely understand. The battle for the middle is always extremely important in soccer. And if you look at the teams that win the Champions League — the Holy Grail, like the Super Bowl of the soccer world — almost all the time they have an amazing and impressive pair or trio of midfielders. And that's like a sine qua non. But...

As you were saying, it's extremely hard to come up with a metric that's going to not only explain why midfielders are good, but also help you consistently choose midfielders who will increase your probability of winning the Champions League. And I'm saying that as a very frustrated Paris fan, because it's been years — since Thiago Motta basically retired — that we've been looking for a number six, the midfielder just in front of the defense, and we're still looking for him.

Yeah. So please, Paul, let me know when you're done with that. Yeah. Well, unfortunately, there are several really good French midfielders — they just don't play for PSG. I know, I know. Not a lot of French players stay in France. That's why I'm telling you, we need a Europe-wide league; many more players would stay in France and play for PSG, I guess. And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be? Fictional?

I haven't really thought about fictional scientific minds. That is a good question. Geez. Well, I thought you were going to answer very fast — actually, that one, I thought you were going to answer Bill James super fast. Bill James, yeah. Well, I've met Bill James. So, okay, I could have dinner with him, but I have met him. I'll go a little — how liberal are you with the term scientific mind here?

Yeah. So scientific mind — I think Galileo, I think Newton, I think Einstein, right? Those all count. But from the sports world, there is a former football player that very few people have ever heard of, and his name is Virgil Carter. And the reason why I love him — he played in the seventies — is that he wrote a paper about expected points in football while he was playing in the NFL. And it was sort of the first sports analytics work

ever done in American football, and he was a player in American football at the same time. So he's not very well known. He's still alive. I don't know him at all, but he would be a really cool person. If I go classical scientific minds, I would probably go with Gauss — like, hey, this distribution that has your name is used everywhere and it's very useful. So I'd probably stick with him. Normal distributions,

Gaussian distributions — they kind of rule the world nowadays. So I'd probably stick with that if I were to go with a traditional scientific mind. Yeah, good choices, good choices. I am amazed by that Virgil Carter story. That's so amazing. So if anybody knows Virgil Carter, please contact us and we'll try to get that dinner for Paul.

If you do that, I'll definitely be there to grab dinner and have a conversation with Virgil, because having someone like that on the show would be absolutely amazing. I love that story. It's like, you know, the myth of the philosopher king — well, here it's the myth of the scientist player. I love that. That's fantastic. Thanks a lot, Paul. Let's call it a show. Thanks for having me. Yeah, that was amazing.

As usual, we'll put resources and a link to your website in the show notes for those who want to dig deeper. Thanks again, Paul, for taking the time and being on this show. Thanks once again, I really enjoyed it. This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and

visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon

.com/LearnBayesStats. Thank you so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information in. And if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.
