#91, Exploring European Football Analytics, with Max Göbel

Sep 20, 2023 · 1 hr 4 min · Season 1 · Ep. 91

Episode description

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!


As you may know, I’m kind of a nerd. And I also love football: I've been a PSG fan since I was 5 years old, so I’ve lived it all with this club. And yet, I’ve never done a European-centered football analytics episode because, well, the US is much more advanced when it comes to sports analytics.

But today, I’m happy to say this day has come: a sports analytics episode where we can actually talk about European football. And that is thanks to Maximilian Göbel.

Max is a post-doctoral researcher in Economics and Finance at Bocconi University in Milan. Before that, he did his PhD in Economics at the Lisbon School of Economics and Management. 

Max is a very passionate football fan and played himself for almost 25 years in his local football club. Unfortunately, he had to give it up when starting his PhD — don’t worry, he still goes to the gym, or goes running and sometimes cycling.

Max is also a great cook, inspired by all kinds of Italian food, and an avid podcast listener — from financial news, to health and fitness content, and even a mysterious and entertaining Bayesian podcast…

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau and Luis Fonseca.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Links from the show:

  • Max’s website:

Transcript

[Alex Andorra]: Maximilian Göbel, welcome to Learning Bayesian Statistics. [Max]: Thanks Alex. [Alex Andorra]: Oh, yeah. Thank you for, for taking the time. I'm really excited about this [Alex Andorra]: episode. Um, I'm really having a variety of, uh, of, uh, podcast episodes [Alex Andorra]: these days. Um, going from, so episode 89 is going to get out in a [Alex Andorra]: few days. Uh, and, uh, you'll see it's about sports also, but it's about

[Alex Andorra]: the science of, um, sports and nutrition, of exercise and nutrition. And so [Alex Andorra]: today we're going to talk a lot about sports also, but more about football [Alex Andorra]: or soccer as it's known in the US. So that's going to be a fun one. And I'm [Alex Andorra]: really happy to have you on the show because you are German. So if I remember [Alex Andorra]: correctly, Germany is in Europe. And so you would be the first soccer analytics

[Alex Andorra]: episode Europe centered, which is cool. Yeah, it's one of the things I'm saying [Alex Andorra]: we should do more here in Europe. But before that, as usual, we'll start with [Alex Andorra]: your origin story. Max, how did you come to the world of econometrics and [Alex Andorra]: machine learning? Because it's actually what you're doing most of the time, [Alex Andorra]: if I understood correctly.

[Max]: Yeah, yeah, you're right, Alex. Well, actually, it's been well, if I say it's quite [Max]: a journey, it sounds dramatic. But that's, that's not the case. But it took me quite a [Max]: while, let's say. Yeah, that's maybe the better framing. [Alex Andorra]: Yeah. [Max]: I started out in my PhD, basically, the first year is, you know, there's just some [Max]: coursework. But I went into the PhD without really having something that I really wanted

[Max]: to work on in particular. So I took the first year to see which courses I like, which [Max]: not. And at my university, it was not really allowed to choose from. I mean, we had [Max]: macroeconomics, microeconomics, and econometrics, the usual stuff. But yeah, really nothing resonated [Max]: with me so much, I have to say. And then I thought I would do some macro, macroeconomics. [Max]: I think many, many people, or most of the PhD students, really want to do

[Max]: something in that field. So it was also me. But yeah, I really never got familiar with [Max]: that stuff so much. I never really liked it. But in the second year, then there was [Max]: a course of computational economics. And I liked that quite a lot. And it was also, [Max]: let's say a tough schedule. I had to prepare a proposal within a week and I didn't [Max]: have any idea about computational economics. But that really got me into looking into that

[Max]: stuff very deeply or deeper, let's say. And [Alex Andorra]: Yeah. [Max]: so, yeah, basically what I was working on there was some clustering, some unsupervised [Max]: learning basically, but it wasn't really fancy machine learning back then. So what [Max]: I did [Alex Andorra]: Heh. [Max]: was like the project was related to clustering community structure in the S&P 500 basically, [Max]: that was the project. And... Yeah, but I really thought, oh, this network analysis,

[Max]: this community structure detection, that's really cool. I want to work on that. And yeah, [Max]: so I thought this would be basically the outline for the rest of my PhD. And how [Max]: did I get into econometrics and machine learning then? Because it wasn't really related [Max]: to or not really machine learning, what I was doing back then. So [Alex Andorra]: Yeah. [Max]: how do I get there then? It wasn't until the third year, basically until I got luckily

[Max]: invited to the University of Pennsylvania as a visiting student. And I got introduced, [Max]: I got invited by Francis Diebold and I'll be forever grateful to him for inviting [Max]: me there. And he had a research group on econometrics. And at that time, the topic [Max]: was about climate. And I, again, I thought, well, I'm, I don't care about the topic actually. [Max]: I just want to learn whatever, yeah, comes to me. And so, yeah, I took that opportunity.

[Max]: He introduced me to his research group. And they were working on climate on climate [Max]: forecasting, climate econometrics. And that's how I got basically really introduced [Max]: into econometrics. Because before I went to the University of Pennsylvania, I thought [Max]: like, yeah, I basically know what's going on. And I have this and this project. And that's [Max]: cool. But when I really arrived there, I really got to know what PhD in economics

[Max]: is really about. And yeah, that was pretty insightful, I would say. [Alex Andorra]: Yeah. [Max]: And that's how I got introduced, basically, through this research group, through projects [Max]: that we were working on. And then there was one guy, he was Frank's RA. And yeah, he [Max]: was working on machine learning, in particular. And [Alex Andorra]: Mm-hmm. [Max]: basically, a couple of weeks in, he came to me and asked me, well, Max, you want to

[Max]: get me that and that data? And we can work on a project. That started off a long, well, [Max]: quite well, a couple of years now of co-authorship with him, with Philippe Goulet Coulombe, [Max]: who is now a professor at UQAM, the University of Quebec at Montreal. And he's [Max]: working a lot on machine learning. And he basically introduced me to that sphere. [Max]: And so in the end, it was the third year of my PhD that I got introduced into econometrics

[Max]: and machine learning. And yeah, quite late, as I would say. Yeah. Better late than [Alex Andorra]: Yeah. [Max]: never maybe. [Alex Andorra]: I mean, better late than never. Right? So it's cool. And you seem to enjoy [Alex Andorra]: that. So that's super fun. And so today, what are we doing? Basically, how [Alex Andorra]: would you define the work you're doing nowadays and the topics you are particularly [Alex Andorra]: interested in?

[Max]: Yeah, well, that's a good question. And because every time I got asked that question, [Max]: I always had a difficult time actually saying [Alex Andorra]: Hehehe [Max]: because I was doing something here, something there. So [Alex Andorra]: Yeah. [Max]: in between, I also thought I would like to get back to macroeconomics actually, but [Max]: after spending a couple of months on something there and it didn't really work out,

[Max]: I completely ditched it at least for the meantime. So what I'm working on now is basically [Max]: machine learning and macroeconomic forecasting, let's say. I have a project on recession forecasting [Max]: in the United States, which is probably a hot topic currently. Everyone is awaiting [Max]: it, but it doesn't really seem to occur. So you have to wait a couple of months more. [Max]: And then the other stuff is basically related to climate, a lot of climate forecasting,

[Max]: especially about Arctic sea ice, how Arctic sea ice is projected to evolve in the [Max]: future, not only in the near future, but also in the, let's say, longer run. So [Max]: when Arctic sea ice might potentially disappear, there are a couple of [Alex Andorra]: Mm-hmm. [Max]: projects on that, that are still related to that climate econometrics group. And then the

[Max]: other stuff is basically, yeah, machine learning. And I got really interested in finance, [Max]: asset pricing, what you can do [Max]: predicting stock returns, using machine learning tools there. That's super fascinating. [Max]: And yeah, just I mean, I have to say that I'm not a specialist in machine learning [Max]: or so. I'm just super interested and fascinated by the tools and the problems that

[Max]: come with them. So yeah, there's a lot of, well, they are powerful, but applying [Max]: them to finance and economics also comes with some drawbacks. So yeah, you have to work [Max]: around that. And it makes it super interesting. [Alex Andorra]: Yeah, yeah, yeah. Yeah, for sure. And I mean, that's [Max]: Okay. [Alex Andorra]: probably by being really interested in a topic that you end up being a specialist

[Alex Andorra]: of it. So it's like you don't really start being a specialist and then being [Alex Andorra]: interested in the subject. It's like the causality goes the other way around. [Alex Andorra]: So that's [Max]: Thank [Alex Andorra]: good. [Max]: you. [Alex Andorra]: Like trying a lot of things is how you end up finding what you're really

[Alex Andorra]: passionate about. Yeah, awesome. And I'm curious actually, in the research realm [Alex Andorra]: of economics, which tools do you use, machine learning tools, to work on [Alex Andorra]: these models? I'm guessing a lot of open source packages, I'm hoping. Because [Alex Andorra]: I remember I was introduced a bit to, I mean, I knew a bit the econometrics [Alex Andorra]: economics field in Europe a few years ago and they were using Stata all

[Alex Andorra]: over the place. So I'm curious if that changed and how that changed. [Max]: Oh yeah, that's a funny question. Because Stata, yeah, I mean some people love Stata. [Max]: I'm actually at the complete other end of the distribution. So [Alex Andorra]: haha [Max]: I always try to avoid it as much as I can. I don't know, I never really liked it. [Max]: So what I'm using is basically R and Python. [Alex Andorra]: Okay. [Max]: I also worked a bit on MATLAB. I like MATLAB actually a lot.

[Alex Andorra]: Mm-hmm. [Max]: But yeah, now I'm mostly working in R and Python. And it really depends. Sometimes [Max]: I prefer R. Sometimes I prefer Python. For machine learning, I'm mostly using Python. [Max]: Well, let's say for machine learning, I'm actually using R, let's say, when it comes [Max]: to random forest or [Alex Andorra]: Mm-hmm.

[Max]: gradient boosted trees or something like that or just plain LASSO or Ridge. When it comes [Max]: to deep learning, then I'm using Python. So TensorFlow, now I'm trying to switch to [Max]: PyTorch, actually. [Alex Andorra]: Mm-hmm. [Max]: And yeah, so that's basically the patch that I'm using. Yeah. [Alex Andorra]: Yeah, interesting. And how do you choose the tool, the particular tool you're [Alex Andorra]: using for a particular project?

[Max]: Yeah, that's a good question. I think that's mostly an art rather than a science, [Max]: I would say. And it's up to your preference. But not all tools work in every context, right? [Max]: So in economics, it's really the problem, especially in, I would say, macroeconomic forecasting, [Max]: where you have time series of, let's say, you get up to 700 observations on a monthly [Max]: basis for the United States maybe. And then you have a feature set of, let's say,

[Max]: 100 features when you include lags and all that. You can pump it up maybe to 1,000 [Max]: or something. But for machine learning or for deep learning, this is still rather [Max]: a small data set, I would say. So that's ridiculous, actually. [Alex Andorra]: Mm-hmm. [Max]: But still, that's then the challenge, right? To tune them, to train them so that [Max]: they don't overfit. And that's really the interesting part for me, I think. And yeah.
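To make the "small data, many features" point concrete, here is a minimal sketch of how a feature set grows once lags are included. It assumes NumPy and purely synthetic data; the sizes (700 monthly observations, 25 indicators, 4 lags) and the helper name `make_lagged_features` are illustrative assumptions, not anything from Max's projects.

```python
import numpy as np

def make_lagged_features(X, n_lags):
    """Stack lags 1..n_lags of every column of X side by side.

    X has shape (T, k). The result has shape (T - n_lags, k * n_lags):
    row i describes period t = i + n_lags using only earlier periods.
    """
    T, k = X.shape
    blocks = [X[n_lags - lag : T - lag] for lag in range(1, n_lags + 1)]
    return np.hstack(blocks)

rng = np.random.default_rng(0)
T, k, n_lags = 700, 25, 4          # illustrative sizes only
X = rng.normal(size=(T, k))        # synthetic stand-in for monthly indicators
y = rng.normal(size=T)             # synthetic stand-in for the target series

features = make_lagged_features(X, n_lags)
target = y[n_lags:]                # align the target with the lagged rows

print(features.shape, target.shape)   # (696, 100) (696,)
```

With roughly 700 rows and 100 columns, a flexible model has very little data per parameter, which is why tuning against overfitting becomes the main challenge.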

[Max]: In other contexts, other tools might work much more conveniently, let's say, or [Max]: are much easier to apply. So some lasso or so when you have a lot of features and you [Max]: just don't know which features are important, then you, yeah. I like lasso in that regard [Max]: because it selects basically the features for you. Or you might say, well, in an [Max]: asset pricing context, we have returns, a lot of noise in there, signal-to-noise ratio,

[Max]: very, very low. You really don't know which features are important. So Ridge is maybe [Max]: the better option, because Lasso would basically set almost everything to zero. Yeah, [Max]: so it really depends. You really have to make it dependent on the context that you're [Max]: working in. And [Alex Andorra]: Hehehe [Max]: yeah, but that's also interesting to see which models prefer or work well on which

[Max]: data sets and which contexts. And yeah, I'm still learning in that regard. And that's [Max]: super interesting. [Alex Andorra]: Yeah, yeah, yeah. No, for sure. And I find that super interesting also to see [Alex Andorra]: this ability of open source tools to basically be adopted more and more

[Alex Andorra]: in your research, which of course, I'm extremely biased, but I welcome. But also [Alex Andorra]: mainly because I do think that open data and open source are a natural consequence, [Alex Andorra]: but also a cause, I would say, of... more open science, which I definitely [Alex Andorra]: welcome and I think should be way more of the case, you know, like more and [Alex Andorra]: more you see papers with accompanying GitHub repositories and accompanying

[Alex Andorra]: open source packages even in Python or in R, which is definitely something [Alex Andorra]: new. And that's super cool that the research realm is catching up on that. [Alex Andorra]: Um, because less and less you see papers where I remember a few years ago, [Alex Andorra]: you know, like the first open say the open science and, or open data papers

[Alex Andorra]: was like, Oh yeah, the data is available by the way. Um, at the end of [Alex Andorra]: the paper, you know, and then you had to basically beg the, the corresponding [Alex Andorra]: author about like three times a week for four months to get some of the data [Alex Andorra]: and that was not really open basically, um, so yeah, that, that's a really [Alex Andorra]: cool development that I really love. I have to say.
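Going back to Max's Lasso-versus-Ridge remark a few turns earlier, the contrast can be sketched on synthetic data with many weak signals buried in noise. This is an illustration under stated assumptions, not his setup: the penalty values, the sample sizes, and the hand-rolled `ridge` and `lasso` functions are all hypothetical, and in practice one would more likely reach for an existing package.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge: minimizes (1/2n)||y - Xb||^2 + (lam/2)||b||^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]            # take feature j out of the fit
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * b[j]            # put it back with the new value
    return b

rng = np.random.default_rng(0)
n, p = 300, 50
X = rng.normal(size=(n, p))
beta_true = np.full(p, 0.05)              # many weak signals (hypothetical)
y = X @ beta_true + rng.normal(size=n)    # noise dominates the signal

b_ridge = ridge(X, y, lam=1.0)
b_lasso = lasso(X, y, lam=0.15)
print("nonzero ridge coefficients:", np.count_nonzero(b_ridge))
print("nonzero lasso coefficients:", np.count_nonzero(b_lasso))
```

Ridge shrinks every coefficient but keeps all of them, while the L1 penalty zeroes out most of the weak ones, which is the behaviour Max describes for low signal-to-noise settings.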

[Max]: No, absolutely. And this is also, I think that's a very good point. For example, me and [Max]: my co-authors, or my co-authors are pushing for that, really, to make the codes then also [Max]: available on the website, for example, so that people can cross-check. And that's [Max]: very good. And yeah, I like that also myself. When I read papers and I want to replicate [Max]: something and the authors are making the code available, basically, you can check

[Max]: if your own code is correct. That's super helpful. You learn a lot by that. And yeah, [Max]: really, really. Especially when, for example, using boosted trees or so. I mean, it's [Max]: XGBoost, and it's super convenient to use. And for sure, there's some tuning that [Max]: you have to do yourself. But still, the package is there, basically. And it's super [Max]: convenient to use. You don't have to code the whole forest, basically, yourself. [Alex Andorra]: Yeah.

[Max]: So yeah, for sure. That's [Alex Andorra]: Yeah, yeah, [Max]: amazing. [Alex Andorra]: yeah. No, clearly. Yeah, that's super nice and well done and like picking up [Alex Andorra]: all those different tools and different [Max]: Mm-hmm.

[Alex Andorra]: languages. That's super cool. And I don't know how it changed, but I do remember [Alex Andorra]: that a few years ago, doing open source development wasn't really incentivized [Alex Andorra]: for doctoral candidates or post-doctoral candidates, so maybe that changed and that's [Alex Andorra]: for the better. But if that didn't, the fact that you're doing it is like [Alex Andorra]: even more commendable, I would say, because that's a bit adjacent to your

[Alex Andorra]: project. So yeah, well done on doing that and taking the time to do it. [Alex Andorra]: That's very cool for sure. Um, so now I'd like to talk a bit about, [Alex Andorra]: yeah. So you said you're doing econometrics, but, um, can you define econometrics [Alex Andorra]: for us and, and tell us what it brings to economics basically. [Max]: Yeah, sure. So a lot of weight now for me on [Alex Andorra]: haha [Max]: giving the textbook definition of econometrics.

[Alex Andorra]: Yeah, exactly. [Max]: No, I mean, it's basically, or now I'm butchering the whole definition probably. But [Max]: it's applying statistical tools to an economic context and trying [Alex Andorra]: Mm-hmm. [Max]: to use statistical tools to basically verify some economic theory or to understand [Max]: some relationships between economic variables. So I think it's a, yeah, I think that that's

[Max]: basically it. It's kind of a fancier term for what it actually is, applying statistical [Max]: tools for understanding economic relationships. That's basically it. I mean, it's essential. [Max]: I mean, for empirical work, for sure there are economists who only work on theory, [Max]: but yeah, for policy analysis or for... you need to analyze the data in the end. And [Max]: basically, that's what I'm doing. I don't really do theory stuff, but for me, it's just

[Max]: all empirical. And yeah, so definitely, it's very useful in the end, especially for [Max]: policymaking at central banks and everywhere, also for the industry, be it [Alex Andorra]: Mm-hmm. [Max]: banking industry or be it just normal in the real economy for [Alex Andorra]: Yeah. [Max]: analyzing demand and all that.

[Alex Andorra]: And do you... So I'm curious how you got introduced to Bayesian methods, [Alex Andorra]: actually, and why they stuck with you, because from what I remember, from [Alex Andorra]: the world of econometrics, Bayes was not used a lot in this field. So I'm actually [Alex Andorra]: curious why you are using it. [Max]: Yeah. Well, I have to admit, like, so I already said that it was like third year

[Max]: that I got introduced to econometrics. And that was this project when [Max]: Philippe, Frank's RA basically [Alex Andorra]: Mm-hmm. [Max]: came to me and asked me to gather some data on climate variables because we want to [Max]: run a vector autoregression of the Arctic. Basically, you basically get some, what we [Max]: basically did is we gathered data, time series on certain climate variables, [Max]: which we [Alex Andorra]: Mm-hmm.

[Max]: thought would proxy for the Arctic ecosystem basically. And then we wanted to use a vector [Max]: autoregression to analyze certain amplification mechanisms, if there is a shock to CO2, for [Max]: example, and also to be able to produce long-run forecasting projections. So when Arctic [Max]: sea ice might potentially [Max]: disappear in the future. [Alex Andorra]: Yeah. [Max]: And so the data is highly non-stationary. And [Alex Andorra]: Uh huh. Uh huh.

[Max]: in VARs, when you work with VARs, most economists really work with Bayesian methods [Max]: there. And as I said, data was highly non-stationary. So Bayesian statistics or the Bayesian [Max]: framework gives you some leeway there, grants you some freedom there. So that was big. [Max]: Yeah, that was why Philippe told me, okay, look at Bayesian VARs, look at the Bayesian [Max]: way. And that's how I actually got introduced to that. And there was at the time, I really

[Max]: didn't have any exposure. So there was a package in MATLAB for doing Bayesian inference, [Max]: basically, with VARs. And that was super helpful. That helped me a lot. That was super, [Max]: or a great education, a source of education, really, that was great. And The more I learned [Max]: about it, the more it resonated with me, this concept of quantifying uncertainty.
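As a rough illustration of why a Bayesian treatment offers leeway with non-stationary data, here is a minimal sketch of a VAR(1) whose coefficients are shrunk toward a random-walk prior, loosely in the spirit of Minnesota-style priors. Everything in it is an assumption for illustration: the data are synthetic random walks, the noise variance is treated as known and common across equations, the intercept is omitted, and the tightness value `lam` is arbitrary. It is not the MATLAB package Max mentions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 60, 3
# Synthetic non-stationary data: three independent random walks.
data = np.cumsum(rng.normal(size=(T, k)), axis=0)

X, Y = data[:-1], data[1:]          # regress y_t on y_{t-1}: Y = X @ B + E
B_prior = np.eye(k)                 # prior mean: each series is a random walk
lam = 10.0                          # prior tightness (hypothetical value)

# OLS estimate versus posterior mean under a Gaussian prior centred on B_prior.
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)
B_post = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ Y + lam * B_prior)

dist_ols = np.linalg.norm(B_ols - B_prior)
dist_post = np.linalg.norm(B_post - B_prior)
print("distance from random-walk prior, OLS:      ", dist_ols)
print("distance from random-walk prior, posterior:", dist_post)
```

The posterior mean is a compromise between the data and the prior, so it always sits closer to the random-walk benchmark than OLS does; a larger `lam` pulls it further in that direction.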

[Max]: I think this is because especially in economics, this is quintessential to really [Max]: get an idea of [Alex Andorra]: Yeah. [Max]: what the uncertainty is. Point estimate is always nice, but you want to have the uncertainty [Max]: around it. And that's also what Frank Diebold always told us. Yeah, you want to have [Max]: a measure of uncertainty. And definitely, that's true. Yeah, you get it in the

[Max]: Bayesian framework. It's just so intuitive to think about it. And yeah, I like that a [Max]: lot. And unfortunately, I don't really work so much or haven't worked in so many projects [Max]: with Bayesian methods lately, or not as much as I would like to. But yeah, it's

[Max]: ever since resonated with me. And yeah, still, I wanted to learn more, and that's [Max]: how basically I got into looking at PyMC, because I wanted to learn with Python, and [Max]: thought, well, maybe an application with Bayesian methods, the Bayesian framework would [Max]: be cool to learn, and that's how I got into PyMC3, or PyMC basically, or looked at [Max]: it and looked at it. So, yeah.

[Alex Andorra]: Yeah, yeah, yeah. Nice. That's interesting. So yeah, basically, it's like [Alex Andorra]: the uncertainty quantifying that was really important to you. [Max]: Exactly. So that was really the key, the key point there. [Alex Andorra]: Yeah. I mean, that does make sense, right? Because, yeah, that's really [Alex Andorra]: one of the parts where Bayes does shine a lot. And also, especially for [Alex Andorra]: the Arctic sea ice project that you are talking about. It's not like it's a

[Alex Andorra]: reproducible experiment. It's really hard in these cases to think from a [Alex Andorra]: frequentist framework of repeatable experiments. You cannot have multiple earths [Alex Andorra]: on which you can do RCTs where you melt the ice caps or not, and you melt [Alex Andorra]: it, like, naturally or thanks to human intervention. It's just

[Alex Andorra]: like, it doesn't work in that case. So yeah, Bayes, I'm not surprised that [Alex Andorra]: it would be a project where Bayes fits way more naturally. [Max]: Yeah, no, that's for sure. I mean, for example, these climate models from these climate [Max]: institutions, these are huge models. And big models, to train them or to run these [Max]: models, it takes a lot of time. And they are very sophisticated. So really, really sophisticated.

[Max]: But they are basically deterministic models. And they give you a point estimate [Max]: in the end. But our... interest was basically really to see, well, we get a point estimate, [Max]: but we also want to see, especially when you project the path of Arctic sea ice, the [Max]: uncertainty around it. Well, how likely is it that maybe or that we see Arctic sea [Max]: ice disappearing, not at our point estimate in the 2060s or 70s, but beforehand? Like how

[Max]: large is the uncertainty? Maybe our model is really not good and the uncertainty is so [Max]: much all over the place that it's more or less useless. But yeah, and that project [Max]: was actually interesting to see that the uncertainty or the credible region was [Max]: basically spanning like 20 years, 25 years around. So that was very interesting. [Max]: And it gave us a quick quantification of uncertainty to it. Yeah, that was really, [Max]: really interesting.
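The idea of putting a credible region around a projected date, rather than reporting a single year, can be sketched with a deliberately simple model. The series below is made up (a noisy linear decline in some "extent" index), the trend is linear with a flat prior and a plug-in noise variance, and none of the numbers correspond to real Arctic measurements or to the results Max describes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic, made-up series: a noisy linear decline in some "extent" index.
years = np.arange(1980, 2021)
t = years - 2000.0                       # centre time for numerical stability
extent = 5.0 - 0.08 * t + rng.normal(scale=0.5, size=t.size)

# Linear trend with a flat prior: the posterior for (intercept, slope) is
# approximately normal around the OLS estimate.
X = np.column_stack([np.ones_like(t), t])
XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ extent
resid = extent - X @ coef
sigma2 = resid @ resid / (t.size - 2)    # plug-in noise variance

draws = rng.multivariate_normal(coef, sigma2 * XtX_inv, size=4000)
intercepts, slopes = draws[:, 0], draws[:, 1]

# Year at which each sampled trend line reaches zero.
zero_year = 2000.0 - intercepts / slopes
lo, mid, hi = np.percentile(zero_year, [5, 50, 95])
print(f"median {mid:.0f}, 90% interval [{lo:.0f}, {hi:.0f}]")
```

Each posterior draw implies its own crossing date, so the spread of those dates is the uncertainty statement. A real analysis would also propagate future shocks and model uncertainty, which this sketch ignores.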

[Alex Andorra]: Yeah, yeah, yeah. Nice. Uh, they, I love that. Uh, and I mean, I would [Alex Andorra]: have, that's really interesting for me to, to talk with someone who recently [Alex Andorra]: got into the Bayesian framework and to understand how you get into it and why, [Alex Andorra]: and, and how, uh, so I would have a lot of other questions on that, but [Alex Andorra]: I want to talk about football or soccer, so let's, let's switch to that and

[Alex Andorra]: then if we have time at the end of the episode, I'll come back with my, [Alex Andorra]: um, nerdy, uh, educational questions. So yeah, basically you have an area or a hobby [Alex Andorra]: of yours where you do apply and need actually Bayesian stats and that's [Alex Andorra]: soccer analytics. First, I read a bit your website and I saw you were passionate about [Alex Andorra]: football since you were a child and you mentioned a bunch of European championships.

[Alex Andorra]: Not the French one though. I was absolutely outraged. What happened? What [Alex Andorra]: happened? Like, don't you get the French games in Germany? [Max]: Oh yeah, well that's another issue. So when I was younger really, I mean it was only [Max]: the Bundesliga and sometimes when you were lucky, sometimes you got the highlights [Max]: of the French Premier League and the Serie A, but yeah you had to be really lucky, [Max]: it was not always available and I wasn't that...

[Max]: Yeah, I didn't know the websites where you could watch it basically. So [Alex Andorra]: Hehehehe [Max]: that was another issue. But yeah, the French, well, the French league, I was never [Max]: really a fan of. I'm sorry, Alex. But yeah, that's just even though one of my favorite [Max]: players was Yoann Gourcuff. So Olympique [Alex Andorra]: Oh, [Max]: Lyon. [Alex Andorra]: really? [Max]: Yeah, [Alex Andorra]: Oh, [Max]: yeah, yeah. So [Alex Andorra]: he went [Max]: yeah.

[Alex Andorra]: to Milan. Yeah. Um, [Alex Andorra]: yeah, no offense taken. I think the French league is pretty boring. Um, and, [Alex Andorra]: uh, yeah, as long [Max]: as [Alex Andorra]: as, [Max]: the Bundesliga. [Alex Andorra]: I mean, yeah, um, as long as PSG is dominating like that, uh, I mean, that's [Alex Andorra]: good for me because, um, I've been a PSG fan since I was like five years old, uh, [Alex Andorra]: but yeah, like, uh, it's not a very interesting league. And the level is

[Alex Andorra]: kind of going down by the years. So hopefully we'll get some investors in other [Alex Andorra]: clubs, which make for a good competition for Paris, but until now it's really [Alex Andorra]: bad. And it's actually bad for Paris because the competition inside the country [Alex Andorra]: is really bad. So then when they get on the European stage, they are not [Alex Andorra]: really used to the intensity and having so much adversity in a way. So,

[Alex Andorra]: yeah, it's too easy for them, let's say. So basically, but I didn't get you [Alex Andorra]: on the show to trash the French league. I want to talk about the soccer factor [Alex Andorra]: model that you recently worked on. And I found it super interesting because [Alex Andorra]: that's mainly, yeah, the main question I always have in soccer analytics.

[Alex Andorra]: The nerd in me is always very careful about the hot takes that you see the [Alex Andorra]: commentators have about players where it's like, yeah, but what's the, how [Alex Andorra]: do you separate a player's skill from the ability, skills and ability from his [Alex Andorra]: team's strength? And that's to me is extremely important because mostly [Alex Andorra]: in Europe, right now, most of the clubs... mainly invest on players on gut

[Alex Andorra]: feeling, basically. And the thing is when you do that and you're not able [Alex Andorra]: to separate inherent player abilities from team strength, then you get [Alex Andorra]: kind of an aura effect from the beginning of your career that can follow [Alex Andorra]: you, even though you're not that good of a player, but basically, like [Alex Andorra]: this aura can follow you even though you are not making that much of a difference.

[Alex Andorra]: But it's just like, it's hard to contradict it because you don't really have [Alex Andorra]: the method of the scientific way of disproving basically what's going on. [Alex Andorra]: That actually, well, it's not really your inherent abilities but mainly the [Alex Andorra]: people you're surrounded with. And I think it's like absolutely important [Alex Andorra]: to do that and should lead to... really a revolutionized way of transferring

[Alex Andorra]: players and signing them and so on. So, that was basically the background

[Alex Andorra]: for people who are not interested in football. Even though, even if the field [Alex Andorra]: doesn't interest you, I think the method and the goal of the model is actually [Alex Andorra]: extremely important because you can also think about that in finance, for [Alex Andorra]: instance, like I know a lot more work has been done in finance for that [Alex Andorra]: because I mean, the return or, basically, the incentives of the money are

[Alex Andorra]: much more important because you know if you make money or not. But I know [Alex Andorra]: there is a lot of literature right on basically passive investment versus [Alex Andorra]: active investment. And how do you actually prove that an active investment [Alex Andorra]: is better than a passive one and that it's actually due to the skills of [Alex Andorra]: the person who invested on the market instead of just random market fluctuation?

[Alex Andorra]: So you can see that in a lot of contexts where you can see that. Basically, [Alex Andorra]: information is sparse, is hard to decipher, and so you need a model to make [Alex Andorra]: sense of it. So you can see that, I would say, in football, in a lot of [Alex Andorra]: sports, in finance, in medicine also, right, where it's like you can have a

[Alex Andorra]: lot of this celebrity effect basically. I think in a lot of contexts where [Alex Andorra]: celebrity effect is important, it can be broken down by that scientific way [Alex Andorra]: of estimating it. So these... politics, of course, movies. I think it's basically [Alex Andorra]: a theme that's running in a lot of fields where the celebrity effect is [Alex Andorra]: extremely big. So yeah, that was a very long introduction. [Max]: Yeah.

[Alex Andorra]: But all that to say, I think it's very useful. So you can react to what I said [Alex Andorra]: and also afterwards, if you can, tell us what a factor model is. Because [Alex Andorra]: your model is, you called it the Soccer Factor Model, but can [Alex Andorra]: you tell us before that what a factor model is? [Max]: Yeah. No, Alex, I mean, you laid it out perfectly. I couldn't have said it any more

[Max]: accurately, I would say, really on point as far as I see it. So a factor model, [Max]: what it actually is: a factor, I would define it as [Max]: some proxy for a certain exposure, in finance to a certain risk, basically. [Alex Andorra]: Mm-hmm. [Max]: Also a reduction: for example, when you look at economics or macroeconomics, it's [Max]: often related to the context where you have a huge set of features and you reduce it to

[Max]: a couple of underlying factors or a single factor only. It's a kind of feature reduction, [Max]: a dimensionality reduction technique like PCA, [Alex Andorra]: Mm-hmm. [Max]: principal component analysis, or the like. But in finance, it's really a proxy for [Max]: a certain risk exposure: basically, the cross-section of stock returns, or all stock [Max]: returns, are exposed to a certain systematic risk. [Alex Andorra]: Mm-hmm.

[Max]: All stock returns are basically exposed to it. This is basically a factor. And [Alex Andorra]: Yep. [Max]: in the literature, asset pricing has identified several of these, yeah, common [Max]: risk exposures across the whole universe of stocks, basically. But as you already [Max]: said, you can also use it for quantifying the ability, for example, of a portfolio manager. [Max]: So if he has some skill in the game, basically if he has really superior selection

[Max]: potential, rather than just following along these common risk exposures, basically. [Alex Andorra]: Mm-hmm. [Max]: And that's also what this Soccer Factor Model is inspired by: to identify [Max]: certain features that all players are exposed to because of the differences in the

[Max]: teams. And then when you account for that, then you can basically extract the skill [Max]: and the inherent ability of each player after you account for these systematic differences [Max]: across teams, basically, that influence [Alex Andorra]: Hmph. [Max]: the ability or the observed performance of a player. [Alex Andorra]: Yeah, yeah, for sure. Yeah, for sure. Because like in the example of football, [Alex Andorra]: like you'd say it's easier to be the number nine. So the, how do you say

[Alex Andorra]: in English that position, like the forward? Number nine is like the [Alex Andorra]: guy who's supposed to score the goals. The native English speakers can then [Alex Andorra]: tell me what the name is; in French that would be attaquant. It's easier [Alex Andorra]: to be the number nine of PSG than the number nine of a very small team in [Alex Andorra]: France, because the rest of the team is stronger. The manager is

[Alex Andorra]: supposed to be stronger and so on. So, yeah, you're like, yeah, but maybe [Alex Andorra]: if you took the number nine of the small team and you put it in Paris, [Alex Andorra]: maybe he would perform as well as the current number nine does. So how do [Alex Andorra]: you make the difference? So that's what we're going to talk about. Before [Alex Andorra]: that, I'm curious, from a structural standpoint, these kind of factor models, how

[Alex Andorra]: do they work? How much time do you need to really start to decipher the [Alex Andorra]: difference between inherent skills and exogenous, basically, team strength? [Alex Andorra]: And that question is basically, how much data do you need from the past years [Alex Andorra]: to start having an idea? How data hungry are those models?

[Max]: Yeah, so that's definitely a good question, a good point. So you have to create these, [Max]: yeah, so in the model that I'm proposing, basically, I need [Max]: a lead time into the season to really account for certain differences. So I need [Max]: a couple of games already that [Alex Andorra]: Mm-hmm.

[Max]: would need to be played to really account for differences in teams. Because before the [Max]: first game, basically, everything, or based on the data that I had, everyone [Alex Andorra]: Mm-hmm. [Max]: would have been the same. [Alex Andorra]: Mm-hmm. [Max]: But it really depends on the data. If you have data that allows you to account for [Max]: differences across teams, budget, [Alex Andorra]: Mm-hmm. [Max]: let's say, or so, you can just start right [Alex Andorra]: Yeah.

[Max]: away. And for overall data, I would say like more data is always better. If you have [Max]: only a few observations, I think the Bayesian framework is then tailor made for [Max]: that as well. Like it's yeah, it grants you some leeway there. But I would say really, [Max]: it's the more data you have, the better. But yeah. [Alex Andorra]: But you could already, OK, so you could already start having that idea with

[Alex Andorra]: just a few games. Then you get the idea of the strength of the team. And then [Alex Andorra]: you can start deciphering the strengths of the player. OK. [Max]: Exactly, exactly. [Alex Andorra]: Yeah. [Max]: But so far I always used a certain number of, let's say, burn-in games [Alex Andorra]: Yeah, [Max]: to [Alex Andorra]: yeah, [Max]: really account [Alex Andorra]: yeah. [Max]: for that.

[Alex Andorra]: Yeah. And I mean, it's not that superficial, right? Because you can think like [Alex Andorra]: right now it's August, it's the beginning of the leagues for the European [Alex Andorra]: teams. August is a weird moment where the teams are still warming up basically. [Alex Andorra]: Um, and they are not really, they are clearly not at peak performance. Usually [Alex Andorra]: they try to peak around spring for the Northern hemisphere. So around March.

[Alex Andorra]: from February to May, basically, they are trying to get their peak. So they [Alex Andorra]: are still warming up. They can still trade players until the end of August. [Alex Andorra]: So you could really say that the games they are doing in August, even though [Alex Andorra]: they are official games, they are still warming up games and don't really [Alex Andorra]: mean a lot for a long-term performance perspective. So that's an interesting moment

[Alex Andorra]: to start warming up the model, I'd say. And so, but something I mean, and [Alex Andorra]: maybe you have that for future iterations of the model where you could put

[Alex Andorra]: in the priors. Um, we're going to talk about the structure of the model, uh, [Alex Andorra]: right away, right after that, but, uh, something I'm thinking about is that [Alex Andorra]: you could put in the prior, the information that you have about the strengths [Alex Andorra]: of the team in, in the way that, yeah, you have the budget, which is a good

[Alex Andorra]: proxy for potential future performance. But also, like, just past performance. If you [Alex Andorra]: know that Paris has been the champion for nine years out of 10, well, you [Alex Andorra]: have a really good prior about the strength of the team. So you can [Max]: Okay. [Alex Andorra]: probably also add that into the model and in that way reduce the warming [Alex Andorra]: up period of the model.
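A minimal sketch of why an informative prior can shorten the warm-up period. It swaps in a simple conjugate Beta-Binomial model of a team's win probability, which is not the model from Max's notebook; the prior values and the three-game record are made up for illustration.

```python
def beta_posterior_mean(prior_a, prior_b, wins, games):
    """Posterior mean of a win probability under a Beta(prior_a, prior_b) prior
    after observing `wins` wins in `games` games (conjugate update)."""
    return (prior_a + wins) / (prior_a + prior_b + games)

# Hypothetical early-season record: 2 wins in the first 3 games.
wins, games = 2, 3

# Flat prior: no knowledge about the team before the season starts.
flat = beta_posterior_mean(1, 1, wins, games)

# Informative prior (assumed values) encoding a historically strong team,
# worth roughly 20 past games with a 70% win rate.
informed = beta_posterior_mean(14, 6, wins, games)

print(f"flat prior:        {flat:.3f}")
print(f"informative prior: {informed:.3f}")
```

Under the informative prior, one extra result moves the estimate much less than under the flat prior, so fewer early games are needed before the team-strength estimate settles.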

[Max]: Yeah, no, absolutely. Or how Paris against Lyon, let's say, has performed in [Alex Andorra]: Yep. [Max]: the past. So the direct comparison between those teams, basically, when they faced [Max]: each other in past years. That would also feed in there. Yeah, so absolutely. There's [Max]: a lot of potential. And my model is, [Alex Andorra]: Mm-hmm. Yeah. [Max]: when you're suggesting this stuff, my model just appears very rudimentary.

[Max]: But it could definitely be extended in that regard. [Alex Andorra]: Yeah, I mean, that's the fun thing about modeling, right? It's like you have [Alex Andorra]: to start somewhere that's good enough, and then you have a lot of ideas to [Alex Andorra]: extend it. And it's a never-ending endeavor. Like, each model, if you want to [Alex Andorra]: work on it your whole life, if you're interested enough, you

[Alex Andorra]: definitely can do that. I know the models that I often revisit are the ones [Alex Andorra]: for predicting French presidential elections. When I compare the one I started with in 2017 [Alex Andorra]: to the one I had for 2022, it's just embarrassing. [Alex Andorra]: But in a way, it's good that the work you're doing right now is the best

[Alex Andorra]: one you've ever done. And in a few years, when you look at the work you're [Alex Andorra]: doing right now, it should be the worst you've ever done because that means [Alex Andorra]: you've... progressed a lot in the meantime. So I think it's a good mindset. [Alex Andorra]: So how did you adapt that factor model for soccer? Like how, what does the model [Alex Andorra]: structure look like basically for listeners to have an idea? And for those

[Alex Andorra]: watching on YouTube, you can share your screen actually. So if you want [Alex Andorra]: to share anything at some point, feel free to do it. Otherwise, the audio format [Alex Andorra]: is here for you because it's a podcast. So it's an audio first content. [Max]: Perfect. Yeah. So yeah, maybe if I get it on the screen, I'll do that. But for now,

[Max]: maybe the structure, I think, is pretty simple. And as you laid it out already very, [Max]: very accurately, it's basically trying to come up with some features, do some feature [Max]: engineering that basically accounts for differences across teams. And well, when you [Max]: look at a certain player, let's say Cristiano Ronaldo, you [Max]: really want to account for the difference between

[Max]: his team and the team that he's facing at that exact instant. And you want to create [Max]: some features that can proxy for these differences across teams. And that's basically [Max]: the heart of the model. And this is basically inspired by these asset pricing factors that [Max]: try to account for differences across assets, across stocks, across firms, basically.
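The idea of a feature that proxies for the gap between a player's team and its opponent can be sketched as follows. The points-per-game gap used here is a hypothetical feature chosen for illustration, not necessarily one from Max's notebook, and the match results are invented. Only games played before the current one feed the feature, which is why the first matches have no usable value and a few burn-in games are needed.

```python
from collections import defaultdict

def points_gap_features(matches):
    """For each match, return the home team's points-per-game minus the away
    team's points-per-game, using only matches played earlier in the list.
    Returns None while either team has no history yet (the burn-in period)."""
    points = defaultdict(int)
    played = defaultdict(int)
    features = []
    for home, away, home_goals, away_goals in matches:
        if played[home] and played[away]:
            features.append(points[home] / played[home] - points[away] / played[away])
        else:
            features.append(None)
        # Update the running totals only after the feature is computed,
        # so the feature never leaks the result of the current match.
        if home_goals > away_goals:
            points[home] += 3
        elif home_goals < away_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
        played[home] += 1
        played[away] += 1
    return features

# Invented results: (home, away, home_goals, away_goals), in chronological order.
matches = [
    ("Paris", "Lyon", 2, 0),
    ("Lyon", "Toulouse", 1, 1),
    ("Toulouse", "Paris", 0, 3),
    ("Paris", "Lyon", 1, 1),
]
print(points_gap_features(matches))
```

A club with richer data could replace the points gap with any other team-level difference; the structure of the computation stays the same.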

[Max]: And the modeling part itself is really nothing sophisticated. You can include kind [Max]: of a hierarchical structure; you don't need to, but it can help, definitely. [Max]: But it's really the feature engineering that is at the heart of it. And then PyMC comes [Max]: in very conveniently and basically does the dirty work for you.

[Alex Andorra]: Mm-hmm. So then that's cool. If it's a simple structure, [Alex Andorra]: yeah, can you talk about what your likelihood was, and then [Alex Andorra]: what kind of distributions you put on the parameters and things like that? [Alex Andorra]: I think it would be a fun thing to talk about for the listeners. [Max]: Sure, sure. Then maybe I just get the workbook loaded. [Alex Andorra]: Oh yeah.

[Max]: So maybe I can share my screen and couple [Alex Andorra]: Yes, [Max]: of... [Alex Andorra]: you should be able to. [Max]: Let me see. [Max]: So in terms of a likelihood, basically, or what the model structure is, so I have to [Max]: proxy, I need some observed measurement of a player's performance. Not [Alex Andorra]: Yes. [Max]: a skill, I mean, that is something that is underlying, that is latent, that we want [Max]: to identify. [Alex Andorra]: Mm-hmm.

[Max]: But we need some observed measure of player performance. What I used is scoring [Max]: goals. Did players score a goal in a certain game or not? So basically, 0, 1, basically [Max]: binomial distributed, and basically, the logistic regression it is. You want [Alex Andorra]: Yeah. [Max]: to identify the probability of a player's scoring. And so now I have it. I guess I have [Max]: it here. [Alex Andorra]: you may have [Max]: Um, [Alex Andorra]: to authorize Google Chrome to share.

[Max]: exactly, [Alex Andorra]: Oh [Max]: exactly. That [Alex Andorra]: yeah. [Max]: unfortunately takes a bit of time. Um, Sorry, I guess I'll be [Alex Andorra]: Yeah, [Max]: here in a second. [Alex Andorra]: it's all good. Yep. It's all good. You can do that and [Max]: Okay. [Alex Andorra]: come back. I don't know what's going to happen for the recording, but I already [Alex Andorra]: did that. After all, it's no problem. [Max]: Sorry, I didn't.

[Alex Andorra]: I mean, it's the first time I do it. So I didn't know it either. [Max]: Ah, okay, here it is. Wait. [Max]: Is it Joe? Ah. [Alex Andorra]: So [Max]: No. [Alex Andorra]: I think you need to give permission. [Max]: Yeah, exactly. That's [Alex Andorra]: And open [Max]: one. [Alex Andorra]: your computer system settings and click privacy and security. [Max]: Well, maybe.

[Alex Andorra]: Apparently, if you open your system settings, and then you go [Max]: Yeah [Alex Andorra]: to privacy and security, and you click screen recording, and allow your [Alex Andorra]: browser to share your screen. I think you need to allow Google Chrome to [Alex Andorra]: share your screen. [Max]: mm-hmm yeah I was there but ah yeah okay now [Alex Andorra]: I mean [Max]: maybe [Alex Andorra]: otherwise it's no chip. [Max]: Okay. [Max]: Okay. Sorry for that.

[Alex Andorra]: So let's see. [Max]: No? That's what I wanna [Alex Andorra]: Yeah, it [Max]: do with [Alex Andorra]: may [Max]: that guess. [Alex Andorra]: be [Max]: Sorry. [Alex Andorra]: because you have to get out to quit Google Chrome and then come back. Are [Alex Andorra]: you on Mac? [Max]: Yeah, yeah, exactly. [Alex Andorra]: Yeah, so you probably need to close Google Chrome and then come back. But [Alex Andorra]: you can do that. And then you come back to the same link I sent you.

[Max]: Okay. [Alex Andorra]: And then it should work. Maybe I'll have to [Max]: OK. [Alex Andorra]: do another recording, but that's OK. I can edit that after once. It's easy. [Max]: Okay, okay. [Alex Andorra]: So I'll wait for you here. Yeah. [Max]: Okay. I'm back Alex. Sorry. [Max]: Sorry Alex, I cannot hear you currently. [Alex Andorra]: Yes, that's normal. I was muted. So cool. I didn't even have to start a new

[Alex Andorra]: recording. You can just join the room again. Cool. First time it happened, [Alex Andorra]: so I didn't know what would happen. So cool. Perfect. So does it work now? [Max]: Let's [Alex Andorra]: Let's [Max]: see. [Alex Andorra]: try. [Alex Andorra]: No, [Max]: No, [Alex Andorra]: still not. [Max]: no, [Alex Andorra]: That's weird. [Max]: no. I'll give it a last try and otherwise I just.

[Alex Andorra]: Yeah, otherwise it's okay, but... [Max]: Yeah, Google [Alex Andorra]: It's [Max]: Chrome, [Alex Andorra]: weird. [Max]: it's there. [Alex Andorra]: It should work. [Max]: I allowed it. So I don't know, Google Chrome, it's fine. It can access.

[Max]: but [Alex Andorra]: I'm checking that it could be on my end maybe, so... [Max]: screen [Alex Andorra]: Yeah, no, [Max]: the [Alex Andorra]: on [Max]: window [Alex Andorra]: my end also it's all good, so... [Max]: And I'm sorry, no, unfortunately it doesn't [Alex Andorra]: No, weird. [Max]: work. [Alex Andorra]: Anyways, that's OK. So well, then let's continue without the [Alex Andorra]: screen sharing. You can just talk through it. It's no problem.

[Max]: Okay. [Alex Andorra]: I've [Max]: Yeah. [Alex Andorra]: done it. We've done it for a lot of podcast episodes. [Max]: OK. Yeah, so the structure basically is relatively simple. You need some idea of [Max]: what the performance of the player is. And you have to have a proxy for that. [Alex Andorra]: Mm-hmm. [Max]: And well, you need this performance to be observed, obviously. And the proxy that [Max]: I choose for a player's performance is whether he scores a goal or not, so 0 or 1

[Max]: in a certain game. Bernoulli distributed is our y, our target. And it's basically a logistic [Max]: regression that we are running. Because what we want to identify is really the skill [Max]: and the ability, the latent variable hidden in our observed performance measure, basically. [Max]: And so the model is pretty simple. You need the priors. You have basically a bunch [Max]: of coefficients. That is, you have the alpha, the skill, the ability that you're interested

[Max]: in. And then you have the loadings, the coefficients on all the factors that are in [Max]: your model. So you basically have to impose priors for all the coefficients. [Alex Andorra]: Mm-hmm. [Max]: And then you have to define the likelihood, Bernoulli distributed. And yeah, that's basically

[Max]: the model. It's in the workbook. And people can go through it. There's also a redacted [Max]: version, basically, where people, if they fancy, can try to work with their [Max]: own priors and all that and try to do it themselves first and then check the unredacted [Max]: version. [Alex Andorra]: Oh, that's cool. [Max]: If they want to play with that a bit. [Alex Andorra]: Nice. [Max]: Yeah, that's basically it. So it's nothing really crazy. It's four lines of code,

[Max]: the basic model, basically. And yeah, when you look at multiple players, so you can [Max]: do that for a single player only, but you can also do that for sure for multiple [Max]: players. The key thing is that, basically, each player [Max]: should be exposed to these factors with the same loading. So you can

[Max]: impose a hierarchical structure on the ability and skill of each player. You should [Max]: definitely do that, but you can impose the hierarchical structure by player or also [Max]: by season. So the ability of the player may evolve over seasons or across seasons, basically. [Alex Andorra]: Mm-hmm, mm-hmm, yeah. [Max]: That's, I think, something worth looking into or worthwhile doing. And then basically [Max]: you have the loadings on the factors and they should account for the team effort

[Max]: basically. You want to account for that and you want to get that out of the way so that [Max]: you're basically in the end left with this latent factor, the alpha, the inherent [Max]: skill and ability of the player. [Alex Andorra]: Yeah, yeah, yeah. OK. Yeah, that makes sense. And I mean, for sure, I will [Alex Andorra]: put all of these in your episode's show notes. And actually, I think I can share [Alex Andorra]: my screen. I don't know why I didn't think about that before. And here

[Alex Andorra]: is the notebook, right? Am I on the right notebook? [Max]: Exactly. [Alex Andorra]: Yeah, perfect. [Max]: Yeah, yeah, [Alex Andorra]: So. [Max]: yeah. So there are a couple of notebooks there. So there's the one in the PyMCon folder, [Max]: that's the one where there's the redacted version and the unredacted version, and the [Max]: version that we're currently looking at. That's the initial part with all its typos [Max]: in there.
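Written out, the structure Max has described so far can be summarized roughly as below. The notation is assumed for illustration: y_{i,g} is 1 if player i scores in game g, x_{k,i,g} are the team-difference factors, beta_k are the loadings shared by all players, and alpha_i is the player's latent skill. The Normal hyperprior on alpha_i is one common way to impose a hierarchical structure by player, not necessarily the choice made in the notebook.

```latex
\begin{aligned}
y_{i,g} &\sim \mathrm{Bernoulli}(p_{i,g}), \\
\operatorname{logit}(p_{i,g}) &= \alpha_i + \sum_{k=1}^{K} \beta_k \, x_{k,i,g}, \\
\alpha_i &\sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2).
\end{aligned}
```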

[Alex Andorra]: Ah ok, so it's not the right one. Then, should look at [Max]: It's [Alex Andorra]: another [Max]: fine [Alex Andorra]: one. [Max]: one, so it's perfect. The other one is just a bit smaller and more concise, I would [Max]: say. [Alex Andorra]: Ah, here. Unredacted. Perfect. Yeah, I have it here. So yeah, like for those

[Alex Andorra]: of you watching on YouTube, I'm sharing it right now. And so basically, [Alex Andorra]: this is the part of the model where you're talking about the likelihood, [Alex Andorra]: where it's goal scored or not scored. And then you have here the probability, [Alex Andorra]: which basically has this alpha that you talked about, right? That [Max]: Exactly. [Alex Andorra]: is the inherent skill of the player, which enters the probability. And you have

[Alex Andorra]: the Xs and the beta. So the Xs, are they the factors or the beta are the [Alex Andorra]: factors? [Max]: So the Xs are the factors. These are the differences across the teams or between [Max]: the teams. And this is what you want to basically account for and to clean the observed [Max]: performance measure from. Yeah. [Alex Andorra]: Yeah, yeah. Oh, yeah, OK. Yeah, for sure. And then the beta is the slope, basically, [Alex Andorra]: on the factors. Yeah, yeah, [Max]: Exactly.
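As a plain-Python sketch of the probability line discussed here (the notebook itself uses PyMC; this only mirrors the arithmetic), the scoring probability is the inverse logit of the skill term alpha plus the loadings beta times the factors X. All numbers below are made up.

```python
import math

def scoring_probability(alpha, betas, xs):
    """Inverse-logit of alpha + sum(beta_k * x_k): the probability that the
    player scores in a game with factor values `xs`."""
    linear = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical values: one player's skill and two factor loadings.
alpha = -1.0
betas = [0.8, 0.3]

# Same player, facing a weaker opponent (positive gaps) vs. a stronger one.
p_favourable = scoring_probability(alpha, betas, [1.5, 1.0])
p_unfavourable = scoring_probability(alpha, betas, [-1.5, -1.0])

print(f"favourable matchup:   {p_favourable:.3f}")
print(f"unfavourable matchup: {p_unfavourable:.3f}")
```

The same alpha yields very different scoring probabilities depending on the factors, which is why raw goal counts alone mix up skill and team context.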

[Alex Andorra]: yeah. Yeah, yeah, it's a fun model. So of course, it's hard to do it justice [Alex Andorra]: on the podcast. But I encourage you to go and watch that part on YouTube. I'm [Alex Andorra]: sharing it right now. And also, you can just take a look at the notebook from [Alex Andorra]: Max, which I put in the show notes, where you have all the details. So it's [Alex Andorra]: pretty fun to look at. And also, as you were saying, the model is pretty small.

[Alex Andorra]: So that's the amazing thing that I find is that basically, and now if we [Alex Andorra]: go look at the PyMC implementation, so a bit later [Max]: Oh. [Alex Andorra]: down in the notebook, the really cool thing is that basically the model is quite [Alex Andorra]: easy to code, right? And in a way, that's just a few lines of code, so [Alex Andorra]: basically four lines of code, as you were saying, and you're done. So that's

[Alex Andorra]: the beauty of the probabilistic programming framework, right? It's a really [Alex Andorra]: useful model. But if you want to get to a first good enough version that [Alex Andorra]: already gives you interesting insights, you don't have to reinvent everything. [Alex Andorra]: And you don't have to go with the first, hardest version from the start, [Alex Andorra]: where you have a hierarchical time series model where everything is varying

[Alex Andorra]: and pooling information. Sure, that's cool. But don't start with that. It's [Alex Andorra]: like if you're starting to train, don't start with 100 push-ups. Start by, like, [Alex Andorra]: trying five first, and then do a few series of them. Build your way up to

[Alex Andorra]: 100. So that's the critical thing I find here with the Bayesian framework [Alex Andorra]: coupled to the power of probabilistic programming languages, which is you can get [Alex Andorra]: down to a first good enough version and then, in a few lines of code, have [Alex Andorra]: your version and then sample from it. Because here you have it on the screen.

[Alex Andorra]: The likelihood, then you have a line for the deterministic, which is the logistic [Alex Andorra]: regression line, and then you have your intercept and your coefficients on [Alex Andorra]: the factors. And basically that's it. That's really amazing. [Max]: Absolutely. No, that's, I think, the beauty of PyMC, that it allows you to describe [Max]: or build your model in a pretty intuitive way. And you can even let it be printed out

[Max]: to see if everything is as you would have expected. And [Alex Andorra]: Yeah. [Max]: yeah, then PyMC does the dirty work, the sampling and all that for you. And yeah, [Max]: but it already gives you an intuitive idea of how the modeling works. And yeah, that's [Max]: absolutely [Alex Andorra]: Yeah, yeah, yeah. [Max]: super [Alex Andorra]: No, [Max]: cool. [Alex Andorra]: it's really fun. Well done on that. And so I'm curious, what are your, do

[Alex Andorra]: you have any ideas? Do you want to keep working on this model? Do you have [Alex Andorra]: any ideas on where to take it from what it is right now? Um. [Max]: Yeah, that's a good question, actually. So definitely the model can be improved. And [Max]: definitely, it's all depending on the features that you have and the data that you [Max]: have. And I think the clubs, they have so much more [Alex Andorra]: Yeah.

[Max]: interesting data than I have. And they could build many, many more interesting factors [Max]: accounting for differences across [Alex Andorra]: Oh yeah, [Max]: teams. [Alex Andorra]: for sure. [Max]: So yeah, I really don't know because I tried to reach out to a couple of clubs, [Max]: let's say. But I don't know, there was nothing really coming back. So yeah, apparently, [Max]: perhaps they're not interested in that or maybe they have their own models already

[Max]: or something. So I really don't know. I'd be excited to work on that. But as you [Max]: said, it's rather a side project that I did once upon a time. And yeah, it's not [Max]: really related to economics or finance. That's why I'm currently working [Max]: on other stuff. But yeah, I would love to work on that in that regard. But yeah, it [Max]: seems not so many teams are picking up on that, at least among those that I reached

[Max]: out to. And it seems to be the European clubs, because in some of your last episodes,

[Max]: I heard people talking about that in the United States, it's pretty different. And, [Max]: um, yeah, uh, there are a lot of, apparently a lot of clubs already trying to implement [Max]: that to really try to understand the inherent latent skill of, of players, not necessarily [Max]: in soccer, but in baseball or in [Alex Andorra]: Yeah, [Max]: other, [Alex Andorra]: oh, especially [Max]: um, in other disciplines.

[Alex Andorra]: baseball. Yeah, yeah, yeah. So this is sad, but I'm kind of reassured to [Alex Andorra]: hear you say that because I do think it's a huge area of improvement that [Alex Andorra]: there is in Europe. And clubs just don't seem to be very interested. The [Alex Andorra]: thing I know is that a few English clubs are using data pretty heavily, like Liverpool.

[Alex Andorra]: Manchester City, clubs like that, but that's still kind of the exception. I [Alex Andorra]: know Toulouse now in France, which is a small club, and that makes sense. [Alex Andorra]: If you're a small club, you have less money, so you have much more competitive [Alex Andorra]: pressure to find good players whom you are not overpaying, which is basically

[Alex Andorra]: where science can help you. You don't want to pay for just a name. You [Alex Andorra]: want to pay for someone who has a name because... he's got talent, not [Alex Andorra]: just because he's got a name. So it's like, to me, everybody should do that. [Alex Andorra]: And I just don't understand why they don't. Because it's just like, that's [Alex Andorra]: also the beauty of sport, right, you don't care about the name, you care about

[Alex Andorra]: what someone can do and if they have talent or not. Like, you should not care [Alex Andorra]: at all about the name, about the color of the skin, about anything else [Alex Andorra]: but what they can do on the field. And... Yeah, to me, if I had [Alex Andorra]: a club, that would be one of my first priorities. How do we make sure we optimize

[Alex Andorra]: the way we are signing the players because it costs a lot of money. So. [Max]: I think one club that also does a lot of that data work is in Denmark, FC Midtjylland [Max]: or something. I think [Alex Andorra]: Uh-huh. [Max]: I got the name completely wrong. But I heard once upon a time that they're really [Max]: investing a lot in data science and trying to sign players according to data or at least

[Max]: incorporate data a lot in their daily training exercises and all that. So yeah, they [Max]: are one of the cutting edge maybe there in Europe as well. Small club, [Alex Andorra]: Mm-hmm. [Max]: but yeah. I think they won the Danish Championship a couple of years ago.

[Alex Andorra]: Yeah, not surprised. I mean, something I see a lot, at least in France, [Alex Andorra]: and I've seen that a lot also in electoral forecasting, is basically this [Alex Andorra]: idea that if you start doing that, you're basically becoming kind of inhuman [Alex Andorra]: and you turn players into robots. Basically, that's really an interesting thing [Alex Andorra]: to me because one of the sports that really uses data heavily is cycling. A

[Alex Andorra]: lot of the teams are using now data. Here, again, thanks a lot to the British, [Alex Andorra]: which often in Europe are the first ones to take up the data wave. And so [Alex Andorra]: I know, for instance, Bradley Wiggins, I think he's won the Tour de France. [Alex Andorra]: I don't remember how many times, but a lot of times. And basically, a lot. The

[Alex Andorra]: whole team was using data to optimize the performances of the team. And [Alex Andorra]: that was one, like the British started being like, okay, we need to get back [Alex Andorra]: on our cycling game. They started using data extremely optimally and well, they [Alex Andorra]: did. And thanks to this, basically a lot of the teams started to do that too.

[Alex Andorra]: And the Tour de France is extremely optimized on that. But it's funny because when [Alex Andorra]: you hear the media coverage of that, at least in France, it's a bad thing [Alex Andorra]: because it's like players are becoming robots, and they cannot eat what they [Alex Andorra]: want at the time they want. And like, it just takes the magic out of [Alex Andorra]: the Tour de France. And I strongly disagree with that, of course, because if the

[Alex Andorra]: performances get better in a clean way, of course. Well, then that's just [Alex Andorra]: better for everybody because the show is going to get better. And also We're [Alex Andorra]: talking about the Tour de France or professional athletes. Like the goal is [Alex Andorra]: not to recreationally do that. They do that for a living. Um, so it's important [Alex Andorra]: for their own basically income. Uh, but also they do that because they want

[Alex Andorra]: to be the best. That is, they are not doing that because, well, they just [Alex Andorra]: want to cycle on the weekends, right? They cycle for a living. So yeah, sure. [Alex Andorra]: If you're an amateur cyclist, then okay. You don't need the same structure [Alex Andorra]: as a professional cyclist. But even then, if you want to improve your performances [Alex Andorra]: as an amateur cyclist, you're going to need to optimize some of the things.

[Alex Andorra]: And if you really care about it, you're going to need to optimize your nutrition, [Alex Andorra]: for instance, and maybe when you take your meals and so on. But if you're [Alex Andorra]: a professional, the slightest change can mean you're going to [Alex Andorra]: perform one second better or two seconds better, which [Alex Andorra]: can make you win the Tour de France or not. So I don't understand this argument

[Alex Andorra]: in these contexts where you're trying to optimize performance. For me, it's [Alex Andorra]: like not something that should count here. They are not doing that for pleasure [Alex Andorra]: only. [Max]: I absolutely agree. Absolutely agree. It should be incorporated much more, [Max]: especially for the clubs. In the end, I think it will pay off, as you laid it out. [Alex Andorra]: Mm-hmm. [Max]: You want to pick a lemon, and [Alex Andorra]: Yeah.

[Max]: you just rather pick it. Yeah. [Alex Andorra]: Yeah, yeah. No, I mean, I have to say it's like, it's an interesting topic [Alex Andorra]: for me because I'm trying to crack that nut and I cannot crack it for now. [Alex Andorra]: Like, understand why basically the clubs in Europe are not really interested [Alex Andorra]: in that. Because I don't really care about the Chinese side or else. I'm like, [Alex Andorra]: once the club starts picking that up, then everybody will have to. But what

[Alex Andorra]: I'm trying to understand is why the clubs don't do that, because it's just [Alex Andorra]: leaving gains on the table. And I'm just super curious about why they would [Alex Andorra]: do that from a sociological standpoint, honestly. Because I've seen a lot [Alex Andorra]: of clubs that have data science teams, but they use them for marketing. [Alex Andorra]: That's [Max]: I see, [Alex Andorra]: such a [Max]: I [Alex Andorra]: shame. [Max]: see.

[Alex Andorra]: And I don't know why. So if anybody [Max]: there. [Alex Andorra]: knows, please get in touch. If anybody is working in a club, please get [Alex Andorra]: in touch with Max or me, because I want to know about it. We don't even need

[Alex Andorra]: to work together. I would be happy to help you out with a model, but for [Alex Andorra]: now, I just want to know why and what are the internal factors, because [Alex Andorra]: definitely there is something going on, but I don't know what it is, and [Alex Andorra]: I'm just curious about it. So yeah, to try and make it a bit more constructive, [Alex Andorra]: do you have any idea on how we personally in the data world could change

[Alex Andorra]: the status quo in that regard? And not only for sports, but that's also true [Alex Andorra]: for a lot of domains where more robust application of the scientific method [Alex Andorra]: would be useful. But it's hard to get it done. Do you have any ideas personally [Alex Andorra]: on how that status quo could be changed? [Max]: Yeah, I think it's really hard to say. It depends on the willingness to adopt these,

[Max]: to be open to these methods, I would say. And the players play an important part, [Max]: or I think the crucial part, because if the players are not willing to adopt these [Max]: additional insights, I would say, it's just not possible. [Alex Andorra]: Mm-hmm. [Max]: But for sure, I mean, as you say, it's management, it's internal things that are [Max]: going on there, politics potentially, but I really don't know. How can someone resolve

[Max]: that? I don't know. I regard it always as, for sure, you shouldn't base all your decisions [Max]: on this model or on a single model or so, but it can help [Alex Andorra]: No for sure. [Max]: stimulate your decision process, and I think it's a useful addition. And in the [Alex Andorra]: Yep.

[Max]: end, for sure, there might be an upfront cost, basically, to get the data, [Max]: to implement the model, to hire people to produce that, but in the end, it actually [Max]: may pay off economically because it may save you from picking a lemon and overpaying massively. [Alex Andorra]: Oh yeah, for sure. [Max]: So [Alex Andorra]: Yeah, yeah. [Max]: yeah, I see it really as a worthwhile investment. [Alex Andorra]: No official, [Max]: I think the US [Alex Andorra]: yeah.

[Max]: sports has demonstrated that. [Alex Andorra]: Yeah, yeah. I mean, just look at the US, just look at all the other fields, [Alex Andorra]: especially marketing, for instance, which is starting and already started to adopt [Alex Andorra]: data analysis and modeling aggressively and they just like, we do that all at the labs, [Alex Andorra]: basically making them save a lot of money and not only save money, but make

[Alex Andorra]: more money. So like, it's just, yeah, like, I don't think this is a question, [Alex Andorra]: but yeah. I mean, something you can do. I would think if you're interested [Alex Andorra]: in it and have the time, something maybe that could work is if you could make [Alex Andorra]: some predictions with your model, basically. And I would think to get it per [Alex Andorra]: player, you would probably need some hierarchical structure in that to get

[Alex Andorra]: some better predictions. But once you get there, you have something of a [Alex Andorra]: web page with basically the predictions of the model per player saying [Alex Andorra]: basically, this player is basically overvalued and this player is undervalued, [Alex Andorra]: basically based on the results of the model. And then basically see what that [Alex Andorra]: gives you during the season because at the beginning of the season, you

[Alex Andorra]: can say that player is basically undervalued. He's gonna perform better than [Alex Andorra]: what the market currently thinks. And then people see that it's true. Well, that's [Alex Andorra]: a clear sign that basically these kinds of methods and models are working [Alex Andorra]: and so that could spark some interest. Um, because definitely demonstrating [Alex Andorra]: what a model is for. Because my hinge, hinge, hunch. I think it's hunch.

[Alex Andorra]: My hunch is that, um, basically the decision makers in the clubs are not data, [Alex Andorra]: um, they don't really know what data is about, and they don't even [Alex Andorra]: know what a model is and what it can give you. But if you are able to demonstrate [Alex Andorra]: what a model can give you, because they don't care about the model, the priors, [Alex Andorra]: the parameters, stuff like that, they just care about the results of the model.

[Alex Andorra]: So if you can demonstrate the results of the model and even better what the [Alex Andorra]: model can say about recruiting that player or not recruiting that player, [Alex Andorra]: that would maybe have a better impact, or at least I would say it increases [Alex Andorra]: the probability that the impact these methods can have gets noticed.
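
[Editor's sketch] The idea Alex describes above, per-player predictions with some hierarchical structure compared against what the market expects, could be illustrated roughly as follows. This is not Max's model and not a full Bayesian fit: it is a minimal partial-pooling stand-in that shrinks each player's observed success rate toward the pooled rate. The `PlayerRecord` fields, the `market_rate` values, and the `prior_strength` setting are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlayerRecord:
    name: str
    successes: int      # e.g. successful actions (hypothetical measure)
    attempts: int       # e.g. total actions
    market_rate: float  # hypothetical rate the market "expects"


def shrunken_rate(successes, attempts, pooled_rate, prior_strength=20.0):
    """Posterior mean of a Beta-Binomial model whose prior is centred on
    the pooled rate; prior_strength acts like a pseudo-observation count."""
    alpha = pooled_rate * prior_strength
    beta = (1.0 - pooled_rate) * prior_strength
    return (alpha + successes) / (alpha + beta + attempts)


def flag_players(records, prior_strength=20.0):
    """Label each player relative to the market using the shrunken estimate."""
    pooled = sum(r.successes for r in records) / sum(r.attempts for r in records)
    labels = {}
    for r in records:
        estimate = shrunken_rate(r.successes, r.attempts, pooled, prior_strength)
        labels[r.name] = "undervalued" if estimate > r.market_rate else "overvalued"
    return labels


records = [
    PlayerRecord("Player A", successes=30, attempts=100, market_rate=0.15),
    PlayerRecord("Player B", successes=5, attempts=100, market_rate=0.25),
]
print(flag_players(records))
```

Players with few attempts are pulled strongly toward the group average, which is the intuition behind using a hierarchical structure for per-player predictions. A real version would presumably fit the model in a probabilistic programming library and publish the full predictive distributions rather than a single label.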

[Max]: Oh, absolutely. That's absolutely the case. For sure, it depends on having the real-time [Max]: data, basically getting the real-time data. [Alex Andorra]: Exactly. Yeah. [Max]: That's an upfront cost that you would have to pay. No, but that's actually the intent, [Max]: really. This is the intent to run that model for multiple players as part of the workbook, [Max]: for example, to lay it out and to compare which players perform well or not. And you

[Max]: see it, for example, Cristiano Ronaldo, when he won the player of the year award in [Max]: 2008. He was basically in the middle of the pack in [Alex Andorra]: Mm-hmm. [Max]: that season. So there were other players actually outperforming, for example, Dimitar [Max]: Berbatov in that very season. He was playing for Tottenham [Alex Andorra]: Yeah.

[Max]: later on in the year, thereafter signed by Manchester United. So you see that. And [Max]: for sure, there's a lot of subjective judgment coming in from when you observe it [Max]: and you see the model telling you something completely different. But this is stimulating [Max]: and it should potentially update your priors, so [Alex Andorra]: Yeah, [Max]: your [Alex Andorra]: exactly. [Max]: subjective priors.

[Alex Andorra]: Yeah. And forces you to lay out your priors clearly [Max]: Thank [Alex Andorra]: and [Max]: you. [Alex Andorra]: on paper. So it's actually very important. Yeah. So I would say definitely [Alex Andorra]: something like that. And if you have the predictions for the biggest number [Alex Andorra]: of players on a webpage and basically betting based on the model, saying [Alex Andorra]: that this model, this player is going to overperform with respect to the

[Alex Andorra]: market or underperform with respect to the market. That's an interesting [Alex Andorra]: thing. And also, as you were saying, for the individual awards, where the [Alex Andorra]: name is extremely, like, counts a lot, where you can see someone like Messi, [Alex Andorra]: who is, yeah, sure, an incredible player. But the number of times he's got the [Alex Andorra]: golden... How is it called in English? Ballon d'Or? Golden ball, I don't

[Alex Andorra]: know. You could argue that some of these seasons where he did get the award, [Alex Andorra]: maybe there were other players who were actually outperforming him, but they [Alex Andorra]: don't have the name recognition, so they are not scrutinized as much. They don't [Alex Andorra]: have the confirmation bias going in their favor, where it's like everybody's [Alex Andorra]: looking at Messi because they already know he's extremely good, so they just

[Alex Andorra]: look at confirming the fact that he's incredible, which he is, but maybe [Alex Andorra]: not all the time, so as to get so many awards. So yeah, like that. To me, [Alex Andorra]: that would be a really good way of demonstrating the utility of these methods. [Alex Andorra]: Basically, [Max]: Thank [Alex Andorra]: making [Max]: you. [Alex Andorra]: it really concrete for the decision maker. [Max]: Thank you.

[Alex Andorra]: So before we close up the show, I'd like to get back a bit on your personal [Alex Andorra]: experience with Bayes. And I'm curious, what was your main pain point on this [Alex Andorra]: project, the Soccer Factor Model, and just in general, when you're using the [Alex Andorra]: Bayesian workflow, what is your main pain point right now? [Max]: Yeah, so in that project, I really have to admit that maybe I was lucky. But [Alex Andorra]: Yeah.

[Max]: there wasn't really a huge pain point. I mean, it's not [Alex Andorra]: Uh-huh. [Max]: something publishable for a paper or so. It's just basically sketching the idea [Max]: behind the model and basically showing the outline of the model, what it can give [Max]: you. [Max]: pretty well. I didn't really, I don't remember any really big problems. So then when

[Max]: I looked at the model evaluation, everything looked fine. I mean, for example, one way we can evaluate [Max]: how well the model works, in this logistic regression, is to look at [Max]: the area under the curve, for example; it's a popular metric. And it was in a reasonable [Max]: ballpark. And that was fine for me, so that the results were really [Max]: what you would expect, or at least kind of reliable. So that was not much

[Max]: of a pain point. And that was also nice for me to see that, yeah, it's a simple model [Max]: and it works also pretty simply. And yeah, that was a project that I was pleased [Max]: to see that there were not many obstacles that I had to overcome.
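
[Editor's sketch] The area under the ROC curve that Max mentions can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The snippet below computes it directly from that definition; the labels and scores are made-up illustration values, not results from the Soccer Factor Model.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes; scores: predicted probabilities.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted probabilities from a logistic regression.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect ranking, which is the scale on which a "reasonable ballpark" would be judged. The pairwise loop is quadratic in the number of cases, so a rank-based implementation from a standard library would be preferable for large datasets.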

[Alex Andorra]: Nice. Yeah, that's good to hear. And so in general, in the Bayesian workflow, [Alex Andorra]: do you identify something in your own learning that is costing you to learn [Alex Andorra]: right now, that has cost you to learn, and you would like an easier way [Alex Andorra]: to have learned that? [Max]: I mean, I have to say that, for example, with all the different samplers that are out [Max]: there, that's not my major field. I would like to learn much, much more about the inner

[Max]: workings of all these samplers. I mean, I coded maybe one of the simpler ones myself, [Max]: maybe once or so, but then I really resort to open source packages for that. But to really [Max]: understand what's going on, I think, yeah, looking deeper into that, that's definitely [Max]: something I would like to do and would need to do. [Alex Andorra]: Mm-hmm. [Max]: But yeah, I think that's basically the math of it. I think it's the most fascinating

[Max]: stuff and how it really works and how it's then implemented in code. I think that's [Max]: the most fascinating stuff. But yeah, the beauty of PyMC then is if you really are [Max]: interested in the outcome and want a fast outcome, yeah, it's pretty intuitive. [Max]: Yeah. [Alex Andorra]: Nice. OK. Well, it's good to hear. Yeah, and I'm asking that from a developer [Alex Andorra]: perspective and also teacher perspective. That's always interesting for

[Alex Andorra]: me to get a peek in the learning experience of the people. Cool. So before we [Alex Andorra]: close up the show, is there a topic I didn't ask you about and that you'd [Alex Andorra]: like to mention? [Max]: Well, actually, my career hasn't progressed so much so far. So I think we covered everything [Max]: there. So, oh yeah, that's pretty interesting. And yeah, you covered actually everything. [Alex Andorra]: Awesome. Yeah, we did record for a long time, so that's a price.

[Max]: Thank you. [Alex Andorra]: Yeah, and I'm happy. I got to ask you the main thing I wanted to ask you, [Alex Andorra]: so that's super cool. In a reasonable amount of time, I'm sure the listeners will [Alex Andorra]: appreciate it, because the last two episodes were the two longest of the whole [Alex Andorra]: podcast. So it's good to get back to reasonable amounts of time for people, [Alex Andorra]: I guess. And yeah, so before letting you go, I'm gonna ask you the last

[Alex Andorra]: two questions I ask every guest at the end of the show. So Max, if you had [Alex Andorra]: unlimited time and resources, which problem would you try to solve? [Max]: Yeah, so I think one of the most popular answers is climate change. And definitely, [Alex Andorra]: Mm-hmm. [Max]: it's probably the most present problem, especially here in Milan currently. [Max]: You really feel it.

[Alex Andorra]: Ha. [Max]: But throughout the time I've been working on a bit of climate [Max]: econometrics, let's say, forecasting RTC, I saw what people are really doing [Max]: in climate and, yeah, there are fascinating people out there, very, very intelligent people.

[Max]: So I think throwing money at me would be wasted in that regard. I mean, what I'd [Max]: be rather interested in is like, yeah, maybe implementing that into sports [Max]: analytics, right, to allow teams to have access to data, and [Alex Andorra]: Mm-hmm.

[Max]: to kind of create that level playing field across players and then really, yeah, [Max]: it's an investment and people spend a lot of, especially in investing and in banking [Max]: and finance, spend a lot of time on crunching numbers and why not do that in sports as well [Max]: if you have the data available. So yeah, I'd be very, very interested in working on [Max]: that. That's for sure.

[Alex Andorra]: Yeah, I love it. Me too, for sure. That's a good one. And if you could have [Alex Andorra]: dinner with any great scientific mind, dead, alive or fictional, who would it [Alex Andorra]: be? [Max]: Yeah, well, that's a pretty tough question, I have to say. So [Alex Andorra]: Yeah. [Max]: no, really, it's, yeah, there's so many amazing people out there. And when you read [Max]: papers, that's really incredible, what people are doing. And so yeah, there's so many

[Max]: people I'd like to talk to. Well, one, for sure, is Frank Diebold, the guy [Max]: who basically invited me to the University of Pennsylvania, because that was a defining [Max]: point in my PhD, absolutely. But then if I could pick one as professors should expand [Max]: on your network, basically, it would be Ben Bernanke. He [Alex Andorra]: Mm-hmm. [Max]: was former president of the Federal Reserve. He received [Alex Andorra]: Mm-hmm.

[Max]: the Nobel Prize in economics. Well, people say there's no Nobel Prize in economics, but [Max]: yeah, the Riksbank prize last year for his work on banks and financial crises. [Max]: Yeah, that would be super interesting to talk to him. He served his country basically. [Max]: Then he was assistant professor. So how he managed all that. And yeah, that would be

[Max]: super interesting to talk to him. Phenomenal scholar. And I like reading his papers. So [Max]: yeah, I think that would be super cool. [Alex Andorra]: Nice, yeah. Love it. Very nerdy answer. [Max]: Okay. [Alex Andorra]: Awesome. Well, thanks a lot, Max. That [Alex Andorra]: was really interesting. You allowed me to rant about some of my pet peeves [Alex Andorra]: about data

[Alex Andorra]: analytics and soccer. And I hope people learned a bit more. And of course, [Alex Andorra]: if they are curious, as usual, I will put resources and a link to [Alex Andorra]: your website in the show notes for those who want to dig deeper. Thank you [Alex Andorra]: again Max for taking the time and being on this show. [Max]: Thanks Alex. It was a pleasure.

Transcript source: Provided by creator in RSS feed: download file