#91, Exploring European Football Analytics, with Max Göbel

00:01

[Alex Andorra]: Maximilian Göbel, welcome to Learning Bayesian Statistics. [Max]: Thanks Alex. [Alex Andorra]: Oh, yeah. Thank you for, for taking the time. I'm really excited about this [Alex Andorra]: episode. Um, I'm really having a variety of, uh, of, uh, podcast episodes [Alex Andorra]: these days. Um, going from, so episode 89 is going to get out in a [Alex Andorra]: few days. Uh, and, uh, you'll see it's about sports also, but it's about

00:32

[Alex Andorra]: the science of, um, sports and nutrition. of exercise and nutrition. And so [Alex Andorra]: today we're going to talk a lot about sports also, but more about football [Alex Andorra]: or soccer as it's known in the US. So that's going to be a fun one. And I'm [Alex Andorra]: really happy to have you on the show because you are German. So if I remember [Alex Andorra]: correctly, Germany is in Europe. And so you would be the first soccer analytics

01:01

[Alex Andorra]: episode Europe centered, which is cool. Yeah, it's one of the things I'm saying [Alex Andorra]: we should do more here in Europe. But before that, as usual, we'll start with [Alex Andorra]: your origin story. Max, how did you come to the world of econometrics and [Alex Andorra]: machine learning? Because it's actually what you're doing most of the time, [Alex Andorra]: if I understood correctly.

01:29

[Max]: Yeah, yeah, you're right, Alex. Well, actually, it's been well, if I say it's quite [Max]: a journey, it sounds dramatic. But that's, that's not the case. But it took me quite a [Max]: while, let's say. Yeah, that's maybe the better framing. [Alex Andorra]: Yeah. [Max]: I started out in my PhD, basically, the first year is, you know, there's just some [Max]: coursework. But I went into the PhD without really having something that I really wanted

01:56

[Max]: to work on in particular. So I took the first year to see which courses I like, which [Max]: not. And at my university, it was not really allowed to choose from. I mean, we had [Max]: macroeconomics, microeconomics, and econometrics, the usual stuff. But yeah, really nothing resonated [Max]: with me so much, I have to say. And then I thought I would do some macro, macroeconomics. [Max]: I think many, many people, or most of the people. PhD students really want to do

02:28

[Max]: something in that field. So it was also me. But yeah, I really never got familiar with [Max]: that stuff so much. I never really liked it. But in the second year, then there was [Max]: a course of computational economics. And I liked that quite a lot. And it was also, [Max]: let's say a tough schedule. I had to prepare a proposal within a week and I didn't [Max]: have any idea about computational economics. But that really got me into looking into that

02:57

[Max]: stuff very deeply or deeper, let's say. And [Alex Andorra]: Yeah. [Max]: so, yeah, basically what I was working on there was some clustering, some unsupervised [Max]: learning basically, but it wasn't really fancy machine learning back then. So what [Max]: I did [Alex Andorra]: Heh. [Max]: was like the project was related to clustering community structure in the S&P 500 basically, [Max]: that was the project. And... Yeah, but I really thought, oh, this network analysis,

03:25

[Max]: this community structure detection, that's really cool. I want to work on that. And yeah, [Max]: so I thought this would be basically the outline for the rest of my PhD. And how [Max]: did I get into econometrics and machine learning then? Because it wasn't really related [Max]: to or not really machine learning, what I was doing back then. So [Alex Andorra]: Yeah. [Max]: how did I get there then? It wasn't until the third year, basically until I got luckily

03:53

[Max]: invited to the University of Pennsylvania as a visiting student. And I got introduced, [Max]: I got invited by Francis Diebold and I'll be forever grateful to him for inviting [Max]: me there. And he had a research group on econometrics. And at that time, the topic [Max]: was about climate. And I, again, I thought, well, I'm, I don't care about the topic actually. [Max]: I just want to learn whatever, yeah, comes to me. And so, yeah, I took that opportunity.

04:26

[Max]: He introduced me to his research group. And they were working on climate on climate [Max]: forecasting, climate econometrics. And that's how I got basically really introduced [Max]: into econometrics. Because before I went to the University of Pennsylvania, I thought [Max]: like, yeah, I basically know what's going on. And I have this and this project. And that's [Max]: cool. But when I really arrived there, I really got to know what PhD in economics

04:50

[Max]: is really about. And yeah, that was pretty insightful, I would say. [Alex Andorra]: Yeah. [Max]: And that's how I got introduced, basically, through this research group, through projects [Max]: that we were working on. And then there was one guy, he was Frank's RA. And yeah, he [Max]: was working on machine learning, in particular. And [Alex Andorra]: Mm-hmm. [Max]: basically, a couple of weeks in, he came to me and asked me, well, Max, you want to

05:17

[Max]: get me that and that data? And we can work on a project. That started off a long, well, [Max]: quite, well, a couple of years now of co-authorship with him, with Philippe Goulet Coulombe, [Max]: who is now a professor at UQAM, the University of Quebec at Montreal. And he's [Max]: working a lot on machine learning. And he basically introduced me to that sphere. [Max]: And so in the end, it was the third year of my PhD that I got introduced into econometrics

05:45

[Max]: and machine learning. And yeah, quite late, as I would say. Yeah. Better late than [Alex Andorra]: Yeah. [Max]: never maybe. [Alex Andorra]: I mean, better late than never. Right? So it's cool. And you seem to enjoy [Alex Andorra]: that. So that's super fun. And so today, what are we doing? Basically, how [Alex Andorra]: would you define the work you're doing nowadays and the topics you are particularly [Alex Andorra]: interested in?

06:15

[Max]: Yeah, well, that's a good question. And every time I got asked that question, [Max]: I always had a difficult time actually saying [Alex Andorra]: Hehehe [Max]: because I was doing something here, something there. So [Alex Andorra]: Yeah. [Max]: in between, I also thought I would like to get back to macroeconomics actually, but [Max]: after spending a couple of months on something there and it didn't really work out,

06:33

[Max]: I completely ditched it at least for the meantime. So what I'm working on now is basically [Max]: machine learning and macroeconomic forecasting, let's say. I have a project on recession forecasting [Max]: in the United States, which is probably a hot topic currently. Everyone is awaiting [Max]: it, but it doesn't really seem to occur. So you have to wait a couple of months more. [Max]: And then the other stuff is basically related to climate, a lot of climate forecasting,

07:10

[Max]: especially about Arctic sea ice, how Arctic sea ice is projected to evolve in the [Max]: future, not only in the near future, but also in the, let's say, longer run. So [Max]: when Arctic sea ice might potentially disappear, there are a couple of [Alex Andorra]: Mm-hmm. [Max]: projects on that are still related to that climate econometrics group. And then the

07:30

[Max]: other stuff is basically, yeah, machine learning. And I got really interested in finance, [Max]: asset pricing, what you can do, [Max]: predicting stock returns, using machine learning tools there. That's super fascinating. [Max]: And yeah, just I mean, I have to say that I'm not a specialist in machine learning [Max]: or so. I'm just super interested and fascinated by the tools and the problems that

07:57

[Max]: come with them. So yeah, there's a lot of, well, they are powerful, but. Applying [Max]: them to finance and economics also comes with some drawbacks. So yeah, you have to work [Max]: around that. And it makes it super interesting. [Alex Andorra]: Yeah, yeah, yeah. Yeah, for sure. And I mean, that's [Max]: Okay. [Alex Andorra]: probably by being really interested in a topic that you end up being a specialist

08:26

[Alex Andorra]: of it. So it's like you don't really start being a specialist and then being [Alex Andorra]: interested in the subject. It's like the causality goes the other way around. [Alex Andorra]: So that's [Max]: Thank [Alex Andorra]: good. [Max]: you. [Alex Andorra]: Like trying a lot of things is how you end up finding what you're really

08:40

[Alex Andorra]: passionate about. Yeah, awesome. And I'm curious actually, in the research realm [Alex Andorra]: of economics, which tools do you use, machine learning tools, to work in [Alex Andorra]: these models? I'm guessing a lot of open source package, I'm hoping. Because [Alex Andorra]: I remember I was introduced a bit to, I mean, I knew a bit the econometrics [Alex Andorra]: economics field in Europe a few years ago and they were using Stata all

09:14

[Alex Andorra]: over the place. So I'm curious if that changed and how that changed. [Max]: Oh yeah, that's a funny question. Because Stata, yeah, I mean some people love Stata. [Max]: I'm actually at the complete other end of the distribution. So [Alex Andorra]: haha [Max]: I always try to avoid it as much as I can. I don't know, I never really liked it. [Max]: So what I'm using is basically R and Python. [Alex Andorra]: Okay. [Max]: I also worked a bit on MATLAB. I like MATLAB actually a lot.

09:52

[Alex Andorra]: Mm-hmm. [Max]: But yeah, now I'm mostly working in R and Python. And it really depends. Sometimes [Max]: I prefer R. Sometimes I prefer Python. For machine learning, I'm mostly using Python. [Max]: Well, let's say for machine learning, I'm actually using R, let's say, when it comes [Max]: to random forest or [Alex Andorra]: Mm-hmm.

10:12

[Max]: gradient boosted trees or something like that or just plain LASSO or Ridge. When it comes [Max]: to deep learning, then I'm using Python. So TensorFlow, now I'm trying to switch to [Max]: PyTorch, actually. [Alex Andorra]: Mm-hmm. [Max]: And yeah, so that's basically the packages that I'm using. Yeah. [Alex Andorra]: Yeah, interesting. And how do you choose the tool, the particular tool you're [Alex Andorra]: using for a particular project?

10:41

[Max]: Yeah, that's a good question. I think that's mostly an art rather than a science, [Max]: I would say. And it's up to your preference. But not all tools work in every context, right? [Max]: So in economics, it's really the problem, especially in, I would say, macroeconomic forecasting, [Max]: where you have time series of, let's say, you get up to 700 observations on a monthly [Max]: basis for the United States maybe. And then you have a feature set of, let's say,

11:11

[Max]: 100 features when you include lags and all that. You can pump it up maybe to 1,000 [Max]: or something. But for machine learning or for deep learning, this is still rather [Max]: a small data set, I would say. So that's ridiculous, actually. [Alex Andorra]: Mm-hmm. [Max]: But still, that's then the challenge, right? To tune them, to train them so that [Max]: they don't overfit. And that's really the interesting part for me, I think. And yeah.
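Editor's note: the small-sample problem Max describes can be illustrated with a minimal sketch. Everything below is synthetic and assumed for illustration: the sizes (700 monthly observations, 100 base features, 10 lags) only echo the rough numbers mentioned in the conversation, and the helper `add_lags` is a hypothetical name, not something from Max's work. The point is that once lags push the number of columns past the number of rows, an unregularized linear fit can reproduce even a pure-noise target almost perfectly in sample, which is exactly the overfitting risk that tuning and regularization are meant to control.

```python
import numpy as np

def add_lags(X, n_lags):
    """Return a matrix holding lags 1..n_lags of every column of X.

    Row i of the result lines up with time t = n_lags + i in X.
    """
    T = X.shape[0]
    blocks = [X[n_lags - lag : T - lag] for lag in range(1, n_lags + 1)]
    return np.hstack(blocks)

rng = np.random.default_rng(0)
T, k, n_lags = 700, 100, 10          # illustrative sizes, not real data
X = rng.standard_normal((T, k))      # synthetic stand-in for macro features
y = rng.standard_normal(T)[n_lags:]  # target is pure noise by construction

Z = add_lags(X, n_lags)              # more columns than rows after lagging

# Unregularized least squares (minimum-norm solution when columns > rows)
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ beta
centered = y - y.mean()
r2_in_sample = 1.0 - (resid @ resid) / (centered @ centered)

print("design shape:", Z.shape)
print("in-sample R^2 on a noise target:", round(float(r2_in_sample), 4))
```

A near-perfect in-sample fit on noise says nothing about forecasting ability, so out-of-sample evaluation and some form of shrinkage are the usual remedies in this setting.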

11:39

[Max]: In other contexts, other tools might work much more conveniently, let's say, or [Max]: are much easier to apply. So some lasso or so when you have a lot of features and you [Max]: just don't know which features are important, then you, yeah. I like lasso in that regard [Max]: because it selects basically the features for you. Or you might say, well, you're in an [Max]: asset pricing context, we have returns, a lot of noise in there, signal-to-noise ratio,

12:13

[Max]: very, very low. You really don't know which features are important. So ridge is maybe [Max]: the better option, because Lasso would basically set almost everything to zero. Yeah, [Max]: so it really depends. You really have to make it dependent on the context that you're [Max]: working in. And [Alex Andorra]: Hehehe [Max]: yeah, but that's also interesting to see which models prefer or work well on which

12:37

[Max]: data sets and which contexts. And yeah, I'm still learning in that regard. And that's [Max]: super interesting. [Alex Andorra]: Yeah, yeah, yeah. No, for sure. And I find that super interesting also to see [Alex Andorra]: this ability of open source tools to basically be adopted more and more

12:57

[Alex Andorra]: in your research, which of course, I'm extremely biased, but I welcome. But also [Alex Andorra]: mainly because I do think that open data and open source are natural consequence, [Alex Andorra]: but also cause, I would say, of... more open science, which I definitely [Alex Andorra]: welcome and I think should be way more of the case, you know, like more and [Alex Andorra]: more you see papers with accompanying GitHub repositories and accompanying GitHub

13:33

[Alex Andorra]: open source packages even in Python or in R, which is definitely something [Alex Andorra]: new. And that's super cool that the research realm is catching up on that. [Alex Andorra]: Um, because less and less you see papers where I remember a few years ago, [Alex Andorra]: you know, like the first open say the open science and, or open data papers

13:51

[Alex Andorra]: was like, Oh yeah, the data is available by the way. Um, at the end of [Alex Andorra]: the paper, you know, and then you had to basically beg the, the corresponding [Alex Andorra]: author about like three times a week for four months to get some of the data [Alex Andorra]: and that was not really open basically, um, so yeah, that, that's a really [Alex Andorra]: cool development that I really love. I have to say.

14:15

[Max]: No, absolutely. And this is also, I think that's a very good point. For example, me and [Max]: my co-authors, or my co-authors are pushing for that, really, to make the codes then also [Max]: available on the website, for example, so that people can cross-check. And that's [Max]: very good. And yeah, I like that also myself. When I read papers and I want to replicate [Max]: something and the authors are making the code available, basically, you can check

14:42

[Max]: if your own code is correct. That's super helpful. You learn a lot by that. And yeah, [Max]: really, really. Especially when, for example, using boosted trees or so. I mean, it's [Max]: XGBoost, and it's super convenient to use. And for sure, there's some tuning that [Max]: you have to do yourself. But still, the package is there, basically. And it's super [Max]: convenient to use. You don't have to code the whole forest, basically, yourself. [Alex Andorra]: Yeah.

15:09

[Max]: So yeah, for sure. That's [Alex Andorra]: Yeah, yeah, [Max]: amazing. [Alex Andorra]: yeah. No, clearly. Yeah, that's super nice and well done and like picking up [Alex Andorra]: all those different tools and different [Max]: Mm-hmm.

15:18

[Alex Andorra]: languages. That's super cool. And I don't know how it changed, but I do remember [Alex Andorra]: that a few years ago, doing open source development wasn't really incentivized [Alex Andorra]: for doctoral candidates or post-doctoral candidates, so maybe that changed and that's [Alex Andorra]: for the better. But if that didn't, the fact that you're doing it is like [Alex Andorra]: even more commendable, I would say, because that's a bit adjacent to your

15:48

[Alex Andorra]: project. So yeah, well done on doing that and taking the time to do it. [Alex Andorra]: That's super cool for sure. Um, so now I'd like to talk a bit about, [Alex Andorra]: yeah. So you said you're doing econometrics, but, um, can you define econometrics [Alex Andorra]: for us and, and tell us what it brings to economics basically. [Max]: Yeah, sure. So a lot of weight now for me on [Alex Andorra]: haha [Max]: giving the textbook definition of econometrics.

16:21

[Alex Andorra]: Yeah, exactly. [Max]: No, I mean, it's basically, or now I'm butchering the whole definition probably. But [Max]: it's applying statistical tools to an economic context and trying [Alex Andorra]: Mm-hmm. [Max]: to use statistical tools to basically verify some economic theory or to understand [Max]: some relationships between economic variables. So I think it's a, yeah, I think that that's

16:51

[Max]: basically it. It's kind of a fancier term for what it actually is, applying statistical [Max]: tools for understanding economic relationships. That's basically it. I mean, it's essential. [Max]: I mean, for empirical work, for sure there are economists who only work on theory, [Max]: but yeah, for policy analysis or for... you need to analyze the data in the end. And [Max]: basically, that's what I'm doing. I don't really do theory stuff, but for me, it's just

17:20

[Max]: all empirical. And yeah, so definitely, it's very useful in the end, especially for [Max]: policymaking at central banks and everywhere, also for the industry, be it [Alex Andorra]: Mm-hmm. [Max]: banking industry or be it just normal in the real economy for [Alex Andorra]: Yeah. [Max]: analyzing demand and all that.

17:44

[Alex Andorra]: And do you... So I'm curious how you got introduced to Bayesian methods, [Alex Andorra]: actually, and why they stuck with you, because from what I remember, from [Alex Andorra]: the world of econometrics, Bayes was not used a lot in this field. So I'm actually [Alex Andorra]: curious why you are using it. [Max]: Yeah. Well, I have to admit, like, so I already said that it was like third year

18:16

[Max]: that I got introduced to econometrics. And that was this project when [Max]: Philippe, Frank's RA basically [Alex Andorra]: Mm-hmm. [Max]: came to me and asked me to gather some data on climate variables because we want to [Max]: run a vector autoregression of the Arctic. Basically, you basically get some, what we [Max]: basically did is we gathered data, time series on certain climate variables, [Max]: which we [Alex Andorra]: Mm-hmm.

18:42

[Max]: thought would proxy for the Arctic ecosystem basically. And then we wanted to use a vector [Max]: autoregression to analyze certain amplification mechanisms, if there is a shock to CO2, for [Max]: example, and also to be able to produce long-run forecasting projections. So when Arctic [Max]: sea ice might potentially [Max]: disappear in the future. [Alex Andorra]: Yeah. [Max]: And so the data is highly non-stationary. And [Alex Andorra]: Uh huh. Uh huh.

19:12

[Max]: in VARs, when you work with VARs, most economists really work with Bayesian methods [Max]: there. And as I said, data was highly non-stationary. So Bayesian statistics or the Bayesian [Max]: framework gives you some leeway there, grants you some freedom there. So that was big. [Max]: Yeah, that was why Philippe told me, okay, look at Bayesian VARs, look at the Bayesian [Max]: way. And that's how I actually got introduced to that. And there was at the time, I really

19:44

[Max]: didn't have any exposure. So there was a package in MATLAB for doing Bayesian inference, [Max]: basically, with VARs. And that was super helpful. That helped me a lot. That was super, [Max]: or a great education, a source of education, really, that was great. And The more I learned [Max]: about it, the more it resonated with me, this concept of quantifying uncertainty.
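Editor's note: to make the Bayesian VAR idea a little more concrete, here is a minimal sketch under strong simplifying assumptions. It is not the model from the Arctic project and not the MATLAB package Max mentions. It simulates a small, persistent two-variable VAR(1) and computes the posterior mean of the coefficient matrix under a Gaussian prior centred on a random walk (identity matrix), a simplified cousin of the Minnesota-style priors often used when data are highly persistent. The function name `var1_posterior_mean` and the `tightness` parameter are hypothetical choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 120, 2

# Simulate a persistent VAR(1): y_t = y_{t-1} @ B_true + noise (synthetic data)
B_true = np.array([[0.95, 0.05],
                   [0.00, 0.90]])
Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = Y[t - 1] @ B_true + 0.1 * rng.standard_normal(k)

X_lag, Y_now = Y[:-1], Y[1:]

def var1_posterior_mean(X_lag, Y_now, prior_mean, tightness):
    """Posterior mean of B under a Gaussian prior centred on prior_mean.

    tightness is the prior precision relative to the noise variance:
    0 reproduces plain OLS, large values pull the estimate to the prior.
    """
    n_vars = X_lag.shape[1]
    lhs = X_lag.T @ X_lag + tightness * np.eye(n_vars)
    rhs = X_lag.T @ Y_now + tightness * prior_mean
    return np.linalg.solve(lhs, rhs)

random_walk = np.eye(k)  # prior belief: each series follows a random walk
B_ols = var1_posterior_mean(X_lag, Y_now, random_walk, 0.0)
B_shrunk = var1_posterior_mean(X_lag, Y_now, random_walk, 5.0)
B_prior_only = var1_posterior_mean(X_lag, Y_now, random_walk, 1e9)

print("OLS estimate:\n", B_ols.round(3))
print("Shrunk toward random walk:\n", B_shrunk.round(3))
```

The prior acts as the "leeway" described above: with short, persistent series the data alone pin the dynamics down poorly, and the prior keeps the estimate in a plausible region. A full Bayesian VAR would also deliver a posterior distribution, not just this mean.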

20:09

[Max]: I think this is because especially in economics, this is quintessential to really [Max]: get an idea of [Alex Andorra]: Yeah. [Max]: what the uncertainty is. A point estimate is always nice, but you want to have the uncertainty [Max]: around it. And that's also what Frank Diebold always told us. Yeah, you want to have [Max]: a measure of uncertainty. And definitely, that's true. Yeah, you get it from the, in the

20:32

[Max]: Bayesian framework. It's just so intuitive to think about it. And yeah, I like that a [Max]: lot. And unfortunately, I don't really work so much or haven't worked in so many projects [Max]: with Bayesian methods lately, or not as much as I would like to. But yeah, it's

20:54

[Max]: ever since resonated with me. And yeah. I. Still, I wanted to learn more, and that's [Max]: how basically I got into looking at PyMC, because I wanted to learn with Python, and [Max]: thought, well, maybe an application with Bayesian methods, the Bayesian framework would [Max]: be cool to learn, and that's how I got into PyMC 3, or PyMC basically, or looked at [Max]: it and looked at it. So, yeah.

21:24

[Alex Andorra]: Yeah, yeah, yeah. Nice. That's interesting. So yeah, basically, it's like [Alex Andorra]: the uncertainty quantification that was really important to you. [Max]: Exactly. So that was really the key, the key point there. [Alex Andorra]: Yeah. I mean, that does make sense, right? Because, yeah, that's really [Alex Andorra]: one of the parts where Bayes does shine a lot. And also, especially for [Alex Andorra]: the Arctic sea ice project that you are talking about. It's not like it's a

21:57

[Alex Andorra]: reproducible experiment. It's really hard in these cases to think from a [Alex Andorra]: frequentist framework of repeatable experiments. You cannot have multiple earths [Alex Andorra]: on which you can do RCTs where you melt the ice caps or not, and you melt [Alex Andorra]: it, like, naturally or thanks to human intervention. It's just

22:22

[Alex Andorra]: like, it doesn't work in that case. So yeah, Bayes, I'm not surprised that [Alex Andorra]: it would be a project where Bayes fits way more naturally. [Max]: Yeah, no, that's for sure. I mean, for example, these climate models from these climate [Max]: institutions, these are huge models. And big models, to train them or to run these [Max]: models, it takes a lot of time. And they are very sophisticated. So really, really sophisticated.

22:51

[Max]: But they are basically deterministic models. And they give you a point estimate [Max]: in the end. But our... interest was basically really to see, well, we get a point estimate, [Max]: but we also want to see, especially when you project the path of Arctic sea ice, the [Max]: uncertainty around it. Well, how likely is it that maybe or that we see Arctic sea [Max]: ice disappearing, not at our point estimate in the 2060s or 70s, but beforehand? Like how

23:19

[Max]: large is the uncertainty? Maybe our model is really not good and the uncertainty is so [Max]: much all over the place that it's more or less useless. But yeah, and that project [Max]: was actually interesting to see that the uncertainty or the credible region was [Max]: basically spanning like 20 years, 25 years around. So that was very interesting. [Max]: And it gave us a quick quantification of uncertainty to it. Yeah, that was really, [Max]: really interesting.
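Editor's note: the idea of putting a credible region around the date when sea ice disappears can be sketched as follows. All numbers here are made up for illustration and are not results from the project: the starting extent, the distribution of trend slopes, the noise level, and the threshold of 1.0 (standing in for an "ice-free" cutoff) are assumptions. Each simulated path plays the role of one posterior draw, and the spread of first-crossing years across draws gives an interval instead of a single point estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(2020, 2121)
n_draws = 5000

# Hypothetical "posterior draws" of a linear trend (units and values assumed)
slopes = rng.normal(-0.08, 0.015, size=n_draws)
noise = rng.normal(0.0, 0.3, size=(n_draws, years.size))
paths = 4.5 + slopes[:, None] * (years - years[0])[None, :] + noise

threshold = 1.0                        # assumed cutoff for "ice-free"
below = paths < threshold
crossed = below.any(axis=1)            # some draws may never cross in the window
first_idx = below.argmax(axis=1)       # index of first True in each row
first_year = years[first_idx[crossed]]

lo, med, hi = np.percentile(first_year, [5, 50, 95])
print(f"median first ice-free year: {med:.0f}")
print(f"90% interval: {lo:.0f} to {hi:.0f}")
print(f"share of draws crossing before {years[-1]}: {crossed.mean():.3f}")
```

Reporting the interval alongside the median is what lets one ask the question raised above: how likely is it that the crossing happens well before the central estimate?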

23:52

[Alex Andorra]: Yeah, yeah, yeah. Nice. Uh, they, I love that. Uh, and I mean, I would [Alex Andorra]: have, that's really interesting for me to, to talk with someone who recently [Alex Andorra]: got into the Bayesian framework and to understand how you get into it and why, [Alex Andorra]: and, and how, uh, so I would have a lot of other questions on that, but [Alex Andorra]: I want to talk about football or soccer, so let's, let's switch to that and

24:16

[Alex Andorra]: then if we have time at the end of the episode, I'll come back with my, [Alex Andorra]: um, nerdy, uh, educational questions. So yeah, basically you have an area or a hobby [Alex Andorra]: of yours where you do apply and need actually Bayesian stats and that's [Alex Andorra]: soccer analytics. First, I read a bit your website and I saw you were passionate [Alex Andorra]: about football since you were a child and you mentioned a bunch of European championships.

24:51

[Alex Andorra]: Not the French one though. I was absolutely outraged. What happened? What [Alex Andorra]: happened? Like, don't you get the French games in Germany? [Max]: Oh yeah, well that's another issue. So when I was younger really, I mean it was only [Max]: the Bundesliga and sometimes when you were lucky, sometimes you got the highlights [Max]: of the French Premier League and the Serie A, but yeah you had to be really lucky, [Max]: it was not always available and I wasn't that...

25:25

[Max]: Yeah, I didn't know the websites where you could watch it basically. So [Alex Andorra]: Hehehehe [Max]: that was another issue. But yeah, the French, well, the French league, I was never [Max]: really a fan of. I'm sorry, Alex. But yeah, that's just even though one of my favorite [Max]: players was Yoann Gourcuff. So Olympique [Alex Andorra]: Oh, [Max]: Lyon. [Alex Andorra]: really? [Max]: Yeah, [Alex Andorra]: Oh, [Max]: yeah, yeah. So [Alex Andorra]: he went [Max]: yeah.

25:46

[Alex Andorra]: to Milan. Yeah. Um, [Alex Andorra]: yeah, no offense taken. I think the French league is pretty boring. Um, and, [Alex Andorra]: uh, yeah, as long [Max]: as [Alex Andorra]: as, [Max]: the Bundesliga. [Alex Andorra]: I mean, yeah, um, as long as PSG is dominating like that, uh, I mean, that's [Alex Andorra]: good for me because, um, I'm a PSG fan since I'm like five years old, uh, [Alex Andorra]: but yeah, like, uh, it's not a very interesting league. And the level is

26:14

[Alex Andorra]: kind of going down by the years. So hopefully we'll get some investors in other [Alex Andorra]: clubs, which make for a good competition for Paris, but until now it's really [Alex Andorra]: bad. And it's actually bad for Paris because the competition inside the country [Alex Andorra]: is really bad. So then when they get on the European stage, they are not [Alex Andorra]: really used to the intensity and having so much adversity in a way. So,

26:43

[Alex Andorra]: yeah, it's too easy for them, let's say. So basically, but I didn't get you [Alex Andorra]: on the show to trash the French league. I want to talk about the soccer factor [Alex Andorra]: model that you recently worked on. And I found it super interesting because [Alex Andorra]: that's mainly, yeah, the main question I always have in soccer analytics.

27:10

[Alex Andorra]: The nerd in me is always very careful about the hot takes that you see the [Alex Andorra]: commentators have about players where it's like, yeah, but what's the, how [Alex Andorra]: do you separate a player's skill from the ability, skills and ability from his [Alex Andorra]: team's strength? And that to me is extremely important because mostly [Alex Andorra]: in Europe, right now, most of the clubs mainly invest in players on gut

27:43

[Alex Andorra]: feeling, basically. And the thing is when you do that and you're not able [Alex Andorra]: to separate inherent player abilities from team strength, then you get [Alex Andorra]: kind of an aura effect from the beginning of your career that can follow [Alex Andorra]: you, even though you're not that good of a player, but basically, like [Alex Andorra]: this aura can follow you even though you are not making that much of a difference.

28:10

[Alex Andorra]: But it's just like, it's hard to contradict it because you don't really have [Alex Andorra]: the method of the scientific way of disproving basically what's going on. [Alex Andorra]: That actually, well, it's not really your inherent abilities but mainly the [Alex Andorra]: people you're surrounded with. And I think it's like absolutely important [Alex Andorra]: to do that and should lead to... really a revolutionized way of transferring

28:42

[Alex Andorra]: players and signing them and so on. So, that was basically the background

28:47

[Alex Andorra]: for people who are not interested in football. Even though, even if the field [Alex Andorra]: doesn't interest you, I think the method and the goal of the model is actually [Alex Andorra]: extremely important because you can also think about that in finance, for [Alex Andorra]: instance, like I know a lot more work has been done in finance for that [Alex Andorra]: because I mean, the return or. Basically, the incentives of the money are

29:09

[Alex Andorra]: much more important because you know if you make money or not. But I know [Alex Andorra]: there is a lot of literature right on basically passive investment versus [Alex Andorra]: active investment. And how do you actually prove that an active investment [Alex Andorra]: is better than a passive one and that it's actually due to the skills of [Alex Andorra]: the person who invested on the market instead of just random market fluctuation?

29:32

[Alex Andorra]: So you can see that in a lot of contexts where you can see that. Basically, [Alex Andorra]: information is sparse, is hard to decipher, and so you need a model to make [Alex Andorra]: sense of it. So you can see that, I would say, in football, in a lot of [Alex Andorra]: sports, in finance, in medicine also, right, where it's like you can have a

29:52

[Alex Andorra]: lot of these celebrity effect basically. I think in a lot of contexts where [Alex Andorra]: celebrity effect is important, it can be broken down by that scientific way [Alex Andorra]: of estimating it. So these... politics, of course, movie. I think it's basically [Alex Andorra]: a theme that's running in a lot of fields where the celebrity effect is [Alex Andorra]: extremely big. So yeah, that was a very long introduction. [Max]: Yeah.

30:21

[Alex Andorra]: But to say that, I think it's very useful. So you can react to what I said [Alex Andorra]: and also afterwards, if you can tell us what a factor model is. Because [Alex Andorra]: your model is, you called it the soccer factor model, but then can [Alex Andorra]: you tell us before that what a factor model is? [Max]: Yeah. No, Alex, I mean, you laid it out perfectly. I couldn't have said it any more

30:45

[Max]: accurately, I would say, really on the point as far as I see that. So a factor model, [Max]: what it actually is, is a factor basically as some, I would define it as [Max]: some proxy for a certain. exposure to a certain, in finance to a certain risk basically. [Alex Andorra]: Mm-hmm. [Max]: Also a reduction for example in when you look at economics or macroeconomics it's [Max]: often related to the context you have a huge set of features and you reduce it to

31:24

[Max]: a couple of underlying factors or a single factor only. It's a kind of a feature reduction, [Max]: like a dimensionality reduction technique, like PCA, [Alex Andorra]: Mm-hmm. [Max]: principal component analysis or that. But in finance, it's really like a proxy for [Max]: a certain risk exposure that basically the cross-section of stock returns or all stock [Max]: returns are exposed to, a certain systematic risk exposure. [Alex Andorra]: Mm-hmm.
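Editor's note: the "many features reduced to one underlying factor" reading of a factor model can be sketched with principal component analysis on synthetic data. The panel below is simulated so that 50 series all load on a single common factor plus idiosyncratic noise; the sizes, the loadings range, and the variable names are assumptions for illustration only. The first principal component, computed with a plain singular value decomposition, should track the hidden factor closely (up to sign, which PCA cannot pin down).

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 200, 50                              # time periods, number of series

factor = rng.standard_normal(T)             # hidden common factor
loadings = rng.uniform(0.5, 1.5, size=N)    # each series' exposure to it
panel = np.outer(factor, loadings) + rng.standard_normal((T, N))

# Standardize each series, then extract principal components via SVD
Zs = (panel - panel.mean(axis=0)) / panel.std(axis=0)
U, s, Vt = np.linalg.svd(Zs, full_matrices=False)

pc1 = U[:, 0] * s[0]                        # first principal component scores
explained = s**2 / np.sum(s**2)             # variance share per component
corr = np.corrcoef(pc1, factor)[0, 1]

print("variance share of first component:", round(float(explained[0]), 3))
print("abs. correlation with true factor:", round(float(abs(corr)), 3))
```

In the finance usage described above, the rows of `Vt` play the role of estimated exposures of each series to the common factors.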

31:52

[Max]: All stock returns are basically exposed to it. This is basically a factor. And [Alex Andorra]: Yep. [Max]: in the literature, asset pricing has identified several of these, yeah, common [Max]: risk exposures basically across the whole universe of stocks basically. But as you already [Max]: said, you can use it also as quantifying the ability, for example, of a portfolio manager. [Max]: So if he has some skill in the game, basically if he has really superior selection

32:27

[Max]: potential, rather than just following along these common risk exposures, basically. [Alex Andorra]: Mm-hmm. [Max]: And that's also what this Soccer Factor Model basically is inspired by, to identify [Max]: certain features that all players are exposed to because of the differences in the

32:46

[Max]: teams. And then when you account for that, then you can basically extract the skill [Max]: and the inherent ability of each player after you account for these systematic differences [Max]: across teams basically that influences [Alex Andorra]: Hmph. [Max]: the ability or the observed performance of a player. [Alex Andorra]: Yeah, yeah, for sure. Yeah, for sure. Because like in the example of football, [Alex Andorra]: like you'd say it's easier to be the number nine. So the, how do you say

33:19

[Alex Andorra]: in English that position, like the front player? Number nine is like the [Alex Andorra]: guy who's supposed to score the goals. The native English speakers can then [Alex Andorra]: tell me what the name is; in French that would be attaquant. It's easier [Alex Andorra]: to be the number nine of PSG than the number nine of a very small team in [Alex Andorra]: France, because the rest of the team is stronger. The manager is

33:48

[Alex Andorra]: supposed to be stronger and so on. So, yeah, you're like, yeah, but maybe [Alex Andorra]: if you took the number nine of the small team and you put it in Paris, [Alex Andorra]: maybe he would perform as well as the current number nine does. So how do [Alex Andorra]: you make the difference? So that's what we're going to talk about. Before [Alex Andorra]: that, I'm curious, from a structural standpoint, these kind of factor models, how

34:17

[Alex Andorra]: do they work? How much time do you need to really start to decipher the [Alex Andorra]: difference between inherent skills and exogenous, basically, team strength? [Alex Andorra]: And that question is basically: how much data do you need from the past years [Alex Andorra]: to start having an idea? Like, how data hungry are those models?

34:45

[Max]: Yeah, so that's definitely a good question, a good point. So you have to create these, [Max]: yeah, so in the model that I'm proposing, basically, I need [Max]: a lead time into the season to really account for certain differences. So I need [Max]: a couple of games already that [Alex Andorra]: Mm-hmm.

35:08

[Max]: would need to be played to really account for differences in teams. Because before the [Max]: first game, basically, everything, or based on the data that I had, everyone [Alex Andorra]: Mm-hmm. [Max]: would have been the same. [Alex Andorra]: Mm-hmm. [Max]: But it depends really on the data. If you have data that allows you to account for [Max]: differences across teams, budget, [Alex Andorra]: Mm-hmm. [Max]: let's say, or so, you can just start right [Alex Andorra]: Yeah.

35:30

[Max]: away. And for overall data, I would say like more data is always better. If you have [Max]: only a few observations, I think the Bayesian framework is then tailor made for [Max]: that as well. Like it's yeah, it grants you some leeway there. But I would say really, [Max]: it's the more data you have, the better. But yeah. [Alex Andorra]: But you could already, OK, so you could already start having that idea with

36:04

[Alex Andorra]: just a few games. Then you get the idea of the strength of the team. And then [Alex Andorra]: you can start deciphering the strengths of the player. OK. [Max]: Exactly, exactly. [Alex Andorra]: Yeah. [Max]: But as far as I always used a certain number of, let's say, burn-in games [Alex Andorra]: Yeah, [Max]: to [Alex Andorra]: yeah, [Max]: really account [Alex Andorra]: yeah. [Max]: for that.
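The burn-in idea can be sketched as follows. This is only an illustration under assumptions: the feature (a gap in points per game between the two teams), the function name, and the match tuples are hypothetical and are not taken from Max's notebook. The one principle borrowed from the conversation is that a team-difference feature only becomes usable once both teams have played a few games.

```python
def strength_gap_features(matches, burn_in=3):
    """Return [((home, away), gap), ...] for matches past the burn-in period.

    gap = home points-per-game minus away points-per-game, computed only
    from games played BEFORE the match in question (no look-ahead).
    `matches` must be in chronological order: (home, away, home_goals, away_goals).
    """
    points, played = {}, {}
    rows = []
    for home, away, home_goals, away_goals in matches:
        n_home, n_away = played.get(home, 0), played.get(away, 0)
        if n_home >= burn_in and n_away >= burn_in:
            gap = points[home] / n_home - points[away] / n_away
            rows.append(((home, away), gap))
        # Update the standings only after the feature has been computed.
        if home_goals > away_goals:
            home_pts, away_pts = 3, 0
        elif home_goals < away_goals:
            home_pts, away_pts = 0, 3
        else:
            home_pts, away_pts = 1, 1
        for team, pts in ((home, home_pts), (away, away_pts)):
            points[team] = points.get(team, 0) + pts
            played[team] = played.get(team, 0) + 1
    return rows


# Hypothetical results between two teams, "A" and "B".
season = [("A", "B", 2, 0), ("B", "A", 1, 1), ("A", "B", 1, 0), ("B", "A", 0, 3)]
rows = strength_gap_features(season, burn_in=2)
print(rows)
```

The first two matches produce no feature because neither team has reached the burn-in count yet; they are used only to build up the standings.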

36:25

[Alex Andorra]: Yeah. And I mean, it's not that superficial, right? Because you can think like [Alex Andorra]: right now it's August, it's the beginning of the leagues for the European [Alex Andorra]: teams. August is a weird moment where the teams are still warming up basically. [Alex Andorra]: Um, and they are not really, they are clearly not at peak performance. Usually [Alex Andorra]: they try to peak around spring for the Northern hemisphere. So around March.

36:49

[Alex Andorra]: From February to May, basically, they are trying to reach their peak. So they [Alex Andorra]: are still warming up. They can still trade players until the end of August. [Alex Andorra]: So you could really say that the games they are playing in August, even though [Alex Andorra]: they are official games, are still warm-up games and don't really [Alex Andorra]: mean a lot from a long-term performance perspective. So that's an interesting moment

37:13

[Alex Andorra]: to start warming up the model, I'd say. And so, but something I mean, and [Alex Andorra]: maybe you have that for future iterations of the model where you could put

37:27

[Alex Andorra]: in the priors. Um, we're going to talk about the structure of the model, uh, [Alex Andorra]: right away, right after that, but, uh, something I'm thinking about is that [Alex Andorra]: you could put in the prior, the information that you have about the strengths [Alex Andorra]: of the team in, in the way that, yeah, you have the budget, which is a good

37:44

[Alex Andorra]: proxy for potential future performance. But also, like, just past performance. If you [Alex Andorra]: know that Paris has been the champion for nine years out of 10, well, you [Alex Andorra]: have a really good prior about the strength of the team. So you can [Max]: Okay. [Alex Andorra]: probably also add that into the model and in that way reduce the warm-up [Alex Andorra]: period of the model.
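A rough sketch of why an informative prior can shorten the warm-up period, assuming a simple normal prior on a team's strength (here, goal difference per game) and normally distributed game results with known noise. All numbers are invented for illustration and this is not the structure of Max's model.

```python
def posterior_normal(prior_mean, prior_sd, obs, obs_sd):
    """Conjugate update for a normal mean with known observation noise."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(obs) / obs_sd ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(obs) / obs_sd ** 2)
    return post_mean, post_var ** 0.5


# Three hypothetical August games for a historically dominant team.
early_games = [0, -1, 1]  # goal difference in each game

# Vague prior: we pretend to know nothing about the team.
vague_mean, vague_sd = posterior_normal(0.0, 10.0, early_games, obs_sd=1.5)

# Informative prior: past seasons suggest about +1.5 goals per game (assumed).
info_mean, info_sd = posterior_normal(1.5, 0.5, early_games, obs_sd=1.5)

print(vague_mean, vague_sd)
print(info_mean, info_sd)
```

With the vague prior, three unremarkable games pull the estimate to an average team; with the informative prior, the estimate stays closer to the historical level and is less uncertain, which is the sense in which fewer warm-up games are needed.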

38:10

[Max]: Yeah, no, absolutely. Or how Paris against Lyon, let's say, has performed in [Alex Andorra]: Yep. [Max]: the past. So the direct comparison between those teams, basically, when they faced [Max]: each other in past years. That would also feed in there. Yeah, so absolutely. There's [Max]: a lot of potential. And my model is, [Alex Andorra]: Mm-hmm. Yeah. [Max]: when you're basically suggesting this stuff, my model just appears very rudimentary.

38:37

[Max]: But it could definitely be extended in that regard. [Alex Andorra]: Yeah, I mean, that's the fun thing about modeling, right? It's like you have [Alex Andorra]: to start somewhere that's good enough, and then you have a lot of ideas to [Alex Andorra]: extend it. And it's a never-ending endeavor. Like, each model, if you want to [Alex Andorra]: work on it your whole life, if you're interested enough, you

39:00

[Alex Andorra]: definitely can do that. The models that I often revisit are the ones [Alex Andorra]: for predicting French presidential elections. When I compare the one from when I started doing that in 2017 [Alex Andorra]: to the one I had for 2022, it's just embarrassing. [Alex Andorra]: But in a way, it's good that the work you're doing right now is the best

39:30

[Alex Andorra]: one you've ever done. And in a few years, when you look at the work you're [Alex Andorra]: doing right now, it should be the worst you've ever done because that means [Alex Andorra]: you've progressed a lot in the meantime. So I think it's a good mindset. [Alex Andorra]: So how did you adapt that factor model for soccer? Like how, what does the model [Alex Andorra]: structure look like basically for listeners to have an idea? And for those

40:05

[Alex Andorra]: watching on YouTube, you can share your screen actually. So if you want [Alex Andorra]: to share anything at some point, feel free to do it. Otherwise, the audio format [Alex Andorra]: is here for you because it's a podcast. So it's an audio first content. [Max]: Perfect. Yeah. So yeah, maybe if I get it on the screen, I'll do that. But for now,

40:29

[Max]: maybe the structure, I think, is pretty simple. And as you laid it out already very, [Max]: very accurately, it's basically trying to come up with some features, do some feature [Max]: engineering that basically accounts for differences across teams. And well, when you [Max]: look at, let's say, a certain player, let's say, Cristiano Ronaldo, you [Max]: really want to account for the difference between

41:01

[Max]: his team and the team that he's facing at that exact instant. And you want to create [Max]: some features that can proxy for these differences across teams. And that's basically [Max]: the heart of the model. And this is basically inspired by these asset pricing factors that [Max]: try to account for differences across assets, across stocks, across firms, basically.

41:26

[Max]: And the modeling part itself is really nothing sophisticated. You can include a kind [Max]: of hierarchical structure; you don't need to, but it can help, definitely. [Max]: But it's really the feature engineering that is at the heart of it. And then PyMC comes [Max]: in very conveniently and just basically does the dirty work for you.

41:51

[Alex Andorra]: Mm-hmm. And so what's the, so then that's cool. If it's a simple structure, [Alex Andorra]: yeah, can you talk about what your likelihood was, and then [Alex Andorra]: what kind of distribution you put on the parameters and things like that? [Alex Andorra]: I think it would be a fun thing to talk about for the listeners. [Max]: Sure, sure. Then maybe I just get the workbook loaded. [Alex Andorra]: Oh yeah.

42:18

[Max]: So maybe I can share my screen and couple [Alex Andorra]: Yes, [Max]: of... [Alex Andorra]: you should be able to. [Max]: Let me see. [Max]: So in terms of a likelihood, basically, or what the model structure is, so I have to [Max]: proxy, I need some observed measurement of a player's performance. Not [Alex Andorra]: Yes. [Max]: a skill, I mean, that is something that is underlying, that is latent, that we want [Max]: to identify. [Alex Andorra]: Mm-hmm.

42:59

[Max]: But we need some observed measure of player performance. What I used is scoring [Max]: goals. Did a player score a goal in a certain game or not? So basically, 0 or 1, basically [Max]: binomially distributed, and so basically it's a logistic regression. You want [Alex Andorra]: Yeah. [Max]: to identify the probability of a player scoring. And so now I have it. I guess I have [Max]: it here. [Alex Andorra]: You may have [Max]: Um, [Alex Andorra]: to authorize Google Chrome to share.
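As a plain-Python illustration of the likelihood just described (a 0/1 outcome per game with a logistic link), here is a small sketch. It is not the PyMC code from the notebook: the function names, the single "team gap" feature, and all numbers are assumptions made for the example.

```python
import math


def scoring_probability(intercept, coefs, factors):
    """Inverse logit of: intercept + sum(coef_k * factor_k)."""
    eta = intercept + sum(c * x for c, x in zip(coefs, factors))
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_log_likelihood(intercept, coefs, games):
    """Sum of log p(scored | factors) over (factors, scored) pairs."""
    total = 0.0
    for factors, scored in games:
        p = scoring_probability(intercept, coefs, factors)
        total += math.log(p) if scored else math.log(1.0 - p)
    return total


# Hypothetical games: one team-gap feature and whether the player scored.
games = [([0.5], 1), ([-1.0], 0), ([0.2], 0)]

print(scoring_probability(-1.0, [0.8], [0.5]))
print(bernoulli_log_likelihood(-1.0, [0.8], games))
```

A sampler such as the one in PyMC explores values of the intercept and coefficients in proportion to this likelihood multiplied by the priors; the sketch only evaluates it at one hand-picked point.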

43:37

[Max]: exactly, [Alex Andorra]: Oh [Max]: exactly. That [Alex Andorra]: yeah. [Max]: unfortunately takes a bit of time. Um, Sorry, I guess I'll be [Alex Andorra]: Yeah, [Max]: here in a second. [Alex Andorra]: it's all good. Yep. It's all good. You can do that and [Max]: Okay. [Alex Andorra]: come back. I don't know what's going to happen for the recording, but I already [Alex Andorra]: did that. After all, it's no problem. [Max]: Sorry, I didn't.

44:11

[Alex Andorra]: I mean, it's the first time I do it. So I didn't know it either. [Max]: Ah, okay, here it is. Wait. [Max]: Is it Joe? Ah. [Alex Andorra]: So [Max]: No. [Alex Andorra]: I think you need to give permission. [Max]: Yeah, exactly. That's [Alex Andorra]: And open [Max]: one. [Alex Andorra]: your computer system settings and click privacy and security. [Max]: Well, maybe.

44:39

[Alex Andorra]: Apparently, if you open your system settings, and then you go [Max]: Yeah [Alex Andorra]: to privacy and security, and you click screen recording, and allow your [Alex Andorra]: browser to share your screen. I think you need to allow Google Chrome to [Alex Andorra]: share your screen. [Max]: mm-hmm yeah I was there but ah yeah okay now [Alex Andorra]: I mean [Max]: maybe [Alex Andorra]: otherwise it's no chip. [Max]: Okay. [Max]: Okay. Sorry for that.

45:13

[Alex Andorra]: So let's see. [Max]: No? That's what I wanna [Alex Andorra]: Yeah, it [Max]: do with [Alex Andorra]: may [Max]: that guess. [Alex Andorra]: be [Max]: Sorry. [Alex Andorra]: because you have to get out to quit Google Chrome and then come back. Are [Alex Andorra]: you on Mac? [Max]: Yeah, yeah, exactly. [Alex Andorra]: Yeah, so you probably need to close Google Chrome and then come back. But [Alex Andorra]: you can do that. And then you come back to the same link I sent you.

45:41

[Max]: Okay. [Alex Andorra]: And then it should work. Maybe I'll have to [Max]: OK. [Alex Andorra]: do another recording, but that's OK. I can edit that after once. It's easy. [Max]: Okay, okay. [Alex Andorra]: So I'll wait for you here. Yeah. [Max]: Okay. I'm back Alex. Sorry. [Max]: Sorry Alex, I cannot hear you currently. [Alex Andorra]: Yes, that's normal. I was muted. So cool. I didn't even have to start a new

47:52

[Alex Andorra]: recording. You can just join the room again. Cool. First time it happened, [Alex Andorra]: so I didn't know what would happen. So cool. Perfect. So does it work now? [Max]: Let's [Alex Andorra]: Let's [Max]: see. [Alex Andorra]: try. [Alex Andorra]: No, [Max]: No, [Alex Andorra]: still not. [Max]: no, [Alex Andorra]: That's weird. [Max]: no. I'll give it a last try and otherwise I just.

48:14

[Alex Andorra]: Yeah, otherwise it's okay, but... [Max]: Yeah, Google [Alex Andorra]: It's [Max]: Chrome, [Alex Andorra]: weird. [Max]: it's there. [Alex Andorra]: It should work. [Max]: I allowed it. So I don't know, Google Chrome, it's fine. It can access.

48:33

[Max]: but [Alex Andorra]: I'm checking that it could be on my end maybe, so... [Max]: screen [Alex Andorra]: Yeah, no, [Max]: the [Alex Andorra]: on [Max]: window [Alex Andorra]: my end also it's all good, so... [Max]: And I'm sorry, no, unfortunately it doesn't [Alex Andorra]: No, weird. [Max]: work. [Alex Andorra]: Anyways, that's OK. So well, then let's continue without the [Alex Andorra]: screen sharing. You can just talk through it. It's no problem.

49:08

[Max]: Okay. [Alex Andorra]: I've [Max]: Yeah. [Alex Andorra]: done it. We've done it for a lot of podcast episodes. [Max]: OK. Yeah, so the structure basically is relatively simple. You need some idea of [Max]: what the performance of the player is. And you have to have a proxy for that. [Alex Andorra]: Mm-hmm. [Max]: And well, you need this performance to be observed, obviously. And the proxy that [Max]: I choose for a player's performance is whether he scores a goal or not, so 0 or 1

49:41

[Max]: in a certain game. Bernoulli distributed is our y, our target. And it's basically a logistic [Max]: regression that we are running. Because what we want to identify is really the skill [Max]: and the ability, the latent variable hidden in our observed performance measure, basically. [Max]: And so the model is pretty simple. You need the priors. You have basically a bunch [Max]: of coefficients. That is, you have the alpha, the skill, the ability that you're interested

50:14

[Max]: in. And then you have the loadings, the coefficients on all the factors that are in [Max]: your model. So you basically have to impose priors for all the coefficients. [Alex Andorra]: Mm-hmm. [Max]: And then you have to define the likelihood, Bernoulli distributed. And yeah, that's basically

50:33

[Max]: the model. It's in the workbook. And people can go through it. There's also a redacted [Max]: version, basically, where people, if they fancy it, can try to work with their [Max]: own priors and all that and try to do it themselves first and then check the unredacted [Max]: version. [Alex Andorra]: Oh, that's cool. [Max]: If they want to play with that a bit. [Alex Andorra]: Nice. [Max]: Yeah, that's basically it. So it's nothing really crazy. It's the four lines of code,

50:58

[Max]: the basic model, basically. And yeah, when you look at multiple players, so you can [Max]: do that for a single player only, but you can also for sure do that for multiple [Max]: players. The key point is that basically each player [Max]: should be exposed to these factors with the same loading. So you can

51:24

[Max]: impose a hierarchical structure on the ability and skill of each player. You should [Max]: definitely do that, but you can impose the hierarchical structure by player or also [Max]: by season. So the ability of the player may evolve over seasons or across seasons basically. [Alex Andorra]: Mm-hmm, mm-hmm, yeah. [Max]: That's, I think, something worth looking into or worthwhile doing. And then basically [Max]: you have the loadings on the factors and they should account for the team effort

51:54

[Max]: basically. You want to account that and you want to get that out of the way so that [Max]: you're basically in the end left with this latent factor, the alpha, the inherent [Max]: skill and ability of the player. [Alex Andorra]: Yeah, yeah, yeah. OK. Yeah, that makes sense. And I mean, for sure, I will [Alex Andorra]: put all of these in your episode's show notes. And actually, I think I can share [Alex Andorra]: my screen. I didn't know why I didn't think about that before. And here

52:27

[Alex Andorra]: is the notebook, right? Am I on the right notebook? [Max]: Exactly. [Alex Andorra]: Yeah, perfect. [Max]: Yeah, yeah, [Alex Andorra]: So. [Max]: yeah. So there are a couple of notebooks there. So there's this one in the PyMCon folder, [Max]: that's the one where there's the redacted version and the unredacted version, and the [Max]: version that we're currently looking at, that's the initial part with all its typos [Max]: in there.

52:48

[Alex Andorra]: Ah ok, so it's not the right one. Then, should look at [Max]: It's [Alex Andorra]: another [Max]: fine [Alex Andorra]: one. [Max]: one, so it's perfect. The other one is just a bit smaller and more concise, I would [Max]: say. [Alex Andorra]: Ah, here. Unredacted. Perfect. Yeah, I have it here. So yeah, like for those

53:04

[Alex Andorra]: of you watching on YouTube, I'm sharing it right now. And so basically, [Alex Andorra]: this is the part of the model where you're talking about the likelihood, [Alex Andorra]: where it's goal scored or not scored. And then you have here the probability, [Alex Andorra]: which is basically, here, this alpha that you talked about, right? That [Max]: Exactly. [Alex Andorra]: is the inherent skill of the player, which enters the probability. And you have

53:39

[Alex Andorra]: the Xs and the beta. So the Xs, are they the factors or the beta are the [Alex Andorra]: factors? [Max]: So the Xs are the factors. These are the differences across the teams or between [Max]: the teams. And this is what you want to basically account for and to clean the observed [Max]: performance measure from. Yeah. [Alex Andorra]: Yeah, yeah. Oh, yeah, OK. Yeah, for sure. And then the beta is the slope, basically, [Alex Andorra]: on the factors. Yeah, yeah, [Max]: Exactly.
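Written out, the structure discussed here could look roughly like the following. The symbols alpha (player skill), beta (loadings) and x (team-difference factors) come from the conversation; the indices (player i, game g, factor k), the normal priors, and the hierarchical layer on alpha are assumptions added for illustration and may differ from the notebook.

```latex
\begin{aligned}
y_{i,g} &\sim \mathrm{Bernoulli}(p_{i,g})
  && \text{did player } i \text{ score in game } g \\
\operatorname{logit}(p_{i,g}) &= \alpha_i + \sum_{k=1}^{K} \beta_k \, x_{k,i,g}
  && \text{skill plus loadings on the team-difference factors} \\
\alpha_i &\sim \mathcal{N}(\mu_\alpha, \sigma_\alpha)
  && \text{hierarchical prior on player skill (assumed)} \\
\beta_k &\sim \mathcal{N}(0, \sigma_\beta)
  && \text{shared loadings, same for every player (assumed prior)}
\end{aligned}
```

The loadings carry no player index, which reflects the point made earlier that every player is exposed to the factors with the same loading, while the skill term is allowed to differ by player.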

54:10

[Alex Andorra]: yeah. Yeah, yeah, it's a fun model. So of course, it's hard to do it [Alex Andorra]: justice on the podcast. But I encourage you to go and watch that part on YouTube. I'm [Alex Andorra]: sharing it right now. And also, you can just take a look at the notebook from [Alex Andorra]: Max, which I put in the show notes, where you have all the details. So it's [Alex Andorra]: pretty fun to look at. And also, as you were saying, the model is pretty small.

54:42

[Alex Andorra]: So that's the amazing thing that I find is that basically, and now if we [Alex Andorra]: go look at the PyMC implementation, so a bit later [Max]: Oh. [Alex Andorra]: down in the model, the really cool thing is that basically the model is quite [Alex Andorra]: easy to code, right? And in a way, that's just a few lines of code, so [Alex Andorra]: basically four lines of code, as you were saying, and you're done. So that's

55:15

[Alex Andorra]: the beauty of the probabilistic programming framework, right? It's a really [Alex Andorra]: useful model. But if you want to get to a first good enough version that [Alex Andorra]: already gives you interesting insights, you don't have to reinvent everything. [Alex Andorra]: And you don't have to go with the first, hardest version from the start, [Alex Andorra]: where you have a hierarchical time series model where everything is varying

55:44

[Alex Andorra]: and pooling information. Sure, that's cool. But don't start with that. It's [Alex Andorra]: like if you're starting to train, don't start with 100 push-ups. Start by, like, [Alex Andorra]: trying five first, and then do a few series of them, and build your way up to

56:00

[Alex Andorra]: 100. So that's the critical thing I find here with the Bayesian framework [Alex Andorra]: coupled with probabilistic programming languages, which is you can get [Alex Andorra]: down to a first good enough version, and then in a few lines of code have [Alex Andorra]: your version and then sample from it. Because here you have it on the screen.

56:22

[Alex Andorra]: The likelihood, then you have a line for the deterministic, which is the logistic [Alex Andorra]: regression line, and then you have your intercept and your coefficient on [Alex Andorra]: the factors. And basically that's it. That's really amazing. [Max]: Absolutely. No, that's, I think, the beauty of PyMC, that it allows you to describe [Max]: or build your model in a pretty intuitive way. And you can even let it be printed out

56:56

[Max]: to see if everything is as you would have expected. And [Alex Andorra]: Yeah. [Max]: yeah, then PyMC does the dirty work, the sampling and all that for you. And yeah, [Max]: but it already gives you an intuitive idea of how the modeling works. And yeah, that's [Max]: absolutely [Alex Andorra]: Yeah, yeah, yeah. [Max]: super [Alex Andorra]: No, [Max]: cool. [Alex Andorra]: it's really fun. Well done on that. And so I'm curious, what are your, do

57:19

[Alex Andorra]: you have any ideas? Do you want to keep working on this model? Do you have [Alex Andorra]: any ideas on where to take it from what it is right now? Um. [Max]: Yeah, that's a good question, actually. So definitely the model can be improved. And [Max]: definitely, it's all depending on the features that you have and the data that you [Max]: have. And I think the clubs, they have so much more [Alex Andorra]: Yeah.

57:50

[Max]: interesting data than I have. And they could build many, many more interesting factors [Max]: accounting for differences across [Alex Andorra]: Oh yeah, [Max]: teams. [Alex Andorra]: for sure. [Max]: So yeah, I really don't know because I tried to reach out to a couple of clubs, [Max]: let's say. But I don't know, there was nothing really coming back. So yeah, apparently, [Max]: perhaps they're not interested in that or maybe they have their own models already

58:09

[Max]: or something. So I really don't know. I'd be excited to work on that. But as you [Max]: said, it's rather a side project that I did once upon a time. And yeah, it's not [Max]: really related to economics or finance. That's why I'm currently working absolutely [Max]: on other stuff. But yeah, I would love to work on that in that regard. But yeah, it [Max]: seems not so many teams are picking up on that, at least among those that I reached

58:36

[Max]: out to. And it seems to be the European clubs. Um, because in some of your last episodes,

58:42

[Max]: I heard people talking about that in the United States, it's pretty different. And, [Max]: um, yeah, uh, there are a lot of, apparently a lot of clubs already trying to implement [Max]: that to really try to understand the inherent latent skill of, of players, not necessarily [Max]: in soccer, but in baseball or in [Alex Andorra]: Yeah, [Max]: other, [Alex Andorra]: oh, especially [Max]: um, in other disciplines.

58:59

[Alex Andorra]: baseball. Yeah, yeah, yeah. So this is sad, but I'm kind of reassured to [Alex Andorra]: hear you say that because I do think it's a huge area of improvement that [Alex Andorra]: there is in Europe. And clubs just don't seem to be very interested. The [Alex Andorra]: thing I know is that a few English clubs are using data pretty heavily, like Liverpool.

59:27

[Alex Andorra]: Manchester City, clubs like that, but still is kind of the exception. I [Alex Andorra]: know Toulouse now in France, which is a small club, and that makes sense. [Alex Andorra]: If you're a small club, you have less money, so you have much more competitive [Alex Andorra]: pressure to find good players, which you are not overpaying, which is basically

59:48

[Alex Andorra]: where science can help you. You don't want to pay for just a name. You [Alex Andorra]: want to pay for someone who has a name because... he's got talent, not [Alex Andorra]: just because he's got a name. So it's like, to me, everybody should do that. [Alex Andorra]: And I just don't understand why they don't. Because it's just like, that's [Alex Andorra]: also the beauty of sport, right, you don't care about the name, you care about

01:00:13

[Alex Andorra]: what someone can do and if they have talent or not. Like, you should not care [Alex Andorra]: at all about the name, about the color of the skin, about anything else [Alex Andorra]: but what they can do on the field. And... Yeah, like to me, if I had [Alex Andorra]: a club, that would be one of my first priorities. How do we make sure we optimize

01:00:32

[Alex Andorra]: the way we are signing the players because it costs a lot of money. So. [Max]: I think one club that also does a lot of that data work is in Denmark, FC Midtjylland [Max]: or something. I think [Alex Andorra]: Uh-huh. [Max]: I got the name completely wrong. But I heard once upon a time that they're really [Max]: investing a lot in data science and trying to sign players according to data or at least

01:00:58

[Max]: incorporate data a lot in their daily training exercises and all that. So yeah, they [Max]: are one of the cutting edge maybe there in Europe as well. Small club, [Alex Andorra]: Mm-hmm. [Max]: but yeah. I think they won the Danish Championship a couple of years ago.

01:01:13

[Alex Andorra]: Yeah, not surprised. I mean, something I see a lot, at least in France, [Alex Andorra]: and I've seen that a lot also in electoral forecasting, is basically this [Alex Andorra]: idea that if you start doing that, you're basically becoming kind of inhuman [Alex Andorra]: and you turn players into robots. Basically, that's really an interesting thing [Alex Andorra]: to me because one of the sports that really uses data heavily is cycling. A

01:01:42

[Alex Andorra]: lot of the teams are now using data. Here, again, thanks a lot to the British, [Alex Andorra]: who often in Europe are the first ones to take up the data wave. And so [Alex Andorra]: I know, for instance, Bradley Wiggins, I think he's won the Tour de France. [Alex Andorra]: I don't remember how many times, but a lot of times. And basically, the

01:02:06

[Alex Andorra]: whole team was using data to optimize the performances of the team. And [Alex Andorra]: that was one, like the British started being like, okay, we need to get back [Alex Andorra]: on our cycling game. They started using data extremely optimally and well, they [Alex Andorra]: did. And thanks to this, basically a lot of the teams started to do that as well.

01:02:25

[Alex Andorra]: And the Tour de France is extremely optimized on that. But it's funny because when [Alex Andorra]: you hear the media coverage of that, at least in France, it's a bad thing [Alex Andorra]: because it's like players are becoming robots, and they cannot eat what they [Alex Andorra]: want at the time they want. And like, it just takes the magic out of [Alex Andorra]: the Tour de France. And I strongly disagree with that, of course, because the

01:02:55

[Alex Andorra]: performances get better in a clean way, of course. Well, then that's just [Alex Andorra]: better for everybody because the show is going to get better. And also, we're [Alex Andorra]: talking about the Tour de France, about professional athletes. Like the goal is [Alex Andorra]: not to do that recreationally. They do that for a living. Um, so it's important [Alex Andorra]: for their own basically income. Uh, but also they do that because they want

01:03:23

[Alex Andorra]: to be the best. They are not doing that because, well, they just [Alex Andorra]: want to cycle on the weekends, right? They cycle for a living. So yeah, sure. [Alex Andorra]: If you're an amateur cyclist, then okay. You don't need the same structure [Alex Andorra]: as a professional cyclist. But even then, if you want to improve your performances [Alex Andorra]: as an amateur cyclist, you're going to need to optimize some of the things.

01:03:49

[Alex Andorra]: And if you really care about it, you're going to need to optimize your nutrition, [Alex Andorra]: for instance, and maybe when you take your meals or else. But if you're [Alex Andorra]: a professional, the slightest change can mean you're going to [Alex Andorra]: perform one second better or two seconds better, which [Alex Andorra]: can make you win the Tour de France or not. So I don't understand this argument

01:04:12

[Alex Andorra]: in these contexts where you're trying to optimize performance. For me, it's [Alex Andorra]: like not something that should count here. They are not doing that for pleasure [Alex Andorra]: only. [Max]: I absolutely agree. Absolutely agree. It should be incorporated much more, [Max]: especially for the clubs. In the end, I think it will pay off as you lay it out. [Alex Andorra]: Mm-hmm. [Max]: You don't want to pick a lemon, and [Alex Andorra]: Yeah.

01:04:35

[Max]: you'd just rather not pick it. Yeah. [Alex Andorra]: Yeah, yeah. No, I mean, I have to say it's like, it's an interesting topic [Alex Andorra]: for me because I'm trying to crack that nut and I cannot crack it for now. [Alex Andorra]: Like, understand why basically the clubs in Europe are not really interested [Alex Andorra]: in that. Because I don't really care about the Chinese side or else. I'm like, [Alex Andorra]: once the club starts picking that up, then everybody will have to. But what

01:05:00

[Alex Andorra]: I'm trying to understand is why the clubs don't do that, because it's just [Alex Andorra]: leaving gains on the table. And I'm just super curious about why they would [Alex Andorra]: do that from a sociological standpoint, honestly. Because I've seen a lot [Alex Andorra]: of clubs using, they have data science teams, but they use it for marketing. [Alex Andorra]: That's [Max]: I see, [Alex Andorra]: such a [Max]: I [Alex Andorra]: shame. [Max]: see.

01:05:27

[Alex Andorra]: And I don't know why. So if anybody [Max]: there. [Alex Andorra]: knows, please get in touch. If anybody is working in a club, please get [Alex Andorra]: in touch with Max or me, because I want to know about it. We don't even need

01:05:40

[Alex Andorra]: to work together. I would be happy to help you out with a model, but for [Alex Andorra]: now, I just want to know why and what are the internal factors, because [Alex Andorra]: definitely there is something going on, but I don't know what it is, and [Alex Andorra]: I'm just curious about it. So yeah, to try and make it a bit more constructive, [Alex Andorra]: do you have any idea on how we personally in the data world could change

01:06:14

[Alex Andorra]: the status quo in that regard? And not only for sports, but that's also true [Alex Andorra]: for a lot of domains where a more robust application of the scientific method [Alex Andorra]: would be useful. But it's hard to get it done. Do you have any ideas personally [Alex Andorra]: on how that status quo could be changed? [Max]: Yeah, I think it's really hard to say. It depends on the willingness to adopt these,

01:06:44

[Max]: to be open to these methods, I would say. And the players play an important part, [Max]: or I think the crucial part, because if the players are not willing to adopt these [Max]: additional insights, I would say, it's just not possible. [Alex Andorra]: Mm-hmm. [Max]: But for sure, I mean, as you say, it's management, it's internal things that are [Max]: going on there, politics potentially, but I really don't know. How can someone resolve

01:07:12

[Max]: that? I don't know. I regard it always as, for sure, you shouldn't base all your decisions [Max]: on this model or on a single model or so, but it can help [Alex Andorra]: No for sure. [Max]: stimulate your decision process, and I think it's a useful addition. And in the [Alex Andorra]: Yep.

01:07:27

[Max]: end, for sure, there might be an upfront cost, basically, to get the data, [Max]: to implement the model, to hire people to produce that, but in the end, it actually [Max]: may pay off economically because it may save you from picking a lemon and overpaying massively. [Alex Andorra]: Oh yeah, for sure. [Max]: So [Alex Andorra]: Yeah, yeah. [Max]: yeah, I see it really as a worthwhile investment. [Alex Andorra]: No, for sure, [Max]: I think the US [Alex Andorra]: yeah.

01:07:49

[Max]: sports has demonstrated that. [Alex Andorra]: Yeah, yeah. I mean, just look at the US, just look at all the other fields, [Alex Andorra]: especially marketing, for instance, which is starting, and has already started, to adopt [Alex Andorra]: data analysis and modeling aggressively, and they just, like, we do that all the time at Labs, [Alex Andorra]: basically making them save a lot of money and not only save money, but make

01:08:11

[Alex Andorra]: more money. So like, it's just, yeah, like, I don't think this is a question, [Alex Andorra]: but yeah. I mean, something you can do. I would think if you're interested [Alex Andorra]: in it and have the time, something maybe that could work is if you could make [Alex Andorra]: some predictions with your model, basically. And I would think to get it per [Alex Andorra]: player, you would probably need some hierarchical structure in that to get

01:08:37

[Alex Andorra]: some better predictions. But once you get there, you have something of a [Alex Andorra]: web page with basically the predictions of the model per player saying [Alex Andorra]: basically, this player is basically overvalued and this player is undervalued, [Alex Andorra]: basically based on the results of the model. And then basically see what that [Alex Andorra]: gives you during the season because at the beginning of the season, you

01:09:04

[Alex Andorra]: can see that player is basically undervalued. He's gonna perform better than [Alex Andorra]: what the market currently thinks. And then if people see that it's true, well, that's [Alex Andorra]: a clear sign that basically these kinds of methods and models are working [Alex Andorra]: and so that could spark some interest. Um, because definitely demonstrating [Alex Andorra]: what a model is for. Because my hinge, hinge... hunch. I think it's hunch.

01:09:31

[Alex Andorra]: My hunch is that, um, basically the decision makers in the clubs are not data, [Alex Andorra]: um, they don't really know what data is about, and they don't even [Alex Andorra]: know what a model is and what it can give you. But if you are able to demonstrate [Alex Andorra]: what a model can give you, because they don't care about the model, the priors, [Alex Andorra]: the parameters, stuff like that, they just care about the results of the model.

01:09:56

[Alex Andorra]: So if you can demonstrate the results of the model and even better what the [Alex Andorra]: model can say about recruiting that player or not recruiting that player, [Alex Andorra]: that would maybe have a better impact, or at least I would say it increases [Alex Andorra]: the probability that the impact these methods can have gets noticed.
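
The per-player idea Alex describes can be illustrated with a very small sketch. This is not the model discussed in the episode: it assumes made-up counts and uses simple shrinkage toward the pooled rate as a stand-in for a full hierarchical model, which would more likely be fit in a tool such as PyMC. The function name `shrink_rates`, the parameter `prior_strength`, and the player data are all hypothetical.

```python
def shrink_rates(successes, attempts, prior_strength=20.0):
    """Partially pool per-player success rates toward the overall rate.

    Each estimate is the posterior mean of a Beta prior centred on the
    pooled rate, with `prior_strength` pseudo-observations (an assumption).
    """
    pooled = sum(successes.values()) / sum(attempts.values())
    prior_hits = pooled * prior_strength
    return {
        player: (successes[player] + prior_hits) / (attempts[player] + prior_strength)
        for player in successes
    }


# Hypothetical data: player A has very few observations.
successes = {"A": 9, "B": 45, "C": 30}
attempts = {"A": 10, "B": 100, "C": 90}

estimates = shrink_rates(successes, attempts)
for player, rate in estimates.items():
    raw = successes[player] / attempts[player]
    print(f"{player}: raw={raw:.3f} pooled estimate={rate:.3f}")
```

The player with only ten observations is pulled furthest toward the group average, which is the behaviour that makes per-player predictions more stable when data per player is thin.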

01:10:17

[Max]: Oh, absolutely. That's absolutely the case. For sure, it depends on having the real-time [Max]: data, basically getting the real-time data. [Alex Andorra]: Exactly. Yeah. [Max]: That's an upfront cost that you would have to pay. No, but that's actually the intent, [Max]: really. This is the intent to run that model for multiple players as part of the workbook, [Max]: for example, to lay it out and to compare which players perform well or not. And you

01:10:41

[Max]: see it, for example, Cristiano Ronaldo, when he won the player of the year award in [Max]: 2008. He was basically in the middle of the pack in [Alex Andorra]: Mm-hmm. [Max]: that season. So there were other players actually outperforming, for example, Dimitar [Max]: Berbatov in that very season. He was playing for Tottenham [Alex Andorra]: Yeah.

01:10:58

[Max]: later on in the year, thereafter signed by Manchester United. So you see that. And [Max]: for sure, there's a lot of subjective judgment coming in from when you observe it [Max]: and you see the model telling you something completely different. But this is stimulating [Max]: and it should potentially update your priors, so [Alex Andorra]: Yeah, [Max]: your [Alex Andorra]: exactly. [Max]: subjective priors.

01:11:19

[Alex Andorra]: Yeah. And forces you to lay out your priors clearly [Alex Andorra]: on paper. So it's actually very important. Yeah. So I would say definitely [Alex Andorra]: something like that. And if you have the predictions for the biggest number [Alex Andorra]: of players on a webpage and basically betting based on the model, saying [Alex Andorra]: that this player is going to overperform with respect to the

01:11:46

[Alex Andorra]: market or underperform with respect to the market. That's an interesting [Alex Andorra]: thing. And also, as you were saying, for the individual awards, where the [Alex Andorra]: name is extremely, like, counts a lot, where you can see someone like Messi, [Alex Andorra]: who is, yeah, sure, an incredible player. But the number of times he's got the [Alex Andorra]: golden... How is it called in English? Ballon d'Or? Golden ball, I don't

01:12:13

[Alex Andorra]: know. You could argue that some of these seasons where he did get the award, [Alex Andorra]: maybe there were other players who were actually outperforming him, but they [Alex Andorra]: don't have the name recognition, so they are not scrutinized as much. They don't [Alex Andorra]: have the confirmation bias going in their favor, where it's like everybody's [Alex Andorra]: looking at Messi because they already know he's extremely good, so they just

01:12:42

[Alex Andorra]: look at confirming the fact that he's... incredible, which he is, but maybe [Alex Andorra]: not all the time, so as to get so many awards. So yeah, like that. To me, [Alex Andorra]: that would be a really good way of demonstrating the utility of these methods. [Alex Andorra]: Basically, making it really concrete for the decision maker.

01:13:05

[Alex Andorra]: So before we close up the show, I'd like to get back a bit on your personal [Alex Andorra]: experience with Bayes. And I'm curious, what was your main pain point on this [Alex Andorra]: project, the Soccer Factor Model, and just in general, when you're using the [Alex Andorra]: Bayesian workflow, what is your main pain point right now? [Max]: Yeah, so in that project, I really have to admit that maybe I was lucky. But [Alex Andorra]: Yeah.

01:13:35

[Max]: there wasn't really a huge pain point. I mean, it's not [Alex Andorra]: Uh-huh. [Max]: something publishable for a paper or so. It's just basically sketching the idea [Max]: behind the model and basically showing the outline of the model, what it can give [Max]: you. [Max]: pretty well. I didn't really, I don't remember any really big problems. So then when

01:14:05

[Max]: I looked at the model evaluation, everything looked fine. I mean, for example, one way we can evaluate [Max]: how well the model works is to look, in this logistic regression, at [Max]: the area under the curve, for example; it's a popular metric. And it was in a reasonable [Max]: ballpark. And that was fine for me, so that the model, the results were really [Max]: what you would have, or that it's kind of reliable, the results. So that was not much

01:14:33

[Max]: of a pain point. And that was also nice for me to see that, yeah, it's a simple model [Max]: and it works also pretty simply. And yeah, that was a project that I was pleased [Max]: to see that there were not many obstacles that I had to overcome.
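
For readers unfamiliar with the metric Max mentions, here is a minimal sketch of the area under the ROC curve, computed as the share of (positive, negative) pairs that the model ranks correctly. The labels and scores are made-up illustration data, not results from the Soccer Factor Model, and the helper name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example gets a higher score than a
    negative one; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = event happened) and predicted probabilities.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
print(f"AUC = {auc(labels, scores):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so "a reasonable ballpark" would sit somewhere between the two.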

01:14:51

[Alex Andorra]: Nice. Yeah, that's good to hear. And so in general, in the Bayesian workflow, [Alex Andorra]: do you identify something in your own learning that is costing you to learn [Alex Andorra]: right now, that has cost you to learn, and you would like an easier way [Alex Andorra]: to have learned that? [Max]: I mean, I have to say that, for example, with all the different samplers that are out [Max]: there, that's not my major field. I would like to learn much, much more about the inner

01:15:23

[Max]: workings of all these samplers. I mean, I coded maybe one of the simpler ones myself, [Max]: maybe once or so, but then I really resort to open source packages for that. But to really [Max]: understand what's going on, I think, yeah, looking deeper into that, that's definitely [Max]: something I would like to do and would need to do. [Alex Andorra]: Mm-hmm. [Max]: But yeah, I think that's basically the math of it. I think it's the most fascinating

01:15:53

[Max]: stuff and how it really works and how it's then implemented in code. I think that's [Max]: the most fascinating stuff. But yeah, the beauty of PyMC then is if you really are [Max]: interested in the outcome and want a fast outcome, yeah, it's pretty intuitive. [Max]: Yeah. [Alex Andorra]: Nice. OK. Well, it's good to hear. Yeah, and I'm asking that from a developer [Alex Andorra]: perspective and also teacher perspective. That's always interesting for

01:16:16

[Alex Andorra]: me to get a peek in the learning experience of the people. Cool. So before we [Alex Andorra]: close up the show, is there a topic I didn't ask you about and that you'd [Alex Andorra]: like to mention? [Max]: Well, actually, my career hasn't progressed so much so far. So I think we covered everything [Max]: there. So, oh yeah, that's pretty interesting. And yeah, you covered actually everything. [Alex Andorra]: Awesome. Yeah, we did record for a long time, so that's no surprise.

01:16:52

[Max]: Thank you. [Alex Andorra]: Yeah, and I'm happy. I got to ask you the main thing I wanted to ask you, [Alex Andorra]: so that's super cool. In a reasonable amount of time, I'm sure the listeners will [Alex Andorra]: appreciate it, because the last two episodes were the two longest of the whole [Alex Andorra]: podcast. So it's good to get back to reasonable amounts of time for people, [Alex Andorra]: I guess. And yeah, so before letting you go, I'm gonna ask you the last

01:17:22

[Alex Andorra]: two questions I ask every guest at the end of the show. So Max, if you had [Alex Andorra]: unlimited time and resources, which problem would you try to solve? [Max]: Yeah, so I think one of the most popular answers is climate change. And definitely, [Alex Andorra]: Mm-hmm. [Max]: it's, it's probably the most present problem, especially here in Milan currently. [Max]: You really feel it.

01:17:47

[Alex Andorra]: Ha. [Max]: But throughout the time I've been working on a bit of climate [Max]: econometrics, let's say, forecasting Arctic sea ice, I saw what people are really doing [Max]: in climate and, yeah, there are fascinating people out there, very, very intelligent people.

01:18:02

[Max]: So I think throwing money at me would be wasted in that regard. I mean, what I'd [Max]: be rather interested in is, like, yeah, maybe implementing that into sports, into sports [Max]: analytics, right, to allow teams to have access to data, and [Alex Andorra]: Mm-hmm.

01:18:22

[Max]: to kind of create that level playing field across players and then really, yeah, [Max]: it's an investment and people spend a lot of, especially in investing and in banking [Max]: and finance, spend a lot of time on crunching numbers and why not do that in sports as well [Max]: if you have the data available. So yeah, I'd be very, very interested in working on [Max]: that. That's for sure.

01:18:50

[Alex Andorra]: Yeah, I love it. Me too, for sure. That's a good one. And if you could have [Alex Andorra]: dinner with any great scientific mind, dead, alive or fictional, who would it [Alex Andorra]: be? [Max]: Yeah, well, that's a, that's a pretty tough question, I have to say. So [Alex Andorra]: Yeah. [Max]: no, really, it's, yeah, there's so many amazing people out there. And when you read [Max]: papers, it's really incredible what people are doing. And so yeah, there's so many

01:19:21

[Max]: people I'd like to talk to. Well, one for sure is Frank Diebold, the guy [Max]: who basically invited me to the University of Pennsylvania, because that was a defining [Max]: point in my PhD, absolutely. But then if I could pick one as professors should expand [Max]: on your network, basically, it would be Ben Bernanke. He [Alex Andorra]: Mm-hmm. [Max]: was former president of the Federal Reserve. He received [Alex Andorra]: Mm-hmm.

01:19:47

[Max]: the Nobel Prize in economics. Well, people say there's no Nobel Prize in economics, but [Max]: yeah, the Riksbank prize last year for his work on banks and financial crises. [Max]: Yeah, that would be super interesting to talk to him. He served his country basically. [Max]: Then he was assistant professor. So how he managed all that. And yeah, that would be

01:20:10

[Max]: super interesting to talk to him. Phenomenal scholar. And I like reading his papers. So [Max]: yeah, I think that would be super cool. [Alex Andorra]: Nice, yeah. Love it. Very nerdy answer. [Max]: Okay. [Alex Andorra]: Awesome. Well, thanks a lot, Max. That [Alex Andorra]: was really interesting. You allowed me to rant about some of my pet peeves [Alex Andorra]: about data

01:20:35

[Alex Andorra]: analytics and soccer. And I hope people learned a bit more. And of course, [Alex Andorra]: if they are curious, as usual, I will put resources and a link to [Alex Andorra]: your website in the show notes for those who want to dig deeper. Thank you [Alex Andorra]: again Max for taking the time and being on this show. [Max]: Thanks Alex. It was a pleasure.

Transcript source: Provided by creator in RSS feed
