[Alex]: OK, so we are live, Ben Vincent. Welcome to Learning Bayesian Statistics. [Ben]: Hi Alex, it's very good to be here.
[Alex]: yeah I'm really happy to have you on the show because we've been working together [Alex]: for quite a while now and I wanted to have you on the show for a while but wanted [Alex]: to find the right angle because you do so many things that I was like I don't know [Alex]: which one we should focus on so fortunately I made my choice and today we're mainly gonna [Alex]: talk about causal inference. You do a lot of things, so we'll see where the discussion
[Alex]: takes us. Also, I shared that recording link in the slack channel of the patrons of [Alex]: the show, so maybe some of them will show up at some point in the audience and we'll [Alex]: have some live questions. Maybe it will just be the two of us. We'll see what life [Alex]: has for us. Yeah, basically that's it from my end. A quick note that I am, as you [Alex]: may hear, a bit sick. So I'm going to try to cough as little as possible, but I cannot
[Alex]: make... that's the only promise I can make. Okay, so let's dive in. And as usual, [Alex]: let's start with your origin story, Ben. So how did you come to the world of... [Alex]: to the fabulous world of statistics and probabilistic modeling, and how sinuous [Alex]: of a path was it. [Ben]: Yeah, so it certainly wasn't direct. So I think most people I know have some kind [Ben]: of formal education in stats training to some capacity, whereas I didn't really have
[Ben]: any. So for many years I was teaching, I was faculty in an experimental psychology [Ben]: department. And of course there... we start the undergraduates off quite early with [Ben]: stats. Of course, it's all frequentist, but I didn't have any of that. My undergrad [Ben]: and PhD was basically neuroscience. [Alex]: Mm-hmm. [Ben]: And while clearly it's a quantitative topic, you didn't really get the same immersion
[Ben]: in statistical training that you would in experimental psychology, for example. So [Ben]: I was definitely a latecomer. That said, I think my PhD and postdoc prepared me well [Ben]: on the kind of quantitative way of thinking, particularly programming and things like that. [Ben]: So my background with coding started at a very young age. And [Alex]: Oh [Ben]: I [Alex]: yeah. [Ben]: think that... Yeah, so it started off probably when I was about seven. [Alex]: Oh damn.
[Ben]: And yeah, so I learned BBC BASIC very early on and I think that was [Alex]: BBC [Ben]: that was [Alex]: BASIC [Ben]: a [Alex]: is that related to the BBC, the British broadcast channel in any way or not? [Ben]: It is, yeah, it is. So I'm not 100% sure exactly how that came about, but in the kind [Ben]: of relatively early days of computing, I think that there was this kind of clear prediction [Ben]: that people made that computing was gonna be a big thing. And so the BBC as kind of in
[Ben]: its kind of educational capacity started to kind of get involved in this, and they maybe [Ben]: supported the development of a language for, I think it was like BBC Micro systems [Ben]: or Acorn micro systems. I can't quite remember. I was only seven. [Ben]: So yeah, that was a very kind of early introduction into coding. And I think people... [Ben]: who code obviously kind of appreciate that it makes you think slightly differently.
[Alex]: Mm-hmm. Yeah. [Ben]: So it gives you a good foundation, I think. Anyway, so my PhD kind of involved modeling [Ben]: neural networks. And so the basic idea was to answer a question really about the human [Ben]: brain, about physiology. So the question is, how does the brain actually wire itself [Ben]: up? Are there any underlying principles that we can use in order to kind of understand [Ben]: this very complicated organ? And my particular PhD basically was looking at the question of
[Ben]: the role of energy efficiency in neural networks. And [Alex]: Okay. [Ben]: we basically use the visual system. because as like a model system, because the neural [Ben]: wiring of the visual system is really well known. [Alex]: Mm-hmm. [Ben]: So what that allowed us to do was to make these little neural network simulations [Ben]: and put in various different costs, like for example, similar to what we now know as
[Ben]: weight decay in machine learning. or you could even think of it as like Laplacian [Ben]: priors from the Bayesian perspective. But from a physiology perspective, you can [Ben]: basically think of these things as energetic costs. So for every connection between one [Ben]: neuron and another, you can imagine that that takes energy in order to transmit information [Ben]: through the synapses. And you also have costs. simply for the number of neurons that
[Ben]: you have. So we were trying to understand why we have this quite peculiar arrangement [Ben]: of the visual system. So anyway, we were doing coding, a very early neural network stuff, [Ben]: compared to the things that people do now, it was trivial. So this was back in the [Ben]: era of shallow neural networks and also... wasn't any Bayesian stuff in sight at this
[Ben]: point. So later on I kind of started to get more exposed to, well in fact during [Ben]: my PhD I was kind of exposed to the Bayesian way of thinking through my [Alex]: Mm-hmm. [Ben]: supervisor, that's Dr. Roland Baddeley, and the way he presented it was sufficiently [Ben]: simple; the conceptual basis of Bayesian inference is seductively [Ben]: simple. And so you kind of think that you get it straight away.
[Alex]: No. [Ben]: But then when you try to actually think about it in concrete terms and maybe implement [Ben]: something without a whole bunch of additional stuff, it actually starts to get quite tricky. [Ben]: So I spent a long... period where I was kind of convinced that the Bayesian approach [Ben]: was definitely interesting and it sounded sufficiently contrarian for me to be interested.
[Ben]: But it wasn't maybe until the postdoc phase and early in my lectureship that I felt [Ben]: like I really [Ben]: had like a decent understanding to the extent that I could code these things up myself. [Alex]: Yeah that's super cool, I didn't know you started so early actually in your BBC time. [Alex]: And yeah I do agree that it's really cool to start so early, not really for the [Alex]: language itself, but I would say as you were saying for the kind of skills that [Alex]: it gives you in the sense that it teaches you how to basically fail all the time and [Alex]: be okay with failure and iterative improvement and in a way not only looking for [Alex]: success but also being comfortable failing, in a way, which is something that at
[Alex]: least in my education in France lacked a lot. I don't know [Ben]: Hehehe [Alex]: about now but it's like really trying to make you say the things that you have to [Alex]: say to have a good grade instead of making you think through a solution. So really [Alex]: rewarding the result much more than the path to get to the result. [Ben]: Yeah, completely. I think the way how... The particular thinking skills it gives you,
[Ben]: I think are fantastic. And I didn't think about it back then, but the way how I might [Ben]: think about it now is... I was doing, you know, silly stuff. Like, I was simulating [Ben]: what happens if you have, like, little bacteria, which are just represented by pixels [Ben]: on your screen. and they kind of move around and there's food and every time they [Ben]: get enough food, then they replicate but if they don't get enough food, then they
[Ben]: die. So I kind of, you're basically creating your own miniature simplified world where you [Ben]: have clear, well-defined rules. And I might call that now like a data generating [Ben]: process. because you can tweak, you know, what's the lifespan, how much food do they
[Ben]: need, and different kinds of outcomes happen. So I really wish at some point, even [Ben]: at that point, I think my brain would have been amenable to Bayesian thinking because [Ben]: there you're basically just doing the opposite of that. [Alex]: Hmm. Yeah. Yeah. Yeah. Talking about Bayesian thinking. Do you remember when you [Alex]: were first introduced to Bayesian methods and why they stuck with you?
[Ben]: Yeah, so I think broadly speaking, the core kind of introduction to the Bayesian ideas [Ben]: was through my supervisor, Roland Baddeley. But because the Bayesian ideas didn't directly [Ben]: relate to the PhD work that I was doing, the actual concrete implementation of it was
[Ben]: maybe left a little bit opaque at that point. So I think really the exact point is [Ben]: kind of hard to define, but during my postdoc and early faculty time, I started to [Ben]: kind of shift just from making models which you kind of qualitatively compare to [Ben]: something that you might know and see if it's a good fit. To now try to... quantitatively
[Ben]: compare these models that you create to what we know about how people behave. So [Ben]: in particular, coming at it from like a cognitive, experimental, psychology perspective, [Ben]: we were trying to model visual attention. So [Alex]: Mm-hmm. [Ben]: both overt visual attention, which is basically, how do I move my eyes around? in [Ben]: order to gather information that I might need in order to solve problems and complete
[Ben]: tasks. But then you also have this thing called covert visual attention, which is maybe [Ben]: what most people kind of think of when when that phrase is mentioned is basically [Ben]: if you have a whole bunch of information coming in, some of that stuff is going to be [Ben]: more relevant to you than other stuff. And so if you're... If you can narrow your [Ben]: attention on the kind of appropriate information, then you're essentially increasing [Ben]: your signal to noise ratio.
[Alex]: Mm-hmm. [Ben]: You're gathering in the information that is informative to help make a decision and [Ben]: kind of eliminating some of the information that is not relevant. And so we tried to set [Ben]: up experiments both with eye movements and covert attention. where we basically put [Ben]: real people in psychology experiments. We made them look at computer screens and we [Ben]: showed them different kind of simplified visual situations and we gave them certain
[Ben]: tasks. And kind of our approach as modelers was to basically think up of possible ways, [Ben]: like what's going on inside this black box brain. that is actually leading them to [Ben]: behave in a certain way, solve a problem in a certain way. Um, so, so the, the core [Ben]: of that, the, the way, the kind of Bayesian way of thinking that Roland, um, introduced
[Ben]: to me was basically translated into that. So we were essentially asking, um, how would [Ben]: we expect people to behave or how would we expect them to solve a problem if they were [Ben]: Bayesian. So stepping back a little bit, I kind of see it almost as jumping into the [Ben]: kind of slightly more complex side of Bayes before the kind of easier side. So the [Ben]: easier side, I guess, or the main way how Bayes is kind of brought up to people is
[Ben]: through like describing your data. So what we might call like regular. statistics. [Ben]: But Bayesian modelling is a little bit slightly different in that you're not just [Ben]: trying to describe the data as such, but you're more actually trying to really describe [Ben]: the data generating process and that in this case involved like human cognition. [Alex]: Yeah, I see. Yeah, super interesting. And so that's how basically you started doing
[Alex]: Bayes and now you are basically doing that full time. Right? So maybe define the [Alex]: work that you're doing nowadays for listeners and also the topics that you are [Alex]: particularly interested in. [Ben]: Yeah, sure. So the main stuff that I'm doing at the moment is basically working as [Ben]: a consultant data scientist. And most of that stuff is through PyMC Labs. And that's [Ben]: largely how we know each other. I do a little bit of random consulting on the side,
[Ben]: but the majority of it is through PyMC Labs. So we do a bunch of different things. [Ben]: Part of it is educational. So I think after all my years of teaching in academia, [Ben]: that maybe set me up to enjoy teaching Bayesian methods to people in industry. But [Ben]: we also just develop novel models and solve hard problems for people who have kind [Ben]: of... challenging data science questions in order to solve business problems in industry.
[Ben]: So that's the majority of what I'm doing. [Alex]: that sounds like fun [Ben]: It is. [Alex]: yeah and actually you do a lot of causal inference, we'll get back to the [Alex]: teaching aspects if we have time at the end of the show because that's always also [Alex]: something that I'm very interested in, but yeah, to start diving in a bit about [Alex]: your main focus these days, let's talk about causal inference. So can you define causal [Alex]: inference for us and tell us how it differs, if it differs, from Bayesian stats? [Ben]: Yeah, so just to kind of put this in context, I thought that over the years of exposure [Ben]: and kind of experience with Bayesian inference that I was a reasonably clued up scientist [Ben]: and now a data scientist. But then I kind of became aware through a really bizarre
[Ben]: kind of situation. that I was actually completely clueless about causality. And I [Ben]: think that this is actually like a, something that bubbles away in the brains of many academics [Ben]: because in the same way that I didn't have formal training in statistics, I think [Ben]: that unless you're kind of lucky, maybe you're in an area of epidemiology, for example,
[Ben]: I think it's extremely rare to get formal training in... causal inference. And so a lot [Ben]: of the ideas that we, we gather over time about causality focus mostly on kind of [Ben]: heuristics and things like this and not very much on kind of formal ideas about what [Ben]: causality is. [Alex]: Mm-hmm. [Ben]: So yeah, do you want me to go on and describe? like what causality is or expand [Ben]: a bit on how I got interested in causality.
[Alex]: Yeah, maybe define a bit the causality part, basically to set the ground up for people [Alex]: so that they understand the difference with Bayesian stats. And then we'll dig into [Alex]: CausalPy to actually do causal inference in the Bayesian framework. [Ben]: Sure. So, I mean, one way how you can slice things up is to basically think of statistical
[Ben]: relationships as being kind of one particular thing. And then you could further subdivide [Ben]: that into, you know, are you gonna take a frequentist approach or a Bayesian approach? [Ben]: And the other kind of angle would be looking at like, at... causal relationships. [Ben]: And you could almost think of that as orthogonal to what particular statistical approach
[Ben]: you use, whether it's frequentist or Bayesian. But really the way I would kind [Ben]: of describe a causal relationship, I think this is gonna be fairly close to the [Ben]: kosher definition, is that a relationship can be said to be causal if you go in and intervene [Ben]: on the cause and that then changes the effect. And that opens up a whole bunch [Ben]: of questions, like how do you know whether this thing really causes something [Ben]: else, because you can't simultaneously kind of fall off a bike and not fall off [Ben]: a bike. So, yeah, I think the key is: [Ben]: if you intervene and that changes an outcome, then you can describe that relationship [Ben]: as causal. Anything else is basically statistical. There's no directionality there. [Alex]: I see. And so my kind of question would be, well, then why aren't we doing that all [Alex]: the time? And are there circumstances in which causal inference is most helpful?
[Ben]: Yeah, so in terms of why aren't we doing that all the time is an exceedingly good question, [Ben]: because the way, if you were to kind of list out the order of topics in a book, [Ben]: in terms of how people might come across it, or the level of perceived difficulty, then
[Ben]: it might be something like linear modeling type of stuff. like t-tests, ANOVAs, and then [Ben]: you might learn kind of hierarchical stuff, and then you might learn, oh, all of [Ben]: this stuff that I learned can be approached from the Bayesian perspective, and life actually [Ben]: gets much simpler if you're a Bayesian, and so you might ask kind of, why didn't they [Ben]: just teach me the Bayesian stuff to begin with? [Alex]: Mm-hmm.
[Ben]: But then, you know, you're looking toward the back of the book, the more complex chapters. [Ben]: on causality and you think, wow, this is like advanced stuff. But again, uh, I'm [Ben]: kind of asking myself, why, why isn't this stuff taught as, um, why isn't there a [Ben]: module, for example, that all undergraduates take in year one on, on causal inference, because, [Ben]: um, this stuff can actually be taught without getting that quantitative, to be honest,
[Ben]: largely. Like the majority of it can just be taught with a set of ideas and words [Ben]: and just concepts without having to get into maths or coding. [Alex]: Mm-hmm. [Ben]: That certainly does like add to your level of understanding. But yeah, I don't really [Ben]: know a good answer, but it's probably the same as why aren't we all learning Bayesian [Ben]: stuff first rather than frequentist stuff?
[Alex]: yeah and do you think it could be due to the fact that, because I mean definitely [Alex]: one of the reasons that not everybody is using Bayesian stats, well, sometimes it's [Alex]: not the right tool, but also like historically it's been a bit harder to do, now it's way easier, [Alex]: and so like with path dependency things always take a bit of time to [Alex]: compensate, right? [Ben]: Yeah.
[Alex]: So is there something related with causal inference? And it's actually a question I wanted [Alex]: to ask you, which is, mainly, what are some of the challenges or limitations that [Alex]: practitioners face when doing causal inference currently? Because that could be [Alex]: also a factor that impedes the usefulness and the utility of causal inference tools.
[Ben]: Um, yeah, so I think. So just stepping back slightly, I think that you're right, there [Ben]: is a historical kind of effect that probably explains why we don't all learn causal inference [Ben]: in primary school. [Alex]: Mm-hmm.
[Ben]: I think going forward that doesn't necessarily have to be the case because now just in the [Ben]: same way that we have really great kind of educational material for Bayesian inference [Ben]: and we have really cool kind of relatively easy to use probabilistic programming languages
[Ben]: We... That stuff can be bootstrapped, I guess, to teach you about causal inference. So [Ben]: I found, as I was kind of learning, and I'm still very much learning about causal inference, [Ben]: I found that a lot of those concepts were easier having a background in Bayesian [Ben]: inference, because you already know about the concepts of directed acyclic graphs, for
[Ben]: example, and you already know about observing variables. And so there are only like a handful [Ben]: of key concepts that you need to learn on top of that before you can start really [Ben]: diving in and start simulating real things to... to get a sense that you really understand [Ben]: what's going on.
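As a concrete aside to the simulation idea Ben mentions here, the sketch below (not from the episode; every variable name and number is invented for illustration) builds a tiny data generating process with a confounder and compares the naive observational estimate of an effect with what you get when you intervene on the cause yourself.

```python
# Minimal sketch of "simulating real things" to build causal intuition.
# Everything here is invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Data generating process: z confounds both the "treatment" x and the outcome y.
z = rng.normal(size=n)                       # confounder
x = 1.5 * z + rng.normal(size=n)             # cause, partly driven by z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)   # outcome; the true causal effect of x is 2.0

# Purely statistical view: regress y on x and ignore the confounder.
naive_slope = np.polyfit(x, y, deg=1)[0]     # roughly 3.4, biased upwards by z

# Interventional view: re-run the DGP while *setting* x ourselves (do(x)),
# which severs the z -> x link.
x_do = rng.normal(size=n)
y_do = 2.0 * x_do + 3.0 * z + rng.normal(size=n)
do_slope = np.polyfit(x_do, y_do, deg=1)[0]  # roughly 2.0, the causal effect

print(f"observational slope: {naive_slope:.2f}")
print(f"interventional slope: {do_slope:.2f}")
```

Playing with little simulations like this is a low-stakes way to see why conditioning on a variable and intervening on it are not the same thing.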
[Alex]: Yeah, I mean to me the Bayesian framework is really close already to the causal inference [Alex]: framework because that generative modeling perspective is so ingrained in the Bayesian [Alex]: framework that once you're here already, well that you already have a foot in the door [Alex]: in a way.
[Ben]: Yeah, I mean the way of kind of describing models in terms of these directed acyclic graphs [Ben]: where you have these kind of pointy arrows and in Bayesian inference that kind of implies [Ben]: causality but it doesn't kind of necessitate it as such. But yeah, once you know some of [Ben]: those concepts then it's not a whole bunch more you need to learn on top of that
[Ben]: before you can really feel like you're getting somewhere. Um, but I think I missed [Ben]: out one of your, uh, questions earlier of, uh, like under what circumstances can causal [Ben]: inference be most helpful. Um, so [Alex]: Right. [Ben]: I, I missed that before. [Alex]: I mean, I think I interrupted you. [Ben]: Ah, [Alex]: So it's all on me. [Ben]: so [Alex]: But yeah, go ahead. [Ben]: yeah, [Alex]: Good point.
[Ben]: so in terms of my own question then, I mean, I think that there are a whole bunch [Ben]: of situations where it can be incredibly useful. So the first one is where there are [Ben]: a bunch of things called statistical paradoxes. And we all know that kind of paradoxes [Ben]: typically are just not really paradoxes but just expose not really the correct understanding
[Ben]: of how things work. So you can come across certain results, certain kind of empirical [Ben]: results and relationships between variables that really make you scratch your head and [Ben]: confuse you if you're coming from purely a statistical point of view. [Alex]: Mm-hmm. [Ben]: But once you start adding on a layer of causal thinking on top of that, all of a [Ben]: sudden, there are kind of very intuitive and appealing explanations for these things
[Ben]: that previously were quite confusing. So I guess that that's like an intellectual [Ben]: and practical thing. There's just moving from a situation where you don't really understand [Ben]: the data that you're looking at. to getting a much better understanding. I think the other [Ben]: situation that I would say is really key when causal inference is most helpful is
[Ben]: when you are doing kind of high stakes interventions. So... [Ben]: If you take purely a statistical point of view, you could be trying to do something [Ben]: like predict house prices based on a whole bunch of different predictor variables. [Ben]: And let's say your data set comes entirely from a low inflation era or a low interest
[Ben]: rate era. And you, um, maybe you do have the interest rates in your model, but the, [Ben]: the way in which you've structured your, your model and your causal understanding is, [Ben]: um, purely from like a statistical perspective and not necessarily from a causal perspective. [Ben]: And so if you, uh, go in and change things, like you start, um, marketing in a different
[Ben]: way, for example. or you move into a different regime where interest rates are much [Ben]: higher, you could come up with like embarrassingly wrong predictions when you take purely a statistical [Ben]: approach. Taking a causal approach definitely does not guarantee that you're always gonna [Ben]: make fantastic predictions. But I think it would in general. make you a little bit [Ben]: less wrong than if you took purely a statistical approach.
[Alex]: Yeah, yeah, yeah, in the kind of like in the long run that should be the best bet. [Ben]: Yeah, and when the stakes are very high, then that becomes increasingly important. So [Ben]: in medical situations, for example, if you are going to intervene in the form of like [Ben]: a surgery or a lifestyle intervention, [Ben]: you don't want to be doing that on some kind of downstream effect that like treating... [Ben]: a symptom for example, rather than an underlying cause.
[Alex]: yeah for sure and I mean actually I think we're pretty well set up now to talk [Alex]: a bit more in detail about CausalPy which is one of the open source packages that [Alex]: you've single-handedly developed for all of us in the world so yeah can you give [Alex]: us an overview of CausalPy and tell us how it tries to solve these challenges [Ben]: Yeah, so, yeah, I mean, just to say, I kind of led the coding efforts, but I did
[Ben]: definitely have some input and help from people. So yeah, so CausalPy is a Python-based [Ben]: package for doing causal inference. Now, just like how Bayes is a particularly large [Ben]: area, large field of study, so is causal inference. And so we absolutely do not try [Ben]: to create one overall, overarching package that deals with all causal problems. So we [Ben]: completely ignore a whole bunch of areas in the causal space. Like one thing that we
[Ben]: completely ignore is what's called causal discovery. So this is where you may be given [Ben]: a bunch of data. and your task might be to discover what the causal relationships [Ben]: are between the variables kind of automatically. And that's something that is hard and there [Ben]: are many packages already developed that try to do that with varying levels of success.
[Ben]: Rather, CausalPy focuses on a small domain called quasi-experiments. So quasi-experiments [Ben]: are also known as natural experiments, and they differ in one key way from real experiments, [Ben]: namely that we do not have randomization in how we assign people or countries or companies, [Ben]: whatever the unit of measurement is, we do not randomly assign them to different [Ben]: groups. And so... there are many different situations where that can arise. So an example
[Ben]: might be Brexit. So if you think of Brexit as an experimental intervention and think of, [Ben]: well, what are the downstream effects of that intervention, in an ideal world, what [Ben]: you might do is conduct a randomized control trial and randomly assign Brexit type [Ben]: events to a bunch of different countries and see what happens and the idea there is [Ben]: that you're taking away the influence of confounding variables so in this Brexit
[Ben]: example you could think of each country has different compositions of Social and political [Ben]: and economic thoughts for example and different economic conditions that the people
[Ben]: are exposed to. Now, the problem, if you try to kind of ask the question of what's [Ben]: the causal impact of Brexit, is that... [Ben]: If you look at the consequences, you don't know very convincingly whether those [Ben]: consequences are due to the actual treatment, the Brexit, or the underlying confounding variables [Ben]: of the particular socioeconomic situation at the time. So there are all kinds of situations
[Ben]: like this where you just can't run a randomized control trial. And so you're left [Ben]: with... quasi-experiments, natural experiments, also known as kind of observational data sets. [Ben]: And all of these different situations can be subtly different, have different types [Ben]: of characteristics. And so there are a family of analysis methods that have evolved [Ben]: over time in order to basically help you analyze data in different types of situations.
[Ben]: CausalPy tries to bring together into one package a whole set of these things, but [Ben]: present a kind of a unified API and try to kind of strip down the ideas to kind of [Ben]: core basics so that it becomes much more approachable for people who want to run these [Ben]: kinds of analyses. [Alex]: Hmm. Yeah. That's super interesting. And, um, I mean, is there any, uh, real-world [Alex]: applications of CausalPy that you'd like to highlight today?
[Ben]: Yeah, yeah. So, aside from the kind of Brexit application, [Alex]: Mm-hmm. [Ben]: and that has actually been used, another kind of really cool real-world one, which we're [Ben]: kind of seeing as particularly relevant in the kind of marketing domain, is this idea
[Ben]: of geo-experiments. So, the idea here is that [Ben]: If you launch, let's say a marketing campaign or you do store renovations, you can't [Ben]: randomly assign which individual people are then exposed to that because presumably [Ben]: it's gonna be, if you do store renovations in an entire country, then everybody who goes [Ben]: to that store is gonna be exposed to it. Similarly, if you... launch a new advertising
[Ben]: campaign, you're not randomizing which individuals are exposed to that. The idea of [Ben]: geo experiments are basically that you have different geographical areas, some of [Ben]: which are exposed to some treatment or intervention. And I mentioned the examples
[Ben]: could be a marketing campaign or like store renovation, that kind of thing. Um, so [Ben]: the question you would have is, you know, did the intervention that maybe cost me money [Ben]: to do, uh, did that actually have a causal influence on an outcome that I'm interested [Ben]: in like sales, for example, or, uh, would that, would those changes have happened, uh, [Ben]: regardless? Um, and so that is tricky because. you don't necessarily have a random
[Ben]: assignment of which geographical region was actually exposed to the treatment. And [Ben]: you are likely to get systematic differences between each of the geographical regions. So [Ben]: it's quite likely that you have these kind of confounding variables that might influence [Ben]: the outcome, you know, other than the treatment that you're interested in. So for [Ben]: that. CausalPy can help because we have a synthetic control analysis method in there,
[Ben]: which you can basically use. And this is one of the examples that we've actually [Ben]: put into the package. [Alex]: Mm-hmm. [Ben]: So [Alex]: So [Ben]: yeah, [Alex]: yeah. [Ben]: many, many like real world applications of that. [Alex]: yeah yeah and we'll definitely put these links into the show notes, they are in the documentation [Alex]: for CausalPy, so folks, feel free to check this out. And maybe something [Alex]: I wanna make a bit more precise is that idea of quasi-experiments. So maybe can you define [Alex]: that a bit more for listeners because I think it's the first time we've mentioned [Alex]: that on the show. So yeah, like what is a quasi-experiment and when would that be [Alex]: useful? [Ben]: Yeah, so quasi-experiments are also known as natural experiments. So I think one [Ben]: of the key characteristics of, let's say, real experiments or randomized control
[Ben]: trials would be the idea of a control group and a treatment group. And so you have [Ben]: your intervention, whatever it may be, like... vitamin D supplementation as compared [Ben]: to not having vitamin D supplementation. [Alex]: Mm-hmm. [Ben]: So if you can, then the best thing to do would be to randomise which people get vitamin [Ben]: D supplementation or which people don't. And the key reason for that is because people
[Ben]: differ in very many different ways. And so by randomly allocating, rather than letting [Ben]: people decide for themselves if they want to have vitamin D, you're then essentially [Ben]: making the control group and the treatment group comparable. They will have similar characteristics.
[Ben]: And so if you look and find differences in some kind of health outcome between the [Ben]: groups, you can then attribute that purely down to whether someone was in the treatment [Ben]: versus the control group because the groups are largely identical because of the [Ben]: randomization. Now, [Ben]: a very kind of obvious situation where you can't do that is with smoking, for example.
[Ben]: So if you want to know what is the causal effect upon health of smoking, you can't come [Ben]: along and do a randomized control trial because if you really do suspect that it has [Ben]: negative health impacts, then that would be not a very friendly thing to do. And there
[Ben]: are some things where you just literally, physically cannot randomly assign things. So [Ben]: you can't randomly assign people to be born in different countries, for example, unless [Ben]: you wanna get involved in large scale baby abduction, which I'm not too into. [Alex]: and which you are not advocating on this show [Ben]: No, not at this time. Yeah, and so what you have is randomized control trials are not
[Ben]: infallible, but they're maybe the best go-to solution. But because there are many [Ben]: situations where you can't do randomization, you're then left in this situation where you [Ben]: can either say... I don't know how to make causal claims in this situation because [Ben]: of all these confounding variables. Or you can say, well, maybe there are clever things [Ben]: that I can do in order to try to make causal claims. And very often, you know, we're
[Ben]: dealing with real world, messy data, many potential confounding variables. So many [Ben]: of these approaches rely upon making certain assumptions and sometimes those assumptions [Ben]: are realistic and you can convince yourself and others that that's the case. Other times
[Ben]: not so much and maybe you can't always make causal claims. But the real reason why [Ben]: quasi-experiments are cool to know about is because very often in like real world scientific [Ben]: or industry applications, you will be in a situation where you didn't do any randomization [Ben]: and you can't do any randomization in the future. And so do you just throw your hands [Ben]: up in the air or do you start learning about quasi-experiments and explore CausalPy?
[Alex]: Mm hmm. I see. Yeah. And so I think that's, that's pretty clear. So basically, [Alex]: any of those cases, folks, if that's something you are doing, definitely check out [Alex]: CausalPy, because, well, that should help you at least jumpstart your analysis, [Alex]: because basically, as Ben was saying, it's kind of a toolbox of how to do these kind [Alex]: of experiments in PyMC, in the Bayesian framework, and actually also, if you want, I think [Alex]: there is also some scikit-learn models, right Ben? But [Ben]: Yeah, [Alex]: mainly [Ben]: that's [Alex]: out [Ben]: right. [Alex]: of the box by default it's going to be a Bayesian solution but you can do some [Alex]: comparison with some scikit-learn models if you want right [Ben]: Yeah, we've used a Trojan Horse kind of tactic very similar to how the software [Ben]: JASP implements both traditional OLS, frequentist methods, but then also Bayesian
[Ben]: stuff. So someone would be able to come along to CausalPy not really knowing much about [Ben]: Bayes and just run regular OLS type of approaches, but then if they wanted to... it [Ben]: would not be very much of a leap at all in order for them to basically just do the [Ben]: same analysis but from a Bayesian perspective. [Alex]: Yeah, I like that. I'm switching to Spanish now.
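For readers who want to see what the geo-experiment and synthetic-control workflow described above looks like in code, here is a hedged sketch along the lines of CausalPy's documented quickstart. The module paths, example dataset name ("sc"), and column names reflect the package's API around the time of this conversation and may differ in current releases, so treat it as illustrative rather than definitive.

```python
# Hedged sketch of a Bayesian synthetic control analysis with CausalPy.
# Dataset name, formula columns, and module paths follow the package's
# quickstart of roughly this era and may have changed since.
import causalpy as cp

df = cp.load_data("sc")   # example dataset bundled with CausalPy
treatment_time = 70       # time index at which the intervention happened

result = cp.pymc_experiments.SyntheticControl(
    df,
    treatment_time,
    # the treated unit's outcome is modelled as a weighted sum of untreated units
    formula="actual ~ 0 + a + b + c + d + e + f + g",
    model=cp.pymc_models.WeightedSumFitter(),
)

result.plot()     # pre-intervention fit plus the estimated causal impact
result.summary()
```

The "Trojan Horse" point Ben makes maps onto this interface directly: swapping the PyMC model for one of the package's scikit-learn style models gives an OLS-flavoured version of the same analysis with the same API.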
[Alex]: Yeah, OK, so I think that's a good wrap up of CausalPy. And... I have so many questions [Alex]: for you about causal inference, but we're going to stop there because I also want to [Alex]: talk about PyMC Marketing and maybe teaching. So yeah, PyMC Marketing, that's [Alex]: another open source package to which you've contributed. So can you talk to us about [Alex]: that? And mainly what's the difference between PyMC Marketing and
[Ben]: Yeah, so I should definitely express the appropriate humility. So I have largely [Ben]: just been involved in the kind of peripheral aspects of PyMC Marketing, working on the docs [Ben]: and some of the kind of promotional activity that we've done. So we've had a whole bunch [Ben]: of people both inside PyMC Labs and outside as well who have put a lot of time and effort [Ben]: into that. So I just want to make sure that, um, that's clear to everybody. Um, but [Ben]: PyMC Marketing has a similar kind of ambition of making, uh, kind of quite complex [Ben]: and sophisticated analysis, um, more openly available to regular data scientists. [Ben]: Uh, so this kind of evolved really out of work that PyMC Labs has done with a number [Ben]: of different companies now. So one of the great case studies that we've had was to
[Ben]: work on media mix modeling, which maybe we'll talk about, with HelloFresh. And we made [Ben]: some great improvements to an already great model that they had on that front. But [Ben]: in the meantime, since then, we've basically worked with many different companies, all interested [Ben]: in kind of marketing related questions. And what we felt was we would like to package [Ben]: these things up and make them kind of accessible. So the idea there is to just push
[Ben]: the envelope a little bit and, yeah, see what happens. So I'd say it has two core [Ben]: pillars at the moment. The first [Alex]: Mm-hmm. [Ben]: one would be this media mix modeling component. And that's all about how effective [Ben]: is my advertising? The other pillar would be customer lifetime value. And that tries [Ben]: to address questions like, on a customer level, how valuable do I expect each of these
[Ben]: people to be in terms of like ongoing revenue. Because then you can start making [Ben]: decisions like how much marketing money should you put in to begin with in order to [Ben]: acquire a customer? Because if your customers are very expensive to gain through marketing [Ben]: efforts, but you know that they have very low lifetime value, then that's not a [Ben]: winning business proposition. So. These two pillars, I think, are quite complementary.
[Ben]: So that's my quick overview of PyMC Marketing.
[Alex]: yeah I like it so same for people interested we will put all the links in the show notes [Alex]: but basically that's the quick overview and of course these two packages are open source [Alex]: so you can download them for free use them completely for free if you find bugs or [Alex]: have feature requests please open issues and even better pull requests. Maybe, yeah, [Alex]: you've mentioned it a bit, MMMs, Ben, but do you have some examples of what people [Alex]: can do with PyMC Marketing? [Ben]: Yeah, so the basic idea of a media mix model is to ask the question of, are my advertising [Ben]: dollars being spent wisely? So the kind of data set that you might be working with [Ben]: is a bunch of time series of how much money did we spend on marketing through social
[Ben]: media? How much money did we spend over time through linear TV advertising? And you [Ben]: may have many different advertising channels, and there are lots now with different types [Ben]: of social media channels for example. Now, advertising budgets can be very, [Ben]: very large, like multiple millions, tens if not hundreds of millions of dollars, and [Ben]: so you don't really want to be just having a guess of how much money you should allocate [Ben]: to each of these advertising channels. So in the past it used to be relatively easy [Ben]: to work out how effective different advertising channels were because of things like [Ben]: user tracking and cookies and so on, but with the kind of increased privacy, much of [Ben]: that information is going away. And so now the only real information you have is, [Ben]: how much did I spend? But then how many new customers did I acquire over time? Or how
[Ben]: many new, how many products did I sell over time? And so media mix modeling essentially [Ben]: is fancy, multiple linear regression, which tries to say, how many customers or how [Ben]: many sales are being driven? by each channel. So that's at its basics. It's multiple
[Ben]: linear regression. And then there are some fancy extensions that you can put on top [Ben]: of that in order to deal with more complicated aspects of how you think advertising works, [Ben]: how to deal with multiple different geographical regions or different categories of products, [Ben]: for example.
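To make the "fancy multiple linear regression" point concrete, here is a minimal hand-rolled PyMC sketch of the idea, not the pymc-marketing API itself: channel spend gets an adstock (carry-over) and saturation transform and then enters a linear model for sales. All data, priors, and transform choices below are invented for illustration.

```python
# Minimal hand-rolled MMM sketch (illustrative only; pymc-marketing wraps this
# kind of model up properly, with priors over the transforms themselves).
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n_weeks = 104
channels = ["tv", "social"]
spend = {ch: rng.gamma(2.0, 50.0, size=n_weeks) for ch in channels}

def geometric_adstock(x, decay=0.5):
    """Carry-over: this week's effect includes a decayed echo of past spend."""
    out = np.zeros_like(x)
    for t in range(len(x)):
        out[t] = x[t] + (decay * out[t - 1] if t > 0 else 0.0)
    return out

def saturate(x):
    """Diminishing returns: doubling spend does not double the effect."""
    return np.tanh(x / 100.0)

# Fake observed sales so the model runs end to end.
sales = (
    100
    + 40 * saturate(geometric_adstock(spend["tv"]))
    + 20 * saturate(geometric_adstock(spend["social"]))
    + rng.normal(0, 5, size=n_weeks)
)

with pm.Model() as mmm:
    intercept = pm.Normal("intercept", mu=100, sigma=50)
    betas = pm.HalfNormal("betas", sigma=50, shape=len(channels))
    sigma = pm.HalfNormal("sigma", sigma=10)
    # Linear regression of sales on transformed channel spend.
    mu = intercept + sum(
        betas[i] * saturate(geometric_adstock(spend[ch]))
        for i, ch in enumerate(channels)
    )
    pm.Normal("sales", mu=mu, sigma=sigma, observed=sales)
    idata = pm.sample(random_seed=1)
```

The real package layers on things like priors over the adstock and saturation parameters, seasonality, and multiple geographies, but the regression core is the same as in this sketch.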
[Alex]: yeah and that's something we've already talked a bit about on the podcast, Luciano [Alex]: Paz was here to go through the different steps that we took for the HelloFresh model [Alex]: using MMMs, also Elea McDonnell Feit was here, so I'm gonna link to [Alex]: these two episodes in the show notes for sure if people want to dig a bit deeper [Alex]: on how to use media mix models, but definitely super helpful, and the [Alex]: cool thing of PyMC Marketing is that it has those models already under the hood [Alex]: so you can already do an efficient first pass at least on these kind of analysis. [Alex]: Um, we're getting to the end of the show. So I'm actually wondering, looking [Alex]: ahead, what future directions do you see for CausalPy and PyMC Marketing and their [Alex]: integration in industry? [Ben]: Yeah, so I mean, there's so much I could say about future directions of CausalPy.
[Alex]: Hmm [Ben]: So we definitely don't want to create an unwieldy beast that just tries to have a [Ben]: go at lots of different types of causal inference. So we do want to retain some focus. [Ben]: Um, but yeah, we want to expand these methods out. Um, so hopefully make it quite [Ben]: easy for people. to create their own kind of custom models. So we basically provide [Ben]: all the hard work in terms of causal inference, and then people can come along and
[Ben]: create their own custom models. What I would really like to see is we're starting [Ben]: to see engagement by kind of individuals, data scientists, and companies now. And so [Ben]: one of the things that I'm really excited about is to basically... get engagement with [Ben]: those people and to learn how they are actually using CausalPy and then let that be
[Ben]: a little bit demand driven. But we're an open source project, of course, so we already [Ben]: have a work in progress pull request which will add meta learning capabilities. [Ben]: So that's another causal inference method, which is a little bit distinct from what we've [Ben]: got so far. In terms of integration, sorry.
[Alex]: Now yeah, maybe are you ready to talk about the do operator that is under [Alex]: development for PyMC, because it's related to CausalPy? [Ben]: Yeah, so the do operator basically relates back to this idea of making an intervention [Ben]: that I mentioned before. And this is quite an important component when you're trying [Ben]: to analyze the causal effect of a certain intervention. And you can do this in PyMC right
[Ben]: now, but it maybe does involve a slight kind of workaround in order to make [Ben]: it work. So if people are interested in how you can do that, then we've got, well, [Ben]: you can look at the CausalPy code now and there are also a number of really clear [Ben]: examples on the PyMC examples repository that you can get through the PyMC website that [Ben]: show how to do it. But very soon, hopefully a matter of weeks, we will get an official [Ben]: pm.do operator. And this is gonna be really cool because what you can do is basically [Ben]: create a causal model, like a regular PyMC model that you think describes your data [Ben]: generating process. You can then pass that model into pm.do, and it will basically [Ben]: do graph surgery because PyMC has the concept of a symbolic graph and it will allow [Ben]: you to go into particular variables and intervene and set them to particular values.
[Ben]: For example, is someone in a treatment condition or a control condition? And it will [Ben]: basically give you back out a mutilated graph which is the result of the do operation [Ben]: and then you can use that in order to basically make your interventional and counterfactual [Ben]: predictions. So that will be really cool when that comes out. [Alex]: Yeah, that's so cool. Can't wait to have that. And we love to see the, like, all
[Alex]: of these coming together, right, for the PyMC ecosystem, basically. That's pretty [Alex]: amazing. That's really awesome. So, yeah, thank you to all of you working on that.
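Since pm.do had not shipped at the time of this conversation, the sketch below is an assumption about how the interface Ben describes can be used, not a definitive example; check the current PyMC documentation for the released API. The model, effect size, and variable names are all invented for illustration.

```python
# Hedged sketch of graph surgery with a do-operator in PyMC. The pm.do call and
# its signature are assumed from the discussion above; verify against the docs.
import pymc as pm

# A small causal model: a confounder z drives both treatment assignment and outcome.
with pm.Model() as model:
    z = pm.Normal("z", 0.0, 1.0)
    treatment = pm.Bernoulli("treatment", p=pm.math.sigmoid(z))
    outcome = pm.Normal("outcome", mu=2.0 * treatment + z, sigma=1.0)

# Graph surgery: replace the treatment node with a constant, cutting the z -> treatment edge.
model_treated = pm.do(model, {treatment: 1})
model_control = pm.do(model, {treatment: 0})

# Interventional predictions under each arm of the intervention.
with model_treated:
    idata_1 = pm.sample_prior_predictive(random_seed=1)
with model_control:
    idata_0 = pm.sample_prior_predictive(random_seed=1)

ate = float(idata_1.prior["outcome"].mean() - idata_0.prior["outcome"].mean())
print(f"average effect under do(treatment): {ate:.2f}")  # ~2.0 by construction
```

This is the interventional-prediction workflow Ben describes; counterfactual questions follow the same pattern of sampling from the surgered graph.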
[Alex]: Of course, we'll definitely talk about that on the show, whether it is a classic episode [Alex]: or even, I think it would be even better, a webinar format, this format I just [Alex]: started where you, the guest, come on the show and share your screen and [Alex]: go through a model. I think to demonstrate the do operator that would be the best [Alex]: thing, so once it's in the package then you should probably come back to the show in [Alex]: a webinar and walk us through one or two models and people will ask you live questions. [Alex]: It'll show up on your screen. And I think that would be the best format to introduce [Alex]: people to that operator and the kind of thinking and the kind of features that now [Alex]: will be available in PyMC. So yeah, and FYI, people, those webinars are open [Alex]: to everybody. But you get at least a 50% discount on these webinars if you are [Alex]: a patron, and a lot of you can just access them for free and you get access to the recording [Alex]: one month before everybody. So subscribe, support the show on Patreon, that helps a lot [Alex]: actually for the editing, for a lot of stuff. And actually now we're recording the [Alex]: show on Riverside. And so that makes everything easier, especially for the guests.
[Alex]: So I have to pay for Riverside. It is not cheap, but I find it makes the experience... [Alex]: way better for everybody. So thank you to all of you who are contributing every month, [Alex]: even if it's a small amount. I am eternally, eternally grateful. That helps not only for [Alex]: editing, but for making the whole show better. So again, thanks a lot to everybody. [Alex]: And oh yeah, Ben, I wanted to ask you about education a bit before closing up the [Alex]: show,
[Alex]: We often teach workshops together, which is always a lot of fun. And you've actually [Alex]: taught, very recently at the time of recording this podcast, the first causal inference [Alex]: module that you developed. And I'm curious, what would you say are the [Alex]: key skills to develop to start learning Bayesian stats? [Ben]: So if I were to go back in time and give myself advice, number one would be don't just
[Ben]: buy the book and let it sit on your bookshelf. You actually have to dedicate time to engage [Ben]: with it. [Alex]: Hehehe [Ben]: But for all of you with more willpower than I have, I think one of the key things [Ben]: that works for me is to kind of really engage with whatever book and educational material
[Ben]: you have. And what I mean by that is don't just kind of passively read but actually [Ben]: code along, experiment, work through the examples that you're given because that [Ben]: turns it very much into play and just in the same way that... how it was when I was [Ben]: learning BBC BASIC back when I was very young by experimenting and playing around. [Ben]: It allows you to see where you have gone wrong, where you understand correctly and where [Ben]: you have misunderstood things.
[Alex]: Yeah, yeah, for sure. That's... I would second that and say that's definitely a very [Alex]: important skill to develop. And... Something like... Have you noticed any common [Alex]: difficulties, uh, Bayesian beginners usually face when you teach Bayesian stats? [Alex]: Or more generally, what do you think the biggest obstacles in the Bayesian workflow [Alex]: are currently? [Ben]: So,
[Ben]: this might sound provocative, but it's not meant to be. I think one of the biggest [Ben]: challenges is the fact that people have been exposed to frequentist ideas for so long. [Alex]: Mm-hmm. [Ben]: That's not to say that they're wrong, but they do shape your thinking. And so there [Ben]: can be quite an intellectual twist that happens when you get exposed to... Bayesian
[Ben]: ideas. So I think that just running simulations, simulating data, just using whatever programming [Ben]: language you have is probably a good way to kind of start making that that conceptual [Ben]: twist in your mind. Because once you see that oh, I can create a data generating [Ben]: process by using random numbers and having some variables depend on other ones, I think
[Ben]: that really helps understand the Bayesian concepts. But yeah, in the past, I would have [Ben]: said that the rate limiting factor would have been availability of decent intro level [Ben]: books and the packages, but that is definitely not the case anymore. So there are tons of [Ben]: kind of really good... intro to intermediate level Bayesian resources out there. And just [Ben]: to kind of interpret your question a little bit more, I'd say that we're maybe just
[Ben]: at the beginning of that similar transition with causality. So I think that there's lots [Ben]: of potential for packages, maybe like CausalPy, but also educational material, to [Ben]: make it much easier for practitioners to get a grip on things. [Alex]: Yeah, that's true. I had not seen it like that, but that's a good point about where [Alex]: we are with causal inference also. For sure that's something very interesting and
[Alex]: something I will be very curious about also when we teach this material more frequently [Alex]: in the workshops, because that's definitely a topic that's very much on top of people's [Alex]: minds a lot, this idea of interpretable models and white box models, which seems to [Alex]: be very important for people, which I understand, I definitely share that concern. [Alex]: So yeah, cool. Maybe before asking you the last two questions, looking ahead again, [Alex]: I'm wondering if there are some areas that you are most excited [Alex]: about for Bayesian stats in the future, and what would you like to see and what would [Alex]: you like to not see? [Ben]: I think, um, kind of two things that I'm excited about, and this is not to say that [Ben]: I have any, um, like specialist knowledge or expertise in these areas, but, um, one thing [Ben]: that I'm really interested in is what becomes possible in a probabilistic programming
[Ben]: language when you have a language like Julia. So what I mean by that is, um, in order [Ben]: for... packages to do Bayesian inference, we need gradient information. And at the risk [Ben]: of just diving in a bit, in PyMC, we calculate gradient information by having a [Ben]: graph which allows you to do auto differentiation to calculate the gradients. And that's cool. [Ben]: but it does kind of mean if you want to do anything exotic, you have to learn a
[Ben]: library that allows you to do that. So right now, the one that we're relying on is [Ben]: PyTensor. Now in Julia, you still have to find gradient information if you want [Ben]: to do efficient Bayesian inference. But now the thing that is slightly different is [Ben]: that there are packages in Julia which allow you to do auto-differentiation on base Julia [Ben]: code. And what that means is for the user, they don't then have to learn a kind [Ben]: of a custom package that does auto-differentiation like PyTensor. They could just learn base Julia, [Ben]: write some kind of like reasonably arbitrary model, and then just run inference on it. So [Ben]: that to me is kind of really exciting in terms of making like relatively complicated [Ben]: models accessible quite easily. [Ben]: The other thing I'm interested in, and I think that... what PyMC is kind of making
[Ben]: great progress here is doing operations on the graphs. So [Alex]: Yeah. [Ben]: all PPLs basically give you a log probability and hopefully gradient information, but not [Ben]: all of them have kind of like an underlying explicit graph structure. But when you do have [Ben]: that graph structure there, then what that means is you are able to go in and do
[Ben]: surgery on the graphs. We've talked about one clear example of that with the do operator, [Ben]: where you can go in and replace a random variable with a constant basically, and cut [Ben]: nodes into this thing that you're intervening on. But other applications of this would be
[Ben]: graph simplification. So in the case where you have conjugate priors, for example, [Ben]: you could presumably write some code to find situations where you can go in and [Ben]: massively simplify the graph and so minimize the amount of computation time that you have [Ben]: to do. So I don't do any of this stuff, I don't implement any of it, but those are [Ben]: two things that I'm excited about at the moment.
[Alex]: Yep, I understand that, that does sound super cool. Yeah, yeah, I love it. And [Alex]: it's true that that limitation of where you have to learn another package all the time [Alex]: can definitely be a limitation. And that's also why, to basically loop back over [Alex]: the whole show, developing that ability and that skill to fail comfortably and quickly [Alex]: is really important, because then you are more comfortable learning things that make
[Alex]: you very uncomfortable at the beginning. Any other topic that I didn't ask you about [Alex]: and that you'd like to mention? [Ben]: No, I think we've done a really good job at covering quite a lot of different things. [Alex]: Yeah, I think that was quite a lot. Thanks a lot for that comprehensive overview, [Alex]: Ben. So before letting you go, let me ask you the last two questions I ask every
[Alex]: guest. That's better at the end of the show. So first one, if you had unlimited time [Alex]: and resources, which problem would you try to solve? [Ben]: Um, so I think it depends. Am I the only one who has unlimited time and resources or [Ben]: does do other [Alex]: Yeah. [Ben]: people? [Alex]: Um, let's say that's just you. [Ben]: Okay, so that puts quite a lot of pressure on me because [Alex]: Mm-hmm.
[Ben]: that would suggest that I focus on the problems that will benefit the most people. [Ben]: So if that was true, I guess [Ben]: I'd have to kind of say something like creating a circular sustainable economy because [Ben]: as someone I don't know who likes to say, anything that's unsustainable will not [Ben]: be sustained. So zooming out, taking a kind of a sci-fi view or long term view of [Ben]: human history, we clearly need to live within the kind of energy and material boundaries
[Ben]: of the planet. Otherwise things don't look so great. [Alex]: That is, unfortunately I would say, a common answer to that question. The cool thing is [Alex]: that a lot of people answer that. So you folks can all have unlimited time and resources [Alex]: to work on that. Second question, if you could have dinner with any great scientific [Alex]: mind, dead, alive or fictional, who would it be? [Ben]: So, I think maybe because I don't have like a formal statistical training, I didn't
[Ben]: feel like I had a great wealth of knowledge to really pluck someone out of the air. So [Ben]: I'm going to be slightly controversial in that I don't know if you could classify
[Ben]: them as a scientist, but maybe the Buddha. I think that would be... interesting. You could [Ben]: claim that, you know, they were empirical because they're paying attention to what happens [Ben]: within one's mind, for example, and you could argue that they were experimental because [Ben]: they tried different approaches to life and meditation and kind of saw that some of [Ben]: those things didn't work and then moved on to a different approach. So I don't know
[Ben]: if anyone's going to buy the fact that the Buddha might be a scientist. I'm not really [Ben]: sure. But I'm not really sure how much they would speak either to be honest. But I [Ben]: think it would be an interesting thing to write a blog post about. [Alex]: Yeah, probably they would ask you a lot of questions. Probably that. But definitely
[Alex]: an interesting conversation. I like that answer. I would not say that qualifies [Alex]: as a scientific mind, but I do like the thought that you give to the question. [Alex]: It's definitely interesting. And that would make for a very interesting dinner for [Alex]: sure. so yeah thanks for that answer Ben you are definitely the first one to answer [Alex]: that [Ben]: Yeah. Great.
[Alex]: okay awesome well I think that's it let's call it a show thanks a lot Ben I learned [Alex]: a lot honestly, really loved it, I'm really happy that we managed to do that [Alex]: episode, so many things. Um, as we said, let's do a webinar once the do operator [Alex]: is out in PyMC and we'll walk you folks through the webinar so that [Alex]: you can readily use that afterwards in your own modeling, and other than that as [Alex]: usual I will put resources and a link to your website in the show notes for those [Alex]: who want to dig deeper and connect with you. Thank you again, Ben, for taking the time [Alex]: and being on this show. [Ben]: Yeah, thank you very much for having me, it's fun.