
#97 Probably Overthinking Statistical Paradoxes, with Allen Downey

Jan 09, 2024 · 1 hr 13 min · Season 1 · Ep. 97

Episode description

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!


In this episode, I had the pleasure of speaking with Allen Downey, a professor emeritus at Olin College and a curriculum designer at Brilliant.org. Allen is a renowned author in the fields of programming and data science, with books such as "Think Python" and "Think Bayes" to his credit. He also authors the blog "Probably Overthinking It" and has a new book by the same name, which he just released in December 2023.

In this conversation, we tried to help you differentiate between right and wrong ways of looking at statistical data, discussed the Overton paradox and the role of Bayesian thinking in it, and detailed a mysterious Bayesian killer app!

But that’s not all: we even addressed the claim that Bayesian and frequentist methods often yield the same results — and why it’s a false claim. If that doesn’t get you to listen, I don’t know what will!

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Links from the show:

Transcript

In this episode, I had the pleasure of speaking with Allen Downey, a professor emeritus at Olin College and a curriculum designer at Brilliant.org. Allen is a renowned author in the fields of programming and data science, with books such as Think Python and Think Bayes to his credit. He also authors the blog Probably Overthinking It, and has a new book by the same name, which he just released in December 2023.

In this conversation, we tried to help you differentiate between right and wrong ways of looking at statistical data, we discussed the Overton paradox and the role of Bayesian thinking in it, and we detailed a mysterious Bayesian killer app. But that is not all. We even addressed the claim that Bayesian and frequentist methods often yield the same results, and why it is a false claim. If that doesn't get you to listen, I don't know what will.

This is Learning Bayesian Statistics, episode 97, recorded October 25, 2023. Hello, my dear Bayesians! I have two announcements for you today. First, congratulations to the 10 patrons who won a digital copy of Allen's new book. The publisher will soon get in touch and send you the link to your free digital copy. If you didn't win, well, you still won, because you get a 30% discount if you order with the discount code UCPNew from the UChicago Press website. I put the link in the show notes, of course.

Second, a huge thank you to Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie and Cory Kiser for supporting the show on Patreon. I can assure you, this is the best way to start the year. Thank you so much for your support. It literally makes this show possible and it made my day. Now onto the show with Allen Downey. Show you how to be a good Bayesian and change your predictions. Allen Downey, welcome back to Learning Bayesian Statistics. Thank you. It's great to be here.

Yeah, thanks again for taking the time. And so for people who know you already, or are getting to know you: Allen was already on Learning Bayesian Statistics in episode 41. And so if you are interested in a bit more detail about his background and also much more about his previous book, Think Bayes, I recommend listening back to episode 41, which will be in the show notes. Today we'll focus on other topics, especially your new book, Allen. I don't know how you do that.

But well done, congratulations on, again, another great book that's coming out. But first, maybe a bit more generally, how do you define the work that you're doing nowadays and the topics that you're particularly interested in? It's a little hard to describe now because I was a professor for more than 20 years. And then I left higher ed about a year, a year and a half ago.

And so now my day job, I'm at brilliant.org and I am writing online lessons for them in programming and data science, which is great. I'm enjoying that. Yeah. Sounds like fun. It is. And then also working on these books and blogging. And I think of it now as almost being like a gentleman scientist or an independent scientist. I think that's my real aspiration. I want to be an 18th century gentleman scientist. I love that. Yeah, that sounds like a good objective.

Yeah, it definitely sounds like fun. It also sounds a bit similar to what I'm doing on my end with the podcast and also the online courses for Intuitive Bayes. And also I teach a lot of the workshops at PyMC Labs. So yeah, a lot of teaching and educational content on my end too, which I really love. So that's also why I do it. And yeah, it's fun because most of the time, like, you start teaching a topic and that's a very good incentive to learn it in a lot of detail.

Right. So, lately I've been myself diving way more into Gaussian processes again, because this is a very fascinating topic, but quite complex, and causal inference also, I've been reading up again on this. So it's been quite fun. What has been on your mind recently? Well, you mentioned causal inference and that is certainly a hot topic. It's one where I always feel I'm a little bit behind. I've been reading about it and written about it a little bit, but I still have a lot to learn.

So it's an interesting topic. Yeah, yeah, yeah. And the cool thing is that honestly, when you're coming from the Bayesian framework, to me that feels extremely natural. Some concepts are the same, but they're just named differently, so you just have to make the connection in your brain. And some of them are somewhat new.

But if you've been doing generative modeling for a while, then just coming up with the directed acyclic graph for your model and just updating it from a generative perspective and doing counterfactual analysis, it's really something you already do in the Bayesian workflow. So that really helps you. To me, you already have the foundations. And you just have to, well, kind of add a bit of a toolbox to it, you know, like, OK, so what's regression discontinuity design?

What's interrupted time series? What's difference-in-differences? Things like that. But these are kind of just techniques that you add on top of the foundations, and the concepts are pretty easy to pick up if you've been a Bayesian for a while. I guess that's really the good news for people who are looking into that. It's not completely different from what you've been doing. No, I think that's right.

And in fact, I have a recommendation for people if they're coming from Bayes and getting into causal inference. Judea Pearl's book, The Book of Why, follows exactly the progression that you just described because he starts with Bayesian nets and then says, well, no, actually, that's not quite sufficient. Now for doing causal inference, we need the next steps. So that was his professional progression. And it makes, I think, a good logical progression for learning these topics. Yeah, exactly.

And well, funny enough, I've started rereading The Book of Why recently. I had read it like two, three years ago and I'm reading it again because surely there are a lot of things that I didn't pick up at the time, didn't understand. And there's some stuff that's going to resonate with me more now that I have a bit more background, let's say, or... some other people would say more wrinkles on my forehead, but I don't know why they would say that.

So, Allen, already getting off topic, but yeah, I really love that. The causal inference stuff has been fun. I'm teaching that next Tuesday. First time I'm going to teach three hours of causal inference. That's going to be very fun. I can't wait for it. Like, you try to study the topic and there are all these angles to consider, and then a student will come up with a question that you're like, huh, I did not think about that. Let me come back to you. That's really the fun stuff to me.

As you say, I think every teacher has that experience that you really learn something when you teach it. Oh yeah. Yeah, yeah. I mean, definitely. That's really one of the best ways for me to learn. Having a deadline, first: I have to teach that stuff. And then having a way of talking about the topic, whether that's teaching or presenting, is really one of the most efficient ways of learning, at least to me. Because I don't have the personal discipline to just learn for the sake of learning.

That doesn't really happen for me. Now, we might not be as off topic as you think, because I do have a little bit of causal inference in the new book. Oh, yeah? I've got a section that is about collider bias. And this is an example where if you go back and read the literature in epidemiology, there is so much confusion. The low birth weight paradox was one of the first examples, and then the obesity paradox and the twin paradox. And they're all baffling.

if you think of it in terms of regression or statistical association, and then once you draw the causal diagram and figure out that you have selected a sample based on a collider, the light bulb goes on and it's, oh, of course, now I get it. This is not a paradox at all. This is just another form of sampling bias. What's a collider for the, I was going to say the students, for the listeners? And also then what does collider bias mean and how do you get around that?

Yeah, no, this was really interesting for me to learn about as I was writing the book. And the example that I started with is the low birth weight paradox. And this comes from the 1970s. It was a researcher in California who was studying low birth weight babies and the effect of maternal smoking. And he found out that if the mother of a newborn baby smoked, it was more likely to be low birth weight. And low birth weight babies have worse health outcomes, including higher mortality.

But what he found is that if you zoom in and you just look at the low birth weight babies, you would find that the ones whose mother smoked had better health outcomes, including lower mortality. And this was a time, this was in the 70s, when people knew that cigarette smoking was bad for you, and public health campaigns were encouraging people to stop smoking, especially mothers.

And then this article came out that said that smoking appears to have some protective effect for low birth weight babies. That in the normal range of birth weight, it appears to be minimally harmful, and for low birth weight babies, it's good. And so he didn't quite recommend maternal smoking, but he almost did. And there was a lot of confusion. I think it wasn't until the 80s that somebody explained it in terms of causal inference.

And then finally in the 90s someone was able to show, using data, that not only was this a mistake, but you could put the numbers on it and say, look, this is exactly what's going on. If you correct for the bias, you will find that, not surprisingly, smoking is bad across the board, even for low birth weight babies. So the explanation is that there's a collider, and a collider in a causal graph means that there are two arrows

coming into the same box, meaning two potential causes for the same thing. So in this case, it's low birth weight. And here's what I think is the simplest explanation of the low birth weight paradox, which is there are two things that will cause a baby to be low birth weight, either the mother smoked or there's something else going on like a birth defect. The maternal smoking is relatively benign. It's not good for you, but it's not quite as bad as the other effects.

So you could imagine being a doctor. You've been called in to treat a patient. The baby is born at a low birth weight. And now you're worried. You're saying to yourself, oh, this might be a birth defect. And then you find out that the mother smoked. You would be relieved, because that explains the low birth weight and it decreases the probability that there's something else worse going on. So that's the effect.

And again, it's caused because when they selected the sample, they selected low birth weight babies. So in that sense, they selected on a collider. And that's where everything goes wrong. Yeah. And I find that really interesting and fascinating, because in a way it comes down to a bias in the sample here. And here, you don't really have any way of doing the analysis without going back to the data collection step.

But also, colliders are very tricky in the sense that, so you have that path, as you were saying. So the collider is a common effect of two causes. And the two causes can be completely unrelated. As is often said, if you control for the collider, then it's going to open the path and it's going to allow information to flow from, let's say, X to Y, where C is the collider. X is not related to Y in the causal graph.

But if you control for C, then X is going to become related to Y. That's really the tricky thing. That's why we're telling people, do not just throw predictors at random in your models when you're doing linear regression, for instance. Because if there is a collider in your graph, and very probably there is one at some point if it's a complicated enough situation, then you're going to have spurious statistical correlations which are not causal.

But you've created that by basically opening the collider path. So the good news is that the path is closed, if you like, naturally. So if you don't control for that, if you don't add that in your model, you're good. But if you start adding predictors all over the place, you're very probably going to create collider biases like that. So that's why it's not as easy when you have a confounder, which is kind of the opposite situation.

So let's say now C is the common cause of X and Y. Well, then if you have a confounder, you want to block the path that's going from X to Y through C, to see if there is a direct path from X to Y. Then you want to control for C. But if it's a collider, you don't. So that's why, like, don't control for everything. Don't put predictors all over the place, because that can be very tricky.
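To make the collider point concrete, here is a minimal simulation sketch (not from the episode; the variable names and numbers are made up): X and Y are generated independently, both feed into the collider C, and selecting the sample on C creates a spurious association between them.

```python
# Minimal sketch (made-up data): conditioning on a collider
# induces a spurious association between two independent causes.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

x = rng.normal(size=n)            # cause 1 (e.g., a standardized exposure)
y = rng.normal(size=n)            # cause 2, generated independently of x
c = x + y + rng.normal(size=n)    # collider: both arrows point into c

print(np.corrcoef(x, y)[0, 1])    # ~0: x and y are unrelated in the full population

selected = c < -1                 # "select on the collider" (keep only low-c cases)
print(np.corrcoef(x[selected], y[selected])[0, 1])  # clearly negative: spurious association
```

In the selected subsample, knowing that X is high makes a low Y more likely (and vice versa), which is the same mechanism as the low birth weight example above.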

Yeah, and I think that's a really valuable insight, because when people start playing with regression, sure, they just, you know, add more to the model, more is better. And yes, once you think about colliders and mediators, and I think this vocabulary is super helpful for thinking about these problems, you know, understanding what should and shouldn't be in your model if what you're trying to do is causal. Yeah. And that's also definitely something I can see a lot.

It depends on where the students are coming from. But yeah, where it's like they show me a regression with, I don't know, 10 predictors already. And then I can tell the model doesn't really make sense. I'm like, wait, did you try with fewer predictors? Like, did you first do the model with just an intercept and then build up from that? And no, often it turns out it's the first version of the model with 10 predictors. So you're like, oh, wait.

Look at that again from another perspective, from a more minimalist perspective. But that's awesome. I really love that you're talking about that in the book. I recommend people then looking at it, because it's not only very interesting, it's also very important if you're looking into, well, are my models telling me something valuable? Are they helping me understand what's going on, or is it just something that helps me predict better? But other than that, I cannot say a lot.

So definitely, listeners, refer to that. And actually, the publisher was really kind to me and Allen, because, well, first, 10 of the patrons are going to get the book for free at random. So thank you so much. And with the link that you have in the show notes, you can buy the book at a 30% discount. So even if you don't win, you will win. So definitely go there and buy the book, or if you're a patron, enter the random draw, and we'll see what randomness has in store for you.

And actually, so we already started diving into one of your chapters, but maybe let's take a step back: can you provide an overview of your new book, Probably Overthinking It, and what inspired you to write it? Yeah, well, Probably Overthinking It is the name of my blog from more than 10 years ago. And so one of the things that got this project started was kind of a greatest hits from the blog.

There were a number of articles that had either gotten a lot of attention or where I thought there was something really important there that I wanted to collect and present a little bit more completely and more carefully in a book. So that's what started it. And it was partly like a collection of puzzles, a collection of paradoxes, the strange things that we see in data. So like collider bias, for which Berkson's paradox is the other name. There's Simpson's paradox.

There's one paradox after another. And that's when I started, I thought that was what the book was going to be about. It was, here are all these interesting puzzles. Let's think about them. But then what I found in every chapter was that there was at least one example that bubbled up where these paradoxes were having real effects in the world. People were getting things genuinely wrong.

And those errors had consequences for public health, for criminal justice, for all kinds of real things that affect real lives. And that's where the book kind of took a turn toward not so much the paradox because it's fun to think about, although it is, but the places where we use data to make better decisions and get better outcomes. And then a little bit of the warnings about what can go wrong when we make some of these errors.

And most of them boil down, when you think about it, to one form of sampling bias or another. The subtitle of this book should be something like 12 chapters of sampling bias. Yeah, I mean, that's really interesting to see that a lot of problems come from sampling biases, which is almost disappointing in the sense that it sounds really simple. But I mean, as we can see in your book, it's maybe easy to understand the problem, but then solving it is not necessarily easy. So that's one thing.

And then I'm wondering, how would you say Probably Overthinking It helps the readers differentiate between the right and wrong ways of looking at statistical data? Yeah, I think there are really two messages in this book. One of them is the optimistic view that we can use data to answer questions and settle debates and make better decisions, and we will be better off if we do. And most of the time, it's not super hard.

If you can find or collect the right data, most of the time you don't need fancy statistics to answer the questions you care about. And usually with a good data visualization, you can show what you want to show in a compelling way. So that's the good news. And then the bad news is these warnings. I think the key to these things is to think about them and to see a lot of examples. And I'll take Simpson's paradox as an example.

If you take an intro stats class, you might see one or two examples. And I think you come away thinking that it's just weird, like, oh, those were really confusing and I'm not sure I really understand what's happening. Whereas at some point you start thinking about Simpson's paradox and you just realize that there's no paradox there. It's just a thing that can happen, because why not? If you have different groups and you plot a line that connects the two groups, that line might have one slope.

And then when you zoom in and look at one of those groups in isolation and plot a line through it, there's just no reason that second line within the group should have the same slope as the line that connects the different groups. And so I think that's an example where when you see a lot of examples, it changes the way you think about the thing. Not from, oh, this is a weird, confusing thing, to, well, actually, it's not a thing at all.
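As a quick illustration of the slope reversal Allen describes, here is a sketch with synthetic data (made-up numbers, not from the book): each group has a negative within-group slope, but the line through the pooled data slopes upward.

```python
# Sketch with synthetic data: within each group the slope is negative,
# but the line fit to the pooled data has a positive slope (Simpson's paradox).
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 5, 200)
y1 = 2 - 0.5 * x1 + rng.normal(0, 0.5, 200)   # group 1: negative slope
x2 = rng.uniform(5, 10, 200)
y2 = 8 - 0.5 * x2 + rng.normal(0, 0.5, 200)   # group 2: negative slope, higher intercept

slope1 = np.polyfit(x1, y1, 1)[0]
slope2 = np.polyfit(x2, y2, 1)[0]
pooled = np.polyfit(np.concatenate([x1, x2]), np.concatenate([y1, y2]), 1)[0]
print(slope1, slope2, pooled)                  # ~-0.5, ~-0.5, and a positive pooled slope
```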

The only thing that was confusing is that my expectation was wrong. Yeah, true. Yeah, I love that. I agree. I always found it a bit weird to call all these phenomena paradoxes, in a way. Because as you're saying, it's more the prior expectation that makes it a paradox. Whereas, why should nature obey our simple minds and priors? There is nothing that says it should. And so most of the time, it's just that, well, reality is not the way we thought it was.

That's OK. And I mean, in a way, thankfully, otherwise it would be quite boring. But yeah, that's a bit like when data is dispersed a lot, there is a lot of variability in the data, and then we tend to say the data is overdispersed, which I always find weird. It's like, well, it's not the data that's overdispersed. It's the model that's underdispersed. The data doesn't have to do anything. It's the model that has to adapt to the data. So just adapt the model.

But yeah, it's a fun way of phrasing it, whereas it's like it's the data's fault. But no, not really. It's just, well, it's just a lot of variation. And that made me think, actually, the Simpson's paradox also made me think about: did you see that recent paper, I mean from this year, so it's quite recent, from Andrew Gelman, Jessica Hullman, and Lauren Kennedy about the causal quartets? No, I missed it. Awesome, well, I'll send that your way and I'll put that in the show notes.

But basically the idea is taking that kind of paradox, but instead of looking at it from a correlation perspective, looking at it from a causal perspective. So it's basically like Anscombe's quartet, where you have four different datasets and you get the same correlation between them; well, here you have four different causal structures that give you different data points.

But if you just look at the average treatment effect, you will think that it's the same for the four, whereas it's not. You know, so the point is also, well, that's why you should not only look at the average treatment effect, right? Look at the whole distribution of treatment effects. Because if you just look at the average, you might be in a situation where the population is really not diverse, and then, yeah, the average treatment effect is something representative.

But what if you're in a very dispersed population and the treatment effects can be very negative or very positive? Then if you look at the averages, it looks like there is no average treatment effect. So then you could conclude that there is no treatment effect, whereas there is actually a big treatment effect, just that when you look at the average, it cancels out. So yeah, that's the main idea of the paper.

And that's, I mean, I think this will be completely trivial to you, but I think it's a good way of teaching this, where, if you just look at the average, you can get bitten by that later on. Because basically, if you average, you summarize. And if you summarize, you're losing some information somewhere. You have to cut some dimension of information to average, naturally. So if you do that, it comes at a cost. And the paper does a good job at showing that.
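A tiny numerical sketch of that point (made-up numbers, not from the Gelman, Hullman, and Kennedy paper): two populations with essentially the same average treatment effect but completely different distributions of individual effects.

```python
# Sketch: identical average treatment effects can hide very different realities.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Population A: everyone has a small effect near zero.
effects_a = rng.normal(loc=0.0, scale=0.05, size=n)

# Population B: half are strongly helped, half are strongly harmed.
effects_b = np.where(rng.random(n) < 0.5, 2.0, -2.0)

print(effects_a.mean(), effects_b.mean())   # both averages are ~0
print(effects_a.std(), effects_b.std())     # very different spreads
print((effects_b > 1).mean())               # ~50% of B has a large positive effect
```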

Yes, that's really interesting because maybe coincidentally, this is something that I was thinking about recently, looking at the evidence for pharmaceutical treatments for depression. There was a meta-analysis a few months ago that really showed quite modest treatment effects, that the average is not great.

And the conclusion that the paper drew was that the medications were effective for some people and they said something like 15%, which is also not great, but effective for 15% and ineffective or minimally effective for others.

And I was actually surprised by that result because it was not clear to me how they were distinguishing between having a very modest effect for everybody or a large effect for a minority that was averaged in with a zero effect for everybody else, or even the example that you mentioned, which is that you could have something that's highly effective for one group and detrimental for another group. And exactly as you said, if you're only looking at the mean, you can't tell the difference.

But what I don't know and I still want to find out is in this study, how did they draw the conclusion that they drew, which is they specified that it's effective for 15% and not for others. So yeah, I'll definitely read that paper and see if I can connect it with that research I was looking at. Yeah. Yeah, I'll send it to you and I already put it in the show notes for people who want to dig deeper.

And I mean, that's a very common pitfall, especially in the social sciences, where doing big experiments with lots of subjects is hard and very costly. And so often you're doing inferences on very small groups. And that's even more complicated to just look at the average treatment effect. It can be very problematic. And interestingly, I talked about that. I mentioned that paper first in episode 89 with Eric Trexler, who works on the science of nutrition and exercise, basically.

So in this field especially, it's very hard to have big samples when they do experiments. And so most of the time, they have 10, 20 people per group. And each time I read that literature, first, they don't use Bayesian stats a lot. And I'm like, with such low sample sizes, you really should: use BRMS, use Bambi if you don't really know how to do the models, but really, you should. And also, if you do that and then you also only look at the average treatment effects,

I'm guessing you have big uncertainties on the conclusions you can draw. So yeah, I will put that episode also in the show notes, since I referred to it. That was a very interesting episode where we talked about exercise science, nutrition, how that relates to weight management, and also, from an anthropological perspective, how the body reacts to these effects. The body

mostly will fight you when you're trying to lose a lot of weight, but doesn't really fight you when you gain a lot of weight. And that's also very interesting to know about, especially with the rampant obesity in Western societies, which is really concerning. And so this science helps understand what's going on and also how we can help people get onto trajectories that are better for their health, which is the main point, basically, of that research.

I'm also wondering, about your book, when you wrote it, and especially now that you've written it, what do you see as the key takeaways for readers? And especially for readers who may not have a strong background in statistics. Part of it is I hope that it's empowering, in the sense that people will feel like they can use data to answer questions. As I said before, it often doesn't require fancy statistics. So there are two parts of this, I think.

And one part is, as a consumer of data, you don't have to be powerless. You can read data journalism and understand the analysis that they did, interpret the figures, and maintain an appropriate level of skepticism. In my classes, I sometimes talk about this as a skeptometer: if you believe everything that you read, that is clearly a problem.

But at the other extreme, I often encounter students who have become so skeptical of everything that they read that they just won't accept an answer to a question ever. Because there's always something wrong with a study. You can always look at a statistical argument and find a potential flaw. But that's not enough to just dismiss everything that you read.

If you think you have found a potential flaw, there's still a lot of work to do to show that actually that flaw is big enough to affect the outcome substantially. So I think one of my hopes is that people will come away with a well-calibrated skeptometer, which is to look at things carefully and think about the kinds of errors that there can be, but also take the win. If we have the data and we come up with a satisfactory answer, you can accept that question as provisionally answered.

Of course, it's always possible that something will come along later and show that we got it wrong, but provisionally, we can use that answer to make good decisions. And by and large, we are better off. This is my argument for evidence and reason. But by and large, if we make decisions that are based on evidence and reason, we are better off than if we don't. Yeah, yeah. I mean, of course I agree with that. It's like preaching to the choir. It shouldn't be controversial. No, yeah, for sure.

A difficulty I have, though, is how do you explain to people that they should care? You know? Why do you think we should care about even making decisions based on data? Why is that even important? Because that's just more work. So why should people care? Well, that's where, as I said, in every chapter, something bubbled up where I was a little bit surprised and said, this thing that I thought was just kind of an academic puzzle actually matters. People are getting it wrong because of this.

And there are examples in the book, several from public health, several from criminal justice, where we don't have a choice about making decisions. We're making decisions all the time. The only choice is whether they're informed or not. And one of the examples, actually, Simpson's paradox is a nice example. Let me see if I remember this. It came from a journalist, and I deliberately don't name him in the book because I just don't want to give him any publicity at all.

But The Atlantic magazine named him the pandemic's wrongest man, because he made a career out of committing statistical errors and misleading people. And he actually features in two chapters, because he commits the base rate fallacy in one and then gets fooled by Simpson's paradox in another.

And if I remember right, in the Simpson's paradox example, he looked at people who were vaccinated and compared them to people who were not vaccinated, and found that during a particular period of time in the UK, the death rate was higher for people who were vaccinated. The death rate was lower for people who had not been vaccinated. So on the face of it, okay, well, that's surprising. Okay, that's something we need to explain.

It turns out to be an example of Simpson's paradox, which is the group that he was looking at was a very wide age range from I think 15 to 89 or something like that. And at that point in time during the pandemic, by and large, the older people had been vaccinated and younger people had not, because that was the priority ordering when the vaccines came out. So in the group that he compared, the ones who were vaccinated were substantially older than the ones who were unvaccinated.

And the death rates, of course, were much higher in older age groups. So that explained it. If you lumped that whole range of ages together into one group, you saw one effect. And if you broke it up into small age ranges, that effect reversed itself. So it was a Simpson's paradox. If you appropriately break people up by age, you would find that in every single age group, death rates were lower among the vaccinated, just as you would expect if the vaccine was safe and effective.

And that's also where I feel like if you start thinking about the causal graph, you know, and the causal structure, that's also where that would definitely help. Because it's not that hard, right? The idea here is not hard. It's not even hard mathematically. I think anybody can understand it even if they don't have a mathematical background. So yeah, it's mainly that. And I think the most important point is that, yeah, it matters because it affects decisions in the real world.

That thing has literally life and death consequences. I'm glad you mentioned it because you do discuss the base rate fallacy and its connection to Bayesian thinking in the book, right? It starts with the example that everybody uses, which is interpreting the results of a medical test. Because that's a case that's surprising when you first hear about it and where Bayesian thinking clarifies the picture completely. Once you get your head around it, it is like these other examples.

It not only gets explained, it stops being surprising. And I'll give the example; I'm sure this is familiar to a lot of your listeners. But if you take a medical test, let's take a COVID test as an example, and suppose that the test is 90% accurate, and let's suppose that means both specificity and sensitivity. So if you have the condition, there's a 90% chance that you correctly get a positive test.

If you don't have the condition, there's a 90% chance that you correctly get a negative test. And so now the question is, you take the test, it comes back positive, what's the probability that you have the condition? And that's where people kind of jump onto that accuracy statistic. And they think, well, the test is 90% accurate, so there's a 90% chance that I have, let's say, COVID in this example.

And that can be totally wrong, depending on the base rate, or in Bayesian terms, depending on the prior. And here's where the Bayesian thinking comes out, which is that different people are going to have very different priors in this case. If you know that you were exposed to somebody with COVID, three days later you feel a scratchy throat, the next day you wake up with flu symptoms. Before you even take a test, I'm going to say there's at least a 50% chance that you have COVID, maybe higher.

Could be a cold. So, you know, it's not 100%. So let's say it's 50-50. You take this COVID test. And let's say, again, 90% accuracy, which is lower than the home tests. So I'm being a little bit unfair here. But let's say 90%. Your prior was 50-50. The likelihood ratio is about 9 to 1. And so your posterior odds are about 9 to 1, which is roughly 90%. So quite likely that test is correct; in this example, you probably have COVID.

But the flip side is, let's say you're in New Zealand, which has a very low rate of COVID infection. You haven't been exposed. You've been working from home for a week, and you have no symptoms at all. You feel totally fine. What's your base rate there? What's the probability that you miraculously have COVID? 1 in 1,000 at most, probably lower. And so if you

took a test and it came back positive, it's still probably only about one in a hundred that you actually have COVID and a 99% chance that that's a false positive. So that's, you know, as I said, that's the usual example. It's probably familiar, but it's a case where if you neglect the prior, if you neglect the base rate, you can be not just a little bit wrong, but wrong by orders of magnitude. Yeah, exactly.
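For reference, here is that arithmetic as a small script (a sketch; the 90% sensitivity and specificity and the two priors are the ones from the example above, and the helper function is just for illustration):

```python
# Posterior probability of having the condition after a positive test,
# for a 90%-sensitive, 90%-specific test and two different priors.
def posterior_given_positive(prior, sensitivity=0.9, specificity=0.9):
    p_pos_given_sick = sensitivity
    p_pos_given_healthy = 1 - specificity
    numerator = prior * p_pos_given_sick
    denominator = numerator + (1 - prior) * p_pos_given_healthy
    return numerator / denominator

print(posterior_given_positive(0.5))     # exposed and symptomatic: ~0.90
print(posterior_given_positive(0.001))   # no exposure, no symptoms: ~0.009, roughly 1 in 100
```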

And it is a classical example for us in the stats world, but I think it's very effective for non-stats people, because that also talks to them. And the gut reaction to a positive test is so geared towards thinking you do have the disease that I think that's also why it's a good one. Another paradox you're talking about in the book is the Overton paradox. Could you share some insights into this one?

I don't think I know that one and how Bayesian analysis plays a role in understanding it, if any. Sure. Well, you may not have heard of the Overton paradox, and that's because I made the name up. We'll see, I don't know if it will stick. One of the things I'm a little bit afraid of is it's possible that this is something that has been studied and is well known and I just haven't found it in the literature.

I've done my best and I've asked a number of people, but I think it's a thing that has not been given a name. So maybe I've given it a name, but we'll find out. But that's not important. The important part is I think it answers an interesting question. And this is: if you compare older people and younger people in terms of their political beliefs, you will find in general that older people are more conservative. So younger people, more liberal, older people are more conservative.

And if you follow people over time and you ask them, are you liberal or conservative, it crosses over. When people are roughly 25 years old, they are more likely to say liberal. By the time they're 35 or 40, they are more likely to say conservative. So we have two patterns here. We have older people actually hold more conservative beliefs. And as people get older, they are more likely to say that they are conservative.

Nevertheless, if you follow people over time, their beliefs become more liberal. So that's the paradox. By and large, people don't change their beliefs a lot over the course of their lives. Excuse me. But when they do, they become a little bit more liberal. But nevertheless, they are more likely to say that they are conservative. So that's the paradox. And let me put it to you. Do you know why? I've heard about the two in isolation, but I don't think I've heard them linked that way.

And no, for now, I don't have an intuitive explanation to that. So I'm very curious. So here's my theory, and it is partly that conservative and liberal are relative terms. I am to the right of where I perceive the center of mass to be. And the center of mass is moving over time. And that's the key, primarily because of generational replacement. So as older people die and they are replaced by younger people, the mean shifts toward liberal pretty consistently over time.

And it happens in all three groups, among people who identify themselves as conservative, liberal, or moderate. All three of those lines are moving almost in parallel toward more liberal beliefs. And what that means is, if you took a time machine to 1970 and you collected the average liberal and you brought them to the year 2000, they would be indistinguishable from a moderate in the year 2000.

And if you bring them all the way to the present, they would be indistinguishable from a current conservative, which is a strange thing to realize. If you have this mental image of people in tie dye with peace medallions from the seventies being transported into the present, they would be relatively conservative compared to current views. And that is almost that time traveler example is almost exactly what happens to people over the course of their lives.

That in their youth, they hold views that are left of center. And their views change slowly over time, but the center moves faster. And that's what I call chasing the Overton window. The Overton window, I should explain where that term comes from: in political science, it is the set of ideas that are politically acceptable at any point in time. And it shifts over time, so something that might have been radical in the 1970s might be mainstream now.

And there are a number of views from the seventies that were pretty mainstream. Like, a large fraction, I don't think it was a majority, but I forget the number, it might've been 30% of people in the 1970s thought that mixed race marriages should be illegal. Yeah. That wasn't the majority view, but it was mainstream. And now that's pretty out there. A pretty small minority still holds that view, and it's considered extreme. Yeah, and it changed quite fast.

Yes. Also, like, the acceptability of same sex marriage really changed very fast, if you look at it from a, you know, time series perspective. That's also a very interesting thing, that these opinions can change very fast. So yeah, okay, I understand. It's kind of like how you define liberal and conservative in a way explains that paradox. Very interesting. This is a little speculative, but that's something that might have accelerated since the 1990s.

In many of the trends that I saw between 1970 and 1990, they were relatively slow and they were being driven by generational replacement. By and large, people were not changing their minds. It's just that people would die and be replaced. There's a line from the sciences that says that the sciences progress one funeral at a time. Just a little morbid. But that is in some sense the baseline rate of societal change, and it's relatively slow. It's about 1% a year.

Yeah. Starting in the 1990s, and particularly, you mentioned support for same sex marriage, also just general acceptance of homosexuality changed radically. In 1990, about 75% of the US population would have said that homosexuality was wrong. That was one of the questions in the General Social Survey: do you think it's wrong? 75%. That's, I think, below 30 now. So between 1990 and now, roughly 30 years, it changed by about 40 percentage points.

So that's about the speed of light in terms of societal change. And one of the things that I did in the book was to try to break that down into how much of that is generational replacement and how much of that is people actually changing their minds. And that was an example where I think 80% of the change was changed minds, not just one funeral at a time. So that's something that might be different now. And one obvious culprit is the internet. So we'll see.

Yeah. And another proof that the internet is neither good nor bad, right? It's just a tool, and it depends on what we're doing with it. The internet is helping us right now have that conversation, and me have that podcast for four years. Otherwise, that would have been virtually impossible. So yeah, it really depends on what you're doing with it.

And another topic, I mean, I don't remember it being in the book, but I think you mentioned it in one of your blog posts, is the idea of a Bayesian killer app. So I have to ask you about that. Why is it important in the context of decision making and statistics? I think it's a perpetual question, which is, you know, if Bayesian methods are so great, why are they not taking off? Why isn't everybody using them?

And I think one of the problems is that when people do the comparison of Bayesianism and frequentism, and they trot out the usual debates, they often show an example where you do the frequentist analysis and you get a point estimate, and then you do the Bayesian analysis and you generate a point estimate. And sometimes it's the same or roughly the same. And so people sort of shrug and say, well, you know, what's the big deal?

The problem there is that when you do the Bayesian analysis, the result is a posterior distribution that contains all of the information that you have about whatever it was that you were trying to estimate. And if you boil it down to a point estimate, you've discarded all the useful information. So if all you do is compare point estimates, you're really missing the point.

And that's where I was thinking about what is the killer app that really shows the difference between Bayesian methods and the alternatives. And my favorite example is the Bayesian bandit strategy or Thompson sampling, which is an application to anything that's like A-B testing or running a medical test where you're comparing two different treatments.

you are always making a decision about which thing to try next, A or B, or one treatment or the other, and then when you see the result, you're updating your beliefs. So you're constantly collecting data and using that data to make decisions. And that's where I think the Bayesian methods show what they're really good for, because if you are making decisions, those decisions use the whole posterior distribution, because most of the time you're doing some kind of optimization.

You are integrating over the posterior, or in a discrete world, you're just looping over the posterior, and for every possible outcome, figuring out the cost or the benefit and weighting it by its posterior probability. That's where you get the real benefit. And so Thompson sampling is an end-to-end application where people understand the problem and where the solution is a remarkably elegant and simple one. And you can point to the outcome and say, this is an optimal balance of exploitation and exploration.

You are always making the best decision based on the information that you have at that point in time. Yeah. Yeah, I see what you're saying. And in a way, it's a bit of a shame that it's the simplest application, because it's not that simple. But yeah, I agree with that example. And for people, I put the blog post where you talk about that Bayesian killer app in the show notes, because, yeah, it's not super easy; I think it's way better in a written format, or at least a video.

But yeah, definitely these kinds of situations, where you have lots of uncertainty and you really care about updating your beliefs as accurately as possible, which happens a lot. But yeah, in this case also, I think it's extremely valuable. But I think it can be simple. Because first of all, I think if you do it using conjugate priors, then the update step is trivial. You're just updating beta distributions.

And every time new data comes in, a new datum, you're just adding one to one of your parameters. So the computational work is the increment operator, which is not too bad. But I've also done a version of Thompson sampling as a dice game. I want to take this opportunity to point people to it. I gave you the link, so I hope it'll be in the notes. But the game is called The Shakes. And I've got it up on a GitHub repository. But you can do Thompson sampling just by rolling dice.
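For readers who want to see the mechanics, here is a minimal Beta-Bernoulli Thompson sampling loop (a sketch with made-up conversion rates; it is not Allen's dice game or any particular library's API):

```python
# Minimal Thompson sampling for a two-arm Bernoulli bandit (A/B test).
# Each arm's conversion rate gets a Beta posterior; at every step we draw
# one sample per posterior, play the arm with the largest draw, and update.
import numpy as np

rng = np.random.default_rng(7)
true_rates = [0.04, 0.06]                  # unknown in practice; used here only to simulate
successes = np.zeros(2)
failures = np.zeros(2)

for _ in range(5_000):
    # Sample once from each arm's Beta(successes + 1, failures + 1) posterior.
    draws = rng.beta(successes + 1, failures + 1)
    arm = int(np.argmax(draws))            # exploration and exploitation, automatically
    reward = rng.random() < true_rates[arm]
    if reward:
        successes[arm] += 1                # the conjugate "update" is literally an increment
    else:
        failures[arm] += 1

print(successes + failures)                # most pulls end up on the better arm
print(successes / np.maximum(successes + failures, 1))
```

The appealing design choice is that the decision rule uses the whole posterior of each arm, so uncertain arms still get explored while clearly worse arms fade out.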

Yeah. So we'll definitely put that in the show notes. And also, to come back to something you said just a bit earlier. For sure. Something that puzzles me is when people have a really good Bayesian model. It's awesome. It's a good representation of the underlying data generating process. It's complex enough, but not too much. It samples well. And then they do decision making based on the mean of the posterior estimates. And I'm like, no, that's a shame.

Why are you doing that? Pass the whole distribution to your optimizer so that you can make decisions based on the full uncertainty of the model and not just take the most probable outcome. Because first, maybe that's not really what you care about. And also, by definition, it's going to bias your decision. So yeah, that always kind of breaks my heart. You've worked so hard to get that. It's so hard to get those posterior distributions. And now you're just

throwing everything away. That's a shame. Yeah. Do Bayesian decision making, folks. You're losing all that information. And especially in any case where you've got very nonlinear costs, nonlinear in the size of the error, and especially if it's asymmetric. Thinking about almost anything that you build, you always have a trade-off between underbuilding and overbuilding. Overbuilding is bad because it's expensive. And underbuilding is bad because it will fail catastrophically.
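As a rough sketch of that kind of asymmetric-cost decision (hypothetical numbers, with a toy capacity-sizing problem standing in for a real model's posterior samples):

```python
# Sketch: choose a design capacity by minimizing expected cost over the
# whole posterior, not by building to the posterior mean.
import numpy as np

rng = np.random.default_rng(3)
posterior_load = rng.lognormal(mean=1.0, sigma=0.5, size=20_000)  # stand-in for MCMC samples

def expected_cost(capacity, loads, build_cost=1.0, failure_cost=100.0):
    # Overbuilding costs a little per unit of capacity; failure (load > capacity) costs a lot.
    return build_cost * capacity + failure_cost * np.mean(loads > capacity)

candidates = np.linspace(1, 15, 200)
best = min(candidates, key=lambda c: expected_cost(c, posterior_load))
print(posterior_load.mean())   # ~3.1: posterior mean of the load
print(best)                    # noticeably larger: the asymmetric cost pushes you to overbuild
```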

So that's a case where you have very nonlinear costs and very asymmetric. If you have the whole distribution, you can take into account what's the probability of extreme catastrophic effects, where the tail of that distribution is really important to potential outcomes. Yeah, definitely. And I mean, I could continue, but we're getting short on time and I still have a lot of things to ask you. So let's move on.

And actually, I think you mentioned it a bit at the beginning of your answer to my last question. But in another of your blog posts, you addressed the claim that Bayesian and frequentist methods often yield the same results. And so I know you like to talk about that. So could you elaborate on this and why you're saying it's a false claim? Yeah, as I mentioned earlier, you know, frequentist methods produce a point estimate and a confidence interval.

And Bayesian methods produce a posterior distribution. So they are different kinds of things. They cannot be the same. And I think Bayesians sometimes say this as a way of being conciliatory, you know, let's all get along. And often, frequentist and Bayesian methods are compatible. So that's good. The Bayesian methods aren't scary. I think strategically that might be a mistake, because you're conceding the thing that makes Bayesian methods better.

It's the posterior distribution that is useful for all the reasons that we just said. So it is never the same. It is sometimes the case that if you take the posterior distribution and you summarize it, with a point estimate or an interval, that yes, sometimes those are the same as the frequentist methods.

But the analogy that I use is, if you are comparing a car and an airplane, but the rule is that the airplane has to stay on the ground, then you would come away and you would think, wow, that airplane is a complicated, expensive, inefficient way to drive on the highway. And you're right. If you want to drive on the highway, an airplane is a terrible idea. The whole point of an airplane is that it flies. If you don't fly the plane, you are not getting the benefit of an airplane.

That is a good point. And same, if you are not using the posterior distribution, you are not getting the benefit of doing Bayesian analysis. Yeah. Yeah, exactly. Don't drive airplanes on the highway. Actually, a really good question is, and I think I do, and I'm pretty sure you do in your work, you do see many practitioners that might be hesitant to adopt Bayesian methods due to some perceived complexity most of the time.

So I wonder, in general, what resources or strategies you recommend to those who want to learn and apply Bayesian techniques in their work. Yeah, I think Bayesian methods get the reputation for complexity largely because of MCMC. If that's your first exposure, that's scary and complicated. Or if you do it mathematically and you start with big scary integrals, I think that also makes it seem more complex than it needs to be. I think there are a couple of alternatives.

And the one that I use in Think Bayes is everything is discrete and everything is computational. So all of those integrals become for loops or just array operations. And I think that helps a lot. So those are grid algorithms. I think grid algorithms can get you a really long way with very little tooling, basically arrays.

You lay out a grid, you compute a prior, you compute a likelihood, you do a multiplication, which is usually just an array multiplication, and you normalize, divide through by the total. That's it. That's a Bayesian update. So I think that's one approach. The other one, I would consider an introductory stats class that does everything using Bayesian methods, using conjugate priors. And don't derive anything. Don't compute why the beta binomial model works.

But if you just take it as given that when you are estimating a proportion, you run a bunch of trials and you'll have some number of successes and some number of failures. Let's call it A and B. You build a beta distribution that has the parameters A plus one, B plus one. That's it. That's your posterior. And now you can take that posterior beta distribution and answer all the questions. What's the mean? What's a confidence or credible interval?

But more importantly, what are the tail probabilities? What's the probability that I could exceed some critical value? Or, again, loop over that posterior and answer interesting questions with it. You could do all of that on the first day of a statistics class. And use a computer, because we can compute. scipy.stats.beta will tell you everything you want to know about a beta distribution. And that's day one of a stats class: that's estimating proportions. It's everything you need to do.
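Here is a minimal sketch of both approaches Allen describes, the grid update and the conjugate beta shortcut, on the same made-up data (7 successes in 10 trials is an arbitrary choice):

```python
# Grid-algorithm update for a proportion (prior * likelihood, then normalize),
# followed by the conjugate Beta(a + 1, b + 1) shortcut; both give the same answer.
import numpy as np
from scipy.stats import beta, binom

k, n = 7, 10                              # hypothetical data: 7 successes in 10 trials

# Grid version: lay out a grid, multiply prior by likelihood, normalize.
grid = np.linspace(0, 1, 101)
prior = np.ones_like(grid) / len(grid)    # uniform prior
likelihood = binom.pmf(k, n, grid)
posterior = prior * likelihood
posterior /= posterior.sum()
print(np.sum(grid * posterior))           # posterior mean, ~0.67

# Conjugate version: with a uniform prior, the posterior is Beta(k + 1, n - k + 1).
post = beta(k + 1, n - k + 1)
print(post.mean())                        # ~0.67, same answer
print(post.interval(0.94))                # a 94% credible interval
print(1 - post.cdf(0.9))                  # tail probability: P(proportion > 0.9)
```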

And it handles all of the weird cases. Like, if you want to estimate a very small probability, it's okay. You can still get a confidence interval. It's all perfectly well behaved. If you have an informative prior, sure, no problem. Just start with some pseudo-counts in your beta distribution. So day one, estimating proportions. Day two, estimating rates. You could do exactly the same thing with a Poisson gamma model. And the update is just as trivial.

And you could talk about Poisson distributions and exponential distributions and estimating rates. My favorite example is, I always use either soccer, football, or hockey as my example of goal scoring rates. And you can generate predictions. You can say, what are the likely outcomes of the next game? What's the chance that I'm going to win, let's say, a best-of-seven series? The update is computationally nothing. Yeah. And you can answer all the interesting questions about rates.
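And a sketch of that "day two" gamma-Poisson version (hypothetical prior and data; the numbers are made up):

```python
# Sketch: estimate a goal-scoring rate with a gamma-Poisson model, then
# simulate the next game. With prior Gamma(alpha, beta) and k goals observed
# in n games, the posterior is Gamma(alpha + k, beta + n).
import numpy as np

rng = np.random.default_rng(5)
alpha_prior, beta_prior = 1.4, 1.0       # hypothetical prior: roughly 1.4 goals per game
k_goals, n_games = 9, 5                  # hypothetical observed data

alpha_post = alpha_prior + k_goals       # the update is just addition
beta_post = beta_prior + n_games

# Posterior predictive for the next game: draw a rate, then draw a score.
rates = rng.gamma(alpha_post, 1 / beta_post, size=10_000)
next_game_goals = rng.poisson(rates)

print(alpha_post / beta_post)            # posterior mean rate, ~1.7 goals per game
print((next_game_goals >= 3).mean())     # chance of scoring 3 or more goals next game
```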

So that's day two. I don't know what to do with the rest of the semester, because we've just done 90% of an intro stats class. Yes. Yeah, that sounds like something that would work, in the sense that at least that was my experience. Funny story: I used to not like stats, which is funny when you see what I'm doing today. But when I was in university, I did a lot of math. And the thing is, the stats we were doing was with pen and paper. So it was incredibly boring.

It was always, you know, dice problems and very trivial stuff, and you have to do that because the human brain is not good at computing that kind of stuff, you know. But then when I started having to use statistics to do electoral forecasting, I was like, but this is awesome. Like, I can just simulate the distributions. I can see them on the screen. I can really almost touch them.

You know, and that was much more concrete and also much more empowering, because I could work on topics that were not trivial stuff that I would only use for board games. You know? So I think it's a very powerful way of teaching, for sure. So to play us out, I'd like to zoom out a bit and ask you what you hope readers will take away from Probably Overthinking It, and how can the insights from your book be applied to improve decision making in various fields?

Yeah. Well, I think I'll... come back to where we started, which is it is about using data to answer questions, make better decisions. And my thesis again is that we are better off when we use evidence and reason than when we don't. So I hope it's empowering. I hope people come away from it thinking that you don't need graduate degrees in statistics to work with data to interpret the results that you're seeing in research papers, in newspapers, that it can be straightforward.

And then occasionally there are some surprises that you need to know about. Yeah. For sure. Personally, have you changed some of the ways you're making decisions based on your work for this book, Kéján? Maybe. I think a lot of the examples in the book come from me thinking about something in real life. There's one example where when I was running a relay race, I noticed that everybody was either much slower than me or much faster than me.

And it seemed like there was nobody else in the race who was running at my speed. And that's the kind of thing where when you're running and you're oxygen deprived, it seems really confusing. And then with a little bit of reflection, you realize, well, there's some statistical bias there, which is, if someone is running the same speed as me, I'm unlikely to see them. Yeah. But if they are much faster or much slower, then I'm going to overtake them or they're going to overtake me. Yeah, exactly.

And that makes me think about an absolutely awesome joke from, of course I don't remember the name of the comedian, but a very, very well-known US comedian that you may know. And the joke was: have you ever noticed that everybody that drives slower than you on the road is a jackass, and everybody that drives faster than you is a moron? It's really the same idea, right? It's like you have the right speed, you're doing the right thing, and everybody else is just either a moron or a jackass.

That's exactly right. I believe that is George Carlin. That's exactly George Carlin, yeah, yeah. And, amazing, I mean, George Carlin is just absolutely incredible. But yeah, that's already a very keen observation of human nature too, I think. It's also an interesting joke in the sense that it relates to, you know, concepts of how minds change and how people think about reality and so on. And I find it very interesting.

So for people interested, I know we're short on time, so I'm just going to mention there is an awesome book called How Minds Change, by David McRaney. I'll put that in the show notes. He talks about these kinds of topics, and it's especially interesting. And of course, Bayesian statistics are mentioned in the book, because if you're interested in optimal decision making, at some point you're going to talk about Bayesian stats. But he's a journalist.

He didn't know at all about Bayesian stats originally, and then at some point it just appears. I will check that out. Yeah, I'll put that in the show notes. So before asking you the last two questions, Allen, I'm curious about your predictions, because we're all scientists here, and we're interested in predictions.

In the realm of statistics education, are there any innovative approaches or technologies that you believe have the potential to transform how people learn and apply statistical concepts? Well, I think the things we've been talking about, computation, simulation, and Bayesian methods, have the best chance to really change statistics education. I'm not sure how it will happen.

It doesn't look like statistics departments are changing enough or fast enough. I think what's going to happen is that data science departments are going to be created, and I think that's where the innovation will be. But the question is what that will mean. When you create a data science department, is it going to be all machine learning and algorithms, or statistical thinking and basically using data for decision making, as I'm advocating? So obviously, I hope it's the latter.

I hope data science becomes, in some sense, what statistics should have been, and starts doing a better job of using, as I said, computation, simulation, Bayesian thinking, and causal inference, which I think is probably the other big one. Yeah. Yeah, exactly. And they really go hand in hand, as we were saying at the very beginning of the show. Of course, I do hope that's going to be the case. You've already been very generous with your time.

So let me ask you the last two questions I ask everyone at the end of the show. And you're in a very privileged position, because it's your second episode here. So you can answer something different from your previous answers, which is a privileged position because usually the difficulty of these questions is that you have to choose and cannot answer everything. You get to have a second round, Allen.

So first, if you had unlimited time and resources, which problem would you try to solve? I think the problem of the 21st century is: how do we get to 2100 with a habitable planet and a good quality of life for everybody on it? And I think there is a path that gets us there. It's a little hard to believe when you focus on the problems that we currently see. But I'm optimistic. I really do think we can solve climate change and continue the slow process of making things better.

If you look at history over a long enough term, you will find that almost everything is getting better, in ways that are often invisible, because bad things happen quickly and visibly, and good things happen slowly and in the background. But my hope for the 21st century is that we will continue to make slow, gradual progress toward a good ending for everybody on the planet. So that's what I want to work on. Yeah, I love the optimistic tone to close out the show.

And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be? I think I'm going to argue with the question. I think it's based on this idea of great scientific minds, which is a little bit related to the great person theory of history, which is that big changes come from unique, special individuals. I'm not sure I buy it. I think the thing about science that is exciting to me is that it is a social enterprise.

It is intrinsically collaborative. It is cumulative. Making large contributions, I think, very often comes down to being the right person in the right place at the right time. And I think often they deserve that recognition. But even then, I'm going to say it's the system, the social enterprise of science, that makes progress. So that's it: I want to have dinner with the social enterprise of science. Well, you call me if you figure out how to do that. But yeah, I mean.

Joking aside, I completely agree with you, and I think it's also a very good reminder to say this right now, because we're recording very close to the time when the Nobel Prizes are awarded. And yeah, these prizes feed the fame game, basically making science kind of like the movie industry or other industries that are driven by fame, and all that comes with it.

And yeah, I completely agree that this is an especially big problem in science, because scientists are often specialized in a very small part of their field. And usually, for me, it's a red flag, and that happened a lot during COVID, when some scientists started talking about epidemiology even though it was not their specialty.

To me, that's usually a red flag, but the problem is that if they are very well-known scientists who may end up getting the Nobel Prize, well, then everybody listens to them, even though they probably shouldn't. When you rely too much on fame and popularity, that's a huge problem. Trying to make heroes is a big problem, even though it helps from a narrative perspective to get people interested in science, basically so that people start learning about it.

But there is a limit where it also discourages people. Because, you know, if it's that hard, if you have to be that smart, if you have to be Einstein or Oppenheimer or Laplace or any of these big names, then you don't even want to start working on this. And that's a big problem, because, as you're saying, scientific progress is small incremental steps done by a community that works together. There is competition, of course, but it really works together.

And yeah, if you start implying that you have to be a once-in-a-century genius to do science, we're going to have problems, especially HR problems in the universities. So no, you don't need that. And also, you're right that if you look into the previous work, even for Einstein, the idea of relativity was already there at the time. Look at some of the writings of Poincaré, one of the main French mathematicians of the late 19th and early 20th centuries.

Just a few years before Einstein, Poincaré was already talking about this idea of relativity, and you can see the equations in one of his books published before Einstein's papers. So often it's, as you were saying, an exceptional person who is also there at the right time and the right place, immersed in the ideas of their time. So that's also very important to highlight. I completely agree with that.

Yeah, in almost every case that you look at, if you ask the question, "If this person had not done X, when would it have happened? Or who else might have done it?", almost every time the ideas were there; they would have come together. Yeah, maybe a bit later, or maybe even a bit earlier, we never know. But yeah, that's definitely the case. And I think the best proxy for the dinner we wanted to have is to have a dinner with the LBS community.

So we should organize that, you know, like an LBS dinner where everybody can join. That would actually be very fun. Maybe one day I'll get to do that. One of my wildest dreams is to organize a live episode somewhere, where people could come join the show with a live audience and so on. We'll see if I can do that one day. If you have ideas or opportunities, feel free to let me know, and I'll think about it. Awesome. Allen, let's call it a show.

I could really record with you for like three hours. I literally still have a lot of questions on my cheat sheet, but let's call it a show and allow you to get back to your main activities for the day. So thank you a lot, Allen. As I was saying, I put a lot of resources and a link to your website in the show notes for those who want to dig deeper. Thanks again, Allen, for taking the time and being on this show. Thank you. It's been really great. It's always a pleasure to talk with you.

Yeah. Feel free to come back to the show and answer the last two questions for a third time.
