#102 Bayesian Structural Equation Modeling & Causal Inference in Psychometrics, with Ed Merkle


Mar 20, 2024 · 1 hr 9 min · Season 1 · Ep. 102

Episode description

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!


Structural Equation Modeling (SEM) is a key framework in causal inference. As I’m diving deeper and deeper into these topics to teach them and, well, finally understand them, I was delighted to host Ed Merkle on the show.

A professor of psychological sciences at the University of Missouri, Ed discusses his work on Bayesian applications to psychometric models and model estimation, particularly in the context of Bayesian SEM. He explains the importance of BSEM in psychometrics and the challenges encountered in its estimation.

Ed also introduces his blavaan package in R, which enhances researchers' capabilities in BSEM and has been instrumental in the dissemination of these methods. Additionally, he explores the role of Bayesian methods in forecasting and crowdsourcing wisdom.

When he’s not thinking about stats and psychology, Ed can be found running, playing the piano, or playing 8-bit video games.

Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

Thank you to my Patrons for making this episode possible!

Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser and Julio.

Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)

Takeaways:

- Bayesian SEM is a powerful framework in psychometrics that allows for the estimation of complex models involving multiple variables and causal relationships.

-...

Transcript

Structural Equation Modeling, or SEM, is a key framework in causal inference. As I'm diving deeper and deeper into these topics to teach them and, well, finally understand them, I was delighted to host Ed Merkle on the show. A professor of psychological sciences at the University of Missouri, Ed discusses his work on Bayesian applications to psychometric models and model estimation, particularly in the context of Bayesian SEM.

He explains the importance of Bayesian SEM in psychometrics and the challenges encountered in its estimation. Ed also introduces his blavaan package in R, which enhances researchers' capabilities in Bayesian SEM and has been instrumental in the dissemination of these methods.

Additionally, he explores the role of Bayesian methods in forecasting and crowdsourcing wisdom, and when he's not thinking about stats and psychology, Ed can be found running, playing the piano, or playing 8-bit video games. This is Learning Bayesian Statistics, episode 102, recorded February 14, 2024. Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra.

You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be. Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra.

See you around, folks, and best Bayesian wishes to you all. Thank you for having me. Yeah, you bet. Thanks a lot for taking the time. I am really happy to have you on and I have a lot of questions. So that is perfect. Before that, as usual, how would you define the work you're doing nowadays and how did you end up working on this? Well, a lot of my work right now is with Bayesian applications to psychometric models and model estimation.

Over time, I've gotten more and more into the model estimation and computation as opposed to applications. And it was a slow process to get here. I started doing some Bayesian modeling when I was working on my PhD. I finished that in 2005 and... I felt a bit restricted by what I could do with the tools I had at that time, but things have improved a lot since then. And also I've learned a lot since then. So I have over time left some things and come back to them.

And when I come back to them, I find there's more progress that can be made. Yeah, that makes sense. And that's always super... interesting and inspiring to see such diverse backgrounds on the show. I'm always happy to see that. And by the way, thanks a lot to Jorge Sinval for doing the introduction. Today is February 14th and he was our matchmaker. So thanks a lot, Jorge. And yeah, this promises to be a great episode. So thanks a lot for the suggestion.

And Ed, actually, could you tell us the topics that you are particularly focusing on? Yeah, recently, so in psychology, psychometrics, education, there's this class of models, structural equation models. It's a pretty large class of models and I think some special cases have been really useful. Others sometimes get a bad reputation with, I think, certain groups of statistics people.

But it's this big class and it has interested me for a long time because so much can be done with this class of models. So the Bayesian estimation part has especially been interesting to me because it was relatively underexplored for a long time. And there's some unique challenges there that I have found and I've tried to make some progress on. Yeah. And we're going to dive into these topics for sure in the coming minutes.

But to still talk about your background, do you remember how you first got introduced to Bayesian inference and also why it stuck with you? Yes. I think part of how I got interested in Bayesian inference starts a lot earlier, when I was growing up. I'm about the age where, for the first half of my childhood, there were no computers. And for the second half of growing up, computers were in people's houses, the internet was coming around and so on.

So I grew up with having a computer in my house for the first time. And then... just messing around with it and learning how to do things on it. So then later, a while later when I was working on my PhD, I gravitated toward the computing topics and I enjoyed that. So I felt at the time with Bayesian estimation, some of the interesting computing things were coming out around the time I was working on my PhD. So for example, WinBUGS was a big thing, say around 2000, 2001 or so.

That was when I was starting to work on my PhD. And that seemed like a fun little program where you could build these models and do some Bayesian estimation. At the time, I didn't always know exactly what I was doing, but I still found it interesting and perhaps a bit more intuitive than some of the other methods that were out there at the time. Yeah. And actually it seems like you've been part of that movement, which introduced Bayesian stats a lot in the psychological sciences.

Can you elaborate on the role of the Bayesian framework in psychological research? Always a hard word to say when you have a French accent. I understand. So yeah, when I was working on my PhD, I think there were not a lot of psychology applications necessarily, or maybe it was just in certain areas. So when I started on my PhD, I was doing like some cognitive psychology modeling where you would bring

someone into a room for an experiment and it could be about memory or something where you have them remember a list of words and then you give them a new list of words and ask them which did you see before and which are new and then you can model people's response times or accuracy. So there were some Bayesian applications definitely related to like memory modeling at that time but more generally there were less applications.

I did my PhD on some Bayesian structural equation modeling applications to missing data. At the time, I had a really hard time publishing that work. I think it was partly because I just wasn't that great at writing papers at the time, but also there weren't as many Bayesian applications. So I think people were less interested. But over time that has changed, I think with... with improved tools and more attention to Bayesian modeling. You see it more and more in psychology.

Sometimes it's just an alternative to frequentist methods. Like if you're doing a regression or a mixed model, Bayesian is just an alternative. Other times, like for the structural equation models, there can be some advantages to the Bayesian approach, especially related to characterizing uncertainty. And so I think there's more and more attention in psychology and psychometrics to some of those issues.

Yeah. And definitely interesting to hear that publishing has become easier, at least for you. And a method you're especially working on and developing is Bayesian structural equation modeling, or BSEM. So we've never covered that yet on the show. So could you give our listeners a primer on BSEM and its importance in psychometrics?

Yes. So this Bayesian structural equation modeling framework, or maybe I can start with just the structural equation modeling part, that overlaps with lots of other modeling frameworks. So item response models and factor analysis models, these are more on the measurement side, examining how say some tests or scales help us to measure a person's aptitude.

Those could all be viewed as special cases of structural equation models, but the heart of structural equation models involves a series of regression models all in one big model. So if you know the directed acyclic graphs that come from causal research, especially Judea Pearl's work, you can think of structural equation models as a way to estimate those types of models. Like these graphs will often have many variables.

and you have arrows between variables that reflect some causal relationships. Well, now structural equation models are throwing likelihoods on top of that, typically normal likelihoods. And that gives us a way to fit these sorts of models to data. Whereas with a directed acyclic graph, you look at that and it helps you to know what is estimable and what is not estimable, say; the structural equation model is then a way to fit that sort of thing to data.
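
To make the "series of regressions with normal likelihoods" idea concrete, here is the standard two-part way these models are usually written down (this is the conventional LISREL-style notation, not something specific to the episode):

```latex
% Measurement part: observed indicators y_i load on latent variables \eta_i
y_i = \nu + \Lambda \eta_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \Theta)
% Structural part: regressions among the latent variables (the "arrows" of the graph)
\eta_i = \alpha + B \eta_i + \zeta_i, \qquad \zeta_i \sim \mathcal{N}(0, \Psi)
```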

But it also overlaps with mixed models. Like I said, the item response models, there's some ideas related to principal components in there. It overlaps with a lot of things. Yeah, that's really interesting to have that take of yours on structural equation modeling and the relationship to causal inference, in a way. And so as you were saying, it also relates to Judea Pearl's do-calculus and things like that.

So I definitely encourage the listener to dive deeper into that literature; it's absolutely fascinating. I really love that. And that's also, from my own perspective, learning about those things recently, I found that it was way easier being already a Bayesian.

If you already do Bayesian models from a generative modeling perspective, then intervening on the graph and doing, like in do-calculus, doing an intervention is basically like doing posterior predictive sampling, as you were already doing on your Bayesian model. But instead of having already conditioned on some data, you come up with the platonic idea of the data generative model that you have in mind.

And then you intervene on the model by setting some values on some of the nodes and then seeing what that gives you, what that intervention gives you on the outcome. And I find that really, really natural to learn already from a Bayesian perspective. I don't know what your experience has been. Oh, yeah, I think the Bayesian perspective really helps you keep these models at like the raw data level.
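
As a minimal sketch of the intervention-as-simulation idea described above, assuming a toy linear chain x → m → y with standard normal priors on the coefficients (all names and numbers here are illustrative, not from the episode):

```r
# Forward-simulate the generative model under an intervention do(x = x_value):
# instead of conditioning on observed data, we set the node x by hand.
simulate_do_x <- function(x_value, n_draws = 4000) {
  b_xm <- rnorm(n_draws, 0, 1)                 # prior draws for the x -> m coefficient
  b_my <- rnorm(n_draws, 0, 1)                 # prior draws for the m -> y coefficient
  m <- rnorm(n_draws, mean = b_xm * x_value)   # m given do(x = x_value)
  y <- rnorm(n_draws, mean = b_my * m)         # downstream outcome
  y
}

# Compare the implied outcome distributions under two interventions
y_at_0 <- simulate_do_x(0)
y_at_1 <- simulate_do_x(1)
mean(y_at_1) - mean(y_at_0)
```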

So you're thinking about how do individual variables cause other variables and what does that mean about data predictions? If you look at how frequentists often present these models, we have something like random effects in these models. And so from a frequentist perspective, you wanna get rid of those random effects, marginalize them out of a model. And then for these models, we're left with some structured covariance matrix.

And often the frequentist will start with, okay, you have an observed covariance matrix and then our model implies a covariance matrix. But I find that so it's... it's unintuitive to think about compared to raw data. You know, like I can see how the data from one variable can influence another variable, but now to think about what does that mean about the prediction for a covariance that I think makes it less intuitive and that's really where some of the Bayesian models have an advantage.

Yeah, yeah, definitely. And that's why learning on this front myself and also teaching about these topics has been extremely helpful for me, because to teach it, you really have to understand it really well. So that was a great... Or, said differently, you don't understand it until you teach it. I've thought that I understood things before, but then when I teach it, I realize, well, I didn't quite understand everything. Yeah, for sure. Definitely.

And what advice would you give to someone who is already a Bayesian and wants to learn about structural equation modeling, and to someone who is already doing psychometrics and would like to now learn the Bayesian approach to these structural equation models? What advice would you give to help them start on this path? Yeah, I think, for people who already know Bayesian models,

I think I would explain structural equation models as like a combination of say principal components or factor analysis and then regression. And I think you can, there's these expressions for the structural equation modeling framework where you have these big matrices and depending on what goes in the matrices, you get certain models.

I would almost advise against starting there because you can have this giant framework that's expressing matrices, but it gets very confusing about what goes in what matrix or what does this mean from a general perspective. I would almost advise starting smaller, say with some factor analysis models, or you can have these models where there's one unobserved variable regressed on another unobserved variable. I would say like starting with some of those models and then working your way up.
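
As an illustration of that "start small" advice, here is roughly what such a model looks like in lavaan/blavaan syntax; the variable names are invented for the example:

```r
library(blavaan)

# One unobserved variable regressed on another, each measured by three indicators
model <- '
  # measurement (factor analysis) part
  eta1 =~ y1 + y2 + y3
  eta2 =~ y4 + y5 + y6

  # structural part: a latent variable regressed on another latent variable
  eta2 ~ eta1
'

# Bayesian estimation with blavaan; `dat` is assumed to be a data frame with y1..y6
fit <- bsem(model, data = dat)
summary(fit)
```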

On the other hand, if someone already knows the psychometric models and is moving to Bayesian modeling, I think the challenge is to think of these models again as models of data, not as models of a covariance matrix. I guess that's related to what we talked about earlier. But if you know the frequentist models, typically how they talk about these models involves just a covariance matrix or tricks for marginalizing over the random effects or the random parameters in the model.

And I think taking a step back and looking at what does the model say about the data before we try to get rid of these random parameters, I think that is helpful for thinking through the Bayesian approach. Okay, yeah. Yeah, super interesting. Then I would also want to ask you, once you've done that, so once you're into BSEM, why is that useful and what is its importance in your field of psychometrics these days?

Yeah, so the Bayesian part, I would say one use is, I think it slows you down a bit. There are certain things, say, specifying prior distributions and really thinking through the prior distributions. This is something you don't encounter on the frequentist side. It's going to slow you down, but I think for these models, that ends up being useful because...

You know, if you simulate data from priors and really look at what are these priors saying about the sort of data I can expect, I find that helps you understand these models in a way that you don't often get from the frequentist side. And then I guess said differently, I think over say the past 30, 40 years with these structural equation models, I think often in the field we've come to expect that I can specify this giant model and hit a button and run it.

And then I get some results and report just a few results from this big model. I think we've lost something with understanding what exactly this model is saying about the data. And that's a place where the Bayesian versions of these models can be really helpful. I think there was a second part to your question, but I forgot the second part. Yeah, what is the importance of BSEM these days in psychometrics? Yeah, yeah. I think there's a couple, I think, key advantages.

One, again, we have random parameters that are sort of like random effects if you know mixed models. And with MCMC, we can sample these parameters and characterize their uncertainty or allow the uncertainty in these random parameters to filter through to other model predictions. That's something that's very natural to do from a Bayesian perspective, potentially not from other perspectives. So there's a random parameter piece.

Another thing that people talk about a lot is fitting these models to smaller sample sizes. So for some of these structural equation models, there's a lot happening and you can get these failures to converge if you're estimating frequentist versions of the model. Bayesian models can still work there. I think you still have to be careful because of course if you don't have much data, the priors are going to be more influential and sensitivity analyses and things become very important.

So I think it's not just a full solution to if you don't have much data, but I think you can make some progress there with Bayesian models that are maybe more difficult with frequentist models. Okay, I see. And on the other end, what are some of the biggest challenges you've encountered in BSEM estimation and how does your work address them? I've found I encounter problems as I'm working on my R package or just on estimating the models.

There's a number of problems that aren't completely evident when you start. And one I've worked on recently and I continue to work on is specifying prior distributions for these models in a way that you know exactly what the prior distributions are, in a non-software-dependent way. So in some of these models there's, say, a covariance matrix as a free parameter. So you're estimating a full covariance matrix.

Now, in certain cases of these models, I'm going to fix some off-diagonal elements of this covariance matrix to zero, but then I want to freely estimate the rest of this covariance matrix. That becomes very difficult when you're specifying prior distributions now because we have to keep this full covariance matrix positive definite. And I have prior distributions for like an unrestricted covariance matrix. You could do a Wishart or an LKJ, say.

But to have this covariance matrix where some of the entries are, say, fixed to zero, but I still have to keep this full covariance matrix positive definite, the prior distributions become very challenging there. And there are some workarounds that, I would say, allow you to estimate the model, but make it difficult to describe exactly what prior distribution you used here. That's a piece that continues to challenge me. Yeah, and so what are you working on these days to try to address that?

I've looked at some ways to decompose a covariance matrix. So, let's say the Cholesky factors or things, and we put prior distributions on some decomposition of this covariance matrix so that it's easy to put, say, some normal priors on the elements of the decomposition while maintaining this positive definite full covariance matrix.

And I think I've made some progress there, but then you get into this situation where I want to put my prior distributions on intuitive things. If I get to like some Cholesky factor, that might have some intuitive interpretation, but sometimes maybe not. And you run into this problem then of, okay, if I want to put a prior distribution on this,

could I meaningfully do that, or could a user meaningfully do that, versus they would just use some default because they don't know what else they would put on that? That becomes a bit of a problem too. Yeah, yeah. That's definitely also something I have to handle when I am teaching these kinds of decompositions.
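
A small sketch of the construction Ed describes: put simple priors on an unconstrained lower-triangular factor L and build the covariance matrix as Sigma = L L', which is positive semi-definite by construction. It also shows the catch he mentions: a prior that is easy to state on the elements of L implies a less transparent prior on the entries of Sigma, and zero constraints on Sigma do not, in general, map one-to-one onto zeros in L.

```r
# Illustrative 3x3 example: draw the free elements of a lower-triangular factor
# from normal "priors", then form the implied covariance matrix.
set.seed(1)
L <- matrix(0, 3, 3)
L[lower.tri(L, diag = TRUE)] <- rnorm(6)   # normal draws for the free elements
diag(L) <- abs(diag(L))                    # keep the diagonal positive

Sigma <- L %*% t(L)                        # positive semi-definite by construction
Sigma
# The implied prior on, say, Sigma[2, 1] = L[2, 1] * L[1, 1] is a product of
# normals, which is harder to describe than "normal(0, 1) on an element of L".
```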

Usually the way I teach that is with a linear regression, for instance, where you would try to infer not only the intercept and the slope, but the correlation of intercept and slope. And so that way, if you have a negative covariance, for instance, that's inferred between the intercept and the slope, that means... well, if you do that in a hierarchical model particularly, that's very useful.

Because that means, well, if I'm in a group of the hierarchical model where the intercepts are high, that probably means that the slopes are low, because we have that negative covariation. And that's interesting because that allows the model to squeeze even more information from the data and so make even more informed and accurate predictions. But of course, to do that, the challenge is that you have to infer a covariance matrix between the intercept and the slope.

How do you infer that covariance matrix? That usually tends to be hard and computationally intensive. And so that's where the decomposition of the covariance matrix comes into play, especially the Cholesky decomposition of the covariance matrix; that's what we usually recommend doing in PyMC. And we have that pm.LKJCholeskyCov distribution. And to parametrize that, you have to give a prior on the correlation matrix, which is a bit weird.

But when you think about it, when people think about it, it's like, wait, a prior as a distribution on a correlation matrix is hard to understand. But actually, when you decompose it, it's not that hard, because it's mainly, well, what's the parameter that's inside a correlation matrix? It's a parameter that says there is a correlation between A and B. And so what is your a priori belief about that correlation between the intercept and the slope?

And so usually you don't want the completely flat prior, which says any correlation is possible with the same degree of belief. That would mean I really think that there is as much possibility for slopes and intercepts to be completely positively correlated as there is for them to be not at all correlated. I'm not sure. So if you think that through, then you need to use regularizing, weakly informative priors, as you do for any other parameters.

So you could think of coming up with a prior that's a bit more bell-shaped, in a way that gives more mass to smaller correlations. And then that's how you would usually do that in PyMC. And that's basically what you were talking about. Of course, that's more complicated and it makes your model more complex. But once you have run that model and have that inference, that can be extremely useful and powerful for posterior analysis. So it's a trade-off. Yeah, yeah, definitely.
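
For readers who want to see this written down, here is a minimal sketch of the varying intercept-and-slope model being described, using brms in R as a stand-in (the episode refers to the PyMC equivalent, pm.LKJCholeskyCov). The Cholesky parameterization of the group-level covariance is handled internally; the lkj(2) prior is the "bell-shaped" regularizing prior on the correlation matrix, and the data and variable names are invented:

```r
library(brms)

# Hierarchical regression with correlated varying intercepts and slopes by group
fit <- brm(
  y ~ x + (1 + x | group),
  data  = dat,
  prior = c(
    prior(normal(0, 1), class = "b"),      # population-level slope
    prior(exponential(1), class = "sd"),   # group-level standard deviations
    prior(lkj(2), class = "cor")           # LKJ(2): more mass on smaller correlations
  )
)
summary(fit)   # reports the intercept-slope correlation with its uncertainty
```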

But that reminds me of... I would say like in psychology, in psychometrics, there's still a lot of hesitance to use informative priors. There's still the idea of I want to do something objective. And so I want my priors to be all flat, which especially like you say for a correlation or even for other parameters, I'm against that.

Now I would like to put some... information in my priors always, but that is always a challenge because like for the models I work with, users are accustomed, like I said, to specifying this big model and pressing a button and it runs and it estimates. But now you do that in a Bayesian context with these uninformative priors. Sometimes you just run into problems and you have to think more about the priors and add some information. Yeah. Which is, if you ask me, a blessing in disguise, right?

Because just because a model seems to run doesn't mean it is giving you sensible results and unbiased results. I actually love the fact that usually HMC is really unforgiving of really bad priors. So of course, it's usually something we tend to teach is, try to use priors that make sense, right? A priori. Most of the time you have more information than you think.

And if you're thinking from a betting perspective, like let's say that any decision you make with your model is actually something that's going to cost you money or give you money. If you were to bet on that prior, why wouldn't you use any information that you have at your disposal? Why would you throw away information if you knew that actually you had information that would help you make a more informed... bet and so bet that gives you actually more money instead of losing money.

And so I find that this way of framing the priors usually works on beginners, because it helps them see the idea. The idea is not to fudge your analysis, even though I can show you how to fudge your analysis, in both ways. I can use priors which are going to bias the model, but I can also use priors that are not going to bias the model at all, but just make it so variable that it's going to answer very aggressively to any data point.

And do you really want that? I'm not sure. Do you really want to make very hard claims based on very small data? I'm not sure. So again, if you come back to this idea of, imagine that you're betting. Wouldn't you use all the information you have at your disposal? That's all. That's everything you're doing. That doesn't mean that information is golden. That doesn't mean you have to be extremely certain about the information you're putting in.

That just means let's try to put some more structure because that doesn't make any sense if you're modeling football players. That doesn't make any sense to allow them to be able to score 20 goals in a game. It doesn't ever happen. Why would you let the model... a low for that possibility. You don't want that. It's going to make your model harder to estimate, longer, it's going to take longer to estimate also. And so that's just less efficient. Yeah. You mentioned too of HMC being unforgiving.

And yeah, a lot of the software that I've been working on, the models are run in Stan. And from time to time, well, for some of these structural equation models, there are some weakly identified parameters, or maybe even unidentified parameters, and I run into these situations where somebody runs a Gibbs sampler and they say, look, it just worked and it converged, and now I move this model over to Stan and I'm getting these bimodal posteriors or such and such.

It's sort of like a bit of an education of saying, well, the problem isn't Stan. The problem was the model all along, but the Gibbs sampler just didn't tell you that there was a problem. Yeah, exactly. Exactly. Yeah. That's like a joke. I actually have a sticker like that, which is a meme of, you know, that meme of that guy from, I think it's from The Notebook, right?

Who is crying, and yeah, basically the sticker I have is for when someone tells me that their model has divergences in HMC, so they are switching to the Metropolis sampler. And I'm just like, yeah, sure. You're not going to have divergences with the Metropolis sampler. Doesn't mean the model is converging as you want. And yeah, so that's really that thing where, yeah, actually, you had problems with the model already.

It's just that you were using a crude instrument that wasn't able to give you these diagnostics. It's like doing an MRI with a stethoscope. Yeah. Yeah, that's not going to work. It's going to look like you don't have any problems, but maybe you do. It's just like you're not using the right tool. So yeah. And also this idea of, well, let's use flat priors and just let the data speak. That can work from time to time. And that's definitely going to be the case anyways, if you have a lot of data.

Even if you're using weakly regularizing priors, that's exactly the goal. It's just to give you enough structure to the model in case the data are not informative for some parameters. The bigger the model, the more parameters, well, the less informed the parameters are going to be if your data stay what they are, keep being what they are, right? If you don't have more. And also that assumes that the data are perfect, that there's no bias, that the data are completely trustworthy.

Do you actually believe that? If you don't, well, then... You already know something about your data, right? That's your prior right here. If you think that there is sampling bias and you kind of know why, well, that's a prior information. So why wouldn't you tell that in the model? Again, from that betting perspective, you're just making your model's life harder and your inference is potentially wrong. I'm guessing that's not what you want as the modeler. Yeah, you can trust the data blindly.

Should you though? That's a question you have to answer each time you're doing a model. Yep. More often than not, you cannot. Yeah, yeah. Yeah, the HMC failing thing, I think that's a place where you can really see the progress that's been made in Bayesian estimation. Just like say in the 20 some years that I've been doing it, I can think back to starting out with WinBUGS. You're just happy to get the thing to run and to give you some decent convergence diagnostics.

I think a lot of the things we did around the start of WinBUGS, if you try to run them in Stan now, you find there were a lot of problems that were just hidden or kind of overlooked. Yeah, yeah, yeah, for sure. And definitely, I think we've hammered that point in the community quite a lot in the last few years. And so definitely those points that I've been making in the last few minutes are clearly starting to percolate.

And I think the situation is way better than it was a few years ago, just to be clear and not come across as complaining statisticians. Because I'm already French, so people already assume that I'm going to complain. So if on top of that, I complain about stats, I'm done. People are not going to listen to the podcast anymore. I think you'll be all right.

So to continue, I'd like to talk about your blavaan package. What inspired the development of this package, and how does it enhance the capabilities of researchers in doing BSEM? Yeah, I think I said earlier my PhD was about some Bayesian factor analysis models and looking at some missing data issues. I would say it wasn't the greatest PhD thesis, but it was finished.

And at the time, I thought it would be nice to have some software that would give you some somewhat simple way to specify a model. And then it could be translated to, at the time, WinBUGS, so that you could have some easier MCMC estimation. But at that time, R wasn't quite as developed and my skills weren't quite there to be able to do that all on my own. So I left it for a few years, then around 2009 or so, I think.

Some R packages for frequentist structural equation models were becoming better developed and more supported. So a few years later, I met the developer of the lavaan package, which does frequentist structural equation models, and did some work with him. And from there I thought, well, he's done some of the hard work already just with model specification and setting up the model likelihood. So I built this package on top of what was already there to do the Bayesian version of that model estimation.

And then it has just gone from there. I think I continue to learn more things about these models or encounter tricky issues that I wasn't quite aware of when I started. And I just have continued it on. Yeah. Well, that sounds like a fun project for sure. And how would people use it right now? When would you recommend using your package, for which types of problems?

Well, the idea from the start was always to make the model specification and everything very similar to the lavaan package for frequentist models, because that package was already fairly popular among people that use these models. And the idea was, well, they could move to doing a Bayesian version without having to learn a brand new model specification. They could already do something similar to what they had been doing on the frequentist side.

So that's, from the start, the idea that we had or what we wanted to do with the package. And then who would use it? I think it could be for some of these measurement problems, like I said, with item response modelers or things, if they wanted to do a Bayesian version of some of these models; that's currently possible in blavaan. And another place is

with something kind of similar to the DAGs, the directed acyclic graphs we talked about. Especially in the social sciences, people have these theories about, they have a collection of variables, and what variables cause what other variables, and they want to estimate some regression-type relationships between these things. You would see it often with observational data, where you can't really do these manipulations the way you could in an experiment.

But the idea is that you could specify a graph like that and use blavaan to try to estimate these regression-like relationships that, if the graph is correct, you might interpret as causal relationships. Yeah, fascinating, fascinating. I love that. And I'll put the package, of course, in the show notes. And I encourage people to take a look at the website. There are some tutorials on how to use the package on there.
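
As a rough sketch of that use case, here is what specifying such a graph over observed variables and estimating it with blavaan might look like; the variables and the assumed DAG are invented for illustration:

```r
library(blavaan)

# Hypothetical DAG over observed variables: ses -> motivation -> achievement,
# with ses also pointing directly at achievement
model <- '
  motivation  ~ ses
  achievement ~ motivation + ses
'

# Bayesian estimation; target = "stan" runs the model in Stan
fit <- bsem(model, data = dat, target = "stan")
summary(fit)

# If the assumed graph is right, the regression coefficients can be read as
# causal effects; if it is not, they are just regression coefficients.
```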

So yeah, definitely take a look at the resources that are on the website. And of course, everything is on the show notes. Another topic I thought was very interesting from your background is that your research also touches on forecasting and subjective probability. Can you discuss how Bayesian methods improve these processes, particularly in crowdsourcing wisdom, which is something you've worked on quite a lot? Yeah, I started working on that. It was probably 2009 or 2010.

So at that time, I think... Tools like Mechanical Turk were becoming more usable and so people were looking at this wisdom of Krausen saying, can we recruit a large group of people from the internet? And if we average their predictions, do those make for good predictions? I got involved in some of that work, especially through some forecasting tournaments that were being run by the US government or some branches of the US government at the time.

I think Bayesian tools there first made some model estimations easier, just the way they sometimes do in general. But also with forecasting, it's all about uncertainty. You might say, here's what I think will happen. But then you also want to have some characterization of your certainty or uncertainty that something happens. I think that's where the Bayesian approach was really helpful.

Of course, you always have this trade -off with you are giving a forecast often to like a decision maker or an executive or someone that is a leader. Those people sometimes want the simplest forecast possible and it's sometimes difficult to convince them that, Well, you also want to look at the uncertainty around this forecast as opposed to just a point estimate. Yeah. But that's some of the ways we were using Bayesian methods, at least to try to characterize uncertainty.

Yeah. Yeah. I'm becoming more and more authoritative on these fronts, you know, just not even giving the point estimates anymore and by default giving a range for the predictions, and then people have to ask you for the point estimates. Then I can make the point of, do you really want that? Why do you want that one? And why do you want the mean more than the tail? Maybe in your case, actually, the tail scenarios are more interesting. So keep that in mind.

So yeah, people have to opt in to get the point estimates. And well, the human brain being what it is, usually it's happy with the default. And so... making the default better is something I'm actively trying to do. That's a good point. So, for reporting modeling results, you avoid posterior means? All you give them is like a posterior interval or something? A range. Yeah. Yeah. Yeah, exactly. Not putting particular emphasis on the mean.

Because otherwise what's going to end up happening, and that's extremely frustrating to me, is... imagine that you're comparing two options. And so you have the posterior on option A, the posterior on option B. You're looking at the first plot of A and B. They seem to overlap. So then you compute the difference of the posteriors, so B minus A, and you're seeing where it spans on the real line.

And if option A and B are close enough, the HDI, so the highest density interval, is going to overlap with zero. And it seems like zero is a magic number that makes the whole HDI collapse on one point. So basically, the zero is a black hole which just sucks everything onto itself, and then the whole range is zero.

And then people are just going to say, oh, but that's weird because, no, I think there is some difference between A and B. And then you have to say, but that's not what the model is saying. You're just looking at zero and you see that the HDI overlaps zero at some point.

But actually the model is saying that, I don't know, there is an 86% chance that option B is actually better than option A. So, you know, there is a five-in-six chance, which is absolutely non-negligible, that B is indeed better than A, but we cannot actually rule out the possibility that A is better than B. That's what the model is saying. It's not telling you that there is no difference.

And it's not telling you that B is definitely better than A. And that is still a nut I'm trying to crack. But yeah, here you cannot make the zero disappear, right? The only thing you can do is make sure that people don't interpret the zero as a black hole. That's the main thing. Yeah, yeah. Yeah, yeah, that's a good point. I can see that being challenging for people that come from frequentist models, because what they're accustomed to is the maximum likelihood estimate.

And it's all about those point estimates. But I like the idea of not even supplying those point estimates. Yeah. Yeah, yeah. I mean, and that makes sense in the way that it's just a distraction. It doesn't mean anything in particular. That's mainly a distraction. What's more important here is the range of the estimates. So, you know, give the range and give the point estimates if people ask for them. But otherwise, that's more distraction than anything else.
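
A small sketch of the reporting style described here, assuming draws_a and draws_b are vectors of posterior draws for the two options (names invented for the example):

```r
# Summarize the comparison with a range and a probability, rather than by
# whether an interval happens to cross zero.
diff_draws <- draws_b - draws_a

quantile(diff_draws, probs = c(0.03, 0.97))   # a 94% interval for the difference
mean(draws_b > draws_a)                       # P(B better than A), e.g. ~0.86 as in the example above
```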

And I think I got that idea from listening to a talk by Richard McElreath, who was talking about something he called the Table 2 fallacy. Yeah, I know that. Where usually they present the table of estimates in Table 2. And usually people tend to, his point with that is, people tend to interpret the coefficients of a linear regression, for instance, all of them, as causal, but they are not.

The only parameter that's really causally interpretable is the one that relates the treatment to the outcome. The other ones, for instance, the one from a mediator to the outcome, or the one from a confounder to the outcome, you cannot interpret those parameters as causal.

Or you have to do the causal graph analysis and then see if the linear regression you ran actually corresponds to the one you would have to run in this new causal DAG to identify the direct or the total causal effect of that new variable that you're taking as the treatment. Basically, you're changing the treatment here, so you have to change the model potentially.

And so you cannot, and should absolutely not, interpret the parameters that are not the one from the treatment to the outcome as causal. And so to avoid that fallacy, he was suggesting two options: either you actually provide the interpretation of that parameter in the current DAG that you have,

and say, if it's not causally interpretable in that case, which DAG, which regression, sorry, which model you would have to use, which is different from the one you actually ran, to be able to interpret that coefficient causally. Or you just don't report these parameters, these coefficients, because they are not the point of the analysis. The point of the analysis is to relate the treatment to the outcome and see what the effect of the treatment is on the outcome,

not what the effect of a confounder on the outcome is. So why would you report that in the first place? You can report it if people ask for it, but you should not report it by default. Yeah, yeah.

There are some good tie-ins to structural equation models there too, because I think in some of McElreath's examples, he dabbles a little bit in structural equation models, and it's kind of like one possible solution here to really saying what we could interpret causally or not in the presence of confounding variables, or the colliders that also cause problems if you include them in a regression. Yeah, he does a little bit.

I've seen some of his examples do structural equation model sorts of things. I think there's something interesting there about informing what predictors should go in a regression, or what we could interpret causally out of a particular model. Yeah, exactly. And I have actually linked to his video about the Table 2 fallacy thing I was talking about. So this will be in the show notes for people who want to dig deeper. Yes. And, yeah, so we're in this discussion.

I really love to talk about these topics, as you can see, and I've really deeply enjoyed diving deeper into them. And still, I'm diving deeper into these topics for 2024. That's one of my objectives, so that's really fun. Yeah. Maybe let's talk about latent variable models, because you also work on that. And if I understood correctly, they are quite crucial in psychology. So how do you approach these models, especially in the context of Bayesian stats?

And maybe explain, also give us a primer on what latent variable models are. Yeah, I would. So sometimes I almost use them as just another term for structural equation model. They're very related. I would say if I'm around psychology or psychometrics people, I would use the term structural equation model.

But if I'm around statistics people, I might more often use the term latent variable model because I think that term latent variable, or maybe sometimes people might say a hidden variable or something that's unobserved. But it's like in... in structural equation modeling, that is sort of just like a random effect or a random parameter that we assume has some influence on other observed variables. And that you can never observe it. That's right.

And so the traditional example is... maybe something related to intelligence or say like a person's math aptitude, something you would use a standardized test for. You can't directly observe it. You can ask many questions that get at a person's math aptitude. And we could assume, yes, there's this latent aptitude that each person has that we are trying to measure with all of our questions on a standardized test. That sort of gets at the idea of latent variable.

Yeah. Yeah. And, like, another example would be the latent popularity of political parties. You never really observe it. You just have an idea with polls. You have a better idea with elections, but even elections are not a perfect image of that, because not everybody goes and votes. So you actually never observe the actual popularity of political parties in the total population, because, well, even elections don't do a perfect job of that.

Yeah, yeah, yeah. Yeah, and then people will get into a lot of deep philosophy conversations about does this latent variable even exist, and how could one characterize that? And personally, I don't often get into those deep philosophy conversations. I just think of this more as a model, and within this model, it could be a random parameter. And I guess maybe it's just my personal bias. I don't think about it too abstractly.

I just think about how does this latent variable function in a model and how can I fit this model to data? Yeah, I see. And so in these cases, how have you found that using a Bayesian framework has been helpful? Yeah, I think, related to what I was discussing before, these latent variables are often like random effects. And so from a Bayesian point of view, you can sample those parameters and look at how their uncertainty filters through to other parts of your model. That's all

very straightforward from a Bayesian point of view. I think those are some of the big advantages. OK, I see. I see. Yeah. If we zoom out a bit, I'm actually curious, what would you say is the biggest hurdle in the Bayesian workflow currently?

There are always challenges with how long it takes MCMC to run, especially for people coming from frequentist models or things where, for some frequentist models, especially with these structural equation or latent variable models, you can get some maximum likelihood estimates in a couple of seconds. And there are cases with MCMC where it might take much longer, depending on how the model was set up or how tailored your estimation strategy is to a particular model.

So I think speed is always an issue. And that I think could maybe detract some people from doing Bayesian modeling sometimes. I would say maybe the other barrier to the workflow is just getting people to slow down and just be happy with slowing down with working through their model. I think especially in the social sciences where I work, people become too accustomed to specifying their model, pressing a button, getting the results immediately and writing it and being done.

And I think that's not how good Bayesian modeling happens. Good Bayesian modeling, you sit back a little bit and think through everything. And... I think it is a challenge convincing people sometimes to make that a habitual part of the workflow. Yeah. Bayesian models need love. You need to give them love, for sure. I personally have been working lately on an academic project like that, where we're writing a paper on, basically, the marine biology trade.

And the model is extremely complex. And that's why I'm on this project, to work with the academics working on it, who are extremely knowledgeable, of course, but on their domain. And me, I don't understand anything about the biology part, but I'm just here to try and make the model work. And the model is tremendously complicated because the phenomenon they are studying is extremely complex.

So, yeah, but the amazing thing here is that the person leading the project, Aaron MacNeil, has a huge appetite for that kind of work, right? And really loves doing the Bayesian model, coding it, and then improving it together. But definitely that's a big endeavor, it takes a lot of time. But then the model is extremely powerful afterwards and you can get a lot of inferences that you cannot have with a classic trivial model. So, you know, there is no free lunch, right?

If your model is trivial, your inferences probably will be too, unless you're extremely lucky and you're just working on something that nobody has worked on before. So then it's like a completely new forest. But otherwise, if you want interesting inferences, you have to have an interesting model. And that takes time, takes dedication, but for sure it's extremely interesting, and then afterwards it gives you a lot of power. So, you know, it's a bit of a...

That's also a bit frustrating to me in the sense that the model is actually not going to be really part of the paper, right? People just care about the results of the model. But me, it's like, and I mean, it makes sense, right? It's like when you buy a car, yeah, the engine is important, but you care about the whole car, right? But I'm guessing that the person who built the engine is like, yeah, but without the engine, it's not even a car. So why don't you give credit to the engine?

But that makes sense. But it was really fun for me to see, because for me, the model is really the thing. But it's actually almost not even going to be a part of the paper. It's going to be an annex or something like that. Yeah. That's really weird. Put it in the appendix. Yeah. Yeah. So I've already taken a lot of your time, Ed, so let's head to the last two questions. Before that, though, I'm curious, looking forward, what exciting developments do you foresee in Bayesian psychometrics?

The one that I see coming is related to the speed issue again. So there's more and more MCMC stuff with GPUs. And I was at a Stan meeting last year where they were talking about, you know, imagine being able to run hundreds of parallel chains that all share a burn-in, so that, you know, one chain isn't going to go off and do something really crazy. I think all of that is really interesting.

And I think that could really improve some of these bigger psychometric models that can take a while to run, if we could do lots of parallel chains and be pretty sure that they're going to converge. I think that is something coming that will be very useful. Yeah, that definitely sounds like an awesome project. So before letting you go, Ed, I'm going to ask you the last two questions I ask every guest at the end of the show.

First one, if you had unlimited time and resources, which problem would you try to solve? Yes. So I guess people should say, you know, world hunger or world peace or something, but I think I would probably go for something that's closer to what I do. And one thing that comes to mind involves maybe improving math education or making it more accessible to more people.

I think at least in the US, like for younger kids growing up with math, it feels a little bit like sports where if you are fortunate to have gotten into it really early, then you like have this advantage and you do well. But if you come into math late, say maybe as a teenager, I think what happens sometimes is, You see other people that are way ahead of you, like solving problems you have no idea how to do.

And then you get maybe not so enthusiastic and you just leave and do something else with your life. I think more could be done just to try to get more interested people staying in math-related fields and doing more work there. I think, with unlimited resources, that's the sort of thing that I would try to do. Yeah, I love that. And definitely I can, yeah, I can understand why you would say that. That's a very good point. As I was about to say, I was late coming around to math myself.

I think I don't know what happens in every country, but in the US, it feels like... You're just expected to think that math is this tough thing that's not for you. And unless you have like influences in your life that would convince you otherwise, I think a lot of kids just don't even make an attempt to do something with math. Yeah, yeah, that's a good point. And second question, if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

Yeah, this is one that is easy to overthink or to really make a big thing about. But here's one thing that I think about. There's, I think it's called Stigler's law; it's related to this idea that the person who is known for a major finding or scientific result often isn't the one that did the hard work.

Maybe they were the ones that promoted themselves the most or otherwise just got their name attached. And so if I'm having dinner, I want it to be more of a low-key dinner. So I don't necessarily want to go for the most famous person that is the most known for something, because I worry that they would just promote themselves the whole time, or you would feel like you're talking to a robot, because they see themselves as kind of above everyone.

So with that in mind, and keeping it on the Bayesian viewpoint, one person that comes to mind is Arianna Rosenbluth, who was, I think, the first to program a Metropolis-Hastings algorithm, and did it in the context of the Manhattan Project during World War II. So I think she would be an interesting person to have dinner with. She clearly did some important work.

Didn't quite get the recognition that some others did, but also I think she didn't have a traditional academic career. So that means that dinner, you know, you could talk about some work things, but also I think she would be interesting to talk to just, you know, just about other non -work things. That's the kind of dinner that I would like to have. So that's my answer. Love it. Love it, Ed. Fantastic answer. And definitely invite me to that dinner. That would be fascinating.

Fantastic. Thanks a lot, Ed. We can call it a show. That was great. I learned a lot. And as usual, I will put a link to your website and your socials and tutorials in the show notes for those who want to dig deeper. Thank you again. All right. Thanks for taking the time and being on the show. Thanks for having me. It was fun. This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and

visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach true Bayesian state of mind. That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats.

Thank you so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information in. And if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.

Transcript source: Provided by creator in RSS feed