In this episode, Jonathan Templin, professor of Psychological and Quantitative Foundations at the University of Iowa, shares insight into his journey in the world of psychometrics. Jonathan's research focuses on diagnostic classification models, psychometric models that seek to provide multiple reliable scores from educational and psychological assessments. He also studies Bayesian statistics as applied in psychometrics, broadly.
So naturally, we discussed the significance of psychometrics in psychological sciences and how Bayesian methods are helpful in this field. We also talked about challenges in choosing appropriate prior distributions, best practices for model comparison, and how you can use the multivariate normal distribution to infer the correlations between the predictors of your linear regressions.
This is a deep, wide-ranging conversation that concludes with the future of Bayesian statistics in Psychological, Educational, and Social Sciences. Hope you'll enjoy it. This is Learning Bayesian Statistics, episode 94, recorded September 11, 2023. Hello, my dear Bayesians! This time, I have the pleasure to welcome three new members to our Bayesian crew, Bart Trudeau, Noes Fonseca, and Dante Gates. Thank you so much for your support, folks. It's the main way this podcast gets funded.
And Bart and Dante, get ready to receive your exclusive merch in the coming month. Send me a picture, of course. Now let's talk psychometrics and modeling with Jonathan Templin. Jonathan Templin, welcome to Learning Bayesian Statistics. Thank you for having me. It's a pleasure to be here. Yeah, thanks a lot. Quite a few patrons have mentioned you in the Slack of the show. So I'm very honored to honor their request and have you on the show.
And actually thank you folks for bringing me all of those suggestions and allowing me to discover so many good Bayesians out there in the world doing awesome things in a lot of different fields, using our favorite tool, Bayesian statistics. So Jonathan, before talking about all of those good things, let's dive into your origin story. How did you come to the world of psychometrics and psychological sciences, and how sinuous of a path was it? That's a good question.
So I was an odd student, I dropped out of high school. So I started my college degree at community college; that would be the only place that would take me. I happened to be really lucky to do that though, because I had some really great professors. Once I discovered that I probably could do school, I took a statistics course, you know, typical undergraduate basic statistics. I found that I loved it.
I decided that I wanted to do something with statistics, and then in the process, I took a research methods class in psychology and I decided somehow I wanted to do statistics in psychology. So I moved on from community college, went to my undergraduate for two years at Sacramento State in Sacramento, California. I also was really lucky because I had a professor there that said, hey, there's this field called quantitative psychology. You should look into it.
If you're interested in statistics and psychology. Around the same time, he was teaching me something called factor analysis. I now look at it as more principal components analysis, but I wanted to know what was happening underneath the hood of factor analysis. And so that's where he said, no, really, you should go to graduate school for that. And so that's what started me. I was fortunate enough to be able to go to the University of Illinois for graduate studies.
I did a master's, a PhD there, and in the process, that's where I learned all about Bayes. So it was a really lucky route, but it all wouldn't have happened if I didn't go to community college, so I'm really proud to say I'm a community college graduate, if you will. Yeah. Nice. Yeah. So it kind of happened somewhat easily in a way, right? Good meeting at the right time and boom. That's right. And the call of the eigenvalue is what really sent me to graduate school.
So I wanted to figure out what that was about. Yes, that is a good point. And so nowadays, what are you doing? How would you define the work you're doing, and what are the topics that you are particularly interested in? I would put my work into the field of item response theory, largely. I do a lot of multidimensional item response theory. There are derivative fields I think I'm probably most known for, one of which is something called cognitive diagnosis or diagnostic classification modeling.
Basically, it's a classification-based method to try to classify students. I work in the College of Education, so most of this is applied to educational data from assessments, and our goal is to, whenever you take a test, not just give you one score, but give you multiple valid scores, to try to maximize the information we can give you.
My particular focus these days is in doing so in classroom-based assessments: how do we understand what a student knows at a given point in the academic year and try to help make sure that they make the most progress they can. Not to remove the impact of the teacher, actually, but to provide the teacher with the best data to work with the child, to work with the parents, to try to move forward.
But all that boils down to interesting measurement, psychometric issues, and interesting ways that we look at test data that come out of classrooms. Okay. Yeah, that sounds fascinating. Basically trying to give a distribution of results instead of just one point estimate. That's it. Also, tests have a lot of error, so we want to make sure that we don't over-deliver when we have a test score.
Basically understanding what that is and accurately quantifying how much measurement error, or lack of reliability, there is in the score itself. Yeah, that's fascinating. I mean, we can already dive into that. I have a lot of questions for you, but it sounds very interesting. So yeah, what do they look like concretely, these measurement errors and the test scores attached to them, and basically how do you try to solve that?
Maybe you can take an example from your work where you are trying to do that. Absolutely. Let me start with the classical example. If this is too much information, I apologize. But to set the stage: for a long time in item response theory, we've understood that a person's latent ability estimate, if you want to call it that, as applied in education, this latent variable that represents what a person knows, is put onto the continuum where items are.
So basically items and people are sort of ordered. However, the properties of the model are such that how much error there might be in a person's point estimate of their score depends on where the score is located on the continuum.
So this is what theory gave rise to, you know, theory in the 1970s gave rise to our modern computerized adaptive assessments and so forth, that sort of pick an item that would minimize the error, if you will; there are different ways of describing what we pick an item for. But that's basically the idea.
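The selection rule described here can be sketched with a standard two-parameter logistic (2PL) item response model, where an item's Fisher information at an ability level theta is a-squared times P times (1 - P). This is only a minimal illustration of the idea, not any real system, and the item bank values are invented:

```python
import math

def p_correct(theta, a, b):
    # 2PL item response function: probability of a correct response
    # given ability theta, discrimination a, and difficulty b
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    # Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

# A small hypothetical item bank: (discrimination a, difficulty b)
bank = [(1.0, -1.5), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

def pick_next_item(theta_hat, bank):
    # Adaptive selection: choose the item most informative
    # at the current ability estimate
    return max(bank, key=lambda ab: item_information(theta_hat, *ab))

print(pick_next_item(0.0, bank))  # → (1.2, 0.0): difficulty near theta wins
```

Items whose difficulty sits near the current ability estimate, especially highly discriminating ones, carry the most information, which is why adaptive tests home in on them.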
And so from a perspective of where I'm at with what I do, a complicating factor in this: that architecture that I just mentioned, that historic version of adaptive assessments, has really been built on large-scale measures. So thousands of students. And really, what happens in a classical sense is you would take a marginal maximum likelihood estimate of certain parameter values from the model.
You'd fix those values as if you knew them with certainty, and then you would go and estimate a person's parameter value along with their conditional standard error of measurement.
The situations I work in don't have large sample sizes. So in addition to a problem with sort of the asymptotic convergence, if you will, of those models, we also have multiple scores effectively, multiple latent traits, that we can't possibly do that way.
So when you look at the same problem from a Bayesian lens, sort of an interesting feature appears that we don't often see in a frequentist or classical framework: that process of fixing the parameters of the model, the item parameters, to a value, you know, disregards any error in the estimate as well.
Whereas if you're in a simultaneous estimation, for instance, in a Markov chain where you're sampling these values from a posterior in addition to sampling students, it turns out the error around those parameters can propagate to the students and provide a wider interval around them, which I think is a bit more accurate, particularly in a smaller sample size situation. So I hope that's the answer to your question.
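A toy Monte Carlo can illustrate the widening being described. Here the "item parameter" is a single hypothetical slope in a simple normal model, fixed at its point estimate in one run and redrawn from an assumed posterior in the other; all the numbers are made up for illustration:

```python
import random
import statistics

random.seed(1)

# Toy scoring model: x | theta, a ~ Normal(a * theta, 1) with a flat prior
# on theta, so theta | x, a ~ Normal(x / a, 1 / a).
def sample_theta(x, a, n=20000):
    return [random.gauss(x / a, 1.0 / a) for _ in range(n)]

x = 2.0
fixed = sample_theta(x, a=1.5)  # item parameter treated as known exactly

# Propagating uncertainty: redraw 'a' from an assumed posterior each step,
# the way a simultaneous MCMC estimation effectively does
mixed = [random.gauss(x / a, 1.0 / a)
         for a in (random.gauss(1.5, 0.3) for _ in range(20000))]

# The interval around theta widens once the item parameter's error propagates
print(statistics.stdev(fixed) < statistics.stdev(mixed))  # → True
```

The fixed-parameter run understates the uncertainty in the score; mixing over the parameter's posterior gives the wider, arguably more honest interval.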
I may have taken a path that might have been a little different there, but that's where I see the value, at least, in using Bayesian statistics in what I do. Yeah, no, I love it. Don't shy away from technical explanations on this podcast. That's the good thing of the podcast: you don't have to shy away from them. It came at a good time. I've been working on some problems like this all day, so I'm probably in the weeds a little bit. Forgive me if I go off the deep end of it. No, that's great.
And we already mentioned item response theory on the show. So hopefully people will refer back to these episodes and that will give them a heads up. Well, actually you mentioned it, but do you remember how you first got introduced to Bayesian methods and why did they stick with you? Very, very much.
I was introduced because in graduate school, I had the opportunity to work for a lab run by Bill Stout at the University of Illinois with other very notable people in my career, at least Jeff Douglas, Louis Roussos, among others. And I was hired as a graduate research assistant. And my job was to take a program that was a Metropolis-Hastings algorithm and to make it run. And it was written in Fortran.
So basically, it was Metropolis-Hastings, Bayesian, and it was written in a language that I didn't know, with methods I didn't know. And so I was hired and told, yeah, figure it out, good luck. Thankfully, I had colleagues that could help, who actually probably figured it out more than I did. But I was very fortunate to be there because it was like a trial by fire. I was basically going line by line through that.
This was a little bit in the later part of, I think it was the year 2001, maybe a little early 2002. But something instrumental to me at the time were a couple of papers by a couple of scholars in education at least, Richard Patz and Brian Junker had a paper in 1999, actually two papers in 1999, in the Journal of Educational and Behavioral Statistics. It's like I have that memorized.
But in their papers, they had written down the algorithm itself, and it was a matter of translating that to the diagnostic models that we were working on. But that is why it stuck with me: it was my job, but then it was also incredibly interesting. It was not like a lot of the research that I was reading, and not like a lot of the work I was doing in the classes I was in. So I found it really mentally stimulating, really challenging.
It took the whole of my brain to figure out. And even then, I don't know that I figured it out. So that helps answer that question. Yeah. So basically it sounds like you were thrown into the Bayesian pool. Like you didn't have any choice. I was. Becoming Bayesian then was nice because at the time, you know, this is 2001, 2002, in education, in measurement, in psychology.
You know, we knew of Bayes certainly, you know, there were some great papers from the nineties that were around, but it wasn't prominent. You know, I was in graduate school, but at the same time I wasn't learning it. I mean, I knew the textbook Bayes, like the introductory Bayes, but definitely not the estimation side.
And so timing-wise, you know, people would look back now and say, okay, why didn't I go grab Stan, or grab... at the time, I think, JAGS didn't exist, there was BUGS. And it was basically, you have to, you know, like, roll your own to do anything. So it was, it was good. No, for sure. Yeah, it's like asking Christopher Columbus... That's right. It's a lot more direct. Just hop on the plane and... Wasn't an option. Exactly. Good point.
But actually nowadays, what are you using? Are you still writing your own sampler like that in Fortran, or are you using some open-source software? I can hopefully say I retired from Fortran as much as possible. Most of what I do is in Stan these days, a little bit of JAGS, but then occasionally I will try to write my own here or there. The latter part I'd love to do more of, because you can get a little highly specialized.
I just lack, I feel like, the time to really deeply do the development work in a way that doesn't just end up as an R package or some package in Python that would just break all the time. So I'm sort of stuck right now with that, but it is something that I'm grateful for, having the contributions of others to be able to rely upon to do estimation. Yeah, no, exactly. I mean, first, Stan, I've heard it's quite good. Of course, it's amazing.
A lot of Stan developers have been on this show, and they do absolutely tremendous work. And yeah, as you were saying, why code your own sampler when you can rely on samplers that are actually waterproof, that are developed by a bunch of very smart people who do a lot of math and who do all the heavy lifting for you? Well, just do that.
And thanks to that, Bayesian computing and statistics are much more accessible because you don't have to actually know how to code your own MCMC sampler to do it. You can stand on the shoulders of giants and just use that and superpower your own analysis. So it's definitely something we tell people, don't code your own samplers now. You don't need to do that unless you really, really have to do it. But usually, when you have to do that, you know what you're doing.
Otherwise, people have figured that out for you. Just use the automatic samplers from Stan or PyMC or NumPyro or whatever you're using. It's usually extremely robust and checked by a lot of different pairs of eyes and keyboards. Having that team and, like you said, full of people who are experts in not only just mathematics but also computer science makes a big difference. Yeah. I mean, I would not be able to use Bayesian statistics nowadays if these samplers didn't exist, right?
Because I'm not a mathematician. So if I had to write my own sampler each time, I would just be discouraged even before starting. Yeah. It's just a challenge in and of itself. I remember the old days where that would be it. That's my dissertation; that was what I had to do. So it was like six months' work on just the sampler. And even then it wasn't very good. And only then might you actually do the study. Yeah, exactly.
Yeah. I mean, to me, really, probabilistic programming is one of the superpowers of the Bayesian community, because that really allows almost anybody who can code in R or Python or Julia to just use what's being done by very competent and smart people, and for free. Right. Yeah. Also true. Yeah. What a great community. I'm really, really impressed with the size and the scope and how things have progressed in just 20 years. It's really something. Yeah. Exactly. And so actually...
Do you know why, well, do you have an idea why Bayesian statistics is useful in your field? What do they bring that you don't get with the classical framework? Yeah, in particular, we have a really nasty problem. If we were to do a classical framework, typically the gold standard in the field I work in is sort of a marginal maximum likelihood. Marginal meaning we get rid of the latent variables to estimate the model. So that process of marginalization is done numerically.
We numerically integrate across the likelihood function, in most cases; there are some special-case models, really too simplistic to use for what we do, where we don't have to. So, if we want to do multidimensional versions: if you think about numeric integration, for one dimension you have this sort of discretized version of the likelihood, and you take sums across different, what we call quadrature points, of some type of curve.
For the multidimensional sense now, going from one to two, you effectively square the number of points you have. And that's just two latent variables. So if you want two bits of information from an assessment from somebody, now you've just made your marginalization process exponentially more difficult, more time-consuming. So the benefit of having two scores comes at a much larger computational cost than having just one.
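That squaring is easy to put in numbers: with Q quadrature points per latent dimension, a product grid needs Q to the power D evaluations per pass over the likelihood. A minimal sketch, where the 15 points per dimension is just an illustrative choice:

```python
# Product quadrature grid: Q points per dimension, D latent dimensions,
# so the total number of grid points is Q ** D -- exponential in D
def quadrature_grid_size(points_per_dim, n_dims):
    return points_per_dim ** n_dims

for d in (1, 2, 3, 6):
    print(d, quadrature_grid_size(15, d))
# → 1 15
#   2 225
#   3 3375
#   6 11390625
```

Six latent traits already means more than eleven million likelihood evaluations per marginalization pass, which is why a sampling-based approach that scales roughly linearly in the number of latent variables becomes attractive.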
So if we wanted to do five or six or 300 scores, that marginalization process becomes really difficult. So from a brute-force perspective, if we take a Bayesian sampler perspective, there is not that exponential increase of computation with a linear increase in the latent variables. And so the number of steps the process has to take per calculation is much smaller. Now, of course, Markov chains have a lot of calculations.
So, you know, maybe overall the process is longer, but I found Bayesian statistics to be a necessity; estimation in some form runs into this multidimensional likelihood evaluation, basically. People have created sort of hybrid versions of EM algorithms where the E-step is replaced with a Bayesian-type method. But for me, I like the full Bayesian approach to everything.
So I would say, just in summary though, what Bayes brings from a brute-force perspective is the ability to estimate our models in a reasonable amount of time with a reasonable amount of computation. There's the added benefit of what I mentioned previously, which is the small sample size, sort of the, I think, proper accounting or allowing of error to propagate in the right way if you're going to report scores and so forth. I think that's an added benefit.
But from a primary perspective, I'm here because I have a really tough integral to solve and Bayes helps me get around it. Yeah, that's a good point. And yeah, as you were saying, I'm guessing that having priors and generative modeling helps for low sample sizes, which tends to be the case a lot in your field. Also true. Yeah. The prior distributions can help. A lot of the frustration with multidimensional models in psychometrics, at least in a practical sense:
you get a set of data, you think it's multidimensional, and the next step is to estimate a model. In the classic sense, those models sometimes would fail to converge, with very little reason why. I had a class I taught four or five years ago where I just asked people to estimate five dimensions; I had a set of data for each person.
Not a single person could get it to converge with the default options that you'd see in, like, an IRT package. So having the ability to sort of understand where non-convergence is happening, or why that's happening, which parameters are finding a difficult spot, and then using priors to sort of aid estimation is one part. But then there's also sort of the idea of Bayesian updating.
If you're trying to understand what a student knows throughout the year, Bayesian updating is perfect for such things. You know, you can assess a student in November and update the results that you have potentially from previous parts of the year as well, too. So there's a lot of benefits. I guess I could keep going. I'm talking to a Bayes podcast, so you probably already know most of it.
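The updating idea can be sketched with the simplest conjugate case: treat a student's ability as normally distributed and fold each new assessment into the running posterior, yesterday's posterior becoming today's prior. The numbers are invented purely for illustration:

```python
def update_normal(prior_mean, prior_var, obs, obs_var):
    # Conjugate normal-normal update: precisions add, and the posterior
    # mean is a precision-weighted average of prior mean and observation
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Start of the school year: a vague belief about a student's ability
belief = (0.0, 4.0)
# A November assessment suggests ability around 1.2, with measurement error
belief = update_normal(*belief, obs=1.2, obs_var=1.0)
# A later assessment refines it again: the posterior becomes the new prior
belief = update_normal(*belief, obs=0.8, obs_var=1.0)
print(belief)  # → (0.888..., 0.444...): mean settles, variance shrinks
```

Each assessment tightens the posterior, which is exactly the "assess in November, update what you knew from earlier in the year" workflow.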
Yeah. I mean, a lot of people are also listening to understand what Bayes is all about and how that could help them in their own field. So that's definitely useful. If we have some psychometricians in the audience who haven't tried some Bayes yet, well, I'm guessing that would be useful for them. And actually, could you share an example,
if you have one, of a research project where Bayesian stats played a crucial role, ideally in uncovering insights that might have been missed otherwise, especially using traditional stats approaches? Yeah, I mean, just honestly, a lot of what we do, just estimating the model itself, sounds like it should be trivial. But to do so with a full-information likelihood function is so difficult.
I would say almost every single analysis I've done using a multidimensional model has been made possible because of the Bayesian analyses themselves. Again, there are what you would call shortcut methods. I think there are good methods, but again, there are approaches, like I mentioned, that are sort of a hybrid marginal maximum likelihood. There's what we would call limited-information approaches that you might see in programs like Mplus, or there's an R package named lavaan, that do such things.
But those only use functions of the data, not the full data themselves. I mean, it's still good, but I have this sense that the full likelihood is what we should be using. So to me, just a simple example: I was working this morning with a four-dimensional assessment, you know, a 20-item test, kids in schools. And you know, I would have a difficult time trying to estimate that with a full maximum likelihood method. And so Bayes made that possible.
But beyond that, if we ever want to do something with the test scores afterwards, right? So now we have a bunch of Markov chains of people's scores themselves. This makes it easy to not forget that these scores are not measured perfectly, and to take a posterior distribution and use that in a secondary analysis as well, too. So I was doing some work with one of the Persian Gulf states where they were trying to build, like, a vocational interest survey.
And some of the classical methods for this sort of disregarded any error whatsoever. And they basically said, oh, you're interested in, I don't know, artistic work or, you know, numeric work of some sort. And they would just tell you, oh, that's it. That's your story. Like, I don't know if you've ever taken one of those. What are you going to do in a career? You're a high school student and you're trying to figure this out.
But if you allow that error to sort of propagate through, the way Bayesian methods make it very easy to do, you'll see that while that may be the most likely choice of what you're interested in, or the dimensions that may be most salient to your interests, there are many other choices that may be close to that as well. And that would be informative as well, too. So we sort of forget; we sort of overstate how certain we are in results.
And I think a lot of the Bayesian methods are built around that. So that was one project, actually, where I did write my own algorithm to try to estimate these things, because it was just a little more streamlined. But it seemed that, rather than telling a high school student, hey, you're best at artistic things, what we could say is, hey, yeah, you may be best at artistic things, but really close to that is something that's numeric, you know, something along those lines.
So while you're strong at art, you're really strong at math too. Maybe you should consider one of these two rather than just go down a path that may or may not really reflect your interests. Hope that's a good example. Yeah. Yeah, definitely. Yeah, thanks. And I understand how that would be useful, for sure. And I'm curious about the role of priors in all that, because that's often something that puzzles beginners.
And so you obviously have a lot of experience in the Bayesian way of life in your field. So I'm curious, I'm guessing that you kind of teach the way to do psychometric analysis in the Bayesian framework to a lot of people. And I'm curious, especially on the prior side, and if there are other interesting things that you would like to share on that, feel free. My question is on the priors.
How do you approach the challenge of choosing appropriate prior distributions, especially when you're dealing with complex models? Great question. And I'm sure each field does it a little bit differently. I mean, as it probably should, because each field has its own data and models and already established scientific knowledge. So that's my way of saying: this is my approach; I'm not 100% confident that it's the approach that everybody should take. But let me back it up a little bit.
So generally speaking, I teach a lot of students who are going into... many of our students end up in the industry for educational measurement here in the United States. We usually denote our score parameters with theta. I like to go around saying that, yeah, I'm teaching you how to sell thetas. That's sort of what they do, you know; in a lot of these industry settings, they're selling test scores.
So if you think that that's what you're trying to do, I think that guides, to me, a set of prior choices that try to do the least amount of speculation. So what do I mean by that? If you look at a measurement model, like an item response model, you know, there's a set of parameters to it. One parameter in particular, in item response theory, we call the discrimination parameter; in factor analysis, we call it a factor loading, and in linear regression, it would be a slope.
This parameter tends to govern the extent to which an item relates to the latent variable. So the higher that parameter is, the more that item relates. Then when we go and apply Bayes' theorem to get a point estimate of a person's score, or a posterior distribution of that person's score, the contribution of that item is largely reflected by the magnitude of that parameter.
The higher that parameter is, the more weight that item has on that distribution, and the more we think we know about a person.
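One way to see that effect in miniature: under a 2PL model with a grid-approximated posterior, raising the shared discrimination of the items shrinks the posterior standard deviation of theta. This is only a toy check with made-up responses, not a real calibration:

```python
import math

def p(theta, a, b):
    # 2PL item response function
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def posterior_sd(a, responses, b=0.0):
    # Grid-approximate the posterior of theta under a standard normal prior,
    # for a toy test whose items all share discrimination a and difficulty b
    grid = [i / 100.0 for i in range(-400, 401)]
    post = []
    for t in grid:
        like = math.exp(-t * t / 2.0)  # N(0, 1) prior density (unnormalized)
        for r in responses:
            q = p(t, a, b)
            like *= q if r else 1.0 - q
        post.append(like)
    z = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / z
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / z
    return math.sqrt(var)

responses = [1, 0, 1, 1, 0, 1]  # hypothetical right/wrong pattern
# Higher discrimination -> tighter posterior over the person's ability
print(posterior_sd(0.5, responses) > posterior_sd(2.0, responses))  # → True
```

The same response pattern tells us much more about the person when the items discriminate strongly, which is exactly why the prior placed on that parameter matters so much.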
So in doing that, when I look at setting prior choices, what I try to do for that parameter is to set a prior that would be toward zero, mainly, actually centered at zero mostly; try to set it so that our data do more of the job than our prior, particularly if this score has a big meaning to somebody. You think of, well, in the United States, the assessment culture is a little bit out of control, but, you know, we have to take tests to go to college.
We have to take tests to go to graduate school and so forth. And of course, if you go and work in certain industries, there are assessments to do licensure, right? So, for instance, I come from a family of nurses; it's a very noble profession, but to be licensed as a nurse in California, you have to pass an exam. We want to provide a score for that exam where the score reflects as much of the data as possible, not a prior choice.
And so there are ways that, you know, people can sort of use priors that don't necessarily benefit empirical science; you can sort of put too much subjective weight onto it. So when I talk about priors, I try to talk about the ramifications of the choice of prior on certain parameters. For that discrimination parameter or slope, I tend to want the data to force it to be further away from zero, because then I'm being more conservative, I feel like.
For the rest of the parameters, I tend to not use heavy priors in what I do. I tend to use some very uninformative priors unless I have to. And then the most complicated prior for what we do, and the one that's historically caused the biggest challenge, although it's, I think, in a relatively good place these days thanks to research and science, is the prior that goes on a covariance or correlation matrix. That had been incredibly difficult to estimate back in the day.
But now things are much, much easier with modern computing, modern ways of looking at it, modern priors actually. Yeah, interesting. Would you like to walk us a bit through that? What are you using these days as priors on correlation or covariance matrices? Because, yeah, I do teach those also because... I love it.
Basically, if you're using, for instance, a linear regression and want to estimate not only the effect of the predictors on the outcome, but also the correlation between the predictors themselves, and then use that additional information to make even better predictions on the outcome, you would, for instance, use a multivariate normal on the parameters, on the slopes of your linear regression. So, what priors do you use on that multivariate normal?
A multivariate normal needs a covariance matrix, so what priors do you use on the covariance matrix? So that's basically the context for people. Now, Jonathan, basically, take it from there. What are you using in your field these days? Yeah, so going with your example, I have no idea. You know, like, if you have a set of regression coefficients that you say are multivariate normal, yes, there is a place for a covariance in the prior.
I never try to speculate what that is. I don't think I have, like, the human judgment that it takes to figure out what your prior belief is for that. I think you're talking about what would be analogous to sort of the asymptotic covariance matrix.
The posterior distribution of these parameters, where you look at the covariance between them, is like the asymptotic covariance matrix in ML, and we just rarely ever speculate off the diagonal, it seems, on that. I mean, there are certainly uses for linear combinations and whatnot, but that's tough.
I'm more thinking about, like, when I have a handful of latent variables to estimate; now the problem is I need a covariance matrix between them, and they're likely to be highly correlated, right? So, in our field, we tend to see correlations of psychological variables that are 0.7, 0.8, 0.9. These are all academic skills in my field that are coming from the same brain. The child has a lot of reasons why those are going to be highly correlated.
And so these days, I love the LKJ prior for it. It makes it easy to put a prior on a correlation matrix and then, if you want, to rescale it. That's one of the other weird features of the psychometric world: because these variables don't exist, to estimate a covariance matrix, we'd have to put certain constraints on some of the item parameters, the measurement model, for instance.
If we want a variance of the factor, we have to set one of the discrimination parameters to a value to be able to estimate it. Otherwise, it's not identified. In the work we talk about for calibration, when we're trying to build scores or build assessments and their data, we fix the value of the variance of a factor to one. We standardize the factor: mean zero, variance one. Very simple idea.
The models are equivalent in a classic sense, in that the likelihoods are equivalent whether we do it one way or the other. When we put priors on, the posteriors aren't entirely equivalent, but that's a matter of a typical Bayesian issue with transformations.
But in the sense where we want a correlation matrix, prior to the LKJ prior there were all these, sort of... one of my mentors, Rod McDonald, called them devices: little hacks or tricks that we would do to sort of keep a covariance matrix valid while sampling it, right? I mean, think about it statistically: to sample it, it's like a lot of rejection sampling methods. If you were to basically propose a covariance or correlation matrix, it has to be positive semi-definite; that's a hard term.
You have to make sure that the correlation is bounded and so forth. But LKJ takes care of almost all of that for me in a way that allows me to just model the correlation matrix directly, which has really made life a lot easier when it comes to estimation. Yeah, I mean, I'm not surprised it does. That is also the kind of prior I tend to use personally, and that I teach also.
In this example of the linear regression, for instance, that's where I'd probably end up using an LKJ prior on the predictors, on the slopes of the linear regression. And for people who have never used an LKJ prior: it relies on a decomposition of the covariance matrix so that we can actually sample it. Otherwise, it's extremely hard to sample a covariance matrix directly. The LKJ decomposition of the matrix is basically an algebraic trick
that makes use of the Cholesky decomposition of the covariance matrix, which allows us to sample the Cholesky factor instead of the full covariance matrix, and that helps the sampling. Thank you. Thank you for putting that out there. I'm glad you brought that up. Yeah. And basically, the way you would parametrize that, for instance, in PyMC, you would use pm.LKJCholeskyCov, and you would parametrize it with at least three things: first, the number of dimensions.
So for instance, if you have three predictors, that would be n equals 3. Second, the standard deviation that you are expecting on the predictors, on the slopes of the linear regression. That's something you're used to, right? If you're using a normal prior on a slope, then the sigma of that slope is just the standard deviation you're expecting for that effect given your data and model. And third, you have to specify a prior on the correlation of these slopes.
And that's where you get into the covariance part. So basically, you specify a prior, that parameter is called eta in PyMC, on the LKJ prior. And the bigger eta is, the more suspicious of high correlations your prior is. So if eta equals 1, you're basically expecting a uniform distribution over correlations: they could be minus 1, they could be 1, they could be 0, all of those have the same weight.
And if you go to eta equals 8, for instance, you would put much more prior weight on correlations close to zero, most of them between 0.5 and minus 0.5, but the prior would be very suspicious of very big correlations. Which, I guess, would make a lot of sense in, for instance, social science. I don't know in your field, but yeah. I typically use the uniform, the eta equals one setting, at least to start with, but yeah, I think that's a great description. Very good description.
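To make the eta discussion and the Cholesky trick above concrete, here is a minimal numpy sketch. This is not PyMC code, and the correlation values are made up for illustration:

```python
import numpy as np

# A hypothetical correlation matrix for three regression slopes.
R = np.array([
    [1.0, 0.7, 0.5],
    [0.7, 1.0, 0.6],
    [0.5, 0.6, 1.0],
])
identity = np.eye(3)

# 1) The LKJ prior weights a correlation matrix R by det(R) ** (eta - 1).
#    eta = 1 is flat over correlation matrices; larger eta concentrates
#    mass on matrices with correlations near zero (det(R) close to 1).
def lkj_weight(corr, eta):
    return np.linalg.det(corr) ** (eta - 1.0)

print(lkj_weight(identity, 8) > lkj_weight(R, 8))   # True: eta=8 prefers near-zero correlations
print(lkj_weight(identity, 1) == lkj_weight(R, 1))  # True: eta=1 is indifferent

# 2) The Cholesky trick: R = L @ L.T with L lower-triangular, and multiplying
#    independent standard normals by L produces correlated draws, which is why
#    samplers work with the Cholesky factor rather than R itself.
L = np.linalg.cholesky(R)
rng = np.random.default_rng(42)
draws = rng.standard_normal((200_000, 3)) @ L.T

print(np.round(np.corrcoef(draws, rowvar=False), 2))  # close to R
```

In PyMC the equivalent would be `pm.LKJCholeskyCov` with `n`, `eta`, and an `sd_dist` for the scales; the sketch above only shows how eta reweights correlation matrices and why the Cholesky factor makes sampling easy.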
Yeah, I really love these kinds of models because they make linear regression even more powerful. To me, linear regression is so powerful and very underrated. You can go so far with plain linear regression and often it's hard to really do better. You have to work a lot to do better than a really good linear regression. I completely agree with you. Yeah, I'm 100% right there.
And actually then you get into sort of the... quadratic or the nonlinear forms in linear regression that map onto it that make it even more powerful. So yeah, it's absolutely wonderful. Yeah, yeah. And I mean, as Spider-Man's uncle said, great power comes with great responsibility.
So you have to be very careful about the priors when you have all those features, like inverse link functions, because they distort the parameter space. Same thing if you're using a multivariate normal; that's more complex. So of course you have to think a bit more about your model structure, about your priors. And also, the more structure you add, if the size of the data is kept equal, the more risk you have of overfitting, and the less informative power you get per data point.
That means the priors increase in importance, so you have to think about them more. But you get a much more powerful model afterwards, and the goal is to get much more powerful predictions afterwards. I do agree. These weapons are hard to wield. They require time and effort. And on my end, I don't know about you, Jonathan, but on my end, they also require a lot of caffeine from time to time. Maybe. Yeah. I mean, so that's the key. You see how I did the segue? I should have a podcast.
Yeah. That's the first time I do that on the podcast, but there it is. Yeah. So I'm a big coffee drinker. I love coffee. I'm a big coffee nerd. But from time to time, I try to decrease my caffeine usage, you know, also because you have some habituation effects. So if I want to keep the caffeine shot effect, well, I sometimes have to decrease my usage. And funnily enough, when I was thinking about that, a small company called Magic Mind came to me.
They sent me an email, and they listen to the show, and they were like, hey, you've got a cool show, we would be happy to send you some bottles for you to try and to talk about on the show. And I thought that was fun. So I got some Magic Mind myself. I drank it, but I'm not the only one, because I got Magic Mind to send some samples to Jonathan. And if you are watching the YouTube video, Jonathan is going to try the Magic Mind right now, live.
So yeah, take it away, Jon. Yeah, this is interesting because you reached out to me for the podcast and I had not met you, but you know, it's a conversation, it's a podcast, you have to do great work. Yes, I'll say yes to that. Then you said, how would you like to try the Magic Mind? And I thought... being a psych major as an undergraduate, this is an interesting social psychology experiment where a random person from the internet says, hey, I'll send you something.
So I thought there's a little bit of safety in drinking it in front of you while we're talking on the podcast. But of course, I know you can cut this out if I hit the floor. So here it comes. So you're drinking it like, sure. Yeah, I decided to drink it like a shot, if you will. It actually tasted much better than I expected. It came in a green bottle. It tasted tangy, so very good.
And now the question will be, if I get better at my answers to your questions by the end of the podcast, therefore we have now a nice experiment. But no, I noticed it has a bit of caffeine, certainly less than a cup of coffee. But at the same time, it doesn't seem offensive whatsoever. Yeah, that's pretty good. Yeah, I mean, I'm still drinking caffeine, if that's all right. But yeah, from time to time, I like to drink it. My habituation, my answer to that is just drink more. That's fine.
Yeah, exactly. Oh yeah, and decaf and stuff like that. But yeah, I love the idea, the product is cool. I liked it. So I was like, yeah, I'm going to give it a shot. And the way I drank it was also basically making myself a latte: I would use the Magic Mind and then I would put in my milk and the milk foam. And that is really good, I have to say. See how that works. Yeah. So the thing you taste most is the matcha, I think.
And usually I'm not a big fan of matcha, which is what gives it the green color, I think. Usually I'm not, but I have to say, I really appreciated that. You and me both, I was feeling the same way. When I saw it come in the mail, I was like, ooh, and that added to my skepticism, right? I'm trying to be a good scientist. But yeah, surprisingly, it actually tasted more like a juice, like a citrus juice, than matcha. So it was much nicer than I expected.
Yeah, I love that, because me too, I'm obviously extremely skeptical about all that stuff. So I like doing that. It's way better, way more fun to do it with you or any other nerd from the community than with normal people from the street, because I'm way too skeptical for them. They wouldn't even understand my skepticism. I agree. I feel like in the scientific community, and I've seen some of the people you've had on the podcast, we're all a little bit skeptical about what we do.
I could bring that skepticism here and feel at home, hopefully. I'm glad that you allowed me to do that. Yeah. And that's the way of life. Thanks for trusting me, because I agree that, seen from a third-party observer, you'd be like, that sounds like a scam. That guy is just inviting me so he can sell me something. In a week, he's going to send me an email to tell me he's got some financial troubles and I have to wire him $10,000.
Are you waiting for that? Or, what level of paranoia do I have this morning? I was like, well, who are my enemies and who really wants to do something bad to me, right? I don't believe I'm at that level, so I don't think I have anything to worry about. It seems like a reputable company. So it was, it was amazing. Yeah. No, that was good. Thanks a lot, Magic Mind, for sending me those samples, that was really fun.
Feel free to give it a try, other people if you want, if that sounded like something you'd be interested in. And if you have any other product to send me, send them to me, I mean, that sounds fun. I mean, I'm not gonna say yes to everything, you know, I have standards on the show, and especially scientific standards. But you can always send me something. And I will always analyze it. You know, somehow you can work out an agreement with the World Cup, right?
Some World Cup tickets for the next time. True. That would be nice. True. Yeah, exactly. Awesome. Well, what we did is actually kind of related, I think, I would say to the other, another aspect of your work. And that is model comparison. So, and it's again, a topic that's asked a lot by students. Especially when they come from the classical machine learning framework where model comparison is just everywhere. So often they ask how they can do that in the Bayesian framework.
Again, as usual, I am always skeptical about just doing model comparison and picking your model based on some one statistic. I always say there is no magic bullet, you know, in the Bayesian framework, where it's just, okay, the model comparison says that, so for sure that's the best model. I wouldn't say that's how it works. You need a collection of different indicators, including, for instance, the LOO score, which tells you, yeah, that model is better.
But not only that: what about the posterior predictions? What about the model structure? What about the priors? What about just the generative story of the model? But talking about model comparison, what can you tell us, Jonathan, about some best practices for carrying out effective model comparisons? I don't know if there is a best practice. I'll just give you what my practice is. I will make no claim that it's best. It's difficult. I think you hit on all the aspects of it in introducing the topic.
If you have a set of models that you're considering, the first thing I like to think about is not the comparison between them as much as how each model would fit the data set absolutely. Posterior predictive model checking, in that absolute sense, is really where a lot of the work for me is focused. Interestingly, what you choose to check against is a bit of a challenge, particularly in certain fields in psychometrics, at least the ones I'm familiar with.
First of all, model fit is a well-researched area in psychometrics in general. Really, there's millions of papers from the 1980s, maybe not millions, but it seems like that many. It's always been something that people have studied, and I think recently there's been a resurgence of new ideas in it as well. So it's well-covered territory in the psychometric literature. It's less well covered, at least in my view, in Bayesian psychometrics.
So what I've tried to do in my work, to see if a model fits absolutely, is to look at, well, one of the complicating factors is that a lot of my data is discrete. It's correct and incorrect scored items. And in that sense, in the last 15, 20 years, there's been some good work in the non-Bayesian world on how to use what we call limited-information methods to assess model fit, instead of looking at model fit to the entire contingency table.
So if you have a set of binary data, let's say 10 variables that you've observed, technically you have 1,024 different response patterns, all the permutations of ways they could be zeros and ones. And model fit would have to be assessed against that 1,024-vector of probabilities. Good luck with that, right? You're not going to collect enough data to do that. So what a group of scientists, Alberto Maydeu-Olivares, Li Cai, and others, have created is sort of model fit to lower-level contingency tables.
So each marginal moment of the data, each mean effectively, and then a two-way table between all pairs of observed variables. In work that I've done with a couple of students recently, we've tried to replicate that idea, but in a Bayesian sense. In the frequentist world there is a statistic for this called the M2 statistic. Could we come up with a version of a posterior predictive check for what a model says the two-way tables should look like?
And then, similar to that, could we create a model that we know saturates that? So for instance, if we have 10 observed variables, we could create a model that estimates all 10-choose-2 two-way tables perfectly, or what we would expect to be perfect.
Now, of course, these are posterior distributions, but you would expect, with plenty of data and very diffuse priors, that the point estimates, the EAP estimates, should land right about at the observed frequencies of the data. So the idea then is: now we have two models, one of which we know should fit the data absolutely, and one of which we're wondering about. That's where the comparison comes together.
So we have these two predictive distributions. How do we compare them? We've taken different approaches there. One of them is simply looking at the distributional overlap. We used the Kolmogorov-Smirnov statistic to sort of see where, percentage-wise, the distributions overlap. Because if your model's predictive distribution overlaps with what you think the data should look like, you think the model fits well.
And if it doesn't, they should be far apart, and it won't fit well. That's what we've been trying to build. It's weird, because it is a model comparison, but one of the comparison models we know to be what we call saturated: it should fit the data the best, and all the other models should be subsumed within it. So that's the approach I've taken recently with posterior predictive checks, but for model comparison we could also have used, as you mentioned, the LOO statistic.
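The overlap check just described can be sketched with a hand-rolled two-sample Kolmogorov-Smirnov distance. The two "predictive distributions" below are simulated normals standing in for draws of a fit statistic under the saturated and candidate models:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance: the largest gap between the
    empirical CDFs of the two samples."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(1)
saturated = rng.normal(0.0, 1.0, size=5_000)    # "what the data should look like"
good_model = rng.normal(0.1, 1.0, size=5_000)   # overlaps heavily with saturated
bad_model = rng.normal(3.0, 1.0, size=5_000)    # far from saturated: misfit

# Small distance -> heavy overlap -> the model fits about as well as the
# saturated one; large distance flags misfit.
print(round(ks_statistic(saturated, good_model), 2))
print(round(ks_statistic(saturated, bad_model), 2))
```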
And maybe that's something that we should look into; we haven't yet, but one of my recent graduates, now an assistant professor at the University of Arkansas here in the United States, Ji Hang Zhang, had done a lot of work on this in his dissertation and other studies. So that's sort of the approach I take. The other thing I want to mention, though, is that when you're comparing amongst models, you have to establish that model of absolute fit first.
So the way I envision this is you sort of compare your models to this saturated model. You do that for multiple versions of your models, and then effectively choose amongst the set of models that sort of fit. But what that absolute fit is, like you mentioned, is nearly impossible to pin down exactly. There are a number of ideas that go into what makes for a good-fitting model. Yeah. And I definitely encourage people to go take a look at the LOO paper.
I will put a link in the show notes to that paper. And also, if you're using ArviZ, whether in Julia or Python, we do have an implementation of the LOO algorithm. So comparing your models with it is extremely simple: it's just a call to compare, and then you can even do a plot of that. And yeah, as you were saying, the LOO score doesn't have any meaning by itself, right? The LOO score of a single model doesn't mean anything. It only means something in comparison to other models.
So yeah, basically having a baseline model that you think is already good enough. And then all the other models have to compare to that one, which basically could be like the placebo, if you want, or the already existing solution that there is for that.
And then any model that's more complicated than that should be in competition with that one and should have a reason to be used, because otherwise, why are you using a more complicated model if you could just use a simple linear regression, because that's what I use most of the time for my baseline model. Right?
Baseline model, just use a simple linear regression, and then do all the fancy modeling you want and compare that to the linear regression, both in predictions and with the Loo algorithm. And well, if there is a good reason to make your life more difficult, then use it. But otherwise, why would you?
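The baseline idea can be sketched with exact leave-one-out cross-validation on simulated data; squared prediction error stands in for LOO's log predictive density here, and all the numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)  # toy data with a real slope

def loo_squared_error(design, y):
    """Leave-one-out cross-validated squared error for a least-squares fit:
    refit with each point held out, score the held-out prediction."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        errors.append(float(y[i] - design[i] @ beta) ** 2)
    return float(np.mean(errors))

intercept_only = np.ones((n, 1))            # "placebo" model: just a grand mean
linear = np.column_stack([np.ones(n), x])   # the simple linear regression baseline

err_mean = loo_squared_error(intercept_only, y)
err_linear = loo_squared_error(linear, y)
print(err_linear < err_mean)  # True: here the slope earns its extra complexity
```

Any fancier model would then have to beat the linear baseline the same way; in an actual Bayesian workflow you would run this comparison with ArviZ's compare on LOO estimates rather than with squared error.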
And yeah, actually, talking about these complexities, something I see is also that many practitioners might be hesitant to adopt Bayesian methods because they perceive them as complex. So I'm wondering, what resources or strategies would you recommend to those who want to learn and apply Bayesian techniques in their research, especially in your field of psychometrics?
Yeah. I think starting with an understanding of just the output, you know, the basics. If you have data, and if your responsibility is providing analysis for it, find either a package or somebody else's program that makes the coding quick. Like you've mentioned linear regression: if you use brms in R, which will translate that into Stan, you can quickly go about getting a Bayesian result fast.
And I found that to me, the conceptual consideration of what a posterior distribution is actually is less complex than we think about when we think about all the things that we're drilled into in the classical methods, like, you know, what, where does the standard error come from and all this other, you know, asymptotic features in Bayes it's, it's visible, like you can see a posterior distribution, you can plot it, you can, you know, touch it, almost like touch it and feel it, right?
It's right there in front of you. So for me, I think the thing I try to get people to first is just to understand what the outputs are. Sort of what are the key parts of it. And then, you know, hopefully that gives that mental representation of where that, where they're moving toward. And then at that point, start to add in all the complexities.
But I think it's incredibly challenging to teach Bayesian methods, and I actually think the further along a person goes without learning the Bayesian version of things, the harder it gets, because now you have all these well-established, can we say routines, or statistics, that you're used to seeing that are not Bayesian, and that may or may not have a direct analog in the Bayes world. But that may not be a bad thing.
So, thinking about it, actually, I'm going to take a step back here. Conceptually, I think this is the challenge we face in a program like the one I'm in right now. I work with nine other tenure-track or tenured faculty, which is a very large program. And we have a long-running curriculum, but the question I like to ask is, what do we do with Bayes? Do we have a parallel track in Bayes? Do we do Bayes in every class?
Because that's a heavy lift for a lot of people as well. Right now, I teach the Bayes classes, and occasionally some of my colleagues will put Bayesian statistics in their classes, but it's tough. I think even if I, you know, anointed myself king of how we do all the curriculum, I don't know what answer I'd come to. I go back and forth each way. So I would love to see what a curriculum looks like where they started with Bayes and only kept it in Bayes.
Because I think that would be a lot of fun. And the thought question I ask myself, that I don't have an answer for, is: would that be a better mechanism to get students up to speed on the models they're using than it would be in other classical contexts? I don't know. Yeah. Good point. Yeah, two things. First, King of Curriculum, amazing title. I think that title should actually exist on all campuses around the world.
The world's worst kingdom is the curriculum. Yeah. I mean, that's really good. Like, you go to a party, you know, and: so what do you do? Oh, I'm the King of Curriculum. As long as the crown is on the head, that's all that matters, right? That would drop some jaws for sure. And second, I definitely would like the theory of the multiverse to be true, because that means there is at least one of these universes where Bayesian methods came first.
And I am definitely curious to see what that world looks like. Yeah, what's that world where people were actually exposed to Bayesian methods first, and maybe to frequentist statistics later? Were they even exposed to frequentist statistics later? That's the question. No, but jokes aside, I would definitely be curious about that. Yeah, well, I don't know that I'll have that experiment in my lifetime, but maybe in a parallel universe somewhere.
Before we close up the show, I'm wondering if you have a personal anecdote or example of a challenging problem you encountered in your research or teaching related to Bayesian stats, and how you were able to navigate through it? Yeah. Maybe it's too much in the weeds, but there was that first experience I had in graduate school trying to learn to code. It was coding a correlation matrix of tetrachoric correlations. And that was incredibly difficult.
One day, one of my colleagues, Bob Henson, figured it out with the likelihood function and so forth. But that was the holdup that we had. And I say this because, as I mentioned, I don't do a lot of my own package coding anymore. But I think you see a similar phenomenon if you misspecify something in your model in general: you get results, and the results are either all over the place or span the entire number line.
For me, it was the correlations: the posterior distribution looked like a uniform distribution from negative one to one. That's a bad thing to see, right? So the anecdote I have here is, I guess, less awesome, less like, oh, Bayes did this and nothing could have done it otherwise, and more about the perseverance that goes into sticking with the Bayesian side. Bayes also provides you the ability to check your work a bit, to see if it's completely gone sideways, right? So you see a result like that, and you have that healthy dose of skepticism.
You start to investigate more. In my case, it took a couple of years of my life, working in concert with other people, as grad students. But when it was fixed, it was almost obvious that it was. You went from this uniform distribution across negative one to one to something that looked very much like a posterior distribution we're used to seeing, centered around a certain value of the correlation.
And again, for us it was figuring out what the likelihood was, but with most packages, at least, that's not a big deal; it's already specified in your choice of model and prior. But at the same time, just remember that the frustration of it not working is actually really informative. You get that, you can build on it, and you can sort of check your work as you go forward.
I mean, not analytically, but by brute force, the sampling part; that's a check on your work. So not a great example, not a super inspiring example, but more that perseverance pays off, in Bayes and in life. That's the analogy I take from it. Yeah. Yeah, no, for sure. Perseverance is so important, because you're definitely going to encounter issues. I mean, none of your models is going to work as you thought it would.
So if you don't have that drive and that passion for the thing that you're studying, it's going to be extremely hard to get it through the finish line, because it's not going to be easy. It's like choosing a new sport: if you don't like what the sport is all about, you're not going to stick with it, because it's going to be hard. So that perseverance, I would say, comes from your curiosity and your passion for your field and the methods you're using.
And the other thing I was going to add, this is tangential, but let me just add it: if you ever have the chance to go visit Bayes' grave in London, take it. I got to do that last summer. I was in London, I had my children with me, and we all picked some spot we wanted to go to. And I was like, I'm going to go find and take a picture in front of Bayes' grave. And it brought up an interesting question: I don't know the etiquette of taking photographs in front of a deceased person's grave site.
But then, ironically, as I was sitting on the tube leaving, I sat next to a woman and she had Bayes' theorem on her shirt. It was from the Bayes Business School, something like that, in London. And I was like, okay, I have reached the Mecca. The perseverance led to a trip, you know, my own version of the pilgrimage to London. But definitely, definitely worth the time to go.
If you want to be surrounded... once you reach that level of perseverance, you're part of the club, and then you can do things like that: fine vacations, you know, holidays around Bayes, Bayes' graves. Yeah. I mean, I am definitely gonna do that. Thank you very much for giving me another idea for a nerd holiday. My girlfriend is gonna hate me, but she always wanted to visit London, so you know, that's gonna be my bait.
It's not bad to get to; it's off of Old Street, you know, actually well marked. I mean, the grave site's a little weathered, but it's in a good spot, a good part of town, so you know, not really heavily touristy, amazingly. Oh yeah, I'm guessing that's the good thing. Yeah, no, I already know how I'm gonna ask her. Honey, when do we go to London? Perfect. Let's go check out Bayes' grave. Yeah, I mean, that's perfect. That's amazing.
So, I mean, you should send me that picture, and that should be your picture for this episode. I always take a picture from guests to illustrate the episode icon, and you definitely need that picture for your icon. I can do that. I'll be happy to. Yeah. Awesome. Definitely. So before asking you the last two questions, I'm just curious how you see the future of Bayesian stats in the context of psychological sciences and psychometrics.
And what are some exciting avenues for research and application that you envision in the coming years, or that you would really like to see? Oh, that's a great question. A terrible one to answer. So, you know, interestingly, quantitative psychology has sort of been on a downhill swing for, I don't know, 50, 60 years; there are fewer and fewer programs, at least in the United States, where people are training.
But despite that, I feel like the use of Bayesian statistics is up in a lot of different other areas. And I think that affords a bit better model-based science. You have to specify a model, you have a model in mind, and then you go and do that. I think that benefit makes the science much better. You're not just using what's always been done. You can push the envelope methodologically a bit more.
And another benefit of Bayesian statistics is that now you can code an algorithm that will likely work without having to know, like you said, all of the underpinnings, the technical side of things; you can use an existing package to do so. I like to say that's going to continue to make science a better practice.
I think the fear that I have is sort of the rise of the large language model-based version of what we're doing in machine learning, artificial intelligence. But I will be interested to see how we incorporate a lot of the Bayesian ideas, Bayesian methods, into that as well. I think there's potential. Clearly, people are doing this; I mean, that's what runs a lot of what is happening anyway. So I look forward to seeing that as well.
So I get a sense that what we're talking about may really be the foundation for what the future will be. Maybe, instead of that parallel universe, if we could go into the future in our own universe, 50 years from now, what we would see is a curriculum entirely built on Bayesian methods. And, you know, I just looked at your topic list; you were recently talking about variational inference and so forth.
The use of that in very large models, I think, is very important stuff. So it may just be the thing that crowds out everything else, but that's speculative, and I don't make a living making predictions, unfortunately. So that's the best I can do. Yeah. I mean, that's also more of a wishlist question, so that's all good. Awesome. Well, Jonathan, amazing. I learned a lot. We covered a lot of topics. I'm really happy.
But of course, before letting you go, I'm going to ask you the last two questions I ask every guest at the end of the show. Number one: if you had unlimited time and resources, which problem would you try to solve? Well, I would try to figure out how we can know what a student knows every day of the year, so that we can best teach them where to go next. That would be it.
Right now, there's not only the problem of the technical issues of estimation; there's also the problem of how we best assess them, how much time they spend doing it, and so forth. That to me is what I would spend most of my time on. That sounds like a good project. I love it. And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be? All right. I've got a really obscure choice, right? It's not like I'm picking Einstein or anything.
I have two, actually, that I've sort of debated. One is the economist Paul Krugman, who writes for the New York Times and works at the City University of New York now. You know, Nobel laureate. I love his work; his understanding of the interplay between model, data, and understanding is fantastic. So I would just sit there and listen to everything he had to say, I think. The other is, again, an obscure one.
One of the things I'm fascinated by is weather and weather forecasting, even though, you know, I'm in education and psychological measurement. There's a guy who started the company called Weather Underground; his name is Jeff Masters. You can read his work on a blog at Yale these days, Yale Climate Connections, something along those lines. Anyway, he has since sold the company, but he's fascinating on modeling. Right now we're in the peak of hurricane season in the United States.
We see these storms coming off of Africa or spinning up everywhere, and there's the interplay between, unfortunately, climate change and other atmospheric dynamics. It just makes for an incredibly complex system that's fascinating, and so is how science approaches prediction there. So I find that to be great. Those are the two. I had to think a lot about that, because there are so many choices, but those two people are the ones I read the most, certainly outside my own field.
Nice. Yeah, sounds fascinating. And weather forecasting is definitely incredible, also because the great thing is you have feedback every day. So that's really cool; you can improve your predictions. And there's the missing data problem: you can't sample every part of the atmosphere, so how do you incorporate that into your analysis as well? No, that's incredible. Model averaging and stuff. Anyway, yeah.
Yeah, it's also a testimony to the power of modeling and parsimony, you know. I worked a lot on electoral forecasting models, and the classic way people dismiss models in these areas is: well, you cannot really predict what people are going to do at an individual level. Which is true. People have free will, you know, so you cannot predict at an individual level what they are going to do, but you can
quite reliably predict what masses are going to do. Basically, with the aggregation of individual points, you can actually do it kind of reliably. And that's the power of modeling: you get something where, yeah, you know, the model is wrong, but it works, because it simplifies things, but doesn't simplify them to the point where they don't make sense anymore.
Kind of like the standard model in physics, where we know it doesn't work, it breaks at some point, but it does a pretty good job of predicting a lot of the phenomena that we observe. So, which do you prefer: is it free will or is it random error? Well, you'll have to come back for another episode for that one, otherwise... Yes. That's a good one. Good point. Nice. Well, Jonathan, thank you so much for your time.
As usual, I will put resources and a link to your website in the show notes for those who want to dig deeper. Thank you again, Jonathan, for taking the time to be on this show. Happy to be here. Thanks for the opportunity. It was a pleasure to speak with you, and I hope it makes sense for a lot of people. Appreciate your time.