[Alex Andorra]: Charles Margossian, welcome to Learning Bayesian Statistics. [Charles Margossian]: Hi Alex, thanks for having me. [Alex Andorra]: Yeah, thanks a lot for taking the time. I'm super happy to have you here. [Alex Andorra]: It's been a while since I've wanted to have you on the show and now I managed to [Alex Andorra]: find the slot. And thank you also to a few patrons who sent me messages [Alex Andorra]: to tell me that they would really like to hear you on the show. So thanks
[Alex Andorra]: a lot folks for being so proactive and giving me ideas for the show. So Charles, [Alex Andorra]: let's dive in and [Charles Margossian]: But [Alex Andorra]: as [Charles Margossian]: maybe [Alex Andorra]: usual... [Charles Margossian]: I can bring up a very quick anecdote, which [Alex Andorra]: Oh sure. [Charles Margossian]: is I think two, three weeks ago, your show came up. And I told my
[Charles Margossian]: colleagues, I would feel like I have made it in the Bayesian statistics [Charles Margossian]: world if I get an invitation to speak on Learning Bayesian Statistics. So [Charles Margossian]: I'm thrilled to be here, and I thank the mysterious patrons [Charles Margossian]: who have incited this meeting. [Alex Andorra]: Yes, I'm sure they will recognize themselves. So, yeah, let's start with your
[Alex Andorra]: origin story, actually. That's also something I found super interesting. How [Alex Andorra]: did you come to the world of statistics and pharmacometrics and epidemiology? [Charles Margossian]: Mm-hmm. Right. So I became interested in statistics and data as an undergrad. [Charles Margossian]: Actually, [Alex Andorra]: Mm-hmm.
[Charles Margossian]: I was studying physics. I was already in the US. I was at Yale and [Charles Margossian]: I was working with an astronomy lab [Alex Andorra]: Mm-hmm. [Charles Margossian]: on exoplanets. We had a lot of data. [Alex Andorra]: Mm-hmm. [Charles Margossian]: So that generally got me interested in data science. But I wasn't [Charles Margossian]: introduced to Bayesian methods. And I always was a bit uncomfortable
[Charles Margossian]: with how we were handling uncertainty and what we might do about it. And I was fortunate, [Charles Margossian]: after I graduated, to actually get a job at a biotech company. [Charles Margossian]: The company was Metrum Research Group. They were [Alex Andorra]: Mm-hmm. [Charles Margossian]: based in Connecticut, and the supervisor I got there, William Gillespie, [Alex Andorra]: Mm-hmm.
[Charles Margossian]: is one of the pioneers of Bayesian methods in pharmacometrics. And [Charles Margossian]: so he introduced me to Bayesian statistics and it made a lot of sense. [Charles Margossian]: And I realized, oh, this is what I have been looking for when I was [Charles Margossian]: doing astronomy. Right. And I started, you know, piecing this together.
[Charles Margossian]: And to be fair, I hadn't done a lot of statistics before. I don't have [Charles Margossian]: the experience of struggling with classical statistics for a decade before [Charles Margossian]: being rescued by Bayesian statistics. I encountered Bayes very early [Charles Margossian]: on. But even then, the way we quantify uncertainty, the fact that [Charles Margossian]: everything reduced to essentially one equation, was extremely compelling
[Charles Margossian]: to me. And Bill Gillespie hired me [Alex Andorra]: Mm-hmm. [Charles Margossian]: at Metrum Research Group to work on Stan. So this was back in 2015. [Charles Margossian]: And at the time, Stan was fairly new, very exciting, very promising. [Charles Margossian]: And you had these specialized software packages in pharmacometrics, but they were [Charles Margossian]: not open source. They were not really supporting Bayesian statistics,
[Charles Margossian]: or at least that was not their priority. And they didn't give you the flexibility [Charles Margossian]: that a lot of my colleagues were looking for. On the other hand, you had [Charles Margossian]: Stan, which was open source, which had great algorithms to do Bayesian modeling, [Charles Margossian]: but which lacked a lot of the features required to do pharmacometrics [Charles Margossian]: modeling, right? One example was support for differential equation solvers.
[Charles Margossian]: And all these models are based [Alex Andorra]: Yeah. [Charles Margossian]: on ODEs. And at the time, Stan had limited support for ODEs, and more [Charles Margossian]: generally implicit functions. [Charles Margossian]: And then there were some more specialized things like handling, you
[Charles Margossian]: know, the event schedule of clinical trials. And the project ended up [Charles Margossian]: being, we're going to write some general features, we're going to [Charles Margossian]: contribute them to Stan, and then we're going to write a specialized [Charles Margossian]: extension called Torsten. And that's going to have, you know, more bespoke [Charles Margossian]: features targeted at a pharmacometrics audience. And so... That was
[Charles Margossian]: my exposure to Stan. I can tell you about my first pull request, which [Charles Margossian]: Bob Carpenter and Daniel Lee reviewed [Alex Andorra]: Mm-hmm. [Charles Margossian]: extensively. They were very patient with my C++. And [Alex Andorra]: Hehehe [Charles Margossian]: my plan, my loose plan, had been, well, I'm gonna work for one or two [Charles Margossian]: years. I'm gonna learn programming and statistics because I had gained
[Charles Margossian]: an appreciation for it in astronomy. And then I'm going to do a PhD
[Charles Margossian]: in physics. But Andrew Gelman, who I had met in an official capacity, [Charles Margossian]: like through my work, encouraged me to apply to the statistics program [Charles Margossian]: at Columbia University, saying, you know, if you do statistics, you'll [Charles Margossian]: still be able to work on the natural sciences and the physics that [Charles Margossian]: you're interested in, but you [Alex Andorra]: Yeah.
[Charles Margossian]: know, you'll have, maybe you'll be able to make a more unique contribution [Charles Margossian]: as a statistician. So I took his word for it. And yeah, and once [Charles Margossian]: I had the offer from Columbia, I mean, it was a difficult decision. But [Charles Margossian]: I decided, okay, I'm going to do a PhD in statistics. And that was a big [Charles Margossian]: change of field. But I did pursue that. I was able to continue
[Charles Margossian]: working on Stan and on Stan-adjacent projects. And just a year [Charles Margossian]: ago, I completed the PhD. So I would say that's the origin story [Charles Margossian]: that eventually led me to the Flatiron Institute. So that's where [Charles Margossian]: I am currently, it's in New York. We're a non-profit. We focus on applying [Charles Margossian]: computational methods to the basic sciences. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: So big emphasis on collaboration. We have people who do astronomy, who do quantum [Charles Margossian]: physics, who do biology. And so that really resonates with why I wanted [Charles Margossian]: to do statistics in the first place, which is that I wanna solve problems in [Charles Margossian]: the sciences.
[Alex Andorra]: Yeah, I can really relate to that. That was also kind of what happened to [Alex Andorra]: me too, even though the first field was political science and not astronomy. [Charles Margossian]: Okay. [Alex Andorra]: But yeah, [Charles Margossian]: Yeah. [Alex Andorra]: definitely in the end, applying the methods became more interesting than the [Alex Andorra]: field in itself. So I continued [Charles Margossian]: Right, [Alex Andorra]: on that path.
[Charles Margossian]: right, and I think that, you know, the attitude I had, so I had offers [Charles Margossian]: to do a PhD in physics, a PhD in statistics, and a bunch of different [Charles Margossian]: fields. And at the time, my attitude was, look, wherever I go, I'm gonna [Charles Margossian]: have to, you know, analyze some data, understand where that data
[Charles Margossian]: comes from. I'm gonna have to, you know, have the tools to do the [Charles Margossian]: analysis or do the statistics. [Alex Andorra]: Mm-hmm. [Charles Margossian]: So the idea is, I think our loyalty is not to a discipline or to a [Charles Margossian]: field, it's really to a problem. [Alex Andorra]: Mm-hmm. [Charles Margossian]: Whatever [Alex Andorra]: Yeah. [Charles Margossian]: problem we work on, there's always a lot that we need to learn. We're never
[Charles Margossian]: experts in a new problem, in a new research problem. And so I kind of [Charles Margossian]: try not to take the field too seriously. I mean, whether it's for the PhD or even [Charles Margossian]: right now, I'm in computational mathematics, right? You know, whatever that [Charles Margossian]: means. I care more about the problems I work on than the department [Charles Margossian]: I'm affiliated with.
[Alex Andorra]: Yeah, completely, I really relate to that for sure. I like the fact of [Alex Andorra]: using the method to solve an interesting problem, whether it is about astronomy, [Alex Andorra]: political science, biology, [Charles Margossian]: Mm-hmm. [Alex Andorra]: epidemiology. And in the end, I find that even more interesting, [Alex Andorra]: because you get to work on a variety of topics that you wouldn't have otherwise.
[Alex Andorra]: And also you [Charles Margossian]: Right. [Alex Andorra]: learn so much. So that's really [Charles Margossian]: Yeah, [Alex Andorra]: the cool thing. [Charles Margossian]: yeah. [Alex Andorra]: Yeah. Thanks a lot for this introduction. And so right now, well, these [Alex Andorra]: days you're working at the Flatiron Institute, as you were saying, and it seems [Alex Andorra]: to be a very diverse organization. We had Bob Carpenter actually [Charles Margossian]: Mm-hmm.
[Alex Andorra]: recently on the show, so I'm going to link to his episode in the show notes. [Alex Andorra]: But you, Charles, what are the topics you are particularly interested in [Alex Andorra]: these days at the Flatiron? [Charles Margossian]: Yeah, I think that I have a little bit two pulls right now. They are [Charles Margossian]: a bit on the methodology side [Alex Andorra]: Mm-hmm. [Charles Margossian]: of things. And one of them is variational inference, which is a form
[Charles Margossian]: of what's called an approximate Bayesian method. And really [Charles Margossian]: what I'm trying to understand is when should we use [Charles Margossian]: an approximate method like variational inference, because it's incredibly [Charles Margossian]: popular. It's used in a lot of fields of machine learning. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: And it's used a lot of times in artificial intelligence. And yet you [Charles Margossian]: have so many smart, brilliant people who are completely distrustful of variational [Charles Margossian]: inference. And it's also not difficult to construct an example [Charles Margossian]: where it really doesn't do the job, where it really fails in a spectacular [Charles Margossian]: way. And I think that ultimately it depends on the problem you apply
[Charles Margossian]: it to. And what we need to do is understand, you know, when can we [Charles Margossian]: get away with the approximations that variational inference proposes to do? [Charles Margossian]: Why does it sometimes work really well? Or why do we sometimes really [Charles Margossian]: get punished for using variational inference? And I can give you a [Charles Margossian]: very simple example of that, [Alex Andorra]: Yeah.
[Charles Margossian]: which was, so this was a recent work with Lawrence Saul, who is [Charles Margossian]: also at the Flatiron Institute. He kind of heads machine learning there. And [Charles Margossian]: there's a statement that goes around, which is to say, well, variational [Charles Margossian]: inference will give you good estimates of the expectation values for
[Charles Margossian]: a target distribution. So that's great. We often care about expectation [Charles Margossian]: values, whether [Alex Andorra]: Mm-hmm. [Charles Margossian]: that's in statistical physics or in Bayesian statistics. [Alex Andorra]: Mm-hmm. [Charles Margossian]: But on the other hand, it will underestimate uncertainty.
[Alex Andorra]: Mm-hmm. Okay. [Charles Margossian]: Okay, well, what does it mean to underestimate uncertainty? How do [Charles Margossian]: we, you know, how do we make sense of a statement like that, which [Charles Margossian]: has appeared time and time again, over the last two decades. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: And really we realized that there were two measures of uncertainty [Charles Margossian]: that seem to come up again and again. One is the marginal variances and [Charles Margossian]: the other one is the entropy. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And so people like the entropy because it's a multivariate notion [Charles Margossian]: of uncertainty and it's a generalization of the marginal variance.
[Charles Margossian]: So when people say variational inference underestimates uncertainty, they [Charles Margossian]: usually mean you're underestimating the marginal variances and you're underestimating [Charles Margossian]: the entropy. [Alex Andorra]: Okay. [Charles Margossian]: And so what we ended up doing is demonstrating this on the Gaussian [Charles Margossian]: case, right? You have a Gaussian target, you're approximating it
[Charles Margossian]: with a Gaussian with a diagonal covariance matrix. So that's called [Charles Margossian]: the factorized approximation or the mean field approximation. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And indeed you underestimate the marginal variance and you underestimate
[Charles Margossian]: the entropy. Here's where it gets interesting, which is as you start [Charles Margossian]: going to higher dimensions. And for some reason, I think people [Charles Margossian]: have only looked at, you know, really the two-dimensional case,
[Charles Margossian]: because that's the figure that fits on the page, right? But if you start [Charles Margossian]: going to higher dimensions and you take limits where the dimension goes [Charles Margossian]: to infinity, you can construct examples where you actually get very, very [Charles Margossian]: accurate estimates of the entropy, but you are underestimating the [Charles Margossian]: marginal variances in every dimension in an arbitrarily bad manner.
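A quick numerical sketch of the phenomenon Charles describes here, assuming a zero-mean equicorrelated Gaussian target (an illustrative choice, not the exact construction from the paper) and using the classic closed form for the reverse-KL optimal factorized Gaussian, whose variances are the reciprocals of the diagonal of the target's precision matrix:

```python
import numpy as np

def entropy(Sigma):
    # differential entropy of N(0, Sigma)
    d = Sigma.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sigma)[1])

rho = 0.9
for d in [2, 10, 100, 1000]:
    # equicorrelated target: all true marginal variances are 1
    Sigma = (1 - rho) * np.eye(d) + rho * np.ones((d, d))
    vi_var = 1.0 / np.diag(np.linalg.inv(Sigma))   # mean-field VI variances
    gap = (entropy(Sigma) - entropy(np.diag(vi_var))) / d
    print(f"d={d:4d}  VI marginal variance={vi_var[0]:.3f}  entropy gap per dim={gap:.4f}")
```

As the dimension grows, the per-dimension entropy error shrinks toward zero while every marginal variance stays badly shrunk, which is exactly the decoupling of the two uncertainty measures being described.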
[Charles Margossian]: And so the two notions of uncertainty are not at all equivalent and [Charles Margossian]: not at all interchangeable. And what ends up happening is you look at fields [Charles Margossian]: where variational inference is applied. So [Alex Andorra]: Mm-hmm. [Charles Margossian]: for example, in statistical physics, where people want to estimate [Charles Margossian]: the entropy of an Ising model, for example, and that's [Alex Andorra]: Mm-hmm.
[Charles Margossian]: where this factorized, this mean-field approximation comes [Alex Andorra]: Okay. [Charles Margossian]: from. Well, here it works just fine. You actually get good estimates [Charles Margossian]: of the entropy, right? In certain limits. In machine learning, where [Charles Margossian]: you're trying to maximize marginal likelihoods rather than do full Bayes, [Charles Margossian]: actually, you know, you get good estimates of those marginal likelihoods.
[Charles Margossian]: At least that's our working conjecture. But in [Alex Andorra]: Okay. [Charles Margossian]: Bayesian statistics, where we have interpretable quantities, and we know [Charles Margossian]: those marginal variances mean something, for those parameters that have [Charles Margossian]: a meaning, well, here variational inference might really not do the job, or at
[Charles Margossian]: least not vanilla implementations of it. And that's an example of how, by studying [Charles Margossian]: an example, we can start understanding why is it that some people are [Charles Margossian]: so enthusiastic about variational inference and other people are so [Charles Margossian]: distrustful of it. It really depends on what is the measure of uncertainty [Charles Margossian]: that you care about, and that in turn is informed by the problem you
[Charles Margossian]: want to solve. So I bring this up as an archetype of the work that we're [Charles Margossian]: trying to do. [Alex Andorra]: Mm-hmm. [Charles Margossian]: We wanna understand this method inside and out. [Alex Andorra]: Mmm. Yeah, that's really interesting. [Charles Margossian]: That's the first big topic. [Alex Andorra]: Yeah. [Charles Margossian]: The other big topic is I still do a lot of MCMC. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: I care a lot about MCMC because as we'll see in pharmacometrics, I [Charles Margossian]: really think that is our best tool to solve many problems. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And here I'm trying to understand some more fundamental questions, [Charles Margossian]: right? So people often say, well, you know, yeah, MCMC is great but [Charles Margossian]: it's too computationally expensive. That's why we'll [Alex Andorra]: Yeah.
[Charles Margossian]: use approximate methods. And one thing I will argue is, actually, yes, it's [Charles Margossian]: computationally expensive, but we don't really have a very good sense [Charles Margossian]: of how much computation we should throw at MCMC. Because ultimately [Charles Margossian]: we have three fundamental tuning parameters. One is the number of chains [Charles Margossian]: that we use. The other one is how long is the warmup or the burn-in
[Charles Margossian]: phase? And then the third one is how long is the sampling phase? And [Charles Margossian]: actually, [Charles Margossian]: it's not clear what is the optimal computation that you should throw [Charles Margossian]: at an MCMC problem. I think people rely on heuristics. More often, [Charles Margossian]: they rely on conventions. These are the defaults in Stan, in PyMC, in
[Charles Margossian]: TensorFlow Probability. But actually, you know, we need to think about: [Charles Margossian]: have we used a warmup phase that's too long or too short, or [Alex Andorra]: Mm-hmm. [Charles Margossian]: a sampling phase that's too long or too short, right? [Alex Andorra]: Yeah. [Charles Margossian]: Or how many chains is it useful to run? Especially now we have GPUs, [Charles Margossian]: right? In theory, I could run MCMC with 10,000 chains. And people do
[Charles Margossian]: that now, right? I mean, some people do that now. But what are the [Charles Margossian]: implications of doing that, right? I'll [Charles Margossian]: tell you an implication. If I have 10,000 chains that I'm running on [Charles Margossian]: a GPU, then I think the sampling phase can be one iteration per chain. [Charles Margossian]: We can discuss this and we can argue for or against it, but it changes
[Charles Margossian]: a little bit our perspective of how much computation we throw at it. Another [Charles Margossian]: perspective I like is if you have a strict computational budget, [Charles Margossian]: and then you say, well, I'm not going to use MCMC, it's too expensive. [Charles Margossian]: Let me run variational inference. I know it's biased. I know it's approximate, [Charles Margossian]: but at least it finishes running, you know, within 10 minutes. And
[Charles Margossian]: I'd say, well, run 10 minutes of MCMC. It'll be biased. It'll be approximate, [Charles Margossian]: and it will finish running in 10 minutes. And then ask yourself, well, [Charles Margossian]: how good is this estimate? So it's about thinking a little bit more carefully about, [Charles Margossian]: you know, how much computation we really need for MCMC and for the different [Charles Margossian]: problems that we might be trying to solve. [Alex Andorra]: Mm-hmm.
[Alex Andorra]: Yeah, that's really interesting. Thanks a lot for that very clear presentation. [Alex Andorra]: I really love it. Let's actually continue on that path, because I wanted to ask [Alex Andorra]: you about that a bit later in the show anyways. Yeah, several things [Alex Andorra]: popped into my mind. First thing is, MCMC is also an approximation method. So... [Alex Andorra]: Why [Charles Margossian]: Yes.
[Alex Andorra]: do we say, why do we, and I know we usually do that in the field, we define [Alex Andorra]: variational inference as an approximation method, which kind of implies [Alex Andorra]: that MCMC is not, but it is. So can you maybe draw the distinction? [Alex Andorra]: What makes the difference between the two methods? And why do we reserve the [Alex Andorra]: term approximation for variational inference?
[Charles Margossian]: Yeah, absolutely. So I think that people, like a lot of statisticians, think [Charles Margossian]: asymptotically. And asymptotically in what sense? When [Charles Margossian]: you run a single chain for an infinite number of iterations, MCMC [Charles Margossian]: is not only gonna generate samples from the stationary distribution, which [Charles Margossian]: oftentimes is the posterior distribution, but also Monte Carlo estimators
[Charles Margossian]: with an arbitrary precision. Your Monte Carlo estimator will converge to the [Charles Margossian]: true expectation value or the true variance or whatever it is you're trying [Charles Margossian]: to estimate. Whereas with variational inference, here we have to be a [Charles Margossian]: little bit careful, because what does it mean to speak of the asymptotics of variational [Charles Margossian]: inference? So you might say, okay, I'm gonna run the optimization for
[Charles Margossian]: an infinite number of iterations. So let's assume that the optimizer [Charles Margossian]: does converge. And actually, until recently, this was not really shown. [Charles Margossian]: There's a recent preprint that I know Robert Gower and Justin Domke [Charles Margossian]: have been working on where they actually showed that, yes, under certain [Charles Margossian]: conditions, stochastic optimization will converge for variational inference. [Alex Andorra]: Mm-hmm.
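As a concrete picture of what that stochastic optimization looks like, here is a minimal sketch (the target, step sizes, and iteration counts are made up for illustration): single-sample reparameterization-gradient ascent on the ELBO, fitting a mean-field Gaussian to a correlated two-dimensional Gaussian target.

```python
import numpy as np

rng = np.random.default_rng(1)

rho = 0.8
Prec = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))  # target precision

def grad_log_p(x):
    return -Prec @ x   # gradient of the (unnormalized) Gaussian log target

mu, log_sigma = np.zeros(2), np.zeros(2)   # q = N(mu, diag(exp(2 * log_sigma)))
for step in range(5000):
    lr = 0.1 / (1.0 + step / 500)              # decaying step size
    eps = rng.normal(size=2)
    sigma = np.exp(log_sigma)
    x = mu + sigma * eps                       # reparameterized draw from q
    g = grad_log_p(x)
    mu += lr * g                               # stochastic ELBO gradient w.r.t. mu
    log_sigma += lr * (g * sigma * eps + 1.0)  # w.r.t. log sigma (the +1 is the entropy term)

print(np.exp(2 * log_sigma))  # ~[0.36, 0.36]: the optimum 1 - rho^2, not the true marginals (1.0)
```

Even when this optimizer converges, it converges to the best factorized Gaussian, with shrunk marginal variances, which is the point made next.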
[Charles Margossian]: But then even if it does converge, you say, well, let me think about [Charles Margossian]: the approximation that minimizes my objective function, so oftentimes [Charles Margossian]: the Kullback-Leibler divergence. That's what I get asymptotically. Well, that [Charles Margossian]: will still not be, in general, my target distribution. [Alex Andorra]: Mm-hmm. [Charles Margossian]: Even asymptotically, I'm still approximate. I think that's why people
[Charles Margossian]: draw this distinction between MCMC and variational inference. It's [Charles Margossian]: really in the asymptotic sense that MCMC is an exact method, whereas [Charles Margossian]: VI remains approximate. Even if you've thrown an infinite amount of [Charles Margossian]: computation at it, you're provably not exact, right? Now, in practice, we're not asymptotic, [Charles Margossian]: right? We work with finite computation. And so I think it's very important
[Charles Margossian]: to recognize that, yes, MCMC is also an approximate method. It's not [Charles Margossian]: unbiased because you don't initialize from the stationary distribution. [Charles Margossian]: You do not reach the stationary distribution, right? So when I hear statements [Charles Margossian]: like, well, first we wait for MCMC to converge to the stationary distribution. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: I think people have the right intuition, right? I don't think it's [Charles Margossian]: a misleading statement, but nonetheless, it's an incorrect statement. [Charles Margossian]: What we do is we wait for MCMC to get close enough to the stationary [Charles Margossian]: distribution. And the reason why we care about being close enough [Charles Margossian]: to the stationary distribution is because we want the bias to be small, right?
[Charles Margossian]: And so [Alex Andorra]: Mm-hmm. [Charles Margossian]: when we think about, you know, convergence diagnostics, the way I think [Charles Margossian]: we really should start thinking about a quantity like the R-hat statistic, [Charles Margossian]: for example. And R-hat is interesting. R-hat has been around for three [Charles Margossian]: decades and frankly, there are still debates about what R-hat measures.
[Charles Margossian]: I mean, this is very existential, right? We have an estimator, but it's not clear [Charles Margossian]: what the estimand is. And my perspective, my most recent perspective, [Charles Margossian]: is that what really matters, the reason I care about convergence, is because [Charles Margossian]: I want the bias of my Monte Carlo estimator to be sufficiently small. [Charles Margossian]: It's not going to be zero, but it has to be sufficiently small. And so
[Charles Margossian]: can R-hat tell me something about how small my bias is? And here's the [Charles Margossian]: paradox, which is that R-hat, the way you compute it, is a ratio [Charles Margossian]: of two standard deviations, right? So you know that when you measure variance, [Charles Margossian]: it doesn't tell you something about bias, right? And yet, you know, we say [Charles Margossian]: R-hat tells you if your warmup phase is long enough.
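For reference, this is the computation being referred to: a minimal sketch of the standard split-R-hat (in the Gelman et al. form; production implementations like Stan's add rank-normalization and other refinements).

```python
import numpy as np

def split_r_hat(draws):
    """draws: array of shape (n_chains, n_iterations) for one quantity."""
    half = draws.shape[1] // 2
    # split each chain in half so within-chain drift shows up as between-chain variance
    chains = np.concatenate([draws[:, :half], draws[:, half:2 * half]])
    n = chains.shape[1]
    W = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_plus = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_plus / W)                # ratio of two standard deviations -> 1
```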
[Alex Andorra]: Yep. [Charles Margossian]: Not everyone agrees that that's what it tells you, but you know, that's [Charles Margossian]: my perspective, and I think it's a reasonable perspective, [Charles Margossian]: right? But the point of the warmup phase, the primary [Charles Margossian]: point, not the only point, is for the bias to go down, right? So we're [Charles Margossian]: faced with this paradox, this very fundamental question. Can R-hat actually
[Charles Margossian]: give us any useful information? And there was a recent paper that argued that, [Charles Margossian]: well, since R-hat is really just looking at the variance, it's a one-to-one [Charles Margossian]: map. So this was some really nice work by, [Charles Margossian]: I know that Dootika Vats is one of the co-authors on the paper, and [Charles Margossian]: there's another co-author. They call it Revisiting the Gelman-Rubin Diagnostic. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: And they argue, well, R-hat is just a reframing of variance and of [Charles Margossian]: effective sample size. And to me, effective sample size only matters [Charles Margossian]: when you're looking at the sampling phase, because it tells you: is
[Charles Margossian]: your variance low enough? And so we had to think a little bit hard about [Charles Margossian]: this, because it wasn't completely satisfactory, because either it means [Charles Margossian]: that R-hat is not a useful convergence diagnostic in some sense, or actually [Charles Margossian]: there's more going on. And what we realized is, you look at the variance [Charles Margossian]: that's being measured by R-hat. So really what R-hat ends up measuring
[Charles Margossian]: is, you're running a bunch of chains. It could be four chains, it could [Charles Margossian]: be more. Each chain generates one Monte Carlo estimator. And then you average [Charles Margossian]: the per-chain Monte Carlo estimators. So now you look at the variance [Charles Margossian]: of a single-chain Monte Carlo estimator, and that variance you can actually
[Charles Margossian]: decompose into a non-stationary variance and a persistent variance. And what [Charles Margossian]: we realized, and you have to be careful, you have to do that analysis [Charles Margossian]: for non-stationary Markov chains. Otherwise [Alex Andorra]: Mm-hmm. [Charles Margossian]: you completely miss the non-stationary variance. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: And the non-stationary variance is, you know, actually a measure [Charles Margossian]: of how well you've forgotten your initial point. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And you can show in some cases that it decays at the same rate as the [Charles Margossian]: squared bias. So you're not directly measuring the squared bias, but [Charles Margossian]: because MCMC is what it is, the non-stationary variance gives you a
[Charles Margossian]: proxy clock for the bias. And so our argument is that, well, what's interesting [Charles Margossian]: about R-hat is not that it measures, you know, the persistent variance, [Charles Margossian]: which you can then relate to the effective sample size, but that it measures [Charles Margossian]: the non-stationary variance, right? And this then led us to coming [Charles Margossian]: up with revisions of R-hat, which more directly measure the
[Charles Margossian]: non-stationary variance rather than the total variance, so that we [Charles Margossian]: actually get an estimator that unambiguously tells you something about [Charles Margossian]: the length of the warmup phase. And then you ask the question of [Charles Margossian]: the length of the sampling phase in a second and separate stage.
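A toy illustration of that proxy-clock behavior (a Gaussian AR(1) chain, not the estimator from the paper): run many chains from an overdispersed initialization; the across-chain mean then tracks the bias, the excess of the across-chain variance over the stationary variance is the non-stationary variance, and the two decay at the same geometric rate.

```python
import numpy as np

rng = np.random.default_rng(0)

phi, n_chains = 0.95, 1_000_000
x = rng.normal(loc=5.0, scale=3.0, size=n_chains)  # overdispersed initialization
for t in range(1, 61):
    # AR(1) kernel whose stationary distribution is N(0, 1)
    x = phi * x + np.sqrt(1.0 - phi**2) * rng.normal(size=n_chains)
    if t % 20 == 0:
        sq_bias = x.mean() ** 2        # squared bias at iteration t
        nonstat = x.var() - 1.0        # excess over the stationary variance
        print(f"t={t}: squared bias={sq_bias:.4f}, "
              f"non-stationary variance={nonstat:.4f}, ratio={sq_bias / nonstat:.2f}")
```

Both quantities shrink like phi to the power 2t, so their ratio stays roughly constant: watching the non-stationary variance is, up to a constant, watching the squared bias.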
[Charles Margossian]: So, sorry, that's a conceptual explanation. This is a paper we [Charles Margossian]: have a preprint out for, called nested R-hat. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And this is joint work with Andrew Gelman, Aki Vehtari, Matt [Charles Margossian]: Hoffman, so some of the usual suspects. We also have Pavel Sountsov [Charles Margossian]: and Lionel Riou-Durand. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: So we've all worked on this together. [Charles Margossian]: And a cool anecdote here is that what we were really interested in is those [Charles Margossian]: regimes where we're running hundreds of chains in parallel, or thousands
[Charles Margossian]: of chains in parallel, like GPU-friendly MCMC. And what this made [Charles Margossian]: us do is, instead of thinking asymptotics in the limit where we have an infinitely [Charles Margossian]: long chain, which is how asymptotics for MCMC have worked for the [Charles Margossian]: past, you know, five or six decades, right? Because that's what people [Charles Margossian]: did. They ran long chains, right?
[Alex Andorra]: Yep. [Charles Margossian]: And so even though asymptotics are a property of infinity, we want to somehow [Charles Margossian]: get close to the asymptotic regime, right? That's why we care about this [Charles Margossian]: asymptotic analysis. And here we thought, well, let's take asymptotics [Charles Margossian]: in another direction. Let's say we have chains of finite length, [Charles Margossian]: but what happens when we have an infinite number of chains? And then
[Charles Margossian]: suddenly you can do asymptotic analysis on non-stationary Markov chains. [Charles Margossian]: Right? So the problem is, if I take an asymptotic in the length of [Charles Margossian]: the chain, well, I've made my chain stationary. And then there are [Charles Margossian]: only so many properties that I can study. If I take asymptotics in [Charles Margossian]: the other direction, which is the number of chains, then [Alex Andorra]: Mm-hmm.
[Charles Margossian]: suddenly I can start making statements about non-stationary Markov chains. [Charles Margossian]: I can elicit terms such as, you know, the non-stationary variance. And [Charles Margossian]: this was a cool example where the hardware, you know, trying to work [Charles Margossian]: with GPUs running a lot of chains, actually, I think, is really [Charles Margossian]: changing our theoretical approach and our conceptual understanding of
[Charles Margossian]: MCMC. And I think we're going to get a lot of breakthroughs from this [Charles Margossian]: kind of perspective. [Alex Andorra]: Hmm. Yeah, super interesting. I'll put the two papers you mentioned in the show [Alex Andorra]: notes. And so that makes me think, basically, what would you say right [Alex Andorra]: now practically for people: when would variational inference usually be [Alex Andorra]: most helpful, especially in comparison to MCMC?
[Charles Margossian]: Yeah, so great. So I think there are two things to unpack here. One is [Charles Margossian]: when can variational inference still give you accurate answers? [Alex Andorra]: Mm-hmm. [Charles Margossian]: And two, when do you actually not need an accurate answer? [Charles Margossian]: Right? So, on the first question, it really depends on the family of [Charles Margossian]: variational inference that you're willing to use, the family of approximation.
[Charles Margossian]: And also, it turns out, the objective function. So I mentioned earlier [Charles Margossian]: that I have this mean-field approximation, and I'll just remind you [Charles Margossian]: what mean field means: I'm assuming that all my latent variables [Charles Margossian]: are independent. Now we don't believe that that's true in practice,
[Charles Margossian]: but that makes the computation much cheaper. And when you have millions of [Charles Margossian]: observations, millions of parameters, you need an algorithm where [Charles Margossian]: the cost scales linearly with the number of observations. And if I don't [Charles Margossian]: have this mean-field assumption, I get things that can scale quadratically or [Charles Margossian]: cubically: a full covariance over d parameters has on the order of d squared entries, and working with it typically costs on the order of d cubed.
[Charles Margossian]: And so when I do this approximation, well, I'm gonna get some things [Charles Margossian]: wrong, but maybe I'll get the things that I care about right, which [Alex Andorra]: Mm-hmm. [Charles Margossian]: could be the first moment or it could be the entropy, right? If you change [Charles Margossian]: the objective function, which is actually the KL divergence, you can kind [Charles Margossian]: of reverse it. You can show that you get arbitrarily poor estimates
[Charles Margossian]: of the entropy, but good estimates of the marginal variances. Right, [Charles Margossian]: so it turns out that the choice of objective function that you use to [Charles Margossian]: measure, you know, the disagreement between your approximation and [Charles Margossian]: your target matters. So there's a real question of, you know, what [Charles Margossian]: are the quantities you care about? Because we don't care about the
[Charles Margossian]: whole distribution. We care about some summaries of the posterior [Charles Margossian]: distribution. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And then that informs when you use them. So I think that's the first [Charles Margossian]: question. And [Alex Andorra]: Mm-hmm. [Charles Margossian]: I wanna emphasize that there's a lot of great work by Tamara Broderick [Charles Margossian]: and her group, Justin Domke and his colleagues, about trying to get
[Charles Margossian]: more accurate variational inference. And sometimes that works really [Charles Margossian]: well. But I can't really give you a more precise [Charles Margossian]: prescription than that, because we have to go into the details of the [Charles Margossian]: different problems. [Charles Margossian]: But then the second point I made is sometimes you don't need a really
[Charles Margossian]: accurate answer. So what are examples of that? So in machine learning, [Charles Margossian]: let's say you're just training a model, and maybe you're more interested [Charles Margossian]: in either a more complicated model or using more data than improving the [Charles Margossian]: accuracy of the inference. And then you [Alex Andorra]: Mm-hmm.
[Charles Margossian]: look at something like performing a task, a classification, a prediction [Charles Margossian]: and so forth, under a computational budget, [Alex Andorra]: Mm-hmm. [Charles Margossian]: right? It turns out that it's better to have a sophisticated model with [Charles Margossian]: very approximate inference than a less sophisticated model with more accurate [Charles Margossian]: inference. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: Now, it's hard to know, and my big problem with variational inference [Charles Margossian]: is it's hard to know which regime you're gonna be in, and to actually [Charles Margossian]: justify it. And even once you've run variational inference, you don't have [Charles Margossian]: that many diagnostics that can tell you, well, you know, you're doing [Charles Margossian]: okay, but you would do much better if you improved the inference. So
[Charles Margossian]: I think that it's an open question. But then the other example that I [Charles Margossian]: wanna bring up, where we don't always need accurate inference, is when [Charles Margossian]: we're developing models. So this ties back into this idea of the Bayesian [Charles Margossian]: workflow that has now been championed. You know, Andrew [Charles Margossian]: Gelman and colleagues wrote a lot about it. Michael Betancourt wrote
[Charles Margossian]: a lot about it. David Blei wrote a lot about it. You know, arguably [Charles Margossian]: George Box, right, wrote a lot about it. And, you know, if you [Charles Margossian]: ever work on an applied project, you sit down, you come up [Charles Margossian]: with the first iteration of your model, and arguably there are a [Charles Margossian]: lot of problems with that model. And you don't need super accurate inference
[Charles Margossian]: to diagnose the problems with this model. You do a quick fit, a quick [Charles Margossian]: approximation, and usually something obvious is gonna pop up. [Alex Andorra]: Mm-hmm. [Charles Margossian]: Then you revise the model. Then you again do quick inference, okay? And [Charles Margossian]: you keep refining and refining. And only once you actually have a [Charles Margossian]: polished version of the model do I think that it makes sense to, you
[Charles Margossian]: know, get out the big gun and the very accurate inference. And I [Charles Margossian]: think that, you know, if we talk about pharmacometrics and epidemiology, [Charles Margossian]: I'll give you some very precise examples of those situations. [Alex Andorra]: Yeah, for sure. We'll do that in a few minutes. But yeah, thanks a lot for
[Alex Andorra]: that tour. That makes a lot of sense, actually, all of that. I will use [Charles Margossian]: I realize my [Alex Andorra]: those. [Charles Margossian]: answers are very long, but your questions are very deep. [Alex Andorra]: Yeah, yeah, yeah. No, no, that's really good. I mean, that's why the podcast [Alex Andorra]: is for also, you know, going deep into the explanation that you cannot really
[Alex Andorra]: do in a paper, right? In the papers, usually it's a more technical audience [Alex Andorra]: first. [Charles Margossian]: Mm-hmm. [Alex Andorra]: And so, like, you're not going to really explain the difference between variational [Alex Andorra]: inference and MCMC in a paper, because the audience already is supposed
[Alex Andorra]: to know that, well, why would you do that in a paper? So. [Charles Margossian]: Yeah, and frankly, the discussion [Charles Margossian]: that ends up being in the paper is usually the discussion you've [Charles Margossian]: had with the reviewer, which is really a subset [Charles Margossian]: of everything you would like to discuss. So, yeah, it's nice to have [Charles Margossian]: a more free format [Alex Andorra]: Yeah.
[Charles Margossian]: to really think about those questions. And what I will say is that [Charles Margossian]: actually the one format that I really like, where all these [Charles Margossian]: questions come up, is, you know, teaching, is like workshops. [Alex Andorra]: Yeah. [Charles Margossian]: And so when you do a workshop on Bayesian modeling, you do a workshop on [Charles Margossian]: Stan, on PyMC or something like that, all the questions that I've brought
[Charles Margossian]: up, you know, how long should the sampling phase be? How long should [Charles Margossian]: the warmup phase be? Should I use this or that algorithm? Those are questions [Charles Margossian]: that a person sitting at a workshop, intro to Stan, intro to your [Charles Margossian]: favorite language, would ask. And so these end up being forums where [Charles Margossian]: we do discuss these fundamental questions. Because even though they're deep,
[Charles Margossian]: they're elementary nonetheless. And I mean that in the most positive sense [Charles Margossian]: of the word elementary possible. [Alex Andorra]: Yeah, you have completely unmasked the way I pick questions [Alex Andorra]: for the show. I'm just doing the same as you do. Yeah, I teach a lot of
[Alex Andorra]: workshops too. And these questions are basically the questions that a lot of beginners [Alex Andorra]: ask, where it's like they often have used variational inference because, for [Alex Andorra]: some reason, especially when they come from the classical machine learning [Alex Andorra]: world, then using variational inference makes more sense to them because [Alex Andorra]: it's closer from home, basically, [Charles Margossian]: Mm-hmm.
[Alex Andorra]: closer to home. And so yeah, like in the end, the natural question is: when [Alex Andorra]: should I use variational inference? Why should I use it? Why should I even bother [Alex Andorra]: with MCMC? Things like that. So that's where a lot of the questions I've been [Charles Margossian]: Yeah. [Alex Andorra]: asking come from. [Charles Margossian]: And these are totally open questions. I mean, correct me. Maybe you
[Charles Margossian]: have an answer that I missed. You know, we have heuristics and we [Charles Margossian]: have good pointers and we have good case studies. But so much of this [Charles Margossian]: remains unanswered. I think [Alex Andorra]: Yeah, yeah, [Charles Margossian]: not [Alex Andorra]: yeah. [Charles Margossian]: unapproachable. Let me be clear. Totally approachable. And you know, [Charles Margossian]: it's not crippling. We can totally still do Bayesian modeling.
[Charles Margossian]: But there are open questions that linger. [Alex Andorra]: Yeah. And then you have to strike the right balance between when you're answering [Alex Andorra]: such a question from a beginner, where you want to be like intellectually honest [Alex Andorra]: and saying, [Charles Margossian]: Mm-hmm.
[Alex Andorra]: yeah, like these are still open questions, but at the same time, you don't [Alex Andorra]: want them to walk away with the feeling that, well, this is completely [Alex Andorra]: undefined and I cannot really use these methods because [Alex Andorra]: there is no clear rule about what I would use, when, [Charles Margossian]: Mm-hmm.
[Alex Andorra]: and why. So, like, that's always an important balance to strike, and [Alex Andorra]: not always an easy one. [Charles Margossian]: Yeah, yeah, I absolutely relate to that. [Alex Andorra]: Yeah, so actually, before we dig into a bit more of the applications, I'm [Alex Andorra]: curious basically about the work you're doing right now, because I really love [Alex Andorra]: the fact that you're both working on MCMC and on approximate Bayesian inference.
[Alex Andorra]: So I'm wondering, what are the frontiers currently in that field of algorithms [Alex Andorra]: that you find particularly exciting? You already mentioned basically the [Alex Andorra]: progress of the hardware, which opens a lot of avenues. I'm curious if there [Alex Andorra]: are other things you have your eye on. [Charles Margossian]: Yeah, I think that...
[Charles Margossian]: So. Well, I think hardware is an important question. And I think it's [Charles Margossian]: a difficult question. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And I wanna talk, I think, you know, maybe I wanna say a little bit more, [Charles Margossian]: because I think [Alex Andorra]: Mm-hmm. [Charles Margossian]: frontier is a good word for it. And not only that, I think it's an ambiguous frontier. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: Because I don't think, you know, to me, it's not clear, you know, [Charles Margossian]: how much we're gonna get out of hardware for MCMC, for example, [Charles Margossian]: and, you know, what are gonna be the limits of that. And so what I'm [Charles Margossian]: excited about is that now we have GPUs, now we have several algorithms that
[Charles Margossian]: are GPU friendly. And I'll explain a little bit what that means, [Charles Margossian]: but essentially, at a very fundamental level, what it means is [Charles Margossian]: you can run a lot of Markov chains [Alex Andorra]: Mm-hmm. [Charles Margossian]: in a synchronized manner. And so you're not waiting for the slowest chain, [Charles Margossian]: basically. That's kind of the intuition here. And people are like,
[Charles Margossian]: well, this is great. If you run a lot of chains, we can really make [Charles Margossian]: the sampling phase much shorter. Like, again, let's say your target [Charles Margossian]: effective sample size is 2000. And actually, what effective sample [Charles Margossian]: size you should target, that's another interesting and very fundamental
[Charles Margossian]: question. And it turns out, a question where different people have [Charles Margossian]: different opinions [Alex Andorra]: Mm-hmm. [Charles Margossian]: within the Bayesian community. But let's say now, for the sake of argument, you're [Charles Margossian]: targeting an effective sample size of 2000. And I'm not saying this [Charles Margossian]: is what you should do or not do, but let's just say 2000. Then once
[Charles Margossian]: you run 2000 chains and you warm them up enough, right? Then really [Charles Margossian]: you only need one good sample per chain, [Alex Andorra]: Mm-hmm. [Charles Margossian]: which means your sampling phase can be one iteration. And then all your [Charles Margossian]: questions about how long should the warmup be, it's no longer about [Charles Margossian]: adapting the kernels so that I have a low autocorrelation during the
[Charles Margossian]: sampling phase. Actually, the autocorrelation only matters insofar [Charles Margossian]: as it reduces your bias. All right, and so [Alex Andorra]: Mm-hmm. [Charles Margossian]: I think that suddenly we've greatly simplified the question of diagnosing [Charles Margossian]: how much computation we need to throw at the algorithm. If we have [Charles Margossian]: a lot of chains, then suddenly it just becomes about bias decay.
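The arithmetic behind "one iteration per chain": the effective sample size divides the total number of draws by one plus twice the summed autocorrelations, and with a single post-warmup draw per chain there is no autocorrelation left to pay for. A rough sketch (deliberately simplified; real implementations in Stan or ArviZ are more careful about autocorrelation estimation):

```python
import numpy as np

def crude_ess(draws):
    """draws: array of shape (n_chains, n_iterations)."""
    m, n = draws.shape
    if n == 1:
        return float(m)   # one good draw per chain: ESS is just the number of chains
    c = draws - draws.mean(axis=1, keepdims=True)
    var, rho_sum = c.var(), 0.0
    for lag in range(1, n):
        rho = (c[:, :-lag] * c[:, lag:]).mean() / var   # pooled lag-k autocorrelation
        if rho < 0:
            break             # crude truncation at the first negative estimate
        rho_sum += rho
    return m * n / (1.0 + 2.0 * rho_sum)
```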
[Charles Margossian]: And the sampling problem becomes much closer to the optimization problem. [Charles Margossian]: Right. And then I can go into questions. So then there are interesting [Charles Margossian]: things where you actually run a lot of chains. You can pool information [Charles Margossian]: between the different chains to [Alex Andorra]: Mm-hmm. [Charles Margossian]: make the warmup phase shorter. And for some [Alex Andorra]: Yeah.
[Charles Margossian]: problems, like you have these multimodal problems, like it's exponentially [Charles Margossian]: shorter, like you're never going to get a good answer with a single chain [Charles Margossian]: that doesn't use cross-chain adaptation, right. And so here, I want to give [Charles Margossian]: a shout out to Marylou Gabrié and her work on MCMC that uses normalizing [Charles Margossian]: flows for adaptation. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: Right. That's a technique where they actually, you [Charles Margossian]: know, run 10,000 workers, or chains. And I remember when I was talking [Charles Margossian]: to her and colleagues, I was like, well, you know, once you're [Charles Margossian]: already running 10,000 chains to jump between the modes, you actually [Charles Margossian]: don't need a long sampling phase at all, [Alex Andorra]: Mm-hmm.
[Charles Margossian]: right? So that's one aspect of it. But then even for more ordinary problems, [Charles Margossian]: you can show that the time it takes you to reduce the bias goes down [Charles Margossian]: with the number of chains, because you are pooling information between [Charles Margossian]: the Markov chains. And this is not something that we really understand. [Alex Andorra]: Hmm. [Charles Margossian]: Um, and so, you know, where I see the frontier is that actually,
[Charles Margossian]: if I run a lot of chains, I also get more accurate diagnostics. My computations [Charles Margossian]: of R-hat and its generalizations become much more reliable. And I think the [Charles Margossian]: holy grail would be something where we don't have users specify the [Charles Margossian]: length of the warmup phase or the length of the sampling phase. We have [Charles Margossian]: them think about: what is your target ESS? That's the number of chains
[Charles Margossian]: that you run. And then we're going to automatically stop the warmup [Charles Margossian]: phase when we hit a certain target, right? And then suddenly, [Charles Margossian]: we're starting to do optimal computation for MCMC. And I think that to do [Charles Margossian]: optimal computation, at least in the way that I've described it, we [Charles Margossian]: need those GPUs. And at the same time, I think that there are a lot
[Charles Margossian]: of problems that are not gonna be amenable to GPUs, right? [Charles Margossian]: There's still this fundamental sequential component, which is the bias has [Charles Margossian]: to go down, the warmup needs to happen, right? At some point, adding [Charles Margossian]: more chains is not gonna help you. So the speedup you're gonna
[Charles Margossian]: get from this is not gonna be arbitrarily large, right? And then the benefit [Charles Margossian]: you're gonna get from variance reduction by running more chains, well, [Charles Margossian]: once you've reached your target... if your target ESS is 2000, maybe it doesn't help to [Charles Margossian]: run 10,000 chains, right? [Alex Andorra]: Yeah.
[Charles Margossian]: At least not immediately, right? So those are very clear, you know, questions [Charles Margossian]: that arise about, you know, ultimately, how far are we [Charles Margossian]: going to be able to go with this, you know, running many [Charles Margossian]: chains. But what I want to emphasize is that there's a, you know, there's [Charles Margossian]: a computational gain, which people think about: our algorithms are
[Charles Margossian]: faster. I also think there's a conceptual gain. And there is the opportunity [Charles Margossian]: to make MCMC more black box, [Alex Andorra]: Mm-hmm. [Charles Margossian]: where we're less reliant on some very fundamental tuning parameters. [Charles Margossian]: And therefore, you know, we kind of get into this regime where the computation
[Charles Margossian]: that we're using is just the right amount. And that's really, you know, [Charles Margossian]: if I have to say, like, I have another one or two years as a postdoc, [Charles Margossian]: very optimistically, that's the problem I'd like to solve. Right? Like, [Charles Margossian]: we remove the fundamental tuning parameters of MCMC. And there are other [Charles Margossian]: approaches towards that. I'm not going to pretend that this is the only
[Charles Margossian]: angle to tackle this problem. Let me be absolutely clear. But I think it's a [Charles Margossian]: very, very promising one. [Alex Andorra]: Yeah, yeah, [Charles Margossian]: Yeah, [Alex Andorra]: for sure. That's fascinating. [Charles Margossian]: What I will say is, since we're going to go into applications, one [Charles Margossian]: limitation [Alex Andorra]: Mm-hmm. [Charles Margossian]: with GPUs is they're very bad at solving ODEs. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: And a lot of the problems that I care about, you [Charles Margossian]: know, [Charles Margossian]: have likelihoods, and in order to evaluate those likelihoods, you [Charles Margossian]: need to solve an ODE. [Alex Andorra]: Mm-hmm. [Charles Margossian]: So I don't think we're at a stage where running a thousand chains [Charles Margossian]: on a GPU is going to solve all the problems in pharmacometrics that I
[Charles Margossian]: am, you know, deeply invested in, right? That said, you know, we have clusters [Charles Margossian]: of CPUs where maybe we can run 60 to 120 chains, and that can get us [Charles Margossian]: some of the way.
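To make concrete the kind of ODE sitting inside these likelihoods, here is a hedged sketch of the textbook one-compartment pharmacokinetic model with first-order absorption (an illustration only, not Torsten's API; all parameter values are invented):

```python
import numpy as np
from scipy.integrate import solve_ivp

def one_cpt_rhs(t, y, ka, ke):
    gut, central = y
    return [-ka * gut,                  # first-order absorption out of the gut
            ka * gut - ke * central]    # absorption in, elimination out

def concentration(dose, ka, ke, volume, times):
    sol = solve_ivp(one_cpt_rhs, (0.0, times[-1]), [dose, 0.0],
                    t_eval=times, args=(ka, ke), rtol=1e-8)
    return sol.y[1] / volume            # predicted plasma concentration

t = np.linspace(0.1, 24.0, 50)                                  # hours after dosing
c = concentration(dose=100.0, ka=1.2, ke=0.25, volume=10.0, times=t)
# A likelihood evaluation would compare c to observed concentrations,
# e.g. under lognormal measurement error: one ODE solve per evaluation,
# per subject, which is what makes each MCMC iteration expensive.
```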
[Alex Andorra]: Hmm, yeah. Yeah, I mean, when you were talking about that, I was thinking [Alex Andorra]: it'd be super cool to have, you know, in Stan or PyMC afterwards, like, [Alex Andorra]: at some point, a way to optimize the number of chains and samples that are [Alex Andorra]: taken, instead of having them fixed. Because right now it's like, okay, we're gonna [Alex Andorra]: run as many chains as we can with the GPU or CPU we have. So that's [Charles Margossian]: Mm-hmm.
[Alex Andorra]: already kind of automated. But the number of samples is not really automated, [Alex Andorra]: it's just a rule of thumb, where it's like, OK, we think in general this number [Alex Andorra]: of samples works well. In PyMC, it's 1,000 per chain after warming up. But [Alex Andorra]: what would be super
[Alex Andorra]: cool is like, OK, PyMC or Stan sees what resources are there, and then it's [Alex Andorra]: like, OK, so given the complexity of the posterior that we can see right now, [Alex Andorra]: we are going [Charles Margossian]: Yeah. [Alex Andorra]: to run that many chains and that many samples per chain. That'd be super [Alex Andorra]: cool, because also that would be something that would be easier for beginners, [Alex Andorra]: because they are [Charles Margossian]: Mm-hmm.
[Alex Andorra]: really, really sometimes very anxious about having a lot of samples, even [Alex Andorra]: though, you know, no, you don't need 10,000 samples per chain for that simple [Alex Andorra]: regression, but it's hard to explain. [Charles Margossian]: Yeah, and so what also ends up happening, very real [Charles Margossian]: experience, and I do it myself, is the simple models that probably
[Charles Margossian]: don't need that many iterations, I run them for a lot of iterations, because [Charles Margossian]: that's computationally cheap to do. And then the hard models, where [Charles Margossian]: each iteration is very expensive and the posterior distribution is [Charles Margossian]: much more complicated, and where I would need more [Charles Margossian]: iterations, I end up running fewer iterations. [Alex Andorra]: Yeah.
[Charles Margossian]: Right, and I think a lot of people will sympathize with that. That's [Charles Margossian]: my experience interacting with practitioners. I'll give you another example [Charles Margossian]: of things I've seen my colleagues in epidemiology do, which is that when [Charles Margossian]: their model starts getting really complicated, they start using somewhat [Charles Margossian]: less overdispersed initializations. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: Even though we know that that's what we need for the convergence diagnostics [Charles Margossian]: to be reliable. And I'll tell you a little bit more about that, because [Charles Margossian]: actually that's another question: what does it mean for an initialization [Charles Margossian]: to be overdispersed? And I have some answers to that that are not quite, [Charles Margossian]: you know, the regular answers. But that's a huge problem. Like,
[Charles Margossian]: when models get hard, and again, those ODE-based models, right? You cannot [Charles Margossian]: solve those ODEs if you throw insane parameter values at your model. [Charles Margossian]: And so now people have to make compromises. And I think that, [Charles Margossian]: you know, statisticians especially can be a little bit cavalier here. We tend [Charles Margossian]: to be conservative, [Alex Andorra]: Mm-hmm.
[Charles Margossian]: because maybe in a way, you know, that's the role of the statistician [Charles Margossian]: and the theorist: to be conservative and to play it safe.
[Charles Margossian]: But when the safe and conservative heuristics become impractical, [Charles Margossian]: we need to think harder about, okay, if we don't want to be [Charles Margossian]: too conservative but we still want to be safe, what do we need to do? [Charles Margossian]: And that's where [Alex Andorra]: Mm-hmm.
[Charles Margossian]: these questions of you know optimal warm-up length optimal sampling [Charles Margossian]: phase optimal over dispersed initialization uh really come in into play [Charles Margossian]: because if you're too conservative in your prescriptions You might [Charles Margossian]: make your editor happy, but actually practitioners are going to have [Charles Margossian]: a really hard time following those prescriptions.
[Alex Andorra]: Yeah, [Charles Margossian]: And then they do [Alex Andorra]: yeah. [Charles Margossian]: things. I'm not saying that they do silly things instead, right? But [Charles Margossian]: they do other things that are maybe less principled. [Alex Andorra]: Mm-hmm. Yeah. Yeah, fascinating. I could spend the whole episode on these topics.
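For readers who want to try the overdispersed-initialization idea discussed above, here is a minimal, hypothetical PyMC sketch (the model and the spread are invented for illustration) that starts each chain from a deliberately spread-out point:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
n_chains = 4

with pm.Model():
    mu = pm.Normal("mu", 0.0, 10.0)
    pm.Normal("y", mu=mu, sigma=1.0, observed=[0.1, -0.3, 0.8])

    # Overdispersed initializations: start every chain from a point drawn
    # from a distribution wider than where the posterior will concentrate,
    # so that R-hat can actually catch chains that fail to mix.
    inits = [{"mu": float(rng.normal(0.0, 10.0))} for _ in range(n_chains)]
    idata = pm.sample(chains=n_chains, initvals=inits)
```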
[Alex Andorra]: I really love it. Thanks a lot for diving so deep into these, Charles. But [Alex Andorra]: let's get a bit more practical here and talk about [Charles Margossian]: Uh huh. [Alex Andorra]: what you do, basically, with epidemiology and pharmacometrics. So first, [Alex Andorra]: can you define pharmacometrics for us? I personally don't know what that is, [Charles Margossian]: Yeah.
[Alex Andorra]: how that differs from epidemiology, and what Bayesian statistics brings to epidemiology and pharmacometrics. [Charles Margossian]: Yeah, so pharmacometrics, I mean, the way I would think about pharmacometrics is: pharmacometrics is to pharmacology what econometrics is to economics. [Alex Andorra]: Mm-hmm. [Charles Margossian]: There's some people who want to emphasize that they're using quantitative methods.
[Alex Andorra]: Mm-hmm. [Charles Margossian]: Now, the particular field of pharmacometrics that I've worked on is called [Charles Margossian]: PKPD modeling, so pharmacokinetics and pharmacodynamics. And essentially, what [Charles Margossian]: happens is, let's say there's a new treatment that's being developed. [Charles Margossian]: So it could be a new drug compound or it could be a number of things. [Charles Margossian]: We're usually interested in two questions. One, how does the drug get
[Charles Margossian]: absorbed and how does it diffuse in the patient's body? That's called the pharmacokinetics. And two, once the drug diffuses in the body, what does it do to the body? And so that includes targeting a disease, but also side effects. We're worried about how toxic the treatment might be. And what people are trying to do with these models is, based
[Charles Margossian]: on early data from clinical trials, so either on human beings or even on animals, they try to predict what is going to happen when we look at a broader population of individuals and when we kind of start changing the treatment. Some medical treatments can get very complicated. You have questions of, you know, I have a certain drug compound: how much do I administer? How often do I administer it? Is it better to take, you know, half a dose every half hour or only a single dose every hour? And so you have all these combinatorics of possibilities. If I really increase the dose, do I immediately get better effects? Or does it saturate? We often have these nonlinear
[Charles Margossian]: effects. These are called Michaelis-Menten models, where at some point, adding more dose just raises the cost and doesn't help the patient. [Alex Andorra]: Yeah. [Charles Margossian]: Right. And one way to do it would be: you run a ton of clinical trials and then you hope that something works out. That's extremely expensive and time consuming and, you know, well, maybe it's safer for the humans than for the animals, let me put it this way. Or, based on a bit of data and a really mechanistic model, where you actually bake in some of your expertise as a pharmacologist, as a biomedical engineer, you try to understand the underlying system so that when you're trying out different regimens, you can really [Alex Andorra]: Mm-hmm. [Charles Margossian]: predict what are going to be the doses that are more promising. So it's very useful to do exploratory analysis. And then it's also useful, once a drug hits the market: you actually collect very imperfect data, you actually have uncertainty, but you still want to keep learning about the dose and the dosing regimen, right, from
[Charles Margossian]: data from hospitals. And sometimes the data is rare: you have rare diseases, rare conditions, and that kind of thing. And so that is, if you will, the domain of pharmacometrics. And what's extremely interesting is that within that domain, you have, first of all, what's called mechanistic models. And what I mean by that is the parameters are interpretable. The relationships in the model [Alex Andorra]: Mm-hmm. [Charles Margossian]: are interpretable. Now, contrast that with a neural network, for example, where I might get good predictions, but then if I want to do out-of-sample predictions, which is actually really what we want to do in pharmacometrics, right? Examples of out-of-sample predictions would be a different dosing regimen, or: I've tested the drug on an adult population, what happens if I do it for children, right? That's the kind of pediatric question. We need to bake in the mechanistic understanding, which doesn't exclude a role for neural networks. I think these can also play a role, but I'll leave that for now. But then you have various degrees of detail in the mechanism. You have some equations that are very simple. So the two-compartment model with first-order absorption says the human body is three compartments. There's the gut, where the drug arrives when you orally administer it. There is the central compartment where
[Charles Margossian]: the drug diffuses quickly, so usually that includes the blood. And then maybe there's a peripheral compartment, so tissues where the drug diffuses more slowly. That's obviously not a very detailed description of the human body. And then you have models that are more complicated. So at Metrum, Matthew Riggs and some colleagues, they work on this bone mineral
[Charles Margossian]: density, where actually they had a lot of different parameters. And [Charles Margossian]: now instead of having a system of three differential equations, you [Charles Margossian]: have 30 differential equations. You have a ton of parameters, but you have [Charles Margossian]: a lot of information, prior information, about these parameters. [Alex Andorra]: Mm-hmm.
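As a concrete illustration of the simplest case described here, a hypothetical sketch of the two-compartment model with first-order absorption, with made-up parameter values and solved with SciPy rather than Stan or Torsten:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical PK parameters: absorption rate, clearance, inter-compartment
# flow, and the two volumes of distribution. Values are illustrative only.
ka, CL, Q, V1, V2 = 1.0, 5.0, 8.0, 20.0, 70.0

def two_cpt_rhs(t, y):
    gut, central, peripheral = y
    dgut = -ka * gut                                  # absorption from the gut
    dcentral = (ka * gut                              # drug arriving from the gut
                - (CL / V1) * central                 # elimination
                - (Q / V1) * central                  # distribution to tissue
                + (Q / V2) * peripheral)              # return from tissue
    dperipheral = (Q / V1) * central - (Q / V2) * peripheral
    return [dgut, dcentral, dperipheral]

# A 100 mg oral dose at t = 0 is deposited in the gut compartment.
sol = solve_ivp(two_cpt_rhs, t_span=(0.0, 24.0), y0=[100.0, 0.0, 0.0],
                rtol=1e-6, atol=1e-6)
print(sol.y[1, -1])  # amount left in the central compartment at 24 h
```

The right-hand side is exactly the gut/central/peripheral flow described above; the more detailed models replace these three equations with dozens.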
[Charles Margossian]: And then you have people who really throw differential equations with, you know, hundreds of states and, you know, thousands of interpretable parameters at the problem. And frankly, I don't think we have the Bayesian computation to fit those models, even though in theory they lend themselves extremely well to a Bayesian analysis, right? I think that realistically we're somewhere in the semi-mechanistic regime. So these are models that have some level of sophistication, but already we pay a dire price for this sophistication, which is that the computation can take hours or days. And so there's really this potential for better Bayesian computation to allow people to deploy better, more sophisticated models. The other big aspect of pharmacometrics is usually we have trials with data from different patients; there's heterogeneity [Alex Andorra]: Mm-hmm. [Charles Margossian]: between the patients or similarity [Alex Andorra]: Yeah. [Charles Margossian]: between patients, so that lends itself very well to hierarchical modeling.
[Alex Andorra]: Yeah, [Charles Margossian]: And [Alex Andorra]: for [Charles Margossian]: we [Alex Andorra]: sure. [Charles Margossian]: know hierarchical modeling is hard, right? It tends to create these posterior [Charles Margossian]: distributions with geometries that are very frustrating. And I spent a lot [Charles Margossian]: of time worrying about hierarchical models. I've done a lot of work [Charles Margossian]: on the... nested Laplace approximations. So [Alex Andorra]: Mm-hmm.
[Charles Margossian]: that's another nice example of an approximation. It's not variational inference; it has very complementary qualities. And what the nested Laplace approximation allows you to do is marginalize out the latent variables in a hierarchical model. And people often explain nested Laplace as: oh, it's great, it reduces the dimension of your problem, and then we can throw a quadrature at the remaining parameters. We've had models where we had thousands of hyperparameters; those were genetics problems where we were using horseshoe priors to select genes. So even once you marginalize out, you know, the latent variables, you still have a high-dimensional problem.
[Charles Margossian]: So we threw Hamiltonian Monte Carlo at it, but it still made a big difference, because we simplified the geometry of the posterior distribution by doing this marginalization of the hierarchical model. So I'm very excited about the prospect of having, you know, the nested Laplace approximation in Stan.
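The idea of marginalizing a latent variable with a Laplace approximation can be shown on a one-dimensional toy problem. This is only a caricature of the nested Laplace approximation under discussion, with a made-up Poisson model, not the Stan prototype:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

y = 7  # a single Poisson observation, for illustration

def neg_log_joint(theta, sigma):
    # log p(y | theta) + log p(theta | sigma), theta being the latent variable
    log_lik = y * theta - np.exp(theta) - gammaln(y + 1)
    log_prior = -0.5 * theta**2 / sigma**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return -(log_lik + log_prior)

def laplace_log_marginal(sigma):
    # Find the mode of the latent given the hyperparameter...
    res = minimize_scalar(neg_log_joint, args=(sigma,))
    theta_hat = res.x
    # ...and the curvature there (second derivative of the negative log joint).
    hess = np.exp(theta_hat) + 1.0 / sigma**2
    # Laplace approximation: log integral ~ value at mode + Gaussian correction.
    return -res.fun + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(hess)

# The latent theta is marginalized out; only the hyperparameter sigma remains,
# which is what lets you run MCMC (or quadrature) on a simpler geometry.
for sigma in (0.5, 1.0, 2.0):
    print(sigma, laplace_log_marginal(sigma))
```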
[Charles Margossian]: We have a prototype that works really well. We have some really cool automatic differentiation supporting it. [Alex Andorra]: Hmph. [Charles Margossian]: But, and you know, the problem is again, I wanna try this on ODE-based models. [Alex Andorra]: Yeah.
[Charles Margossian]: I don't get a good approximation. I don't get efficient automatic differentiation. I get something that's unstable. Now, the simple examples where I got it working, it actually gave surprisingly accurate results, right? But this is again an example where here's this awesome algorithm and statistical method, and it just gets frustrated by the nature of the problems we encounter in pharmacometrics. Even though, you know, these are hierarchical models and these methods are designed for hierarchical models. But again, if your likelihood is not a generalized linear model, yeah, suddenly those approximations become much more tricky. And that's why, you know, I think that we have to use MCMC for these models. [Alex Andorra]: Hmm.
[Alex Andorra]: Yeah, that's interesting. So that makes it a bit clearer, I think, for people: the practical cases where you would have to make that trade-off between the different methods. I think it's very important. [Charles Margossian]: And that said, I do want to say that there's some really cool approximations that people [Alex Andorra]: Mm-hmm.
[Charles Margossian]: do deploy in pharmacometrics. [Alex Andorra]: Mm-hmm. [Charles Margossian]: I recently read a... [Charles Margossian]: I'm an anonymous reviewer, so I'm not going to give too many details. [Alex Andorra]: Ha, sure.
[Charles Margossian]: But what I liked, you know, is there were questions of: what if we fix those parameters, or what if, you know, we draw these parameters from their priors, because they're removed enough from the data, [Alex Andorra]: Mm-hmm. [Charles Margossian]: maybe they're not influenced that much by the data, so the posterior stays close to the prior, but nonetheless we need that uncertainty for the interpretable quantities in the middle. Right. [Alex Andorra]: Hm-hm. [Charles Margossian]: And so people are coming up with these compromises. And of course, now we're again in the business of: we have these computational constraints, people come up with these approximations, either of the inference or even, you know, they say, well, let's use a simpler model. We know there's a more complicated model out there, but maybe we still get all the answers that we need with the simpler model, right? And so now we get again into this question of understanding: what are the simplifications that we can get away with? What are the ones where we pay a heavy price?
[Charles Margossian]: Can we actually quantify the price we're paying? Can we diagnose when [Charles Margossian]: the simplification is too dire or not? And as far as I can tell, [Charles Margossian]: the answer is, you know, we're not at the stage of diagnosing the problem, [Charles Margossian]: but at least now people are taking this problem seriously enough that
[Charles Margossian]: they're building these case studies. Now based on these case studies [Charles Margossian]: where, you know, we do try out the different models and we do fit [Charles Margossian]: the complicated methods and we're able to say something about the simpler [Charles Margossian]: methods because we did fit the complicated methods. So it's an academic [Charles Margossian]: exercise in a way. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: But at least it gives us, you know, that's how it starts. And that's how we're going to start developing the intuition and the heuristics to then deal with, you know, deploying the approximations and the simplifications in practice.
[Alex Andorra]: Yeah, that makes sense. And so actually, do you have an example of a research project where you applied Bayesian statistics in this field of pharmacometrics, and where Bayesian statistics really helped you uncover very important insights, thanks to all those methods that you've talked about?
[Charles Margossian]: Yeah. So I've never led a project in pharmacometrics. I've always collaborated [Charles Margossian]: with pharmacologists. [Alex Andorra]: Mm-hmm. [Charles Margossian]: So, you know, and it's true that my work has been more methodological,
[Charles Margossian]: has been more about developing Torsten itself. What I can talk about is some of the interactions I've had with pharmacometricians, and some of the work where I was maybe more a contributor than, say, the project lead. And I'll give you one example. And then I think that if we have time and we talk a little bit about epidemiology, I have a very [Alex Andorra]: Mm-hmm.
[Charles Margossian]: good example in epidemiology. [Alex Andorra]: Ah, okay, I was... [Charles Margossian]: But this is a, this was super cool actually. This is, this is a bit [Charles Margossian]: anecdotal. [Alex Andorra]: Mm-hmm. [Charles Margossian]: I don't know that there's a preprint on this now, but I was in Paris,
[Charles Margossian]: I was visiting INSERM, which does a lot of medical research. And I was visiting France Mentré's group, and they have fantastic people there. And I'm talking with Julie Bertrand, and she's interested in pharmacogenetics, right? So we have some genome sequencing or some gene tracking that's involved, and we're trying to see how a patient reacts to a treatment,
[Charles Margossian]: to a certain condition. And in that case, what ended up happening, [Charles Margossian]: so one of the things that was frustrating is they're trying to identify [Charles Margossian]: what are the genes that are meaningful, that seem to have a meaningful [Charles Margossian]: connection to the outcome of the treatments. And they couldn't get anything [Charles Margossian]: that is statistically significant in the traditional sense, right?
[Charles Margossian]: There wasn't one gene or one SNP that unambiguously stood out as: yes, this is a meaningful SNP and it should enter your analysis. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And what they had done was a Bayesian analysis where they had used a horseshoe prior. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And so what the horseshoe prior does is, it's a regularization tool that does a soft selection. And the way that soft selection manifests is there is a quantity that you can look at, and it will usually be bimodal. And one mode will indicate that the covariate that corresponds to this SNP is completely regressed to zero. So that's an indication that this is not a very useful explanatory variable. And then the second mode tells you: actually, this variable matters and it's not regressed to zero. And why there are two modes is because there is uncertainty. But the very cool thing is that even though there was no single SNP that stood out as the meaningful SNP, uh, you had two SNPs that came up, and
[Charles Margossian]: they both had a bimodal posterior, right? So what that means is you couldn't definitively say that, uh, either SNP mattered, right? But you could say: okay, with some probability, either of the SNPs can matter. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And here's where it gets very interesting: because you have a multivariate posterior distribution, you can go a bit further, and you realize that the two SNPs are anti-correlated, right? And so what that means is, when you have a lot of posterior mass at one mode for one SNP that says, this variable matters, don't regress it to zero, the other covariate would always get regressed to zero [Alex Andorra]: Hmm. [Charles Margossian]: and vice versa, right? [Alex Andorra]: Mm-hmm. [Charles Margossian]: So what the multivariate analysis tells you and what this proper treatment of uncertainty tells you is: yeah, you can't say that one SNP is statistically significant in the traditional sense, but now you have this more comprehensive treatment of uncertainty that tells you: but you know what? It has to be one of those two. You can't tell for
[Charles Margossian]: sure which one, but it has to be one of those two. And that's a nice example where we're really not just looking at the maximum likelihood estimator, or even the expectation value, or just a variance; we're really looking at essentially the quantiles, the extreme quantiles [Alex Andorra]: Mm-hmm. [Charles Margossian]: of the posterior distribution. We're looking at multiple variables and [Alex Andorra]: Mm-hmm. [Charles Margossian]: the uncertainty across those multiple variables, right? So I think that's a very neat example and, you know, a paper to look forward to when it comes out. Again, this was anecdotal, this was a conversation in the laboratory, [Alex Andorra]: Yeah. [Charles Margossian]: but I got very excited about that example.
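The shrinkage pattern described here can be reproduced on simulated data. A minimal, hypothetical PyMC sketch with two nearly collinear covariates standing in for the two SNPs (none of this is the actual study data or model):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)  # nearly collinear "SNPs"
X = np.column_stack([x1, x2, rng.normal(size=(n, 8))])
y_obs = 1.5 * x1 + rng.normal(size=n)

with pm.Model():
    tau = pm.HalfCauchy("tau", 1.0)                     # global shrinkage
    lam = pm.HalfCauchy("lam", 1.0, shape=X.shape[1])   # local (per-covariate) shrinkage
    beta = pm.Normal("beta", 0.0, tau * lam, shape=X.shape[1])
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y", mu=X @ beta, sigma=sigma, observed=y_obs)
    idata = pm.sample(target_accept=0.95)

# With near-collinear covariates, the posteriors of beta[0] and beta[1] can
# be bimodal and anti-correlated: when one escapes shrinkage, the other
# collapses toward zero, which is the pattern described in the conversation.
```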
[Alex Andorra]: Yeah, I mean, that sounds super exciting. Thanks a lot for sharing that. And yeah, when the paper is out, please get in touch, and that'd be a fun thing to talk about again. And actually, so we're running a bit long, but do you still have a few more minutes? Because I still have a few
[Alex Andorra]: questions for you. Or you want [Charles Margossian]: Yeah, [Alex Andorra]: to [Charles Margossian]: yeah, [Alex Andorra]: close out? [Charles Margossian]: I can stick around for a bit. Yeah. [Alex Andorra]: Okay, awesome. Yeah. So, uh, yeah, I'd like to, to talk to you about, uh, [Alex Andorra]: priors and then maybe, um, Torsten, uh, or an epidemiology example, uh, or both
[Alex Andorra]: of them. Um, so yeah, basically you have, uh, been working on Torsten, which is, if I understood correctly, uh, a pharmacometrics extension of Stan. So that was interesting to me, because the Bayesian field has been evolving really, really fast lately, especially with the new techniques, the new software tools, and you've definitely been part of that effort, Charles. So I'm wondering if there are any recent developments that have particularly excited you, especially in your field of pharmacometrics and epidemiology, and also in relationship with what you're doing with Torsten. [Charles Margossian]: Yeah, okay, so let me give one example, right? [Alex Andorra]: Mm-hmm, yeah. [Charles Margossian]: So not a comprehensive answer, but an illustrative answer. [Alex Andorra]: Yeah, it's great.
[Charles Margossian]: Um, and so one, you know, yet another fundamental question that would come up at a workshop, um, is: okay, I have this ODE integrator, right? I need to solve an ODE to evaluate my likelihood, and the ODE integrators come with certain tuning parameters. In [Alex Andorra]: Mm-hmm. [Charles Margossian]: particular, what is the precision with which you should solve your ODE? And that is going to have an impact on the quality of your inference and also the run time. Right, because if you solve the ODE with a very strict tolerance, it takes much longer to solve that ODE. And so again, that comes back to the question of how much computation should we throw at the problem. And for the longest time, I didn't
[Charles Margossian]: have a good answer, and maybe I still don't, to that question. And the way this manifested is, you know, at workshops when teaching the subject, but even when we were writing the Stan manual: we have a page on ODE integrators, we state these are the default values that we use, but we don't have any clear recommendations
[Charles Margossian]: on what is the precision you should use. And we kind of assume that [Charles Margossian]: the user knows their ODE well enough that they'll know which ODE integrator [Charles Margossian]: to pick and what tolerance to set. [Alex Andorra]: Mm-hmm. [Charles Margossian]: which is not realistic. But it's just we didn't have an answer to that [Charles Margossian]: question. And so recently, there was a paper. And so I know that.
[Charles Margossian]: Let me look up exactly who the... I know Aki is a co-author on it, Aki Vehtari. [Alex Andorra]: Mm-hmm. [Charles Margossian]: But I want to give a shout out to the lead author. [Alex Andorra]: Yeah, for sure. And we'll put that also in the show notes for this episode. [Charles Margossian]: Yeah, so there you go. So Juho Timonen is the lead author, and then there are a bunch of other people on it. So their paper is an importance sampling approach for Bayesian ODE models. And essentially, what they realized is, when you're using a numerical integrator to evaluate your likelihood, really you're not computing the true likelihood, you're computing an approximation
[Charles Margossian]: of this likelihood. And we have a lot of tools in statistics when we're [Charles Margossian]: not dealing with the exact likelihood, but some approximation to this [Charles Margossian]: likelihood, notably in the field of importance sampling. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And what they came up with is a way to use those tools that exist
[Charles Margossian]: in importance sampling to actually check whether the approximate likelihood, and therefore the tuning parameters of the ODE integrator, are precise enough [Alex Andorra]: Mm-hmm. [Charles Margossian]: or not, right? And so that gives you a diagnostic that you can use.
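A rough sketch of that kind of check, under stated assumptions: a toy decay ODE and made-up draws stand in for a real posterior, the loose-tolerance likelihood is reweighted against a strict-tolerance one, and arviz's PSIS gives a Pareto-k diagnostic. This illustrates the general idea, not the exact procedure of the paper:

```python
import numpy as np
import arviz as az
from scipy.integrate import solve_ivp
from scipy.stats import norm

# Invented observations of a one-compartment-style decaying concentration.
t_obs = np.array([1.0, 2.0, 4.0, 8.0])
y_obs = np.array([3.7, 2.7, 1.5, 0.5])

def solve_conc(k, rtol):
    # dy/dt = -k * y, with a dose of 5 units at t = 0
    sol = solve_ivp(lambda t, y: -k * y, (0.0, 8.0), [5.0],
                    t_eval=t_obs, rtol=rtol, atol=rtol * 1e-2)
    return sol.y[0]

def log_lik(k, rtol):
    return norm.logpdf(y_obs, loc=solve_conc(k, rtol), scale=0.5).sum()

# Pretend these are posterior draws of k obtained with the loose tolerance.
draws = np.abs(np.random.default_rng(0).normal(0.3, 0.05, size=500))

# Importance weights: strict-tolerance likelihood over the loose-tolerance one.
log_w = np.array([log_lik(k, 1e-10) - log_lik(k, 1e-2) for k in draws])
smoothed_log_w, pareto_k = az.psislw(log_w)
print("Pareto k:", float(pareto_k))  # roughly, low k suggests the loose solve was fine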
[Charles Margossian]: It's not a completely perfect diagnostic and I think I'm still trying [Charles Margossian]: to test that idea and play around with it and see how well it really [Charles Margossian]: works. So I wanna try it out on pharmacometrics problems. Um, right [Charles Margossian]: now I'm writing, uh, I've been tasked with writing, um, a tutorial on
[Charles Margossian]: Torsten. So we released part one a while ago. Now we're writing part two, and we promised that in part two we'd explain how to tune the ODE integrators. [Alex Andorra]: Mm-hmm. [Charles Margossian]: Um, except we only know so much about how to tune them. So I am trying [Alex Andorra]: Yeah. [Charles Margossian]: this method, um, on the ODEs, but it's just getting me, you know, thinking about: are the tolerances we're using too conservative? Are they too strict? We could actually get important
[Charles Margossian]: speed ups. I'm teaching a course this September in Leuven, the Advanced Summer School in Bayesian Methods, where I'm gonna have the students, on an epidemiology problem and on a pharmacokinetic problem, try out different tolerances and see the differences, and then build this diagnostic that's based on importance sampling to check whether the precision with
[Charles Margossian]: which they're solving their ODEs is making meaningful changes to [Charles Margossian]: the inference, right? [Alex Andorra]: Mm-hmm. [Charles Margossian]: And so again, I think that this is one tuning parameter where either [Charles Margossian]: we're using ODE solvers that are not precise enough or ODE solvers
[Charles Margossian]: that are too slow. We're not being optimal in our computation. And this [Charles Margossian]: is preventing us from either getting accurate answers or deploying [Charles Margossian]: models with the sophistication that we would want to. And so that's a development [Charles Margossian]: that I'm excited about. The one caveat that I will throw in this is [Charles Margossian]: that right now we're still thinking about it as a single tuning parameter.
[Charles Margossian]: Whereas what I've observed in practice is that the behavior of the ODE can change wildly depending on where you are in the parameter space. [Alex Andorra]: Mm-hmm. [Charles Margossian]: So for certain parameter values, you don't need to be super precise. And for other parameter values, you need a lot more precision, or you need a different type of integrator, because the ODE just behaves in a different way. [Alex Andorra]: Yeah.
[Charles Margossian]: Now very concretely, how does this manifest? So I don't wanna say it's [Charles Margossian]: hopeless, but what ends up happening is during the warmup phase, [Charles Margossian]: where we start the Markov chains far off in the parameter space, or we [Charles Margossian]: haven't tuned the MCMC sampler, so the Markov chains are still jumping [Charles Margossian]: left and right. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: During the warm-up phase, we are more vulnerable to those extreme parameter values. Whereas during the sampling phase, we get away with less strict ODE solvers. So I think that somehow, what I would like to do is two things. One, I would like to have a very automatic way of running this diagnostic in Torsten. But I also wanna give users control over what ODE solver they use at different stages of MCMC. I think that makes a crucial difference. Another way to approach this problem is coming up with good initializations. If I can start my MCMC near, you know, within the region of the parameter space where I might land under the stationary distribution. And I know that there, the parameter values are a bit less absurd. And so solving the ODEs is a bit more feasible computationally. If I can start there, then maybe I'm skipping the early regions that really frustrate my ODE integrator and my MCMC sampler. And the way this manifests in practice is, let's say you run eight chains. You have six of them that finish quickly, and then you have two of them that are lagging because they're stuck somewhere during the warmup phase. They're encountering this region where the parameter values are a little bit absurd, your ODE is super hard to solve, and that's eating up all your computation. And the truth [Alex Andorra]: Mm-hmm.
[Charles Margossian]: is, you know, at least the way we do things right now, we always wait for the slowest chain. By [Alex Andorra]: Mm-hmm. [Charles Margossian]: the way, we don't have to do that, right? So I'm excited about methods to come up with good initializations. And I think that this is a place where variational inference can be good, right? Especially [Alex Andorra]: Aha.
[Charles Margossian]: now, the Pathfinder variational inference, right, [Alex Andorra]: Mm-hmm. [Charles Margossian]: was originally designed to produce good initializations for MCMC. And so getting good initializations, that's a great example of: I need a good answer, but not a super precise answer. Right. Uh, and so, you know, if somehow Pathfinder can help me skip the regions that frustrate ODE integrators,
[Charles Margossian]: I think that's a big win for pharmacometrics. Again, that's something we have to test and really try out. But I will say, now I'm going to make another connection back to R-hat, which is: when we think about overdispersion, really, what does overdispersion mean? So we have shown, a bit formally, that what makes R-hat reliable, and we define reliability in a formal sense, is that the initial variance has to be large relative to the initial bias. Now, if you have an initialization, like you draw your sample from your prior, and that's reliable, [Alex Andorra]: Mm-hmm. [Charles Margossian]: and then you throw variational inference at it, right? If variational inference reduces your squared bias more than it reduces your variance, it turns out you preserve the property of reliability. So there's actually a sense that we might be able to get good initializations for MCMC without compromising, uh, the reliability of our diagnostics for convergence, right? And these are all the pieces that I think can come together and really help a great deal with, you know, pharmacometrics, but more generally, uh, with ODE-based models, and even more generally, models based on implicit functions. And I do include things that use nested Laplace approximations in that, because that's an optimization problem, that's an implicit function. It has the same kind of misbehaviors that an ODE has, but also the technologies that we develop for ODEs in a Bayesian context, better initializations, different tolerances, importance sampling corrections, would apply to the nested Laplace stuff. So those are the things that I'm excited about. Um, but it's going to take time.
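For reference, the R-hat diagnostic this whole discussion revolves around can be computed in a few lines. A small NumPy sketch on synthetic chains, using the plain split-R-hat formula rather than the newer rank-normalized version:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_rhat(chains):
    # chains: array of shape (n_chains, n_draws); split each chain in half.
    n = chains.shape[1] // 2
    halves = np.concatenate([chains[:, :n], chains[:, n:2 * n]], axis=0)
    m, n = halves.shape
    chain_means = halves.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = halves.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

# Overdispersed starts that have converged: R-hat close to 1...
good = rng.normal(0.0, 1.0, size=(4, 1000))
# ...versus chains stuck at different offsets: R-hat well above 1.
bad = good + np.array([[0.0], [0.0], [3.0], [3.0]])
print(split_rhat(good), split_rhat(bad))
```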
[Alex Andorra]: Hey, [Charles Margossian]: I just [Alex Andorra]: yeah. [Charles Margossian]: want to, I just want to be perfectly honest. It takes time, you know, between the paper and the software implementation and, uh, you know, a clear description in the manual that the users can follow. It takes a lot of time.
[Alex Andorra]: Yeah, yeah, for sure. I mean, that stuff is really at the frontier of the [Alex Andorra]: research. So it does make sense that it takes time to percolate [Charles Margossian]: Yeah, yeah. [Alex Andorra]: from how to find a solution to, okay, this is how we can implement it and [Alex Andorra]: reproduce it reliably. [Charles Margossian]: Yeah, exactly. And I'm most concerned about how speculative I am for
[Charles Margossian]: some of the ideas I'm sharing. But I think these are directions where [Charles Margossian]: it's worth pushing the research. That has the potential to have a [Charles Margossian]: really big impact. [Alex Andorra]: Yeah, yeah, for sure, for sure. And so I put in the show notes the Pathfinder [Alex Andorra]: paper, actually. That made me think that I should do an episode about the [Alex Andorra]: Pathfinder paper, basically, what that is about and what that means concretely.
[Charles Margossian]: Mm-hmm. [Alex Andorra]: So yeah, I'll try to do that [Charles Margossian]: Yeah. [Alex Andorra]: in [Charles Margossian]: And [Alex Andorra]: the [Charles Margossian]: if you haven't [Alex Andorra]: near [Charles Margossian]: reached [Alex Andorra]: future. [Charles Margossian]: out to Lu Zhang, you know, I mean, it, you know, first of all, the paper
[Charles Margossian]: is great. But actually sitting down and discussing this with her, you know, at the blackboard or whatever, like, she has so many ideas that have not appeared in the paper itself. I think, you know, if I can recommend a guest, if you haven't had her already, I don't know if you... But yeah, Lu [Alex Andorra]: No, [Charles Margossian]: Zhang, I think, [Alex Andorra]: I didn't.
[Charles Margossian]: would be fantastic to interview. [Alex Andorra]: Yeah, yeah, exactly. I was actually thinking about inviting her on the podcast [Alex Andorra]: to talk about Pathfinder because she's the lead author on the paper and also [Alex Andorra]: all the other authors have been on the show, Bob [Charles Margossian]: Ah [Alex Andorra]: Carpenter, [Charles Margossian]: cool. [Alex Andorra]: Andrew [Charles Margossian]: Cool.
[Alex Andorra]: Gelman, Aki Vehtari. Lu Zhang is missing, so definitely [Charles Margossian]: Mm-hmm. [Alex Andorra]: need to correct that. So yeah, in the near future, definitely try to have that episode; that would be a very interesting one. So maybe can you talk
[Alex Andorra]: Charles about... So I'll give you two avenues and you pick the one you prefer, because I don't want to take too much of your time. But basically, I'm curious either about hearing an example of your work in the epidemiology field, or you can talk a bit more in general about a very, very common question that students
[Alex Andorra]: always ask me, and it's about priors. So basically, how do you choose the [Alex Andorra]: prior, how do you approach the challenge of choosing appropriate prior distributions, [Alex Andorra]: and especially when you're dealing with complex models. So these are the two avenues [Alex Andorra]: I have in mind. And feel [Charles Margossian]: Okay. [Alex Andorra]: free to... Pick one or... Pick both.
[Charles Margossian]: Okay, so let me say that about priors, I think that... I want to get better answers, [Alex Andorra]: Yeah. [Charles Margossian]: a better answer to this question. And, you know, even for the next workshop that I'm giving, I don't have a module on priors that I find satisfactory. And so I'm still undergoing this journey. But what I will do is I'll talk about epidemiology, and I will talk about the prior that we use there. So that would be an example. And, you know, I like to think through examples, I like to think [Alex Andorra]: Mm-hmm.
[Charles Margossian]: through the anecdotal, as complementary to the formal, right? I'm a big fan of fairy tales and fables, simple stories that have good themes. [Alex Andorra]: Yeah. [Charles Margossian]: So what happened in epidemiology, and this will be a good example, is: well, the pandemic happened, and suddenly COVID, we're all going home. And we had colleagues in epidemiology, in particular Julien Riou. I actually met Julien Riou at StanCon. He was in Cambridge, he was a PhD student at the time, and he demonstrated his model, and he was using those ODE-based models. So now, instead of having a drug compound that flows between different parts of the body, what you have is: you separate the population, the human population, into these
[Charles Margossian]: different compartments. So susceptible individuals, infected individuals, [Charles Margossian]: recovered individuals, and then the individuals flow between the compartments. [Charles Margossian]: And there are a bit more layers. But basically the mathematical formalism [Charles Margossian]: that I had familiarized myself with in the context of pharmacometrics [Charles Margossian]: turned out to be very relevant to... certain classes of epidemiological
[Charles Margossian]: models. And essentially, Julien was working on an early model of COVID-19. [Charles Margossian]: They were trying to estimate the mortality rate. There was a lot of uncertainty [Charles Margossian]: in the data. There were a lot of [Charles Margossian]: things to correct for. Right. So, for example, early on, not everyone [Charles Margossian]: got tested and testing was not widely available. [Alex Andorra]: Mm-hmm.
[Charles Margossian]: Who got tested? The people with severe symptoms. So now if you think about [Charles Margossian]: you're trying to estimate the mortality rate [Alex Andorra]: Yeah. [Charles Margossian]: and according to your data, the people who catch the disease are [Charles Margossian]: only the ones who have severe symptoms, then it looks like a lot of [Charles Margossian]: people are dying from the disease. I mean, a lot of people were dying
[Charles Margossian]: from the disease, but that inflates the number. There's a bias [Alex Andorra]: Yeah, [Charles Margossian]: because you're [Alex Andorra]: yeah, [Charles Margossian]: only [Alex Andorra]: for sure. [Charles Margossian]: testing the people who are sick. You're not testing the people with mild symptoms or even with no symptoms. [Alex Andorra]: No, for sure. So, a clear sampling bias.
[Charles Margossian]: The other bias was some of the people who had caught the virus had [Charles Margossian]: not died yet. So you count them as living, but that doesn't mean they [Charles Margossian]: survived the disease. So that's a bias [Alex Andorra]: Yeah. [Charles Margossian]: in another direction. And that's an example where you actually have [Charles Margossian]: a somewhat mechanistic model. That's based on the epidemiology of
[Charles Margossian]: how a disease transmits and circulates in a population. Then on top of that, you need to build a measurement model to account for, you know, how the data is collected. Right. But at the end of the day, we were not able to draw any conclusions unless we understood what was the rate of people who were symptomatic or had severe symptoms.
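For concreteness, a hypothetical sketch of the compartmental flow plus measurement layer being described, using a plain SIR model in SciPy with invented parameters, not the published COVID-19 model:

```python
import numpy as np
from scipy.integrate import solve_ivp

beta, gamma = 0.4, 0.1   # illustrative transmission and recovery rates, per day
N = 1_000_000            # population size

def sir_rhs(t, y):
    S, I, R = y
    new_infections = beta * S * I / N   # susceptibles flowing to infected
    recoveries = gamma * I              # infected flowing to recovered
    return [-new_infections, new_infections - recoveries, recoveries]

sol = solve_ivp(sir_rhs, (0.0, 180.0), [N - 10, 10, 0],
                t_eval=np.arange(0, 181, 1.0), rtol=1e-8)

# The measurement model then sits on top of the mechanistic one: e.g. only
# a fraction of the infected are symptomatic and get tested, which is what
# biases a naive mortality estimate.
p_symptomatic = 0.4  # hypothetical; this is the parameter that needs a prior
observed_cases = p_symptomatic * sol.y[1]
```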
[Charles Margossian]: Right. And so there's one parameter in the model, which is the asymptomatic rate. And so now you have two options in a classical statistics framework. Either you fix the parameter, and then you're making a strong assumption, and maybe you try different values of the fixed
[Charles Margossian]: parameter. Right. Or you just say, well, I don't know this. So really [Charles Margossian]: I have no idea what the mortality rate is because maybe the entire [Charles Margossian]: population was infected or maybe only a small fraction was infected [Charles Margossian]: and everyone in that small fraction had the severe disease, right? And [Charles Margossian]: so we needed an in-between, between saying we don't know anything and saying
[Charles Margossian]: we know everything. And this is where I think Bayesian shines, which is: we can quantify uncertainty. We have more nuanced statements, through the language of probability, about what our state of knowledge is. And actually, what had happened is there were some instances where we had measured asymptomatic rates. The example was a cruise ship, the Diamond Princess, off the coast of Japan. So they had identified some cases of COVID-19. They put the cruise ship in quarantine and they tested everybody, regardless of whether they had symptoms or not. So now you have a small population, and based on that small population, you get an estimate of the [Alex Andorra]: Yeah. [Charles Margossian]: asymptomatic rate. And then you had one or two incidents where people do that. There were some cities where they had done some other experiments and measured some other data. And so then you bring all that information, and you use that information to construct priors and then to propagate uncertainty
[Charles Margossian]: into the model, right? The reason we're able to make predictions, and then to calculate things like the mortality rate with the appropriate uncertainty, is because we had a prior on the asymptomatic rate. And that's a very nice example. This is more an example of why it was crucial to have a prior rather than of how you should construct priors in general, right?
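A toy sketch of that prior-construction step: a small "tested everyone" cohort yields a Beta posterior on the asymptomatic rate, which then serves as the prior in the bigger transmission model. The counts here are illustrative, not the actual Diamond Princess data:

```python
from scipy import stats

n_infected = 100      # hypothetical: infected people, all of whom were tested
n_asymptomatic = 35   # hypothetical: of those, found to be asymptomatic

# With a flat Beta(1, 1) prior, the posterior from this small cohort is:
asym_rate = stats.beta(1 + n_asymptomatic, 1 + n_infected - n_asymptomatic)

# That posterior becomes the prior on the asymptomatic rate in the
# transmission model, propagating its uncertainty into the mortality rate.
lo, hi = asym_rate.ppf([0.025, 0.975])
print(f"asymptomatic rate: 95% interval ({lo:.2f}, {hi:.2f})")
```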
[Charles Margossian]: This is a bit of a specific case, [Alex Andorra]: Yep. [Charles Margossian]: but it's a very good example. And so I'll recommend two papers. One is the one by Julien Riou, which appeared in PLOS Medicine. It's with a lot of contributors, but it's an estimation of SARS-CoV-2 mortality during the early stages of an epidemic. And then the other paper, which goes a bit more into the lessons that we learned from this from a Bayesian workflow perspective, is Bayesian workflow for disease transmission modeling in Stan. And so here the first author was Léo Grinsztajn. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And then we had also Liza Semenova and Julien Riou as co-authors. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And this is a beautiful paper that really goes into the, you know: here's the prior we use, here's the first version of the model, here are the limitations with this model, here's how we diagnose the models.
[Charles Margossian]: Here are the next iterations. And we go through all the iterations. And I like to show the numbers: the model that we eventually used to model COVID-19 was the 15th iteration. And along the way we had a model that took three days to run. And we had to change the way we wrote the Stan model to improve the computation. So that had to do with how the ODE was parameterized and how the automatic differentiation was happening. We got it from three days to, sorry, not two days, two hours, a drastic speed up. And so that not only was good because the inference is faster, but that's what allowed us to then use more sophisticated versions of the model. And so [Alex Andorra]: Yeah. [Charles Margossian]: all that is described in that Bayesian workflow for disease transmission paper. [Alex Andorra]: Hmm, oh yeah, need [Charles Margossian]: So [Alex Andorra]: that.
[Charles Margossian]: that's my epidemiology fairy tale. And I don't mean [Alex Andorra]: Yeah. [Charles Margossian]: that in the sense that it had a happy ending or that everything was nice and glowing. I mean that this is a very nice story that touches upon a lot of interesting things. [Alex Andorra]: Yeah, yeah, for sure. Definitely going to link to this paper in the show
[Alex Andorra]: notes. Already have it on the Stan website. Actually, you've done a case [Alex Andorra]: study on this, so that's perfect with all the co-authors. So I'm going to put [Alex Andorra]: that right now in the show notes. And maybe last question before letting you [Alex Andorra]: go, Charles. I'm breaking records these days on the episodes. Like episode 89 [Alex Andorra]: is going out this week actually. And it's so far the longest episode. Uh, it's
[Alex Andorra]: about two hours. And right now we're like approaching this record, uh, Charles. [Alex Andorra]: So, uh, like well done. And at the same time. [Charles Margossian]: Okay. There's a fantastic podcast and I forget the name. It's a German [Charles Margossian]: podcast. [Alex Andorra]: Uh huh.
[Charles Margossian]: I can't believe I forget the name. I'll try and email you what it is, [Charles Margossian]: but basically they do interviews with these, you know, intellectuals [Charles Margossian]: and well established, you know, politicians and well, I'm not saying [Charles Margossian]: all politicians are intellectuals, but people have opinions and thoughts and there's [Charles Margossian]: no time limit to the interview. And they just go on and on and on and on and
[Charles Margossian]: on. And they just have so many topics to discuss. It's really, I'm [Charles Margossian]: not saying you should do that with me, but you could imagine like, [Charles Margossian]: you know, someone like, you know, Some of the other co-authors that [Charles Margossian]: have come up, I feel like you could talk six hours with them and you [Charles Margossian]: would just [Alex Andorra]: Oh [Charles Margossian]: pick [Alex Andorra]: yeah, [Charles Margossian]: their
[Alex Andorra]: for [Charles Margossian]: brain [Alex Andorra]: sure. [Charles Margossian]: and they would have so many insights to [Alex Andorra]: Yeah. [Charles Margossian]: share. [Alex Andorra]: No, for sure. [Charles Margossian]: So [Alex Andorra]: Like, I mean, most [Charles Margossian]: they [Alex Andorra]: of [Charles Margossian]: are. [Alex Andorra]: the time the limitation is the guest's own time. [Charles Margossian]: Right, right, [Alex Andorra]: Yeah.
[Charles Margossian]: right. But the [Alex Andorra]: Yeah, for [Charles Margossian]: idea, [Alex Andorra]: sure. [Charles Margossian]: yeah, I think it's really cool, this idea of: what if we didn't have a time limit on the interview? What would happen? And [Alex Andorra]: Exactly. [Charles Margossian]: eventually somebody gets hungry, and that's what happens, but. Ha ha ha.
[Alex Andorra]: Yeah, yeah, exactly. Yeah, but so basically, before asking you the last two questions, I'm also wondering, because you also teach, so that's super interesting to me. And I often hear that a lot of practitioners, and beginners in particular, might be hesitant or even intimidated to adopt Bayesian methods, because they perceive them as complex. So I don't necessarily agree with that underlying assumption, but without necessarily disputing it, what do you do in those cases? Basically, what resources or strategies do you recommend to those who want to learn and apply Bayesian techniques in their work, but might be intimidated or hesitant? [Charles Margossian]: Yeah, okay, so I think that usually when I teach, most of the time it's people who are already interested in Bayesian methods, [Alex Andorra]: Mm-hmm.
[Charles Margossian]: especially if it's a workshop on Stan; the people who do sign up, they already have adhered to Bayesian methodology. So I don't know that I've really had to convince that many people. Definitely when I was a TA at Columbia, [Alex Andorra]: Mm-hmm. [Charles Margossian]: I had those conversations a bit more with the [Alex Andorra]: Mm-hmm.
[Charles Margossian]: students, especially when I TA'd the PhD-level applied statistics course that everyone has to take, and [Alex Andorra]: Hm. [Charles Margossian]: not everyone is gonna do Bayesian, right? And so we have those conversations. But I think that ultimately, what it is, is not that Bayesian is complex, it's that analysis is complex. Analysis is difficult. And when you use less complicated methods, like maximum likelihood estimates or point estimates, and these are not simple, I really don't want to undermine the difficulty related to those methods and the fact that they are useful in a lot of applications. But anyway, those methods are simpler, and they will work for simple analyses. But if an analysis requires you to quantify uncertainty, to propagate uncertainty, to take some decisions with imperfect data, and then someone says, well, I don't wanna be Bayesian because that's complicated, but I still want to quantify uncertainty and I still want to propagate uncertainty and I still want to make predictions, well, suddenly with the classical methods, you really have to do a lot of gymnastics to get them to do what you're interested in. And so the classical methods also become complicated. And that's more because you're trying to use them to do a sophisticated analysis. And so I think that what matters is, you know, to really meet practitioners, uh, where they are: what
[Charles Margossian]: is the problem they're interested in? And, you know, is it a problem where the complexity of the analysis would be handled in a relatively straightforward way by a Bayesian analysis? And the answer could be yes, it could also be no. And if it's no, that's fine. You know, I think that again, we have to be loyal to the problem, more so than to the field or to the method. But that's how I would start this conversation. And so when, you know, when I'm sitting with pharmacometricians, and by the way, a lot of them don't use Bayesian, I think I can talk for 30 minutes about what we want to get out of a pharmacokinetic analysis without bringing up what kind of statistics we do. And once we've established the common goals, then we think about what are the methods that are going to get us there. That's how I would do it. Frankly, I haven't had that
[Charles Margossian]: much experience doing it. So take what I say with a grain of salt. [Charles Margossian]: Yeah, but I really think it's, Bayesian is complicated because it confronts [Charles Margossian]: you with the complexity of data analysis. Right? It doesn't [Alex Andorra]: Yeah. [Charles Margossian]: introduce the complexity. Let me put it this way. [Alex Andorra]: Yeah, yeah. No, completely agree. And that usually is... I usually give an answer
[Alex Andorra]: along those lines. So [Charles Margossian]: Mm-hmm. [Alex Andorra]: that's interesting, to see that we have converged, even though we didn't, you know, talk about that before the show, so that people don't start having conspiracy theories about Charles and me pushing the same message. [Charles Margossian]: This is not rehearsed.
[Alex Andorra]: Exactly. Awesome. Well, is there a topic I didn't ask you about that you'd like [Alex Andorra]: to mention before we close up the show? [Charles Margossian]: Um, no. Let me keep [Alex Andorra]: Perfect. [Charles Margossian]: it simple. [Alex Andorra]: Yeah. I mean, we've talked about a lot [Charles Margossian]: There [Alex Andorra]: of [Charles Margossian]: are [Alex Andorra]: things.
[Charles Margossian]: a lot of topics that I think we could have very interesting conversations about, but I also think that we're starting to hit the time constraints. [Alex Andorra]: Awesome. Well, let's close the show then. Me too, I could keep asking you a lot of things, but let's do another episode another day. That'd be a fun thing. Or if one day you have a model you've been working on and that you think
[Alex Andorra]: would be beneficial for listeners to go through. I have this new format now, [Alex Andorra]: the modeling webinars, where basically you come on the show, [Alex Andorra]: share your screen, show us your code, and do some live coding about [Alex Andorra]: a project that you've been working on. And so if one day you have [Alex Andorra]: something like that, feel free to get in touch and we'll get that organized,
[Alex Andorra]: because it's a really cool new format and I really love it. In August, [Alex Andorra]: we had Justin Bois showcasing a Bayesian modeling workflow in the [Alex Andorra]: biostatistics world. So even if you're not a biostatistician, it's a really [Alex Andorra]: useful one because, as we were saying, Bayes is basically a set of methods: [Alex Andorra]: even if the field is not yours, the methods, and the workflow, are definitely transferable. [Alex Andorra]: So that was a very fun one. And then we've got one coming [Alex Andorra]: up in September, in theory, with Benjamin Vincent, and we're going to dive into [Alex Andorra]: the new do operator that we have in PyMC and how to use it. So that's [Alex Andorra]: going to be a very fun one too. [Charles Margossian]: Yeah, I love the idea of that format, and I'm definitely going to [Charles Margossian]: check out the two webinars.
[Alex Andorra]: Yeah, yeah. [Charles Margossian]: The one that's already there and the one that's coming up. Actually, the [Charles Margossian]: two topics sound extremely interesting. [Alex Andorra]: Yeah, I'll send that to you. And yeah, for sure, that's something I've [Alex Andorra]: been starting to do. So if you have one such analysis one day, feel [Alex Andorra]: free to reach out; that'd be a fun one. [Charles Margossian]: Absolutely.
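For readers curious what that do operator looks like in practice, here is a minimal sketch, assuming a recent PyMC 5.x where pm.do is available; the model structure and variable names are made up for illustration and are not taken from the episode:

```python
import pymc as pm

# Toy causal graph with a confounder: z -> x, z -> y, x -> y
# (hypothetical names, for illustration only)
with pm.Model() as model:
    z = pm.Normal("z", mu=0, sigma=1)           # confounder
    x = pm.Normal("x", mu=z, sigma=1)           # treatment, influenced by z
    y = pm.Normal("y", mu=2 * x + z, sigma=1)   # outcome, influenced by x and z

# pm.do returns a new model in which x is pinned to a constant,
# severing the z -> x edge (Pearl's do-operator).
intervened = pm.do(model, {"x": 1.5})

with intervened:
    # Draws from the interventional distribution p(y | do(x = 1.5)),
    # which differs here from the observational conditional p(y | x = 1.5)
    # because the confounder z is no longer informed by x.
    idata = pm.sample_prior_predictive(draws=2000)

# E[y | do(x = 1.5)] = 2 * 1.5 + E[z] = 3, up to Monte Carlo error
print(idata.prior["y"].mean().item())
```

This is only meant to convey the flavor of the API mentioned here; the webinar itself covers the actual interface and use cases in depth.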
[Alex Andorra]: So, before letting you go, of course, I'm going to ask you the last two questions [Alex Andorra]: I ask every guest at the end of the show. One: if you had unlimited time [Alex Andorra]: and resources, which problem would you try to solve? [Charles Margossian]: Oh yeah, I thought about this, and [Alex Andorra]: Hehe. [Charles Margossian]: I feel like so much of my work is about working under computational [Charles Margossian]: constraints, [Alex Andorra]: Mm-hmm.
[Charles Margossian]: you know, and time constraints and resource constraints, and suddenly you relax [Charles Margossian]: all of this. And... [Charles Margossian]: I think, for me, the really big questions are, [Charles Margossian]: let's say, at a curiosity level, right? There are [Charles Margossian]: utilitarian questions and philanthropic questions that [Charles Margossian]: I might prioritize. But my thinking right now is, I really love the
[Charles Margossian]: problems I worked on in astronomy. I am mind-blown by some of the stuff [Charles Margossian]: my colleagues do in cosmology, where they're trying to understand [Charles Margossian]: the early universe. They have models with six parameters that apparently [Charles Margossian]: explain the entire structure of the universe. I'd like to understand that [Charles Margossian]: a little bit better. I love the work I did on exoplanets. I think
[Charles Margossian]: thinking about... yeah, is there life on other planets? How does it [Charles Margossian]: manifest? My advisor when I was an undergrad, she... [Charles Margossian]: She put it well: she said astronomy helps us think [Charles Margossian]: about our position in the universe, our place in the universe. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And, you know, if I have unlimited time and resources in this completely
[Charles Margossian]: ideal scenario, I think I would gravitate towards these really, really [Charles Margossian]: big questions, which, by the way, I can still get involved in [Alex Andorra]: Heh heh. [Charles Margossian]: right now as a researcher at Flatiron, but it's true that there's more [Charles Margossian]: competition for that. Um, and there's a theme that I, um,
[Charles Margossian]: yeah, okay, if I had thought about this more, I would have just started [Charles Margossian]: there. But there's a theme that I really like. This goes back to cosmology, [Charles Margossian]: to astronomy, which is this idea of a theory of everything, [Charles Margossian]: a very fundamental model. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And, you know, maybe it all reduces to one equation or one set of equations.
[Charles Margossian]: And what I really wonder is, if we did have that model and [Charles Margossian]: that theory, how much insight would we actually get from it? Because [Charles Margossian]: you can have a simple system with simple particles and simple interaction [Charles Margossian]: rules, and that doesn't mean you understand the emergent behavior of
[Charles Margossian]: the system. So if I had unlimited resources, I could actually figure [Charles Margossian]: out what that equation is, then run the simulations with infinite [Charles Margossian]: computation, then study the behavior, and actually, at a very conceptual [Charles Margossian]: level, understand how much insight we get out of this.
[Alex Andorra]: Yeah. [Charles Margossian]: And the example I like to give is: the rules of chess are simple, [Charles Margossian]: but just because you understand the rules of chess doesn't mean you understand [Charles Margossian]: chess. [Alex Andorra]: Mm-hmm. [Charles Margossian]: And I think that there is a tension between the reductionist [Charles Margossian]: view of physics and the state of the world that we live in, that is
[Charles Margossian]: kind of related to that. And sometimes the simplifications that [Charles Margossian]: we use, involving probability theory and statistics, are still [Charles Margossian]: useful, not just because we don't have the computation to run all the [Charles Margossian]: simulations based on fundamental equations, but because they actually are more [Charles Margossian]: intelligible to us. And yeah, if we only had the time and [Charles Margossian]: resources, you could really explore that. Once [Alex Andorra]: Hmm. [Charles Margossian]: you can run all the simulations you want, what are the actual models that teach [Charles Margossian]: you something and that give you insight? [Alex Andorra]: Mm-hmm. [Charles Margossian]: Let's try that. [Alex Andorra]: Yeah.
[Alex Andorra]: Yeah, that sounds like a fun one for sure. And second question: if you could [Alex Andorra]: have dinner with any great scientific mind, dead, alive or fictional, who would it [Alex Andorra]: be? [Charles Margossian]: I read the question yesterday and I really have... [Charles Margossian]: I, um... [Charles Margossian]: It's not a definitive answer, but let's put it as an answer.
[Charles Margossian]: But I like the idea of talking with one of the founders [Charles Margossian]: of hypothesis testing. So, you know, Neyman, Pearson, Fisher. And, [Charles Margossian]: by the way, just because I'm having dinner with them, [Charles Margossian]: that doesn't mean I condone everything they've done, their character, [Charles Margossian]: and their behavior; just to put that disclaimer out there. But I think that I'd [Charles Margossian]: be interested to ask them: what were you thinking when you came up with [Charles Margossian]: those ideas? And what do you make of how this method that you've [Charles Margossian]: developed has been used and misused [Alex Andorra]: Mm-hmm. [Charles Margossian]: these days? And [Charles Margossian]: also, I don't know how much Bayesian methods were on [Charles Margossian]: their radar, but what would they think of Bayesian methods now that
[Charles Margossian]: we have the computational power that's available? So, kind of, someone [Charles Margossian]: from the early 20th or late 19th century, one of those statisticians [Charles Margossian]: who are described as having done something foundational, but who also [Charles Margossian]: worked on a branch of statistics that [Charles Margossian]: historically has been opposed to the branch of statistics that I
[Charles Margossian]: work on. And I think that could be, hopefully, a pleasant, and certainly [Charles Margossian]: an engaging, conversation for dinner. [Alex Andorra]: Yeah, a very interesting answer and very original. You're the first one to [Alex Andorra]: answer that. And I had not even thought about that, but that definitely makes [Alex Andorra]: sense. [Charles Margossian]: Again, it's an answer. [Alex Andorra]: Very interesting.
[Charles Margossian]: If having dinner with Laplace is on the table, I'm not saying I wouldn't [Charles Margossian]: take that. [Alex Andorra]: Yeah, for sure, for sure. But yeah, that's definitely super interesting. [Alex Andorra]: And I do remember, I mean, in my recollection, which of course is [Alex Andorra]: as fuzzy as any Homo sapiens memory, we talked about that in episode 51 [Alex Andorra]: with Aubrey Clayton. We talked about his book, Bernoulli's Fallacy: [Alex Andorra]: Statistical Illogic and the Crisis of Modern Science. I'll put that in the show notes. And I seem [Alex Andorra]: to remember from his book that the founders of hypothesis testing definitely [Alex Andorra]: had an active role in sidelining Bayesian statistics at the time, that [Alex Andorra]: they were definitely very, very motivated in that regard too, but I don't [Alex Andorra]: remember why off the top of my head. But yeah. So yeah, I'll put that
[Alex Andorra]: into the show notes. I should really relisten to this episode myself. [Alex Andorra]: It was a very interesting one. So I recommend it to people who want [Alex Andorra]: to get started with the podcast; actually, I think it's a very good first one [Alex Andorra]: to understand a bit more, basically, why you would like to think about [Alex Andorra]: the foundations of hypothesis testing and [Charles Margossian]: Mm-hmm.
[Alex Andorra]: why it's interesting to think about other frameworks and why the Bayesian [Alex Andorra]: framework is an interesting one. [Charles Margossian]: Oh yeah, that sounds great. So maybe I could have dinner with... remind [Charles Margossian]: me the name of your guest? [Alex Andorra]: Aubrey Clayton. [Charles Margossian]: Yeah. So, I mean, add that to my answer.
[Alex Andorra]: Awesome. Well, thanks a lot, Charles, for taking the time. As usual, I put [Alex Andorra]: resources and a link to your website in the show notes for those who want [Alex Andorra]: to dig deeper. Thank you again, Charles, for taking the time and being on this [Alex Andorra]: show. [Charles Margossian]: Yeah, thanks Alex for the invitation, and also for the service to the community that
[Charles Margossian]: I think this podcast is. It's really a fantastic resource. So thank [Charles Margossian]: you very much.