Joining Bayesian submodels with Markov melding

01:09

Well, welcome, and thanks to Robert for standing next to me. So welcome today to Robert Goudie. He's going to tell us about some of his work on Markov melding. This is of interest to me personally, because model misspecification is a thing that we have to deal with in parametric inference,

01:38

and he's one of the people who's dealing with it. Robert was actually a student here in Oxford a long time ago, and he's gone on to great things at the Biostatistics Unit in Cambridge. So thanks very much, Robert. Do we need to worry about that? Hopefully someone somewhere will shout out a message if you can't hear me remotely. So thanks very much to Jeff for inviting me. It's nice to be here to present this work.

02:28

So this work is... It doesn't seem to be working. It was working a second ago. OK, it's working now. So this work is motivated by problems where you have more than one source of data. You might have multiple studies, each of which of course has separate pieces of data associated with it, which we'll call Y1 up to Y4. And each of these studies, we think, is complementary in some way.

03:04

They might be from different populations, or they might tell us about different aspects of, in my line of work, some kind of disease, human disease usually. So ideally we'd like to piece them together to hopefully give us better inference. To piece them together, we need to assume that there's some degree of commonality between these different bits of data.

03:26

And for most of this talk, we're going to assume that the commonality is that we've got some parameter phi here, which is shared by all of the models for each of these different studies. By combining them together, hopefully that will give us more precise estimates, just from the fact that we've got a larger sample size. It will hopefully also give us a truer representation of the uncertainty that's inherent in the situation we're looking at.

03:54

So in some situations, the variance might be larger, but that will hopefully be a more accurate reflection of what's going on. And finally, it will hopefully minimise the risk of selection-type biases. If we just use one of these datasets, and maybe that only considers a particular cohort of patients, then that might be biasing our results according to the usual selection-type problem.

04:22

So, to make it a little bit more concrete: in my line of work, biostatistics, people often come along with multiple different datasets. There might be a clinical trial where they've investigated the effect of some specific intervention on some disease. There might be a cohort study that might have run for 20 or 30 years and will be of a sort of moderate size.

04:50

The clinical trial, by contrast, will tend to be relatively small and have very strong inclusion criteria, so it won't be terribly representative of the whole population. We may also have health record data from GPs and hospitals, which will be larger but potentially have different types of biases from a more carefully controlled clinical trial or cohort study.

05:10

And finally, we might also have genomic or other kinds of high-throughput data, and ideally we would like to combine these all together to provide a complete understanding of whichever disease we're interested in. But in practice, doing that, at least as a Bayesian and probably as a classical statistician as well, will be very hard, because all of these different types of data have their own unique complexities.

05:36

And so identifying a suitable model that describes all four of them simultaneously is well-nigh impossible. And even if you could, fitting such an enormous model, at least with the computational tools that are available at the minute, will be pretty difficult.

05:54

And even if you could do that, you've got such a large amount of data and such a complicated model that working out whether the posterior distribution from that enormous model is actually a good fit to your data, and really captures what's going on, will be very hard. Never mind, if you get results that are counterintuitive in some way, trying to figure out where the problem is arising from when you've got so many bits of data and parameters.

06:24

So what we're proposing in this work is the idea that you might not try and do this all at once. You might instead have smaller submodels for each of these different types of data. So you might have a model p1 that describes the clinical trial data, p2 for the cohort study, and so on. And in fact, this is often how things are, in that someone will already have analysed the clinical trial and will have developed a model.

06:53

And similarly, for all of the other types of data, there will be existing models. No one's going to start trying to model all of these data together immediately from nothing. And so, when someone has done all of that hard work in creating p1 up to p4, it seems pretty wasteful to throw all of that away and start from scratch again. So what we'd like to do is find some way to take these existing models and integrate them into an overall analysis,

07:17

ideally as generically as possible. The existing ways that people do this are as follows. One way that is extremely widely used is that you fit one of these models, you get your posterior distribution, and then you take a point estimate of the common quantity, which in this case is phi. Then you plug that point estimate from the posterior distribution into a second model, and you carry on and do that each time.

07:51

This, I think, is everywhere. It's actually quite hard to find examples of it, because everyone hides it in the supplementary materials of their paper. But I think this happens absolutely everywhere. Of course, statistically, this probably isn't ideal, because you're taking an uncertain quantity and replacing it with a point estimate. So, at least in general, this will underestimate uncertainty.

08:12

Of course, in specific situations, if the uncertainty is small relative to the problem you're looking at, it may be quite reasonable. A second approach is to plug an approximation to the posterior into the second model that you're fitting. So rather than just taking a single fixed point estimate, you take some distribution that looks similar to the posterior distribution and plug that in.

08:37

We'll come back to that in a minute. And finally, what we are trying to do is integrate these models into a single joint Bayesian model, which of course then brings along all of the advantages of Bayesian inference. But it can be quite tricky to do that in practice, and that's the problem that we're trying to solve.

08:57

So here is a slightly more concrete toy example, based on the more complicated example I'll show later, that will hopefully make things a little bit clearer. This is where all of this work really came out of, which was in trying to model flu. One of the quantities of interest in that world is the probability of being hospitalised, given that you have influenza-like illness.

09:24

And let's imagine that we observed 100 people in hospital with influenza-like illness, out of a thousand people in the population who had influenza-like illness. So one model for that would be a very simple beta-binomial model. And that's obviously very simple to fit.
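As a minimal sketch of the toy model just described (100 hospitalised out of 1,000 with influenza-like illness), the conjugate update can be written in a few lines. The flat Beta(1, 1) prior and the function names are assumptions for illustration; the talk does not say which beta prior was used.

```python
import math

def beta_binomial_posterior(y, n, a=1.0, b=1.0):
    """Beta(a, b) prior on p plus y successes out of n gives a Beta posterior."""
    return a + y, b + (n - y)

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(var)

# Assumed flat Beta(1, 1) prior; data are the numbers quoted in the talk.
post_a, post_b = beta_binomial_posterior(y=100, n=1000)
post_mean, post_sd = beta_mean_sd(post_a, post_b)
print(f"posterior: Beta({post_a:.0f}, {post_b:.0f}), mean={post_mean:.4f}, sd={post_sd:.4f}")
```

Once n itself is treated as uncertain, as in the next step of the example, this closed form no longer applies directly and some form of sampling is needed.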

09:43

But of course, in practice, we never really know how many people there are in the whole population who have influenza-like illness, because you can't go out and ask every single person in the population. So there's some degree of uncertainty about that. We could represent that with a prior on little n, which might be, say, a Poisson distribution, for example.

10:06

But then imagine that we have new data from a similar geographic area, for example, where we've managed to go out and count how many people in the population have influenza-like illness. So maybe it's about 40 out of 500 people. That gives us another beta-binomial model.

10:28

And if we assume that the populations are similar, then we can take the posterior distribution of this q parameter, which gives us an estimate of the proportion of people in the population who have influenza-like illness. Then, if we know the total number of people in the original population, we can plug that q into another binomial model.

10:53

And that gives us a second estimate of n, the number of people in the population with influenza-like illness. But now we have not just this one model for n; we also have that original prior that we had for n. So we have to try and resolve the fact that we've got two models for the same quantity. Of course, one option would just be to throw away this Poisson prior that we had for n and take this binomial model

11:25

as the truth. But of course, this is coming from a similar area, and maybe we've done a lot of work to elicit, for example, that Poisson prior. So it would be nice if we could not just throw away one of these priors or models, but actually integrate them in some way. Things get even more complicated if you have a stratification of your original population. Say you've got two age groups, for example, so we've added an i subscript to all of the model.

12:01

So rather than having a direct prior for the total number of people in this population, we have a model for n1 and n2, where the sum of n1 and n2 equals n. So now we have a binomial model for n that comes from this similar area, and then we have a second model that's a deterministic function of two other parameters. This, obviously, is a very trivial deterministic function here.

12:41

And it turns out that these two issues are the ones that make this problem difficult in general: you have two separate models or priors for the same quantity, and sometimes one of those models may be related to the original parameters by a deterministic function. So graphically, this is what the example looks like. We started with a beta-binomial model, where the squares represent data and the double circles represent parameters.

13:11

We then acknowledged that we're uncertain about n, so that becomes a parameter. We then added a second model from a similar area, so we now have two models for this quantity. And then finally, we split this n into n1 and n2, and that meant that the repeated quantity was the output of a non-invertible deterministic function.

13:34

And in this model here. OK, so to return to what the aims of this are more generally: we imagine that we've got several models, each of which involves some common quantity phi. What we'd like to do is create a generic way of joining all of those models together into a single joint model. In some cases, this will be very easy.

13:57

It turns out that the cases where it's hard are, firstly, where you implicitly have two different priors for the same quantity, which doesn't make sense in Bayesian inference, and secondly, where you need to handle these non-invertible deterministic transformations. We'd also like to do this in what we call a sort of staged or modular way, so that you don't have to think about the whole model at once, but can gradually build up inference to the full model.

14:27

And thirdly, we'd also like to understand the reverse operation, which is the opposite: you take an existing large model and want to split it up. That can be useful for understanding what's going on, or potentially it could be a useful inference method.

14:46

So here is the notation I'm going to use throughout. We imagine we have capital M separate models, or submodels, each of which involves this common parameter phi, some submodel-specific parameters psi_m, and some submodel-specific data Y_m. What we'd like to do is create a generic method for integrating these M different models into a single joint model that involves all of the quantities together. That's what we're aiming to do.

15:21

It turns out that a special case of this had been considered in the 90s, in a slightly different context, by Philip Dawid and Steffen Lauritzen. They considered the special case where the prior marginal distributions for the common quantity are all equal, so each of these submodels has the same prior for the common quantity. If you have that, then it's relatively obvious what a sensible model should be.

15:51

So we have these capital M separate submodels. We can then condition on phi, which gives us the submodel-specific conditional here and then the prior marginal. Once you've done that, it's fairly obvious that the sensible model would be that we have this common prior for phi, and then we take the product of the submodel-specific conditionals. This is called a Markov combination by Dawid and Lauritzen.

16:29

And it's also been examined a bit further by Massa and Lauritzen more recently. There's a message, but I don't know how to see whether it's important. Oh, OK. I just need to talk louder. Is that better?

16:51

OK, so the properties of the Markov combination model are that the submodel-specific parameters and data are conditionally independent given the common quantity, and that the submodel-specific conditionals given the common quantity are preserved between the Markov combination model and the original models. And indeed so are the submodel marginals: the joint distribution of each of the submodels is preserved in the Markov combination.
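For reference, the Markov combination described above can be written out as follows. This is a sketch in the notation introduced earlier in the talk (phi, psi_m, Y_m); the subscript "comb" is a label chosen here for illustration.

```latex
% Assumes every submodel has the same prior marginal: p_m(\phi) = p(\phi), m = 1, ..., M.
\[
p_{\mathrm{comb}}(\phi, \psi_1, \dots, \psi_M, Y_1, \dots, Y_M)
  = p(\phi) \prod_{m=1}^{M} p_m(\psi_m, Y_m \mid \phi)
  = \frac{\prod_{m=1}^{M} p_m(\phi, \psi_m, Y_m)}{p(\phi)^{M-1}} .
\]
```

The product form makes the conditional independence given phi explicit, and integrating out every psi_k and Y_k with k different from m recovers the original submodel p_m(phi, psi_m, Y_m).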

17:27

What we were looking at was a more general problem, where these marginals may be inconsistent. Hopefully that doesn't seem too contrived, given the examples I showed you at the beginning. So we need to resolve the fact that we've got these inconsistent marginals from each of the submodels. What we're going to choose to do is pool these together.

17:49

We have some function g that takes as input each of these prior marginals and spits out a single marginal, which I'm going to call p_pool. Once we've done that, we're essentially back in the same situation. So it's quite reasonable that the joint model for everything should be this prior marginal that we've pooled together, multiplied by the submodel-specific conditionals.
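Written out, the construction just described looks like this. It is a sketch using the talk's notation; the subscript "meld" is a label chosen here for illustration.

```latex
\[
p_{\mathrm{pool}}(\phi) = g\bigl(p_1(\phi), \dots, p_M(\phi)\bigr), \qquad
p_{\mathrm{meld}}(\phi, \psi_1, \dots, \psi_M, Y_1, \dots, Y_M)
  = p_{\mathrm{pool}}(\phi) \prod_{m=1}^{M} \frac{p_m(\phi, \psi_m, Y_m)}{p_m(\phi)} .
\]
```

Each ratio is the submodel-specific conditional p_m(psi_m, Y_m | phi). When all the p_m(phi) are equal and g returns that common prior, this reduces to the Markov combination.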

18:15

The properties of this, which we call Markov melding, are quite similar to those of Markov combination. The conditional independence and the submodel-specific conditionals are preserved again. But now the submodel-specific marginals, the joint distributions of each of the submodels, are no longer necessarily preserved, because we've changed the marginal distribution of the common quantity phi.

18:45

The Markov bit of the name comes from Markov combination. I'll explain the melding bit, and where that word comes from, in a minute. So how are we going to go about forming these pooled priors? Well, there are several options.

19:05

It turns out that this problem is basically the same as a problem that's been talked about since the 1980s in prior elicitation, where people have asked multiple experts what prior they think should be chosen for a particular parameter, and then need to combine those separate priors together in some way.

19:23

The options that have been suggested are what's called linear opinion pooling, which is basically just a mixture of each of the submodel-specific priors, weighted by some quantity w; logarithmic opinion pooling, which is almost the same, but on the log scale; product of experts, which is the same as log pooling

19:46

but with these weights all set to one; or what we, somewhat jokingly, call dictatorial pooling, which is where you set the pooled prior to be one of the original submodel priors, so you throw away all of the priors apart from one. Here is what these look like in a couple of settings. The inputs are these two black densities, which are Gaussians.

20:15

And then the output densities are in colour. You can see that in this situation, where there's some degree of difference between the two input densities, linear pooling will be a mixture of the two input densities, whereas log pooling and product of experts are more concentrated, with product of experts being the most concentrated.
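The four pooling options described above can be sketched numerically on a grid. The two Gaussian inputs, the equal weights and the function names below are illustrative assumptions, not the settings used for the figure in the talk.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a N(mu, sd^2) distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def normalise(dens, dx):
    """Rescale grid values so that they integrate (Riemann sum) to one."""
    total = sum(dens) * dx
    return [d / total for d in dens]

def linear_pool(p1, p2, w1, w2, dx):
    """Linear opinion pooling: a weighted mixture of the two densities."""
    return normalise([w1 * a + w2 * b for a, b in zip(p1, p2)], dx)

def log_pool(p1, p2, w1, w2, dx):
    """Logarithmic opinion pooling: a weighted product of the two densities."""
    return normalise([a ** w1 * b ** w2 for a, b in zip(p1, p2)], dx)

def mean_and_variance(grid, dens, dx):
    """Mean and variance of a density tabulated on an evenly spaced grid."""
    mean = sum(x * d for x, d in zip(grid, dens)) * dx
    var = sum((x - mean) ** 2 * d for x, d in zip(grid, dens)) * dx
    return mean, var

dx = 0.01
grid = [-10.0 + i * dx for i in range(2001)]
p1 = [normal_pdf(x, -1.0, 1.0) for x in grid]  # hypothetical submodel 1 prior
p2 = [normal_pdf(x, 2.0, 1.0) for x in grid]   # hypothetical submodel 2 prior

pooled = {
    "linear": linear_pool(p1, p2, 0.5, 0.5, dx),
    "log": log_pool(p1, p2, 0.5, 0.5, dx),
    "product of experts": log_pool(p1, p2, 1.0, 1.0, dx),  # log pooling, weights all one
    "dictatorial": normalise(p1, dx),                       # keep submodel 1's prior only
}
summary = {name: mean_and_variance(grid, dens, dx) for name, dens in pooled.items()}
for name, (mean, var) in summary.items():
    print(f"{name:>18}: mean={mean:6.3f}  variance={var:6.3f}")
```

With these particular inputs, the linear pool has the largest variance and product of experts the smallest, which matches the qualitative ordering described for the figure.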

20:40

Several properties have been investigated in the same literature. One of them is the property of being externally Bayesian, which is basically the idea that if you perform pooling on each of the posteriors that you get from the submodels, then you should get the same as if you pool the priors together and then calculate the posterior. It turns out that this property holds if you have logarithmic pooling with the weights summing to one.
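A one-line check of why log pooling with weights summing to one has this property, assuming (as in the expert-elicitation setting) that every prior is updated with the same likelihood L(phi), and working up to normalising constants:

```latex
\[
\prod_{m=1}^{M} \bigl\{ p_m(\phi)\, L(\phi) \bigr\}^{w_m}
  = L(\phi)^{\sum_{m} w_m} \prod_{m=1}^{M} p_m(\phi)^{w_m}
  = L(\phi) \prod_{m=1}^{M} p_m(\phi)^{w_m}
  \quad \text{when } \sum_{m=1}^{M} w_m = 1 .
\]
```

The left-hand side pools the posteriors; the right-hand side updates the pooled prior, so the two routes agree. The argument relies on a single shared likelihood.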

21:13

Unfortunately, this property doesn't really make sense in the context we're interested in, because here we have potentially different likelihoods and different data in each of the submodels. So the externally Bayesian property doesn't even hold for log pooling in our context, unfortunately. While it might be a property that would guide you, it's not very useful for us. I said I'd come back to where this word melding came from. It comes from

21:42

this paper by David Poole and Adrian Raftery in 2000, who considered inference for a deterministic function f. The f here was a differential equation model of the number of whales in the sea. The standard Bayesian model for this would be that they have these observations of the number of whales in the sea, they say that they're noisy observations of the output of this deterministic function, and they have some prior for the inputs of that deterministic function, theta.

22:11

What was unusual about Poole and Raftery's setting was that, rather than just having priors on the inputs of the deterministic function, they also had priors on the outputs of the deterministic function, phi. They wanted to constrain the outputs of this differential equation model in a sort of soft way, by imposing a prior on the output. So now they have not just one prior for phi: they have the one that's directly specified.

22:35

They also have the one you get if you transform the prior for theta to phi, so they again have two priors for the outputs of the deterministic function, and they need to resolve this in some way. The solution is basically that, even if this f is not invertible, you extend it to an invertible function, and then you back-transform the prior for phi onto theta, and then you pool those two priors for theta.

23:00

And it turns out that the prior that you get in the end doesn't depend on the way in which you extend that non-invertible function to an invertible function. So that's where the melding comes from. OK, a few final miscellaneous notes on this. Of course, this procedure gives you a joint model for all of this,

23:23

all of the data, but it could be ludicrous, especially if the submodels strongly conflict with each other. There's no guarantee that it will make any sense. Another thing is that I've said the idea is that we're trying to create something that is modular. Some of you might be familiar with the idea of cut distributions, which are another conception of a modular approach.

23:50

Here, Markov melding is quite different: Markov melding creates a full joint Bayesian model. And if you believe that that full joint model is appropriate for your setting, then it's just standard Bayesian inference at that point. There's no changing of the posterior distribution, as there is in cuts. Finally, there's an approximate approach that, it turns out, is an approximation to Markov melding.

24:18

I said at the beginning that sometimes people plug some kind of simple parametric approximation to a posterior distribution into a second model, rather than just plugging in a point estimate. One of these turns out to be an approximation to Markov melding. So if you, first of all, obtain the posterior distribution for your first model, and then you approximate the posterior marginal of the common quantity phi by a normal distribution,

24:49

and then you plug that normal distribution into a second model, saying that the observation for the normal distribution is the point estimate from the first model, and that the variance-covariance matrix is the estimated variance-covariance matrix from the first model, then it turns out that this is equivalent to Markov melding with a specific form of pooling, namely product of experts pooling.
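The reasoning behind this equivalence can be sketched as follows for two submodels. The symbols \hat{\mu} and \hat{\Sigma} (the first-stage point estimate and estimated covariance) are notation introduced here, and the last step is exactly the normal approximation, so the match with melding is only as good as that approximation.

```latex
\begin{align*}
p_{\mathrm{meld}}(\phi, \psi_2 \mid Y_1, Y_2)
  &\propto p_1(\phi, Y_1)\, p_2(\phi, \psi_2, Y_2)
     && \text{(product of experts pooling, } \psi_1 \text{ integrated out)} \\
  &\propto p_1(\phi \mid Y_1)\, p_2(\phi, \psi_2, Y_2) \\
  &\approx \mathrm{N}\bigl(\hat{\mu};\, \phi, \hat{\Sigma}\bigr)\, p_2(\phi, \psi_2, Y_2) .
\end{align*}
```

The final line is the second submodel with one extra "observation" \hat{\mu}, normally distributed with mean phi and known covariance \hat{\Sigma}, which is the plug-in construction described above.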

25:14

So that can be useful in practice if you want a quick approximation to this. Turning now to inference: writing down the posterior distribution is, of course, very straightforward. It's just proportional to the melded distribution here. In the special case of product of experts pooling, where you say the pooled prior is the product of the marginals, these of course cancel out and you get something even simpler.

25:49

But in general, if you don't use product of experts pooling, you need to have some estimate of these marginal priors from each of the submodels, p_m(phi). In some situations, those will be analytically tractable and everything will be fine. But in the situations we've looked at, generally, these phis are not root nodes in a graphical representation, and so you need to estimate these marginal priors.

26:19

The way we've done that so far is just sampling from those prior distributions and then approximating them with a kernel density estimate. This obviously won't work very well with anything other than a low-dimensional common quantity phi. Once you've done that, in principle, sampling from this posterior distribution can be done using standard methods.
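The kernel density step can be sketched as below for a one-dimensional phi. The deterministic function (a sum of two uniform parameters), the rule-of-thumb bandwidth and the name `kde_density` are all assumptions made for illustration; they are not the models or settings from the talk.

```python
import math
import random
import statistics

def kde_density(samples, bandwidth=None):
    """Return a function evaluating a Gaussian kernel density estimate."""
    n = len(samples)
    if bandwidth is None:
        # Normal reference rule of thumb for the bandwidth (an assumed default).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

rng = random.Random(1)

# Hypothetical submodel prior: phi = theta1 + theta2 with theta1, theta2 ~ Uniform(0, 1).
# phi is not a root node, so its prior marginal is estimated from prior samples.
phi_samples = [rng.random() + rng.random() for _ in range(5000)]
prior_marginal = kde_density(phi_samples)

for x in (0.5, 1.0, 1.5):
    print(f"estimated prior density at phi={x}: {prior_marginal(x):.3f}")
```

In this toy case the true marginal is triangular on (0, 2), with density 1 at phi = 1 and 0.5 at phi = 0.5, so the estimate can be checked against the known answer. In real use no such check is available.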

26:47

You can use a Metropolis-within-Gibbs sampler, for example, where you sample the submodel-specific parameters conditional on the common quantity in one step. That would be exactly identical to the sampler that you would need for just the original model. Then in the other step, you can sample the common quantity conditional on all the submodel-specific quantities, and you just need to come up with a reasonable proposal distribution there, which may or may not be straightforward.

27:17

But what we've been more interested in is a more modular approach, where you can sample from one submodel and then gradually build up towards the full joint model. This is what we call a multi-stage algorithm, and it has been explored by a few papers, some of which are cited there. So I'm going to tell you how this works in the special case of two models. So we have

27:46

two models, the first one p1 and the second one p2, and what melding tells us how to do is form a joint model for all of these quantities here. To do computation in this case, what we first of all do is draw samples from the first model, from the posterior distribution of the common quantity under the first model, which is p1(phi | Y1). And then what we're going to do is use those posterior samples as a proposal in MCMC for the full joint model.

28:20

And it turns out that this means that the likelihood terms relating to the first model cancel in the MCMC, and so your second-stage MCMC doesn't require any knowledge of the first model apart from those samples. So in this sense, it's a sort of modular approach. To make that a bit more precise: in the first stage, we're drawing samples from the first model's posterior distribution and retaining them.

28:46

In the second stage, we consider the full model. To draw samples for the submodel-specific parameters of the second submodel, we just use a standard method that would be used if you were just fitting that model by itself. But for the common quantity phi, we draw a sample from the stage-one samples uniformly at random, which is essentially drawing a sample from this posterior distribution here from the first model, assuming everything's converged.

29:22

And then if you write out the usual Metropolis-Hastings acceptance ratio, then because we've set the proposal distribution q to be equal to the first-stage posterior, or something that's proportional to it, these terms cancel out in the acceptance probability, leaving us with an acceptance probability that doesn't depend on p1 at all, apart from its prior marginal, which, as I said before, you need to estimate somehow if it's not directly tractable.
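A toy version of this two-stage scheme can be sketched as follows. Everything specific here is an assumption made so that the answer can be checked: both submodels are a single normal observation with mean phi, there are no submodel-specific parameters, both priors are N(0, 10^2), and pooling is product of experts. With a proposal proportional to p1(phi, Y1), the Metropolis-Hastings ratio reduces to the ratio of p_pool(phi) / p1(phi) times p2(Y2 | phi) at the proposed and current values.

```python
import math
import random

def log_normal_pdf(x, mu, sd):
    """Log density of a N(mu, sd^2) distribution at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

rng = random.Random(0)
y1, y2 = 1.0, 2.0   # hypothetical data: y_m ~ N(phi, 1) in submodel m
prior_sd = 10.0     # assumed prior in both submodels: phi ~ N(0, prior_sd^2)

def log_prior1(phi):
    return log_normal_pdf(phi, 0.0, prior_sd)

def log_prior2(phi):
    return log_normal_pdf(phi, 0.0, prior_sd)

def log_pooled_prior(phi):
    # Product of experts pooling (unnormalised): p_pool = p_1 * p_2.
    return log_prior1(phi) + log_prior2(phi)

def log_lik2(phi):
    return log_normal_pdf(y2, phi, 1.0)

# Stage 1: posterior of phi under submodel 1 alone (conjugate, so sampled exactly).
prec1 = 1.0 / prior_sd ** 2 + 1.0
stage1 = [rng.gauss(y1 / prec1, 1.0 / math.sqrt(prec1)) for _ in range(20000)]

# Stage 2: propose phi by picking a stage-1 sample uniformly at random.
# Submodel 1's likelihood cancels; only its prior marginal p_1(phi) remains.
def log_stage2_weight(phi):
    return log_pooled_prior(phi) - log_prior1(phi) + log_lik2(phi)

phi = rng.choice(stage1)
draws = []
for _ in range(20000):
    proposal = rng.choice(stage1)
    log_ratio = log_stage2_weight(proposal) - log_stage2_weight(phi)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        phi = proposal
    draws.append(phi)

melded_mean = sum(draws) / len(draws)
exact_mean = (y1 + y2) / (2.0 + 2.0 / prior_sd ** 2)  # closed form for this toy case
print(f"two-stage estimate: {melded_mean:.3f}, exact melded posterior mean: {exact_mean:.3f}")
```

In this conjugate toy case the melded posterior is available in closed form, which is what makes the comparison possible. In a realistic model the second stage would also update the submodel-specific parameters of the second submodel between proposals for phi.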

29:54

And of course, this can, at least in principle, extend to any further stages if you've got more than two models. So let me now talk you through the more substantive example that the original toy model was based on. This was looking at estimating the probability of hospitalisation from a specific form of flu, which was the H1N1 strain that went around England in 2010. What we were looking to do was estimate the total number of ICU admissions that happened throughout that, well,

30:30

not pandemic, whatever you call something that's less than a pandemic. We had various sources of data. One of them was weekly numbers of suspected cases of H1N1 in several ICUs in the UK. The second one was positivity data: not all of these suspected cases really did have H1N1, but for a subsample we had confirmation of whether they were H1N1 from further testing.

31:05

And then we also had various other indirect data, like the number of GP consultations, the number of hospitalisations that didn't get to ICU, the number of deaths, et cetera, which, to make this problem not too complicated, I've simplified down to an informative prior. So the models work like so. The first model is this model for the intensive care unit data. These data, Y, are the weekly numbers of suspected cases of H1N1 in each ICU.

31:42

That's related to this parameter here, which has lots of components that represent a kind of birth-death process of people coming in and out of ICU. But of course, all of this bit of the model relates to suspected H1N1. So if we want to estimate the true number of H1N1 cases, we need to relate that to the positivity data that are available for a subset of these patients, which is represented by this pi_pos quantity.

32:13

And if we combine the estimates of the suspected number of cases and the positivity data, then we can estimate the number of confirmed cases, which is going to be the common quantity phi in this case. The second model is highly simplified to make this vaguely understandable. It's just a binomial model in this case.

32:42

So the chi here is the n in a binomial, and some proportion of this chi quantity are actually cases of confirmed H1N1 in an ICU. And we've got a very informative prior on this pi here. Sorry, on the chi. That wraps up all sorts of other bits of model. So we'd like to combine these two models into a single model. We used melding to do that, which tells us how to form

33:17

a joint model from these two models that aren't immediately compatible. We had to combine the priors, and the phi here was split into two separate age categories in this example, so the priors here are bivariate distributions. The x axis here is one age group and the y axis is another age group. I think this was children, and adults on the y axis.

33:47

So these on the left are the original priors that we had for phi1 and phi2 in each case. The ICU model had a very flat prior, whereas the severity model, as I said, was informative, so this is quite peaked. And then on the right-hand side is what you get if you pool those two together. In this case, because the ICU model's prior is so flat, you get something that's quite similar to just the severity model by itself.

34:14

These are the results for the phi parameters. The top line shows the posterior distribution from the first model alone. The second line is the posterior distribution from the second model alone. You can see some degree of difference between these two. The bottom few rows show what you get from the melded model, using various different pooling techniques and also using the normal approximation method that I mentioned.

34:45

And what's quite reassuring here is that there's a fair degree of similarity across all of these different pooling methods. But the variance is reduced by combining the two datasets together. I'm going to skip this example in the interest of time. As I mentioned before, you can also do the reverse of this process: you can start with a joint model and then split it up into submodels.

35:15

And you want these submodels to be faithful to the original model, in the sense that if we then join them back together with melding, we'd get the joint model again. This could be useful for dividing up computation in a really big model. Or it could be useful for improving understanding: when you're not quite sure what's going on in a very big model, it might be useful to work out which parts of the model are providing information.

35:43

You can obviously only do this if you've got conditional independence between the two different bits of model that you want. So graphically, you can't have a kind of v-structure, or collider. So long as you have that, you've got a fair degree of choice about the submodel-specific prior that you'd like to use for each of the components that you split into.

36:09

So long as, when you join these submodel-specific priors together with whatever pooling function you're using, you get back to the original prior, you can do whatever you like. For example, if you're using product of experts pooling, where you just multiply the components together, then you could use a fractionated prior, or indeed any other factorisation of the original prior into M factors.
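One reading of the fractionated prior under product of experts pooling, written in the talk's notation (a sketch; the talk does not spell out the formula):

```latex
\[
p_m(\phi) = p(\phi)^{1/M}, \quad m = 1, \dots, M,
\qquad \text{so that} \qquad
\prod_{m=1}^{M} p_m(\phi) = p(\phi) .
\]
```

Any other choice of factors whose product is p(phi) would satisfy the same consistency requirement.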

36:35

We used this in a model from ecology, where they had two different sources of data on a common quantity phi. They had mark-recapture data, where they capture animals, tag them and then catch them again, and also index data, where they recorded the number of birds, in this case. And we can split that joint model for all of these quantities into two separate models.

37:10

And then what was reassuring to us was that if we apply our two-stage algorithm to fit this model and then that model afterwards, we get results that are very similar to what we get if we fit the full joint model together. So that's what that graph shows. I'll skip over that as well, for time. So the multi-stage algorithm that we proposed is in principle very nice, because it lets you split up the computation into separate parts.

37:41

But as probably many of you are guessing, it's not perfect, because you're using quite a simple proposal distribution. But there's another problem, which is that we need to estimate, in our acceptance probability, the ratio of these prior marginals p1 at two separate points whenever we make an MCMC move for that common quantity. And what I said at the beginning was that we were going to use density estimation for estimating that ratio,

38:19

and that definitely won't work if phi is high dimensional. But even if it's not high dimensional, if we just have samples from

38:30

these p1 of phi quantities, then we won't have very good estimates of that quantity in the tails from kernel density estimation. And because we've got a ratio here, even if we get this quantity on the denominator out by a relatively small amount, that can blow up very quickly and can mean that we accept moves that we shouldn't really accept. So we can move out from the main mass of the distribution here and into some point in the tails, and end up getting stuck there.
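The tail problem can be seen in a small self-contained sketch. It assumes a standard normal prior marginal (so the true ratio is known), 500 samples, a hand-written Gaussian kernel density estimate and a Silverman-style rule-of-thumb bandwidth; all of these are illustrative choices rather than the setup used in the talk.

```python
import math
import random

def normal_pdf(x):
    """True prior marginal in this toy example: a standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm
    return density

rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Rule-of-thumb bandwidth based on the sample standard deviation.
n = len(samples)
mean = sum(samples) / n
sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
bandwidth = 1.06 * sd * n ** (-0.2)
kde = gaussian_kde(samples, bandwidth)

def relative_error_of_ratio(current, proposed):
    """Relative error of the estimated ratio p1(current) / p1(proposed)."""
    true_ratio = normal_pdf(current) / normal_pdf(proposed)
    kde_ratio = kde(current) / kde(proposed)
    return abs(kde_ratio - true_ratio) / true_ratio

centre_error = relative_error_of_ratio(0.0, 0.5)  # both points in the main mass
tail_error = relative_error_of_ratio(0.0, 6.0)    # proposed point far out in the tail

print(centre_error, tail_error)
```

With the proposed point well beyond the range of the samples, the estimate in the denominator is driven almost entirely by the kernel's own tail, so the estimated ratio can be wrong by many orders of magnitude even though the estimate near the centre is reasonable.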

39:03

You can do a little bit better if, rather than just drawing samples from these marginals directly, you draw from some weighted version of that. And that's what this paper here describes. OK, so far I've talked about melding when there's a single quantity that's common to all of the models, but not all situations are like that. You might have models that are related in different ways, and one way that you might have that would be some models that form a chain-like structure.

39:38

So if we have several models, then we have a kind of Venn diagram where we have a common parameter, phi one intersect two, that's common between model one and model two, and similarly so on. So the original formulation of melding doesn't tell you what to do in this situation. So the notation for this setup is that we have capital M separate models, and again the submodel-specific parameters and the submodel-specific data Y.

40:09

But now, rather than having a single phi that's common to all of the models, we have phi m intersect m plus one, which is common to models m and m plus one. So we have M minus one common quantities shared across M different models. And again, what we'd like to do is find some generic way of forming a single joint model for all of these quantities.

40:37

So what we propose is similar to the original melding idea. So we take each of the submodels, divide through by the prior marginal for the common quantity in that model, and then multiply all of these together, and then we replace that prior for each of the common quantities by a single pooled prior for all of the common quantities together. Note that this won't be the same as performing the common-phi form of melding twice in general, except if these two quantities in the middle model are a priori independent.

41:18

And of course, we can generalise this case, M equals three, in the obvious way to a general number of models, capital M. So how are we going to form this pooled prior in this case? Now we need some function g that takes a prior for each of the common quantities all the way up to model M, where all of the middle priors are bivariate.
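For M equals three, the construction described above can be written out roughly as follows. The notation is an assumption made for this sketch: psi_m stands for the submodel-specific parameters, Y_m for the submodel-specific data, and g for the pooling function.

```latex
\begin{aligned}
p_{\text{meld}}(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_1, \psi_2, \psi_3, Y_1, Y_2, Y_3)
  &= p_{\text{pool}}(\phi_{1 \cap 2}, \phi_{2 \cap 3})
     \, \frac{p_1(\phi_{1 \cap 2}, \psi_1, Y_1)}{p_1(\phi_{1 \cap 2})}
     \, \frac{p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_2, Y_2)}{p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3})}
     \, \frac{p_3(\phi_{2 \cap 3}, \psi_3, Y_3)}{p_3(\phi_{2 \cap 3})}, \\
p_{\text{pool}}(\phi_{1 \cap 2}, \phi_{2 \cap 3})
  &= g\bigl(p_1(\phi_{1 \cap 2}),\; p_2(\phi_{1 \cap 2}, \phi_{2 \cap 3}),\; p_3(\phi_{2 \cap 3})\bigr).
\end{aligned}
```

Each fraction is a submodel conditional on its common quantities, so the only place the three prior marginals enter is through the pooled prior.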

41:52

Each of the phis is univariate, and in the end models the quantities are univariate. This is a little bit different to the original pooling setup. However, you can nevertheless just apply logarithmic opinion pooling in this context, just multiplying all of these priors together and taking some power that I've called lambda here. So that will give you a valid probability distribution. Linear opinion pooling is a bit less obvious.

42:26

What you get if you add together a univariate density and a bivariate density, it's not terribly obvious what you should do here. And the nearest analogue that we've come up with is that you take marginals of each of these bivariate densities, take the linear pool of those univariate marginals, and then take the product of those pooled marginals to give you your full-dimensional pooled density.

42:56

But obviously, this induces prior independence, which may well not be what you want. So I'm not certain that that's a great option, but it is an option. You can also do something that's analogous to dictatorial pooling. So essentially, in this setup, you have two choices of prior for each of the quantities, and you have to figure out some way of choosing one of those for each of the quantities. There are several ways you can do that.
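A toy version of the two chained pooling options just described, on a two-by-two discrete grid so that everything can be checked by hand. The probability tables, the powers and the weights are made-up illustrative values, and the function names are hypothetical.

```python
def normalise(table):
    """Rescale a two-dimensional table of non-negative numbers to sum to one."""
    total = sum(sum(row) for row in table)
    return [[value / total for value in row] for row in table]

def chained_log_pool(p1, p2, p3, powers):
    """Logarithmic pooling: multiply the three priors, each raised to a power, then normalise."""
    lam1, lam2, lam3 = powers
    table = [[(p1[i] ** lam1) * (p2[i][j] ** lam2) * (p3[j] ** lam3)
              for j in range(len(p3))]
             for i in range(len(p1))]
    return normalise(table)

def chained_linear_pool(p1, p2, p3, weight_first, weight_last):
    """Linear-pooling analogue: pool univariate marginals, then take their product."""
    marginal_a = [sum(row) for row in p2]                              # p2 marginal for the first quantity
    marginal_b = [sum(row[j] for row in p2) for j in range(len(p3))]   # p2 marginal for the second quantity
    pooled_a = [weight_first * a + (1.0 - weight_first) * m for a, m in zip(p1, marginal_a)]
    pooled_b = [weight_last * b + (1.0 - weight_last) * m for b, m in zip(p3, marginal_b)]
    return [[a * b for b in pooled_b] for a in pooled_a]

p1 = [0.2, 0.8]                    # prior from model one for the first common quantity
p2 = [[0.4, 0.1], [0.1, 0.4]]      # bivariate prior from the middle model
p3 = [0.5, 0.5]                    # prior from model three for the second common quantity

log_pooled = chained_log_pool(p1, p2, p3, powers=(0.5, 0.5, 0.5))
linear_pooled = chained_linear_pool(p1, p2, p3, weight_first=0.5, weight_last=0.5)

print(log_pooled)
print(linear_pooled)
```

In this example the log-pooled table keeps the dependence between the two quantities that the middle model's prior carries, whereas the linear-pooled table factorises into its marginals by construction, which is the induced prior independence mentioned above.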

43:24

So here's an example of how this works out in a few cases. So the setup is that we have, along the x axis, one of the input densities, which here is just a normal. And then on the other axis, we have the marginal distribution for the second quantity. And then we also have this joint distribution between the two. So this here is the prior from model one, this here is the prior from model three, and this is the bivariate prior in the middle model, model two, in blue.

44:06

And then if we combine those together, then we get... I've got that the wrong way around. The red one is the... sorry, the blue one is the input, isn't it? And the red is what you get if you vary the pooling. So if you vary the weights in the linear and log pooling, which are shown in the left-hand and right-hand columns, you get various different choices of overall pooled prior.

44:32

OK, then finally, an example of this chained setup that was inspired by some work I was doing on COVID. So in COVID, in the worst case, you end up in intensive care. And one thing that's of interest in COVID patients in intensive care is when you reach what's called respiratory failure, which is defined by your P/F ratio being less than 300. So these are three example patients, with the P/F ratio through time: time is on the x axis.

45:03

And the P/F ratio is on the y axis. And what we're interested in is the time when this quantity crosses the red dotted line. So as you can see in these examples, there's a fair degree of uncertainty about when respiratory failure has been reached for each of these patients. And so what we'd like to do is understand what determines when you reach respiratory failure, while accounting for this uncertainty about the time when the event happened.

45:30

So we essentially have a time-to-event problem with an uncertain event time. So one of the things that might influence how quickly you reach respiratory failure is this thing called cumulative fluid balance, and this changes through time. It depends on the treatments that you're being given. And in particular, what might have an effect is the rate of the cumulative fluid balance. So this is essentially the slope of these lines.

46:04

So this, of course, varies through time. And there are also baseline factors that might influence how quickly you reach respiratory failure. So we want some way to combine the uncertainty about when the end point has been reached, the uncertainty about this cumulative fluid balance rate, and also the fact that there are baseline risk factors that might influence this.

46:32

So what we're going to do is we're going to have three models. One model, which is a model for this P/F ratio data, is going to be a B-spline model, which is represented here with the black line and the uncertainty in grey. Then we're also going to have a model for the cumulative fluid balance, which is just going to be a piecewise linear model. And then finally, we're going to integrate those two models with the time-to-event model for respiratory failure.

47:03

So in more mathematical detail: we have a standard B-spline model for the P/F ratio data. And as a function of that, we can calculate an estimated time of respiratory failure by solving for when we first cross the 300 point. Then we've got a piecewise linear model for the cumulative fluid balance, so just with two pieces. And then we're going to take quite a simple time-to-event model, which is a Weibull time-to-event model.
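The "solve for when we first cross 300" step could be sketched as a grid scan followed by bisection. The curve below is a made-up smooth trajectory standing in for a fitted ratio curve, not real patient data and not the spline model from the talk, and `first_crossing_time` is a hypothetical helper name.

```python
import math

def first_crossing_time(curve, threshold, t_start, t_end, n_grid=1000, tol=1e-8):
    """Earliest time in [t_start, t_end] at which curve is at or below threshold, else None.

    Scans a regular grid for the first interval in which the curve reaches the
    threshold, then refines the crossing point by bisection.
    """
    step = (t_end - t_start) / n_grid
    left = t_start
    if curve(left) <= threshold:
        return left
    for k in range(1, n_grid + 1):
        right = t_start + k * step
        if curve(right) <= threshold:
            # Invariant: curve(lo) > threshold >= curve(hi).
            lo, hi = left, right
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if curve(mid) <= threshold:
                    hi = mid
                else:
                    lo = mid
            return hi
        left = right
    return None

def example_ratio_curve(t):
    """Hypothetical smooth ratio trajectory over time (illustrative only)."""
    return 400.0 - 30.0 * t + 20.0 * math.sin(t)

crossing = first_crossing_time(example_ratio_curve, 300.0, 0.0, 10.0)
never = first_crossing_time(lambda t: 400.0, 300.0, 0.0, 10.0)
print(crossing, never)
```

Applying a function like this to each posterior draw of the fitted curve, rather than to a single curve, would give a distribution over the event time, which is one way to picture the uncertainty in the event time that the melded model is meant to propagate.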

47:40

So these models are related by the fact that this quantity here in the time-to-event model, which is what we get when we differentiate, comes from the cumulative fluid balance model. So together, these two things are what's called a joint model in the statistics literature. So we're combining together a longitudinal model here with a time-to-event model in model three. So what we're adding to that is that the time that's related to this hazard is itself uncertain.

48:20

And it's been estimated from this model, model one, here. So graphically, what we have is the B-spline model for the P/F ratio here, in p1. And the quantity phi one intersect two here is the time when you reach respiratory failure. And that's the time that goes into model two, which is the time-to-event model.

48:53

And then the time-to-event model depends on the rate of cumulative fluid balance intake, which comes from this third model, which was a simple piecewise linear model. So we want to integrate all of these three models together, and it turns out in this case that the results that you get don't really depend on which type of pooling you use. So the pooled results are the red and blue curves.

49:23

But you can't really see the blue curve, because it's completely underneath the red curve. But the result you get from that pooling is different to what you get if, rather than doing melding, what you do is fix, in your middle model, the time-to-event model, a point estimate from your first model and a point estimate from the third model, which is the fluid balance model, and this is the P/F ratio model.

49:55

So there's a shift in all of the key parameters in this case. So it shows that, at least in this case, accounting for the uncertainty by propagating it through the three models does make a bit of a difference. So to summarise, Markov melding provides a generic method for joining together different submodels that either share a single common variable or that are linked in a chain-like structure. The key idea is this idea of pooling together prior marginal distributions.

50:29

But of course, it probably won't make sense if there's strong conflict between either the prior marginals or the data from each of the separate models. The multi-stage algorithm allows you to conduct inference for the full joint melded model in submodel-specific stages, which might be easier or more convenient than fitting the full model directly. But it can be a bit unstable in some cases, and weighted KDE might help with that, or at least with one aspect of the challenges of that.

51:01

So returning to my original problem that I set out, where we had these four different types of data in four different models: do we know how to integrate all of these together? Well, I think the honest answer is no. I think we're a long way away from doing this in general. What I've described today will work if you've got quite low-dimensional common parameters and relatively little conflict between the different models.

51:28

And so I think there's a lot of work that could still be done in this area to provide a truly generic method that would work for the scale and complexity, at least, of biomedical data. And finally, thank you to my collaborators, particularly Lorenz Wernisch, who I did a lot of the early work on this with, and more recently Andrew Manderson, who is a PhD student with me. He's done a few bits on this. And also to Anne and David and Danny as well. So thank you very much.

52:09

[Inaudible audience question.] Yep, yep. So the question, for the online people, is: would it be desirable to have a principled way to choose amongst the different pooling methods, even though in the case that I showed it doesn't make much difference? I think it would be great if there was a way.

52:47

So we thought at first that the property of external Bayesianity might be a justification for log pooling in this case. But unfortunately it doesn't really apply; well, that property doesn't hold in the setting that we're interested in. There are also other properties that people have looked at in the prior pooling literature, which are related to whether certain specific properties hold, and I'm not going to be able to formulate it properly off the top of my head.

53:19

But there are some more properties that you'd think would be desirable. I seem to remember it's one of these cases where there are three properties that all seem quite reasonable, that you'd like, and I think someone's proved that you can only have two of them. So I don't think there's going to be a perfect method.

53:38

I don't know whether there's a generic way of choosing. I guess you'd have to specify some specific criteria that you're trying to satisfy; maybe in the prediction setting there'd be something generic. And the kind of get-out-of-jail-free answer is that it should be subjectively chosen, like a prior or an observation model, but that's not very useful. And so... A question about pooling as well.

54:20

Yes, it's basically the same question as Jeff's question, which is, I'll repeat for the online people: how would you choose the weights when you're doing pooling? I think it's much the same as my answer to Jeff, which is that I don't know that there's a generic way of doing that. I think if you specified a specific objective, maybe you could do something. But I don't know of any way of doing that.

54:52

But yeah, it would be great if we could find a way. Great. Thank you.
