
Introduction to Bayesian inference for Differential Equation Models Using PINTS

May 21, 2021 · 57 min

Episode description

Ben Lambert, Department of Computer Science, University of Oxford, gives the Graduate Lecture on Thursday 6th May 2021, for the Department of Statistics.

Transcript

I was just having a meeting with some students about machine learning for SARS-CoV-2 main proteases, developing models to help discover new inhibitors. Yeah. All right. OK, so. We should get started. So it gives me great pleasure to introduce Ben, who is now also a co-director of [unclear], and who is connected to the Research Software Engineering group in Computer Science as well. And I should also plug your book, which you can buy on Amazon.

And it is all about Bayesian statistics. Give me the correct title. The title is A Student's Guide to Bayesian Statistics. If you look at it on a bookshelf, it's got two kind of chillies on it and [unclear] on the front of it, which people find quite confusing, so that's how you can identify it. And is there a new edition coming out soon? There is, in a year or so. OK, great.

Yeah, this talk is going to be recorded, and it will end up on Oxford Podcasts eventually under the Department of Statistics. We'll be stopping during Ben's talk to ask if there are any questions, so if you come on video to ask one, make sure you're happy to be recorded. If you don't want to be recorded, just type your questions in the chat and I will read them out.

So without further ado, I'm going to hand over to Ben, if you'd like to share your screen. I will do. I was just trying to share a portion of my screen, so, oh, OK. I think that looks about right now. Perfect. Great. Thanks. And thanks very much for the invitation to talk today. It's nice to be talking to the statistics department.

Because when I was about 16 years old, I came to [unclear] down in the basement of the statistics department, so whenever I go there, I get a bit nostalgic. Well, hopefully we'll all be able to go into the department at some point soon anyway. So today I'm going to give an introduction to Bayesian inference for differential equation models. And in case you're wondering who I am, I'll say a few words about myself.

I'm a statistician based in the Department of Computer Science, and I essentially work on data science, machine learning and statistical inference problems for different research groups across the university. I've been a user of statistics for, I don't know how many years; I worked in industry for a while before I came back. And, very crucially for this talk, I was born in the same town that Thomas Bayes lived in.

And I actually went to school there; it's Tunbridge Wells. Here's a picture of Tunbridge Wells station, and it still looks quite similar to that today. So today I'm going to do a mixture of things, because I wanted the talk to be partly pedagogical and partly research. So firstly, I'm going to provide a really, really short introduction to Bayesian inference using just a simple example.

Then I'm going to talk about how you actually go about formulating an inference problem for ordinary differential equation models. Then I'm going to talk briefly about how it is very, very difficult in practice to actually do exact Bayesian inference. So instead, what you do is some sort of approximation, and that approximation typically happens via some sort of computational sampling.

And then finally, I'll talk a bit about a Python package which we created in the Department of Computer Science, which specialises in inference for ordinary differential equation models. It's called PINTS, and I can't quite remember the acronym; I think it's Probabilistic Inference on Noisy Time Series. So, let's get started with a short introduction to Bayesian inference.

The example I'm going to give is: imagine that we want to estimate disease prevalence within a population. Suppose that we take a sample of N study participants from the population, we take their blood, and then we apply some sort of clinical test to determine the presence or absence of a disease. And we find that X of those individuals are disease positive. The question we might have is: how do we use these data to estimate disease prevalence, hopefully with uncertainty?

Well, there are aspects of the data generating process that we don't know about (we don't know exactly how the sample of individuals was formed, for example), so we're going to use a probabilistic model to try and explain the data. So how do we choose what sort of probability model to use?

We need to think about the characteristics of our data. First of all, the sample size N is fixed: we only sample N individuals, so we can have at most N individuals who are disease positive, and X can take any integer value from 0 up to N.

So it's a discrete probability distribution we're looking for, with bounded, non-negative support for the data, and that narrows down the choice of probability distributions. Then we need to make some assumptions: we can assume that the individuals we retrieve from the population represent independent samples, and that those individuals are drawn from the same population.

And if you Google all these various assumptions and characteristics, it turns out that a single probability model satisfies those conditions, which is the binomial, and I've written down the binomial probability mass function, which I'm sure you're all familiar with from school. In Bayesian inference, what we want to do is essentially estimate the parameters of our probability model.
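As a small illustration, the binomial probability mass function, p(X | θ, N) = C(N, X) θ^X (1 − θ)^(N − X), can be written in a few lines of Python. The function name and the values N = 10, θ = 0.3 are our own choices for illustration, not something shown in the talk.

```python
from math import comb

def binomial_pmf(x: int, n: int, theta: float) -> float:
    """Probability of observing x disease-positive individuals out of n,
    assuming independent draws from one population with prevalence theta."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)

# For a fixed theta, the probabilities of all possible outcomes X = 0..N sum to one.
n, theta = 10, 0.3
total = sum(binomial_pmf(x, n, theta) for x in range(n + 1))
print(round(total, 12))                       # 1.0
print(round(binomial_pmf(3, n, theta), 4))    # 0.2668
```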

So, just going back, the parameter θ represents the prevalence of disease in our population, and it does so if we assume that the clinical test we're using is essentially perfect. Under those assumptions, θ is the disease prevalence, and that's the parameter that we want to estimate. Bayes' rule gives us a mechanism for estimating that parameter, and I've written it down here, but what does each of these terms mean in turn?

So I'm going to step through these individual terms, and then we're going to see how changing them actually influences our results. So the first term on the right-hand side is what's known as the likelihood. And it's important to note that the likelihood is actually not a probability distribution as it's used in Bayesian inference, because we vary the parameter θ and hold the data constant.

So it's a function of θ, and that function of θ does not satisfy the conditions for a probability distribution: if you integrated it over θ, it wouldn't integrate to one. Importantly, people like to bang on about how the prior is quite subjective, and use words like "wishy-washy", but in my experience, typically the most subjective decision that is made in an analysis is how you choose the likelihood.
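To see concretely that the likelihood is not a distribution over θ, here is a quick numerical check, assuming the hypothetical values N = 10 and X = 3: integrating the binomial likelihood over θ ∈ [0, 1] with a simple midpoint rule gives about 1/11 rather than 1.

```python
from math import comb

def likelihood(theta: float, x: int = 3, n: int = 10) -> float:
    """Binomial likelihood viewed as a function of theta, with the data (x, n) held fixed."""
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)

def integrate_over_theta(f, num_points: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over theta in [0, 1]."""
    h = 1.0 / num_points
    return h * sum(f((i + 0.5) * h) for i in range(num_points))

area = integrate_over_theta(likelihood)
print(round(area, 6))  # 0.090909, i.e. 1/(N + 1), not 1
```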

So I want to highlight here that the likelihood often contains many, many subjective assumptions. Then the second term on the right-hand side, in the numerator, is what's known as the prior. By contrast to the likelihood, the prior is a valid probability distribution, and, similar to the likelihood, it is also subjective. And then the final term on the right-hand side, on the bottom, is the denominator.

This denominator has many names, and it's got two different interpretations. Before we actually collect the data, it is what's known as a prior predictive distribution: it's a distribution over the potential data sets that we could get, given our likelihood and prior.

And then once we have the data, it's just a number that normalises the posterior, and that's known as the evidence or the marginal likelihood; it's calculated entirely from the numerator, by integrating it over all values of θ. And as we'll see later on, calculating this denominator is the source of much of the difficulty in doing exact Bayesian inference. And then the final term in Bayes' rule is what's known as the posterior.
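For reference, a standard way to write Bayes' rule for the prevalence example, with the denominator written out as an integral of the numerator, is below. The p(·) notation is our choice, since the slide itself isn't reproduced in the transcript.

```latex
p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)},
\qquad
p(X) = \int_0^1 p(X \mid \theta)\, p(\theta)\, \mathrm{d}\theta
```

Read as a function of X = 0, 1, …, N before any data are collected, p(X) is the prior predictive distribution; with a uniform prior and a binomial likelihood it equals 1/(N + 1) for every X. Once X is observed, it is a single normalising number.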

It is the goal of Bayesian inference, because what we want to do is summarise our uncertainty about some quantity that we don't know, the disease prevalence, using probabilities and probability distributions, since that's how Bayesian inference summarises uncertainty. And as I say, it's the starting point for all further analysis in Bayesian inference. Now, I want to talk a little bit about the intuition behind doing Bayesian inference.

So if we write down Bayes' rule again, we can see that the denominator on the right-hand side doesn't contain the parameter θ. And so the posterior is proportional to the product of the likelihood and the prior. What that means is that the posterior is a kind of weighted geometric mean of the prior and the likelihood, and all of its shape is determined by the product of these two terms.

And that's what I want to emphasise now with some animations. So imagine that we go out and collect blood samples from 10 people, and we find that three of them have the disease. What I'm drawing here is a potential prior, and I've chosen a uniform prior for the disease prevalence between zero and one (or zero and one hundred percent, depending on how you think about it), so all values of the disease prevalence are equally likely.

And then below that, I show the likelihood, which peaks at three out of 10, because we found that three out of 10 individuals had the disease. The posterior is proportional to the product of the likelihood and the prior, and because the prior is flat, the posterior peaks at the same point, 0.3, as the likelihood. But now what I want to show you is how the posterior changes as I change my prior.
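A minimal sketch of this picture, assuming a grid approximation over θ (our choice; the talk shows animations rather than code): multiply the likelihood by a flat prior, normalise by an approximate evidence, and read off the peak.

```python
from math import comb

N, X = 10, 3                                  # 10 people sampled, 3 disease positive
grid = [i / 1000 for i in range(1001)]        # theta values from 0 to 1

prior = [1.0 for _ in grid]                   # uniform prior density on [0, 1]
likelihood = [comb(N, X) * t**X * (1 - t) ** (N - X) for t in grid]
unnormalised = [l * p for l, p in zip(likelihood, prior)]

# Approximate the denominator p(X) with the trapezoidal rule on the grid.
h = grid[1] - grid[0]
evidence = h * (sum(unnormalised) - 0.5 * (unnormalised[0] + unnormalised[-1]))
posterior = [u / evidence for u in unnormalised]

mode = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(mode)                # 0.3, the same as the peak of the likelihood
print(round(evidence, 4))  # 0.0909
```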

So I'm going to run a quick animation which shows that, as I change which prior distribution I'm using, the posterior distribution shifts. What we find is that the peak of the posterior ends up somewhere between the peak of the prior and the peak of the likelihood. So it's, as I said, this weighted average of the prior and the likelihood: the posterior reflects both our prior beliefs and what the data tell us about the values of the parameters of our model.
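One way to reproduce this effect in code, assuming a beta prior (a convenient choice we make here, not necessarily the prior used in the animation), is the conjugate beta-binomial update: a Beta(a, b) prior with X positives out of N gives a Beta(a + X, b + N − X) posterior, whose mode we can compare with the prior mode and the maximum likelihood estimate X/N.

```python
def beta_mode(a: float, b: float) -> float:
    """Mode of a Beta(a, b) distribution (valid for a > 1 and b > 1)."""
    return (a - 1) / (a + b - 2)

def posterior_params(a: float, b: float, x: int, n: int) -> tuple[float, float]:
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior parameters."""
    return a + x, b + n - x

N, X = 10, 3
mle = X / N  # peak of the likelihood
for a, b in [(2, 8), (5, 5), (8, 2)]:  # illustrative priors peaking at 0.125, 0.5 and 0.875
    prior_peak = beta_mode(a, b)
    post_peak = beta_mode(*posterior_params(a, b, X, N))
    lo, hi = sorted((prior_peak, mle))
    assert lo <= post_peak <= hi  # the posterior peak lies between the two
    print(f"prior peak {prior_peak:.3f} -> posterior peak {post_peak:.3f} (MLE {mle})")
```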

Now I want to show you a slightly different thing: I'm going to hold the prior constant and imagine that we collected different data samples. I'm starting off here imagining that we had a sample size of 10 and we didn't find any individuals with disease. Now we see that the likelihood peaks at zero, because that's the maximum likelihood estimate of the parameter.

And now what we can see is that, as we collect different data samples, my likelihood shifts, and because my likelihood is shifting, my posterior is shifting as well. And we find that the position of the peak of the posterior is somewhere between the peak of the prior and the peak of the likelihood, as we found in the previous case.

Now, finally, I want to show you what happens if instead we imagine that we have a fixed prior and a fixed proportion of individuals who have the disease, but we increase our sample size. So we start out with three out of 10 individuals who have the disease, then we get 30 out of 100, etc. So we're keeping that proportion the same and increasing the sample size.

So before I start running the animation, we can see that with a sample size of 10, the posterior peak is somewhere between the peak of the prior and the peak of the likelihood. But as I increase my sample size, a couple of things happen. Firstly, the posterior becomes narrower, and that makes sense, right? As I collect more data, I should hopefully get more confident in my estimate. But something else happens as well.

We actually see that the position of the posterior shifts over towards the position of the likelihood. And that, again, is a desirable property of Bayesian inference: as I increase my sample size, ideally the prior I use matters less and less. That's not always the case, because for more complicated models you may never be able to identify all of the parameters, but for a model like this, it's true. So hopefully that's provided some intuition for you.
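Continuing with a hypothetical beta prior (the Beta(5, 5) prior and the list of sample sizes are our own illustrative choices), we can hold the prior and the observed proportion 0.3 fixed and watch the posterior narrow and move toward the likelihood's peak as N grows.

```python
from math import sqrt

def beta_mode(a: float, b: float) -> float:
    """Mode of a Beta(a, b) distribution (a, b > 1)."""
    return (a - 1) / (a + b - 2)

def beta_sd(a: float, b: float) -> float:
    """Standard deviation of a Beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

a0, b0 = 5, 5  # fixed prior, peaking at 0.5
results = []
for n in [10, 100, 1000, 10000]:
    x = 3 * n // 10                   # keep the observed proportion at 0.3
    a, b = a0 + x, b0 + n - x         # conjugate beta-binomial update
    results.append((n, beta_mode(a, b), beta_sd(a, b)))
    print(f"N={n:>5}: posterior peak {beta_mode(a, b):.4f}, sd {beta_sd(a, b):.4f}")
```

The peak drifts from between the prior and the likelihood toward 0.3, while the standard deviation shrinks roughly like 1/√N.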

That's where I wanted to ask if anyone had any questions at this point. There are no questions. If not, that's fine, I can continue on, and there are going to be more opportunities for questions. I'm just curious how it's implemented. What's that? Sorry, how is that updating graph implemented? OK, so do you mean [unclear], or do you mean is it just an animation? If so, [unclear]; these sorts of things are always very good for animations. Great.

OK, so that's hopefully provided a little bit of an introduction to Bayesian inference, and I'm sure that a lot of you are already familiar with it. But I wanted to do that just to provide a little grounding for the rest of the talk. So now we're going to step up the level of difficulty a little bit and talk about how we formulate Bayesian inference problems for ordinary differential equations.

And we can imagine now a slightly different example, where we carry out a series of experiments in which we inoculate some plates with bacteria at some initial time, and then, at predefined time intervals, we count the number of bacteria on each plate using some sort of experimental approach. Imagine that what we're trying to do is to model the bacterial population growth over time.

So I've got some fictitious data here which shows the counts of bacteria that have been measured over time, and what we want to do is develop a model that explains the data. One model that might be appropriate for this is the logistic model. This is a differential equation that contains two terms on the right-hand side. One of the terms essentially reflects the initial growth of the bacterial population.

So that's the term with α here. Then the term with β essentially reflects how, as the size of the bacterial population grows, there's a reduction in the growth rate due to crowding, which could be competition for nutrients, for example. What you find with this sort of model is a sigmoidal curve, which represents the solution of the ODE. So we've got our data and we've got a model; do we have everything that we need here?
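The slide's equation isn't reproduced in the transcript. A common way to write a logistic model with a growth term α and a crowding term β, writing n(t) for the true count and assuming a known initial count n_0, together with its closed-form solution, is:

```latex
\frac{\mathrm{d}n}{\mathrm{d}t} = \alpha n - \beta n^{2},
\qquad
n(t) = \frac{\alpha n_0}{\beta n_0 + (\alpha - \beta n_0)\, e^{-\alpha t}}
```

While n is small the curve grows roughly exponentially at rate α, and as t grows it levels off at the carrying capacity α/β, which gives the sigmoidal shape.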

So what I could do is overlay lots of potential solutions for different parameter values. But we have a bit of a problem here, which is that none of these model solutions can fully explain the data. In other words, they've got zero probability of having generated the data, because our model solutions are smooth lines, and our data, with their noise, don't lie along those lines.

And so at the moment, we don't have enough information to formulate the inference problem. What we need is some sort of statistical model which represents the processes that we don't account for in our deterministic model. So here I'm going to assume that we've got some sort of measurement error around the true value: the number of bacteria that we count on a plate at time t is normally distributed about the true count, which is the solution of the ODE.

And there is some noise parameter σ, which represents the magnitude of the measurement error about the true value. I should say here that using a normal measurement error model is not the only choice I could have made; I could use, for example, a Student's t-distribution. But it is a fairly standard, widely used choice, and I'm also implicitly assuming here that the normal errors... actually, just a second.

I might just turn off my Slack, because I think that's going to become quite annoying in a second. I'm just going to quit that. Sorry. OK. So the question we might have is: how does this model work? The data generating process is that we assume the true number of cells follows the solution of the ODE, and then there is some sort of measurement process which is imperfect.

That means that we don't actually measure the true number of bacteria on the plates; rather, the amount that we do measure is normally distributed around the true value. And then to get our data, we sample from that process. Because of the statistical model, we now have a possible way of having generated our data, whereas we didn't have that before, when we just used our purely deterministic mathematical model.

We needed that to be able to formulate a proper inference problem. So how do we write down the likelihood in this circumstance? Well, remember that what we're using is a normal likelihood centred on the true number of cells at each time. And so we can write down the likelihood of all the observations by taking the product of the individual likelihoods of each data point.

And we're able to do that because we're assuming conditional independence of all of our data, conditioning on the parameters of our model. So the likelihood of all our observations is the probability density of the first observation times the probability density of the second observation, and so on, all the way up. I'm using a bold capital N to represent my vector of measured counts of bacteria.
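As a sketch of this likelihood, assuming the logistic form dn/dt = αn − βn² with a known initial count n0 = 5 (an assumption on our part) and independent normal measurement error, the product of densities becomes a sum of log-densities, which is numerically safer. The function names and the toy counts are hypothetical.

```python
from math import exp, log, pi

def logistic_solution(t: float, alpha: float, beta: float, n0: float) -> float:
    """Closed-form solution of dn/dt = alpha*n - beta*n**2 with n(0) = n0."""
    return alpha * n0 / (beta * n0 + (alpha - beta * n0) * exp(-alpha * t))

def normal_logpdf(y: float, mean: float, sigma: float) -> float:
    """Log density of Normal(mean, sigma^2) evaluated at y."""
    return -0.5 * log(2 * pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2

def log_likelihood(alpha, beta, sigma, times, counts, n0=5.0):
    """log p(N | alpha, beta, sigma): a sum over conditionally independent observations."""
    if sigma <= 0:
        return float("-inf")
    return sum(
        normal_logpdf(y, logistic_solution(t, alpha, beta, n0), sigma)
        for t, y in zip(times, counts)
    )

times = [0.0, 2.0, 4.0, 6.0, 8.0]
counts = [5.0, 12.0, 30.0, 55.0, 80.0]  # made-up counts, for illustration only
print(log_likelihood(0.5, 0.005, 3.0, times, counts))
```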

The question we have, though, is how do we actually calculate n? Well, for the logistic model there is actually an analytic solution; I can write it down. But that is the exception: in most ordinary differential equation models, the equations representing the deterministic solution cannot be solved exactly. And so what we typically do is use some sort of numerical integration method to integrate the ODE.

And we're going to have to bear that in mind whenever we do inference. Also, we know that the value of n depends implicitly on the parameters, so I can write the solution using an implicit notation: n(t) is some function of time, f(t; α, β). So then what we can do is rewrite our likelihood a little bit, and now that makes it explicitly clear that the likelihood depends on three parameters.
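Because most ODE models have no closed form, the solution has to be computed numerically. Below is a minimal fixed-step fourth-order Runge-Kutta integrator for the assumed logistic ODE, checked against the closed form; the step count and parameter values are our own choices. In practice one would usually reach for an adaptive solver such as SciPy's `scipy.integrate.solve_ivp`.

```python
from math import exp

def logistic_rhs(n: float, alpha: float, beta: float) -> float:
    """Right-hand side of the (assumed) logistic ODE dn/dt = alpha*n - beta*n**2."""
    return alpha * n - beta * n * n

def rk4_solve(alpha, beta, n0, t_end, num_steps=1000):
    """Fixed-step fourth-order Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / num_steps
    n = n0
    for _ in range(num_steps):
        k1 = logistic_rhs(n, alpha, beta)
        k2 = logistic_rhs(n + 0.5 * h * k1, alpha, beta)
        k3 = logistic_rhs(n + 0.5 * h * k2, alpha, beta)
        k4 = logistic_rhs(n + h * k3, alpha, beta)
        n += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return n

def exact(t, alpha, beta, n0):
    """Closed-form logistic solution, used here only to check the integrator."""
    return alpha * n0 / (beta * n0 + (alpha - beta * n0) * exp(-alpha * t))

alpha, beta, n0 = 0.5, 0.005, 5.0
numeric = rk4_solve(alpha, beta, n0, t_end=10.0)
analytic = exact(10.0, alpha, beta, n0)
print(f"RK4: {numeric:.6f}, analytic: {analytic:.6f}")
```

Inside an inference loop this solve is repeated for every proposed (α, β), which is why likelihood evaluations for ODE models are comparatively expensive.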

It depends on the growth parameter, the crowding parameter and the noise parameter of our model. And so what you see is that when you're actually formulating inference problems for ordinary differential equation models, you typically get extra parameters, beyond the parameters of your ODE model, that you need to put in place: nuisance parameters that represent some sort of measurement process, or some sort of imperfection.

So then what I can do is write down our posterior, and it has the form: the posterior is equal to the likelihood times the prior, divided by the denominator. The posterior is now a three-dimensional distribution, and we have the denominator to deal with. So that's how you formulate, or at least that's one way of formulating, the inference problem. I just wanted to ask: does anyone have any questions at this point?
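Written out in the notation used in the talk (bold N for the T measured counts N_1, …, N_T at times t_1, …, t_T, and f(t; α, β) for the ODE solution; the symbol T is our own addition, since the slide is not reproduced), the posterior and its three-dimensional denominator look like this:

```latex
p(\alpha, \beta, \sigma \mid \mathbf{N})
  = \frac{p(\mathbf{N} \mid \alpha, \beta, \sigma)\, p(\alpha, \beta, \sigma)}
         {\iiint p(\mathbf{N} \mid \alpha, \beta, \sigma)\, p(\alpha, \beta, \sigma)\,
          \mathrm{d}\alpha\, \mathrm{d}\beta\, \mathrm{d}\sigma},
\qquad
p(\mathbf{N} \mid \alpha, \beta, \sigma)
  = \prod_{i=1}^{T} \mathcal{N}\!\left(N_i \mid f(t_i; \alpha, \beta),\, \sigma^{2}\right)
```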

Yes, I do. Hello. Hi. Could you please explain your α, β and σ: which of them represent the parameters of the ODE? Sure. So here's the equation, and it's got these two parameters, α and β. This is my ordinary differential equation model. As I said, α represents the initial growth rate of the population, which is an exponential growth rate, and β represents a kind of crowding.

And so my posterior distribution is a function of those two parameters and my noise parameter σ. Does that make sense? Yes. And so does that mean that, when we are formulating something like this, we only need to consider one additional parameter for the measurement error? So it depends a little bit. I've used a very simple measurement model here, which has just one extra parameter.

Now, there are lots of different choices I could have made, and typically the more complex I make that measurement model, the more parameters it has. So sometimes it's just one extra parameter, and other times, if you're using a more complex measurement model, it isn't. Another case is if your ordinary differential equation model has a number of outputs. Imagine you've got a system of ordinary differential equations, and you're using observations on all of those outputs to do inference.

Then you might have a different measurement noise parameter for each of those different parts of the system. Yeah, that makes sense. Thank you. Thank you very much. Does anyone have any other questions? No. Sorry, no questions. OK, great. Thank you. So we've talked about how we formulate the inference problem, and now I want to talk about how we go about actually solving it. And as we'll see, the method of solving the problem is perhaps a bit messier than you might expect.

And it involves various approximations. So if we revisit the posterior for our logistic model inference problem, we see that it's got this denominator here. So how would we actually calculate that? Well, because α, β and σ are all continuous parameters, to calculate the denominator we'd need to do an integral that is essentially three-dimensional.

And that's pretty tricky; it's tricky for computers to do any three-dimensional integral, at least to do it deterministically and exactly. So for any sort of problem, doing a three-dimensional integral is tricky, and in Bayesian inference the integrals involved in the denominator are especially difficult.

That's because the likelihood tends to be very narrow (the region of parameter space in which the likelihood is not negligible is small), whereas the prior tends to be really, really wide, particularly if we use uninformative priors, and that causes additional problems.

So trying to approximate these integrals is difficult. And for ordinary differential equation models the difficulty is compounded even further, because evaluating the likelihood in a differential equation setting typically means numerically integrating the differential equations to get the solution. So there's the integral we see in the equation, but implicitly this also involves a whole series of other integrals as well.

So suffice to say, there's absolutely zero chance that we could evaluate this denominator exactly. So we need to accept that we can't do exact Bayesian inference. And that's not just a problem with ordinary differential equation models; it's a problem with doing Bayesian inference for most realistic models.

So what can we do? This leads me into a different area. Imagine that you want to gain insight into a distribution, and the distribution here is this: I'm imagining we've got a bottomless urn of balls, and we don't really know how many colours there are, or the frequencies of each of the colours. So the question we might have is: how can we determine the underlying probability distribution of ball colour?

The answer, which is very intuitive, is that we draw lots and lots of balls from the urn and count the sample frequencies. So if I draw one hundred balls from the urn and tabulate the frequencies of the colours, then what we see is that, if I collect enough samples, the sampling distribution starts to converge to something which hopefully represents the underlying probability distribution of the colours within the urn.
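A small simulation of the urn idea, assuming a hypothetical urn whose colour proportions we pretend not to know (the colours, proportions and seed are our own choices): the observed frequencies approach the true proportions as the number of draws grows.

```python
import random
from collections import Counter

def sample_frequencies(urn_probs: dict[str, float], num_draws: int,
                       rng: random.Random) -> dict[str, float]:
    """Draw balls with replacement and return the observed colour frequencies."""
    colours = list(urn_probs)
    weights = [urn_probs[c] for c in colours]
    draws = rng.choices(colours, weights=weights, k=num_draws)
    counts = Counter(draws)
    return {c: counts[c] / num_draws for c in colours}

# Hypothetical urn: in the thought experiment these proportions are unknown to us.
true_probs = {"red": 0.5, "blue": 0.3, "green": 0.2}
rng = random.Random(1)
for n in [100, 10_000, 1_000_000]:
    freqs = sample_frequencies(true_probs, n, rng)
    max_error = max(abs(freqs[c] - true_probs[c]) for c in true_probs)
    print(n, {c: round(f, 3) for c, f in freqs.items()}, f"max error {max_error:.4f}")
```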

So what have we learnt here? We've learnt that if we can sample from a distribution, then we can use the properties of the samples we collected to help us learn about that distribution, and to estimate quantities of interest. So what's the connection of this to Bayesian inference? The urn is a probability distribution; it's a discrete one. The posterior is also a probability distribution.

In our case it's a continuous one (there are also discrete posteriors, but here it's continuous). So the idea behind computational sampling is that if we can construct a way of drawing values of our parameters from the posterior distribution, then we can use those draws to summarise the posterior distribution and approximate it in some way. So the question we have is: how do we actually construct such a sampler?

And it's not as simple as reaching into the urn and pulling out a ball of a different colour, right? So how do we do that? Well, the answer is that you use something known as Markov chain Monte Carlo. In our example, we couldn't calculate the posterior exactly, but we could calculate the numerator of Bayes' rule, which is the product of the likelihood and the prior.

And it turns out that this contains enough information for us to construct a Markov chain which, in the limit of an infinite sample size, draws from the posterior distribution. Now, infinite sample sizes sound a bit intimidating, but the idea is that, with a finite sample size, we hopefully have enough samples from our Markov chain that we end up with something that approximates the posterior distribution quite well.

But they're constructed so that they converge asymptotically to the posterior distribution. There are many, many types of Markov chain Monte Carlo algorithms, and they have different uses in different situations, some typically more useful than others; I'll talk about that in a few minutes. I don't want to go into too much detail about Markov chain Monte Carlo methods or what motivates them.

But I want to present perhaps the simplest, and definitely the oldest, variant of Markov chain Monte Carlo, which is Metropolis. It was published in 1953 by Nicholas Metropolis and colleagues; Metropolis and Stanisław Ulam had worked on the Manhattan Project and were interested in simulating neutron diffusion. This is a sketch of the algorithm, and it's actually not as intimidating as it looks.

The idea is that you start from some arbitrary initial point for each of our parameters, α, β and σ. Then you iterate the following. You draw proposed values for each of those parameters from a normal distribution centred on the current values, and that normal distribution has some proposal width, or covariance matrix Σ, which needs to be tuned to be appropriate for a given circumstance.

Then you calculate the ratio r of the posterior density at the proposed point to the posterior density at the current point. Now, equation 21 would require us to actually evaluate the posterior distribution, which we know we can't do, because we can't calculate the denominator of Bayes' rule. But it turns out we don't need to, because the denominator cancels out of this ratio.

So what we're left with, in equation 22, is just the ratio of the unnormalised posterior at the proposed point to the unnormalised posterior at the current point. We can calculate this ratio, because we can calculate everything in the numerator of Bayes' rule. Then we draw a value u from the uniform distribution between zero and one.
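The cancellation can be written out in one line. Here θ = (α, β, σ) is the current point and θ′ the proposal; this notation is ours, since the slide's equations 21 and 22 aren't reproduced in the transcript. Because the normal proposal is symmetric, no proposal-density correction is needed in the ratio.

```latex
r = \frac{p(\theta' \mid \mathbf{N})}{p(\theta \mid \mathbf{N})}
  = \frac{p(\mathbf{N} \mid \theta')\, p(\theta') \,/\, p(\mathbf{N})}
         {p(\mathbf{N} \mid \theta)\, p(\theta) \,/\, p(\mathbf{N})}
  = \frac{p(\mathbf{N} \mid \theta')\, p(\theta')}{p(\mathbf{N} \mid \theta)\, p(\theta)},
\qquad \theta = (\alpha, \beta, \sigma)
```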

And then, if r is greater than the uniform value u, the proposed parameter values become the next set of values in the chain. Otherwise, we stay where we currently are for the next iteration, so we get two samples at the same point in the chain. So some proposals are accepted and some are rejected; you don't move on every step, and you can end up with a number of samples in one place.
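Putting the steps together, here is a minimal random-walk Metropolis sketch for the logistic example, using only the Python standard library. The synthetic data, the "true" parameter values, the uniform priors, the known initial count n0 = 5, the diagonal proposal widths, the starting point and the number of iterations are all our own assumptions for illustration; real analyses need tuning and convergence diagnostics. Working on the log scale, accepting when log u < log r is equivalent to accepting when r > u.

```python
import math
import random

def logistic_solution(t, alpha, beta, n0=5.0):
    """Closed-form solution of dn/dt = alpha*n - beta*n**2 with n(0) = n0 (assumed model)."""
    return alpha * n0 / (beta * n0 + (alpha - beta * n0) * math.exp(-alpha * t))

def log_unnormalised_posterior(params, times, counts):
    """log(likelihood * prior) up to a constant, with uniform priors on a box (assumed)."""
    alpha, beta, sigma = params
    if not (0 < alpha < 2 and 0 < beta < 0.05 and 0 < sigma < 20):
        return -math.inf  # outside the prior's support
    log_lik = 0.0
    for t, y in zip(times, counts):
        resid = y - logistic_solution(t, alpha, beta)
        log_lik += -math.log(sigma) - 0.5 * (resid / sigma) ** 2  # drops -0.5*log(2*pi)
    return log_lik

def random_walk_metropolis(log_target, x0, step_sizes, num_iters, rng):
    """Random-walk Metropolis with independent normal proposals (diagonal covariance)."""
    current = list(x0)
    current_lp = log_target(current)
    samples, accepted = [], 0
    for _ in range(num_iters):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(current, step_sizes)]
        proposal_lp = log_target(proposal)
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
        # Accept if u < r, i.e. log(u) < log(r); the denominator p(N) has cancelled.
        if math.log(u) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(list(current))  # on rejection, the current point is repeated
    return samples, accepted / num_iters

# Synthetic data from hypothetical "true" values alpha=0.5, beta=0.005, sigma=3.
rng = random.Random(42)
times = [float(t) for t in range(21)]
counts = [logistic_solution(t, 0.5, 0.005) + rng.gauss(0.0, 3.0) for t in times]

samples, acceptance_rate = random_walk_metropolis(
    lambda p: log_unnormalised_posterior(p, times, counts),
    x0=[0.4, 0.004, 5.0],
    step_sizes=[0.01, 0.0001, 0.3],
    num_iters=20_000,
    rng=rng,
)
kept = samples[len(samples) // 2:]  # discard the first half as warm-up
alpha_mean = sum(s[0] for s in kept) / len(kept)
print(f"acceptance rate {acceptance_rate:.2f}, posterior mean of alpha {alpha_mean:.3f}")
```

The proposal widths play the role of the covariance Σ mentioned above: too large and most proposals are rejected, too small and the chain moves very slowly.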

So that was the mathematical detail, or rather a sketch, of the algorithm. Now I'm going to try to visualise how the algorithm runs over time. The question I have is: can we use Metropolis to sample from the sort of random continuous distribution I've drawn below? Obviously this is a toy problem; it's just a kind of weird, funky distribution, and I'm asking whether we can actually use Metropolis to sample from it.

OK, so what we find is that we start at some arbitrary point, and then the algorithm proceeds by proposing values. Either it rejects the proposed value, in which case I illustrate the proposed step in red, or it accepts it, in which case we get a green transition and the chain moves to the new location.

And so what we can see over time is that our Markov chain moves about and tends to move towards the modes of the distribution, which is what we want. We want to sample more from the modes of the distribution, because that's what having a high density means: you generate more samples in that location than in other locations.

And so if I were to leave this running for much, much longer, which I'll show you in a second, then hopefully the collection of points, the nodes on the blue path, would represent samples from the underlying distribution.

I can illustrate that now: on the right-hand side I've got the actual density, and on the left-hand side I've got a reconstructed version of the density, which I get by fitting a kernel density estimate to my collection of samples from random-walk Metropolis. With a sample size of one hundred, we see that I get a very noisy approximation of the density. But as I run more and more samples, the reconstruction on the left changes.

Over time, the distribution from my Metropolis routine ends up converging towards the actual density. After a sample size of roughly twenty thousand, the Metropolis approximation is hard to tell apart from the actual density, so it's probably a good approximation to the underlying distribution. Great. So that's a very swift introduction to Markov chain Monte Carlo.
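To mimic the animation without plotting, here is a hedged sketch: a random-walk Metropolis sampler for a made-up one-dimensional bimodal density (the mixture weights, means, step size and seed are our own choices). Instead of a kernel density estimate, it tracks a single probability, P(X < 0), which keeps the example dependency-free while still showing the estimate settling toward the exact value as samples accumulate.

```python
import math
import random

def normal_logpdf(x: float, mu: float, sd: float) -> float:
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def mixture_logpdf(x: float) -> float:
    """Log density of a hypothetical target: 0.3*N(-2, 0.5^2) + 0.7*N(1.5, 0.8^2)."""
    a = math.log(0.3) + normal_logpdf(x, -2.0, 0.5)
    b = math.log(0.7) + normal_logpdf(x, 1.5, 0.8)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # log-sum-exp for stability

def normal_cdf(x: float, mu: float, sd: float) -> float:
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

def metropolis_1d(log_target, x0, step, num_iters, rng):
    """One-dimensional random-walk Metropolis with a normal proposal of width `step`."""
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(num_iters):
        y = x + rng.gauss(0.0, step)
        lp_y = log_target(y)
        if math.log(1.0 - rng.random()) < lp_y - lp:  # accept with probability min(1, r)
            x, lp = y, lp_y
        samples.append(x)
    return samples

true_mass = 0.3 * normal_cdf(0.0, -2.0, 0.5) + 0.7 * normal_cdf(0.0, 1.5, 0.8)
samples = metropolis_1d(mixture_logpdf, x0=0.0, step=1.0, num_iters=100_000,
                        rng=random.Random(0))
for n in [100, 1_000, 20_000, 100_000]:
    estimate = sum(s < 0 for s in samples[:n]) / n
    print(f"{n:>7} samples: P(X < 0) ~ {estimate:.3f} (exact {true_mass:.3f})")
```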

Does anyone have any questions at this point? No. OK, well, in that case, I'll proceed to the final bit of the talk, which is about some research that I'm involved in: a piece of software called PINTS, which facilitates inference for differential equation models. I'm sure that some of the people in the audience have tried to use Markov chain Monte Carlo to fit models; I don't know if they've tried it on ordinary differential equation models, but...

What I've found in the past is that, often, especially early-career researchers and people who are new to doing this tend to follow a path which looks something like this: they read the statistical literature and they find a given Markov chain Monte Carlo method.

And if they understand the statistical literature, which certainly isn't always the case, since methods are often very poorly described, then they code up that method, and that may or may not be good code, depending on what sort of software development practices they're using. Then they apply it to their problem, and they find that the chains aren't converging. So the method is essentially failing, and that can be for a number of reasons.

One reason could be that your Markov chain Monte Carlo method isn't appropriate, or that it's not coded up correctly. Or it could be some characteristic of your model and your data which means that actually doing inference is going to be really, really difficult.

So what people tend to do is move on: they look at the literature again, code up another method and try again. They repeat what we call a cycle of misery until they eventually end up with something that works. I think a few of us got a bit fed up with seeing people go through this path, so we decided to try and stop it.

As I say, the reason this cycle exists is partly a communication problem. The statistical literature is often written by methods experts for other methods experts, and those papers often don't contain high-quality pseudocode, which makes it harder to implement the methods yourself. Also, if there is software accompanying the papers, it often isn't very well developed or very user-friendly, so it's difficult to use.

And if you want to move to a new method, you need to get familiar with a whole new package for using that method, so it takes ages to shift between different types of methods. So that's part of the reason this cycle exists. Another reason is that ordinary differential equation models are particularly problematic for inference because of their non-linear nature.

So this is an example that I've taken from a paper by Marcarelli, in which he shows the posterior distribution of two of the parameters for what's known as the Goodwin oscillator model, which is often used to represent circadian rhythms in organisms. On the left, you can see that the posterior distribution has all these nasty ridges. And if you think about what Markov chain Monte Carlo methods are trying to do, it's essentially to explore those ridges.
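Every evaluation of a posterior like the one on this slide requires solving the ODE system. The sketch below makes that concrete under stated assumptions. It uses one common three-variable form of the Goodwin oscillator, where the parameter names `k1, m1, k2, m2, k3, m3`, the fixed Hill exponent of 10 and all numerical values are our own hypothetical choices, not those of the paper on the slide. The system is integrated with a hand-written fixed-step Runge-Kutta (RK4) solver, and a Gaussian log-likelihood of observations of the first variable is evaluated over a small grid of two parameters, each grid point costing one full ODE solve.

```python
import math


def goodwin_rhs(state, params, hill=10.0):
    """Assumed Goodwin form: z represses production of x, closing the loop."""
    x, y, z = state
    k1, m1, k2, m2, k3, m3 = params
    return (k1 / (1.0 + z ** hill) - m1 * x,
            k2 * x - m2 * y,
            k3 * y - m3 * z)


def rk4_first_component(rhs, state, params, times, dt=0.01):
    """Fixed-step RK4 from t = 0; returns the first state variable at each
    time in `times` (assumed sorted and non-negative)."""
    out, t = [], 0.0
    for t_obs in times:
        while t < t_obs - 1e-12:
            h = min(dt, t_obs - t)
            a = rhs(state, params)
            b = rhs([s + 0.5 * h * da for s, da in zip(state, a)], params)
            c = rhs([s + 0.5 * h * db for s, db in zip(state, b)], params)
            d = rhs([s + h * dc for s, dc in zip(state, c)], params)
            state = [s + h * (p + 2 * q + 2 * r + w) / 6.0
                     for s, p, q, r, w in zip(state, a, b, c, d)]
            t += h
        out.append(state[0])
    return out


def log_likelihood(params, times, data, sigma=0.1, x0=(0.1, 0.1, 0.1)):
    """Gaussian log-likelihood; each call integrates the ODE once."""
    model = rk4_first_component(goodwin_rhs, list(x0), params, times)
    sse = sum((obs - m) ** 2 for obs, m in zip(data, model))
    n = len(data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)


true_params = (2.0, 0.1, 1.0, 0.1, 1.0, 0.1)          # hypothetical values
times = [float(t) for t in range(0, 50, 2)]
data = rk4_first_component(goodwin_rhs, [0.1, 0.1, 0.1], true_params, times)

# Vary k2 and k3 on a grid, holding the other parameters at their true values.
grid = {}
for k2 in (0.8, 1.0, 1.2):
    for k3 in (0.8, 1.0, 1.2):
        p = (2.0, 0.1, k2, 0.1, k3, 0.1)
        grid[(k2, k3)] = log_likelihood(p, times, data)

for key, value in sorted(grid.items()):
    print(key, round(value, 1))
```

Because the synthetic data here are noise-free, the grid maximum sits exactly at the true values. Real data would add noise, and a fine grid over all six parameters quickly becomes unaffordable, which is where MCMC comes in.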

So for this sort of problem, it's really, really challenging to come up with good MCMC methods that will adequately explore the space. Remember that this is only two dimensions of a much higher-dimensional space, so the problem is actually much harder than it looks here. And so you often need different types of inference methods for different problems.

So that's what motivated PINTS. Basically, PINTS is a zoo: a zoo of lots of sampling methods, and it also has optimisation methods. An optimisation method returns a single value of your parameters, which optimises some criterion. Sampling is different: you return a distribution of parameter values, which represents some sort of uncertainty. It's an open-source Python library that's available on GitHub, and it was created in Computer Science.
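The optimisation-versus-sampling distinction above can be sketched on a toy problem. This assumes an exponential-decay model y = exp(-k t) with Gaussian noise of known standard deviation and a flat prior on k in (0, 5); the truth, noise level and time grid are hypothetical. The golden-section optimiser and random-walk Metropolis sampler are our own minimal implementations, not PINTS's. Both work on the same log-posterior: the optimiser returns one number, while the sampler returns a collection of values whose spread expresses uncertainty.

```python
import math
import random

rng = random.Random(1)
K_TRUE, SIGMA = 0.5, 0.05                       # hypothetical truth and noise
times = [0.5 * i for i in range(20)]
data = [math.exp(-K_TRUE * t) + rng.gauss(0.0, SIGMA) for t in times]


def log_posterior(k):
    """Gaussian log-likelihood plus a flat prior on 0 < k < 5 (up to a constant)."""
    if not 0.0 < k < 5.0:
        return -math.inf
    sse = sum((y - math.exp(-k * t)) ** 2 for t, y in zip(times, data))
    return -sse / (2.0 * SIGMA ** 2)


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximise a unimodal 1-D function: optimisation returns a single point."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d                                # maximum lies in [a, d]
        else:
            a = c                                # maximum lies in [c, b]
    return 0.5 * (a + b)


def metropolis(f, x0, n, step, seed=2):
    """Random-walk Metropolis: sampling returns a distribution of values."""
    r = random.Random(seed)
    x, fx, out = x0, f(x0), []
    for _ in range(n):
        y = x + r.gauss(0.0, step)
        fy = f(y)
        if math.log(1.0 - r.random()) < fy - fx:    # -inf outside the prior
            x, fx = y, fy
        out.append(x)
    return out


k_hat = golden_section_max(log_posterior, 0.01, 4.99)
draws = metropolis(log_posterior, k_hat, 5000, step=0.02)[500:]   # drop warm-up
mean = sum(draws) / len(draws)
sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / len(draws))
print(f"optimiser: k = {k_hat:.4f}")
print(f"sampler:   k = {mean:.4f} +/- {sd:.4f}")
```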

So how is it different? It's not tied to a single algorithm, and it's designed to interface with other programming languages. For example, it has an interface to Stan. So if you try to do inference on your model using Stan and find that it fails, you don't necessarily have to recode your model from scratch. Using PINTS, you can go through that interface and make the transition a bit easier.

It's aimed at ODE and PDE forward models, typically, and it allows users the freedom to use their own forward-model solution method. Often, particularly with partial differential equations, quite nuanced methods are needed to actually solve those models. So PINTS gives users the freedom to use whatever they want to solve their differential equations.
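PINTS's approach of letting users bring their own solver can be sketched as a small class exposing the number of parameters and a `simulate(parameters, times)` method. As far as we know, that is the shape PINTS expects from a `pints.ForwardModel` subclass. To keep the example self-contained, it does not import PINTS. The class name `LogisticRK4Model`, the logistic-growth model and the hand-written RK4 solver are illustrative assumptions. Any solver, such as an adaptive ODE integrator or a finite-difference PDE scheme, could sit behind the same two methods.

```python
import math


class LogisticRK4Model:
    """Hypothetical forward model for logistic growth, dy/dt = r y (1 - y / K).

    Mirrors the two methods of a pints.ForwardModel subclass; the solver
    inside simulate() is entirely the user's choice (here, fixed-step RK4).
    """

    def __init__(self, y0=2.0, dt=0.01):
        self.y0, self.dt = y0, dt

    def n_parameters(self):
        return 2  # (r, K)

    def simulate(self, parameters, times):
        r, K = parameters
        f = lambda y: r * y * (1.0 - y / K)
        y, t, out = self.y0, 0.0, []
        for t_obs in times:                     # times assumed sorted, >= 0
            while t < t_obs - 1e-12:
                h = min(self.dt, t_obs - t)
                a = f(y)
                b = f(y + 0.5 * h * a)
                c = f(y + 0.5 * h * b)
                d = f(y + h * c)
                y += h * (a + 2 * b + 2 * c + d) / 6.0
                t += h
            out.append(y)
        return out


def logistic_exact(r, K, y0, t):
    """Closed-form logistic solution, used here only to check the solver."""
    return K / (1.0 + (K / y0 - 1.0) * math.exp(-r * t))


model = LogisticRK4Model()
times = [0.0, 1.0, 5.0, 10.0]
print(model.simulate([0.5, 100.0], times))
```

The inference code only ever calls `n_parameters()` and `simulate()`. Swapping RK4 for a stiff solver or a PDE discretisation would therefore leave the sampler untouched.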

And that's in contrast to a lot of the probabilistic programming languages that are out there at the moment, where you have to code up your problem using essentially their own language. I'm sure that some of you probably use Stan, which is really, really good software. I wrote a book about it, essentially, and it's really useful, but it's of a different nature to what PINTS does.

Stan is really, really good if you've got a model with lots and lots of parameter dimensions but where evaluating the likelihood is relatively cheap. In the problems PINTS targets, evaluating the likelihood is really expensive, because you have to integrate your differential equation each time. So it sits in a different part of the space, and it's aimed more at the applied mathematics community than necessarily at applied statisticians.

As I said, PINTS is a zoo of lots and lots of different MCMC methods and other types of sampling methods that I haven't gone into, because Markov chain Monte Carlo is just one type of method in the sampling area; there are lots of other types. We've already got a lot of these methods in PINTS, and they place different requirements on what you need to be able to evaluate for your model.

Some of them require no gradients of the log-likelihood with respect to the parameter values. Others require what are known as the sensitivities, which are gradients of the model solution: you need the derivative of the solution of your differential equation with respect to your parameter values, which is about as difficult as it sounds.
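To give a concrete picture of a sensitivity, take an assumed toy model dy/dt = -k y with y(0) = y0; this model is not from the talk. Differentiating the ODE with respect to k gives a second ODE for s = dy/dk, namely ds/dt = -y - k s with s(0) = 0. Solving the two equations together is the forward-sensitivity approach. The sketch integrates this augmented system with RK4 and compares the result with the exact derivative, dy/dk = -t y0 exp(-k t).

```python
import math


def augmented_rhs(y, s, k):
    """Decay model plus its first-order sensitivity equation."""
    dy = -k * y
    ds = -y - k * s          # d/dt (dy/dk), from differentiating dy/dt = -k y
    return dy, ds


def solve_with_sensitivity(k, y0, t_end, n_steps=1000):
    """RK4 on the augmented system (y, s); returns y(t_end) and dy/dk at t_end."""
    h = t_end / n_steps
    y, s = y0, 0.0           # s(0) = 0 because y(0) does not depend on k
    for _ in range(n_steps):
        a = augmented_rhs(y, s, k)
        b = augmented_rhs(y + 0.5 * h * a[0], s + 0.5 * h * a[1], k)
        c = augmented_rhs(y + 0.5 * h * b[0], s + 0.5 * h * b[1], k)
        d = augmented_rhs(y + h * c[0], s + h * c[1], k)
        y += h * (a[0] + 2 * b[0] + 2 * c[0] + d[0]) / 6.0
        s += h * (a[1] + 2 * b[1] + 2 * c[1] + d[1]) / 6.0
    return y, s


k, y0, t = 0.7, 3.0, 2.0
y, s = solve_with_sensitivity(k, y0, t)
exact_s = -t * y0 * math.exp(-k * t)
print(f"dy/dk numerical = {s:.8f}, exact = {exact_s:.8f}")
```

With a Gaussian likelihood of known sigma, the gradient of the log-likelihood with respect to k is then a sum over observations of (data - y) * s / sigma**2. That quantity is what gradient-based samplers consume.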

And then you've got second-order sensitivities, which go one step further: the second derivatives of your ordinary differential equation solutions with respect to the parameters. As you can imagine, determining these first- and second-order sensitivities is really computationally expensive. So that approach works well in some circumstances, and in others it's too restrictive, too complicated and too expensive.
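For an assumed toy decay model dy/dt = -k y with y(0) = y0 (not a model from the talk), write s for the first-order sensitivity dy/dk and w for the second-order sensitivity d²y/dk². Differentiating the model once and then twice with respect to k gives extra equations that must be integrated alongside it. The exact solutions are included as a check:

```latex
\begin{aligned}
\frac{dy}{dt} &= -k\,y,         & y(0) &= y_0, \\
\frac{ds}{dt} &= -y - k\,s,     & s(0) &= 0, \qquad s = \frac{\partial y}{\partial k}, \\
\frac{dw}{dt} &= -2\,s - k\,w,  & w(0) &= 0, \qquad w = \frac{\partial^2 y}{\partial k^2},
\end{aligned}
\qquad
y = y_0 e^{-kt},\quad s = -t\,y_0 e^{-kt},\quad w = t^2 y_0 e^{-kt}.
```

This shows where the cost comes from. For an ODE with n states and p parameters, first-order forward sensitivities add n p equations, and second-order sensitivities add n p(p+1)/2 more, because mixed partial derivatives are symmetric.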

So you need the right method for the job. We also plan to include lots of likelihood-free methods: approximate Bayesian computation methods are going into PINTS in the next iteration. So with that, I'll finish, and I'll just say a quick thank you to the other developers. I've written five of them here: Michael Clerx, Martin Robinson, John McClain and others. We've all been based in Computer Science at some point.

Michael is now at Nottingham. There are also other people who have been involved that I just didn't have space to include. So with that, I'll finish and ask if anyone has any questions. OK, yes, so everyone, if you can either come off mute and clap, or clap like that. That was wonderful.

Thanks so much. Thank you. So, if you have any questions and you'd like to remain anonymous, just type them in the chat, or if you're happy to have your voice recorded, then come on audio and ask. Can I ask a question, please? Yeah, go for it. I thought your talk was really good, and in particular the animations are just so helpful for showing the intuition underlying the concepts you're talking about. I just had a question about prior selection.

As you mentioned already, one of the criticisms of Bayesian inference is that it can be subjective. So how do you address this in your analysis, say if you have a reviewer on a paper who doesn't agree? How do you argue that point? Well, good question. So the question is about how I choose priors. Yeah, very good question. It's a bit of a can of worms.

There are many different ways to go about choosing an appropriate prior distribution. In some instances, parameters have a very interpretable meaning, and estimates from the literature can be ported over directly: the posterior from a previous analysis can become your prior. But that doesn't happen too often in reality.

So in reality, especially with these differential equation models, the way I'm now thinking about selecting priors is that I tend to do prior predictive simulations. I choose a set of priors and then use sampling: you sample parameter values from your priors, then you sample from the sampling distribution, and you get out a distribution of your data. That distribution of your data should hopefully look similar to what you would expect plausible values of your data to look like before you do an experiment.

So what I typically do now is run these prior predictive simulations and get a distribution of potential data sets that is wider than, and encompasses, the range I would expect to collect when I actually go ahead and collect data.
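A prior predictive check of the kind just described can be sketched in a few lines. Everything here is an assumption for illustration: a logistic-growth model with a closed-form solution, log-normal priors on the growth rate r and the carrying capacity K, Gaussian observation noise, and the function names. The procedure is to draw parameters from the priors, push each draw through the sampling distribution, and inspect the spread of the simulated data.

```python
import math
import random


def logistic(r, K, y0, t):
    """Closed-form logistic growth curve (assumed model)."""
    return K / (1.0 + (K / y0 - 1.0) * math.exp(-r * t))


def prior_predictive(n_draws, t, y0=2.0, sigma=5.0, seed=3):
    """Simulate observations at time t from the prior predictive distribution."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        r = rng.lognormvariate(math.log(0.5), 0.5)     # assumed prior on r
        K = rng.lognormvariate(math.log(100.0), 0.5)   # assumed prior on K
        mean = logistic(r, K, y0, t)
        sims.append(rng.gauss(mean, sigma))            # sampling distribution
    return sorted(sims)


sims = prior_predictive(4000, t=10.0)
lo, mid, hi = (sims[int(q * len(sims))] for q in (0.025, 0.5, 0.975))
print(f"95% prior predictive interval at t=10: [{lo:.1f}, {hi:.1f}], median {mid:.1f}")
```

If the interval is much narrower than the plausible range of measurements, or much wider than physically sensible, that points to revisiting the priors before seeing any data.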

So then it actually becomes not too difficult to argue for those choices in the paper, I find, because you just include visualisations of the prior predictive distribution, or its contours. And it tends to be quite a persuasive way of arguing that you've made sensible choices about priors. So that's what I encourage people to do now.

I mean, often you just don't have much information about individual parameters. And if you do, that's kind of a luxury, and then obviously you use it. So that's the way I go about doing it.

As for referee comments about priors, that hasn't actually come up much for me; I don't know why. But the one thing you obviously can do at that point, when someone raises the issue, is a sensitivity analysis: is your posterior sensitive to your choice of prior? If it is, you have an obligation to report that anyway; you should be doing that anyway.
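A prior sensitivity analysis, and the way more data dilutes the influence of the prior, can both be seen in the simplest conjugate case. This sketch assumes normally distributed data with known standard deviation sigma and a normal prior N(mu0, tau0²) on the mean. In that case the posterior mean and standard deviation have closed forms. The two priors and the data summary are arbitrary, hypothetical choices.

```python
import math


def normal_posterior(mu0, tau0, xbar, n, sigma):
    """Posterior (mean, sd) of a normal mean with known sigma under a N(mu0, tau0^2) prior."""
    precision = 1.0 / tau0 ** 2 + n / sigma ** 2
    mean = (mu0 / tau0 ** 2 + n * xbar / sigma ** 2) / precision
    return mean, math.sqrt(1.0 / precision)


priors = {"sceptical": (0.0, 1.0), "vague": (0.0, 100.0)}   # (mu0, tau0), assumed
xbar, sigma = 2.0, 3.0                                       # assumed data summary

gaps = []
for n in (5, 50, 5000):
    post = {name: normal_posterior(m0, t0, xbar, n, sigma)
            for name, (m0, t0) in priors.items()}
    gap = abs(post["sceptical"][0] - post["vague"][0])
    gaps.append(gap)
    print(f"n={n:>4}: sceptical mean={post['sceptical'][0]:.3f}, "
          f"vague mean={post['vague'][0]:.3f}, gap={gap:.3f}")
```

With few observations, the two priors give visibly different posteriors, which is exactly what a sensitivity analysis should report. As n grows, the likelihood dominates and the gap shrinks towards zero.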

Does that answer your question? Yeah, that's really interesting. Thank you for that, and thanks again for the talk. It's a really interesting question, a good question. Do we have any other questions? Could I jump in here? Yes. Thanks so much for the talk; it was extremely clear, and you presented things excellently. I have a question about this: you showed how, as we had more data, the priors were essentially outweighed by the likelihood, and so on.

Do you have a rough sense of when it's worth doing a Bayesian analysis versus not, given the amount of data? Or is the idea to just set everything up in a Bayesian approach, regardless of the volume of data, and let the results speak for themselves? So, as I understand your question, it's about when you get much benefit from using a Bayesian analysis versus a frequentist one.

Is that right? Yes, exactly. What's the threshold of data? So it's a good question. I'm not one of these people who tends to bash classical inference, particularly; I think they both have their place. What I can say is that there are situations where Bayesian inference allows you to do inference when you wouldn't be able to do so in the classical sense. I'm thinking of situations where your model is relatively poorly identified.

A good example of this is COVID-19. You have these transmission-dynamics models of how the disease spreads, and those models have lots of different parameters: the rate at which people recover, the rate at which people become infected. There's lots and lots of uncertainty about all these parameters, and the data that we collect to actually try to estimate them are really, really noisy and poor.

So there's no way that you can estimate all these parameters from the data alone. You need something else: pre-existing biological knowledge, basically. In that situation, you're a bit stuck with frequentist inference. You can fix your parameters at values you think are biologically plausible, but that's not quite satisfactory, because often we don't know those values very well.

In Bayesian inference, we can use priors, so we can incorporate our uncertainty and still make progress on the inference: we can still try to estimate the quantities we're actually interested in. So the basic message is that one of the benefits is that it allows you to make progress on problems that are basically unidentifiable, where you wouldn't be able to make progress using frequentist inference.
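One way to see the identifiability problem described above uses an assumed SIR model. Early in an epidemic, while almost everyone is still susceptible, the number infected grows at roughly the rate beta - gamma (infection rate minus recovery rate). As a result, quite different parameter pairs with the same difference produce almost the same early curve. The sketch below uses forward-Euler integration, and the population size, initial infections and both parameter pairs are hypothetical. It compares two such pairs. Data on prevalence alone can barely tell them apart, so prior information on, say, the recovery rate is what breaks the tie.

```python
def sir_prevalence(beta, gamma, days, n_pop=1_000_000, i0=10.0, dt=0.01):
    """Euler-integrated SIR model; returns the infected count I at each whole day."""
    s, i = n_pop - i0, i0
    steps_per_day = round(1 / dt)
    out = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n_pop * dt
            s, i = s - new_inf, i + new_inf - gamma * i * dt
        out.append(i)
    return out


a = sir_prevalence(beta=0.50, gamma=0.25, days=20)   # beta - gamma = 0.25
b = sir_prevalence(beta=0.40, gamma=0.15, days=20)   # beta - gamma = 0.25
max_rel_diff = max(abs(x - y) / x for x, y in zip(a, b))
print(f"I(20): {a[-1]:.0f} vs {b[-1]:.0f}; max relative difference {max_rel_diff:.4f}")
```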

That tends to be the case when the models you're trying to do inference on get more complicated or the data get sparser; it's one of those two circumstances. There are also other benefits of the Bayesian approach. Because you typically have to do approximate, simulation-based inference, you get things like uncertainty in your predictions for free.

And that's the nice thing about the Bayesian approach. Does that answer your question? Yes, that was great, thank you. Thanks for the question. Any other questions? I had a question, if I can. Yeah, go ahead. I was curious about the likelihood-free... Sorry, I find it quite hard to hear you. Can you hear me better now? Yeah. Right. So, um, I was curious about the likelihood-free methods.

You mentioned you're planning to include them. So which kinds of methods are you planning to include? Are you planning to handle only ODEs, or stochastic models as well? Yeah. So one class of models that isn't covered by PINTS at the moment is models with stochasticity underlying the dynamics.

I presented a deterministic model, but you can imagine a model which is stochastic, and PINTS isn't able to handle that. When you get into the realm of stochastic processes, it's often really, really difficult to write down the probability of having generated the data. Intuitively, there are just too many ways to generate the data.

So in that setting, it's very difficult to get hold of a likelihood. In those situations, there are a few different things you can do. One of them, which I think is probably the next logical step, is to add approximate Bayesian computation approaches. What they rely on is essentially your ability to simulate from the model. So long as you can simulate from your process and check how close your simulated data are to your actual data at different parameter values, then, under some conditions, you can still make progress towards doing inference.

It becomes approximate because it's only exact in the limit where your simulated data match the actual data exactly, which typically doesn't happen in reality. So these approximate Bayesian computation methods rely only on your ability to forward-simulate the model. And we're going to include a whole host of these different methods.
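The simulate-and-compare idea behind approximate Bayesian computation can be sketched with its simplest variant, rejection ABC. This is an assumed toy setup, not a PINTS API: a stochastic model that is only simulated (counts from a Poisson process, built from exponential waiting times), a uniform prior on its rate, and the distance between the mean counts of simulated and observed data. Draws whose simulations land within a tolerance epsilon of the data are kept. Shrinking epsilon improves the approximation but keeps fewer draws, which is the "exact only in the limit" point above.

```python
import random


def simulate_counts(rate, n_intervals, rng):
    """Stochastic model we can only simulate: Poisson-process event counts
    in unit-length intervals, built from exponential waiting times."""
    counts = []
    for _ in range(n_intervals):
        t, k = rng.expovariate(rate), 0
        while t < 1.0:
            k += 1
            t += rng.expovariate(rate)
        counts.append(k)
    return counts


def rejection_abc(observed, n_proposals, epsilon, seed=4):
    """Keep prior draws whose simulated summary lies within epsilon of the data's."""
    rng = random.Random(seed)
    obs_mean = sum(observed) / len(observed)        # summary statistic
    accepted = []
    for _ in range(n_proposals):
        rate = rng.uniform(0.1, 20.0)               # draw from the assumed prior
        sim = simulate_counts(rate, len(observed), rng)
        if abs(sum(sim) / len(sim) - obs_mean) < epsilon:
            accepted.append(rate)
    return accepted


observed = simulate_counts(6.0, 20, random.Random(5))   # synthetic data, rate 6
results = {eps: rejection_abc(observed, 3000, eps) for eps in (2.0, 0.5, 0.1)}
for eps, acc in results.items():
    if acc:
        print(f"eps={eps}: kept {len(acc)} of 3000, mean rate {sum(acc) / len(acc):.2f}")
    else:
        print(f"eps={eps}: no draws accepted")
```

Because each run reuses the same seed, the proposals are identical across tolerances. The accepted sets are therefore nested, which makes the trade-off between accuracy and number of kept draws easy to see.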

We've already done a lot of exploration of these methods, and our plan is to include them in the next iteration of PINTS. People in the institute have asked about this quite a lot, because they tend to use these things in, say, spatial simulations. ABC also gets used a lot in population genetics because, again, it's hard to write down the likelihood there. Yeah, I think that was it.

Yeah, I was just asking because I'm also thinking of writing my own library when we do ABC. OK, well, if you're interested in working on or chatting about that, I'm happy to: send me an email or get in touch and we can have a chat. But my view about developing packages and libraries is that if they're trying to do the same thing, which is inference, then I think it's probably better that they... This is just me; I'm biased like that.

It's probably better that they focus on the one thing rather than on lots of different things. But that's just my view. Other people have also tried to develop their own ABC libraries, and some have been more successful than others. But yeah, if you want to have a chat about it, send me an email. Right. Thank you. Great. OK, so if there are no more questions, I think another round of applause for a great talk.

Well, thanks again, everyone, and thanks for setting this up and organising it. And if anyone has any questions afterwards, just email me. I think the talk is going to be put on the podcast series, so you should be able to find it there anyway. So yeah, thanks again.

Transcript source: Provided by creator in RSS feed.