Extreme value statistics and the theory of rare events

Mar 03, 2023 · 39 min

Episode description

Extreme value statistics and the theory of rare events - Francesco Mori. Rare extreme events tend to play a major role in a wide range of contexts, from finance to climate. Hence, understanding their statistical properties is a relevant task, which opens the way to many applications. In this talk, I will first introduce extreme value statistics and how this theory allows us to identify universal features of rare events. I will then present recent results on the extreme values of stochastic processes, including Brownian motion and active particles. I moved to Oxford in October 2022 to take the position of Leverhulme-Peierls Fellow at the Department of Physics and New College. Previously, I was a PhD student at Paris-Saclay University, working with Satya Majumdar. During my PhD, I worked on extreme value statistics of stochastic processes. I am interested in out-of-equilibrium physics, extreme value theory, and large-deviation theory. In particular, I am currently applying ideas from statistical physics to study living systems.

Transcript

So I will talk about extreme value statistics. This first slide is the motivation: why do we want to study extreme events? Extreme value statistics is a branch of probability theory dealing with extreme events, which are typically very rare events. But when they happen, they can have devastating consequences. So these are three examples of extreme events which are a big deal.

For instance, in epidemics, a very rare mutation of a virus can lead to a new epidemic wave. In finance, a financial crisis can really affect the world economy. And finally, it's very important to study rare events in climate studies, because in the context of climate change, extreme weather events like heatwaves have become more and more common and more important.

So it's crucial to understand the role played by extreme events and, in particular, their statistical properties. So let's start with a practical problem. Imagine that you are an engineer and your task is to build a bridge over a river. You have at your disposal some data. This, for instance, is the water level of the Nile River near Cairo over many centuries. And your goal is to be able to decide, for instance, the height of the bridge.

So one first thing you could do to study this data is to study what the average height of the river is. The average value is plotted here by the black line. And to do that, you have a very strong result from mathematics, from probability theory, which is the central limit theorem. What the central limit theorem is telling you is that if you have a bunch of random variables x_1, x_2, ..., x_N and you sum them up, then, if you have many variables, you are guaranteed that this sum of the variables, S_N, will converge to a Gaussian distribution, or normal distribution, which is shown here. It has the typical bell shape that you might have seen many times already.

So to give you a specific example, a numerical example, let's consider the probability density function p(x), which is telling me that I can find the random variable x anywhere in the interval [0, 1] with uniform probability, and I will not find this variable anywhere else. So the probability is zero outside this interval. It's like this box probability distribution.

So what I can do is that, in my computer, I can generate many replicas of this random variable, independent of each other. I sum them up and I plot the probability distribution of the sum. N will be the number of variables. If I start with just one variable, I have my box distribution, but you can see that as N increases, the probability distribution of the sum converges very fast to the bell shape, which is the Gaussian distribution.

And the important fact is that this is completely independent of the probability distribution I started from. So if I start from some probability distribution which is completely different, like this two-box distribution, it might take longer to converge to the Gaussian distribution, but the result will eventually be exactly the same. So if I have many variables, I can guarantee that I will converge to a Gaussian distribution.
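The numerical experiment described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the talk: the sample sizes, the seed, the exact shape of the hypothetical `two_box` distribution, and the check (the fraction of sums within one standard deviation of the mean, about 0.683 for a Gaussian) are all choices made for this sketch.

```python
import random
import statistics

def sample_sums(n_vars, n_samples, draw, rng):
    """Return n_samples sums, each of n_vars independent draws."""
    return [sum(draw(rng) for _ in range(n_vars)) for _ in range(n_samples)]

def uniform_box(rng):
    # The "box" distribution: uniform on [0, 1].
    return rng.random()

def two_box(rng):
    # Hypothetical "two-box" distribution: uniform on [0, 0.25] or on [0.75, 1].
    x = rng.random() * 0.25
    return x if rng.random() < 0.5 else x + 0.75

def fraction_within_one_sigma(sums):
    """Fraction of samples within one standard deviation of the sample mean."""
    mu = statistics.fmean(sums)
    sigma = statistics.pstdev(sums)
    return sum(abs(s - mu) <= sigma for s in sums) / len(sums)

rng = random.Random(0)  # fixed seed so the run is reproducible
clt_results = {}
for name, draw in [("box", uniform_box), ("two-box", two_box)]:
    for n_vars in (1, 30):
        sums = sample_sums(n_vars, 20000, draw, rng)
        clt_results[(name, n_vars)] = fraction_within_one_sigma(sums)
        print(name, n_vars, round(clt_results[(name, n_vars)], 3))
```

For a single variable the two starting distributions give clearly different fractions, while for 30 variables both should land close to the Gaussian value, which is the universality being described.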

So this is a very important piece of maths, as it characterises the average value. But there is a problem, because usually we have a finite amount of data, and the central limit theorem then only applies to deviations from the average value which are small enough. And in many practical cases, as in the case of the bridge, we don't care about the average behaviour, we care about the extreme events.

For instance, when the water level is very high because the river is flooding, we want to study the statistics of these extreme events, like, in this case, the maximal water level over many centuries. And we would like to find an equivalent of the central limit theorem which applies to extremes, and which is in some sense a universal result, because maybe we don't know exactly how to model the fluctuations of the water level.

So we would like to find a result which is independent of the specific way in which we model our system. So this is the general motivation, and the setting that I will consider is the following, which can apply to many different systems. I have a collection of random variables x_1, x_2, ..., x_N, and the index i in x_i indicates time. So x_1 comes before x_2, and so on. This could be, for instance, the water level of a river over many, many years.

And to describe these random variables, I need to write down a model. The model is given by their joint probability distribution. So this function of many variables, P(x_1, x_2, ..., x_N), is called the joint probability density function, or joint probability distribution, and it is telling me what is the probability to observe a specific sequence of events x_1, x_2, ..., x_N. And this is the modelling part.

So I start from a system, I create the model, and the model is encoded in this probability distribution, which is telling me about the correlations and the interdependencies between different variables. And then in extreme value statistics, what we usually study is, for instance, the global maximum of these variables, which we call capital M, the maximal entry of this sequence.

And then what we are asking is: given our model, so given our joint probability distribution, what can we say about the statistical properties of the maximum M? There are other quantities that we will study in this presentation, for instance, the time at which the global maximum occurs. So we might not really care about how big these extreme events are, but we care about when they will happen in time.

Is it more likely to see the extreme event at the very beginning of the sequence, in the middle, or at the end? A very simple practical application of this is to finance. If you imagine that you need to sell a stock in the stock market, the best time to do so is when the price is the highest. So it's a practical problem to understand, typically for a simple model, at what time the price will be the highest.

Right. And another quantity which is of interest in extreme value statistics is records. Almost every day we read in the news about a new record being set, either in sport or, unfortunately, now in climate. Within a sequence of random variables, we'll say that an entry is a record if it is larger than all the previous entries.

And this has many applications. In climate studies, for instance, it is particularly important to study the statistical properties of records because, due to climate change, new records are being set almost every year. So here you can see the global average temperature as a function of time over many years. And you can see that there is a clear trend. So new records are being set more often than they should.

Another example which is very common is that of sports. This is the best time to run a marathon for each edition of the Olympics in the last 20 editions. You can see that the time has gone down over the years. The red dots indicate records. So it would be great to have a statistical understanding of how many records we should expect in a given sequence, and so on. So this is the general setup.

These are the three main quantities that we will talk about: the global maximum, the time of the maximum, and the number of records. And to make progress, let's start with the simplest possible model, which is the one where the variables are independent and identically distributed. Independent means that there are no correlations between the variables. So if I know the value of x_1, this doesn't tell me anything about the value of x_2.

And identically distributed means that these N variables all come from the very same probability distribution, which I will call p(x). Mathematically speaking, this means that their joint probability density function P(x_1, ..., x_N) is just the product of the marginal probability distributions p(x_1) p(x_2) ... p(x_N). So the probability of observing the whole sequence from x_1 to x_N is just the probability to observe the first one, times the second, times the third, and so on.

And this might seem like a very simplistic model, but it's a very successful one in physics. One example is the random energy model by Derrida, which was used to understand the properties of disordered systems like glassy materials and so on, and is based on an assumption which is precisely this one. So we're very lucky, because in the case of independent and identically distributed variables, there exists a theorem.

This theorem is very important. It's called the extreme value theorem, and it is the counterpart of the central limit theorem for extremes. What this theorem is telling me is that if I take N random variables which are independent and identically distributed, and I want to study the distribution of the maximum M, this distribution of M cannot be just anything. It can only be one out of three probability distributions.

In the case of the central limit theorem, we saw that the distribution of the sum of these variables will be Gaussian. In this case, we know that the distribution of the maximum will be either Gumbel, Fréchet or Weibull, depending on how the marginal probability of a single variable behaves.

So p(x) is the probability of observing a value x for a single variable. If this probability decays exponentially or faster, I will be in the Gumbel universality class, which is this continuous blue line. If instead the probability is decaying as a power law for large x, so it's more likely to see very big numbers, I will be in the Fréchet universality class. And finally, if there is an upper bound, so there is a maximal value that my random variables can take, I will be in the Weibull universality class.

So let's consider just the first one for simplicity, the Gumbel distribution, which is the one you will converge to if you start from random variables whose distribution is decaying exponentially or faster for large values. The distribution of the maximum in this case is a very, very simple formula, which is this one: e to the minus m minus e to the minus m, that is, exp(-m - exp(-m)).

And it is shown here. It has this kind of bell shape, but it is skewed on one side. One of the tails, the right one, is decaying slower than the other. So let's do a numerical example again, as we did before for the central limit theorem. In this case, I consider a probability distribution which is exponentially decaying, so we expect to end up in the Gumbel universality class.
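A minimal sketch of this kind of numerical check, under assumptions that are not in the talk: the variables are drawn from an exponential distribution with rate 1, the maximum is centred by subtracting log(N) (the standard centring for that particular distribution), and the comparison uses the Gumbel cumulative distribution exp(-exp(-z)) rather than a histogram. The values of `n_vars`, `n_samples`, and the seed are arbitrary.

```python
import math
import random

def gumbel_cdf(z):
    """Cumulative form of the Gumbel law: exp(-exp(-z))."""
    return math.exp(-math.exp(-z))

def centred_maxima(n_vars, n_samples, rng):
    """Maxima of n_vars Exp(1) draws, shifted by log(n_vars)."""
    shift = math.log(n_vars)
    return [max(rng.expovariate(1.0) for _ in range(n_vars)) - shift
            for _ in range(n_samples)]

def empirical_cdf(samples, z):
    """Fraction of samples that are at most z."""
    return sum(s <= z for s in samples) / len(samples)

rng = random.Random(1)  # fixed seed so the run is reproducible
maxima = centred_maxima(n_vars=200, n_samples=5000, rng=rng)
gumbel_gaps = {z: abs(empirical_cdf(maxima, z) - gumbel_cdf(z))
               for z in (-1.0, 0.0, 1.0, 2.0)}
for z, gap in gumbel_gaps.items():
    print(f"z = {z:+.1f}  |empirical - Gumbel| = {gap:.3f}")
```

The gaps should be small and shrink further as `n_vars` and `n_samples` grow. A different starting distribution in the same class would need a different centring and scaling, which this sketch does not attempt.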

So again, what I do in my computer is that I generate N replicas of these random variables, I compute the maximum, and I generate a histogram for the probability of the maximum. And you will see that as N increases, the probability distribution of the maximum converges to the universal Gumbel law. And it is universal because if I start from a completely different distribution, like this one with two different peaks, maybe it will take longer, but if N is large enough, I will end up again in exactly the same probability distribution.

So at this point, you might wonder: I told you about many numerical examples, and these are just synthetic data that I generate on my laptop. But is this of any use for real data?

And to answer this question, I have to tell you about a little piece of the history of Oxford, and in particular about the Radcliffe Observatory, which was the university observatory from 1773 to 1934. And if you want to see it, it is just about a five-minute walk from here.

And the astronomers of the observatory, in the 18th century, started to collect weather data for a specific reason, which was that refraction, which is influenced by atmospheric conditions, can affect astronomical measurements. So they had to have a precise understanding of the local weather conditions in order to ensure that their measurements were accurate. So they started collecting data about temperature and pressure and other things as well.

And they are still doing that. So this data collection has continued for more than two centuries. They have the longest running record of temperature and rainfall data for a single site in Britain running continuously from 1813. So you can see here a picture of their data. And here you can see a handwritten entry of this data set from November 14th, 1813. And you can see that there is some data about temperature, about pressure, wind, rain.

And apparently that day was very dark and rainy, not surprisingly. Today, these records have been digitised and they are freely accessible on the Internet. So what I did is that I just went and downloaded the whole dataset. And since I wanted to study extreme value statistics, I took the maximal temperature in October in Oxford for every year in the last 200 years.

And this is what is plotted here. Each dot is the maximum temperature registered in Oxford in October of a given year. You cannot tell too much from this data, to be honest. But what you can tell is that 2011 had a very hot October. And the other thing is that if you build a histogram from this data, this is what you get. And this is surprisingly close to the Gumbel distribution.

And you can fit the Gumbel distribution pretty easily to this data. So even though these data are not independent and identically distributed, this theory still tells us something useful about the data, and it can predict the probability of rare events.

So the other thing I want to tell you about in the context of independent variables is records. Again, I consider independent variables and I want to answer this question: given a sequence of independent random numbers, how many records do we expect to see? It's a very practical question, and we will be able to answer it and to compute this quantity exactly within this slide. It's a very simple computation. The first thing we want to compute is the probability that the i-th variable that I observe is a record.

If x_i is a record, it means that it is the biggest variable so far. These variables are independent and identically distributed, so any of them could be the maximum with equal probability. So since I have observed i variables, the probability that the last one is the maximum is just 1/i, because it has to be uniform and it has to sum to one.

So the average number of records I can obtain just by summing over i. The average number of records is the sum, over all times, of the probability that that particular time was a record. So it's the sum of 1/i where i goes from 1 to N. And if I approximate the sum with an integral, which I can do if N is large, I get that the average number of records is growing as the log of N.

And let me point out that this is also universal. It doesn't depend on the particular distribution of the single variables, so it's a very robust result too. And this is what you would expect: if you observe, for instance, 20 random variables, you would expect around three or four records. So it's a very slow growth. And this makes sense because, later in time, it will be harder to set a new record, because the last record will be higher.
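The record-counting argument can be checked with a short simulation. The sketch below is an illustration only: the function names, the seed, the sample size, and the choice of uniform and Gaussian variables as two test distributions are assumptions made here. It compares the exact sum of 1/i for N = 20, which is about 3.6, with the average number of records counted in random sequences.

```python
import random

def harmonic_number(n):
    """Mean number of records for n i.i.d. continuous variables: sum of 1/i."""
    return sum(1.0 / i for i in range(1, n + 1))

def count_records(sequence):
    """Count entries strictly larger than all previous entries."""
    records, best = 0, float("-inf")
    for x in sequence:
        if x > best:
            records, best = records + 1, x
    return records

def mean_records(n, n_samples, draw, rng):
    """Average record count over n_samples independent sequences of length n."""
    total = sum(count_records([draw(rng) for _ in range(n)])
                for _ in range(n_samples))
    return total / n_samples

rng = random.Random(2)  # fixed seed so the run is reproducible
n = 20
exact_mean = harmonic_number(n)
uniform_mean = mean_records(n, 20000, lambda r: r.random(), rng)
gauss_mean = mean_records(n, 20000, lambda r: r.gauss(0.0, 1.0), rng)
print(round(exact_mean, 3), round(uniform_mean, 3), round(gauss_mean, 3))
```

Both simulated averages should agree with the exact sum within sampling error, whichever of the two distributions is used.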

And if you try to apply this to the sports data that I showed before, we have that in the last 20 editions of the Olympics there have been seven marathon records. So the independent theory doesn't work in this case: we would expect three or four and we get seven, so it's very off. And if we apply the same idea to the temperature data that I showed before, it still doesn't work. In the last 200 years, there are nine records, but the log of 200 is around five.

So the independent theory doesn't work in this case either. The independent theory is very useful as a benchmark and it works in some cases, but it has limitations, because in the real world there are correlations and you often need to include them in the model. So what we want to do now is to include correlations and to get a more complicated model, which takes into account that different variables are not independent.

The simplest model with correlations that you can consider is the weakly correlated model. First of all, let me say that there's no general technique to study correlated systems, so we have to go on a case-by-case basis. And the simplest case is the one where correlations are weak. What does that mean? The quantity that I have on the left-hand side is the correlation between variable x_i and variable x_j.

And what you have to know is that this number will be zero if the variables are independent, and it will be nonzero, positive or negative, if these variables are correlated. So if I assume that the correlations are decaying in time exponentially or faster, over a typical timescale, which is the correlation timescale xi, what I will have is that two random variables which are farther away in time than xi are basically independent.

So I can still make progress using the independent theory. To do that, imagine that I have a very, very long sequence. What I can do is divide this sequence into different intervals of size xi, similarly to what is done in statistical physics with the Kadanoff argument. And since the correlations decay over this time, the different intervals are almost independent. So if I define the maximum within each interval, these maxima m_1, m_2, and so on will be independent random variables.

And I can still apply the theory of independent identically distributed random variables, because the global maximum is just the maximum of the maxima. So I can still use the theory that I showed before. But the problem is that when correlations do not decay exponentially in time, so when I have strongly correlated random variables, this doesn't work anymore.

So here I wanted to present just three examples of systems with strongly correlated random variables which have been studied quite a lot in physics. The first one is random matrices, which have been used a lot in physics to describe the complicated Hamiltonians of heavy nuclei. The basic idea there is that this Hamiltonian is so complicated that you don't know exactly what it looks like, so you approximate it with a random matrix. And it actually works.

And in this case, for instance, the maximal eigenvalue of the Hamiltonian has been studied quite a lot in the context of extreme value statistics. The second one is fluctuating interfaces, which have been used a lot to describe the interface of growing colonies of bacteria or growing tumours. It's quite important to study the statistical properties of these interfaces, and in the context of extreme value statistics, for instance, the maximal height of an interface has been studied, and it is a very crucial quantity.

And finally, random walks, which will be the focus of the rest of my talk. So let me define, first of all, what a random walk is. Here I am plotting the position of the walker over time. So x_k is the position of the random walker as a function of k.

It's a motion in one dimension. And the evolution of the position satisfies a very simple rule, which is telling me that the position at step k is equal to the position at the previous step, k - 1, plus a jump, which I will call eta_k. So I will assume that the jumps of this random walker are again independent and identically distributed random variables.

But now the variables x_k are strongly correlated, because if you write down the joint probability distribution, it will have a more complicated form, which is not just the one of independent variables, and there are actually strong correlations in this model which do not decay exponentially in time. So, talking about random walks, as you probably know, Oxford is full of beautiful pubs, and some of them, like the one in this picture, are just next to a river.

So imagine that there is a drunk student, the classical example, who after a few pints wants to go home, but he is drunk. So he will move with random steps, either towards the river or away from it. So my question for you now is: after N steps, what is the probability that this student has fallen into the river? We can answer this question very precisely by modelling the motion of the student as a random walk.

So now the river is this red line, which corresponds to x equal to zero. And what we want to study is the survival probability, so the probability that the student will survive, which I will call Q_N. Q_N is just the probability that x_1 is greater than zero, x_2 is greater than zero, up to x_N greater than zero, given that the starting position was zero.

And of course, this probability will be, in principle, a complicated function of the probability distribution of the steps, p(eta). So if I take steps which are drawn from one distribution, I will get in principle something different than if I take steps from a uniform probability distribution. In this formula, you don't need to understand the details. These thetas are Heaviside functions, which are one if the argument is positive and zero otherwise. But this doesn't really matter.

What matters is that Q_N is a complicated function of p, and p can be anything, really. So I would expect Q_N to depend on p. But the very surprising result, which is known as the Sparre Andersen theorem, is that Q_N is completely universal once again. So it is completely independent of p(eta). And this is true for any N.

So in this case, it's not true only for large values of N; it is true for any value of N, and it is given by this very simple formula. Sparre Andersen was a Swedish mathematician, and he proved this in 1954. The formula is surprisingly simple, so many people thought that the proof of this formula should be simple as well, while the original proof by Sparre Andersen is a kind of complicated combinatorial proof.

So there have been many attempts to prove this formula in a simpler way, but they came up with different proofs which are way more complicated than the original one. So if we now plot Q_N as a function of N, so the probability that the student has survived for N steps, you can see that initially the probability is one half, because the student is starting just on the edge of the river.

So if he goes in the wrong direction at the first step, he will immediately fall into the river. And then it decreases to zero, so if you wait long enough, the student will fall. I'm telling you about the survival probability because it is a crucial quantity in statistics: it can be used as a building block to study more complicated quantities, like the distribution of the time of the maximum.
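The universality of the survival probability can be illustrated numerically. The formula itself is shown on a slide and is not spoken in the recording; the sketch below assumes the standard form of the Sparre Andersen result for symmetric, continuous jump distributions, Q_N = binom(2N, N) / 4^N, and compares it with simulated walks whose jumps are Gaussian or uniform. The function names, seed, and sample size are choices made for this illustration.

```python
import math
import random

def sparre_andersen(n):
    """Q_N = binom(2N, N) / 4^N (assumed standard form of the result)."""
    return math.comb(2 * n, n) / 4 ** n

def survives(n_steps, draw, rng):
    """True if a walk started at 0 stays strictly positive for n_steps steps."""
    x = 0.0
    for _ in range(n_steps):
        x += draw(rng)
        if x <= 0.0:
            return False
    return True

def survival_estimate(n_steps, n_samples, draw, rng):
    """Monte Carlo estimate of the survival probability after n_steps."""
    return sum(survives(n_steps, draw, rng) for _ in range(n_samples)) / n_samples

rng = random.Random(3)  # fixed seed so the run is reproducible
jump_laws = {
    "gaussian": lambda r: r.gauss(0.0, 1.0),
    "uniform": lambda r: r.uniform(-1.0, 1.0),
}
survival_table = {}
for n_steps in (1, 5, 20):
    for name, draw in jump_laws.items():
        survival_table[(name, n_steps)] = survival_estimate(n_steps, 40000, draw, rng)
    print(n_steps, round(sparre_andersen(n_steps), 4),
          {name: round(survival_table[(name, n_steps)], 4) for name in jump_laws})
```

For one step the formula gives one half, matching the student starting on the edge of the river, and both jump distributions should track the same curve as the number of steps grows.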

And this is what I will tell you about. So let's consider once again a random walk of this type. I let the random walk evolve for N steps, and I want to know what is the probability distribution of the time of the maximum, t_max. So what is the probability that the time of the maximum is at the beginning, or at the end, or in the middle, for instance. We will be able to compute this distribution exactly by using the result for the survival probability.

So the first observation that we need to make is that, almost by definition, the random walker cannot go above its maximum. It's almost tautological. So the random walker cannot cross the red barrier, which is the maximum. Now, let's split this trajectory in two parts. The first part is from time zero to time t_max, and the second part is from time t_max to time N.

So what's going on in the second part? The random walker starts from the maximum and it has to stay below the maximum for some number of steps. And this is precisely what the survival probability is telling you. So the probability of this part of the trajectory is precisely the survival probability, in this case Q_{N - t_max}, because there are N - t_max steps on that side. And in the first part, I can apply a very similar argument.

I just have to go back in time. If I start from the maximum, the random walker has to stay below the maximum for t_max steps. So the probability of the first part is just the survival probability for t_max steps, Q_{t_max}. And the probability of the time of the maximum is just the product of these two probabilities. Very simple. Okay.

So if you remember that Q is a universal quantity, this is telling us that the distribution of t_max is also universal; it doesn't depend on the specific way in which I model my system. And if I plot this quantity, the probability distribution of the time of the maximum, it looks something like this in the limit of large N. What I can understand from this plot is that it is more likely to find the maximum either at the very beginning or at the very end of the interval that I'm considering.

This is because the probability is diverging: the probability is actually going almost to infinity, in the limit of large N, when t_max is going to zero or when t_max is going to N, which is the final time. And this is telling me that there are way more trajectories which are either increasing like this and reaching the maximum at the final time, or decreasing like this and reaching the maximum at the very beginning.
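The product rule for the time of the maximum can be evaluated directly. As a sketch, assuming the standard Sparre Andersen form Q_n = binom(2n, n) / 4^n with Q_0 = 1 (the formula is not spoken in the recording), the distribution P(t_max = k) = Q_k Q_{N-k} for k = 0, ..., N can be tabulated and checked for normalisation and for the U shape. The value N = 50 is arbitrary.

```python
import math

def survival(n):
    """Assumed Sparre Andersen form: Q_n = binom(2n, n) / 4^n, with Q_0 = 1."""
    return math.comb(2 * n, n) / 4 ** n

def time_of_max_distribution(n_steps):
    """P(t_max = k) = Q_k * Q_{n_steps - k} for k = 0, ..., n_steps."""
    return [survival(k) * survival(n_steps - k) for k in range(n_steps + 1)]

n_steps = 50
tmax_dist = time_of_max_distribution(n_steps)
print("normalisation    :", round(sum(tmax_dist), 12))
print("P(t_max = 0)     :", round(tmax_dist[0], 4))
print("P(t_max = N / 2) :", round(tmax_dist[n_steps // 2], 4))
print("P(t_max = N)     :", round(tmax_dist[-1], 4))
```

The probabilities sum to one, are symmetric under exchanging k and N - k, and are largest at the two ends of the interval, which is the behaviour described for the plot.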

So you might think that if we model the evolution of the price in the stock market just as a random walk, the best time to sell a stock is either in the morning or late in the afternoon. But before you do that, you have to keep in mind what the distribution of the time of the minimum is, which is exactly the same. So you have to be very careful in describing the stock market as a random walk.

It's more complicated than that. So in the last few minutes of my presentation, I want to tell you about some new results. So far I presented some classical results of extreme value statistics. So I would like to tell you about some more recent developments and some of my research in this field, which is done in the context of active particles. So, first of all, let me tell you what active matter and active particles are.

So active matter describes systems, like colonies of bacteria, which are composed of many individual units. And these units are able to absorb energy from the system, through food, for instance, and convert this energy into some form of work. In the case of the bacteria, this work is just persistent motion, and this is very different from what is usually considered in physics, for instance, Brownian motion.

The bacteria move in a persistent way, while a Brownian particle is just moving in a random way, due to collisions with the molecules of the fluid surrounding it. And crucially, these bacteria are out of equilibrium, because they are continuously consuming energy, while Brownian motion is at equilibrium. We have a lot of tools and techniques from thermodynamics and statistical mechanics to describe equilibrium systems.

But all of these techniques do not apply to non-equilibrium systems, and the reason for that is that there is a continuous absorption of energy. In other words, active matter is alive, passive matter is dead. And it's crucial to understand the statistical properties of active particles. This is what I did during my studies, by considering the run-and-tumble particle model.

This model describes, in a very simplified way, the motion of E. coli bacteria, which is shown in the lower animation. These are experimental data. The way these bacteria move is that they typically move in a fixed direction, in a persistent way, with almost constant velocity for some amount of time. And then they change direction suddenly. This is called run-and-tumble motion. And to model it, I will just start with the simplest possible model.

So I assume that I have a bacterium which is starting from a barrier, which is this red line here. Initially this bacterium will choose a direction uniformly at random in space, and it will start moving in that direction for some time. After some time it will tumble, so it will pick a new direction, again uniformly at random. And I will assume that these tumbling events, these changes of direction, occur at a constant rate gamma, meaning that on average I expect to see one change of direction every 1/gamma seconds.

But there can be fluctuations; this is actually what is called a Poisson process. And I will also consider fluctuations in the speed of the particle, which are described by some probability distribution W, but this is not very important. And we will consider this motion in one, two or three dimensions.

And d is the dimension of the system. What I'm showing here is the motion in two dimensions. So in these cases, one first step to understand the statistical properties of this motion is to study the survival probability, which I will define in the following way: as the probability that the x component of the particle does not change sign up to time t. In two dimensions, this is the probability that the random walker, or the run-and-tumble particle, doesn't cross the barrier up to time t.

So this is a process which is defined in continuous time, and this survival probability was first computed in '95 in the simplest possible case, which is one dimension, so the motion is just on a line, and constant velocity. It is given by this formula, where I_0 and I_1 are modified Bessel functions, but it is something that you can plot, and it looks like this. Initially the survival probability is one half, for the same reason as before, because half of the time the particle will immediately cross the wall, and then it decreases to zero.

So I wanted to study this problem more in general, so for higher dimensions and also for velocity fluctuations of the particle. So I first did the simulations in one d, and not too surprisingly, I got that the simulations agree with the theory. And then I repeated the simulations in d equal to two, and I was quite surprised to get exactly the same result. So at first I thought that I had a bug in my code. But I checked very carefully.

It took me a long time, but this was not the case. And doing simulations in d equal to three, I got again the same result. And when I included the fluctuations in the speed of the particle in my model, I got once again the very same result. So the numerical simulations were suggesting that the result is, in some sense, universal, independent of the details of the model. And keep in mind that this is a non-trivial statement, because many other quantities in this model are not universal.

For instance, the position distribution of the particle, the probability to find the particle at position x at time t, is not universal. This depends strongly on which dimension I'm considering. There is a curve for d equal to four, which is something you can simulate on a computer but doesn't really make sense. But it is strongly dependent on d. And this universality of the survival probability looks a lot like what we saw before for random walks.
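
The numerical experiment described above can be reproduced in a rough form with a small Monte Carlo sketch. It is not the speaker's code: the function names, the fixed speed `v0`, the sample size, and the parameter values are assumptions of this example. The particle starts on the barrier at x = 0, and a trajectory counts as surviving if its x component never becomes negative up to time t.

```python
import math
import random


def x_direction(d, rng):
    """x component of a unit vector drawn uniformly at random in d dimensions."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return g[0] / math.sqrt(sum(c * c for c in g))


def survives(t, gamma, d, rng, v0=1.0):
    """True if the x component stays non-negative up to time t."""
    x = 0.0
    remaining = t
    while True:
        tau = rng.expovariate(gamma)   # duration of the current run
        run = min(tau, remaining)
        x += v0 * x_direction(d, rng) * run
        # x is linear in time during a run, so checking run endpoints suffices
        if x < 0.0:
            return False
        if tau >= remaining:
            return True
        remaining -= tau


def estimate_survival(t, gamma, d, n_samples, seed=0):
    rng = random.Random(seed)
    hits = sum(survives(t, gamma, d, rng) for _ in range(n_samples))
    return hits / n_samples


for d in (1, 2, 3):
    print(d, estimate_survival(t=2.0, gamma=1.0, d=d, n_samples=20000))
```

If the universality claim holds, the three estimates should agree with each other, and with the one-dimensional formula, up to the statistical error of the sampling.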

That earlier result, the Sparre Andersen theorem, was also a universal result, but it was not obvious how to apply it to the run-and-tumble particle model, for many technical reasons. For instance, the run-and-tumble particle model is defined in continuous time, while the Sparre Andersen theorem applies only to discrete-time random walks. But we were able to actually develop a mapping from the continuous-time motion of the run-and-tumble particle to a discrete-time random walk.

And we were able to show that actually it is the Sparre Andersen theorem that is behind this universality. And we saw before that the survival probability can be used as a building block to consider more complicated quantities in extreme value statistics. And this is the case also for the run-and-tumble particle model.
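
One way to picture such a mapping, offered as an assumption of this sketch rather than as the derivation from the talk, is to look at the particle only at its tumbling times. The x displacements of successive runs are then independent, symmetric, continuous jumps, so the positions at the tumbles form a discrete-time random walk. For such walks the Sparre Andersen theorem gives the probability of staying positive for n steps as C(2n, n) / 2^(2n), whatever the jump distribution. The code below checks this numerically for run displacements in d = 1, 2, 3; the function names and the fixed speed `v0` are hypothetical choices.

```python
import math
import random


def run_x_displacement(gamma, d, rng, v0=1.0):
    """x displacement of one run: speed * (x component of direction) * duration."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    cos_x = g[0] / math.sqrt(sum(c * c for c in g))
    return v0 * cos_x * rng.expovariate(gamma)


def stays_positive(n_steps, gamma, d, rng):
    """True if the walk of run displacements is positive after each of n_steps."""
    x = 0.0
    for _ in range(n_steps):
        x += run_x_displacement(gamma, d, rng)
        if x < 0.0:
            return False
    return True


def sparre_andersen(n_steps):
    """Universal probability C(2n, n) / 4^n of staying positive for n steps."""
    return math.comb(2 * n_steps, n_steps) / 4 ** n_steps


def estimate_stays_positive(n_steps, gamma, d, n_samples, seed=0):
    rng = random.Random(seed)
    hits = sum(stays_positive(n_steps, gamma, d, rng) for _ in range(n_samples))
    return hits / n_samples


n = 5
print("theorem:", sparre_andersen(n))
for d in (1, 2, 3):
    print(d, estimate_stays_positive(n, gamma=1.0, d=d, n_samples=20000))
```

The jump distribution is different in each dimension, yet the estimates should all fall close to the same theoretical value, which is the sense in which the discrete-time result is universal.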

So using this building block, this survival probability, we were able to show that the time of the maximum for this random process, which is the run-and-tumble particle motion, is also universal. And also the number of records for this model is a universal quantity. In this plot, the blue line is the theory, and the different symbols, which kind of overlap with each other, are the simulations.

And they all follow the very same curve. On the left, I'm plotting the cumulative probability of the time of the maximum, so the probability that the time of the maximum is less than some value t prime, as a function of t prime, and the other plot shows the number of records. And this is again a very non-trivial result, because we have seen before that many other quantities in this model are not universal. So there is something special about these extreme value statistics quantities.
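
The comparison between dimensions for the time of the maximum can be sketched numerically as follows. This is an illustrative simulation under assumptions of its own (fixed speed `v0`, hypothetical function names, chosen sample size), and it does not use the theoretical curve from the slide, which is not in the transcript. It only estimates the cumulative probability of the time at which the x component reaches its maximum over the interval from 0 to t, and prints it for d = 1, 2, 3. The number of records is not covered here.

```python
import math
import random


def x_direction(d, rng):
    """x component of a unit vector drawn uniformly at random in d dimensions."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return g[0] / math.sqrt(sum(c * c for c in g))


def time_of_maximum(t, gamma, d, rng, v0=1.0):
    """Time in [0, t] at which the x component of the particle is largest."""
    x = 0.0
    best_x, best_t = 0.0, 0.0
    remaining = t
    while remaining > 0.0:
        run = min(rng.expovariate(gamma), remaining)
        x += v0 * x_direction(d, rng) * run
        remaining -= run
        # x is linear within a run, so the maximum sits at a run endpoint
        if x > best_x:
            best_x, best_t = x, t - remaining
    return best_t


def sample_times(t, gamma, d, n_samples, seed=0):
    rng = random.Random(seed)
    return [time_of_maximum(t, gamma, d, rng) for _ in range(n_samples)]


def empirical_cdf(samples, value):
    """Fraction of samples less than or equal to value."""
    return sum(s <= value for s in samples) / len(samples)


t = 2.0
for d in (1, 2, 3):
    samples = sample_times(t, gamma=1.0, d=d, n_samples=20000)
    print(d, [round(empirical_cdf(samples, tp), 3) for tp in (0.0, 0.5, 1.0, 1.5)])
```

If the quantity is universal, the three printed rows should coincide up to sampling noise.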

So to conclude, I presented to you different simple models. And from this I hope that we have built an intuition on how extreme events behave in a statistical way. And these results have many applications, to finance, physics, evolution theory. And so it's a very exciting and interdisciplinary field of study. And the crucial point that I want to make is that often these results are universal, independent of the specific way in which we model the system.

And this is very important, because often we model the system in a way which is not accurate, because we don't have access to the full information about the system. So we have to make assumptions, and if we get a universal result, we are guaranteed that our results are robust to errors in the way we model the system. So as a final question: as I mentioned before, there is no general theory for extreme value statistics in correlated systems.

And, as a very ambitious question, we would like to find one, or to explore possible directions to find one. And with this, I want to thank you for your attention.

Transcript source: Provided by creator in RSS feed