Modelling genes: the backwards and forwards of mathematical population genetics - Alison Etheridge | The Secrets of Mathematics podcast

00:02

Okay. Thank you very, very much. I'm slightly worried you're here under false pretences because this morning I looked at my abstract, and the talk I wrote doesn't really fit the abstract, but it does fit the title still. So I'm hoping you won't feel you were lured here by false advertising. So I'm going to talk about mathematical models in population genetics, and a good place to start actually seemed to be population genetics.

00:36

What is it? And it's more than the last 20 or 30 years where mathematics has been important. So as a scientific subject it is rather young. It's only a century or so old. And we usually trace it back to what we call the modern evolutionary synthesis, which was when the work of Darwin, up here, and Mendel were reconciled. So what were their theories? So here's natural selection in a nutshell.

00:57

So Darwin's theory says that heritable traits that increase reproductive success will become more common in a population. So if, for example, being taller makes you more likely to have more children, and being taller is a trait that you tend to hand on to your offspring, then being taller will become more common in the population. The population will gradually become taller. And for it to work, for it to make any difference to a population, it requires variability.

01:21

It's no good if everyone is the same, and offspring must be similar to their parents. That variability and heritability are given to us by Mendel. So what does Mendel say? Mendel says that traits are determined by genes; "determined" in inverted commas, because actually it's a bit more subtle than that. But traits are definitely influenced by genes, and genes occur in different types. That gives us that variability that we need. And offspring inherit genes from their parents.

01:48

So you've got heritability. So why did it take 50 years for anyone to notice that these two theories could be brought together? And perhaps the main reason is that Darwin tended to concentrate on natural selection acting through the accumulation of lots and lots of very small changes to get something really big. So these are his famous Galapagos finches. And to get from this beak to this beak was not something that happened overnight.

02:12

It was an accumulation of lots of very, very small changes. Mendel, on the other hand, famously looked at peas, and he was interested in traits which really were determined by genes. So whether a pea is green or yellow, wrinkled or round, whether it has a green pod or yellow pod, a constricted pod or inflated pod, these are all things determined just by single genes. That's very discrete; it's a very different sort of variability from Darwin and his finches.

02:38

But nonetheless the theories were brought together, and they were brought together by mathematics. And the mathematicians in question: we usually attribute the modern evolutionary synthesis to three guys. So we have Fisher over here, R. A. Fisher, Sewall Wright, and J. B. S. Haldane. So Fisher: a very famous British mathematician, or statistician, depending on your culture.

03:03

Fisher was interested in the data that was being collected by biometricians, and so they collected data on things like height and weight of parents and their offspring. And Fisher noticed that this could all be explained by Mendelian genetics as long as you allowed a particular trait to be determined by lots and lots of Mendelian factors. So lots of genes influencing the trait, each having a very small influence, and a bit of environmental noise.

03:28

And in the process, he actually invented much of modern statistics, and in particular the analysis of variance. This isn't a usual picture of Fisher, but that's what he would have looked like at the time of the evolutionary synthesis. Usually we show this grey man with a long white beard. Over here on the right, we've got Sewall Wright.

03:43

So Sewall Wright was an American who was trained in mathematics and then lured into biology by a woman whose name I wrote down because I knew I would never remember it: Wilhelmine Entemann Key. I hope you'll forgive me for not remembering that, but she's interesting. She was one of the first women to get a Ph.D. from the University of Chicago. And while Wright was at Cold Spring Harbour, she lured him into biology.

04:05

And he developed a lot of what we now call the theory of genetic drift, which is understanding randomness. This is a long time before probability was a fashionable mathematical subject, but he was understanding the randomness in a population that arises just because it's finite: the inherently stochastic nature of reproduction. And he also developed notions, things like fitness landscapes, which we still use today. This man here, J. B. S. Haldane, was an Oxford-trained mathematician.

04:33

And you might guess, looking at the photo, that he had perhaps the most colourful of the careers of these three gentlemen. He wrote an excellent children's book, My Friend Mr Leakey, which I thoroughly recommend to you. He was married to a very interesting journalist who brought very interesting people into his life and into his household. And so he left Oxford and travelled the world and finally died in India, very sadly.

04:55

But while these three would certainly have agreed that Mendel and Darwin were very compatible theories, and indeed that they reinforced one another, what they certainly did not agree on was the answer to this question. So what is the relative importance of the different forces of evolution that are acting on my population?

05:13

So natural selection, in the sense of Darwin; population structure, because we don't all live in a big melting pot, we're all sort of spread around and we live in different spatial locations and in different genetic forms; and genetic drift, this randomness that Wright set up. And I actually deliberately put Fisher and Wright rather a long way away from each other because they really did not get on.

05:36

They had a very long-standing feud. Because while Wright thought that genetic drift was a very important evolutionary force, Fisher thought that it would be completely dwarfed by natural selection. And he and Ford wrote a number of rather aggressive papers against Wright's theory. So if these incredibly intelligent and innovative thinkers were unable to cast light on this problem, why do we think we might be able to shed any new light on it now?

06:03

And the answer lies in the data. So Wright, you will notice, is holding a guinea pig. This is not because in the 1930s, when this photograph was taken, the Americans used guinea pigs as blackboard erasers, as it might appear.

06:17

It's actually because he bred guinea pigs. So these guys could only view genetic information indirectly, by phenotype, and Wright developed our understanding of the way that different coat colours are inherited in guinea pigs and also rats, rabbits and lots of other mammals of similar descent. Nowadays we can view DNA sequences directly, and frankly, our data is a lot less cute. So here is what geneticists do with modern data. Thanks to Jonathan McKinney for this.

06:49

Actually, if you go to the Department of Statistics, you can see this pattern on the wall, on the doors, on the second floor. This is how you can tell the statistical geneticists are at work. And what it corresponds to is data from 40 different human beings. They're from the Thousand Genomes Project, in fact, and they all come from an area in Nigeria. And what's been recorded, this is quite a long sequence of DNA, but all this records is the differences between individuals.

07:18

And that's what geneticists record: the differences between individuals, or rather between the DNA sequences of individuals. And from those differences, they infer something about the way that individuals are related to one another. And we call those relationships genealogical trees. And we'll see a lot of those in the rest of the talk.

07:35

So as mathematicians, if we want to address that key question about the relative importance of the different forces of evolution, what we need are forwards-in-time models that say how those forces of evolution would change gene frequencies: how would they change the frequencies of different genetic types as we move forwards? But then we want to compare that to data, or rather to what geneticists infer from their data.

07:58

And so we need to be able to say, backwards in time, if a population were evolving according to one of our models, what would those genealogical trees, what would those systems of relatedness, look like in a sample of individuals from our population? Okay. So let's just have a quick think about backwards in time. So I said genealogical tree, and I'm deliberately saying that and not family tree. So let's try and explain why. If I want to plot my family tree, what do I need?

08:24

I need my parents and my grandparents and my great-grandparents and so on. And the number of individuals in each generation is growing really very, very quickly. And I think that's quite nicely illustrated by Mark Wallinger's Y. You can find this sculpture in the grounds of Magdalen College. And thanks to David Lowery for sending me the photo.

08:43

After just nine generations, which here are meant to represent generations of academics in Magdalen (we're a little self-important in Magdalen), there are 512 leaves on this tree. It doesn't take very long to get to a very big number: nine generations, 512. Now, natural populations are finite, and so you can't indefinitely go on doubling the number of people in your family tree without running out of individuals to put in your family tree.

09:08

So some individuals must occur more than once. And let's see a real example of that. And I admit I've chosen a rather extreme example. But here is the family tree, or the pedigree, of King Charles the Second of Spain. And so here's Charles himself and here's his father. And here are his paternal grandparents and then great-grandparents and so on. And then his mother, and here his maternal grandparents. And then we see that his great-grandparents appear to be duplicated already.

09:39

So he's a very extreme case because his mother was his father's niece. Now, this is really quite extreme inbreeding, and it goes on: as you go back in the tree, you'll see there are lots of instances of lineages coming together. This really isn't a tree. And in fact, it is an extreme case because I'm afraid Charles the Second was actually very seriously handicapped by genetic disease and died without leaving offspring.

10:07

So let's try and find a family tree which is a little less politically inspired, because obviously a lot of the marriages here were not random. They were so that bits of Spain stayed the property of Spain. And I'm going to move to a very apolitical organism, the snail. And one of the reasons I'm moving to the snail is because drawing these pictures gets very, very difficult if you separate your population into males and females, and snails are hermaphrodite.

10:33

But let me assure you, the mathematical models are almost identical, just much harder to draw with the program that I was using in a hotel room in Paris yesterday. So it's very carefully prepared, etc. So here what we've done is we've taken five snails and snails are not monogamous. So we're supposing that in the previous generation each snail just picks two parents at random.

10:55

So for example, this one chooses that parent and that parent, and any snail can breed with any other snail because they're hermaphrodites. And so they've successfully produced this offspring. So this is my present-day population, and as we trace backwards in time we see we get quite a complicated network of relationships developing.

11:13

And in particular it's already the case that in this generation, this individual, this individual and this individual all appear in the family tree, in the sense of the pedigree of Charles the Second, of all of the individuals down here. So all five of these individuals are in some sense descended from these three guys. This one left no offspring, so is ancestral to no one in the current generation.

11:36

And it took me a long time to adjust the picture so that this guy is not actually ancestor to everybody. There's one person he's not ancestral to; it's a little exercise for you to work out which one. In fact, the reason that it took me so long to work it out is that it's actually very difficult.

11:50

So if this were genuinely done at random, instead of me just picking individuals, then after log base two of the number of individuals here, that number of generations back, we expect to see an individual who's ancestral to everybody. After about 1.7 log₂ N generations, with very small variation around that, for large populations, everybody in the ancestral population is either ancestral to nobody now or ancestral to everybody now.
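As a rough illustration of that claim, here is a minimal Python sketch that is not from the talk. It assumes a constant population of `n` hermaphrodites in which every individual picks two parents independently and uniformly at random (so the two picks may coincide), and it counts how many generations back one has to go before some individual is a pedigree ancestor of the whole present-day population. The function name and the bitmask bookkeeping are illustrative choices.

```python
import math
import random


def generations_to_common_ancestor(n, rng):
    """Count generations back until some individual is ancestral to all n."""
    everyone = (1 << n) - 1
    # descendants[i]: bitmask of present-day individuals descended from
    # individual i of the generation currently being examined.
    descendants = [1 << i for i in range(n)]
    generations = 0
    while everyone not in descendants:
        parents = [0] * n
        for mask in descendants:
            # Assumption: each individual picks two parents independently
            # and uniformly at random from the previous generation.
            for _ in range(2):
                parents[rng.randrange(n)] |= mask
        descendants = parents
        generations += 1
    return generations


if __name__ == "__main__":
    rng = random.Random(2024)
    n = 1024
    runs = [generations_to_common_ancestor(n, rng) for _ in range(20)]
    print("log2(n) =", math.log2(n))
    print("generations back to a common pedigree ancestor:", sorted(runs))
```

Under these assumptions the counts should sit close to log₂ n with little spread, which is the sense in which pedigree ancestry becomes "joined up" very quickly.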

12:20

So this suggests actually that this is not the best way to view ancestry. It's too joined up. So let me just actually convince you that that guy there was ancestral to everyone. And not only is he ancestral to everyone: I've just coloured in cyan, I believe this colour is called, all the individuals who are descended from him as I come down the tree, without paying any attention as to whether he transmits genetic material. I've just looked at his offspring, or her offspring, since this is a snail.

12:47

And you'll see that not only is everybody descended from this person, but there are multiple routes through this graph which get me from here to here. Okay. On the other hand, if I start thinking about genetics, you may wonder why there are small circles. The small circles I'm thinking of as individual genes. And if I just trace back the individual genes in this generation, then what I've done is I've used red lines to indicate where genes were inherited from.

13:12

So each chooses not only an individual for a parent, but actually this gene chooses one of those two circles. Then this character, who is ancestral to everybody in the sense of pedigrees, has not transmitted any of their own genetic material to the current generation.

13:27

Moreover, as I go backwards in time, because each gene chooses a unique parent in the previous generation, the structure that I get by only looking at genes is going to be much simpler because the number of lines that I trace as I go back in time can only get smaller. It's not getting bigger. I'm not having to look at two parents for a single gene. Single gene only has one parent, and that's easier to see if I just focus on one.

13:51

So what I've done is I've just arbitrarily decided I'll only trace the left hand gene from each of the individuals in the present day population. And then you can see that as I trace backwards in time, there's a unique path that connects, for example, this guy to the ancestral generation, but occasionally two paths will meet and thereafter they will be the same.

14:12

And we can encode that information in a very simple picture like this. And so what we're going to do is, instead of following the haploid, sorry, the diploid individuals and trying to think about where everybody came from with both of their genes, we're just going to trace gene by gene and see where individuals came from. And that leads us to the simplest imaginable model of inheritance.

14:33

And this is called the Wright-Fisher model. I don't think they'd like their names being tied together like this, but I'm afraid they are inextricably connected in population genetics. And the Wright-Fisher model is probably the most important model in mathematical population genetics. So it's a bit disturbing how simple it is. So the idea is that each individual chooses one parent in the previous generation.

14:53

And I'm thinking of an individual now as being a gene. So each gene chooses one parent in the previous generation, and generations are discrete. And what happens in one generation doesn't affect what happens in the next generation. I told you before that in a pedigree we'd expect a common ancestor to the population on the order of log₂ N generations ago. So let's have a quick think about how far we have to trace back to get a common genetic ancestor.

15:21

So let's take a sample of size two from this population and think about how long we have to go back before we get a common genetic ancestor. Well, the probability that my two individuals had a common parent is just one over N, because the first chooses a parent just uniformly at random, and the second one's got to choose the same parent. And the chance of choosing the same parent is one over N. If they didn't choose the same parent, the chance they chose the same grandparent is one over N.

15:45

So the number of generations I wait until my sample of size two has a common parent, a common parental gene, is the same as the number of times I have to roll an N-sided die before I get an N, and that's going to be on the order of N generations. So whereas the pedigree had common ancestry after this very small number, log₂ N, of generations, genetic ancestry is determined over much longer timescales. And for just a pair of individuals, it would be about N generations ago.
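The die-rolling argument can be checked with a short simulation. This is a sketch under the talk's assumptions (each of two gene lineages picks its parent uniformly at random among `n` genes, independently in each generation); the function names are illustrative and not from the talk.

```python
import random


def generations_to_common_parent(n, rng):
    """Generations back until two gene lineages pick the same parent."""
    generations = 1
    # Each generation the two lineages agree with probability 1/n.
    while rng.randrange(n) != rng.randrange(n):
        generations += 1
    return generations


def mean_pairwise_time(n, replicates, rng):
    """Average waiting time over independent replicates."""
    total = sum(generations_to_common_parent(n, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (10, 50, 200):
        print(n, mean_pairwise_time(n, 2000, rng))
```

The waiting time is geometric with success probability 1/n, so the printed averages should be close to n, in contrast with the log₂ n scale for pedigrees.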

16:13

And in fact, for the whole population, it'll be about 2N generations ago. So not very much longer. Okay. Now, the models we're interested in, as you may already have spotted, are very, very crude. They're trying to capture just some caricature of the way that populations reproduce. We're not going to go into fine detail of what's happening locally. So let us suppose that the population size is very big. Otherwise you wouldn't use this kind of a model.

16:38

So we said that for a sample of size two, it's going to take on the order of N generations before we see a common ancestor. So let's use N generations as our unit of time, so that the time for two individuals to find a common ancestor is of order one.

16:50

Okay, well, if I do that, then the time to the most recent common ancestor is one over N times the number of rolls of my N-sided die until I get an N, and it's well known that that is very well approximated by something completely independent of N, called an exponential random variable with parameter one.
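For readers who want the missing step, the approximation can be written out as follows. The symbols T_N (the number of generations back to the common ancestor of two genes), m and t are notation introduced here, not in the talk.

```latex
% Two lineages fail to share a parent in each of m successive generations:
P(T_N > m) = \left(1 - \frac{1}{N}\right)^{m}, \qquad m = 0, 1, 2, \dots
% Measure time in units of N generations and let N grow:
P\!\left(\frac{T_N}{N} > t\right)
  = \left(1 - \frac{1}{N}\right)^{\lfloor N t \rfloor}
  \;\longrightarrow\; e^{-t} \qquad \text{as } N \to \infty .
```

The limit e^{-t} is the tail of an exponential random variable with parameter one, and it no longer depends on N.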

17:08

So as long as I measure time in these ludicrously big units of N generations, the time to the most recent common ancestor of a sample of size two is on the order of one, and it's given by an exponential one random variable. Now let's take a bigger sample; two is a bit boring. So let's take a sample of size k, bigger than or equal to three. The probability that at least three individuals in my sample have a common parent is of order one over N squared.

17:32

I claim that because the first one chooses a parent, and the second one's got to choose the same parent, that's one over N, and the third one's got to choose the same parent again, so that's one over N squared. I'm not going to see that happen, because it's going to take me about N squared generations before I see an event like that. And by then, all my lineages will have coalesced pairwise. So I'm never going to see three lineages coming together in a single generation. In the same way,

17:58

I'm also never going to see what we call simultaneous mergers, where two distinct pairs of individuals come together in the same generation, because the probability this pair comes together is one over N, and the probability this pair comes together is one over N. So I'd have to wait N squared generations to see it. And that's too long. All my lineages will have coalesced by pairwise coalescence by the time that happens.
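The counting behind "too long" can be made explicit. The following is a leading-order sketch for large N, with t denoting time in units of N generations and k the number of lineages; the binomial prefactors are added here and are not spelled out in the talk.

```latex
\begin{align*}
\text{pairwise mergers in } tN \text{ generations:}\quad
  & tN \cdot \binom{k}{2}\frac{1}{N} = \binom{k}{2}\, t
  && \text{(order one)} \\
\text{triple mergers in } tN \text{ generations:}\quad
  & tN \cdot \binom{k}{3}\frac{1}{N^{2}} = \binom{k}{3}\,\frac{t}{N}
  && \longrightarrow 0 \\
\text{two simultaneous pair mergers:}\quad
  & tN \cdot 3\binom{k}{4}\frac{1}{N^{2}} = 3\binom{k}{4}\,\frac{t}{N}
  && \longrightarrow 0
\end{align*}
```

So on the timescale where pairs merge at a finite rate, the expected number of anything more complicated vanishes as N grows.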

18:18

And so what we're left with, and it's an observation of Kingman in 1982, really, although while he proved it lots of people had observed it, is this. I have a sample of size k. So here's a sample of size four from my population, measuring time in these units of N generations.

18:39

The time that I have to wait, as I trace backwards in time, before anything happens in my genealogical trees, in these trees telling me how individuals are related to each other, is just the minimum of the four choose two, that is six, exponential one random variables that tell me when they find their common ancestry pairwise. And the minimum of those exponential random variables is just an exponential random variable, and we've just denoted it here. Now I've got three lineages left, and the extra time I must wait before the next thing happens is the minimum of three choose two exponential random variables, and so on.

19:06

Okay. And here, this is a picture of what this relatedness looks like for a sample of size a thousand. I'm grateful to Bob Griffiths for producing this for me many years ago. As you can see, a lot of stuff happens very, very quickly. But then we're down to a rather small number of lineages, and it takes a long, long time, after this initial flurry of activity, before very much happens.
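The recipe just described is short enough to simulate directly. This is a minimal sketch, not the code used for the picture in the talk: with k lineages remaining, the wait until the next merger is the minimum of C(k, 2) independent exponential(1) variables, which is itself exponential with rate C(k, 2). Function names are illustrative.

```python
import math
import random


def kingman_waiting_times(sample_size, rng):
    """Waiting times between mergers, in units of N generations."""
    times = []
    for lineages in range(sample_size, 1, -1):
        # Minimum of C(k, 2) independent Exp(1) clocks is Exp(C(k, 2)).
        rate = math.comb(lineages, 2)
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(sample_size):
    """Sum of 1 / C(k, 2) for k = 2..n, which telescopes to 2 * (1 - 1/n)."""
    return sum(1.0 / math.comb(k, 2) for k in range(2, sample_size + 1))


if __name__ == "__main__":
    rng = random.Random(42)
    times = kingman_waiting_times(1000, rng)
    print("time to most recent common ancestor:", sum(times))
    print("share of that time spent with only two lineages:", times[-1] / sum(times))
    print("expected time for a sample of 1000:", expected_tmrca(1000))
```

The last waiting time alone has mean one out of an expected total just under two, which matches the picture: an initial flurry of mergers, then a long wait with only a few lineages left.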

19:35

Okay. So we've got a forwards-in-time model for the allele frequencies and a corresponding backwards-in-time model given by Kingman's coalescent. So how does it do with data? Well, you've probably guessed that this isn't really a very good model of how populations really reproduce. But you might hope that you could reproduce it in a laboratory. And in the 1950s, Buri tried just that. So what he did was he took a population of fruit flies.

20:03

This is Drosophila melanogaster. And he took the fruit flies in two different forms that differed just very slightly in their eye colour. So half of them, when he started out, carried a gene which just slightly changes the eye colour. And he took 100 populations, each consisting of eight males and eight females, and each started with half with one eye colour and half with the other eye colour. And he propagated these populations for 20 generations.

20:31

And he compared the results that he got to the predictions of the Wright-Fisher model. Now, how did he do it? He actually had to keep the population size constant. So in each generation, he resampled to always keep the population size constant at 16 individuals for each of those populations. It doesn't seem a very big experiment by these modern standards of big data, but it must have been quite a tedious experiment to perform.

20:57

And here are his results. So the Wright-Fisher model tells us that on average, actually, the proportion of the different flavours of eye colour in our population is not going to change. But there'll be some variability in that, and it gives us a prediction for the variability. So what this is, one minus (one minus one over N) to the power of the number of generations, is what the model should predict. So he plotted his results. So here are the results of his experiments.

21:27

And this is a variance that we're plotting: variance against generation. So at the beginning, we started with exactly a half and a half in all the populations. So there was no variability. And this is just saying something about how the populations vary as time goes on. And eventually all the populations will be either one eye colour or the other eye colour. And at that point, this variance will hit 0.25. So you can see it's rising steadily.

21:50

Now, I'd like to tell you that this straight line was the prediction of the Wright-Fisher model. But that wouldn't be completely honest. This line, the dotted line, is the prediction of the Wright-Fisher model. And it turns out, though, that this line is almost the Wright-Fisher model. But instead of taking the true population size, which in this case is 16, I've substituted 11 and a half.
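To make the two curves concrete, here is a hedged Python sketch, not the analysis from the talk. It assumes the standard Wright-Fisher variance formula p0(1 - p0)[1 - (1 - 1/(2N))^t], where counting 2N gene copies for N diploid flies is an assumption made here about how the spoken formula maps onto the experiment. It also runs a small binomial-resampling simulation in the spirit of the replicate populations; all function names and the choice of 2000 simulated replicates are illustrative.

```python
import random


def predicted_variance(p0, diploid_size, generations):
    """Wright-Fisher variance of allele frequency across replicate populations."""
    gene_copies = 2 * diploid_size  # assumption: two gene copies per fly
    return p0 * (1 - p0) * (1 - (1 - 1 / gene_copies) ** generations)


def simulated_variance(p0, diploid_size, generations, populations, rng):
    """Variance across replicates after binomial resampling each generation.

    diploid_size must be an integer here, since gene copies are sampled one by one.
    """
    gene_copies = 2 * diploid_size
    frequencies = []
    for _ in range(populations):
        p = p0
        for _ in range(generations):
            # Each gene copy picks a parental copy at random, so the count
            # of one type is Binomial(gene_copies, p).
            count = sum(rng.random() < p for _ in range(gene_copies))
            p = count / gene_copies
        frequencies.append(p)
    mean = sum(frequencies) / populations
    return sum((f - mean) ** 2 for f in frequencies) / populations


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0, 5, 10, 15, 20):
        print(t,
              round(predicted_variance(0.5, 16, t), 4),     # census size 16
              round(predicted_variance(0.5, 11.5, t), 4))   # substituted size 11.5
    print("simulated, size 16, 20 generations:",
          round(simulated_variance(0.5, 16, 20, 2000, rng), 4))
```

Substituting the smaller size makes the variance climb faster towards its ceiling of 0.25, which is the direction of the correction described in the talk.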

22:17

And by virtue of doing that, I mean, I know it's not a perfect fit, but actually for an experiment of this size, that's pretty good. And it turns out that's universal. It is pretty good as long as I don't use the real population size and I put in a population size to suit my purposes. The Kingman coalescent, or the Wright-Fisher model, is a pretty good approximation, even to natural populations, as long as I sample individuals far enough away from one another that I'm not seeing local effects brought about by them living in very close proximity, for example.

22:38

And as an example of the sort of scale of this fudge factor, so how much do I have to change the population size to make things fit, I think the human population is rather nice. So for the whole world I would need to take an effective population size.

23:03

I'd need to substitute N to be about 50,000 in my Wright-Fisher model. And of course, the true population size of humans is 7 billion. So the difference between the number I plug in to make my model fit and the true number is about five orders of magnitude. So it seems crazy that this should work; it's completely mad that it should work. But actually, beyond that correction, it fits the data extremely well. Okay. Now, we would like to understand why.

23:35

And in particular, we would like to understand, if we're going to answer our basic question, how the different forces of evolution feed into this N to make it all work so well. What the Wright-Fisher model is doing is just modelling the genetic drift. But how would selection alter N? How would spatial structure alter N?

23:53

And I started working on this stuff about 20 years ago when Nick Barton, who is a very distinguished evolutionary geneticist, now at IST Austria, came to see me, and Nick said, well, look, I'm studying these grasshoppers, Podisma pedestris, and they live in the Maritime Alps. And you'll see he always chooses nice mountain ranges for his field trips. And they really are in a spatial continuum.

24:16

And I want to know how this spatial structure is affecting the genetics of Podisma pedestris. And I should tell you why it's called Podisma pedestris: because it is a pedestrian grasshopper. It's hard to see, but this is the vestigial wing. This thing cannot fly. Okay. So it crawls around, hops around. It doesn't move very far in its lifetime. So the spatial structure, to Podisma pedestris, probably looks pretty much like the plane.

24:42

And at the same time, Nick said, oh, and by the way, Wright and Malécot almost solved this in the 1940s. So Gustave Malécot is another of the greats of population genetics. And the way that Wright and Malécot solved it was they took the Wright-Fisher model and they adapted it to a spatial setting. So they said, let's suppose individuals are scattered across space.

25:00

This pointer is really not good. So here we go: here are individuals scattered across space, and it's in a kind of uniform way, in that each of them just chooses where they fall uniformly at random. And this has been drawn on a torus for reasons that will become clear in due course. And I will show some realisations of a simulation due to Jerome Kelleher, who I think is also sitting up there somewhere.

25:22

And the way that their model works is it evolves in discrete generations, just like the Wright-Fisher model, and the number of offspring that each individual produces in each generation is taken to be a Poisson random variable with parameter one. So why did they choose that? Because in the Wright-Fisher model, if you look at the number of offspring that a single individual produces, for large N it's approximately Poisson.
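That approximation is easy to check numerically. In the Wright-Fisher model each of the N offspring picks a given parent with probability 1/N, so that parent's offspring number is Binomial(N, 1/N). The sketch below, which is an illustration and not something from the talk, compares that distribution with Poisson(1); the function names are made up for the example.

```python
import math


def offspring_pmf_wright_fisher(n, k):
    """P(a given parent is chosen by exactly k of the n offspring)."""
    return math.comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k)


def poisson_one_pmf(k):
    """P(X = k) for a Poisson random variable with parameter one."""
    return math.exp(-1.0) / math.factorial(k)


def largest_gap(n, max_k=10):
    """Largest pointwise difference between the two distributions."""
    return max(abs(offspring_pmf_wright_fisher(n, k) - poisson_one_pmf(k))
               for k in range(min(n, max_k) + 1))


if __name__ == "__main__":
    for k in range(5):
        print(k,
              round(offspring_pmf_wright_fisher(1000, k), 4),
              round(poisson_one_pmf(k), 4))
    for n in (10, 100, 1000):
        print("n =", n, "largest gap:", largest_gap(n))
```

The gap shrinks as n grows, which is why a Poisson number of offspring with mean one is the natural spatial analogue.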

25:46

It's very, very close to being just a Poisson. So now here, Mitch Gooding drew this for me. Here's a histogram which just tells you this is how many times I should expect to get zero offspring. This is how many times I should expect to get one offspring. So quite a lot of the time. And this is two offspring, and so on. But on average I produce one offspring. Okay. And I can't have my offspring all sitting on top of each other. That wouldn't be right.

26:11

And so I scatter them around the position of the parent according to a Gaussian distribution. So they're just distributed close by in a nice, symmetric way. So, understandably, Wright and Malécot thought that this sort of pattern would persist: that if they looked in their population in generation ten, it would still look quite a lot like this one, still look pretty uniformly spread.

26:34

And working on that assumption, they were able to do what was the equivalent in the 1940s of writing down those genealogical trees, telling us how individuals in the population were related to each other genetically, and how the correlations between the genetic types would decay with distance. They predicted it would decay approximately exponentially. But then in 1975, Joe Felsenstein noticed that actually the assumptions were inconsistent.

27:00

And this is Jerome's simulation, which he's probably rather embarrassed I'm using. So he did it for a lab meeting a long time ago. But here's the initial condition, and what he's done: he's working on a torus, and he's supposed that the population really does evolve according to the Malécot model. And after ten generations, this is what it looks like. So the population is still pretty much a thousand. It's not changed very much, but we're getting these white spaces developing.

27:25

After a hundred generations, the population is still pretty close to a thousand, actually. But we really are getting a lot of white space. The population is really clumping. By a thousand generations, the population is caving in and realising that mathematics cannot be defeated, because there is a theorem that says it has got to die out eventually, and it's noticing that it really ought to. So it's down to close to 300, but those individuals that are left are really clustered together.
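The clumping can be reproduced with a few lines of Python. This is a minimal sketch of the model as described, not the simulation shown in the talk: individuals live on the unit torus, each leaves a Poisson(1) number of offspring, and each offspring is displaced from its parent by independent Gaussian steps. The dispersal scale `sigma`, the grid used to measure occupancy, and all function names are assumptions chosen for illustration.

```python
import math
import random


def poisson_one(rng):
    """Sample Poisson(1) by multiplying uniforms until below exp(-1)."""
    threshold = math.exp(-1.0)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def uniform_start(n, rng):
    """n individuals scattered uniformly on the unit torus."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def next_generation(points, sigma, rng):
    """Every parent leaves Poisson(1) offspring scattered around it."""
    offspring = []
    for x, y in points:
        for _ in range(poisson_one(rng)):
            # Gaussian displacement, wrapped around the torus.
            offspring.append(((x + rng.gauss(0.0, sigma)) % 1.0,
                              (y + rng.gauss(0.0, sigma)) % 1.0))
    return offspring


def occupied_cells(points, grid):
    """Number of cells of a grid-by-grid partition containing an individual."""
    # min() guards against a coordinate that rounds to exactly 1.0.
    return len({(min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))
                for x, y in points})


if __name__ == "__main__":
    rng = random.Random(3)
    points = uniform_start(1000, rng)
    for generation in range(101):
        if generation in (0, 10, 100):
            print(generation, "population:", len(points),
                  "occupied cells of 400:", occupied_cells(points, 20))
        points = next_generation(points, 0.005, rng)
```

With these illustrative settings the population size should wander only slowly, while the number of occupied cells drops as the survivors concentrate into a few family clusters.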

27:51

Okay. Now, Felsenstein first observed this when he was working on the whole of the real plane, the whole of Euclidean space, and he said, oh, well, maybe it's just because I'm working with an infinite population. Real populations are finite. Let's look at a finite population. So he moved on to a torus, and then he noticed, well, unfortunately, on a torus, either the population blows up or it dies out.

28:13

It will die out if I just have on average one offspring, and if I slightly increase that, the population will just explode eventually. So that doesn't work. And then he said, well, let's suppose that somehow the total population size on my torus is exogenously specified, and he fixed it to be a thousand. But we can see from the simulation that that's still not working, actually, because the population size here didn't change very much.

28:35

It's still pretty close to a thousand, but we're still getting clumping, and Felsenstein realised that this was going to happen, and at that point he said, okay, I give up. And he wrote a paper which famously dubbed this problem the pain in the torus. And so the challenge that Nick was really throwing down to me was to solve the pain in the torus.

28:54

All right. We wanted to produce a model which was a little bit like the Malécot assumption, that the population would be distributed in space in a relatively uniform manner, but which actually had a stability to it so that we could write down genealogical trees in a consistent way. And we wanted that model to address one or two other issues. So we've already seen that genetic diversity is much, much lower than what you'd expect from census numbers.

29:19

That's another way of saying the effective population size is orders of magnitude different from the census population size. That's the same statement in other words. Another thing was, I said that Wright and Malécot observed that the correlations between genetic types would decay sort of exponentially with distance apart.

29:37

That's sort of true-ish over some scales. But then when you look over larger scales, the rate of exponential decay appears to have decreased and you get longer range correlations than seems reasonable. And a possible explanation for this is that the demographic history of many populations is really dominated by large scale events. So imagine I'm a population of plants living on a forest floor. Every hundred generations or so a forest fire sweeps through and completely wipes the population out.

30:04

And then it gets very, very rapidly replaced, recolonised. And that's going to lead to both a reduction in genetic diversity and large scale correlations in allele frequencies. Now, I am not claiming the model I am about to write down... I think it is quite a good model for forest fires. It's not a very good model for glacial maxima, for ice ages. But I like to show this slide just to remind me to give you a notion of the sort of timescales over which evolution is happening.

30:32

So remember, we said the effective population size for the human population, let's say if we just look at European humans, is about 20,000. So that means that my genetic composition is being determined over 20,000 generations, and the generation time, I don't know, we could argue over it, 20 years. So that means we're talking about timescales of hundreds of thousands of years. Right. At the last ice age, the last glacial maximum, Northern Europe was largely covered in ice.

31:03

Humans did not live there, and that was only 18,000 years ago. So from the point of view of genetics, that's just the twinkling of an eye. So these large scale events are really going to have influenced our genetic composition, and we can't simply ignore them. They really are going to have affected things. Okay. Another thing I wanted to say before showing you the model that we came up with is, when we derived the Kingman coalescent,

31:29

I emphasised that for large populations we'd never see three lineages merging in a single generation because of having a common parent. Because individuals had so many parents to choose from, you never got three of them choosing the same one on the timescale that we were looking at; you'd just get pairs choosing the same one. And that's just because N squared was much bigger than N.
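
The comparison of N squared with N can be made concrete with a short sketch. It assumes each individual picks its parent uniformly and independently from N candidates, as in a Wright-Fisher-type model; the function name and the value of N are hypothetical.

```python
from fractions import Fraction

def same_parent_probability(k, n_parents):
    """Chance that k given individuals all pick the same parent when each
    picks uniformly and independently among n_parents candidates."""
    return Fraction(1, n_parents) ** (k - 1)

N = 10_000  # hypothetical population size
pair = same_parent_probability(2, N)
triple = same_parent_probability(3, N)
print(pair, triple, triple / pair)
```

A given pair shares a parent with probability 1/N and a given triple with probability 1/N squared, so on the timescale of N generations, where pairwise mergers are seen, triple mergers are rarer by a further factor of N.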

31:50

In a spatial continuum, if I'm an individual and I'm looking around me for potential parents in the previous generation, that may not be true: N may not be very large. And it may be the case that N squared is really not so much bigger than N. And so I will see mergers of not just two ancestral lineages at a time, but three, four or five, any number. Okay. So here's the model, our first stab at a model.

32:15

We're going to suppose the population is just spread out in space in the same way as Wright and Malécot did. But now reproduction isn't going to be based on individuals. It's going to be based on events. And this is the key insight. So what we do is we throw events down. You could do this in discrete generations. Mathematically, it's convenient to do it in overlapping generations, as we say.

32:34

So we throw down one event at a time. And a reproduction event is just going to affect a region which is determined by the nature of the event. So here it's just a disc, centre x and radius r. Now, if the region I throw down is empty, I don't do anything, because there's no population there to reproduce. But if it's not empty, first, among all the individuals living there, I choose a parent uniformly at random.

32:59

The first thing to notice is if I'm living in a very crowded region, the odds of being chosen as a parent get to be very small. So my reproductive success, if I live in a very crowded region, is not big. It's very small. On the other hand, if I'm in a very sparsely populated region, I will be picked. Okay. And that's the key. That's what prevents that clumping that we saw from Felsenstein's observations. Okay, so I've chosen this individual. I'm going to kill a proportion of the population.

33:26

In this example, I allowed the parent to die. I don't have to. And I replace them. So they are killed with some probability that I just specify, independently. So each of them flips a coin that comes up heads with probability u, and if it comes up heads, they die. And then I replace them with offspring, and they're scattered in the same way as the parental population is scattered, just by picking points uniformly at random in the region. And the distribution of the number of offspring:

33:52

It's random, but it's chosen to roughly replenish the population. So the population density should be, roughly speaking, constant. Okay. And then we remove the dead individuals, and that's my new population. Okay. So how does this work as a model? It's obviously very crude. It's a little bit like a Wright-Fisher model. Does it work at all? Well, it overcomes the pain in the torus. It does have a nice stationary distribution with populations distributed uniformly across space.
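
The event just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the model as implemented for the talk: the population lives on a unit torus, the disc radius is below one half, and the number of offspring is Poisson with mean u times a target density times the disc area, which is one hypothetical way to "roughly replenish" the population. All names are made up for the example.

```python
import math
import random

def torus_distance(p, q):
    """Distance between points p and q on the unit torus [0, 1)^2."""
    dx = min(abs(p[0] - q[0]), 1.0 - abs(p[0] - q[0]))
    dy = min(abs(p[1] - q[1]), 1.0 - abs(p[1] - q[1]))
    return math.hypot(dx, dy)

def uniform_in_disc(centre, radius, rng):
    """Uniform point in the disc (radius < 0.5), wrapped onto the torus."""
    rho = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return ((centre[0] + rho * math.cos(theta)) % 1.0,
            (centre[1] + rho * math.sin(theta)) % 1.0)

def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold, k, product = math.exp(-mean), 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def reproduction_event(population, centre, radius, u, density, rng):
    """Apply one event to a list of (position, type) pairs."""
    inside = [ind for ind in population
              if torus_distance(ind[0], centre) <= radius]
    if not inside:
        return population  # empty region: nothing happens
    parent = rng.choice(inside)  # uniform over everyone in the disc
    # Individuals outside the disc always survive; inside, each dies with probability u.
    survivors = [ind for ind in population
                 if torus_distance(ind[0], centre) > radius or rng.random() >= u]
    # Assumed offspring law: replaces, on average, what was killed.
    n_offspring = poisson(u * density * math.pi * radius ** 2, rng)
    offspring = [(uniform_in_disc(centre, radius, rng), parent[1])
                 for _ in range(n_offspring)]
    return survivors + offspring
```

Because the parent is chosen uniformly among everyone in the disc, an individual in a crowded disc is rarely chosen, which is the mechanism that suppresses clumping.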

34:18

It allows us to incorporate large scale extinction recolonisation events very easily and it also is easy to extend to include things like natural selection. So for example, I might select the parent not just uniformly among those in the region, but according to their genetic type weighted by their genetic type. Or I might choose individuals to die according to their genetic type.

34:38

So it's very easy to adapt to natural selection, and we can write down the distribution of those genealogical trees, the things that the geneticists are inferring. The problem is, it's a bit of a mess. So the expressions we write down are extremely complicated. But for mathematical reasons, at least if the population is relatively dense, that mathematical mess can all be approximated by a single model.

35:04

And the way that we think about that single model is that what we're going to use as an approximating model is a model for sampling probabilities. So my model answers the question: if I were to sample an individual from the point x at time t, what is the probability that it is of type A, whatever type A might be? So that's the question that our model will let us answer. And to explain how it works,

35:30

it's convenient just to forget space for a moment, because we're just going to adapt a non-spatial model to a spatial model. So here's how it's going to work. Reproduction is, again, going to be based on events, exactly as it was before. So an event is now specified by a time, that's the time when the event happens, and an impact, this u; that's the proportion of the population that's going to be affected by the reproduction event.

35:52

And let's have a look at this event and see what it does to our population. So immediately before the event, this is how the different types are distributed. I've just used three different colours to represent three different types, and I've got to select a parent. So I'm going to select my parent uniformly at random from the population. So I just threw a point uniformly at random on [0, 1] here, and it happened to land here in this cyan region.

36:12

So the type of the parent is going to be cyan. Now a proportion u of my population is going to be killed. The remaining one minus u survive. So this band here is one minus u times the width of that one. That's one minus u times that one. And this bit is one minus u times that one. And I replace everyone I killed with offspring of this chosen type.

36:34

And so in this example, they're all cyan. And the nice thing about this model is it's very easy to write down how the ancestral lineages behave. So here I've taken a sample from my population. The sample is of size five and I'm wondering how it's going to behave. So what's going to happen as I trace backwards through this event? Well, as it happened, these two guys fell in the region of the population that survived the event.

36:58

And so they just survive. Nothing happens to them. They're still distinct lineages in the previous generation. But these three all fell in the portion of the population corresponding to offspring. And so we know they had a common parent. So here we have an example of not a pairwise merger, but a three merger, and they merge into this common ancestral lineage. And it's very, very easy to write down mathematical expressions for the probabilities of events like this.
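
Both directions of the non-spatial event can be written down directly. The sketch below assumes the update described above: every type's frequency is scaled by one minus u, the parent's type gains u, and, looking backwards, each sampled lineage is independently an offspring with probability u. The type names other than cyan and all function names are hypothetical.

```python
import random
from math import comb

def forward_event(frequencies, u, rng):
    """One reproduction event acting on type frequencies (a dict summing to 1).
    Returns the parent type and the updated frequencies."""
    types = list(frequencies)
    # The parent is a uniform draw from the population, so its type is
    # drawn with probability equal to the current frequency.
    parent = rng.choices(types, weights=[frequencies[t] for t in types])[0]
    # A proportion 1 - u of each type survives; the killed mass u is
    # replaced by offspring of the parent's type.
    updated = {t: (1.0 - u) * f for t, f in frequencies.items()}
    updated[parent] += u
    return parent, updated

def merger_probability(n, k, u):
    """Probability that exactly k of n sampled lineages are offspring of the
    event, and so share the parent, each independently with probability u."""
    return comb(n, k) * u ** k * (1.0 - u) ** (n - k)

before = {"cyan": 0.5, "magenta": 0.3, "yellow": 0.2}
parent, after = forward_event(before, 0.4, random.Random(0))
print(parent, after)
print(merger_probability(5, 3, 0.4))  # three of five lineages merge
```

When k is zero or one no lineages merge; for k of two or more the k offspring lineages become a single ancestral lineage.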

37:24

Okay. So the idea of our approximation to the model that I wrote down just now, and it can be obtained genuinely as a limit of that model, is that we do the same thing in space. So now we're not just specifying the distribution of types in a single region, as we were in the non-spatial example just now.

37:43

But for each point z and each time t, I'm telling you the distribution of the type of an individual sampled from the population at z at time t. So if I sample an individual at time t from this point, what's the probability it's type A? That's the question I'm answering. Okay. And it's much the same as what we did before. Reproduction events affect bounded regions. Those regions are now never empty, so I don't need to worry about empty space. I've got to sample a parent.

38:08

So first I'm going to sample a location for the parent. So this point z was just uniform. Then I choose a type according to the distribution there, and it came out to be red, much as it did on the previous slide. And then I update for everybody in this region: I kill a proportion of individuals and I replace them by offspring of this chosen type.

38:27

And this is the fancy mathematical way of writing it. But all it's saying is that everywhere in this region I delete, just slice off, a proportion of the population and replace it by individuals of this type. I can write down a backwards in time model that corresponds to that, which tells me about these genealogical trees. Because if I'm just a single individual in a sample, I want to know how the ancestry of the individual in my sample evolves backwards in time.

38:51

I wait until the first time my individual is in a region that is affected by one of these reproduction events, and then it's got a probability u of having been an offspring of the event. Right. And if it is, then it's going to have to jump to the location of the parent. If it's not, nothing happens. It just keeps going. And what's the location of the parent? Well, it was just sampled uniformly from the ball.

39:10

So it's very easy mathematically to write down expressions for the way these ancestral lineages move around in time. And if a region happens to cover a whole collection of ancestral lineages, as it did here, the idea is these green guys are ancestral lineages. Lineages outside the region can't be affected, but inside, each lineage flips a coin that comes up heads with probability u. If it comes up heads,

39:33

this was an offspring of the event, and these three offspring must all be descended from the common parent. And the location of the common parent was uniform across the ball. So individuals coalesce, or can coalesce, when they're within the same region that's affected by an event. So that gives us a backwards and forwards in time model. Okay. So it looks pretty crude and you probably think it's not going to have anything to do with data.
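
The backwards-in-time rule just described can be sketched as follows. It is a minimal illustration assuming lineages are points in the plane and the event is a disc with a given centre, radius and impact u; the function name and arguments are hypothetical.

```python
import math
import random

def backward_event(lineages, centre, radius, u, rng):
    """Trace a list of lineage positions (x, y) back through one event.
    Lineages in the disc are offspring with probability u; all offspring
    jump to one shared parent location and so coalesce."""
    def in_disc(p):
        return math.hypot(p[0] - centre[0], p[1] - centre[1]) <= radius

    untouched, n_offspring = [], 0
    for p in lineages:
        # Lineages outside the disc are never affected.
        if in_disc(p) and rng.random() < u:
            n_offspring += 1
        else:
            untouched.append(p)
    if n_offspring == 0:
        return untouched
    # The parent's location is uniform on the disc.
    rho = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    parent = (centre[0] + rho * math.cos(theta),
              centre[1] + rho * math.sin(theta))
    return untouched + [parent]
```

A single offspring lineage simply jumps to the parent's location; two or more offspring lineages are replaced by one lineage at that location, which is a merger of any size.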

40:00

But remember that Kingman worked over very, very large scales, even though the Kingman coalescent was based on this very crude Wright-Fisher model. So is it the case that if we look over kind of intermediate scales, somewhere between the scale on which Kingman works and very local scales on which this model is clearly rubbish, it might work? And when you look at allele frequencies, you think, oh, no way.

40:23

So this is just what a pattern of allele frequencies might look like after we've thrown down 50-odd events. And it just doesn't look realistic. But if I look over larger scales, maybe it will. So here's something to try to convince you that there might be something in it. So this is a horrible bacterium, actually, Pseudomonas aeruginosa. It's found in the lungs of cystic fibrosis patients, amongst other things.

40:46

And this picture is from Kevin Foster's lab in the zoology department in Oxford. And I find this very beautiful. He has an incredibly high resolution microscope which allows him to observe that. You can just about pick out, I hope, these individual bacteria. It's just an extraordinary picture, but we're not trying to model this. It's obvious that what's going on is this is the edge of a bacterial colony as it's evolving.

41:10

So this is just a snapshot of it. This is empty space and these are the bacteria. And obviously, what happens next depends on the particular configuration of these bacteria and their exact rod shape. But that's not what we're trying to capture. Let's zoom out a bit and look at that bacterial colony on a slightly larger scale. So this is more like visible scales. And what we see is emerging structure.

41:31

So what Kevin's done here is he's taken two populations of the bacterium, one blue and one green, but they are equally fit. At the beginning of the experiment, he's just mixed them all up and he's put a little droplet onto his nutrient plate, and you can sort of still see the vestiges of it there. And he's allowed it to grow. And as it grows, we see these sectors developing, and these sectors are kind of a proxy for relatedness amongst individuals in the population.

41:58

So, for example, all these individuals in this blue sector will descend from some common ancestral bacterium back here somewhere. Okay. So what if we do the same thing for our model? We're obviously picking out the same basic structure, the same sectoring. And again, this is a simulation I owe to Jerome Kelleher. So we've done exactly the same thing. We've put a mixture down at time zero. We've allowed it to evolve. Fortuitously,

42:26

we've chosen the same colours as Kevin, and you see sectors of much the same sort of pattern developing. And these sectors are ubiquitous. It's not just Pseudomonas that does it. This is yeast. This is from a paper of Oskar Hallatschek and his co-workers from 2007. By changing one parameter in our model, or the ratio of two parameters, we can reproduce pictures like this.

42:47

And I'm sorry, I forgot to email Jerome and ask him for one so I don't have one to show you, but I assure you that we can also reproduce these narrower sectors that are characteristic of yeast. Okay, so with that sort of reassurance, we decided to press on and try and understand a bit more about spatial structure. And I wanted to tell you about some very recent work on things called hybrid zones. So now I have to tell you some biology and show you pretty pictures again.

43:11

So as Nick Barton's got older, and 20 years ago he could just about catch up with Podisma pedestris, which crawls very slowly around the Maritime Alps, he's realised there are things called plants and they hardly move at all. So this is again Nick, who's a good friend, so I can tell you things about him. This is Antirrhinum, and these Antirrhinum live in the Pyrenees.

43:33

Obviously they don't live in some sort of damp field in the West Midlands, they live in the Pyrenees and they exhibit what's called a hybrid zone. So a hybrid zone is when you get two genetically distinct populations coming together and at the interface between them, they're sufficiently similar genetically that they can reproduce and hybridise. But the hybrids are not as fit as the pure populations were.

43:58

And these are ubiquitous. And if you think about plant populations in the last glacial maximum that we talked about before, a lot of plants were sort of pushed back into refugia. After the Ice Age, they started to expand again, and when they came back together, they were sufficiently genetically distinct that you could distinguish them and that when they interbred with one another, the hybrids were less fit.

44:18

And so in this particular hybrid zone, on one side of the zone the Antirrhinum are yellow, and on the other side they are some pinkish purple colour. Okay. And hybrid zones are maintained by a balance between the desire of the plants to spread their offspring out and this selection against the hybrids. I said they're ubiquitous. Here are a couple of textbook examples. So the one you see in every textbook is this one up here. This is mice, with Mus musculus and Mus domesticus.

44:44

So in the northeast, mice take this form, if you catch one in your larder. And down here we have Mus domesticus, and you can probably see much better than I can that there is a narrow hybrid zone in this colour between the two populations. Here's another one. I like this one. This is the fire-bellied toad against the yellow-bellied toad. And they have a really wacky hybrid zone. So you can see the hybrid zone here. It goes all the way through Europe.

45:15

And what this picture does, what this slide's done, is it's focussed on this little bit of this hybrid zone where a lot of experiments have been done, where the hybrid zone is almost flat. It's about 20 kilometres wide and they've plotted allele frequency. So the frequency of the genetic type that gives you this toad versus the frequency of the genetic type that gives you this toad. And these are data points that they've plotted. Now in a region where...

45:37

Well, okay, if you don't believe in genetic drift... I do believe in genetic drift. But if you don't believe in genetic drift, then you can model these hybrid zones using this differential equation. It's a special case of what's called the Allen-Cahn equation. We model it with this plus noise in some sense, so plus some genetic drift term, but it's easier to write it down in this example.

46:00

And what that predicts is that actually across the hybrid zone there should be a curve, and the curve would look a bit like one plus tanh over two. And those of you who can remember what one plus hyperbolic tangent over two looks like will think, gosh, actually, that's not bad. I mean, that's the right sort of shape. But it predicts a relatively stable hybrid zone.
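
The "one plus tanh over two" shape can be checked numerically. The equation on the slide is not in the transcript, so this sketch assumes a standard one-dimensional form, p_t = p_xx + s p (1 - p)(2p - 1), with diffusion constant one and a hypothetical selection strength s; under that assumption the stationary cline is (1 + tanh(sqrt(s) x / 2)) / 2.

```python
import math

def profile(x, s):
    """Candidate stationary cline: (1 + tanh(sqrt(s) * x / 2)) / 2."""
    return 0.5 * (1.0 + math.tanh(math.sqrt(s) * x / 2.0))

def residual(x, s, h=1e-3):
    """p''(x) + s p (1 - p)(2p - 1), with p'' from central differences."""
    p = profile(x, s)
    second = (profile(x + h, s) - 2.0 * p + profile(x - h, s)) / h ** 2
    return second + s * p * (1.0 - p) * (2.0 * p - 1.0)

s = 0.5  # hypothetical selection strength
worst = max(abs(residual(x / 10.0, s)) for x in range(-100, 101))
print(worst)  # close to zero if the profile is stationary
```

The residual vanishes up to discretisation error, so the tanh curve neither advances nor retreats, which matches the remark that the equation predicts a relatively stable hybrid zone.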

46:19

But the question we set out to ask was, well, okay, that's what it looks like now, but how is this hybrid zone itself going to evolve with time? If you zoom out, I mean, this one's 20 kilometres wide. You don't have to be very far away before it looks like a sharp interface.

46:33

And so with Nick Freeman, who's now a lecturer in Sheffield, and Sarah Penington, who is about to take up a research fellowship in mathematics here in the Institute and at New College, we have shown that, at least if we start from sufficiently regular initial conditions, to make myself mathematically honest, as we zoom out the hybrid zone becomes sharp, and that's whether we use this deterministic differential equation that the guys use,

46:58

and they already knew this result for the deterministic equation, or whether we also add noise. So let's understand what happens as we zoom out. So as we zoom out, the hybrid zone becomes sharp; that's sort of clear, because it was only 20 kilometres wide in the first place. But how does it move? It evolves according to something called curvature flow. And to understand curvature flow: here is a hybrid zone, a putative hybrid zone. Roughly speaking, for the curvature at this point,

47:23

you take the biggest circle that you can that just fits here, doesn't cross over, it just fits. And the curvature is one over the radius of that circle. And here the circle's on the other side, and the curvature is one over the radius of this circle. So the curvature here is less than the curvature there. And it also has the opposite sign, and curvature flow will push this point inwards and this point outwards. Let's see it in action.

47:50

Mat Dunlop is a student in the University of Warwick and he very kindly produced an illustration of curvature flow for me. Now let me just stop it so you can see the initial condition. I did not choose the name of this file. So you can see the Batman symbol very quickly degenerates into a sausage. Up here, where it's almost flat, nothing much happens. And these ends are pushing in and pushing in. And then there's still not much happening.

48:21

But what we're going to see is that eventually there's not going to be any flat bit left, because these ends have pushed in so far. And this will become circular. In fact, any convex region would eventually become circular. And as this shrinks, it's going to go faster and faster and faster, because as the circle gets smaller, the curvature is getting bigger, so the curvature flow goes faster. And it's gone. Okay.
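
The speeding-up of a shrinking circle has a closed form. Assuming the normalisation in which the boundary moves inwards at speed equal to its curvature, a circle of radius r obeys dr/dt = -1/r, so r(t) = sqrt(r0^2 - 2t) and the circle vanishes at time r0^2 / 2. The function names and the starting radius are hypothetical.

```python
import math

def radius_at(t, r0):
    """Radius of a circle shrinking under dr/dt = -1/r, or 0.0 once it has vanished."""
    remaining = r0 ** 2 - 2.0 * t
    return math.sqrt(remaining) if remaining > 0.0 else 0.0

def inward_speed(t, r0):
    """Speed of the boundary, equal to the curvature 1/r."""
    r = radius_at(t, r0)
    return 1.0 / r if r > 0.0 else math.inf

r0 = 1.0  # hypothetical starting radius
for t in (0.0, 0.25, 0.45, 0.49):
    print(t, radius_at(t, r0), inward_speed(t, r0))
```

The speed grows without bound as the radius approaches zero, which is the "faster and faster" collapse seen at the end of the animation.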

48:42

So that's what mean curvature flow does for you. So what's going to happen to our hybrid zones? What's going to happen if we put some noise in? What's going to happen if we use our spatial Lambda-Fleming-Viot model and put noise in? Well, this is a video that Nick Freeman did. So you can see again, he's started with something a little bit like a Batman symbol, but he didn't have the imagination to do that.

49:13

And you can see, again, it's pushing out to be sort of sausage shaped, but there's a bit of noise here. But you could imagine if you looked at this from far enough away and had bad enough eyesight... So for me, this looks a lot like curvature flow. I'm going to speed it up a bit because we're getting to the end. So let's zoom along a bit. And you see it really is doing what curvature flow did for the Batman symbol. It's gone almost round, and

49:40

if I keep going, it gets smaller. Okay. Now, Jerome did another one for me. Jerome did one that looks good for the Antirrhinum, so it's even the right colours. And up here he's taken something which is nice and wiggly, so that we can see that these bits with bigger curvature are going to disappear. This is almost flat and it stays almost flat. So what this zone is trying to do, and it's more like the shape of the zones that we see in other natural populations,

50:10

is it's trying to get to be a straight line. Okay. So we can expect that in natural populations, approximately, at least if we look over large scales, things are going to evolve according to this curvature flow. It's probably the slowest example known of curvature flow, but it's rather cute. I think it's rather nice that we can see how these things will move. We have not tested it yet against data, because there simply isn't going to be enough.

50:42

It's going to move very, very slowly. So we've assumed that for those two populations that have come back together and then interbred, the hybrids are not as fit as the purebreds. But we've assumed that the pure populations are equally fit. In fact, very often you can expect that the pure populations will not be equally fit. And if the populations are not equally fit... So I'm just showing off now, writing down random differential equations.

51:06

Of course, in the equation we wrote down before, the Allen-Cahn equation, this a was equal to one, and that made this symmetric about a half. So that said, the frequency of different alleles in our population was symmetric about a half. And when it's bigger than a half, this term pushes me towards one, and when it's less than a half, this term pushes me towards zero. And so this gives me the competition between dispersal and selection.

51:34

If a is not equal to one, that's what happens when the populations are not equally fit, then we get something called the Cahn-Hilliard equation, which I can expand out as having a symmetric term and this term here, which is no longer pushing me towards zero and one. And this we can expect to model that situation where things are not equally fit, and in fact we add noise to it.

51:59

And what happens in that situation is that the fitter type is going to spread in a travelling wave, and it's happening much faster than curvature flow. It's going to spread across the range of the species as a travelling wave on a much faster timescale. Now why am I showing you this? Actually it's interesting, because for any PDE people, it is interesting that if this term were not here, then we get just pure selection.

52:22

So for a haploid population, as we call it, so for populations where there's only one copy of each gene in each individual, and we're just saying that one type is fitter than the other type, this term wouldn't be here and this travelling wave would still exist, but it would spread at a rate roughly the square root of s times a minus one. For these populations, where we've also got the selection against hybridisation,

52:44

the travelling wave will travel at a speed proportional to s times a minus one, so a completely different speed, which is because we have a pushed wave instead of a pulled wave. So that was just proving I've read some mathematics; that was for Alan's benefit. So why do mathematicians like equations like this? We like equations like this and we like models like this because we recognise them as having come up elsewhere.

53:06

So this we've come across for biological reasons. It's actually an equation which has been studied extensively, especially in physics. And we like that kind of universality where particular models arise in lots of different contexts. And the only difference here is that the form of the noise we take is really very different from the form of the noise that the physicists use.

53:24

And the other thing that mathematicians like is they always like an excuse to mention coffee, and especially an excuse to drink coffee, and even more so an excuse to spill coffee. And it turns out that this equation will model a travelling front, and the fluctuations in that travelling front should be roughly the same as the fluctuations in the travelling front when you spill your coffee all over your exam scripts. So on that note, I think I'll stop. Thank you very much.

Transcript source: Provided by creator in RSS feed: download file
