In this episode, Andy Aschwanden and Doug Brinkerhoff tell us about their work in glaciology and the application of Bayesian statistics to studying glaciers. They discuss the use of computer models and data analysis in understanding glacier behavior and predicting sea level rise, and a lot of other fascinating topics. Andy grew up in the Swiss Alps and studied Earth Sciences with a focus on Atmospheric and Climate Science and Glaciology.
After his PhD, Andy moved to Fairbanks, Alaska, and became involved with the Parallel Ice Sheet Model, the first open source and openly developed ice sheet model. His first PhD student was none other than Doug Brinkerhoff. Doug did an MS in computer science at the University of Montana, focusing on numerical methods for ice sheet modeling, and then moved to Fairbanks to complete his PhD with Andy. Why Fairbanks?
Because it's there that he became an ardent Bayesian, after, quote, seeing that uncertainty needs to be embraced rather than ignored, end quote. Doug has since moved back to Montana, becoming faculty in the University of Montana's computer science department. Thank you so much to Stephen Lawrence for inspiring me to do this episode. This is Learning Bayesian Statistics, episode 105, recorded March 7th.
Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is Laplace to be: show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's
learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, and best Bayesian wishes to you all. Andy Aschwanden, Doug Brinkerhoff, welcome to Learning Bayesian Statistics. Thanks for having us. Thanks, Alex. Yeah. Yeah. Thank you. Thank you so much for taking the time. Andy, thank you for putting me in contact with Doug.
I'm actually happy to have the both of you on the show today. I have a lot of questions for you, and yeah, I love that we have the applied side with you, Andy, and Doug is more on the stats side of things. So that's going to be very fun, I always love that. But before that, yeah, let's dig into what you do day to day. How would you guys define the work you're doing nowadays, and how did you end up working on this? Maybe let's start with you, Andy. Well, often when people hear the word glaciologist, they assume I should be jumping around on the glacier on a daily basis. Some of my colleagues do that.
I've done it for years, but these days my job has become a bit more boring in that sense, in that most of the time I spend in front of my computer developing code for data analysis, data processing, trying to understand what's going on with glaciers. So it's not as glorious anymore as maybe I want it to be. Is there a particular reason for that? Is it a trend in your field that now more and more of the work is done with computers? I think there is certainly a trend that...
More stuff is being done with computers. In particular, we just have more data available, you know, starting with the dawn of the satellite era, and now with much denser coverage from different SAR and optical sensors on satellites. So that has created the need for doing more computing. Personally, it just happened. I did not, you know, have a master plan going from collecting field observations on a small glacier to doing large-scale modeling. My career somehow just morphed into that.
Hmm. Okay, I see. And well, I'm guessing we'll talk more about that when we start digging into what you guys do. But Doug, yeah, can you tell us what you're doing nowadays and how you ended up working on that? Yeah, sure. I'm in a computer science department now, so obviously I spend a lot of time in front of a computer as well. But similarly, I got into this notion of understanding glaciers from a mountaineering type perspective.
That's what I was interested in and got into geosciences from there and then took this sort of roundabout way back to computers by sort of slowly recognizing that they were a really helpful tool for trying to understand what was happening with these systems. They definitely are. I remember that's personally how I ended up working on stats. Ironically, I wasn't a big fan of stats when I was in college. I loved math.
and algebra and stuff like that, but stats I didn't like, because, you know, we were doing a lot of pen and paper computations. So I was like, I don't understand, I'm just bad at computing personally, so why don't computers do that, you know? And then afterwards, randomly, I started working on electoral forecasting and discovered you could simulate distributions with the computer, and the computer was doing all the tedious,
error-prone, and boring work that I used to not like at all. And then I could just focus on thinking about the model structure and making sure the model made sense, what we can say with it, what the model cannot tell you, things like that. That was definitely super interesting. So yeah, that's also how I ended up working on stats, ironically. I had a similar path.
I didn't take a stats class until I was in my PhD, when I watched Stan or one of these other MCMC packages work to answer some really interesting questions that you couldn't answer with the type of stats that people told you about in high school. And that became much more intriguing to me after seeing it applied to ecological models or election forecasting or any of these things that you need a computer to assist with inference for.
Yeah, for me, taking a stats class as an undergrad student in the first or second year, I had the impression that the stats department took great pride in making the class as inaccessible as possible, just going through theorems and proofs and avoiding any connection to the real world, anything that would make it useful for us. I also got into it really late, through Doug mainly, where I thought, you know, this kind of makes sense. That's a good method
to use to answer a problem I care about. Before that, we were just given hypothetical problems that I had no connection to. Yeah. Yeah. Yeah, definitely makes sense, and I resonate with that a lot. And so today, what are the topics you focus on? Are you both working on the same topics, or are you working on slightly different or completely different topics in your field?
Because I have to say, and that's also why I really enjoyed this episode, I really don't know a lot about glaciology and what you guys are doing, so it's going to be awesome. I'm going to learn a lot. Yeah, well, we work together a lot. We both have our own independent projects, but I think we work together a lot.
And I would say, and you can tell me if you don't agree with this, Andy, that I would characterize the work that we both do, separately and together, as trying to make glacier evolution forecasts that actually agree in a meaningful way with the observations that exist out there in the world. And that sounds like sort of an obvious thing to do.
Like, yeah, if you have a model of glacier motion that maybe you use to predict sea level rise or something like that, it ought to agree with the measurements that people have taken, those people that are jumping around on the glacier that Andy mentioned before. But for a long time, and perhaps now as well, that hasn't been the case. And so we're working to make our models and reality agree as much as possible. Andy?
I agree with Doug. We do similar things, but I see this as a symbiotic relationship, where the sum of our strengths has led to ways of thinking and breakthroughs that we may not have achieved just on our own.
In the past 10 years, I've been focusing a bit more on model development, on the development of ice flow models. And as Doug said, we want to make them agree with observations as well as we can, within observational uncertainties.
And I didn't have the background in statistics to make that happen, whereas Doug has both the insight into how ice flows and the modeling aspect, and also a much deeper understanding of statistics in general and Bayesian statistics in particular. And we had a lot of conversations trying to converge on an approach to make that happen in a meaningful way.
Because these days, if you go and skim through our literature, almost every third paper somehow, somewhere mentions machine learning or artificial intelligence or something. It's just a buzzword. It's a big hype. Most of the time, if you dig deeper, all you'll find is people doing some multilinear regression and calling it machine learning. And that's the best case.
In some cases, I think methods are being used in places where we haven't been able to demonstrate that this is the right place to use them. And we are trying to spend time figuring out where we can use these modern machine learning methods in a meaningful way that actually drives science and helps us answer real-world questions. Yeah, yeah, very good points.
And something I've seen also in my experience is that, well, the kinds of models and methods you can use are also determined by the quality and reliability of your data. So I'm actually curious, Andy, if you can give us an idea of what data look like in your field. How big, how reliable are they? I think that's going to set us up nicely to talk about modeling afterwards. Sure. So, to figure out, you know, how much a glacier or an ice sheet is going to melt...
There are a few things you need to know. If you think about it in terms of partial differential equations, you need initial conditions and boundary conditions to solve those equations. But you also have processes besides those PDEs that are a surrogate for physics that we don't understand yet. Those have parameters, and often we don't know the values of those parameters very well. So we come in with a lot of different uncertainties. Now I forgot what I meant to say.
Sorry, can you repeat the question? Yeah, I was just asking what the typical data look like in your field. How big are they? How reliable are they? That's usually very important for understanding what you can then apply as models. Yeah, of course. So, if you look at the different conservation equations that we're trying to solve, conservation of mass, momentum, and energy: for solving conservation of mass, we need to know the shape of the glacier, the geometry.
Now, with modern satellites and airplanes, it's relatively easy to measure the surface of the ice relatively accurately, and we can construct accurate digital elevation models out of that. The tricky part is trying to figure out how thick the ice is, for which we need ground-penetrating radar or seismic methods. All of them have large uncertainties. And radar sounding currently cannot be done from space.
So figuring out the thickness at every point of the Greenland ice sheet or Antarctic ice sheet basically requires you to fly a plane. And that's a lot of effort and, of course, costs a lot of money. So you can only do that in targeted areas.
And in the last 10 years, colleagues have developed methods trying to combine those observations from ground-penetrating radar with what we understand about how ice flows, that it obeys the laws of physics and conservation of mass, to come up with smarter ways to interpolate the data beyond just doing kriging. Now, ice thickness, I'm mentioning that first because it is the most important thing. It defines how the ice flows. It defines the surface gradient.
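For readers who want the equation behind that idea, the standard depth-integrated statement of mass conservation (standard glaciological notation, not taken verbatim from the conversation) is:

$$\frac{\partial H}{\partial t} = -\nabla \cdot \left( \bar{\mathbf{u}} H \right) + \dot{m}$$

where \(H\) is the ice thickness, \(\bar{\mathbf{u}}\) the depth-averaged horizontal velocity, and \(\dot{m}\) the climatic mass balance (snowfall minus melt). Mass-conserving interpolation schemes use this balance, together with observed surface velocities, to fill in thickness between radar flight lines more consistently than kriging alone.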
And at the end of the day, ice more or less flows downhill under gravity. So if you don't know how thick the ice is, you're off to a really bad start. Reliable ice thickness measurements are key. We've made a lot of progress in the last 15 years. NASA spent approximately $100 million on a project called Operation IceBridge, which among other things measured ice thickness, flying over Greenland every spring for multiple weeks. And that has given us a much more detailed picture
of where the ice is, how thick it is, and how fast it flows. And you can show that if you use these newer data sets compared to older ice thickness data sets, the models get substantially better. It also gives us an avenue to test, whenever more observations are added, whether the model keeps getting better. You can go look into individual glaciers, and you may see the model is still performing poorly, and you may find, well, there is not much data there.
So hopefully at some point someone goes out and can fly that glacier. This is the main uncertainty that we're still struggling with, despite 10 or 15 years of effort. Now, the second one, which is very important and where it's really hard to quantify the uncertainties, is the climate forcing. In order to predict how a glacier flows and how much it melts, you need to know how much it snows. And this is a tough topic.
Both Greenland and Antarctica are very large, but their topography varies over short spatial scales, which requires high-resolution climate models. Those are expensive, a lot more expensive to run than an ice sheet model these days. So they can usually do one simulation of the past 40 years and that's it. There is basically no uncertainty quantification there, up to maybe recently or right now. I think with machine learning, things may start to change there too.
So we have products from observations assimilated into those climate models, but we often don't know how certain or uncertain they are, because what we have is spot measurements. There might be a couple hundred spot measurements in Greenland or Antarctica where you can calibrate or validate your climate model. So that's a big uncertainty. And I've been speaking for a long time. Maybe Doug wants to chime in and add something.
Yeah. I mean, sometimes I think when you're working with these really large coupled geophysical systems, the line between model result and data product can become a little bit blurred. So what do we have? We have direct surface measurements from a variety of sources, maybe over the past 30 years, with varying degrees of spatial resolution, like Andy said. It's gotten a lot better in the satellite era, of course.
We've got these sparse measurements of thickness that we don't completely understand the uncertainties for, but they're pretty accurate. They are certainly not everywhere, though, with respect to the total area of glaciated ice on Earth. What else do we have? Yeah, we've got a couple of snow pit measurements, or shortwave radar that can measure snow accumulation in a few places on Earth.
We have optical satellite observations that can often be leveraged into understanding the displacement of the glacier surface. And there are a couple of other, somewhat more esoteric products that we can hypothesize about using to constrain glacial ice flow, but we haven't quite gotten there yet: things like the distribution of dust layers inside the ice, which you can also back out from some of these radar observations.
But taken together, these observations, the data that we have, occupy large amounts of space on a hard drive. In that sense, they're big: there's a ton of individual measurements out there. But relative to the magnitude of the system that we're looking at, and the timeframes over which we would really like to constrain its behavior, the data is super small. Okay. Okay. I see. Yeah, thanks guys, that's super important for setting up that background, that context.
Actually, Doug, you're the Bayesian statistician of the two, if I understood correctly. So can you tell us why Bayesian statistics would be interesting in this context? Let's start with that. What would Bayesian statistics bring in this context, in this approach to studying glaciers? Yeah, sure. So, okay. I kind of think that most scientific problems can be cast in a probabilistic way.
And this is certainly true for glaciological modeling, where what you want to do at the end of the day is take some assumption that you have about the way that the world works, right? A model. And you want to use that model to make a prediction about the future or about something that you haven't observed. But you would also like to ingest all of the information that you have collected about the world into that model, so that everything ends up remaining self-consistent.
And that ends up being a really helpful paradigm in which to operate for glaciology. So typically, you know, the large-scale goal, what everybody begins their proposals and papers with, is: glaciers are important for predicting sea level rise.
And to predict sea level rise, what we need to do is take an ice sheet model, an ice physics model, and run it into the future, say 200 years or something like that, and say, well, there was this much ice to start with, there's this much ice now, and that difference is going to turn into sea level rise. So that's one part of it. But we don't have enough information about how these systems work to just make one prediction, right?
Like we don't know the bed in a whole lot of places like Andy was saying. And so the sensible approach to dealing with that is to say, well, let's put a probability distribution over the bed and let's sample from that probability distribution and make a whole lot of different predictions about what sea level rise is going to be based on all of those different potential realizations of how the bed of the glacier might look. And of course, it's not just the bed that's uncertain.
There's a bunch of other stuff as well. And so that's a very Bayesian way of looking at probability, right? I mean, you can hardly escape the Bayesian paradigm in geophysics, because we don't have the capacity for repeat samples. All we have is the one data point, right? So no replicates, no limiting behavior. And so, you know, there's this notion of ensemble modeling.
That's what we would call this notion of randomly sampling from potential model inputs and running into the future. That's a super Bayesian idea to begin with. And then the other step in this process is to say, okay, well, I actually want to constrain what I think the bed is, based on these observations that I have. Which is to say, I'm going to start with a big pie-in-the-sky view of what my bed elevation could be, maybe something
between 5,000 meters above sea level and 10,000 meters below. But then I'm going to take all of these radar observations that I have and whittle down the space of possible ways that the bed could be. And that is nothing if not posterior inference, right? Yeah, yeah. Yeah, for sure. That's super clear.
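To make the ensemble idea concrete, here is a minimal Python sketch. The one-line "ice model" and every number in it are invented stand-ins for an expensive PDE solver; the point is only the pattern of pushing a prior over an uncertain input through a model to get a distribution of forecasts:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy stand-in for an ice sheet model: maps one uncertain
# input (bed elevation in meters, negative = below sea level) to a sea
# level rise contribution in mm. A real model would solve the ice flow
# PDEs instead; all numbers here are invented.
def toy_ice_model(bed_elevation_m):
    return np.maximum(0.0, 10.0 - 0.05 * bed_elevation_m)

# Prior over the poorly known bed elevation where radar coverage is sparse.
bed_samples = rng.normal(loc=-200.0, scale=150.0, size=10_000)

# Ensemble forecast: one model evaluation per sampled bed realization.
slr_samples = toy_ice_model(bed_samples)

# The forecast is a distribution, not a single number.
print(f"median: {np.median(slr_samples):.1f} mm, "
      f"95% interval: [{np.percentile(slr_samples, 2.5):.1f}, "
      f"{np.percentile(slr_samples, 97.5):.1f}] mm")
```

Conditioning that prior on radar observations, as Doug describes, is what turns the exercise into posterior inference.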
Maybe a question for the both of you: do you have a favorite study or project where the collaboration between glaciology and Bayesian stats led to interesting insights? A study that you particularly like, whether that's one of yours or a study in glaciology from someone else. What do you think, Andy? Yeah, I think, as Doug alluded to earlier, combining Bayesian methods with the idea of large ensembles, thanks to having access to large high-performance computing systems, has allowed us for the first time to investigate the parameter space in a meaningful way. Before that, you would basically hand-tune; most of what you did was based on expert judgment.
Your prior was what you had learned over the past 10 years, so to speak. And surprisingly, calibration by eyeballing can yield pretty good results, but it only gives you a median or a mean; it doesn't give you any information about the tails. So, for years, we would publish one study, a mean of one simulation, maybe a few simulations, but we didn't look at the distributions themselves.
And bringing Bayesian methods into our field, I think, has led us to discover an uncomfortable truth: those tails are really large, and they are not normally distributed. So we've realized it's really important to understand the tails, to understand the full distribution, and not just a mean or a median or any single point realization of it. Yeah, okay, so that's a really good point.
And that reminds me of a study that we didn't do that I think is really good. But it merits maybe just explicitly stating something about glaciological systems, particularly the ice sheets, which is that ice flow, and in particular the mechanisms of retreat, the potential for, you know, Antarctica or Greenland in some sense to collapse, to not be an ice sheet anymore, to become ice-free,
that's a super nonlinear process, in the sense that if the bed is actually shallower than we think it is, then that doesn't really have many implications for sea level change. Things change as normal; the ice just melts away.
If the bed is a lot deeper than we think it is, then all of a sudden you have the potential for the entire ice sheet to float and physically disintegrate via the dramatic sort of calving processes that maybe you've seen in the movie Chasing Ice or one of these other documentaries. And so the consequences of being wrong are asymmetric with respect to some of these unknown factors that govern the system. And there's a really wonderful paper
that shows this quite explicitly, by a colleague of ours named Alex Robel, who basically took a simple model of Antarctica, forced it with normally distributed melting noise, more or less, in a bunch of different scenarios, and showed a really big systematic bias towards more mass loss, on account of the fundamental asymmetry in the way that these glaciological systems respond to errors in input data.
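A toy numerical sketch of that asymmetry (invented numbers throughout; this illustrates the qualitative mechanism, not Robel's actual model): symmetric errors in the bed produce a systematically biased mass loss once there is a collapse threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Caricature of the asymmetry: mass loss grows gently while the bed is
# shallow, but disproportionately once the bed is deep enough for the
# ice to float and disintegrate. All numbers are invented.
def mass_loss(bed_depth_m):
    gradual = 0.01 * bed_depth_m                      # ordinary melt regime
    runaway = np.where(bed_depth_m > 300.0,           # collapse threshold
                       0.2 * (bed_depth_m - 300.0), 0.0)
    return gradual + runaway

# Symmetric, zero-mean errors around a best-guess bed depth of 250 m.
depths = 250.0 + rng.normal(0.0, 100.0, size=100_000)
losses = mass_loss(depths)

# The expected loss under symmetric uncertainty exceeds the loss at the
# best guess: the bias only ever points toward more mass loss.
print(f"loss at best guess:      {mass_loss(np.array([250.0]))[0]:.2f}")
print(f"mean loss over ensemble: {losses.mean():.2f}")
```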
Yeah, that just sounds very fascinating. I'm super curious to see one of these models. Do you know if there are any open source packages that, for instance, people working in your field are using, in Python or in R, that kind of wrap the usual models you guys are working on? And also, are there any cool data sets we can put in the show notes for people to look around if they want to? Any applications that you think would be interesting, let's put them in the show notes.
You made some super cool visualizations for one of those papers a while ago, didn't you, Andy? Well, I can't take credit for that, but I'll send you the link. One of our earlier collaborations, where we started exploring the idea of large ensembles, was funded by NASA, and with their support they helped us visualize our simulations on their big screens and narrate them. I'll send you a link. That's all open and open source.
With regard to packages, most of the models that we develop are kind of big beasts. It takes a while to learn them. Right now, there are very few wrappers around them in Python. The model we developed, you can access stuff through Python, but we're not at the level of using it as a black box. Whether you should be able to use it as a black box is a different question.
But we have a funded project from the National Science Foundation that drives us towards that goal of reducing the barrier to entry and reducing the time to actually do science by taking steps like this. So in the next couple of years, our group and others are working towards a cloud version of the model that ideally can just be deployed with the click of a mouse. And, you know, you, for example, choose the parameters you are interested in for your uncertainty quantification,
and the rest is done automatically. Right now you do need inside knowledge of HPC systems. Each HPC system is different. It can take days or weeks just to get the model to run, because each system has a different MPI stack, different compilers. You can run into all sorts of problems. So that's just one step. We are trying to make that easier, but we are not there yet.
I'll give you an anecdote, which is that Andy has made a lot of progress utilizing a very large computational fluid dynamics code for ice sheet flow called the Parallel Ice Sheet Model, which is wonderful and super carefully constructed and really a great piece of software. But man, I don't have the attention span to figure out how to learn it.
And so for a lot of the real Bayesian computation stuff that we've done, I got tired and just made Andy run a large ensemble, and then we train a neural network to pretend to be PISM, and we'll sometimes work with that instead. Well, that sounds like fun too. Yeah, and actually... That's the future. Yeah. Yeah, go ahead, Andy. That's what we're still working on and what I envision pushing a bit further in the next couple of years as well.
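A minimal sketch of that surrogate, or emulator, idea with PyTorch. The training pairs below are faked with a known function instead of real PISM runs, and the architecture and sizes are arbitrary, but the pattern of fitting a cheap network to expensive ensemble output is the same:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Pretend "ensemble": inputs are uncertain model parameters, outputs a
# scalar the full model produced (e.g., total mass change). In practice
# these pairs would come from many expensive PISM runs; here we fake them.
X = torch.rand(500, 2) * 2 - 1                 # parameter samples in [-1, 1]
y = X[:, :1] ** 2 + torch.sin(3 * X[:, 1:])    # stand-in for model output

# A small neural network "pretending to be PISM".
emulator = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

opt = torch.optim.Adam(emulator.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(emulator(X), y)
    loss.backward()
    opt.step()

# Once trained, the emulator is cheap enough to embed in MCMC or large
# Monte Carlo ensembles where the full model would be far too slow.
print(f"final training MSE: {loss.item():.4f}")
```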
Okay. Yeah, definitely super, super fascinating. And yeah, Doug, actually, I wanted to ask you a bit more about that, because you said you have a background in computer science. So I'm wondering, how do you integrate the Bayesian algorithms into the computational models that you've talked about for studying glaciers? Are you using open source packages? What does your work look like on that front? Yeah, absolutely.
Before I did statistics, I did numerical methods and I still do a lot of that work.
In particular, I work in the branch of numerical methods associated with solving partial differential equations via the finite element method. It doesn't really matter how that works, but there are really wonderful packages for solving such equations via that method, called Firedrake and FEniCS: really nice open source Python packages that a ton of scientists are using for all sorts of different applications in computational mechanics.
And so I use those for developing the guts, the dynamical cores, as some might call them, of these models. And they're nice tools in the sense that they allow for a very straightforward computation of derivatives of the outputs of those models with respect to their inputs, which is super useful for all sorts of optimization tasks, and also for approximation in a Bayesian sense: MCMC or other approximation methods.
And so my typical workflow now is to take one of those models and actually wrap it inside of PyTorch, which is a general-purpose framework for automatic differentiation that's popular in the machine learning community. And basically what that lets me do is view an ice sheet model as if it were a function in PyTorch.
And I can put stuff into the model, I can get stuff out of the model, I can compute misfits between what the model predicts and what the data says, and basically take derivatives of that with respect to model parameters in a very seamless and easy way. And, I mean, it's all just mixing and matching various really awesome open source tools.
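To make that workflow concrete, a minimal sketch, with a one-line toy formula standing in for the wrapped Firedrake solver and made-up numbers for the data:

```python
import torch

# Hypothetical stand-in for an ice flow solver wrapped as a
# differentiable PyTorch function. The "solver" here is a toy relation
# (higher basal friction, slower surface velocity); the point is the
# pattern: parameters in, misfit out, gradients back.
def forward_model(basal_friction):
    return 100.0 / (1.0 + basal_friction)

observed_velocity = torch.tensor(40.0)   # made-up observation, m/yr

# The uncertain input we want to constrain, with gradient tracking on.
friction = torch.tensor(1.0, requires_grad=True)

velocity = forward_model(friction)
misfit = (velocity - observed_velocity) ** 2   # data-model mismatch

misfit.backward()      # autodiff through the whole pipeline
print(friction.grad)   # gradient of the misfit w.r.t. the parameter,
                       # usable for optimization or gradient-based MCMC
```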
Actually, back in the day, when I first got into this stuff, it was all about making ice sheet model solvers from scratch in NumPy and then sticking them into PyMC, which you work on, right? Yeah, yeah, exactly. That's why I was also asking. I was curious whether you were using PyMC under the hood to do that, because it sounds like it would be an appropriate framework to use. So I was curious. Yeah. No, well, I would love to.
Nowadays, the problems that we work on tend to be high-dimensional enough that MCMC methods generally become very challenging to work with. And so we sometimes have to do less good stuff. And Andy, what does that look like, cooperating on these projects? Because you are more on the practical side of things. So how do you consume the results of the model? I'm actually curious.
Because, if I understand correctly, you intervene before the model, since I'm guessing you're part of the data collecting team, and you have the domain knowledge that can be integrated into the model if there are priors in the model. And then afterwards, of course, you're interpreting the results of the model. So what does it look like to cooperate on these kinds of models and in these contexts?
Well, the high-level view, of course, is that when we collaborate, Doug does the thinking and I do the talking, or pushing the buttons and trying to run the models. That would be the simple answer. A lot of the workflow is still very cumbersome. Doug has alluded to the different methods of collecting data sets, and all the uncertainties associated with them, or the lack of uncertainties with these data sets.
Things have gotten better, but you can imagine, still, each data set you find on a different server, with a different way to access it. It is probably on a different grid. It most likely has a different spatial reference system. So we are trying to transition from a state where we spend half of our time just trying to come up with not-very-robust workflows to get from the data sets on different servers or websites, to ingesting them into the model, to running the model, and then to analyzing the output.
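As a small illustration of the grid problem Andy mentions, here is a hedged xarray sketch (the fields are fabricated locally; real data would arrive from different servers in different projections, and `interp` needs SciPy installed):

```python
import numpy as np
import xarray as xr

# Two fabricated "data sets" on different grids, standing in for, say,
# a thickness product and a velocity product from different providers.
x_a = np.linspace(0, 100, 51)
x_b = np.linspace(0, 100, 201)
thickness = xr.DataArray(np.sqrt(x_a + 1.0), dims="x",
                         coords={"x": x_a}, name="thickness")
velocity = xr.DataArray(np.cos(x_b / 20.0), dims="x",
                        coords={"x": x_b}, name="velocity")

# Interpolate both onto one common grid so they can be combined and
# fed to a model as a single, consistent data set.
x_common = np.linspace(0, 100, 101)
ds = xr.Dataset({
    "thickness": thickness.interp(x=x_common),
    "velocity": velocity.interp(x=x_common),
})
print(ds)
```

Real workflows also have to reconcile map projections and time stamps, which is where the long afternoons come from.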
Before we had all that great data, things were easy and hard at the same time. All you had were a few data points, and you probably had to write an email to your colleague asking for access to a data point, and they may have asked to be on your paper in return. At least now we have traded that for spending a lot of time trying to figure out those workflows. And there are lots of initiatives right now trying to make that workflow easier. But I don't think we're there.
I still feel like I'm spending about half of my time processing the data and getting really mad at xarray because it doesn't quite do what I want it to do. It almost always does what I want it to do, and it's amazing, and if it doesn't, then it's going to be a long afternoon, and sometimes a little bit of yelling too. I've been there. I feel like we've had similar afternoons.
But yeah, xarray saves the day most of the time, but when it doesn't, yeah, it's hard to debug, for sure, mainly because there are not a lot of tutorials on it, in my experience. So you have to figure a lot of these things out on your own. And yeah, I was also curious about that, because on my side I've been working with a team of researchers, marine biologists. So quite different. It's got to do with water too, but liquid water. And yeah, basically a study of the trade
of sharks across the world. And that has been super interesting, because of course I'm there for the statistical expertise, right? I have nothing to bring on the shark side of things. I've actually learned a lot thanks to them about sharks and the shark trade and things like that. And that to me is also very interesting, because the models are getting more and more intricate.
These are models that are now really hard, and I'm like, damn, if you're not kind of a statistician already, it's really hard to come up with that kind of model if you're purely a domain expert. And at the same time, to develop the model, you need the domain experts, because I could not develop that model without them, even though I know how to code it.
And I find that also super interesting, because it's a good illustration of what science is, right? The sum is bigger than each part on its own. But at the same time, as the statistician, you know, I'm a bit frustrated, because I know the model is not going to be in the paper; the model is going to be in the appendix of the paper. I'm like, oh my God, but it's a beautiful model. I would definitely focus on that.
But my point is, collaborating with domain experts has also been super interesting, because, as you were saying, Andy, there are still some parts of the workflow, and here I'm talking about the Bayesian workflow, that need to be updated and improved. And working like that with people who mainly use the model and consume it, instead of writing it, is super valuable. So yeah, I don't know, Doug, maybe you have stuff to add on that, I'm listening.
Yeah, I mean, what you're saying, I think, is going to resonate with anybody that's trying to work across disciplinary boundaries, which is, I mean, ultimately what we need to do across all branches of science right now, right?
We have all of these amazing statistical methods and numerical methods, and also so much knowledge about the structural assumptions that go into how the world works. And we have to combine those things to make good progress now. But man, it's very difficult to find a circumstance in which somebody's really figured that collaboration out in a problem-free way. Yeah, it's challenging. I agree, it's hard.
I've been involved in a bunch of larger-scale projects trying to bring together data scientists and domain scientists, and both parties sort of need to learn to speak the other party's language. Especially for the data scientists, it can be a challenge, because, let me put it that way, they have really big hammers. They have awesome tools. And in glaciology, we've just started taking baby steps. So most of these awesome tools we actually don't need.
We need what they had in undergrad; the most basic neural network or something like that will already get us from here to 90%. So when you collaborate with them, they're... I can't blame them, I would get bored too. But it's like, no, no, we just need a simple neural network, and that will do the job. So as Doug said, being able to straddle both worlds, between the domain science and the data science, is a challenge, and we need more people doing this.
I think in our field right now, there's only a handful of people that I would trust to be able to do that; Doug is one of them, and maybe three or four others. And I think we need more people who are capable, who are bilingual in data science and in domain science.
But the thing I'll say, I guess, since we're all Bayesian statistics boosters here, is that Bayes' theorem, or maybe more specifically and broadly, the posterior predictive distribution, if we can use some technical language for a second, provides an exceptionally useful blueprint for talking to people across disciplinary boundaries. Because I can write it down and I can say, okay, here are the things, domain scientist, that I need from you.
I need you to tell me what you want to predict. Like in the case of glaciology, that often ends up being sea level rise or volume change. And it's like, OK, I can work with that. I need you to provide to me a set of structural assumptions that encodes your best understanding as a domain expert of how the world works. That's your numerical model. It's going to take in some inputs. It's going to produce some outputs.
I need you to tell me what aspects of that model you don't feel like you know enough about. I need you to tell me what observations you have available to you. And then we can put these things all together in a big flow chart, a graph, right? Presumably a directed acyclic graph that prescribes all of the causal relationships in the system.
And then once that picture is drawn, me as a person that understands the numerical methods, the nuts and bolts of doing inference and prediction in this sort of probabilistic framework, I can take that picture and convert it into code, and I can bring the statistical tools to bear.
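As a sketch of how that blueprint can become code, here is a hypothetical toy version using PyMC, which Doug mentions earlier. A one-line formula stands in for the numerical ice sheet model, and every number is invented:

```python
import numpy as np
import pymc as pm

# "What you have observed": fake volume change measurements.
obs_volume_change = np.array([-4.8, -5.2, -5.0])

with pm.Model() as model:
    # "What you don't know enough about": a prior on a melt parameter.
    melt_rate = pm.LogNormal("melt_rate", mu=np.log(0.05), sigma=0.5)

    # "Your structural assumptions": a stand-in for the numerical model,
    # here just 100 years of melting at a constant rate.
    dV = pm.Deterministic("dV", -melt_rate * 100.0)

    # "Your observations": the likelihood tying the graph to the data.
    pm.Normal("obs", mu=dV, sigma=0.5, observed=obs_volume_change)

    # Posterior inference over everything upstream of the data.
    idata = pm.sample()
```

The directed acyclic graph Doug describes is exactly what this model context builds under the hood.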
So the Bayesian language of cause and effect and uncertainty is a neutral ground that I think we can all start to use as a mechanism for translating the languages we use in our different fields. Yeah, learning Bayes' theorem and everything associated with it has certainly opened my world quite a bit in terms of how I think about a problem, and I've found it the right way to encapsulate my thoughts.
And as Doug said, it sort of levels the playing field, in that it provides that common language. Bayes' theorem, I think, is closely associated with how we do things and think about problems in geoscience. And that has started to make things so much easier.
If you just sit down, as Doug said, you write down the probability of sea level rise given..., and then, you know, you start with the chain rule, you have your models, you try to come up with a likelihood model, you try to come up with priors for your parameters. And even as a non-expert in Bayesian statistics, it still provides me with a way to think about it, and provides me with the tools to talk with Doug and others about the problems that I have and the goals I want to achieve.
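Written out, the decomposition Andy is gesturing at looks roughly like this, where \(\theta\) collects the uncertain inputs (bed elevation, climate forcing, model parameters) and \(y\) the observations:

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta), \qquad p(\mathrm{SLR} \mid y) = \int p(\mathrm{SLR} \mid \theta)\, p(\theta \mid y)\, d\theta$$

The first expression is the calibration (likelihood times priors); the second pushes the calibrated uncertainty through the model to get a distribution over sea level rise.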
Yeah, yeah, awesome points. And I definitely agree that making the effort to make sure we're talking about the same things, and educating on these concepts, is absolutely crucial. And, well, Andy, to shift gears a bit, there is a project of yours, and since I see the time running by, there is something I really want to ask you about, and that's the Parallel Ice Sheet Model, PISM. I don't think we've talked about it in detail yet, and yeah, I'm curious about that. What does that mean?
What are you doing with this project? The Parallel Ice Sheet Model, or PISM for short, started a little bit before I came to Alaska as a postdoc. In fact, few of us may even remember the time before the first iPhone, and PISM started a year before, I think, the first iPhone came out. It was the first open source ice sheet model, and at the same time, it was the first openly developed ice sheet model.
Lots of other models have come later and opened their code after they reached some maturity. But with PISM, you can go back to commit number one, from 2006 or something like that, and look at the first line that was written. And this is mostly thanks to a mathematician named Ed Bueler here at the University of Alaska Fairbanks and his, at that time, grad student
Jed Brown, who somehow got into ice sheet modeling, I think similar to Doug, through mountaineering, going over glaciers, climbing up on ice, and getting fascinated with ice as a geophysical fluid. And they started developing a model slightly differently than it had been done in the past by individual glaciologists, often without a super strong background in math and numerical analysis.
So PISM started by writing validation tests first and then developing the most appropriate numerical methods to solve the problem. And as the name says, the P stands for parallel. So it was also one of the first models developed from scratch with MPI via PETSc, and it could take advantage of large HPC systems, whereas at the time PISM started, you would run your ice sheet model on a single core on your laptop. Since then, the project has grown quite a bit.
The University of Alaska here is still the lead developer. We have a full-time software engineer who does a lot of the testing and code development and works with users. We have another team at the Potsdam Institute for Climate Impact Research in Potsdam, Germany, who do a lot of the development as well. And then there are 30 to 40-ish users scattered around the world who either develop the model or use it purely for trying to answer scientific questions.
And one of the best compliments we have ever gotten about our model was when we found, by accident, the first publication by someone who just found the model online, went on GitHub, downloaded it, compiled it, figured out how it works, because it is well documented, did some cool science with it, and got it through peer review. So they never even had to contact the developers to get help with anything. And for us, that's a big compliment.
There are other models where you kind of need to take a week-long course to even get started. And we've been trying to maintain that level of documentation and code transparency by keeping a relatively stable, well-thought-out API, something like that. So through all that backbone development, it has become one of the leading models for answering questions revolving around glaciology and sea level rise.
Of course, again, because it started in 2006, it is starting to age, and the things that, for example, Doug mentioned he's developing, with his Firedrake code coupled to PyTorch, this is something we cannot yet offer, and it may not be feasible, because there's so much legacy code that we can't make a smooth transition. Yeah, I didn't know that project was that old. That's impressive. And I'm guessing that requires quite a lot of collaboration with quite a lot of people.
So well done on that. Thank you. That's incredible. Yeah. If there are any links that interested people could dig into, feel free to send them for the show notes, because I think that's a very interesting project. Doug, I'm also curious, I think I've seen, preparing for the show, that you work on eco-geomorphic effects, and I think you've talked about that at the beginning. Can you tell us what that is, what that means, and why that's interesting? Sure. Sure, yeah.
I would not say that I am an eco -geomorphologist by any stretch of the imagination, but when you work on glaciology in Alaska, I think we're always interested in understanding and communicating the importance of glacial systems beyond their influence on sea level rise.
Because it turns out that if you plop a giant chunk of ice somewhere on the coastline, it's going to have implications for what the water chemistry is like and what the water temperature is like and what the local climate is like and maybe more broadly how animals can move around and a whole bunch of other stuff.
And so one project that I'm super excited about, we've been working on it for a couple of years, is to try and understand the future evolution of a very large glacier in coastal Alaska called Malaspina Glacier. It's a very conspicuous feature if you ever look at the coastline of Alaska on Google Earth or something like that.
And it also happens to sit very close to a really robust Alaska Native community that uses the forelands of the glacier and the adjacent areas as hunting and fishing grounds. And through the course of our modeling, and we can say this with a fair bit of confidence, because we've done a complete probabilistic treatment, we can say that it's very likely that this very large glacier is more or less going to disappear, certainly within the next century, maybe faster than that.
And when that happens, it'll open up a new fjord, Icefield Valley. The forelands might start to degrade. And the whole landscape of that area that people are using for all sorts of things, for gathering food and transportation and a ton of other activities, it's all going to change a lot.
And so I'm really excited about being able to utilize some of these modeling tools, particularly in conjunction with robust uncertainty quantification frameworks, to provide responsible, defensible predictions about how this place is going to be different in the coming years to the people that live there. Yeah. Okay. That makes more sense now. And eco-geomorphology, that's the term. That's pretty impressive.
Eco-geomorphology, I guess you'd say that'd be the study of how ecosystems change in response to changes in the shape of the earth. Yeah. That's what you want to say at parties, you know. Awesome. Well, thanks a lot, guys. We're going to start wrapping up, because I don't want to take too much of your time, but of course I still would have lots of questions.
Maybe, yeah, something I'd like to hear you both on is potential developments, potential applications of what you're doing right now. Where would you like to see research in glaciology and ice sheet modeling going in the coming years? What is most exciting to you? Maybe Andy first. Maybe I'll start with the not-so-exciting part,
because especially now, with those new methods that we're developing, machine learning, artificial intelligence, and large data sets, I think there is still a lot to be done just trying to understand the data sets we already have, with relatively simple methods. I'd say this is not particularly exciting, and it's also harder to get funding to do it; funding agencies like to see something very new, something shiny.
But sometimes you can make a bunch of progress by just bringing together bits and pieces that you already have, but you just never have time for that. You could develop an algorithm that describes how a glacier calves off in Antarctica, and you test it and it works very well there. But then you have to go on and develop something new. You're rarely left with the time to test, well, would that be a good idea for Malaspina Glacier, or for a glacier anywhere in Alaska, or in Greenland as well?
So if I had some time and some money, this is where I think I could make a bunch of progress with relatively little effort. Maybe Doug wants to start with the shiny stuff. Shiny stuff, I don't know. You know what's always a perpetual source of inspiration for me? The United States National Weather Service.
I go on their website and I type in my town name, and I click on a location on a little map, and it shows me a pretty high-accuracy prediction of what the weather is going to look like where I'm at for the next seven days or something like that. And it's this innocuous little interface, but it overlies this incredible system of computational fluid mechanics combined with real-time integration of data products in a probabilistic way. They're doing ensemble modeling.
There's so much to it, and it's this incredible operational system that has just a wonderful, useful interface for people. And you know, I think that we are maybe getting to the point in glaciology, with our understanding of methods and capacities, to do something like that. And that's what I'm most excited about: real-time forecasting for every little chunk of glacier ice in the world. Yeah, that sounds very interesting. I'm going to look at that page.
Yeah, let's add that to the show notes. That sounds very fun. Weather.gov, I bet it's the most widely used application of Bayesian statistics in geophysics, of any of them. Interesting. Well, if anybody among the listeners knows someone working at weather.gov who could come on the podcast to talk about the application of Bayesian methods at weather.gov, my door is open. That would be a great episode. Yeah. Absolutely.
I've done a somewhat related episode a few months or years ago, I don't remember, about gravity waves. Not gravitational waves, but gravity waves. I didn't know those existed. That was super interesting. And I'm going to link to that episode in the show notes, because it was a very cool one, basically talking about the mass of really big mountains.
So probably like the mountains you have in Alaska, Andy. Basically, the waves they create through their gravity, which are non-negligible in comparison to the gravity of the earth, which is just pretty incredible. And that has impacts on the weather. So definitely going to link to that. Before closing up the show, though, I'm going to ask you the last two questions I ask every guest at the end of the show. First one: if you had unlimited time and resources, which problem would you try to solve?
I feel like, Andy, you've almost answered that, but I'm still going to ask you again. Maybe that gives you an opportunity to answer something else. Yes. I came to Alaska over 15 years ago, and I've done modeling of the Antarctic ice sheet, of the Greenland ice sheet, of glaciers in the Alps and Scandinavia, and we haven't done much with Alaskan glaciers. Doug was mentioning our project on Malaspina Glacier and the surrounding area. But because Alaska is so big, the challenges are equally big.
Understanding the precipitation there, where you go from sea level up to 5,000 meters within a couple of tens of kilometers, poses interesting challenges to any modeling or observational approach. And after living here for that long, with unlimited resources, I think I would like to give back to Alaska and study Alaskan glaciers. So I would invest in both observational and modeling capabilities to better understand how the Arctic here in Alaska is changing.
That definitely sounds like a very interesting project. Doug, what about you? Well, yeah, if I'm limited to glaciology, then I suppose I would say what I said before, about this notion of a worldwide, every-glacier forecasting tool that's widely usable by the general public. I think I'll stick with that one. But since my resources are unlimited, I guess while I'm doing that, I will pay a whole bunch of other people to go out and sort out the whole nuclear fusion thing.
And then there'll be enough electricity to run my computer. That sounds like a good thing to do, indeed. And second question: if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be? So Doug, let's start with you. Sure. Man, why do we call it Bayesian statistics? We should really be calling it Laplacian statistics, right? Yeah. He came up with this notion that we should view probability as a means for communicating our knowledge of a process.
And I think that's perhaps the most important scientific idea that nobody ever mentions. So I'm going to go with Laplace. I would be really interested to see how he felt about the application of probability in that way to these more complicated systems as well. I love that. And not only because that was my personal answer, too, in one of the episodes I've done. Awesome. Andy, we'll get to you. But before that, I found the episode I was referencing.
So that was episode 64, with Laura Mansfield. And we were talking about modeling the climate and gravity waves. I think I said gravitational waves; that was wrong. It's gravity waves. Andy, who would you have dinner with? Well, I feel like I'm pretty blessed. I think I have dinner with great scientific minds on a regular basis, when I have dinner with my colleagues at scientific conferences. But if I had to pick just one person... How about Eratosthenes?
I'm not sure I pronounced that correctly. He was, I believe, the first one to estimate the circumference of the earth, and I think that was a couple hundred years BC. I'm just curious how people thought about science in an environment a couple thousand years ago. I would love to chat with someone that far back. I think the estimate that he came up with was maybe within 10 percent or something like that.
And then, a thousand years later, people thought the earth was flat. I think that would be an interesting person to meet. Yeah, for sure. Good one. I think you're the first one to choose that. I love it. What's the most common answer you get for that question? Well, for that question the variation is bigger than for the first one. The first one has a clear winner, if I remember correctly, which is climate change. So we have a lot of people who would try and tackle that.
For the second question, I think one of the most common answers is Richard Feynman, if I remember correctly. I believe so. Yeah, I think Feynman is the winner, but it's not a Pareto distribution; it's a pretty uniform distribution. Yeah, I'm curious. Not a lot of people choose Laplace. Not a lot of people choose Bayes. And interestingly, I think nobody chose Bayes until now. Yeah. Not a lot of people have chosen Einstein.
So that's an interesting question, because that kind of goes against priors. It's hard to guess. Sorry, Andy. I would have thought Einstein or Newton or Galileo would come up pretty frequently. No, Galileo, I don't think so. Leonardo da Vinci does come up quite a lot. But yeah, otherwise, I had Euclid once, of course. That was a fun one, too. Awesome, guys. Well, I think we can call it a show; I've taken enough of your time. Thank you for being so generous.
Before we close up, though, is there something I forgot to ask you about that you would like to mention or talk about? I don't think so, not for me. I think it was a pretty comprehensive journey. Yeah. Great. Believe me, I could keep you for two hours, but no. Let's be parsimonious. Awesome. Well, again, thank you very much, Andy. Thank you very much, Doug. As usual, for those who want to dig deeper, refer to the show notes, because we have
Andy's and Doug's links over there, and also a bit of their work. And on that note, thanks again, Andy and Doug, for taking the time and being on this show. Thanks, Alex. Thanks, Alex. Thanks for having us. This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayesstats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind.
That's learnbayesstats.com. Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thank you so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information
in. And if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.