Today, I'm excited to be joined by Jesse Grabowski, a PhD candidate at Paris 1 Panthéon-Sorbonne and principal data scientist at PyMC Labs. Jesse's diverse background spans macroeconomics, finance, and machine learning, with a wealth of experience that includes roles at the OECD and across Africa, Asia, Europe, and North America. In this episode, Jesse takes us on a journey through the fascinating world of time series analysis.
We dive deep into state space models, exploring their flexibility in capturing observed and hidden states and their ability to decompose time series data into components like trends, seasonality, and autoregressive elements. We also discuss the Kalman filter, its historical roots in rocket tracking, and its modern applications in filtering noise and updating beliefs about dynamic systems.
Jesse shares insights into how these tools are applied not only in economics, but also in fields like sports analytics. This is Learning Bayesian Statistics, episode 124, recorded October 23, 2024. Welcome to Learning Bayesian Statistics, a podcast about Bayesian inference, the methods, the projects, and the people who make it possible. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. For any info about the show, learnbayesstats.com is the place to be.
Show notes, becoming a corporate sponsor, unlocking Bayesian merch, supporting the show on Patreon, everything is in there. That's learnbayesstats.com. If you're interested in one-on-one mentorship, online courses, or statistical consulting, feel free to reach out and book a call at topmate.io/alex_andorra. See you around, folks, and best Bayesian wishes to you all. And if today's discussion sparked ideas for your business, well, our team at PyMC Labs can help bring them to life.
Check us out at pymc-labs.com. Jesse Grabowski, welcome to Learning Bayesian Statistics. Thank you. Yeah, we have quite a lag, let's say that to the listeners, so we'll try to navigate that as much as possible. But you're very far from me, you're still on planet Earth, but we're very far apart, let's say that, without divulging any personal information. And I can say it is very late where you are, so thank you very much for staying up that late.
I know you're doing it only for the show and for the listeners, so thank you very much. Yeah, I'd do anything for the fans. You have to make sacrifices in life, and these are the ones that matter. So I'm very happy to be here. I'm very happy to do that. If I have to take the bullet to get Bayesian entertainment, infotainment, out into the world, I will do it. I appreciate the dedication.
So let's start actually as we always do with your origin story, Jesse, because, well, I know you from the PyMC world, you're a PyMC core developer, you also work at PyMC Labs. So when I was working there, we definitely worked together, even on some clients, but I've never met you in person yet. So I actually don't really know what your personal story is, you know, how did you start working on these topics? And also, what are you doing nowadays? Yeah, that's true.
We sort of know each other more as like pictures and text. So it's very nice to have an opportunity to finally see each other and have a conversation. I am an economist. I am currently a PhD student at Paris 1 Panthéon-Sorbonne, in my fifth year, I guess. I'm wrapping up soon. I'll defend soon. One hopes. My wife hopes. My directors hope. We'll see. My route to Bayes, though, specifically was sort of very roundabout.
As you can imagine, economics is not a discipline that has been infiltrated by the Bayesians. There are, of course, Bayesian econometricians. But they're few and far between. They're not the standard. I never got a Bayes course during any of my econometrics courses. I think, you know, economists are very much living in their own world and they always want to do their own thing.
So the Bayes papers I have seen, they're still deriving their own Gibbs samplers and the papers are more piles of statistics than they are applied causal questions. So they barely look like Bayes, kind of as I know it. Yeah, so my path was I did an undergrad at Michigan State University. My very first ever econometrics professor was Jeffrey Wooldridge, who's quite well known. He wrote a quite well received textbook on sort of introductory statistics and also panel regression models.
And then I took a break. I was a farmer in Senegal. I was a kindergarten teacher. I dug holes in the sun for a while, and at some point I thought, well, you know, I should probably do something more serious with my life. How about going back and getting a master's?
I went into finance, and in finance, I was seeing all of these traditional econometrics models, like these time series regressions, VARs, VECMs, where you're jumping through these hoops to build an estimator and then run essentially OLS with some correction on the standard errors. Or you're doing some like Rubin causality, pseudo-experimental designs, like out of Mostly Harmless Econometrics, you know, diff-in-diff or instrumental variables.
And life really felt like, you know, that flow diagram that Bayesians show their kids to scare them at night, where it sort of explains, like, what kind of data do you have? Then do this test, then do this, then do this. That's really how statistics felt. I felt like it was all old, stuck in the past. So I thought, this was like in 2012, 2013, when like random forests were all the hotness. So I thought, like, I'll learn machine learning. Machine learning is gonna be great.
It's gonna change the game. I'm gonna fit all kinds of nonlinear models. I'm gonna train a neural network. And I got into that for a while. It's cool. But then I felt like I had traded one black box for another. Yeah. You can jump through all the hoops to get the estimators with the corrected standard errors. Fine. Or you can jump through all the hoops to convert your data into such and such format and then feed it into a random forest, pass it into XGBoost. And then you get something out.
Also fine. Or maybe the estimates are better, but then you can't really do anything with it. Like you have no causal story. You can't attach any interpretation to it. A lot of times you can't even get feature importances out. You can do something with like Shapley values and like permutation stuff, but it's not as good even as looking at just beta coefficients in linear regression. So I looked for a middle ground and Bayes kind of came up naturally, and especially PyMC. Yeah, this whole
flexible modeling structure, right, where you can just do whatever you want. The power of Bayes is not that you're doing uncertainty quantification, although that's really nice, right? To me, the power of Bayes is that there's exactly one estimator, the posterior, and it's robust to any functional form that you pass to it. So if I want to do just an OLS regression, no sweat. And whatever kinds of nonlinearities I want to put into that, there's no reason to do any kind of adjustments to it.
You just get out the posterior, you do your posterior predictive samples, you compute whatever you want from those samples and you get the answers you're looking for. And then along the way, you also get these additional benefits. You get uncertainty quantification, you get a community that takes causality as the most important thing. You get these very interesting numerical tools like PyTensor
that kind of allow you to backdoor into other machine learning techniques and really blend things together with stuff like GPs or Bayesian neural nets. So it's just really a robust field. It's a really welcoming community. It's a really broad and powerful toolkit. I mean, I'm preaching to the choir of course, but that's kind of what drew me in. It's definitely what kept me around.
Yeah, I definitely resonate with everything you just said because that's also my experience when I started learning Bayesian statistics. Really, the community was one of the main reasons why I stuck with it, because there's still hard stuff, right? And you have to learn all the time and you have to be uncomfortable all the time and sometimes feel that you're like, you know, do I really know that? How much of an imposter am I? So that really helps.
And for you personally, I'm curious to hear what your favorite part of that whole framework is, because you do a lot of stuff today. We're gonna talk about the fantastic work you've done and you're still doing on the pymc-experimental package to add state space models. So we're gonna talk about everything time series today. But I know you also work a bit on PyTensor. So I'm curious, you know, what your favorite task, I would say, from this open source world is.
Yeah. So we were chatting slightly before we started recording and Alex made a comment about his own Bayesian journey where he said at some point he became more interested in the methods than the topics. And that really, like, hit right here, you know. I'm very deeply interested in numerical computing and these methods of computation and backpropagation. So I don't know if I have a commit to PyMC. I might have one.
I think all of my work, I've done, of course, all this stuff in pymc-experimental, but I've done much, much more work in PyTensor. And a lot of what I've been working on is linear algebra. Some of that's selfish because I needed certain tools for my PhD thesis. So I just came in and said, like, I need a function that solves discrete algebraic Riccati equations. You know, like, do we have that? Like, no, okay, I'll write the Op for it.
But you know how open source is, it's sort of a bit of a gravity well, you know, once it's got you in its claws, especially PyMC specifically. We have some very good devs and very friendly devs, but they're like, you know, you catch more flies with honey, right? So you'll come to the PyMC forums and you'll say, oh, I have this problem. And they'll say, yeah, that's a bug. Open a PR. And you'll say, oh yeah, I want to be helpful. I'll open a PR.
And so you open a PR, sorry, open an issue, open an issue. And so you go and you, oh yeah, yeah, I can do that. I'll open an issue. And they'll say, okay, yeah, yeah, you can fix that. It's a good first issue. Open a PR. Like, well, I don't know how to open a PR. Like, oh, don't worry about it. Don't worry about it. Just do it. And so you do it and people are really patient and they walk, they hold your hand and you go through five or six rounds of reviews.
And then they say, oh, actually, this is all wrong. And they rewrite it anyway and then push it. But your name's on it. And you're kind of like, wow, I contributed to one of the most popular probabilistic programming languages in the world. That's a nice feeling. My name's kind of on that now. So then you run into another problem and another problem, and it kind of keeps going and keeps going. I've wandered afield of your question. Yeah, to directly answer it, like, I like the backend stuff.
I like the thankless jobs. I'd like to think that I'm trying to hit the parts of the code base that everybody needs, but nobody prioritizes because it's not the thing that you actually call as a user, right? You'll never know that, you know, a matrix inverse is being rewritten into a more efficient form. It's just that suddenly your models run faster. and everybody's happy. But I'm happy to be the guy who's trying to think about how can we do this better and implementing that.
Yeah, and I thank you for it because I'm much more of a front-end developer, let's say, you know, I like developing the models and making the API easier for the users. So it's very much user facing, usually, what I do and develop.
Yeah, I think that's also why I am very grateful for the work you do, because then that makes my job not necessarily easier, but it gives me a lot of ideas. It's like, wait, so now we can do this, so if I combine that stuff with that thing, we can do some cool new model. And also we can do that new API that's just way easier for people to use. So that's very complementary. But yeah, it's like, this is not something I really enjoy doing. So I really love that you do enjoy doing that.
So yeah, thank you so much. It takes all types, right? I mean, well, thank you. You did pm.find_constrained_prior, right? So thank you. I use that one every day. Yeah, yeah, I mean, me too. You're welcome. Yeah, we're doing that quite a lot with, well, with Ricardo, of course, Ricardo Vieira, and Luciano Paz. So yeah, that's something we came up with during one of the PyMC Labs retreats because, yeah, I mean, I use that all the time now.
And I wanted to do that also before, and I was like, you know, it feels too hard to do that, but it's kind of like, we kind of know how to do that. So instead of having to, especially with the gamma distribution, for instance, you know, you're like, I want a gamma with alpha equals 25, beta equals 52. It's like, how the hell do I get there? And before that, I was doing that by hand and just checking the plots, and that was horrible. That's wild. So we came up with find_constrained_prior.
It basically does that: you talk to the machine and you tell the machine, well, just give me a gamma that has most of its mass between five and 20, and that's fine. I don't care about which parameters give me that, actually. And now there are smarter people than me who put that into another package that's called PreliZ. And actually, Osvaldo Martin is one of the main people who worked on that.
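[Editor's note: for listeners following along at home, here's roughly what that call looks like. This is a minimal sketch using pm.find_constrained_prior from recent PyMC versions; the bounds, mass, and initial guesses are just illustrative values, not anything discussed in the episode.]

```python
import pymc as pm

# Ask PyMC to find Gamma parameters that put ~95% of the prior mass
# between 5 and 20, instead of guessing alpha/beta by hand.
params = pm.find_constrained_prior(
    pm.Gamma,
    lower=5,
    upper=20,
    mass=0.95,
    init_guess={"alpha": 2.0, "beta": 0.5},
)
print(params)  # a dict of optimized parameters, e.g. {'alpha': ..., 'beta': ...}

with pm.Model():
    x = pm.Gamma("x", **params)  # use the optimized parameters as the prior
```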
So I definitely recommend listeners to check out PreliZ, but also to listen to a very new episode that we're going to have with Osvaldo Martin. That should be episode 122, so just before you.
So by the magic of time travel, when your episode is out, people will be able to listen to the previous episode where Osvaldo is talking about all things PreliZ and prior elicitation and all the cool stuff you can do with that. So definitely recommend checking that out. Yeah, so I wanted to have you on the show today to talk about your masterpiece, well, at least the first one, because you're still very young, and I'm very sure that you're going to have other masterpieces along the way.
But I want to talk about the current one, because you contributed a submodule to the pymc-experimental package that's called statespace. And so I put that in the show notes already. So for people who want to dig deeper, you have the link in the show notes.
And you have also the link to Jesse's GitHub repository of a presentation you did about that, so you can also check these notebooks out. But if you want to play with the submodule, just install pymc-experimental and then you have everything out of the box. First thing though, Jesse, I want to ask you something, because I've personally always been very interested in time series, because
a lot of my job has to do with time series, and even more now with the Marlins. And the thing is, I've used a lot of GPs, so Gaussian processes, to work with time series, but sometimes GPs are too complicated to fit, so that's when using more structured time series models is very interesting. So that's how I started digging into that and reading a lot, and I'm still learning a lot about that topic.
The first thing I noticed is there's so many different terms: structural time series, state space models, autoregressive moving average, integrated moving average. It's like, you know, I'm lost when you start. So first, could you define what a state space model is for us? Sure. Sure. So I think it's useful to think about two types of time series models. So one, I call just a curve fitting problem. So people might be familiar with Prophet, for example, or, okay, even take a step back, right?
If you do a linear regression where your design matrix is a column of ones for an intercept and then a ticker that just counts zero, one, two, three, four, five, six, seven, eight, right? This is already a time series model because you're integrating time. Okay. It's just, it's simple. You don't have these recursive relationships. So this notion of recursion is going to be at the heart of what makes time series different to everything else that gets done in statistics.
So a state space model is a model that is defined by some policy function, which we'll call g, such that I'm in a state of the world x_t, and if I know the function g, it takes me from x_t to x_{t+1} deterministically. And I can just recursively apply g, because it only depends on the current state vector x_t. x_t completely summarizes the entire world, yeah, as it matters to me. And then I apply this g to x_t and maybe structural innovations. So again, with this terminology, right?
Some random vector epsilon. So you have your x_t is equal to g of x_{t-1} and epsilon at time t. So I come to today with a state of the world that I inherited from yesterday. I turn on the news and I learn something new about today, right now, and then I make a plan and I sort of act. However I act, that's defined by this policy function g. So there's also a lot of connections to control theory and sort of this theory of optimizing agents.
Yeah. It gets lost when you just talk about things like ARIMAs and SARIMAs and all of this, but you can really think about there's somebody acting in the world, and that person needs to decide what to do, and how she chooses to act is going to create the next state of the world. And this function g is going to totally summarize how she's going to choose to behave given what she observes, both the history and the innovations, the news.
So a state space model is a model where, basically, you have a state space, which is this x_t vector, and it recursively updates by this policy function g. Right, so that means that state space models are the broad category that encapsulates all the different models we're going to talk about today. So SARIMA, vector autoregressive, structural time series, all these models can be considered as state space models.
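[Editor's note: to make the recursion concrete, here's a tiny sketch, in plain NumPy and purely for illustration, of the generic form x_t = g(x_{t-1}, eps_t) that Jesse describes; the 0.9 coefficient is an arbitrary choice.]

```python
import numpy as np

rng = np.random.default_rng(42)

def g(x_prev, eps):
    # Policy/transition function: today's state is inherited from yesterday,
    # plus whatever news (the structural innovation) arrives today.
    return 0.9 * x_prev + eps

T = 100
x = np.zeros(T)
for t in range(1, T):
    eps_t = rng.normal(0.0, 1.0)  # epsilon at time t
    x[t] = g(x[t - 1], eps_t)     # x_t = g(x_{t-1}, eps_t), applied recursively
```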
Gaussian random walk is one of the simplest ones, but it's still a state space model. Yeah, it can be represented as a state space model, right? A Gaussian random walk is a nice point of departure because it's a model that has a lot of obvious equivalencies. I mean, obvious. You can think of a Gaussian random walk as just the cumulative sum of a series of IID normal draws. And in this case, you're not taking the state space view, right?
You're just going to draw a hundred Gaussians independently, and then you're going to take a cum sum, and that's still a Gaussian random walk. but it doesn't have this notion of recursion built into it, but you can do a little bit of algebra and you can find a form to represent it that way. It also admits a GP representation. So it's quite interesting. Yeah, so a Gaussian random walk is a really great first model to study and kind of see the differences.
State spaces are extremely powerful and they're extremely expressive. So you can basically, can you write anything as a state space? I'm going to claim you can write anything as a state space. If there's a very wise listener who can find like a really cool counter example, please let me know. But a Gaussian random walk is a model that you can write as a non-state space. Yeah. So we have this idea of recursion. So you could just say like, tomorrow looks like today.
Yeah. You have this, In expectation, your best guess for the future is just today with a little bit of noise. And if you just iterate forward on that expectation, then you get a Gaussian random walk. Now, on the other hand, and this is what's in PyMC, is you could just sample a bunch of IID normals. You could say, I have 100 time steps. I want to forecast. I'm just going to sample an IID normal at every time step. And then I'm going to take the cum sum across all of those Gaussians.
And that also gives you a Gaussian random walk. And it turns out you can also write it as a GP. If you take the outer product of a feature vector and then take the minimum at each point, that ends up being a valid covariance kernel and it works out to be a Gaussian random walk. So it's this funny little model that can be written in a lot of different ways, but can give you a first intuition about how a state space works.
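[Editor's note: here's a quick sketch of the three equivalent constructions Jesse mentions, the recursive state space view, the cumulative-sum-of-IID-normals view, and the GP view; the min(t, t') form is the standard Brownian-motion covariance kernel. Plain NumPy, illustrative only.]

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma = 100, 1.0

# 1) State space / recursive view: tomorrow = today + noise
x_rec = np.zeros(T)
for t in range(1, T):
    x_rec[t] = x_rec[t - 1] + rng.normal(0.0, sigma)

# 2) Non-recursive view: cumulative sum of IID normal draws
x_cumsum = np.cumsum(rng.normal(0.0, sigma, size=T))

# 3) GP view: one draw from a multivariate normal whose covariance is
#    the Brownian-motion kernel k(t, t') = sigma**2 * min(t, t')
t_grid = np.arange(1, T + 1)
K = sigma**2 * np.minimum.outer(t_grid, t_grid)
x_gp = rng.multivariate_normal(np.zeros(T), K + 1e-9 * np.eye(T))
```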
Yeah. And how would you say you outgrow the need for a Gaussian random walk? Because that's always kind of the main question in modeling, right? You can always do a very, you know, fancier model. You could always turn to a GP, or to a generalized additive model with a piecewise linear trend and an autoregressive component on the residuals, blah blah. You can always do that. Should you, though? How would you make that kind of decision? Yeah, right. So time series is very rooted in forecasting.
I think more than anything else, time series is very concerned with out of sample, and all of these different models, you'll write them down and they'll look very much the same until you start to extrapolate beyond the data manifold. And you start to ask, how does tomorrow look? How does the day after tomorrow look? And what your assumptions about those forecasts are is going to inform the types of models that you reach for.
So like outgrowing a Gaussian random walk, it's funny because it's kind of a microcosm of the statistics journey we talked about, where you want to have something fancy. So you leave the Gaussian random walk and you're like, I'm going to make a VARMA and I'm going to have a hundred million time series. You know, I'm going to do this really complicated thing. And then it forecasts just as well as a Gaussian random walk. It's sort of like you come home.
A Gaussian random walk encodes the prior that tomorrow looks like today. Yeah. That your best guess for the value of the world, sort of the state of the world tomorrow, is whatever you see today. The traditional example is stock prices, right? Stock prices are said to incorporate all information, public and private, about the value of a firm. So there's absolutely nothing that you can go and find in the world to improve your prediction of what that price ought to be.
Therefore, the logic goes, your guess of what the price is going to be tomorrow is just the price today, plus whatever news you wake up and see tomorrow. Yeah, yeah, that makes sense. So then you would say that if you need something more complicated than that, then that would be good justification to switch to something more complicated than the random walk. Exactly. if you know, so let's say you're working on temperature data, right?
You know that the temperature varies from daytime to nighttime and also from season to season. So, right, like right away, the Gaussian random walk assumption is broken because I can do better on my forecast tomorrow by incorporating additional information about, yeah, what season is it? So you get a seasonality component or something like GDP, gross domestic product is a function of how much stuff you're going to make is a function of how much stuff you have, right?
If you have more factories, you can produce more stuff. So if you have a high GDP, you're going to have higher GDP growth, which implies that there's some kind of exponential, excuse me, exponential trend process going on. So now a Gaussian random walk is broken again, because I know that I'm at some such and such level. And therefore my guess for tomorrow is going to be larger than if I knew that I was at some lower level. I see.
Yeah. And how does that... there is another term that's very important in this literature, that's innovations. What is that? Like you said, I think that time series people have an incentive to squid-ink a little bit. Everything has to seem a little bit more complicated than it is. And step one to doing this, as any good member of any priest caste knows, is to invent a bunch of vocabulary that nobody understands and to have your own jargon. An innovation is a random perturbation.
It's just an error. It's just a residual. Being more fair to time series people, it has a different name because it has a slightly different notion. When you think about OLS, for example, when you think about the error structure, my errors are normally distributed or Student-t distributed or what have you, you're thinking about what is the shape of information that I don't have access to, something like this, right?
I have a guess on what the sort of population average is for how tall people are, but you know, it could be high, could be low, but on average I'm going to be right. And I'm not going to be biased in one direction or the other, therefore it's normally distributed. The word innovation is trying to suggest that number one, it's something that's coming in from outside. The time series process really is.
driving how the world is evolving, like a boat on the ocean, you know, it sort of is going along. But then it's also buffeted by waves and by winds. And those are the innovations and they knock around the course of the boat and they cause history to unroll in one way or another. So in implementation terms, it's just going to be the same as a residual, but sort of philosophically, there is this difference. Okay, so innovations have to do with the standard deviations of the residuals?
You can think of it something like that from a very statistical perspective. But again, I would push you to the philosophical perspective that an innovation is the standard deviation of external shocks that push the system around beyond the dynamics that you're modeling.
Okay, so it could be the magnitude of the external shocks. Exactly, yeah. And you could really think of it in terms of shocks, right? Like when I write a structural macroeconomic model, they're called shocks. We think about, like, yeah, Apple invents a new sort of technology that makes everybody more productive, and nobody foresaw that, it's nowhere in the model. But then suddenly the economy moves faster. So it's really like an exogenous shock, and that's an innovation, like literally.
Yeah. I see. Another example I'm thinking about is like, for instance, if you are studying the income of taxis before Uber and then you have Uber coming in, and that would extremely disrupt the market. And so that's a shock. And then the magnitude of that shock would have to be taken into account by your model.
If you underestimated the magnitude of the potential shocks, so you underestimate the standard deviation of the innovations, then your model will have trouble predicting what a potential new Uber would do to the market. Exactly. And your forecast will be too optimistic, right? A lot of this is going to come down to the HDIs on your forecast uncertainty.
So if you, like you say, if you just say like taxi revenues are going to continue kind of as they were in the 1990s and the 2000s and the early 2010s, you know, you would have sort of predicted like, you know, flat or maybe some small growth. But then as soon as you say like, well, there's this possibility for massive disruptions, right? then your uncertainty bands kind of explode because is that going to come today? Is it going to come tomorrow? Your model can't say, but the possibility exists.
Yeah, yeah, that makes sense. To continue on the terminology before we dive into, you know, the different kinds of models: two other terms that are present all the time are observed states and hidden states. So that, I think, is very important to define, especially because I'm guessing they could be confused with hidden Markov models. So yeah, that's very important, to define what they are.
Yeah. But there is a deep connection there, because a hidden Markov model maybe is just a type of state space model that people are familiar with. It's a special kind of state space where the observations are what they are. I mean, maybe they're Gaussian. But then the hidden state is categorical. Right, you're in state A or state B and then you have some transition matrix from state A to state B. So this is also a state space model.
And the difference between hidden states and observed states is quite natural for Bayesians, I think. Because it's what's the difference between a prior and a likelihood, right? A prior is a likelihood and a likelihood is a prior, the difference is just one is conditioned on data. So a hidden state and an observed state, these are both entries in this x_t vector, the question is just which ones can you see and which ones can't you?
A canonical example from economics, there's this notion of total factor productivity, which is kind of how productive people are. Like it's sort of the sum of all technology, by which I mean something very broad, like management techniques and literal machine technology, but also organizations, social networks. So there's this kind of latent miasma in the economy, and when it's good everybody makes more stuff and when it's bad everybody makes less stuff.
So it's kind of a hidden state that drives things. Yeah, okay, yeah. And is that also something you could relate to observed and latent variables? For instance... 100%, okay. So for instance, a soccer player, a forward in football, in soccer, you only observe how many goals they are scoring, but it's actually coming from a latent variable, which is their goal-scoring latent skill. And that would be the hidden state.
And here, do you have, because in the hidden Markov model, you have that transition from state A to state B. Do you still have that here? Because in the example of the soccer player, for instance, I don't see any switch, any transition from one hidden state to another. But you could, so you could imagine that the player's latent ability follows a Gaussian random walk. Imagine he's an elite player already.
His skills are going to sort of stay where they are, but it's possible, you know, that he's going to drink too much one day or maybe he's on fire, right? He gets injured. his sort of latent skill, it's not, it's continuous and it's sort of moving up and down over time, but it's mostly staying the same. And then wherever it just happens to be pushed up and down by these random things of life. And some of it there are feedbacks, right?
So injury is a great example: you could say there's this latent talent or this latent skill for scoring, but then there's also this observed intensity on the field. So the player can expend more energy and try harder to make goals, but in doing so increases his risk of getting an injury. So this intensity is going to be another hidden, or maybe a semi-observed state, right? We see something about it.
And it's going to interact with his future ability through this channel of injury probability. So you get structures like this. Okay, yeah. Okay, so I see, because when I hear about transition probabilities, I'm thinking about change points, you know. But here it's not really that. Like the Gaussian random walk is actually a transition, could be a transition, between different hidden states, because the probability of scoring a goal, for instance, is changing over time.
And when that probability is changing, is this what you call going from one state to another? Is that how you use that? So when I use the words going from one state to another, I'm thinking about time steps. So this is like a slight difference from the hidden Markov language, where you say, like, I'm transitioning from the high scoring potential state, which is a discrete, yeah, modeling box, to a low scoring potential state.
You can think instead about some kind of latent scoring potential that just sort of wanders up and down. And then when we go from state to state, we're just stepping forward by whatever our discretization of time is. Those are the transition points. Okay, okay, I see. So that makes sense. The transitions are the time steps. Exactly. Yeah, the applications of the g function. Right.
So every time that player receives an opportunity to make a decision, that's going to be the unit of discretization. Okay. That sounds a bit like GPs. They are underlying functions, and then you have a realization of the GP, which is at the time points you actually observe. I mean, we're going to maybe get around to talking about this thing, the Kalman filter, and Kalman filters are basically iterative GPs. It's solving exactly the same problem of conditional normals.
It's just doing it iteratively, bringing in one piece of information at a time, as opposed to observing the entire state vector and then doing the conditioning operation all at once. Okay. Yeah, we'll definitely talk about that in a few minutes.
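[Editor's note: since the Kalman filter comes up again below, here's a bare-bones sketch of the predict/update cycle for a linear Gaussian state space. This is just the textbook recursion written in plain NumPy, not the package's implementation, and the variable names follow common conventions rather than anything from the episode.]

```python
import numpy as np

def kalman_step(m, P, y, A, Q, Z, H):
    """One predict/update cycle of the Kalman filter.

    m, P : current state mean and covariance
    y    : new observation
    A, Q : transition matrix and innovation covariance
    Z, H : observation matrix and measurement noise covariance
    """
    # Predict: push yesterday's belief through the dynamics
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q

    # Update: condition the Gaussian belief on the new observation
    S = Z @ P_pred @ Z.T + H             # forecast covariance of y
    K = P_pred @ Z.T @ np.linalg.inv(S)  # Kalman gain
    v = y - Z @ m_pred                   # forecast error (the "news")
    m_new = m_pred + K @ v
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new
```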
So, something I'm curious about. I definitely recommend people going through the different notebooks you have in the state space presentation and also in pymc-experimental, because you go into the autoregressive models, which I think are the most well known because they are the simplest, even though they are not simple, and you also talk about the kind of generalization of those, so SARIMAX, which are available in the statespace module.
You also go through the vector autoregressive models. But I think, at least personally, the most interesting thing about these different models for me is the composability. It may be because I come from a very Gaussian process perspective. So with GPs, what's awesome is that you can combine them and have a GP for a long-term trend, a GP for short-term variations, a GP for seasonality, et cetera. You can do the same thing with state space models. And you're demonstrating that in one of the notebooks.
And I find that super interesting. Something I really liked is the ability to have a trend that changes with time. Because that's always been a very weird thing in my head, where it's like, I don't like deterministic trends, because it almost never works, right? Yeah, like a trend is almost never deterministic. If you think about the football player, again, it's like, you could have a trend on his skill, but he's gonna age.
So if the trend is just on age, then your projections are like, yeah, Messi is just gonna keep progressing, and when he's 60 years old, man, he's gonna score 100 goals per season. But no. So you want something that's kind of mimicking a GP, where the GP would be able to pick up that nonlinear behavior that's kind of like a parabola, but not really. But if you want more structure, you want a model that can do that. So yeah, maybe can you talk about
this composability, these different, well, not modules, but different components of the statespace module? So let me start by just saying that we've been talking very abstractly about state spaces. And the statespace package is quite nice, I mean, I'm biased, but it's limited to linear Gaussian state spaces. So a state space can be absolutely anything.
What you can do here is: this g function is going to be restricted to the function x_t is equal to A multiplied by x_{t-1}. So you have a transition matrix A and you're going to matrix multiply it by x_{t-1}. You can add a bias term, yeah. And then that's gonna give you the next state of the world. So it seems... so that means that you wanna say two things with that. I mean, I'm putting words in your mouth, but trying to make it explicit: that means you don't have an inverse link function.
So you're not doing, you're not doing like softmax of something, for instance. Correct. And second, that means you are not using another likelihood than the normal likelihood. Exactly. And so, to push you again towards kind of this language, this time series language: the innovations have to be Gaussian. So the way the world evolves has to be centered on zero with some standard deviation.
So it rules out, for example, Poisson processes where, you know, the world is kind of Gaussian, but then once in a while you have like a big jump. Yeah. You can't do that in state space yet. When you say in state space, is that in the theory or is that PyMC statespace? In state space, you can do anything, right? And indeed, if you go on YouTube right now and you sort of search for state space, what you'll find is this Mamba model.
And there was a lot of activity in the machine learning space about state spaces, where they're using neural networks to learn recursive models that have kind of structural dynamics. That's all very nonlinear. That's all very, very broad and general, but it's still doing this basic core idea of x_t equals g of x_{t-1}. What we're doing in PyMC statespace is something a little bit more tractable. So, okay, why did I bring this up?
I brought this up because linear stuff has really nice properties. And one of the properties is you can kind of stack stuff together. Yeah. If I have a trend, and so the trend is going to be some matrix, and then I want to bring in a seasonality, you can actually just concatenate these two matrices and then it makes a new state space, in the same way that when you have a GP kernel, you can add another GP kernel or you can multiply by another GP kernel and you get out another GP kernel.
So the composition operation is a little bit different. We're doing block diagonal concatenation instead of multiplication or addition, but really it's the same idea. So what can you do? The first thing is you can play with derivatives of time. So when you talk about a trend or a deterministic trend or a changing trend, I think it's really helpful to think about this in terms of a car, like position and velocity. So for your time series, imagine we're literally tracking a rocket or a car. You have some position, let's just say in one dimension, so it's moving along on a line.
You have some position, let's just say in one dimension, so it's moving along on a line. And then that position is a state. At every time step, I have some notion of where the car is. But then the car, I also know that it's moving. So the position is changing. How fast the position is changing is your trend. So that's like the first derivative. And you could also think that the trend itself is changing. So you have an acceleration term and that would be like a second derivative.
If you drop the car example and come back to thinking about a graph, just a line on a graph, we're really talking about a straight line. There would be a position, and a deterministic trend would be like a velocity: you're just adding a constant velocity. If you're adding a velocity that's constantly changing, you actually get a quadratic trend. And then so on and so forth, right? Third, fourth, fifth order polynomials.
Where it gets powerful is that you're allowed to have innovations on each of those hidden states as well. So you observe the position, you don't observe the velocity, but the velocity still could be being perturbed, right? There could be somebody in the car who's hitting the gas or hitting the brakes. So at each time step, the trend could be increasing or decreasing and that could cause the position to change at each time step.
So that's kind of block number one, you know, kind of what's the trend like and how many derivatives of time does it have? Then you have... That would be the moving trend? Yeah, we call it the level trend component. Yeah. And so the level is like the baseline. It's like the position, right? Like what is the actual value of the time series right now? Yeah. And so that allows the position to change through time, which, I mean, there's an innovation, right?
Everything, everything changes by innovations. If you don't have an innovation on a state, the state will just be whatever your initial guess for that state is. So imagine a deterministic level: a flat line for all time. Yeah. That's like a baseline. That doesn't vary with time. And then your trend, the trend component of the level trend, is how fast does the level evolve? Exactly. Yeah. What's the delta? Like what's the delta in position for every delta t, right?
The value of your trend is going to give you that. Okay, yes. So the trend, that's weird, because, you know, when I say a trend, I'm thinking about the level actually. But no, the trend is just giving you the derivative of the level. Okay. Yeah, I see. Yeah. And so I'm thinking, what does a quadratic trend represent, you know? How do you interpret that? How do you interpret the order of the trend?
So that's why I really pushed to this Newtonian physics analogy. So when you think about, okay, order one, fine, it's like a position. Order two is like a velocity, right? How rapidly is your position changing? And then order three would be like the acceleration, like how fast is the velocity changing? Going back to the GDP example, is GDP growth accelerating? Is it getting faster and faster over time? Or is it getting slower and slower? GDP could be increasing, right?
But it could be increasing at a decreasing rate. So that sentence is quite strange because there's sort of three derivatives in that sentence, but it's something that people say and it sort of has a meaning. And that would be the third order. And then there are names for the higher ones. Like GDP: GDP has a level, that level changes through time, which is the first derivative, and then that rate of change itself changes through time, which is the acceleration.
So the position, the position changes, that's the velocity. And then the velocity changes, that's the acceleration. Yeah. Okay. So the order of the level trend component would be three in this case. Exactly. Yeah, three, because then you would have three derivatives in your position component, thinking about it like that, if that's helpful.
Now I'm thinking, you know, when users use that, they have to choose the order, and how do they choose that? You know, it's like the number of knots of the splines, for instance, where you probably do some leave-one-out comparison to check what makes the most sense. You know, how do you choose that? Or do you choose it with domain knowledge? Yeah, I would say it comes with domain knowledge.
Personally, what I use more than anything is an order two on the order. So I have a position and a velocity. But then you're also allowed to choose the innovations order, so which states are going to be able to get shocks. And if you put innovation order two, so if it's two, two, this is like a Gaussian random walk on the position and another one on the velocity. But this is quite jumpy, you know? So it's actually nicer if you turn off the innovations to the position component.
So the position is only going to change because the velocity changes. And what you end up getting is very smooth trajectories. So it really looks like you don't have any bumpiness in the actual position. You just get these nice, slow-changing curves. And that's usually what you want, right? Unless you have some specific knowledge. Yeah, and you explain that actually in the structural time series notebook, so I'll refer listeners to that.
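[Editor's note: here's a small illustration of the level/trend block being described: an order-two component where the transition matrix does the Newtonian position/velocity update and only the velocity receives innovations, which is what produces the smooth trajectories Jesse mentions. Plain NumPy, illustrative only; the structural component in pymc-experimental assembles roughly this matrix for you.]

```python
import numpy as np

rng = np.random.default_rng(1)

# State vector: [level (position), trend (velocity)]
A = np.array([[1.0, 1.0],   # level_t = level_{t-1} + trend_{t-1}
              [0.0, 1.0]])  # trend_t = trend_{t-1}

sigma_trend = 0.05
T = 200
x = np.zeros((T, 2))
x[0] = [0.0, 0.1]  # initial level and slope

for t in range(1, T):
    eps = np.array([0.0, rng.normal(0.0, sigma_trend)])  # shock only the trend
    x[t] = A @ x[t - 1] + eps

level = x[:, 0]  # a smooth, slowly bending trajectory
```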
But yeah, so go ahead with the rest of the components. Yeah, yeah, yeah. So OK, so the level is going to be important. The next most important thing that people think about in time series is definitely seasonality. So are there repeating patterns in the time series? I think people think about that because they're very evocative when you plot your time series and you see like, spikes happening every Christmas or whatever in your sales, right? It's like, you know, you want to capture that.
It's really like in your face. So there's multiple ways to do this. We give you two. Yeah, I guess you can look at the notebook. You can either do it kind of in a frequency domain, so using a Fourier basis, which is going to be, I mean, very similar to what you get out of a Prophet model or even in a GP, I think. I don't know the formula for a periodic kernel, but I think it's something with sine functions. So I think it's going to be quite close to what you get there.
Or we have kind of longer lags, so I call it like a calendar time seasonality. So you could say every Monday there's an effect. So my time is coming in daily and I want to have it so that every seventh day there's some kind of seasonal effect. There's benefits and negatives to each. So Fourier bases can represent fractional time. So if you want it to be precise, for example, and say, I want to have a monthly effect. Well, a month isn't an even number of days, right?
So if you wanted to do a calendar month, like, well, what are you gonna do about February? You got 30 sometimes, you got 31 sometimes. So you take an average, or you could just say, I want a Fourier basis that has such and such property. And it can kind of learn to find the different bumps if you give it enough basis functions.
On the other hand, it loses in interpretability. So if I say I want a Monday effect, you can't really look at the coefficients of a Fourier basis and say, like, there it is, that's the Monday effect right there. Whereas with these sort of calendar effects, it's basically dummies, you can do that. I can pull out the coefficients and tell you, like, yeah, every Monday we get plus 10 to our sales. Yeah. Yeah. Okay. Yeah. So those are the two different
types you have, and you talk about that in another notebook. I think it's time seasonality and frequency seasonality, something like that. And, you know, I always get... seasonality is quite confusing for me. It's just, you have a lot of vocabulary and so on, but yeah, in the end, this is almost always something like a Fourier series. It's just like, when you look at that, it's just like, splines are like Fourier series. Why do we have so many names? It's just weird. Yeah.
Why do we have so many names? It's just weird. yeah. It's like different traditions, right? And there are, I'm sure there's some smart person who's come up with an equivalence class between like GP and spline and Fourier basis and everything, right? So some of it comes down to preference. Some of it comes down to like compute. So practically, In terms of computation, the calendar effects are a little bit more onerous because you have to actually make a matrix of the season size.
So if you wanted daily effects for the year, you would need a matrix of size like 365 by 365 in order to track the movement of time as you evolve the system. Whereas for a Fourier basis, you could just do it with a two by two matrix because you just have one basis vector with a sine component and a cosine component and you're done. Obviously that Fourier basis would only be able to learn like strictly periodic functions. It wouldn't be able to give it any like bumpiness.
But then you could go to a four by four or an eight by eight, no problem. You could get a little bit more richness in the design of the shape. So if people want like a suggestion, I would start with frequency unless you have a really compelling reason, like you really value that interpretability that comes with doing calendar effects. And if you're doing calendar effects, probably you want a ZeroSumNormal prior on the parameters. 100%. It's actually not identified otherwise. Reminding people.
Yeah, you actually have to use it. For a long time, I was doing some crazy stuff where I was dropping one of the days and not telling the user and trying to figure out, well, should I drop the first day or the last day? What if they care? Should I let them drop it? Just don't drop anything. Just use a ZeroSumNormal. That's what I converged on. Yeah, for sure.
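[Editor's note: to make the frequency versus calendar distinction concrete, here's a sketch of the two seasonal transition blocks, a single Fourier harmonic as a 2x2 rotation matrix and a weekly calendar block with the sum-to-zero structure, plus the block-diagonal stacking that composes components. Plain NumPy/SciPy, illustrative only; the exact state layout used inside the package may differ.]

```python
import numpy as np
from scipy.linalg import block_diag

# Frequency-domain seasonality: one harmonic is a 2x2 rotation matrix,
# so a single sine/cosine pair tracks a strictly periodic pattern.
season_length = 7.0                     # e.g. a weekly pattern in daily data
lam = 2 * np.pi / season_length
T_freq = np.array([[ np.cos(lam), np.sin(lam)],
                   [-np.sin(lam), np.cos(lam)]])

# Calendar (time-domain) seasonality: s-1 states that rotate through the
# week; the first row forces the seasonal effects to sum to zero.
s = 7
T_time = np.zeros((s - 1, s - 1))
T_time[0, :] = -1.0
T_time[1:, :-1] = np.eye(s - 2)

# Local linear trend block from the earlier sketch
T_trend = np.array([[1.0, 1.0],
                    [0.0, 1.0]])

# Composition = block-diagonal concatenation of the transition matrices
# (the same spirit as adding GP kernels, just a different operation).
T_full = block_diag(T_trend, T_freq, T_time)
print(T_full.shape)  # (10, 10): 2 trend states + 2 frequency states + 6 calendar states
```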
And then what are the other components that people can use? And then we'll switch to the Kalman filter. So those are like the big two, right? Those ones, that's gonna get you like 90% of the way there. The rest of them I'll go through quickly. You can have a cycle component, which is similar to a seasonality, but it's kind of more irregular, right? Think about business cycles. They're kind of like five to eight years, but you don't really know, and, like, how high is the high, how low is the low? So it's like a seasonality, but a little less structured. That's in there. There are autoregressive components in there.
So, I was trained to think of the autoregressive component kind of as a trash can. You have all of this nice structure in your level and in your season, and then there's still some kind of temporal dependency, but it might not really mean anything to you. So you just throw an autoregressive component on it to try to soak up some of that autocorrelation in the residuals of the first two components, basically. And that was really interesting to me.
I didn't know that, basically, AR components are mainly there to model the residuals. I was like, okay, that's that. This was like an exploding head emoji moment for me. It makes sense though, right? Because there's no... Yeah, that makes sense. Yeah, there's no physical interpretation to it. Like why does today look like yesterday times 0.5? I don't know, because that's what the data says.
Yeah. There may be situations where you have a really strong reason to believe that if you think about like... know, flood recovery or something. And so people can clean up 80 % of the flood damage every week after the flood, right? And it sort of has this exponential decay pattern to it. And then you would say, that's definitely an AR process. But even that example is pretty contorted and I can't come up with many good ones. So it's just like a nice feature.
And it's also a feature of this linear Gaussian setup that you can think of every component as modeling the residuals of the other components, right? So like everything is sort of conditioned on everything else. And so you have that interpretation of the AR component. So you have that. And then the last one, which is really nice and we've done some exciting developments on it, is an exogenous regression component. So you're actually allowed to... That seems very interesting.
Yeah, you're allowed to embed a regression problem into your state space model. And here we're really thinking about residuals. So it kind of turns your thinking about regression on its head a little bit. So think about it as OLS first off, because we're Gaussian linear, right? The betas are going to be your hidden states. So every day you're going to wake up and there's going to be some sensitivity of the world to some factor. And then the factors are coming in externally.
So the sensitivity is there and then boom, you know, the temperature is what it is or like boom, the weather is what it is. And then that's going to have some effect on our soccer player's ability to score goals during that game. So once you do that conditioning operation, you're going to have a bunch of residuals and you want some time structure on those residuals. So that's where the rest of your state space model comes in.
And you can directly interpret these structural models that way: you're going to do regular OLS, y equals X beta plus epsilon, but then we're going to say, okay, epsilon is not normally distributed, it's distributed with all of these rich time series dynamics. So that's cool. The only limitation of that is that you have to pump data in. So if you do a regression component, you can't forecast forever. You can't just apply the g function. You have to tell me what's the weather going to be tomorrow.
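[Editor's note: a minimal way to see this "regression with rich time series errors" framing, and why forecasting then requires future covariate values, is a little simulation sketch. Plain NumPy with a hypothetical weather covariate; the AR(1) error stands in for whatever structural dynamics you would actually use.]

```python
import numpy as np

rng = np.random.default_rng(7)
T = 120
beta = 2.0                      # hidden sensitivity to the covariate

# Exogenous covariate (e.g. temperature) -- supplied from outside the model
X = rng.normal(size=T)

# Residuals follow time series dynamics (here an AR(1)) instead of IID noise
eta = np.zeros(T)
for t in range(1, T):
    eta[t] = 0.8 * eta[t - 1] + rng.normal(0.0, 0.3)

y = beta * X + eta              # "y = X beta + epsilon", where epsilon has dynamics

# Forecasting h steps ahead: the AR part can be rolled forward on its own,
# but the regression part needs future X values, which the model cannot
# invent -- you have to supply a scenario for them.
h = 12
X_future = np.full(h, X.mean())                 # e.g. assume average weather
eta_fcst = eta[-1] * 0.8 ** np.arange(1, h + 1)  # AR(1) decay of the last residual
y_fcst = beta * X_future + eta_fcst
```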
You have to tell me what's the weather going to be tomorrow. Yeah. mean, yeah. And that's funny because, I mean, for me, it's completely natural and intuitive now because I work so much with different flavors of regression. But I find that for newcomers or non- technical people, you know, when they asked me, so can you make a regression, like a projection of the time series for the next five years? I'm like, sure. What are going to be the, what's going to be the value of the covariates?
Well, I don't know, you know, I just want the prediction. I know, but I need the covariates, you know, because you just told me that the generative graph was depending on the covariates. So, I mean, it can be the mean, it can be something you want, but yeah, we have to input a value in there. And it's one of the powerful things about these state spaces, right, is that in principle, so the structural submodule we're talking about now, for now, only works on one time series at a time.
Now, this is very high on my to-do list to generalize it, to allow you to have multiple time series. So you could take all of these factors that your stakeholders are interested in and then endogenize them to the model and say, well, you want to be able to forecast all this stuff jointly, then all we have to do is figure out the time series dynamics for all these covariates. And then we can connect them through, like, VAR dynamics within the model, or the levels can be correlated, right?
Or the innovations on the levels can be correlated. There's lots of places to intervene on these models, even in this restricted linear Gaussian setup, that allow for a very powerful suite of tools and would let you just say, like, yeah, here's a hundred-year forecast. Your error bars are going to be like this, but hey, here it is. So here, yeah, yeah. I mean, so what you would do here, I'm thinking about what you just said here, that would be a vector autoregressive, right?
Where you put all the time series in relation, or are you talking about something else? Well, so maybe it's a good transition point. I think a vector autoregressive model is a generalization specifically of the autoregressive model. So the value today is gonna be a function of the value yesterday, and that's it, right? And then we can have multiple time series and say that the value of GDP today is going to be a function of the value of GDP yesterday, but also of the interest rate yesterday.
But you could go deeper, right? You could say, to go back again to our soccer player example, we observe the soccer player's intensity and we also observe the number of goals that he's scoring per game. And these two things could be connected through some latent hidden state, like some latent velocity, right? When the velocity of his intensity on the field goes up, then his scoring also goes up, something like this, right?
So this wouldn't strictly be a connection via the observed data in the past, but instead by some latent factor in the past. And so in principle, all of this can enter, all of it can connect. It's all just matrices, you just click them together with this block diag operation and carry on your merry way. Nothing stops you except the implementation. Yeah, exactly. Yeah, here you'd have to do a VAR, basically. Because I was thinking, wait, you're saying you can't do multiple time series?
I was like, I can just instantiate multiple time series from the statespace module. But that would not relate all the time series through the covariance matrix like the VAR is doing. Exactly. Both the covariance, so the innovations are correlated, and the transition, the g function, is connecting everybody together. So you have two different places where they're connected. Yeah. And so you're working on implementing that, like, when you have some time? Yeah, after my thesis is done.
That'll be like top of my list once the thesis is done. Yeah. Nice. Yeah. So feel free to send anything my way to review. I'm like, I can't wait to try that stuff. Be careful what you wish for, because you'll get it. Yeah, for sure. I mean, I can't wait to try that stuff out. That sounds super fun. And I'm already thinking about a ton of data sets I can apply that to. So yeah, that'd be awesome.
I'll say one last word on the regression component before we move on is just that this component is where a lot of the causal time series stuff happens. So when people talk about interventions into a time series or like counterfactual analysis, a lot of what they're going to be doing is like, quote unquote, forecasting, but where you're applying like a do operation on the exogenous data that's coming in on the regression.
So I run my estimation, I get some parameters, then I go back, right, to like halfway through the time series and I change the covariate by, you know, plus 10. And then I compare the resulting time series to the time series that was actually observed and sort of think about the difference: that is some kind of causal effect. And is that the IRF? So the impulse response functions that you also touch on in the different notebooks?
Yeah, so the IRFs are very tied up in causality. It's a notion of causality. Some listeners might be familiar with this term, Granger causality, which comes out of econometrics and isn't really causality, but Granger won a Nobel Prize and everybody does it. So the question is, does an increase in the interest rate drive changes in GDP, right?
Like if we fit a VAR, what it comes down to is saying: we fit a VAR, is the cross-term parameter significant? Like, is it non-zero? If it's non-zero, then we can say that in some sense changes in the interest rate are, quote unquote, causing changes in GDP. Now, would Judea Pearl agree with that definition of causality? I don't think so. But it's something that people talk about. With IRFs, you're thinking about the other side, which is the shocks.
So you're saying, I estimate a system, I have some transition function G, and then suppose that the interest rate suddenly jumps by 100 points. How does the system unroll after that point? So it's kind of a way to explore the time series dynamics of the system in a very controlled way. Yeah, it just sounds like a counterfactual. Yeah, but it's a counterfactual that's very much in a vacuum. We're not even putting it at a specific point in time.
We're not saying, the shock applied on such and such a date, and then seeing how the system would have evolved. We're really just saying the state is in the steady state. It's flat, everybody's calm, no shocks are coming in. Then we shock it. How does everything move? And there's a sense in which that gives you all the information you need about what the counterfactuals you could run would look like.
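A bare-bones sketch of what that looks like for a toy, stable two-variable system (the transition matrix is made up, not estimated from anything):

```python
import numpy as np

# Toy VAR(1)-style transition matrix; eigenvalues are inside the unit circle,
# so the system is stationary and the shock dies out.
T = np.array([[0.7, 0.2],
              [0.1, 0.5]])
assert np.all(np.abs(np.linalg.eigvals(T)) < 1)

horizon = 20
irf = np.zeros((horizon, 2))
x = np.array([1.0, 0.0])   # one-time shock to the first variable, e.g. the interest rate

# Start from the steady state (all zeros), apply the shock, then let the
# system unroll with no further shocks.
for h in range(horizon):
    irf[h] = x
    x = T @ x
```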
And do you have to use a VAR to use IRFs, or, if you're doing structural time series as we just talked about, does the IRF also make sense? So the IRF makes sense anywhere. You're just studying how the system reacts to a shock to one of these innovations. So you get to control how the innovation enters the system, and it helps you to understand what's going on. It's mostly used with VARs for two reasons. One is that VARs are stationary.
That's another bit of jargon that's very important in time series, which basically means that if you let the system evolve forever, it won't go to either plus or minus infinity. It'll find some nice, calm, what we call a steady state, where everything will kind of simmer down and reach that point. What that means is that when you shock the system, it's not going to be perturbed onto an explosive trajectory, which would be kind of unphysical, right?
Like, there are no real systems in the world where, if they're slightly perturbed, they just explode, right? Natural systems tend to be more robust. So those situations are less interesting. The structural time series models are all non-stationary due to the trend components. Yeah, so this trend and position, it's never going to come back anywhere. If you have a position, it's going to stay at that position forever.
If you have a trend, it's going to be going up or down based on how the shock comes in. Seasonality, right? It'll oscillate forever, which I guess is a kind of stationarity, but not really. So VARs are a place where it makes sense because they're stationary and it's also interesting. You could also shock an AR1, right? You could shock a SARIMA, but it would be like a shark fin, right? It would jump up and then come down with an exponential decay. It's not that interesting.
Whereas in a VAR, you get these rich cross-time-series dynamic effects. Yeah, okay. And so one last thing about the exogenous regression component from the structural time series, and then we'll move to Kalman filters to play us out. Yeah, so something I also loved when I looked into that was that you have the ability to basically add regressors, covariates, to your time series.
So it'd be like, if you're using GPs, you could have a coefficient on anything that's not time-related, and then you have your GPs. So here it's the same thing: the effect of temperature on bike renting, for instance, you just have a slope on that, and that's what this component is doing. But something that's also very fun is that you have the ability to make that slope time-varying itself, by just asking for innovations in the sub-module. Can you talk about that a bit?
Yeah, that's what I hinted at when I said that the regression setup is kind of turned on its head. You would expect that bike sales is the state of the world, and that every day, bike sales relate to, sorry, temperature relates to bike sales. So you would think that your x_t vector would be temperature and bike sales, right? But actually that would just be a VAR. So we're not talking about a VAR, we're talking about something else.
When you're doing an exogenous regression component, the state of the world is that you have bike rentals and you have the sensitivity of bike rentals to the temperature. And this hidden sensitivity state can fluctuate. If you allow that to vary, then it can go up, it can go down, and you end up with a time-varying regression. It would be equivalent to a Gaussian random walk prior, because remember that we're limited to Gaussian innovations, right?
So what you get by default when you ask for innovations on that hidden state is beta_t = beta_{t-1} + epsilon_t, with some standard deviation that you estimate. Yeah. Yeah. It's a very easy way. You know, if you had to code that by hand, that'd be kind of a pain. But yeah, it's a very easy way to add a Gaussian process, Gaussian random walk, sorry, a Gaussian random walk on the slope of your covariates, which I found really cool.
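A minimal sketch of what that hand-rolled version might look like in PyMC, with made-up bike-rental data and a GaussianRandomWalk prior on the slope (argument names may differ slightly across PyMC versions):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n = 365
temperature = rng.normal(20, 5, size=n)   # made-up covariate
rentals = rng.normal(100, 10, size=n)     # placeholder observations

with pm.Model() as model:
    sigma_beta = pm.HalfNormal("sigma_beta", sigma=0.1)
    # Time-varying slope: beta_t = beta_{t-1} + epsilon_t
    beta = pm.GaussianRandomWalk(
        "beta", sigma=sigma_beta, init_dist=pm.Normal.dist(0, 1), shape=n
    )
    sigma = pm.HalfNormal("sigma", sigma=10)
    pm.Normal("obs", mu=beta * temperature, sigma=sigma, observed=rentals)
```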
Also, and I'll let people check out the notebooks, the predictions are then really easy to make. And I think that's also why that sub-module is really changing the game, because you could do it in PyMC, it's just that you have to be very careful when you're doing out-of-sample predictions and it's easy to mess up. Whereas here, it's taken care of for you. That made me think a lot about Bambi.
Absolutely. It's something that I want to emphasize: the state space module adds no new functionality to PyMC. You can do everything that's in state space in PyMC, if you are determined enough. The problem is that, first off, you have to know what a scan is, and that already rules out, I think, 90% of the audience, because scans are so horrible and so difficult to work with.
Just muddling through the shape errors and the idiosyncrasies of setting up the scan is enough. And then you're done and you're like, okay, well, now I want to make forecasts. Well, you've got to write another scan. And like, well, just shoot me. So the real value-add of the state space module is doing these post-estimation tasks. You have forecasting: you just call dot forecast, give it your estimated parameters, and it goes boop and gives you everything you need.
But there are also in-sample tasks you can do that are related to this Kalman filter thing that we've been tap dancing around. Yeah, I mean, I think it's a good time now, after only an hour and 23 minutes of recording. So thanks a lot, Jesse, for taking so much time. But yeah, I still want to be mindful of your time. So let's close this out with the Kalman filter. Yeah. What is that? Because that really is the workhorse of the state space module.
You told me already that you're also working on an expansion of the Kalman filter to allow users to use the state space model not only on normally distributed data, so it would also work for count data, for instance. But yeah, so what's that? Yeah, so the Kalman filter is literal rocket science. It was used at NASA during the Apollo missions as a way to track rockets in outer space. And the original NASA guidance computer for the Apollo 11 mission that first landed on the moon, its code is on GitHub.
And you can go in there and look at the source code, and there's a Kalman filter inside of it. So I always mention that when I talk about this, because it really feels like you're standing on the shoulders of giants, right? And this is where all of this stuff about position and velocity really comes to a head. There's this idea that there's an object out there that's moving at some unknown rate,
and in some unknown place, but we have a belief about where it roughly is. And more importantly, we know the dynamics of how it's going to move, right? The rocket can't teleport, it follows Newtonian physics. So Newtonian physics give us our state space equations. It's exactly the level-trend model we've been talking about all this time. The rocket has a position, a velocity, and an acceleration in three-dimensional space, and we put components on all of those.
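For concreteness, the standard constant-acceleration transition matrix for a single spatial dimension looks like this (a generic textbook form, not the Apollo code):

```python
import numpy as np

dt = 1.0  # time between measurements, in arbitrary units

# State is [position, velocity, acceleration]; this matrix encodes the
# Newtonian kinematics x' = x + v*dt + 0.5*a*dt^2, v' = v + a*dt, a' = a.
T = np.array([
    [1.0, dt, 0.5 * dt**2],
    [0.0, 1.0, dt],
    [0.0, 0.0, 1.0],
])
# The rocket's full state stacks one such block per spatial dimension.
```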
And then periodically we get signals from the rocket, right? Whatever radio signal comes in and says, I'm over here. So we have some prior belief based on the last time we got a signal, and also based on the dynamics of our system. We get a new signal that comes in. And then the question is, how do we update our belief about where the rocket is, not only given that new signal, but also given the kinematics that live inside our transition equation?
So the Kalman filter is just a way to put all of that together. It's basically a way to iteratively receive bits of information about the world, combine them with some dynamical system, some model that you have about how the world works, and then produce the best estimate of what the real state of the world is, given that the information you receive might be corrupted in some way. So why is this powerful? Why does this matter?
Well, first let me say all it's doing is computing a multivariate normal likelihood step by step. And this goes back to the connection between time series state space models and Gaussian processes. A Gaussian process is just a really complicated multivariate normal, and so is a state space model. But instead of creating a covariance matrix, doing a conditioning operation, and then computing the likelihood of the data given this conditional normal,
what we're going to do is say: I have a conditional normal, which is my prior given all the information I've received up until time t. I'm going to make a forecast about the next state of the system, then I'm going to get some data, and then I'm going to reconcile the error in my forecast with the data I received, in order to update my belief about where the system truly is.
So it's a way to iteratively compute a multivariate normal, which is nice because you never have to invert a huge matrix. Instead of inverting this giant matrix, I can take things chunk by chunk, sort of eat the buffalo one leg at a time. And that can be very nice. It was very nice when they were doing Apollo and they didn't have a lot of memory; they could just keep updating the system in place. It's also nice for us, and we take advantage of that.
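A compact sketch of one predict/update cycle for a toy one-dimensional position/velocity system, written out with the generic textbook equations and made-up numbers rather than the module's actual implementation:

```python
import numpy as np

dt = 1.0
T = np.array([[1.0, dt], [0.0, 1.0]])   # transition: level + trend kinematics
Z = np.array([[1.0, 0.0]])              # we only observe the position
Q = 0.01 * np.eye(2)                    # innovation (process noise) covariance
H = np.array([[0.5]])                   # measurement error variance

m = np.array([0.0, 1.0])                # prior mean for [position, velocity]
P = np.eye(2)                           # prior covariance

# Predict: push the current belief through the dynamics.
m_pred = T @ m
P_pred = T @ P @ T.T + Q

# Update: reconcile the forecast with an incoming observation y.
y = np.array([1.3])
v = y - Z @ m_pred                      # forecast error
S = Z @ P_pred @ Z.T + H                # forecast error covariance
K = P_pred @ Z.T @ np.linalg.inv(S)     # Kalman gain
m = m_pred + K @ v                      # filtered (posterior) mean
P = P_pred - K @ S @ K.T                # filtered (posterior) covariance

# Each step contributes one Gaussian log-likelihood term built from v and S,
# so the full multivariate normal is assembled chunk by chunk, never inverting
# one giant covariance matrix for the whole series.
```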
It's also nice because the Kalman filter makes no assumptions about which states are observed states and which are hidden states. So when you update the forecast for the next state, you don't know what data is going to come in. But whatever data does come in, you're going to use that as your conditioning information, do this conditioning operation, and then update your hidden states based on that. Why is this important? Because hidden, or sorry, missing data is precisely a hidden state.
Normally I observe GDP, but, you know, someone didn't write down the number. Well, no problem, it's just a hidden state for this one time period. And the Kalman filter can still do this conditional updating operation, and it can propagate the uncertainty from not having seen that one observation forward through time. And then the last advantage is that you get a couple of slightly different outputs.
So you can do the prediction step, where you just say, I have some information at time t, I want to know about t plus one. Fine. You can think of that as like a prior. You can also get a posterior, which is your belief about the true system at time t, given that you observed data at time t. So it's the best reconciliation between your forecast and the true data, conditioned on measurement error. And this is why it's called filtering, right?
Because you're filtering out the noise that's associated with the measurements coming into the system. A lot of the time in applied work we don't use measurement error, we just treat the data as the data. But if you're concerned about measurement error, it's really easy to include another term, and the system handles it very easily. And then you can also do retrodictive predictions, you know, you can just do retrodictions.
You can go back and say, okay, now I've observed a trajectory for this many time steps. I want to go back and figure out what's my best guess of where the system was 10 time steps ago, given that I've now observed where it went later. Yeah. Like, think about a car driving through a tunnel and you lose GPS. So you have this kind of hole in where it is, but then you see it coming out of the other side of the tunnel.
While it's in the tunnel, maybe you know there's a tunnel there, so your uncertainty about where it's going is broadening. Once you see that it came out the other side, you can infer it was traveling straight the whole time. And you can retrodict and get rid of all of that uncertainty that you had. And this can be very useful, again, when you're filling in missing data, or when you're trying to get the best possible estimate of what the change in the level of your trend is, right?
So it's a powerful algorithm. It has a very long history, it's used all across many disciplines, and we use it because it's cool and it's Bayesian. Yeah, I was going to say, first, yet again, an algorithm that we're using that's coming from physics. It's just like almost all of them. It's incredible. Second, is that the algorithm that Alan Turing and his team were using during World War Two to find out where the German... no, to crack the Enigma code? No, I don't know. I don't think so.
I think this was slightly later, but I can check. I'm saying no, but if I'm wrong, someone correct me. Yeah. No, I think you're right. I think it comes later. And third, though, I have a basic question. The way you describe the Kalman filter, I'm like, don't we already have that? And that's called Bayes' formula? Yeah, the Kalman filter is Bayes' formula, right? Just applied to this sequential setup. So it's very cool.
It's a bit of a shame that, and this is also on my to-do list now, in principle, this is designed for online data. So it's a way to update your belief about a system as data streams in. So in principle, you could train your model and then you could freeze the hyperparameters and then you could start learning about the system such as it is as it evolves over time.
I think people would be a bit disappointed, because it's not fully online in the sense that we're not backpropagating all the way to the uncertainty about the priors as more data streams in. But it's quite neat that you can just leave it on and then get better and better estimates about where your little Arduino drone is flying to. I see, I see, okay. And so, one last question and then I'll ask you the final two questions, but...
yeah, no, first, we definitely need to add some links and resources in the show notes about those topics. So first, you know, time series, the kind we've talked about during the episode, so structural time series, AR, ARIMAX, VAR, et cetera. And also the Kalman filter, because I'm sure some listeners are going to want to dive into that. But yeah, the other question is, why is state space for now limited to a normal likelihood?
Because that's how you can derive nice formulas for the Bayes' law updates. You get a nice closed-form solution, everything's Gaussian on Gaussian, linear on linear, right? It all just reduces to some nice matrix multiplications. So if you want non-Gaussian, you have to be a little bit more creative about how to figure out these posterior beliefs given that you observe data, right?
So people do things like particle filtering, where you're tracking multiple little particles, and you do a population mutation step and then you get a histogram approximation from the particles. Or they do a linearization, you know, a Taylor expansion of the linear Kalman filter to try to add some curvature. We'll explore it. Okay. Okay, I see. Yeah. So that's the kind of thing you're looking into when you start adding that to state space.
But I think I wanted people to understand what the bottleneck was, and, as often, well, you know, this one works, that one doesn't. Yeah. Sorry, I'm switching to Spanish now, you know, my wife is Argentinian, so now my brain is just talking Spanish. It's getting late here too, just less late than where you are. I'm winning, though. So I think we can call it a show, Jesse. I could keep you here for like three hours, but no, I won't.
Before I ask you the last two questions, is there anything that you'd like to add that I forgot or didn't get to ask you about? I will just say that if any listeners are feeling brave and they want to come try the state space module, please do, and please report any bugs that you find on the PyMC experimental repository. Open issues, don't be worried about being bothersome or anything.
It is through the process of talking to people and having more users and more eyes on it that the module gets better and better. So please give it a try and report back what's good and what's not. Yeah. Yeah. Be honest, but be courteous. You can be discourteous too, I can take it. Yeah, I know you can, but I encourage people not to be discourteous. Jesse is doing that in his free time, you know? So that's the least you can do to thank him. And then if you see him in the street, buy him a drink.
So, Jesse, before we close up the show, let me ask you the last two questions I ask every guest at the end of the show. First one: if you had unlimited time and resources, which problem would you try to solve? If I had unlimited time and resources, which problem would I try to solve? How to make my wife happy, I guess. I haven't cracked that chestnut yet. Yeah, same. If some listeners want to open a PR on Jesse's repo, that'd... yeah, on love advice, I'd be happy with that.
Yeah. I would say, so I mean, if you want a more serious answer, I'm very interested in questions about dynamical systems and emergent behaviors, especially in social groups. So I would love to figure out how do you live in a large scale society while still having meaningful social connections? Because it seems like there's a fundamental tension between those two things. And I think it's something that's very important in the modern world.
So I would love to study this question of, you know, how do you have strong interpersonal bonds without living in a small, less complicated society? Love it. Yeah, that makes me think back to my political science days. I think when I answered that question, when I was being interviewed on my own podcast, I answered developing critical thinking. I think it's a bit... probably along the same lines, right? Yeah. And second question:
if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be? I guess my academic hero now is Thomas Sargent. He's a macroeconomist, a Nobel laureate, and he's still alive. He runs quantecon.org, if anybody has heard of this before; it's a really great resource for learning Python and economics and just computing, numerical computation in general, so a lot of algorithms and model design. He's very no-nonsense.
He does the kind of research that I like to do, where you start with a question and then you try to break it into pieces and model it. And he's got my favorite quote in academia, which is: in the kind of science I do, if you can't write a Python program about it, you're a bullshitter. And I think there's a lot of truth to that. Yeah, that makes sense. And that's quantecon.org, right? Yep, like quant, like quantify, so quant and then econ, like economics.
Yeah, I'm on the website right now, I'll add it to the show notes. I have to say the picture he has on the homepage is absolutely beautiful. I don't know where that is, but that's breathtaking. You have to take a look. The close second is probably Jim Simons, who just recently passed away, the founder of the Medallion Fund. He endowed the Stony Brook mathematics department, where one of our colleagues works.
He's like a polymath, adventurer, investor, so I don't know how much I would have in common with him, but that's a pretty nice life. You do motorcycle trips and then make billions and then, you know. Pretty nice. That's cool. And what's the name of the first one? Thomas Sargent. OK, Thomas Sargent. Yeah. So yeah, if anybody in the audience knows Thomas Sargent, let us know. We'll pass it on to Jesse. I appreciate it. Awesome. Well, Jesse, thank you so much for taking the time.
That was really, really cool. Yeah, I had fun. So it's not exactly the record, but at one hour forty we're getting close to the show record, so well done for your stamina and endurance. I will say for the record that you cut me off, you know, you said we're done. Otherwise I'd babble for six and a half hours. No one would listen, okay, but I would do it. So you know, like, let's do that.
When you add the new features we talked about today to the state space module, let's get you back on the show, and we'll talk a bit more about the Kalman filter, but for, you know, count data, and you were talking about, oh yeah, adding structural decomposition for vector autoregressive models, right? So let's do that. Perfect. Awesome. Awesome, so as usual, we'll put resources and your website and your socials in the show notes for people who want to dig deeper.
Thank you again, Jesse, for taking so much time and for being on this show. Thank you so much. This has been another episode of Learning Bayesian Statistics. Be sure to rate, review, and follow the show on your favorite podcatcher, and visit learnbayestats.com for more resources about today's topics, as well as access to more episodes to help you reach a true Bayesian state of mind. That's learnbayestats.com. Our theme music is Good Bayesian by Baba Brinkman, feat. MC Lars and Mega Ran.
Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com slash LearnBayesStats. Thank you so much for listening and for your support. You're truly a good Bayesian. Change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations.
Let me show you how to be a good Bayesian. Change calculations after taking fresh data in. Those predictions that your brain is making? Let's get them on a solid foundation.