by Machine Learning Street Talk
myself and realizing there was no good entry point into the field. And so I spent a number of years on my own just trying to get through all the different papers and map out exactly what was relevant, and it took me a couple of years before I realized I had a set of notes and decided to start putting them into book format, covering everything in one self-contained book, so that newcomers to the field would have a really good reference to begin with that is more focused on
the engineering side and less on the neuroscience side. So what is the free energy principle? Right, so the free energy principle: if you observe agents that survive and persist in an environment, you see that they tend to revisit the same set of states over and over again. So there's a set of states that are most comfortable to the agent, and it could be for very basic reasons, like their physical integrity will be destroyed if they exit those states. So for example, we have a certain temperature range over which our bodies will survive, and so those are states we'd prefer to be in. So at the lowest level, it's simply a description of what kinds of behaviors or actions agents need to take in order to stay within those bounds. So this is a set of states we can call the preferred states of the agent. And so the question is, what does the agent need to do in order to
maintain itself within those expected or preferred states. So the key insight here is that we take a probabilistic look at this and say that, on average, if we were to sample an agent over time, it would be found in this characteristic set of states more often than in other states. This is sometimes called the ergodic density in the early free energy principle literature. And so the question is to describe what agents must be doing if they are to be found in those states. And the short answer, and I'll get to the explanation of why, is that they're minimizing variational free energy. So there's a bit of a tautology here, which is: well, if they weren't free energy minimizing agents, then they wouldn't be found in those states, because they wouldn't exist. And that's because it's a description of the system. And so the question now is what would an agent need to be doing in order to maintain itself within that preferred set of states. And so the whole apparatus of the free energy principle is focused on this quantity known as surprise, which is the negative log probability of the data. Surprise is a term from information theory which has a formal definition for a realization of a random variable. And so the idea is essentially that the agent wants to restrict itself to a set of external states that are conducive to
its survival. However, it doesn't know what those states are; it can only infer them from the data. So it can bound the data it's receiving over time and make sure that it's consistent with its self model. For example, you expect certain sensations. When I'm getting too hot, I'm going to start getting a lot of unexpected sensory information telling me I'm going out of that zone right now. And so that sensation tells me something about the environment. I don't know what the environment's state actually is, but I'm inferring from these sensations that this is not a state I expect to be in. My model is telling me right now, based on my evolutionary history, that I cannot survive these states; I'm feeling pain. And the same goes for the lower end of that temperature spectrum. So you're getting data samples from the environment that are inconsistent with what the model is expecting. So this is a way of specifying,
from a cybernetic perspective, set points. What set points do you expect from your environment? And are you deviating from them? Well, if you are, you should take some corrective action to fix that. And so action comes into play here, where you take an action in order to bring your sensations back into equilibrium, back toward the average of what you expect to be receiving from the environment. But the other issue that now emerges is that computing that quantity, surprise, is usually an intractable problem. It's hard to do in most situations. So the proposal is that instead of directly computing surprise, which would bound the types of sensory information being received, we instead minimize a proxy, known as variational free energy, that the brain can compute. And this applies to brains, but other kinds of dynamical systems may also be computing this quantity inherently
in their behavior. And the reason we can make that comparison is that the brain itself is a dynamical system. There are variables in the brain that change over time. And according to the free energy principle, the activity or behavior of those neurons will change in a way that minimizes variational free energy until you reach a steady state or equilibrium. And so whenever these neurons deviate, on average they're always going to move such that their dynamics reach a steady state on the gradient flow of variational free energy. So this happens over time. And running this backward: because you minimize variational free energy over time, you minimize the surprise about the sensations you're receiving. So everything you're receiving becomes expected. And as a result of that, and it's been shown, there's a proof for this, this also bounds the external states of the environment that you would like to be in. And so you stay in that ergodic density, which means you keep revisiting states in what is sometimes called an attracting set. So from the dynamics perspective, you have this pullback attractor: you keep getting pulled back in because of your behavior and the dynamics, which keep you restricted to the set of states that are preferred by the agent's self model. And so the free energy principle is this whole description of the system that performs these actions and behaviors in order to stay in its preferred states.
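To make those quantities concrete, here is a rough summary in fairly standard notation (a sketch of the usual formulation rather than a quote from the book): m is the agent's model, o its sensory data, s the hidden states, and q an approximate posterior the agent can actually manipulate.

```latex
\begin{aligned}
\Im(o) &= -\ln p(o \mid m)
  && \text{surprise: the negative log probability of the data} \\
F[q, o] &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s \mid m)\big]
  && \text{variational free energy: the computable proxy} \\
        &= \Im(o) + D_{\mathrm{KL}}\big[q(s)\,\Vert\,p(s \mid o, m)\big] \;\ge\; \Im(o)
  && \text{so pushing } F \text{ down also pushes down a bound on surprise.}
\end{aligned}
```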
Yes, and here's what's interesting. So you're saying there's this dynamical system and you get the slow emergence of things and agents. And then you get a kind of homeostasis which starts to take effect, where the thing continues to exist because of mutual constraints of dynamics and so on. This is all really interesting, but we were discussing a little bit earlier that the free energy principle talks about what happens given a partition of a system. So the world we live in has this really interesting property: it's this huge dynamical system, and over time, with all of these micro-interactions, we see the slow emergence of this partitioning. It partitions into things and agents. And as you were just describing it, you were using interesting language, you were using agential language, you're saying the thing wants to continue to exist. Now, the thing is, all of these weird interactions cause the emergence of these things. And then it's almost like the free energy principle suddenly... is it like a dimmer switch, where the free energy principle suddenly starts to come into play? I mean, how did the boundaries actually happen in the first place? So I think there's one perspective on this that's really useful to consider. I like to
talk about Terrence Deacon's work on the origins of life. So this is just to give you a really simple example of how self-organization can make a system that persists over time, without reference to the free energy principle per se, because this is kind of a separate development. But then I'll connect it back to the free energy principle. So imagine a system of compounds that form an autocatalytic loop. You have four compounds, we'll call them A, B, C, and D. And compound A can be transformed into compound B if it's in proximity with some kind of enzyme or something that will help to transform one compound into another. And you have this chain where A turns into B, B turns into C, C turns into D, and then D turns back into A. So as long as everything stays in proximity, it will continue to just cycle around like this.
And so then you add other elements into this model too. You need an enclosure to make sure that those elements stay close together. So you have a positive feedback loop here. But you also have a situation where that wall depends on some of the compounds in that loop. That wall could be formed of, you know, a phospholipid bilayer in the case of a cell, or something else that would just keep those compounds in close proximity. Now, the raw materials that are needed to make both the wall and those autocatalytic compounds exist outside of that boundary. So if the wall starts to disintegrate over time, the raw materials come back in and replenish the wall, and the system, through negative feedback, now persists over time. So in this particular kind of scenario,
there's no magic going on here. It's simply that, because of the interaction of these compounds in this specific coordination setup, you end up getting this autocatalytic loop persisting over time. So we could then take the perspective of the free energy principle and say, okay, I see a system here that is persisting over time, and it has this boundary, which happens to be physical in this case, whereas in the free energy principle it's a statistical boundary. But we see that it persists over time, and that boundary has formed merely because of the proximity of certain compounds and the interaction of negative and positive feedback in this system. We could then ask, from the perspective of the free energy principle, taking that boundary as now given, as formed: what is the behavior of that system doing? It's maintaining its boundary. And I'm using anthropomorphic language; sometimes it's hard not to talk about it that way. But this is just, you know, some compounds in a primordial soup. There isn't any agency per se involved here, at this level, in the way that we would think about it with human meanings, for example. But the idea is that you can take a simple example like this, see the emergence of these kinds of boundaries separating inside and outside, and then apply the free energy principle to describe how the system is revisiting the same set of states over and over again. And it's doing that in such a way that its external boundaries are being maintained, such that it maintains its physical form over time.
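As a purely illustrative toy (my own sketch, not from Deacon or the book, and with made-up rates and thresholds), the self-maintaining loop plus wall can be simulated in a few lines:

```python
# Toy sketch of an autocatalytic loop (A -> B -> C -> D -> A) enclosed by a wall
# that decays over time and is rebuilt by consuming one of the loop's own products.
# All numbers are invented for illustration.

k = 0.2              # conversion rate around the loop
decay = 0.02         # how fast the wall degrades each step
repair_rate = 0.1    # how much of D can be spent on repair each step
influx = 0.05        # raw material entering while the wall is intact

amounts = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
wall = 1.0           # wall integrity in [0, 1]

for step in range(500):
    # autocatalytic cycle: each compound converts into the next one
    flows = {x: k * amounts[x] for x in "ABCD"}
    amounts["A"] += flows["D"] - flows["A"] + influx * wall
    amounts["B"] += flows["A"] - flows["B"]
    amounts["C"] += flows["B"] - flows["C"]
    amounts["D"] += flows["C"] - flows["D"]

    # negative feedback: the wall decays, and is repaired by consuming some of D
    wall = max(wall - decay, 0.0)
    repair = min(repair_rate * amounts["D"], 1.0 - wall)
    amounts["D"] -= repair
    wall = min(wall + repair, 1.0)

print(f"wall integrity after 500 steps: {wall:.2f}")
print({x: round(v, 2) for x, v in amounts.items()})
```

If you run it, the wall should stay close to intact while the compounds keep cycling (and slowly accumulating, since this toy has an influx but no outflow), which is the sense in which the system keeps revisiting the same small set of configurations.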
So then we can describe that in the language of the free energy principle. And this is the weird thing about these emergent hierarchies, because you get this canalization where you have a divergent set of physical processes at the micro scale, and then they kind of converge to some consistent macroscopic process. And your example was great. I can think of many low-level processes and feedback loops which give rise, you know, to the emergence of physical objects. I mean, even Conway's Game of Life is a great example. Yes. Just a trivial set of rules in a 2D grid world, and through successive, you know, local interactions, you see the emergence of these macroscopic objects with sophisticated behaviors. And maybe it's almost like a fact of the universe that I can just put a sugar gradient in some water, and of course we'll see, you know, a whole bunch of little microorganisms swim towards the sugar gradient. And you see this thing that emerges. And that's a completely different physical process to the one you just described. But the amazing thing is that we can now turn on the free energy principle, and almost regardless of the underlying physical process or even the scale, it's like we see the convergence of this model that works in all situations. Yeah, this actually reminds me of what Stuart Kauffman sometimes calls order for free.
And the way that I like to talk about it is that you have a really big space of possible things that can happen. But as certain events just happen to coincide, you know, things move into proximity with one another, as certain interactions begin to happen, that space becomes progressively restricted in the kinds of behaviors it can produce. And so this is also sometimes referred to as the slaving principle, where you have certain high-level processes which begin to enslave or constrain lower-level ones, on a spatiotemporal scale, in terms of what could happen. And so even though, you know, it seems like this order is sort of emerging out of just the movement of things, it's really because certain interactions constrain the possibilities of what can happen. And you get to a point where it's almost certain what the next reaction is going to be, and that, to us as observers on the outside, looks like a persisting pattern. And so it is really interesting, then, if you try to write down the dynamics. The question is, what if we were to describe the system in terms of differential equations? What are the conditions under which these types of behaviors would emerge, giving rise to this order, where you start to see this separation between a system and its surroundings? What does that look like? And that's kind of where you start seeing the emergence of what we call the free energy principle. It's just a description of the behavior of systems for which these types of things emerge and how they would come about.
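In the free energy principle literature that setup is usually written roughly as follows (a sketch of the standard presentation, not a quote from this conversation or the book):

```latex
\dot{x} = f(x) + \omega, \qquad
x = (\underbrace{\eta}_{\text{external}},\ \underbrace{s}_{\text{sensory}},\ \underbrace{a}_{\text{active}},\ \underbrace{\mu}_{\text{internal}})
```

That is, a random dynamical system with noise ω, where the interesting cases are the ones that possess a nonequilibrium steady-state density over their states; the claim is then, roughly, that the internal and active states of such a system will look, on average, as if they are descending a gradient of variational free energy.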
And is it correct to use agential language? There's always this question of, you know, is it real? Is our language model really reasoning? Is this thing really an agent? Because it certainly behaves as if it's an agent. And as we've just described, well, actually, there are all of these low-level physical processes that give rise to the emergence of this thing, and now we can describe it as if it has goals and wants and desires and so on. But at some point, is there a blurring of that distinction? I mean, do you think of it as an as-if property? Or do you think it actually is an agent? I think this is a question where, you know, philosophers will have lots of different viewpoints and opinions. From my perspective, I think that the language we use should be one of utility. So, you know, what does it give us to call something an agent, to define a word in a certain way?
When you look at the history of physics, or any scientific field which builds models, you always have a distinction between a model and the target system that it's representing. So usually, you know, in philosophy of science, you talk about the idea that models are not the same thing as what they're trying to describe, but they are at some level of description. Because you can also describe the same system with different types of models that still represent that system but are not the same as, or isomorphic to, that system. So from my perspective, I think that science produces models which describe reality in the sense that if we predict that reality is going to behave a particular way under this model, and it does, then for practical purposes we can
say that they are the same. Now, the question is, is it really that way? Well, it's the same with our minds and our brains. Our brains are also representing reality in a particular way. And we do know that, you know, our eyes, for example, don't see everything that's out there. There are parts of the spectrum that we can't see, things like infrared and others that we, you know, don't perceive. But as the agents that we are, we can still navigate and do things in our environment such that, for practical purposes, this is our reality. What is it in itself? I don't know, right? And so I think this is kind of an epistemic versus ontological question. When I look at active inference models, I see myself as saying, well, there's probably a hierarchy
of agency. And so at a minimal level, you can say this thing is behaving as if it's an agent, or I could just say that there's a spectrum of types of agents. This is a very simple one that's easy to study, and it's fairly uninteresting. And the type of agency that we're usually talking about is layers and layers and layers of other interacting parts above this. And that's a more interesting agent. When we colloquially use the word agent, we usually mean something that behaves in a particular way, that's controlling its environment. But that low-level system of compounds I described, is it controlling its environment? It depends. I mean, you could model it that way, and if you model it that way, as a control system, I would say that it's an agent. But I feel like it's a choice. It's up to us, and it's whatever helps us make predictions. What it really is in itself, I don't know. I think that's an interesting philosophical question, but it's a separate issue from when we model things. Because I can just use my model and it does a certain thing. Does that answer your question? Yeah, I mean, this is such a common discussion now in AI parlance, especially with large language models.
They clearly don't reason the way that we do. Yeah. And I'm a bit of a cognitivist, you know, like Gary Marcus; I've been pointing out, well, clearly they don't reason the way we do, and we've given computational arguments for this. Even more clearly, they don't have agency like we do, and perhaps not creativity. But the thing is, so many people are just saying, look, Tim, what are you talking about? You know, Claude 3.5 is bloody amazing. And it acts as if it can reason, and as if perhaps it has some limited agency in certain circumstances. And at some point, you've just got to draw the line and say, well, you know, if something behaves as if it has something... We know that these models don't feel the wind on their face. They don't know the taste of an apple. They can't interact with the world the way that we do. But having learned this compressed representation of language, and importantly being cognitively embedded in our ecosystem, they behave as if they do. So I can really see a future, and it's starting to kind of creep up on me now, where we do just start to think of these models as
having those cognitive properties. Well, I'll back up and say that there's often a conflation, I think, especially in neuroscience, between two questions: are we literally saying the neurons and all these parts are doing these mathematical things, or are we describing it as if, you know, it's a model, and that model is helping us understand these behaviors from a high level? Versus, we've just made an LLM and it does these things. And I think sometimes it gets conflated because, historically, the cognitive sciences and machine learning have had a very close kinship with one another. And because we're now in this era where everyone's interested in neuroscience and talking about the brain, it's very tempting to then make these analogical comparisons. Because we have a benchmark or measure of what counts as intelligence, and we want to know how good our models are: are they, you know, like us or not like us? But to some degree, I feel like those are different discussions. There's this biological plausibility angle, and there's also: my model does these amazing things, and this is what it's doing. An LLM can be interesting and amazing independent of the fact that it's not biologically plausible or doesn't do things the same way the human brain does. And so, first of all, I think we don't have the right vocabulary. I think we're missing the language to talk about these things, because what does intelligence mean? What does
sentience mean? What does agency mean? These are questions that are constantly being refined as we have better and better models. I do think the math helps us refine and talk about models in the same way. We can say, well, an agent is when, and then you give an equation, it does this thing; that's what I define as an agent. So it's not as vague as the words we use, which are, you know, more value-laden than mathematical concepts are, at least. But I think where it really starts to matter, where I think it's going to be really important, is in the legal realm, because that's where you have to start being careful. Because, you know, if you go on the internet and do a little search, you'll see people saying things like, ChatGPT and Claude and all these models should be treated as first-class citizens, they're real people, and we're actually, you know, coercing them and enslaving them and so on. I think that's obviously a fringe view, but without a good definition of what makes an agent, and of the things that matter for legal language, there are big real consequences for how we define things, and it can go in either direction. We may inadvertently treat something as being actually like us when it's not, or we may miss it when it actually is. And I think that's where the boundary becomes really important to specify clearly. But unfortunately, I don't think we have the right language yet to be able to make those distinctions. I think it's fun to debate it and talk about it and, you know, pursue that line of knowledge. But in some sense, I'm more interested in what a system does, rather than benchmarking it against, you know,
is it like a human? Well, can it beat a human at certain things? Maybe it's a different kind of intelligence. Maybe there's more than one way to be intelligent. There are lots of other things that come up for me as questions when I hear that kind of statement. Yeah, it's interesting. I mean, first of all, I know terminology gets overloaded, so when we say things like creativity, reasoning, agency and so on, they mean different things to different people. But I still think it's a good yardstick for distinguishing different types of cognitive capabilities, because a lot of folks just say capabilities. They use it as a blanket word, just capabilities, you know, as if that's some kind of spectrum that we could easily measure. But more broadly, we're talking about functionalism, which is the idea that there exists an abstract representation of cognition that could be implemented in a different substrate. You know, because clearly we could just do a simulation of the world and, you know, build some biologically plausible, mimetic intelligence, and it would capture all of the stuff that we have. But we were just talking about this canalization. So even in the physical world, there is abstraction. That's presumably why the free energy principle works. So, you know, you have all of these diverse physical processes, and that gives rise to some kind of abstract functioning which produces intelligence-like behavior in the physical world. And surely that implies that we could build some kind of abstract, you know, analogically plausible version in the world of computers.
Right. So there presumably exists some abstract form of intelligence that runs on computers that we could implement in our lifetimes. And that seems pretty interesting. Yeah. I mean, that's where I hope active inference is going, to the point where we're able to actually build something like that. So you're saying that the distinctions and the questions you're asking here, defining not just capabilities but also these sorts of gradations, are important for benchmarking where we are in the progress toward that goal. Well, yeah, I mean, there must be a spectrum of fidelity. So the reductio ad absurdum is we simulate the entire universe. And then we go up the spectrum with gradations of increasing abstraction, and there's some commensurate loss of fidelity, perhaps of intelligence. But there must be a Goldilocks zone where we have a system which is not too complex, yet still quite abstract, runs on computers, and is very useful.
But you're also speaking to this other philosophical problem, which is, and we already see this with information technology in general, that when we move into the information realm, our laws don't work anymore. Right. So for example, look at Uber: when you have a technology that doesn't respect territorial physical reality, it doesn't have any friction with regulations anymore, and it can do anything. And it's the same with ethics: when you have these AI chatbots that behave as if they are humans, but the laws don't apply to them, we're in ethical no man's land. Yeah. I think we're at a stage now where, well, I'm hoping the mathematical language will help to bridge that gap, because it makes it easier to talk about these things. We don't have the right words to be able to benchmark either: what does this spectrum look like, what does that whole space look like? But then also, in the legal realm, our laws don't move fast enough to catch up, and we're still talking and debating about these ethical questions before we have a formal framework. So that makes it a big challenge. And I think that's going to be the case for probably the rest of the century, as technology evolves faster and faster. There's going to be a lot of debate around a lot of these types of terms as we go forward. Cool. And just before we move off the free energy principle, this is also, I guess, an ontology question: is there a best partitioning? As I understand it, at the moment the free energy principle doesn't tell us how the partitioning happens. Is there a best partition, or is it a probabilistic thing?
How could we write algorithms to figure out the partitioning? This is an active area of research in Bayesian mechanics right now. And I think the closest answer to your question at the moment, the most promising answer, is about weak and strong Markov blankets. There's kind of a spectrum. Basically, there is this idea of sparse coupling. So you have these two systems and they have different statistical properties from one another. And the idea is that they are separated by a Markov blanket, where one system is conditionally independent from the other system when conditioned on that Markov blanket: the paths it takes in state space, how it evolves over time, will be conditionally independent of the other system. And if that boundary dissolves, the two systems essentially mix, they have the same statistical properties, and they cannot be distinguished from one another. So the question is, that interface in the middle, how strong or weak does it need to be in order for the free energy principle to apply? Of all the classes of systems that exist, which kinds does the free energy principle actually apply to? You know, can you have a system that's sort of half mixed and it still applies, or does it have to be weakly mixing, and to what extent? And that's kind of where this idea of weak Markov blankets comes into play. It's the notion in Bayesian mechanics that there's a way to quantify the degree of coupling between the systems under which the free energy principle would actually apply. So this is a new area of research that I think is going to become really important for
defining the free energy principle in more detail. What is Bayesian mechanics? So Bayesian mechanics has sort of evolved out of earlier formulations of the free energy principle, and it's at this point a burgeoning field in its own right. The idea behind Bayesian mechanics is to apply the free energy principle to the study of dynamical systems. In particular, the reason we use the term Bayesian mechanics is to think of it as a new or different branch of physics. So, you know, we have classical mechanics and quantum mechanics. Bayesian mechanics takes the ideas of Bayesian inference and combines them with the statistical descriptions that you see in thermodynamics, specifically of non-equilibrium thermodynamic systems. It brings them together, and information theory is a big part of this as well. But it's a way of describing the kinds of systems we've already been talking about, systems that are partitioned like this, where you have two independent systems with different properties, and then asking: well, if they are maintaining this separation and staying statistically independent from one another, what can we say about their behavior? What kinds of systems does the free energy principle apply to, and what is the behavior of these systems? What are the conditions that need to be in place for those systems to exist? And essentially, you get this idea, going back to our as-if conversation, that if that system is being maintained over time, it behaves as if it's forming Bayesian beliefs about the external environment that it's embedded in or interacting with. So Bayesian mechanics encompasses all of those ideas and themes, and it makes the claim that these types of systems are minimizing variational free energy. That is the mechanism, mathematically speaking, by which this unfolds. There are a lot of other, deeper connections to physics, for example to Lagrangian mechanics and to descriptions of these systems from a physics perspective, that come into play here. But broadly speaking, Bayesian mechanics is this new, developing field which is taking these ideas and trying to apply them as a way to formalize what we mean when we talk about what complexity is, what a complex system is, what self-organization is. These questions may have a formal answer. Up to this point, there have been a lot of philosophical answers for what those things are, but no formal mathematical description. And when it so happens that the system we are talking about is a brain, or animal and human behavior, that's where active inference falls out of this. It's one type of dynamical system we might want to be considering, a
very special case of this broader field. Is Bayesian mechanics a field which is only associated with the free energy principle? I mean, you know, for example, physicists learn statistical mechanics. What's the relationship between that and other similar, adjacent fields? So Bayesian mechanics as a term, I believe, first appears in a 2018 perspective paper by Friston called Am I Self-Conscious? And I believe that's the first time the term appears, but it also appears in Friston's 2019 monograph as well. And so I would say that it's a proposal; it's not as if other physicists would know this field exists yet, but it's a proposal for a field which takes these ideas and tries to use physics to describe living systems. And this comes back to, as it's quoted famously in the active inference field and now in Bayesian mechanics, Erwin Schrödinger's question of what is life, in his 1944 book, I think, where he asks whether living systems can be described in terms of physics. And the answer is, yes, they can; Bayesian mechanics is an answer to that, but you have to borrow language from information theory and statistics, with a bit of machine learning in there as well. And so it's more of a proposal for a future field that would stand alongside the others as another type of physics. Where quantum mechanics describes small things, and classical mechanics describes the behavior and paths of motion of bigger things, you have something like Bayesian mechanics describing the statistical beliefs of persistent systems, of things, thingness, things that exist, combinations of particles that persist over time. So it stands in relation to those other fields, but it's not
a field that physicists would know. But the hope with my books is that that's what's going to happen: we're going to have a convergence of these fields coming together, with active inference and Bayesian mechanics breaking out of their niche into a much broader spectrum of scientific inquiry. Yeah, and just on that, can you give me some examples of how it might be used in a completely different field? Yeah, sure. So Bayesian mechanics is about modeling systems that have this kind of complex behavior. So we can model, for example, living systems, like parts of the brain, but if we go beyond that, I think there's hope that we will be able to model things like weather systems, being able to characterize them mathematically, being able to characterize chaotic systems like the brain, things like epilepsy networks, earthquake prediction, anything where you have this sort of many-body problem, or, you know, chaotic systems where you want to track behavior over time or predict what's going to happen. It could be turbulence, it could be stock market prediction, it could be things in the realm of the behavior of systems of people, so social systems or ecosystems, or other kinds of behaviors. I'm sure there are many areas of physics that study these kinds of problems, but they're hard to predict. And again, active inference and Bayesian mechanics are ways of looking at these kinds of systems and describing them; they give us a way to talk about what those systems are doing and how they're behaving. So I think it's going to be some time before it reaches those levels of application, but eventually that's where the field will go, enabling us to design better technology based on being able to track the behavior of these systems as they evolve. Very cool. So we're going to go into a little bit more technical detail now and start going through some of the framework. Why don't we start with the modeling framework? So,
the description of agent-environment interaction. Let's start there. This is a big emphasis in my book. I want readers to start with an understanding of the actual modeling process, rather than just jumping into all the terms and theories. At the heart of it, active inference is a way of modeling an agent and an environment interacting. So in order to do that, you need some description of the environment. If you have an agent that's modeling its environment, you need a way, in this framework, of saying what that environment is and how it behaves. In some cases, we may just have data; we've collected it from some real-world process and we're just seeing how the agent reacts to it. In other cases, we may want to simulate it ourselves. And so the first idea is to take the environment and say that it's a generative process, from a statistical standpoint. A generative process means that we have an environment that exists in a particular set of states and transitions between them. So for example, in a very simplified scenario, maybe it's something like weather: it changes from snowy to hot to cold to wet, different kinds of changes that are happening. And when that environment is in those states, this generative process emits some kind of sensory data associated with them. So if it's rainy outside, the sensory data that results is that it's wet. So those processes are connected to one another. It's a mapping: when you're in this state, here's the outcome that follows from being in that state. And then on the other hand, we're interested in systems like agents that do not know what the true state is. And this is part of that boundary between an agent and the environment. It doesn't know what's actually outside of its head. All it knows is the sensation it's receiving. So the setup is that we have some process, statistically, that is generating this information or sensory data, and we have an agent observing that data. And it's the agent's job, with a generative model, to generate predictions about the kinds of sensory data it expects to receive from the environment and use that information to infer the state that it doesn't know. So: I'm seeing that it's wet right now. Does that mean that it's hot? Or does that mean that it's rainy? Maybe it could be rainy or snowy, for example. So this is a way, under uncertainty, for an agent to model its environment statistically and develop a probability distribution over the possible states of the environment that could have generated the sensory observation it's now receiving.
So when you put that framework together, you have the generative process and the generative model. You have the process of perception, which is what I just described. But then you also have the idea of action. So if the agent wants to control the environment to change the kind of sensory data it receives in the future, what do we need to specify for that relationship to work? And in a code setting, you would literally specify: here's a generative model, and in a loop, in a very simple simulation, the environment generates a state and an observation, the agent takes it in, does some computation, determines what the state is, and then outputs an action, which loops back to the environment again, the generative process, and you would just run that simulation in a loop. In the real world, it would be like a robot that interacts with its environment. And in that case, the agent would model the generative process, but the actual connection to the real world would be a physical one: the agent's actuators would actually, you know, grasp or move forward, interact with the world in some way to control the environment around it. So these agents have boundaries, we were talking about that earlier, Markov blankets in fact, and they can be decomposed into different
types of states. Tell us about that. Yeah. So this is implicit in the description but not specifically stated, and that's because I actually leave out the discussion of Markov blankets in this book, since it keeps things simpler without introducing too much complexity, and I'll talk about it a lot more in the Bayesian mechanics book. But implicit in here is this idea. I mentioned that the environment exists in these states. I also said that these generate observations or sensory data, and the agent then receives that sensory data and is able to make its inference. However, what I left out was that both those sensations and the actions the agent can take exist in what's called the Markov blanket. So this is what the partitioning means. In a kind of Bayesian network sense, you have four different types of states. You have the external states, which are part of the environment. You have the sensory states, which the agent can receive. You have the agent's internal states, which it uses to represent its environment. And then you have the active states, which are the agent's actuators that apply control to the environment. The sensory and active states together compose that Markov blanket and make that partition, keeping the internal and external states conditionally independent from one another, and therefore having different statistical properties. So I didn't explicitly call out those states, but they're actually in there, in the description that I gave of this framework.
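Here's a minimal sketch of that kind of perception-action loop in code (my own illustration, not code from the book; the weather states, probabilities, and the trivial action rule are all invented):

```python
import random

STATES = ["rainy", "sunny"]

# Generative process: the "real" environment, hidden from the agent (external states).
def environment_step(state, action):
    """Toy external dynamics; in this sketch the action doesn't actually change anything."""
    weights = [0.7, 0.3] if state == "rainy" else [0.3, 0.7]
    return random.choices(STATES, weights=weights)[0]

def emit_observation(state):
    """Sensory states: the environment emits data associated with its current state."""
    p_wet = 0.9 if state == "rainy" else 0.1
    return "wet" if random.random() < p_wet else "dry"

# Generative model: the agent's side of the boundary.
def infer(observation, prior):
    """Internal states: update beliefs about the hidden state from an observation."""
    likelihood = {"rainy": 0.9 if observation == "wet" else 0.1,
                  "sunny": 0.1 if observation == "wet" else 0.9}
    unnormalized = {s: likelihood[s] * prior[s] for s in STATES}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

def act(beliefs):
    """Active states: choose an action based on current beliefs."""
    return "open_umbrella" if beliefs["rainy"] > 0.5 else "do_nothing"

# Perception-action loop: internal and external states only ever interact
# through the blanket (sensory and active) states.
state = "rainy"
beliefs = {"rainy": 0.5, "sunny": 0.5}
for t in range(10):
    observation = emit_observation(state)       # external -> sensory
    beliefs = infer(observation, beliefs)       # sensory -> internal
    action = act(beliefs)                       # internal -> active
    state = environment_step(state, action)     # active -> external
    print(t, observation, round(beliefs["rainy"], 2), action)
```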
Can you tell us about the statistical formulation? Sure. So the statistical formulation involved in building active inference models starts from the basic principles of Bayesian inference, specifically Bayes' theorem. At its core, the actual mechanics of it is very straightforward. I've used this term generative model a lot now, so I'm going to spend a moment to actually describe what I mean by generative model, in terms of the mathematics of it, and show how we can, in description alone, use this for a perception problem. So you have an agent now, where this agent could quite literally be written in about, you know, 10 lines of code. It's a very simple agent.
It has two important components that we need to describe it. It has a prior belief about the states of the environment, prior assumptions, you could say, about which states of the environment are more likely than others. So you might have an agent that has seen over time that it rains very often. I said rainy was one of the environment states, and so that would have a higher probability, for example, in its prior beliefs. And so this is literally a probability distribution that encodes, before seeing any data, what the agent believes is most likely, in terms of what it's going to perceive. And then the other component that you need for the generative model is a likelihood function. The likelihood function gives the probability of some observation it's going to receive, given a state. So it's a way of specifying: well, if it were rainy, or let's just make this binary, say rainy or not rainy, what would we expect from the environment? Would we expect it to be more likely to be wet or not wet? This is a very simple example, because we know when it's rainy there's a very high probability that it's going to be wet, and the reverse is true when it's not rainy. And so that's a very simple encapsulation of the probabilistic relationship between the states of the environment and the actual observations the agent's going to receive. When you put those two components together, the prior and the likelihood, you end up with a joint distribution over states and observations. And that is the generative model. So notationally, it'd be P of, let's say,
observations o, comma, states s, so P(o, s), and that is your generative model. There's a lot of subtlety in here. One reason we can call it generative is that the agent could then predict: well, if I think it's going to be rainy right now, here's the probability distribution over the observations I should be receiving. That's going to be really important, because it defines an expectation about the kinds of sensations the agent expects to receive, encoded in its model. And so when you are actually performing Bayesian inference, you're asking the inverse question. You're saying, I have this model now about how states and observations are related, and now I'm actually getting an observation from the environment. Well, my likelihood tells me the probability of that observation given a state. But what if I ran that backward and asked, and I'm not going to use the word probability here because it's actually not a normalized probability distribution, what's the, let's say, credibility of different states, which states would be more likely than others, relatively speaking, given what I've now observed? So it's like running your likelihood model in reverse while also taking into account your prior beliefs about which states are most likely. The result of this is an unnormalized probability distribution.
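Written out numerically for the rainy/wet example (the probabilities here are invented for illustration; the structure is just Bayes' theorem):

```python
# Generative model P(o, s) = P(o | s) * P(s) for the toy weather example.
prior = {"rainy": 0.3, "not_rainy": 0.7}                 # P(s): prior beliefs over states
likelihood = {                                           # P(o | s): likelihood function
    "rainy":     {"wet": 0.9, "dry": 0.1},
    "not_rainy": {"wet": 0.2, "dry": 0.8},
}

observation = "wet"

# Run the likelihood "in reverse", weighted by the prior: the unnormalized posterior.
unnormalized = {s: likelihood[s][observation] * prior[s] for s in prior}

# Normalize by the model evidence P(o) to get the posterior P(s | o).
evidence = sum(unnormalized.values())                    # P(o) = sum_s P(o | s) P(s)
posterior = {s: v / evidence for s, v in unnormalized.items()}

print(posterior)   # roughly {'rainy': 0.66, 'not_rainy': 0.34}
```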
We sometimes call it the unnormalized posterior. And normalizing it gives you the posterior estimate, which is the essence of Bayesian inference: the posterior tells you the probability of a certain state, given that you've just observed something. This is really important, because that's the essence of what the agent's internal states are representing in this modeling framework. We have an environment, or generative process, that is given and has generated an observation. The agent takes in that observation and, using its generative model, which as I just described is composed of a likelihood and a prior, inverts that model and normalizes it. The result is a probability distribution, its internal representation of beliefs about which state is most likely given an observation. With that alone, we've just done a perception problem with the simplest possible generative model we can imagine. The agent perceives that it is rainy because it has just observed that it's wet, for example. So that is the basic statistical framework. And if it seems really simple, it's because this is a very simple example. But the idea is we can add a lot more elements, build upon this very easily, and make it much more complex from here. And the next stage would be to talk about how variational inference and free energy play
into the situation. So where does variational inference come into play? Right. So the scenario I just described is exact Bayesian inference. It's the kind of thing you could do by hand for a simple problem, or run on a computer really easily for a very simple scenario like that. However, in most real-world scenarios that we care about, estimating that posterior distribution, where we have some unobserved state and we want to know the belief or probability distribution over that state given what we just observed, possibly dynamically at every time step we're going through, quickly becomes an intractable computation. The dimensionality of the things we're dealing with increases, especially when we have multiple variables in our model, and it's just not practical to actually compute it. So as in much of machine learning, when solving intractable problems the key move is to turn it into an optimization problem. And so it's at this point that we say the posterior distribution we'd like to estimate is something we don't know, but could we propose something that approximates it? An approximate distribution which represents our best guess, where we make assumptions, say, that everything is Gaussian in the continuous case or categorical in the discrete case. Can we then figure out a loss function of some kind, so that if we tweak those parameters to the point where we reach the minimum of that loss function, we've approximated the posterior? So this is an interesting question, and there's some technical detail in how we derive this. But the short story is that what we end up getting is this loss function known as variational free energy. Technically speaking, I should say that variational free energy is a loss functional, because its inputs are themselves functions. And the idea is that in order to compute variational free energy, you need a generative model, which takes as input the observation the agent has just received. We already have that, because when we define the agent, we usually define its generative model.
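In the usual notation (again a sketch rather than a quote from the book), with generative model p(o, s) and approximate posterior q(s), the functional is:

```latex
F[q] \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
      \;=\; \underbrace{D_{\mathrm{KL}}\big[q(s)\,\Vert\,p(s)\big]}_{\text{complexity}}
      \;-\; \underbrace{\mathbb{E}_{q(s)}\big[\ln p(o \mid s)\big]}_{\text{accuracy}}
```

It takes the generative model and the observation o as fixed inputs, and is minimized by adjusting q, for example its mean and variance in the Gaussian, fixed-form case described next.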
In more complex scenarios, the agent could learn its own generative model as well, but we're keeping it simple here. The other thing you need is that distribution I mentioned, which is my best guess about the posterior. Specifically, in some forms of variational inference, fixed-form variational inference, we assume a parametric form and we just want to find the parameters of that distribution. So then we have this input to the variational free energy functional, which is a mean and a variance, and we have the generative model, and we do this computation: if we minimize variational free energy by tweaking those input parameters, the resulting parameters we arrive at are the parameters of the posterior, or an approximation to it. So just to reiterate, this procedure is necessary because we're in a situation where high dimensionality precludes us from computing this exactly, so we approximate it instead. And the really fascinating thing about variational free energy is that it applies to all unobserved variables in our model. So far, we didn't know what the states of the environment are; those had to be inferred. But suppose there are also parameters in our model. These would be, for the case of a continuous model, a mean and a covariance matrix, or for the case of a categorical distribution, a
probability vector. If those are unknown quantities that we haven't supplied to the agent, then as far as variational free energy minimization is concerned, that's just another thing that we want to infer. So alongside states, we could also infer parameters. And in active inference, we take that a step further and we say actions are something else the agent doesn't know about. It doesn't know what actions to take. Can we infer what actions to take using the same technique
of minimizing variational free energy? So variational free energy becomes this sort of universal loss function: if there's any unknown quantity in the model that the agent does not know about, it can try to find the optimal distribution over that quantity by minimizing this functional with respect to that particular variable of interest. Yeah, that makes sense. So in this modeling framework, everything is a model. So there's an action model, and all you need to do is this variational inference. Let me just run this past you again with a couple of questions. So as I understand it, it's a surrogate model which has lower fidelity, which makes it statistically tractable. We represent that as an optimization problem, we solve that optimization problem, and then we use the statistical surrogate in situ instead of the original posterior, which we can't compute. Correct, yes. And when we, let's say we've got an active inference modeling framework, we've got a whole bunch of variables, some unobserved variables and so on. Do we, at every single time step, solve this optimization problem on a variable-by-variable basis, or are they linked together in some way? Okay, that's a great question. So this is getting more into the intricacies of the implementation. The idea, and this is something to make a little bit clearer, because I think this is a common confusion, is that when you're doing this minimization, it happens at a time point, with an observation you've just received. So if you imagine two time points, between them there are iterations of gradient descent that you would essentially run on this loss function, given the observation you've just received, for the scenario I'm talking about. But you're right that there may be a scenario where you have different things in the system. And so the question is, you have this sort of entanglement problem: you have different variables that depend on each other. So how do you get off the ground, so to speak, if you are trying to infer one thing, but that requires knowledge of another thing you don't know either? So there are a couple of approaches to this.
In the active inference literature, there are two main techniques used in continuous state-space active inference models for doing this. One is known as generalized filtering; the other is dynamic expectation maximization, which we'll call DEM. And they treat the problem in slightly different ways. In DEM, the variables that you don't know and want to infer are treated separately at different time points. So you have this outer loop where you assume that the parameters change relatively slowly over time, and you accumulate evidence for them in the background while you are estimating the states: you freeze the parameters at a particular setting, then you estimate the states, then you go back to the parameters, and you loop through this over time. So that treats them as separable; you see them as independent components that you optimize separately over time. In generalized filtering, you optimize them simultaneously. So you have a trajectory of states evolving over time that you're tracking as the environment changes, and you're also, at the same time, learning those parameters, and they should all converge together. So obviously, at the beginning, if you don't know the parameters, your state estimation won't be very good; you'll be off. But then as the parameters get better and better, you start to converge on the states as well, alongside the parameters. So the more technical answer is that in different scenarios, one approach might be easier or better than the other.
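To make the optimization itself concrete, here is a deliberately tiny, fixed-form example: a single hidden state, a Gaussian prior and likelihood, and gradient descent on the mean and variance of q. All of the numbers are invented, and real generalized filtering or DEM implementations are considerably more involved; this only illustrates the basic move of minimizing the free energy functional.

```python
import math

# Assumed toy generative model: p(s) = N(mu0, s0^2), p(o | s) = N(s, so^2)
mu0, s0 = 0.0, 2.0      # prior mean and standard deviation over the hidden state
so = 1.0                # observation noise standard deviation
o = 3.0                 # the observation just received

# Fixed-form approximate posterior q(s) = N(m, v), parameterized by m and log v.
m, log_v = 0.0, 0.0

def free_energy(m, v):
    """F[q] = E_q[ln q(s)] - E_q[ln p(s)] - E_q[ln p(o|s)], in closed form for Gaussians."""
    e_log_q = -0.5 * math.log(2 * math.pi * v) - 0.5
    e_log_prior = -0.5 * math.log(2 * math.pi * s0**2) - (v + (m - mu0) ** 2) / (2 * s0**2)
    e_log_lik = -0.5 * math.log(2 * math.pi * so**2) - (v + (o - m) ** 2) / (2 * so**2)
    return e_log_q - e_log_prior - e_log_lik

lr = 0.05
for _ in range(5000):
    v = math.exp(log_v)
    dF_dm = (m - mu0) / s0**2 + (m - o) / so**2     # analytic gradient wrt the mean
    dF_dv = -0.5 / v + 0.5 / s0**2 + 0.5 / so**2    # analytic gradient wrt the variance
    m -= lr * dF_dm
    log_v -= lr * dF_dv * v                         # chain rule: dF/d(log v) = dF/dv * v

# For this linear-Gaussian model the minimum recovers the exact posterior.
precision = 1 / s0**2 + 1 / so**2
print("optimized:", round(m, 3), round(math.exp(log_v), 3))
print("exact    :", round((mu0 / s0**2 + o / so**2) / precision, 3), round(1 / precision, 3))
print("free energy at the optimum:", round(free_energy(m, math.exp(log_v)), 3))
```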
It sort of depends on the kind of modeling problem, the volatility of the environment, and a lot of other details that are specific to the modeling problem in question. So one thing that I see when a lot of people first encounter variational free energy is that they hear the words free energy, and to them the most familiar term is something they learned in physics. You learn about free energy in physics. Then they also look at the free energy principle or active inference and see the word entropy, and entropy is something else you learned in physics. And I think this causes a lot of confusion sometimes, depending on the field you're coming from, if you're hearing these terms used like this. It's an unfortunate collision that comes from information theory, a field developed by Claude Shannon in the late 1940s, building on a lot of prior work, to describe information and communication systems. And there is actually a connection between the concepts of entropy and free energy as they're used in physics and the concepts of entropy and free energy as they're used in information theory. E.T. Jaynes is one of the famous authors who was among the first to make these connections, and a big focus of the second book will be to make those connections more explicit. But for the purposes of this active inference book, the first book of the two, I think introducing all those other elements just makes it confusing. So when I use the term variational free energy, it specifically refers to a statistical quantity that comes from information theory, which is distinct and separate from Helmholtz free energy as you would use it in physics. So you can think of it as purely a loss function that is used when you want to make Bayesian inference a tractable problem. You propose this loss function, and that's sufficient to understand it in terms of active inference, without getting into all of the external terminology that moves into more of the Bayesian mechanics and free energy principle side of things. Let's talk about surprise, especially its relationship to some of the variational inference stuff that we were talking about. Sure. So surprise is another quantity that comes from
information theory. And it can be thought of as the negative log probability of some outcome of a random variable. Surprise as a term comes from Tribus, I think around 1960; it wasn't originally in Claude Shannon's original formulation of information theory. But the very basic idea of surprise is that it's a quantity that is very useful in characterizing the quality of a model, let's say. So sometimes in
statistics, usually you use the model evidence. For example, when you're comparing two models in a Bayesian model comparison procedure, you'll use model evidence as a way of saying this model is better than the other model. Model evidence is the probability of some data point, for example, of some observation: P of O. Surprise is the negative log of that. So if you think about the property of surprise, let's say a random variable, an observation, equals some
quantity, for example, wet. So there's the probability that it's wet. If you take the negative log of that, that means when the probability is very high, surprise will be low. And likewise, when the probability is low, surprise will be high. And there is an intuitive nature to this. I don't want to conflate the psychology of surprise with information theory too much. But roughly speaking, if you think about
a low probability event as being surprising, it's something unexpected. Then surprise is a way of characterizing how surprising some data I receive are under my model. So for example, if you have a really good model, and this is getting back to this idea of model comparison strategies, a model that's predicting its environment, then every single time a sensation comes in from the environment you can ask, how expected was it under my model? That's what P of O,
the probability of an observation tells you. If it lands at the mean, well, that's exactly what I expected from my model. So it's unsurprising to me. But if it deviates from it, that's kind of like sending up an alarm signal here saying something is up here. My model predicted something to be happening and I've deviated from that now. And in fact, that's actually a learning signal, which is why this is related to information theory. It's a way of compressing and paying attention to the things
that have high information content, meaning they have uncertainty in them. So what part of the structure of whatever you're modeling have you not captured yet? Well, if you've perfectly modeled your environment, then nothing is surprising, because you know exactly what it's going to do. Every sensation that comes in will be exactly at the average your model predicts. It's exactly what the model would say is the expectation of what sensory data you're going to receive.
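As a tiny, hypothetical illustration of the quantity being described (the observation model and numbers are made up):

    import numpy as np

    # hypothetical observation model: how probable my model thinks each sensation is
    p_o = {"wet": 0.9, "dry": 0.1}

    # surprise (surprisal) is the negative log probability of the outcome actually observed
    surprise = {o: -np.log(p) for o, p in p_o.items()}
    print(surprise)   # wet: ~0.11 nats (expected, unsurprising); dry: ~2.30 nats (unexpected, surprising)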
So surprise is also a really important quantity because it turns out that's really the only thing you need to minimize in order to do all the rest of the stuff we've talked about. But the question is then, how does this relate to variational free energy? I think this is another area where this gets kind of confusing. And so I'll make this connection really explicit. Surprise is usually really difficult to actually compute directly. It turns out variational
free energy has a dual role. On the one hand, it's used as a way of approximating the posterior. So when we minimize that loss function, we get back the parameters of the posterior at that minimum, which is an approximation to the true posterior. It turns out that variational free energy, when it's minimized, is also an approximation to surprise. So while we cannot compute surprise directly, if we minimize variational free energy, we approximately also minimize
surprise, because variational free energy is always going to be greater than or equal to surprise. So that's when we say it's an upper bound on surprise. That's what we mean. And this is a consequence of what's known as Jensen's inequality. You can prove this mathematically, and I do so in the book in chapter 4. So to put this all back together again here and make this link really clear,
we wanted to minimize variational free energy in our model. Because in doing so, we are able to estimate some unknown variables that are part of our posterior, say our states or parameters. But in so doing, we're also approximating surprise, which means that inherently, if we decompose variational free energy into different terms, we can rearrange them algebraically.
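One way to see that bound numerically, with a made-up two-state model (an illustrative sketch of the relationship, not code from the book):

    import numpy as np

    p_s = np.array([0.7, 0.3])            # prior over hidden states, p(s)
    p_o_given_s = np.array([0.8, 0.1])    # likelihood of the observed outcome under each state

    p_o = np.sum(p_o_given_s * p_s)       # model evidence p(o)
    surprise = -np.log(p_o)

    def free_energy(q):
        # variational free energy F = E_q[log q(s) - log p(o, s)] for a candidate posterior q(s)
        joint = p_o_given_s * p_s
        return np.sum(q * (np.log(q) - np.log(joint)))

    true_posterior = p_o_given_s * p_s / p_o
    for q in [np.array([0.5, 0.5]), np.array([0.95, 0.05]), true_posterior]:
        # F is always >= surprise (Jensen's inequality); they coincide when q is the true posterior
        print(round(free_energy(q), 4), ">=", round(surprise, 4))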
We see essentially that a model with minimized variational free energy also has minimized surprise, which also means that the model is more and more aligned with the actual external world itself, the environment that's generating that data. Simply because, every single time the environment generates data, if we have a really good model, surprise is low for any possible observation we get; that's exactly what
happens when we minimize variational free energy. So inherently, we are essentially aligning our internal model of the environment with the external world. And surprise is a way of measuring that discrepancy, as is variational free energy by proxy. So the further away we are from that surprise bound, the further we are away from minimizing that uncertainty. Let me play this back. I mean, what do we want to do more? Do we want to minimize
surprise or do we want to get the highest fidelity representation of the posterior? There are other ways you can minimize surprise as well, one of which is action. So closing that bound even further is possible through action, which I haven't really mentioned in this framework yet, because yes, it's true that you also want to perceive, and that's the utility of estimating that posterior. But you can also bring that bound closer, closing it even
further by taking certain actions. So if you imagine you're receiving surprising information right now. So that means that the sensory data you're getting in here right now is deviating from that mean of p of the observation, probability of an observation. You're getting further and further away from that mean. So the question at this point is, you may be at some free energy minimum, you can't go any further now. So now what do you do? You're not at the surprise bound, you're still getting
surprising information. Well, what if you change the environment itself? So you take an action, you act upon the environment and you alter it in such a way that it produces different sensations that are now in accordance with your model. And I think this is a key really, really important distinction about how active inference models work because we get to a point where you close that gap
by controlling the environment itself so that it produces the sensations you expect. And that's why I have been saying a couple of times throughout that action is a form of expectation as well as prediction. It's a way of saying I expect these sensations, which are self-consistent, to be received by my model in the future. These are the least surprising types of sensory data that I would expect under my generative model. My generative model makes a prediction, and I want that to conform with
reality. Well, I could take an action and try to close that gap. So where you talked about minimizing surprise, action is a really key component in that; it brings that gap lower by making the two conform with one another, the environment's sensory data and what we expect. Yeah, I mean, we don't like surprise. I was speaking with Mark Solms about this and he actually said that the more prediction errors we have, the more conscious we become, because we take
actions to reduce our surprise. But it's almost quite depressing, because perhaps we're just automata when we're in the modality of not being surprised. We don't actually take control and deviate. But my question wasn't very well formulated; in a way, surprise is just a measure. What we actually need to have is a reliable world model. And that's this posterior distribution. And the world model is a good one if it minimizes surprise. So it's not
like surprise in and of itself is the objective. It's just a side effect of having a good model. Right. And actually, when you compute surprise, it's a marginalization over the generative model. So it starts with that generative model, and when you sum out across all the states, surprise is the negative log of what you get from that. And so it's basically a measure of expectation about the sensory data that's inherently derived from your good model.
So obviously, if you have a bad model of the world, like if you found it, you know, surprising to stay in the sun and in really, really hot temperatures, or sorry, if you found it very surprising to be away from a fire, and then you went into the fire, for example, that's a terrible model, because then the physical integrity of your body would
be destroyed. Right. So it's relative to the model of the agent. And I think that's kind of the idea, that evolution has produced good self-models, or models of the world, which are imbued in us either through evolution or through learning. We learn about what things are supposed to be surprising for the kind of agent that we are. So that's a really important component to it too: it's based on the actual model we have,
presupposing that it's a good model given the world that we inhabit. It's a little bit like, you know, the planets reach an orbit. And it might be a similar thing with our world model. So we have certain patterns of behavior. And when I'm in London, my world model will adapt, because obviously I'm minimizing surprise. And when I'm back here, it will adapt. And that will almost make a kind of pattern, you know, a kind of changing pattern. And we could almost
describe that pattern as a non-steady-state equilibrium. So, you know, it's not like the world model is convergent in any way. But there is some kind of homeostasis to its behavior over time. Yeah. So, in the case you're describing here, you're describing a model of the environment, of the kinds of things you expect, but it's changing depending on where you are. So there's this element of, you know, our models are adaptable. So in certain situations and circumstances,
certain expectations are more likely. And so this sort of idea of homeostasis is very context dependent. So for variables that are like very, very primitive, things like, you know, blood sugar and things like that, you know, they have to be within an acceptable range. And that's the kind of the homeostasis idea is you expect them to be in this range. And so you take actions, eating food and so on so that you stay within this acceptable homeostatic range that is basically built into our
body physiology. When you start talking about more higher order things, we start having these very context dependent ideas of so-called attracting sets like in certain situations or circumstances, we will be, you know, we have certain contexts. So, you know, for example, I have like in my own house, I have my workspace and where I work and certain regularities that I expect. But when I go to the office or, you know, I'm out in a coffee shop or in public, those will change depending on
the context. So it's kind of a nested sort of thing because, you know, overall, there's a sort of grand set of states that I tend to occupy. But then within each one of those, if you go down to more and more granular levels, there may be other specific, you know, pockets of attractors that are local to that area. And so we are, you know, composed of all these different types of attractors depending on the kind of situations and
circumstances that we're in. And is it fair to say that there might still be some kind of path between those attracting states? So rather than thinking about the states of the probability, you know, at different, you know, points in the state space, there's some kind of pendulum or some kind of structure of those states sort of traversing between each other. So we can kind of, you know, almost
zoom out a little bit. I think there's definitely a good sort of geometric interpretation of that, kind of a manifold, where essentially there's a sort of center manifold. It's this line that we're kind of trying to keep steering along. But, you know, things are also dynamically changing. This isn't the exact same situation here, but there is a
term from a paper that I really like. Karl has used the term before of, you know, gradient descent while trying to hit a moving target, where you imagine, you know, a flowing landscape that's just sort of changing over time. And so the actual target
you're trying to hit may be constantly changing. What is the center of the manifold, or what is that line that you're trying to stay in the middle of and balance on, to take Andy Clark's kind of perspective of, you know, surfing a wave? That may be meandering in itself, but it's still the minimum of something in that overall global landscape. It's just that this is sort of changing and moving over time. So I think there is a geometric
interpretation in there. But it's a bit more mathematically difficult to visualize what that would look like. So tell me about the role of action in active inference. So action, and I think agency, is a big part of what makes active inference models important. And I think one way that I'd like to frame this is in terms of the exploratory nature of agents, because we've been talking so far about this idea of, you know, minimizing surprise, which is a kind of uncertainty,
you know, you keep wanting to have the least surprising things. And I think action is key to this picture making sense, because one thing I've always asked is, why do we seek out uncertain things all the time? You know, I seek out new food experiences, we like magic tricks. I am, you know, a big fan of really noisy heavy metal music and stuff that is very stimulating, even overstimulating. But why do these things appeal
to some people less than others? Why do certain people go after experiences that push you to the edge of certain sensations that you expect? I'm also a very private person. I like being at home. I like having my space, you know, so I like a lot of that regularity too. I'm not like a, you know, I don't go bungee jumping and skydiving and that kind of a thing. So I think action is a really interesting aspect to this because in order to survive, you need to explore your state space.
You need to explore what's around you. You need to have a good understanding of where you are in your environment and what's permissible, what's not permissible. You need to gather information in order to better attain the goals that you have. So I think the exploratory component and the curiosity component is where action comes into play in active inference models, as it allows the agent to do what you can think of as resampling the environment.
You're looking in one location here and you're getting sensory information. Sometimes you actually want to move to an area with greater uncertainty that surprises you, because it reveals something new about the structure of the environment. So you may incur this uncertainty bump or cost in exchange for then understanding more about what's going on around you; then you know more about your environment and you're more confident about what decision to make to get the
reward or the thing that you want to achieve or accomplish. So I think this sort of, you know, self-fulfilling prophecy, this idea that we take actions to fulfill the expectations of our model, is something that unfolds over time and may have these little bumps of uncertainty along the way, where we may wander off the path to explore. But overall, if you zoom out to this kind of grand scale, there is this always staying at this kind of steady state that happens at all
times. It's also interesting because as a representation of cognition it's physically aligned, because of course we're physically embodied in the real world and we take physical actions in the real world, and we build these inference models that essentially predict trajectories of actions, and I guess you can think of that essentially as a goal, right? Because, you know, what's the value of doing this sequence of actions over time? But as a
representational framework in AI, that's really cool. It's almost like the primitive becomes stories, right? And stories are just sequences of actions that lead somewhere in the future. And that's a really great segue into exactly how action works in these models for
planning ahead into the future. There are these different paths or stories you could take, trajectories through that state space, but which one do you take? And that's the ultimate question in active inference, which, usually framed in discrete state space models, is trying to find the best trajectory that accomplishes both achieving your prior preferences and expectations and the exploratory behavior that comes with that, of exploring
your environment. And at different times one or the other may be more important depending on the context of the problem that you're in, and selecting among those different stories, as you put it, or different trajectories, is the key basis for how active inference models work. You've said that one of the confusing things to newcomers is the difference between the continuous and discrete
versions of active inference. Can you tell us about that? Yes, so it depends on what papers you look up. For example, one of the most cited papers from Friston is the 2010 one, The free-energy principle: a unified brain theory?, and you go look at that and you see all these differential equations and so on and so forth. And then you go look at some of the newer papers, like the paper by Lance Da Costa et al., you know, Active inference on discrete
state spaces, which is from 2020, and you see something completely different. And I think this is one of the issues: people come into the field and all the different papers look a little bit different, and the question comes up of what active inference is and where all these equations
fit together. I definitely think the Parr et al. book does an excellent job of actually helping to frame some of the differences by having separate chapters on each one. So let me give the short answer, which basically is that from around 2003 to 2010,
uh, say 2012, 2013, there was a big focus on continuous state space active inference models. So these models are framed in terms of differential equations, and so all the probability distributions are Gaussian, and usually you're trying to estimate the means of these distributions,
whether that's a mean over, you know, parameters or states or sometimes precision variables, which are these sort of hyperparameters in the model. And because they're changing over time, there are these dynamics, differential equations describing what their trajectories look like,
and they're very literally solved like differential equations: you have a velocity and you're integrating it to figure out what the position is at different times. So there are these trajectories that you're mapping out for variables as they change while interacting iteratively with a dynamic environment. All of that was heavily developed in that early time period, the first sort of decade of active inference.
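A heavily simplified, hypothetical flavor of that continuous-time picture, just a toy gradient flow on a squared prediction error integrated with Euler steps, not the generalized-coordinates machinery of those papers:

    import numpy as np

    rng = np.random.default_rng(1)
    dt, kappa = 0.01, 4.0        # integration step and a gain (a crude stand-in for precision)
    mu = 0.0                     # posterior mean estimate of the hidden state

    for step in range(2000):
        t = step * dt
        x = np.sin(t)                              # the environment's hidden state, drifting over time
        y = x + 0.05 * rng.standard_normal()       # noisy sensation
        dmu = -kappa * (mu - y)                    # gradient of a squared prediction error w.r.t. mu
        mu += dt * dmu                             # integrate the differential equation (Euler step)

    print(round(mu, 2), round(x, 2))               # the estimate ends up roughly tracking the drifting state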
But around 2013 to 2015 you saw a switch to looking at more discrete state space models, and the big reason for this is that those early models treated action as kind of like a reflex, these sort of reflex arcs where the agent, you know, gets some information and reacts by taking a reflexive action on it, but they weren't really focused on planning into the future. And that's really where the distinction comes into play. I think there was a push for wanting to model the behavior of planning agents in the neurobiological context, and that's where Friston and his collaborators spent a lot more time on developing the discrete state space formulation. The 2015 paper,
Active inference and epistemic value, is one of the first to start moving in that direction more formally, and since then, 2015 to the present, most of the work has been done on the discrete state space formulation. A big reason is that discrete state space models are very quick:
everything can be done with matrix algebra, and so it simplifies computations. We think in terms of categories and symbols, so it's very convenient for representing the world. A lot of the things that we study in the world are static categories; they're not changing, you know, on the level of a
stock market or turbulence and things like that, so they're much easier to map onto the real world and apply. And also, when you're planning ahead into the future, you have to calculate expectation operators, which are much easier to do with summations than integrals. So there are
lots of benefits, and that's where that distinction came into play. The discrete state space models use categorical distributions instead of Gaussians, and it's formulated as a partially observable Markov decision process. And we're now at a point where these models
are becoming very formalized in their structure into what is called the universal generative model, the idea being that, kind of like universal function approximators, it's a very general kind of model that would apply to many different situations and circumstances.
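For flavor, here is a minimal, hypothetical discrete-state update of the kind those models are built from, categorical distributions and plain matrix algebra, with made-up numbers:

    import numpy as np

    A = np.array([[0.9, 0.2],      # likelihood p(o | s): rows are outcomes, columns are hidden states
                  [0.1, 0.8]])
    B = np.array([[0.7, 0.3],      # transition p(s' | s) under some chosen action
                  [0.3, 0.7]])
    prior = np.array([0.5, 0.5])   # categorical prior over the two hidden states

    o = 0                          # suppose outcome 0 is observed
    posterior = A[o] * prior       # Bayes update: likelihood row times prior
    posterior /= posterior.sum()

    predicted_next = B @ posterior # expected state distribution after acting (a summation, not an integral)
    print(posterior, predicted_next)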
And there are, of course, also hybrid models that mix the two together, which are also very useful for specific kinds of modeling problems where you want to bin continuous data into discrete categories. So if you're looking for kind of the state of the art in active inference,
a lot of it will be found in the discrete state space models but there is still ongoing work on the continuous state space models that are especially relevant in the context of Bayesian mechanics so they they draw much more upon those older models um and it's an area of active development right
now. Yeah, and shout out to Lance Da Costa, because I read one of his papers recently, I think it was something like active inference as a theory of agency or something, that was using this new discrete POMDP framework, and it was actually very elegant. Yes, it's beautiful.
But a couple of points on that. So does it not make sense, I mean, clearly we understand that planning in humans is a System 2 thing, it's a discrete thing, so I guess it doesn't make sense to plan on a continuous model. But then, I interviewed Max Bennett, he's got a book
out, A Brief History of Intelligence, and he's saying humans do do this, you know, we have all of these different simulations that run in our brain, and presumably that must be somewhat continuous, because it's not all discrete. What are your thoughts on that,
can we only plan with discrete models? No, I definitely think we could plan with continuous models. I think it's not necessarily a physical limitation; it's more just that it hasn't really been developed yet as much in the active inference literature, and the discrete models at the particular
time were useful for solving a lot of really common problems and leveraging some of the benefits of discrete models. And I think there is interest in going back to those continuous state space models at some point in the future to develop aspects of planning
there. So I don't think there's any real physical limitation; I think there are just some challenges that come with working with differential equations that you just don't have to deal with in the discrete case, so it may just be a question of convenience for the time being. But I think ultimately
active inference models are going to have to tackle this problem in continuous state spaces at some point, and I think Bayesian mechanics will probably unlock a lot of that as those models are developed in more detail. On the subject of the Kahneman System 1 and System 2 dichotomy, what do you
think of that, and does that in any way conflict with the free energy principle and active inference? That's an interesting question. I actually have never really thought about it in terms of active inference before, so give me a second to kind of formulate a position
here. So I think, I mean, do you recognize it, because a lot of people were talking about this on Patreon last night, and a lot of people just say it's a bullshit distinction between System 1 and 2? Well, yeah, as in, there are cognitive processes in the brain, and
some of them are almost instantaneous, some of them take longer, they're much more iterative; maybe there's a vague boundary between those modes of cognition in System 1 and 2, but some people think no, reasoning and planning are a completely different mode of abstract
cognition. So my position on this would be, then, that I think System 1 and 2 is a useful metaphor to help people kind of understand how they might, you know, think in a certain situation, but I think that binary is definitely oversimplified. And if I'm not mistaken, I believe Kahneman
himself has said, like, you know, this is just sort of a coarse-grained description of what's going on. And I might be wrong on that, that's my recollection of reading something he said around when the book was published. And so my opinion on it is that there are levels of,
what, yeah, one thing I studied in my first postdoc was cognitive control. And so there's a level of automatic, reflexive behavior that agents will use, and there's also deliberative planning, you know, and suppressing certain reflexive activities that you would just
do without thinking about. So I think there is definitely a level of, the term that's used a lot, I believe it's Simon and Newell if I'm not mistaken, is satisficing. Yeah, we spoke about that with David Spivak. Okay, then, if you listen to the David Spivak one, I haven't heard that one yet.
Well, that's an incredible coincidence, I just published that. Okay, all right. Yeah, so it's a really interesting concept, and it mixes in with these ideas of like bounded rationality, you can get into the, uh, neuroeconomics and stuff like that. And so the idea being that, you know,
in most situations that we're in, and this gets back to the idea of what action do you take or what sequence do you take, you can't sit there and deliberately plan every single one. Actually, I have that problem though, like I want to think through every scenario, and the end
result is I just don't do anything sometimes, because I'm just trying to deliberate every single option that's in front of me. So it's very familiar to me, this idea that, you know, you have to make a decision eventually, especially if you imagine our brains evolving, you know, 40,000 years
ago, where you have to make decisions and take actions in that context, where you may have to very quickly decide if you're being chased by some kind of predator, or, you know, any decisions you need to make have to happen quickly. So jumping to conclusions
can be very computationally efficient, but it can sacrifice some level of accuracy about the world, and when the consequence is getting eaten or not eaten, sometimes that's a loss you're willing to incur. So I think there is some level of that built into our brains, in that,
you know, you see a big tree of fruit for the first time when you're wandering in a jungle, you're going to eat all of that because you may not get food again. But now, when we're in our modern world where we have access to food and things like that, that's not really as relevant,
but those core ideas or primitive concepts about our human physiology and behavior may still be inherent in our minds. So that kind of System 1 thinking of just automatic behavior, I think, is definitely true to some degree, because it's just efficient, you know, you don't have to
think and plan about it, you just know, I do this thing, I'll get back a reward. It's computationally efficient, but then there are levels of control that are needed for more deliberative planning, where we sit down with a pen and paper and we map out what we're going to do. And I think
that's useful for more complex tasks. But to say that they split evenly into this binary partition, I think, is definitely an oversimplification. I think there's probably a spectrum of things that fall along that; I don't even know if I'd really call them part of
a spectrum even, really. I think it's more just that some tasks require more cognitive load than others, and context may dictate whether or not that's actually going to happen based on what your needs are in a given moment. But it's a useful kind of broad-strokes way of
looking at the brain if you're trying to explain especially for our own physiology where we have you know we struggle with decisions every day about should i do this thing or should i have this big meal or should i just wait and eat in moderation or something like that so i think it's just
a useful metaphor. So we humans have this propensity to explore, yes, we like, you know, not just epistemic foraging, but we like exploring cities and doing all sorts of stuff like that. But how does that translate over to active inference? So in active inference, kind of the name
of the game is uncertainty reduction in different ways. When you're taking actions, your policies or plans, these sequences of actions that you want to decide among, there are different levels on which you can have these sorts of uncertainty-reducing behaviors
in different contexts. So there are situations where you want to reduce uncertainty about the states of your environment if you don't understand them, and you want to take policies or action sequences that reduce uncertainty about the model parameters.
You may also, at a higher level of structure learning, want to reduce uncertainty about your actual model itself: there's a whole space of possible generative models you could have, and which one is the most uncertainty-reducing model that's best for a particular
situation? So when I say uncertainty reducing, these are all different special cases of variational free energy minimization, or error reduction if you formulate it in terms of prediction errors, which is a perspective from predictive coding models. So I would say
that exploratory behavior is almost a consequence of the very basic need to maintain set points. I'm going to deviate a little bit here into the sort of cybernetic perspective, but it's been proposed, both in active inference and in other fields, that
the very basic physiology of agents, whether it's bacteria or other very simple organisms, is maintaining set points, for us things like blood sugar, which I've mentioned before, or blood pressure levels, other things in our body, temperature ranges, what we're sensing, and so on,
within permissible ranges. And so this is kind of like what a PID controller will do in engineering: when you deviate, you want to bring things back to that equilibrium. It's a very simple process, a thermostat can do this kind of thing. But then imagine this is sort of homeostatic control,
and in cybernetics there's a distinction between homeostatic and allostatic control. So what if you need to anticipate your future needs? You want to say, well, I could be hungry in the future, so there are things that I need to do in order to satisfy that later on: I have to go
out and explore and gather more information first, to then satisfy that need. So if you start thinking about these layers of abstraction building on a really simple principle, this biological imperative of just maintaining physiological set points, I think what emerges out of this is this
picture of exploration becoming kind of a natural byproduct of the general need to maintain physiological parameters within a certain acceptable range. And so exploration becomes another path to uncertainty reduction, in the case where you're anticipating future surprises that may happen.
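To connect that back to the different kinds of uncertainty-reducing policies described above, here is one common, simplified way the explore-versus-exploit trade-off gets scored, a toy expected-free-energy calculation (hypothetical numbers; the exact form varies across papers):

    import numpy as np

    A = np.array([[0.9, 0.2],          # likelihood p(o | s), as in the earlier sketch
                  [0.1, 0.8]])
    log_C = np.log([0.7, 0.3])         # log prior preferences over outcomes (what I expect or want to see)

    def expected_free_energy(qs):
        # score a candidate policy by the state distribution qs it is predicted to bring about
        qo = A @ qs                                    # predicted outcome distribution
        epistemic = 0.0                                # expected information gain about hidden states
        for o, po in enumerate(qo):
            post = A[o] * qs
            post /= post.sum()
            epistemic += po * np.sum(post * (np.log(post) - np.log(qs)))
        pragmatic = qo @ log_C                         # expected (log) preference satisfaction
        return -epistemic - pragmatic                  # lower is better: be curious AND meet preferences

    for name, qs in [("stay put", np.array([0.9, 0.1])), ("go look", np.array([0.5, 0.5]))]:
        print(name, round(expected_free_energy(qs), 3))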
It's a very successful strategy, and, you know, social systems are all about uncertainty reduction, or spreading out the uncertainty among different people, who can all explore in different ways. We all have different personalities and different things we do, and we can come back
and share those beliefs. So that's kind of how I tend to view it, that it's sort of an emergent property, maybe, of basic physiological needs that have blossomed into something much more complex in human brains. Yeah, I mean, one thing that is, well, apparently a sophisticated
feature of cognition is future planning, and I was reading something by Sarah Hooker, she was talking about these compute limits in the executive order in the US, but anyway, she had a lovely little history in the intro where she was talking about, you know, the 1600s, after we had the
Great Fire of London and the Great Plague and stuff like that, when we actually started thinking about risk management. We started thinking, you know, there are bad things that can happen in the future, and actually we should start planning and sort of instituting
governance and doing preventive measures, like washing our hands and stuff like that. And that's a kind of societal phenomenon, but presumably in our brains, long before that, and certainly in the brains of mammals, we have this predictive architecture, and when
the future-facing architecture is baked in, then this set point management into the future actually leads to this exploratory behavior. So it's almost like the whole thing just falls into place. Yeah, that's really wonderfully put. I think that must be very true,
I imagine of just you know older civilizations whether they're nomadic or you know agriculture agricultural based you know at some point you have to start thinking about safety and behavior of a whole group or you know group of people and that's where I think this exploratory behavior
and curiosity and you know exploring your environment would have probably become part of human civilization very cool very cool just before we close out I mean now we've got a a patreon discussion as well and folks folks at home should listen to that so I remember it being very good
when we were talking about agency. And I've been trying to avoid talking about agency recently, because, you know, the audience are frankly sick of it, because we've been talking about it so much, but I remember we did have a very good conversation about agency. I mean, is agency real?
It feels like a loaded question, and by design, so I'm going to do the pedantic thing and ask: what do you understand, what's your definition of real in this context here? Yeah, is it an as-if property? Is it, you know, again, I'm making up terminology here, but I guess, like, an
agency instrumentalist, okay, might be someone who thinks that it's a bit of a useful fiction, but it's not real in any sense. But as we were discussing earlier, what does it mean to be real? If the phenomenon does everything that you expect it to do, then how could you say it's not real?
Right, so I think my answer is similar to the kind of model-versus-reality sort of question that I brought up before, the model and the target system: I think it's a useful abstraction that is good at describing the kinds of models that we are, that we talk about
in the context of neuroscience and so on so for me real is more equated with its ability to be a useful descriptor of some thing that we're trying to model I see that that could be as if you know it may be it may be literally what's happening I don't think we can know that I think it has to be
as if it's as good as we can say and I definitely feel that agency is something that is you know present in all the kind of models that are able in some way to alter or control the external environment I would describe that as some level of agency along a spectrum and of course when
you start getting anticipation and planning and other things it's a more complex form of agency um does that answer your question yeah yeah it does okay so I think we agree on that first step so it's so it's it's an instrumental thing it's a model it's it's not necessarily and when I say
real of course it's not it's a non-physical abstract concept so what could I mean by real well maybe I'm not even sure myself but maybe I'm I'm alluding to it being part of the physical makeup of the universe maybe like if if someone like god created the universe maybe they were doing symbolic
AI and they they've coded up a computer program and they've got an explicit concept of a goal right and they said okay we're going to have all these little agents and they're going to they're going to have these goals and this intentionality and this is the causal structure and
and maybe we're just kind of even though the universe is is generated and emerged we as intelligent agents are kind of introspecting and we're pulling out that that original structure that that was the makeup of the universe in a platonic sense yeah I think that's kind
of a distinction that I would draw is like it's a description of something that I think is real but it's a description of that thing I don't know what that real thing even means because I don't I'm I'm bounded by my concepts in my mind whatever that is but it's describing something that is
in some sense real because we are able to you know actually interact with what's around us um there may be other descriptions or better descriptions or alternative ones that still describe that same real thing if that makes sense yeah it does and I kind of agree that they
seem anthropomorphic. So, you know, we're bounded cognitively by the knowledge that we have, yes, but then again, the knowledge we have was endowed to us by the laws of nature and evolution, and it's entirely possible that there's this weird resonance, that evolution has actually found,
you know, in a meta sense, the modeling framework that was used to create it. So there's almost like this information bottleneck: you could just start the universe again, maybe there's a Goldilocks zone, but, you know, it'll lead to the emergence of the self-recognition
of the very primitives that went into building it in the first place, which is quite interesting. Yeah, I mean, this kind of speaks to the cybernetic idea that you become a model of your environment. So does that mean, you know, you're recapitulating
what is real now in this model? Like, it's not just as-if, right? I think, because what I'm trying to draw is the distinction between scientific models versus models in our mind; that is a very similar kind of thing to me, like it's just a different way of doing the same kind
of thing, of modeling something that's real, but kind of a meta, meta layer of that, right, because the scientific models are based in our models of the world, with which we are now modeling other parts of the world in a very specific kind of way. So I do feel like there's definitely, you know,
if if there wasn't a connection to what is real we wouldn't be able to operate in the world so I think there is definitely like some convergence on there in there somewhere but I don't know how deep that goes there may be things that we uncover about the universe that we never expected that
we can never model, and then in that case it would be that our brains behave, or even our scientific knowledge behaves, as if, but I find it unlikely that it would be that way. Yes, but it also goes to the abstraction canonization point that we spoke about before. So, you know,
there's all these levels of emergence, yet we have converged on this shared epistemology. I mean, of course it's very divergent in many ways, but some of it isn't; some of it feels universal: we understand each other, and it describes the universe extremely well. You know, you'd be
forgiven for thinking that it was platonic in in some way yeah and certainly some abstract knowledge probably is platonic because it really is universal it works in in all situations but so that that that's that but then there's the question of okay so it's just an instrumental
description, but people then use it to explicitly build AI systems. That's what GOFAI people did: they say, okay, I'm going to build an agent, and it's going to have these goals, and it will create these subgoals, and these are just explicit things that I've designed, or maybe they've
been meta-learned or something like that. And now you're actually using the abstraction as the primary representation. Or in philosophy, like in Nick Bostrom's work, you know, with instrumental convergence and orthogonality and so on, the goals, which are these instrumental abstract
objects, are now becoming the basis for theory that we're making statements about, you know, what happens when AI agents have a certain level of intelligence, and what kind of goals and intentionality will they have. So it feels to me dangerous to take something which is
an abstraction and then use that as the basis for theory or for building AI agents. In what way is it dangerous to you? Well, dangerous in the sense that, I mean, maybe I'm talking about two things there, so, you know, using it as a basis for theory and philosophy, and using it as a basis
for building AI agents. So maybe we'll do the latter one right now. So, you know, GOFAI never worked particularly well, but there are still loads of people who strongly believe that we can build symbolic AI systems that do explicit reasoning and planning, and certainly for a specific set of
tasks you know like I mean planning is is a great one they work significantly better than current AI systems and we're seeing neurosymbolic approaches that blend the two things together but but these are things potentially where we have an explicit goal and and what is a goal it's just
presumably, a future world state. But that's kind of brittle, right, because in the complex world we live in, how do you explicitly and, you know, robustly represent a future state? Now, we seem to do this because we have this way of just talking about
future situations, and our representation overcomes a lot of the brittleness, but when we start coding these into computer systems, it seems much harder to do that. So you're saying, is that where the risk comes in, in making the assumption that these systems are the as-if versus
real, and we'll quickly see the limitations of our own models, and whether that as-if versus real boundary is as true as we think it is, when we start applying them to the real world and seeing if they really behave with the flexibility and in the way our own internal models would in the same
situation well yeah in the sense that in the physical world as we were saying earlier we have a symphony of diverse complex processes which run and then they lead to you know future situations and there's also lots of convergent and divergent behaviors with you know chaos and all sorts of
stuff like that and we have the ability to kind of categorize patterns or modes of of that future state space and we call it a goal yeah and we then put that goal into an AI system and as we were saying before this might work you know maybe it's possible to represent a very complex physical
system using a very abstract computational model and maybe that will work but maybe it won't maybe we are miss you know we're underestimating the amount of categorization and robustness that we use when we think of a goal and we're actually reifying it and we're stripping away all of the
actual cognition that we're doing. Yeah, I mean, I think that is a risk, of having our models be way oversimplified. We may not know where that limit actually lies, and I think the only way we can know that is by actually trying to apply it
in the real world which to me is a good benchmark of how successfully our models have really captured reality in the first place but I definitely agree with you I think that's probably something that we're going to see as we deploy more agents into the world the limitations of the models that
we have may become clearer, the degree to which our models are truly representing reality, or are behaving as if they do in a way that's not quite in line with what is really happening in reality, which our minds, our brains, and our collective knowledge take for granted in
what we have in our models. So I've been very skeptical of AI x-risk, and my main reason for that is I don't think current AI has any agency. I mean, in a sense, I agree with them that if there was an agential AI, you know, one that was superintelligent, that would be something to worry
about, but I don't think current AI is. But what do you think about the future of agency in AI systems? I mean, does that concern you? And, I mean, my definition of agency is, you know, strong intentionality and self-causation. So if we did have AI systems that exhibited that, would that be
an overnight thing? Would it emerge from, you know, would there be some kind of, I don't know, maybe cyborgs, or human-computer symbiosis, a collective intelligence, and then would it diverge? How do you see that panning out? I think there's a lot of, you know, money in
AI right now and a push for a lot of focus on development. We haven't really, you know, got to a point where there's a huge push for agency yet, but some of that is going to be really useful, I think, like in industry, manufacturing, and things like that. That may be the beginning of it,
in autonomous vehicles and stuff like that and maybe then it will you know continue to be further explorations of this sort of you know cybernetic relationship between human and machine that I think is probably going to emerge in the next decade two decades or so the part that worries me
with the intentionality and agency aspect of it, is the speed at which the models and research move, and the money that's in it, relative to our understanding of the limitations and risks and ethics and the legal system of what's permissible. So I think
we are going to move in that direction, and I don't think there's anything that's going to stop that. And I don't have nightmare scenarios like a lot of what's popular in science fiction; I think I'm much more worried about people, and about us developing systems and
hooking them up to things that are, you know, mission-critical systems, decisions that humans should be partly involved in. And that level of agency, we should be careful as we're designing them to make sure that they don't get into the wrong hands, that we have the correct laws
around the operation of that equipment. And I am worried that if we are moving too quickly, faster than the rate at which the law and our understanding of these systems catch up, there may be some kind of event that has to happen first before we learn what we need to do to actually regulate
and control these systems properly and on that are you are you um do you lean towards safetyism which is to say we should do risk assessment and we should prevent bad things from happening before they do or do you lean towards libertarianism which is to say um we can't really
adapt quickly enough, and if we just trust the thing, then safety will kind of come out of it? I think I'm a bit too cynical to believe that latter point, because I do feel like the incentives right now for companies aren't necessarily on the risk side of things.
um I do think there is some level of self-regulation like there are there is a growing interest in AI safety interpretability and you know explainability these kind of things that are definitely present um some companies do at least have ethics committees I don't always know how you know if
that's just for show, but they do at least exist and there is some thinking about it. I think I put myself somewhere closer to the center, slightly left maybe, where I do think, you know, we can't necessarily know and adapt in time anyway, but it is safer to still be having the conversations and putting some safeguards in place, having them at least there and having the discussion continuing, rather than being way on the far end of
the libertarian end of that spectrum. Yeah, it's so worrying, well, I almost said worrying, I don't know, to be honest, but I mean, certainly when we start to augment ourselves with technology, yeah, I know a lot of people criticize transhumanism because it could potentially lead to massive disparities in
society but also in a more diffuse sense when we become a collective intelligence where we are entangled with AI's just like in the animal kingdom you know that there's a relationship between the collective and the agency and and cognition of the individuals like in like ants for example
and you know ants are quite dumb because that they form a collective and when we become a collective entangled with AI's it could easily reduce our cognitive capabilities like our agency and of course we're just purely speculating here but there is a potential future where
that's quite concerning. But is that something we should regulate against, or what should we do? That's a great question, because I think this is the same kind of thing as with any industry, you know, regulating it versus letting people make their own choices, versus do we
intervene, you know, in terms of any kind of governing system. And I find myself thinking that, you know, speaking of agency, we are free agents, we are able to make our own decisions, and I do trust that people generally act somewhere in the range of their best
interests, but not always. So I have some level of that, I'm not on the extreme end of it, but I do worry about the incentive structures of companies; that's really what I think is driving it, that's where the money is, and that's what we're moving forward with,
that incentive structure. I think that we definitely need to have these discussions, and the legal system needs to be paying attention to it, but we don't know what the consequences will be. I love sci-fi and fantasy and stuff like that, so
the ideas of having this transhumanism and, you know, nanobots in your body that will destroy diseases, all these things, it excites me on a fantasy level, like, I know that's just a fictional representation of what the reality could be. And I
don't know what that would look like in actual society, nor does anybody know. I don't know how anyone's going to react to these things, even if they're done in the most beneficial way possible. We might find out some things about our human nature that we didn't realize we really required or took for
granted. And so it excites me on a theoretical level, but I definitely feel like we need to be talking much more openly about these things, and having enough regulation that allows for growth without having so much that it stops the research in general. Now, where those boundaries lie, I feel like that's not my expertise, and that's kind of a high-level answer to a very complex problem, but if I were to put the line in the sand somewhere, it would be,
you know, maybe this is too on the nose, but some kind of, you know, enough so that you can have that complexity and enough that you can have that adaptability, but not too much that you become restrictive, you know, kind of talking about dynamics and systems and things like that, I think that's always the best place to lie. And I spoke with Pedro Domingos, he's a bit of a controversial guy, but he was advocating, he's obviously a libertarian and an accelerationist, but his basic thing was
that, oh, if you look at the way markets work right now, and traders, we actually have AIs that act on our behalf, and the AIs share information with each other, and there's this kind of equilibrium in the trading markets where, you know, certain thresholds
are met and trades are made and the whole thing just balances, and yeah, we don't really understand how it works, but it doesn't matter, it just works. And he was advocating for a kind of governance like that, where we all have our own personal AIs, and it's much more
of a direct democracy when the AI can actually understand what we want and we can be far more dynamic and so on. Does that seem dystopian to you, or do you think that's a good idea? I feel like a lot of these arguments feel like kind of the invisible hand sort
of thing. I think Richard Thaler called it the invisible handwave, which I feel like is, you know, a reference to Adam Smith, of course, yeah, the invisible hand of the market, yeah, right. I mean, I think, you know,
I don't think we know, I don't think we can confidently state that's what's going to happen. And I don't know, it depends on what you think is a good measure: is the market all that matters? What about people's overall prosperity and
things like that, and people's general perception? Like even right now in America, the economy is doing well, and employment, but the general feeling of a lot of people is not great, and maybe that's due to other factors that are not necessarily
directly linked to the market. But I'm just saying that when you have an economic measure or something like that, that may be a feature you look at, but that doesn't necessarily mean that you understand all the dynamics of what other things are affected. You know, the world is very complicated and there
are other factors you need to take into account. So I would say I'm not that optimistic, but I don't think there isn't some truth to that, that there isn't some level of this kind of self-regulation that will probably occur naturally, because we are also incentivized to be looking out for our
best interests the problem is I think the systems that we are now studying and interacting with are so complicated that we may not be able to even when acting in our best interests know what the outcome is going to be um and I have advocated this is kind of you know this isn't something that I
would say is possible yet and it would require a lot of really good data collecting very accurate data collecting but it would be very interesting to me to move toward an effort of predictive modeling and on the scale of like complex systems where we make public policy decisions and you know you
do something that seems like a great idea what are the outcomes going to be oh I didn't expect this to destroy an entire class of people for example even though it looked like a good policy on paper kind of the unexpected consequences sort of scenarios it happens a lot when well-meaning people put
forth a good policy that ends up hurting or harming other people as a result if we had ways of predicting incentive structures like how can we incentivize people to make monetary decisions companies or individuals and show them that if you made this decision versus this one you would
still benefit society or benefit a group of people and there would be no net loss to you like can we make better predictions as companies or as individuals in a complex system that requires not just social and economic markets but also the environment and you know whatever is going on with
our planet and things like that which are all kind of interlinked now you need so much data to make a predictive model actually work about what's something like that but I do think like that's one way we should be putting our efforts toward if we're going to be developing AI models is
developing something that has the capacity to reason in a very complex setting using our collective knowledge as people so we can make better predictions that may benefit the planet and benefit ourselves I want to talk about the the strange bed fellows in the the FEP adjacent community so
this is something I noticed coming in straight away that there are lots of crypto people and these are people who believe strongly in decentralization and you know almost like they want to have a small government they presumably don't believe in safetyism they don't want
any interference and then there are people who have the complete opposite view like a lot of the enactivists for example and they think in terms of systems and collectives and so on and you know one school of thought is that we could just build these distributed systems
and good things will emerge as a result of it and I actually think I asked Carl Friston this question a couple of years ago and I got the impression that he kind of leant in this direction and then there's loads of discussion of when we design FEP systems and we use them for managing
smart cities and countries and stuff like that that we should institute a governance framework on the top what what do you feel about the dynamics there in the community yeah I mean this certainly gets into some discussion also like on a spatial web and other other things that are you know
talking more about governance systems and you know what the future would look like I personally haven't run into I've seen a little bit of talk in that direction but I haven't really paid too much attention to like the political angle of how the different perspectives have
looked at things related to what you're describing but I will say that it does concern me that there would be this level of confidence that like I understand from the free energy principle perspective like there's this sort of survival aspect right like you know we'll find
our way to our attracting states it's going to be okay it'll happen but also you think about like the way that our bodies and minds have evolved it took a billion years and a lot of organisms died you know so yeah maybe it's possible that yes eventually it will self-regulate
itself but at what cost you know are we going to incur along the way is that something that's going to be straightforward or could there have been a better path that would have been a bit safer maybe slower and less efficient that would have incurred less of a negative loss
so I mean and you know there is a history of civilizations destroying themselves you know that's happened hundreds of times of civilizations eventually just dying out whether that's from you know warfare or other things they did and you know I I just wonder if we can be so confident to be on that
complete deregulation side although I understand and and definitely respect the nuances of the other questions that come with that which is like who regulates and who makes the rules and there are so many other things that go into this that it's hard to you know formulate a complete answer but I
would say I still lean toward that perspective that I think you need more than just putting all of your hope into self-regulation as the primary component to drive decision making yeah I mean I hosted a debate between Beff Jezos you know Guillaume the Extropic guy and he's actually
interesting because he also has a startup you know talking about the physics of AI very similar stuff and he was debating Connor Leahy and obviously he's an accelerationist and he was making you know all of these arguments and Connor at one point in his best style said so what
we should trust the void god of entropy yeah because he was basically arguing that good stuff just comes out of all of this but I mean I'm a bit of a centrist and I mean clearly I think it was a good thing that we banned smoking indoors but now in the UK they want to ban smoking in pub
gardens and that seems like an an overreach and and sometimes in retrospect we look back and we think yes it was a good you know good thing that we banned that but clearly markets have failure modes and you know sometimes it's good to not to regulate because it increases
velocity and innovation and sometimes it's bad to regulate but it's a really difficult thing to get a handle on I completely agree and I think it's only getting harder I mean as our world becomes more and more interconnected there are more and more considerations to
take into account and I think that's why I was emphasizing this need for you know finding ways to predict in complex systems you know that's of course like a pipe dream because of the amount of data you would need to do that kind of thing but the question is like how much overreach is the right amount well overreach is maybe not the right word but how much control and regulation is the right amount and that of course will differ by opinion I mean everyone
will have different levels of how it affects them I don't think there are easy answers to this but I definitely think this is a really really important area of the technology we're developing that should be taken into account and should not be taken lightly because it's very easy to get pulled
into the AI hype train it's so exciting where it's this amazing time you know all these things and whenever I'm thinking about active inference I'm really just thinking about math like that's just for me it's just I'm excited about I'm writing code and I'm making you know looking at math equations
and then getting really excited about these philosophical topics but really they'll have an impact on the world in a way that really matters and we should definitely consider that interesting and just in closing I mean if you could you have a magic wand and you
could go back in time and you were the government would you regulate social media oh no that's a really difficult question I mean I think I'm going to be consistent with my prior responses which is some level of regulation yeah because I don't think inherently social media in itself is a
problem I think the kinds of problems that have emerged from the usage of social media are also a response to the general feelings of people in the world and they're using it as a way of an outlet and you know for example if you think about like you know controlling
elections and some of those things right there's that whole angle there's also like the psychological angle of you know how people are using it those are all questions I would ask like why are people using it to compare themselves to other people and all these other things and why
are they feeling that way what about our society means that when we have a tool we use it for a certain purpose like this and there are no easy answers to that because some of it's just like that's just how things are right now and you can't solve those questions easily you know they're
complex societal problems that lead to that but I do feel that social media has been really wonderful in a lot of ways too and it's provided a lot of net positivity to the world I don't really use it myself you know I have very little social media presence
at the moment but I do feel like it's been you know amazing for example in unsafe situations where hundreds of people have used social media for communication it's brought scientists the ability to broadcast papers around the world you know it's the ability to
share knowledge and information which is what everyone got excited about with the original internet um so I don't know I feel like that's a hard question to answer and I think I would go with the response that some level of regulation um would have been useful but it's always hard to
know if that regulation even if done in good faith would also lead to more calamitous outcomes that you would not have foreseen because you're imposing a regulation um and this kind of thing happens in markets and other areas you know these things happen um so I guess the way I would put it is
that I'm glad that I'm not in government itself but I'd be happy to contribute as a researcher from the outside but these are hard decisions and I you know don't envy anyone who's in the position of having to make them it's certainly true that regulation done badly is even
worse than no regulation at all absolutely and I agree it's very complex there are failure modes to social media in some ways it's a good thing and in some ways it's a very bad thing and when you start banning things when do you stop yeah and who decides like you know yeah I
mean this gets into free speech and there are other questions that we have too like well what speech is permissible you know there's a kind of gray zone of like there's certain things you know yelling fire in a crowded room or you know so I don't know I think there's
in legal systems there is the principle of least harm where it's sort of you know you have some level of laws that will maybe harm you know some people but on the whole it's good for everyone but that's getting harder and harder to compute what that looks like I think is the
problem but I think that's kind of where I would tend toward it is like just enough that it has least harm without also at the same time you know stopping the acceleration so active inference is quite inscrutable for people coming into it I mean I know when I did the
first Carl Friston interview I was reading some of his papers from about 15 years ago and they were remarkable in the sense that they were bringing together so much jargon and terminology from different fields from neuroscience from cognitive science from physics and it was really
inscrutable I felt that some of his mathematics was a little bit difficult to get a handle on what was your journey into active inference so my journey to active inference started in 2018 I was trying to write a grant it was actually my first postdoc and I realized that I was wanting to leave
academia that's when I moved into industry and at that time I discovered Carl's 2013 life as we know it paper and I was very inspired by it but I didn't understand any of it and so as I dug deeper and deeper it just was this endless hole of finding new fields that I didn't know about
that I had to keep looking into and researching and learning more not knowing where to stop and so that continued for about three years of just exploration and the same kind of thing that you're saying this reading these papers that just pull in so many different topics together everything
just seems very inscrutable as you said interesting so I mean tell me more how did you build a mental model of this because I think it's really difficult because active inference is an example of a discipline where you have so many different folks from multidisciplinary
backgrounds and that's different from something like machine learning so do you feel that you've focused on one of those backgrounds so you know let's say coming at it from a very technical perspective or have you embraced the full spectrum I think I'm getting closer to embracing
the full spectrum and coming at it myself since my PhD work was in neuroscience I had that neuroscience background which even though there are a lot of other topics that are blended in there the core that is neuroscience gave me a good anchor but I wasn't at that particular time as well
versed on the technical side of things and I was actively trying to learn that anyways for as I was transitioning into industry to be a machine learning engineer at that time so it was natural to start slowly bringing in all these ideas so I would say that you know there's some a lot of the
areas in active inference also involve a lot of the philosophy and cognitive science that I don't know as much about more of the neurobiology and the mechanistic side of it as well as now the technical machine learning side and I've had to slowly pull in these threads as I've learned more
and more about the field I was in a neuroscience lab we were doing computational neuroscience I was working with Dr. Ken Kishida who's at Wake Forest in North Carolina United States and we were doing some work modeling human behavior with reinforcement learning so it was an adjacent field but at
that particular time I had the neuroscience background but I did not know about active inference because it was relatively niche at that time so you have this vision to you know essentially educate up and coming scientists on on active inference because it's really really important why the focus
on education why do we need this I think the active inference field if you kind of look at the sort of genealogy of how it's developed and also into new ideas like the free energy principle and Bayesian mechanics are these new terms that have also emerged alongside active inference
there's a very very historical perspective you can take on the field and when you look at how it's emerged through time to where it is there's no good entry point into the field but at the same time I see active inference as being really revolutionary in the same position that deep learning was
in the early 2000s where there are a lot of interesting ideas and papers out there but the explosion had not yet happened and so because we are perched on this precipice where we're going to start seeing all these new and important ideas come out of active inference and provide some
sort of challenge to the way that deep learning is currently done today I felt it was really important to have a very clear resource that would guide readers through the field that did not have the neurobiology focus on it so just looking at the core mathematical mechanistic ideas but
also the implementation so that I can bring it to a much wider readership both for newcomers and students undergraduates and grad students but also for researchers engineers and machine learning engineers or you know robotics engineers and any of those sort of applied fields that use these kinds
of ideas and what are the high level things you talk about in the book and of course this is the first edition you've got another edition coming out next year that's correct so there are going to be two books they're going to be separated the first focusing just on active inference
the second book which is going to be submitted next year is going to be on Bayesian mechanics specifically so there's a nice demarcation between those two so for the active inference book the basic idea is to cover all the fundamental concepts and themes you need to know so starting from
just the basic way of how can you think about modeling this interaction between agent and environment in a statistical setting so it requires just a change in perspective from how normal statistics is actually done just so you can see that relationship work and then begin to build models from that
starting you know there's two versions of active inference one in continuous time and continuous state spaces and one in discrete state spaces so the book provides all the fundamentals of the background you need to know things like predictive coding, variational Bayesian inference, connecting
latent variable models with more common parameter estimation models and then leading into the core of active inference describing all the key papers in the state of the art and then the third part of the book goes much more into extensions where the field has gone next how to apply it to different areas and a historical perspective of how all the different antecedents and also influences of active inference kind of coalesced together in this sort of nexus where active inference lies at
the center of all of it. You made an interesting point which is that you know we might be on the precipice or something so it might be like the you know the ImageNet moment back in 2011 and back then we had these sophisticated algorithms but they were very difficult to train you know only the top folks at Google were doing them you needed lots of compute and so on.
Is that analogy the same for active inference so how useful is it now really like how how good do you need to be what kind of tools and software do you need how does it work?
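As a rough orientation, here is a minimal sketch of the kind of perception-action loop the toolkits mentioned below expose, written against pymdp's documented Agent interface (infer_states, infer_policies, sample_action); the generative-model matrices are random placeholders and the environment is hypothetical, so treat this as a sketch under those assumptions rather than a canonical recipe, and exact signatures may differ across versions.

```python
# Minimal discrete active inference loop sketched against pymdp (assumed API; may vary by version).
from pymdp import utils
from pymdp.agent import Agent

num_obs, num_states, num_controls = [3], [3], [3]

# A: likelihood mapping p(o|s); B: transitions p(s'|s,u) -- random placeholders for illustration
A = utils.random_A_matrix(num_obs, num_states)
B = utils.random_B_matrix(num_states, num_controls)

# C: prior preferences over observations (the agent "expects" observation 0)
C = utils.obj_array_zeros(num_obs)
C[0][0] = 3.0

agent = Agent(A=A, B=B, C=C)

obs = [0]  # current observation index (a hypothetical environment would supply this)
for t in range(5):
    qs = agent.infer_states(obs)        # perception: posterior beliefs over hidden states
    q_pi, efe = agent.infer_policies()  # planning: posterior over policies via expected free energy
    action = agent.sample_action()      # act: sample a control from the policy posterior
    # obs = env.step(action)            # a real environment would return the next observation
```

The point of the sketch is only the shape of the loop: perceive, plan, act, repeat, with the same free energy quantity driving both inference steps.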
So active inference's origins were in the behavioral neurosciences so a lot of the early models were designed for very simple experiments that did not have hugely complex environments so there has been some work trying to look at scalability from the perspective of amortized models so you know using neural networks to learn parameters to kind of bridge that gap
between the active inference world and the deep learning world. This is a bit of a hybrid approach and that has shown some promise in some areas for scalability but where active inference is right now the current software that's available pymdp for example is the primary
software there are others that are sort of related like RxInfer that do similar kinds of things we're at kind of this interesting point where we are poised for this kind of expansion into a much more scaled space and applicability because there have been a lot of papers looking at robotics
for example comparing performance between active inference and robotic methods or between active inference and reinforcement learning but this is sort of why I like to make that analogy with deep learning around 2006 it's getting there it's comparable and there are lots of really interesting ideas that are there that showed the high potential for active inference and we need more people to come into the field to develop it and that's you know the motivation for writing this book
was to get that wider readership involved. Yeah that's really interesting I mean quite a few people in the discord community are just saying to us how do I do this you know and many examples come up like I want to build a multi agent system where the agents
exchange information and they cite this agency as being one of the reasons to use active inference because you know of course in in the olden days we used to write software and we used to write for loops and you know we had conditional logic and stuff like that and we're explicitly telling
what to do and then machine learning was trained with data but it was still pretty much just a static model and once you've trained it that's it that's the end unless you do something like active learning and then now we're talking about active inference where we have essentially an agent that can continually learn and adapt and exchange information with with its environment but that seems quite scary right because how do we how do we make it do what we want it to do and how do
we actually code it with software. So I think that's the key mechanism that you point out this engagement with the environment that's really the key feature of active inference the ability for it to you know go out and seek out certain preferred or expected states
and I'll use that term instead of rewarding states since that's much more in line with the active inference perspective and in the active inference literature the simpler models you know they follow this paradigm but it's a framework so there are certain perspectives and ways you can design
these models that have that behavior in there but that doesn't mean we can't augment them with other features and things that are really necessary for specific environments where we can add in our own controls add in human knowledge or our own I'm using the word biases in a neutral sense here but if
we want to bias the model to do a certain type of thing it's not like we're going to lose the controllability aspect of it there's still a way for us to inject in our own set of particular goals and things like that into the model it's just that the model has the ability to be very generalizable and applicable to many different types of modeling situations but we're still able to control
aspects of it. Yeah you know in governance we talk about Goodhart's law and we talk about things being Goodharted so as soon as we try and ban things or control things we completely distort the system you know when a measure becomes a target it ceases to be a good
measure. Do we have a similar thing here right that we have these agentic applications if you like and when we try to steer it or enforce goals then it almost I don't know whether it's a binary but it very quickly loses the very thing that we wanted it to have in the first place which is
you know agency and dynamism. So if you look at active inference agents this is you know this is a problem that's also present in reinforcement learning where there's you know a combinatorial explosion of different kinds of paths you could choose and to be efficient an active
inference agent and indeed humans as well you know there's only a subset of particular paths that you might choose in an environment that are relevant for a task at hand and I think the same sort of principle applies and that there's no way you can you know explore that entire search space anyways of things you could do and there may be certain things we would just discount as being
certain paths that are just not interesting to the agent. So this this is a kind of agent that would not pursue certain types of goals and so those would be things that just would be a very low probability of being selected by that agent. So you can still maintain that flexibility of having an agent that's able to choose what it's going to do but you can restrict the search space within
which it's going to make those decisions. Very cool but just on the goals thing though where do the goals come from for the active inference agents you mean yeah so they do planning as inference right which is a form of you know de novo goal generation if you like and then we can constrain the trajectories of goals as a form of design of the system and we hope that it maintains flexibility but it still does the thing we want it to do reliably.
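A toy numerical sketch of what is being discussed here, in plain NumPy rather than any particular library: the designer's goal enters as a prior preference distribution over observations, candidate plans are scored with the standard risk-plus-ambiguity form of expected free energy (the quantity that comes up later in this conversation), and constraining behaviour amounts simply to restricting which plans get scored; all names and numbers are hypothetical.

```python
# Toy illustration (hypothetical values): goals as prior preferences, plans scored by
# a one-step expected free energy = risk (divergence from preferences) + ambiguity.
import numpy as np

A = np.array([[0.9, 0.1],              # p(o|s): likelihood of each observation given each state
              [0.1, 0.9]])
B = {                                   # p(s'|s,u): transitions for the two allowed actions
    "stay": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "move": np.array([[0.1, 0.9], [0.9, 0.1]]),
}
log_C = np.log(np.array([0.8, 0.2]))   # prior preferences over observations (the "goal")
qs = np.array([0.5, 0.5])              # current belief over hidden states

def expected_free_energy(action):
    qs_next = B[action] @ qs           # predicted state distribution under this plan
    qo_next = A @ qs_next              # predicted observation distribution
    risk = qo_next @ (np.log(qo_next) - log_C)   # KL between predicted and preferred observations
    H_A = -np.sum(A * np.log(A), axis=0)         # observation entropy of each state
    ambiguity = qs_next @ H_A                    # expected ambiguity of the predicted states
    return risk + ambiguity

allowed_plans = ["stay", "move"]       # the designer constrains behaviour by restricting this set
G = {u: expected_free_energy(u) for u in allowed_plans}
best_plan = min(G, key=G.get)          # planning as inference: pick the plan with lowest EFE
print(G, "->", best_plan)
```

Nothing here is lost in the way the question worries about: the agent still chooses among plans itself, but preferences and the allowed plan set are where the designer's steering enters.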
Yeah so I think this is going into the distinction between when you design simple active inference agents where we are specifying some of these boundaries we know what the agent is capable of what its state space looks like what are the kinds of things it can do versus a
true active inference agent which is where the field is moving now it would work in like structure learning for example where it's learning its own goals and learning those specifications itself rather than you know the user hand crafting the components
of the generative model. So as we start getting more into that space I think that question is going to become more and more relevant we're going to have to consider what kind of agent would emerge in that sort of scenario under the free energy principle you know we have some ideas about agents
that would emerge from this would be agents that are trying to survive that's sort of the biological imperative or the biological influence on the free energy principle but I still think it's an open question of how these agents are going to behave when you start building models that are this open
and that's going to be an open area of research. Yeah could you could you go into a bit more detail on that spectrum so in of course in the natural world all of this emerges from lower level dynamics but when we design these systems certainly with active inference now we make the agents explicit.
So it's a little bit like old school symbolic AI or just doing any Bayesian modeling I create all of these variables I give them names and I represent you know structures and the causal relationships between them so that's how it is now but you're kind of hinting at an evolution of that where it could
do some of that stuff by itself. Yes absolutely and that's you know essentially what we can do you know infants are great at this and we are also especially infants because they have fewer priors to work with they have less prior knowledge about the structure of the world they
have to start creating these categories and also mapping things I'm using the word metaphor kind of loosely here but metaphorically saying this is like this other thing and so I can use the same properties to sort of link them together and map one model onto a different model and leverage that information I learned somewhere else for something that's similar.
And so in some sense you know the priors and the inherent nature of those agents their self model has been crafted by evolution so I don't want to use the word too teleologically I just mean that evolution has produced an agent over time that happens to be the kind of agent that
expects to be in a certain kind of environment and its preferences and habits and things like that have been sculpted through its evolutionary history which you can think of as like a compression of billions of years of supervised training in you know one agent's DNA that unfolds with time as
you know the agent develops so when you know even though the agent may we might say that it's you know it is inventing categories and things and learning about the world first of all it is of course constrained by evolution and all of that that's come prior to it it's not like a
completely blank slate but in addition to that there is also supervision through parents and culture and other things that also kind of restrict the space so when I see active inference agents that we're building I think the analogy is very similar we are sort of the evolutionary benefactors
in a way where we can sculpt in certain types of what we want this agent to be and within that space it's still free to explore and you know choose its own goals and acquire preferences and personalities and things like that but it won't be like a completely open space and I think it's
going to be hard anyway to design agents that are completely free and open you can probably get all kinds of aberrant interesting behaviors that you may not actually want that would just emerge out of it and the active inference researcher Noor Sajid has done some work in this area as well
on learning in active inference what we call prior preferences what the agent prefers to do and that would be probabilistically what it expects to be doing you can get some very strange behavior depending on how you determine that system so I think there needs to be some guidance from us as creators of
these systems in order to even have a successful agent in the first place and I think that's analogous to how evolution and you know growing up with parents and society and things like that also sculpt us in a similar kind of fashion when you start talking about these higher level more you
know abstract things that influence us so you know society and culture and just the general rules that we follow I think they don't constrain us physically per se but we learn about what is you know permissible and what's expected so we learn what to expect about the world and we sculpt
our world in that way to make it make sense some of these are maybe cultural rules that are just part of you know this is the expectation of how you behave in this society or this family or this culture and I've actually been traveling to Europe and I always look up you know before I go to
a new country what is expected of me how do I behave right because you can get interactions that you wouldn't even expect things that would seem aberrant to someone else because their norm is in a different direction than yours but then to you is totally normal and so there's the cultural
aspect as well but there's also even just like traffic laws for example like learning a new city you know how do you behave in the city and then for the people who live in the city all those items are all predictable things like a stop sign you know it makes the world
make sense to you and makes everything much more efficient because you don't have to think about them there are just these assumed priors that are just part of your your generative model and so I see them as those sorts of ideas don't constrain us physically at least not directly
but they're still in some sense a physical constraint in a more abstract way because they become part of your narrative about how you see the world that can be very unconscious and you don't even think about how you behave you just know you know your brain wants to be
efficient you know this is how you unfold these actions you don't have to think and calculate you know every single piece in detail about certain things but that in itself is a kind of physical constraint because it would stop you from doing certain kinds of activities not directly
physical when we talk about the lower level but in some sense still physically stopping you from doing something unless you make the choice to well I'm going to violate this rule that's expected of me in this society or culture or something like that yeah I'm really interested in this
kind of hierarchy of phenomena right so you know obviously we have the physical world you can take certain lenses like the selfish gene right so you can think of genes and memetics and you know you can go all the way up to culture and language and so on and as you say they're
actually entangled they're sharing information with each other and we seem to somehow model all of this right so you know we know how to behave physically and verbally and culturally when we're embedded in different societies I mean how would you build an active inference agent to know how to
behave well in Italy yeah so I think that's a that's an interesting really interesting question because I think the first thing you would have to do is you know the actual things you would have to to teach the agent you know you could in some sense I want to say the word program the agent
and just say like here are the things you should do but if you have an active inference agent the way that I would design it is the same way that you teach a child you know Italians who grow up Italian infants when they are in the world they come in with some preexisting knowledge
of course about you know space and time and some you know basic abstract principles about how the world works but all the cultural layer that happens to them you know between the ages of 0 and 5 probably even earlier than that you know an infant by the age of actually
just the age of like two I was just thinking about this I was just at a cousin's wedding and one of my cousin's children is you know she's maybe a year and a half and she met one of her friends you know playmates and they put the two of them together and they both kind of very tentatively
kissed each other's cheeks and this was they live in Switzerland and so these are cultural norms they've just been watching their parents they know okay this is what's expected I've seen my parents do that when I meet another person my peer and you know we're
interacting this is how I'm supposed to behave and so I think the same thing would be true of a an artificial an active inference agent and even robots that don't have you know priors to begin with um you look at how infants learn behaviors and you can just teach the active inference agent those
things so it's sort of feedback um it will grow and learn there is this huge space of possible actions and behaviors I could do but I'm learning that according to the culture Italian culture in your example here these behaviors are more permissible than others and this is how I'm expected
to behave as uh to be a bit facetious an Italian active inference agent these are some of the things that are expected of me within the context of my culture good old imitation learning yes that's quite interesting philosophically though because that is kind of pointing to behaviorism I mean a
lot of people say that even reinforcement learning is a kind of modern incarnation of behaviorism and it's really interesting from a cognitive science perspective because you know is is the only thing that exists the behavior right so you know that the children kiss on the cheek and then
and then something happens inside the brain right or inside our minds how does that process happen so by purely imitating the behavior physically does that trigger some commensurate representation or thinking inside our minds I would say so so I mean you know we take the language
of active inference which is probabilistic so there are certain things that can happen you know a nearly infinite state space but some things are more probable than others and so the things that are more probable are what we expect so you know I can imagine that that infant's model through
imitation is being refined over time as it interacts and it's told you know here's how you're supposed to interact the interaction of um kissing as a greeting is becoming a higher and higher probability or expectation under that infant agent's generative model and so there is an internal development that's happening where um you know every single time you meet someone that's the expectation of what's supposed to happen um and you know the most uncertain
situation is if you didn't get that kiss from someone perhaps you'd be confused you know and then you may learn there's actually more nuanced things there's other cultures that don't do it or you know there are other situations where it's less permissible there
are certain you know subsets of different types of situations where certain behaviors apply and then you get to learn the nuance about how uh culture which is very complicated sometimes the rules apply in one situation but not in another and you know depending on the parties involved so I
think this is all building that generative model and building the probabilistic model of what's the most common expectation of behavior that would all happen internally but then also through feedback receiving or not receiving something after executing a behavior will teach that agent
all that fine-graining of the nuances of the different situations in which that behavior applies yeah ever since I discovered externalism it's something that's uh preoccupied my mind quite a lot and I guess you could take a radical view and you could actually argue that the
behavior itself is the locus of the cognition and that the the brain is just an impulse response machine and you know it's almost like we need to look outside the brain to actually understand cognition but certainly there must be some symmetry so if if the behavior is present certainly in
the presence of sophisticated brains then there must be some representational resonance or transfer but it's about what came first yes so I definitely think that uh you know that is kind of the key insight of active inference which is also present in you know fields like cybernetics
um being able to become a model of that system outside you so it's always you know relative to what you experienced and what's part of your cultural or ecological niche where you're you know adapted but those things I think um in some way it's just more efficient to have
that model because you don't have to think about and plan your actions in a really complex way if you already know what's permissible you just have to look at situations that deviate from a norm um and then try to understand those deviations why you know you maybe feel a certain way
and then understanding those deviations of uncertainty from that expectation helps you uh better adapt to new situations in the world yeah because you know people talk about biomimetics and you know mimetic simply means copying and isn't it interesting that
even in the absence of sophisticated brains you know like even in plant ecosystems and stuff like that information is being copied all the time right you know not only genetic information but also simplistic behaviors and you know growing towards the sun and stuff like that isn't that
fascinating you mean the aspect of convergence that different types of creatures in different environments can converge upon similar types of behaviors or mechanisms is that what we're referring to here yeah yeah so at different levels of agency you see you know
different types of information sharing so at the very lowest level it's simply just gene mimetics yes so I think um that speaks to just how effective that strategy is um you know we are isolated agents of limited information um but we are able to take in so much from our
experiences and our memories um and so being able to interact with others who have experienced different things um I suspect that that's why you know social cultural animals that have an aspect of social um behaviors are very successful of course there are many many
very successfully adapted creatures on the planet but many of them that are social tend to develop these higher order or higher level types of behaviors because they can do this type of information sharing so can you disambiguate active inference the free energy principle and Bayesian mechanics
um so I think this is in general a tricky task just because um you know the way the field has evolved um so the perspective that I like to take on it which I think um makes the most sense especially from an education perspective knowing that sometimes these lines can be a little bit blurred um
is to look at it top down starting with the free energy principle so the free energy principle itself is a description of systems um which are dynamical systems of some kind um that are uh behaving in accordance with the principle of minimizing
variational free energy which is a statistical quantity um that agents can calculate and um in general the variational free energy is a bound it's a proxy for something else called surprisal um which we can think of you know very roughly speaking as a kind of uncertainty so
minimization of uncertainty which helps to restrict agents to a specific set of states that are conducive to their survival states they'll frequently revisit uh among all states they could visit in a state space so some of these may be you know for us as complex agents maybe cultural
layers of it there may be evolutionary and biological layers um that are more specific to survival but then as you get higher and higher they also encompass a lot of other kinds of behaviors that seem less directly linked to survival per se um that build upon this and when you have this basic
framework um of how agents can behave and you have this statistical quantity you can calculate you have a way of describing systems um that persist over time and are self organized and there are different mathematical languages in which you can describe them so for example you have
active inference which is the application of this free energy principle to the brain and animal behavior um and then you have Bayesian mechanics which is a much broader take on the free energy principle as applied to dynamical systems theory so studying the properties of systems which contain
interacting variables that change over time and how those systems stay different from one another from the perspective of statistics so how they retain their properties so that's the demarcation level as I see it I see the free energy principle as an
underlying principle applied in two different domains so how has this evolved over the years because I'm aware there have been different formulations of the free energy principle and of course active inference came into play at at at some point how did this all kind of evolve over time
so um if you look at the sort of history of these fields um they have gone through many different phases or changes uh that's one perspective that I I try to emphasize in the book is um explaining the most modern incarnation of these ideas but also looking back historically because
that's really important for understanding how we got to where we are now and the logical threads that led us to this this place so the way that I see it is the very early papers um you know we're looking around 2000 early 2000s um we're looking at a way of taking a lot of the work that was done
in the 90s in unsupervised learning and latent variable models a lot of this work was done by uh Geoff Hinton, Zoubin Ghahramani, uh David MacKay and many other researchers and bringing that into the brain and looking at cortical function and cortical architecture and seeing how um there could be
a mathematical description of the type of operations the brain is doing that was well modeled by unsupervised learning that led into uh a perspective of bringing the idea of action into this uh framework and in the early papers around 2006 active inference
and um the free energy principle were kind of described together where the free energy principle is much more of a descriptive term saying you know agents want to avoid damage to their physical structure so being crushed or you know undergoing what they call a phase transition
like you want to stay within a certain range of things that are permissible for your body's physical integrity and so what actions do you take uh and what do you need to do as an agent to prevent uh disintegration or you know just becoming part of the external environment you're
no longer different from the atoms and the rest of the atoms in the world around you um and then as the field evolved those two terms sort of branched out so around 2010 you start getting more of what we call active inference in continuous time and you get a bit more of what is now called the free
energy principle but then as those fields evolved further um around 2015 you start getting more of the discrete uh formulation of active inference which we can talk about the distinction later between continuous and discrete here um and then we also have the free energy principle becoming
kind of this overarching philosophical and mathematical basis for active inference that later evolves into what's now called Bayesian mechanics around 2018 and so the way I see it from here we now have three different strands that are kind of uh going out from this point onward that all interact with one another but have slightly different uh sets of papers and formalisms associated with them but are all unified under the same umbrella of this theme of free
energy minimization so what is active inference? Active inference then is uh the application of the free energy principle to agents that perform perception and action so these can be biological agents that have brains human brains animal behavior um and this view is in literal name actively inferring so everything under active inference is inferential meaning that there is an unobserved or unknown quantity out there what is the state of the world?
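For reference, the standard textbook form of the quantities being described here, with observations $o$, hidden states $s$, a generative model $p(o, s)$ and an approximate posterior $q(s)$ (this notation is assumed for illustration, not taken from the conversation):

$$
\begin{aligned}
F[q] &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
     &= -\ln p(o) + D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] \;\ge\; -\ln p(o),
\end{aligned}
$$

so variational free energy upper-bounds surprisal $-\ln p(o)$, and the perceptual dynamics described just below can be written as a gradient flow on it, e.g. $\dot{\mu} = -\partial F / \partial \mu$ for internal states $\mu$ parameterizing $q$.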
and so that has to be inferred indirectly from some kind of sensation that these agents receive and that inference process is perception and actually just to say this now the inference process uses the free energy principle in that it minimizes variational free energy
in that calculation so that statistical quantity is the basis in machine learning um talk it would be an objective function that we try to minimize and it describes the behavior of the uh neurons or representation units in that agent coming to a
steady state so the behavior of those neurons as they interact follows a gradient flow on variational free energy so it describes what that system is doing its behavior and that process of perception um which inherently relies upon free energy minimization is taken to the next level with active
inference to suggest that action itself is also a form of inference that also requires minimization of variational free energy so i'm saying this specifically to underscore the idea that this property of free energy minimization inherent in the free energy principle is a component of both
action and perception and these are kind of unified together they're not under a separation principle separate modules in the brain where you do one and the other they're kind of linked together and they're linked as one type of thing that the brain does where action is a prediction
or an expectation um about the kinds of sensory information the agent expects to receive and it controls its environment to bring about that state of affairs so when you link these together you get essentially what we call uh active inference um which is all written in this language
uh of probabilistic models where the optimal action or the optimal perception is described as a minimization of variational free energy it might be worth just meditating on this for a second so in machine learning we just have the perception bit and it's also relevant that we just
usually train once so now we have perception and action linked together in what I guess Friston might call a cybernetic loop and the iterative component of information sharing is really important I think computationally that's the reason why we can um encourage the emergence of
sophisticated behaviors because we have this iterative inference and information sharing with the environment and that seems to set it apart yeah so you mean iterative in the sense of like online like you're continuously learning um as information is
streaming in and that's changing the model yeah that's right so it's a similar thing to a cellular automaton that's also iterative so at every single time step we are using the state from the previous time step and then we're performing a computation and
we're iterating and iterating and it's essentially an unbounded computation and that's a very very different type of computation to machine learning where you just compute a bunch of weights once and then that's it that's the end yes so that's a really good point one thing I
didn't mention in my description a moment ago was the temporal nature of this that's just inherent in these models it's always assumed that the environment is dynamic which means it's always changing and this speaks to the idea of um
inferential learning um you can constantly keep updating these models as new information comes in if you're perceiving something different that changes um what actions you're going to take so this level of control is really intimately linked to what you're perceiving
because if the state of affairs changes outside in the world you're in a new location something is new you may have to perform new behaviors in order to reach back to this free energy steady state and so this is a kind of iterative process as you say it takes place over time um and the dynamics
is really important in these models and it also kind of speaks back to some of the other topics we were discussing in that in a very volatile environment um you know it may be hard to find that minimum but if you build in a lot of predictability into your environment things that
are very stable um things that are very predictable because you have cultural rules or you know rules about how the world operates uh it makes it much more efficient and easy to find that minimum because there are fewer things to consider uh as options in there yeah we don't need to meditate
on this for too long but of course in the ML world they do have reinforcement learning and that also has this kind of iterative property what's the rough difference between the two approaches so I would say there are a number of different um differences in
there in that for example um in reinforcement learning um the state action mapping is your policy what you're attempting to determine what should the agent do in terms of uh what actions will bring it into certain states versus the active inference approach
um is looking to compute as an inference problem uh sequences or trajectories of actions you can think of them as paths if you want to use kind of a physics metaphor of um actions being these kind of motion paths over time um then you can think of them unfolding in time
and picking the right path to reach some sort of end goal state um and so that's what's called a policy though sometimes it's better to call it a plan so that we can disambiguate from confusion with reinforcement learning um it's a plan of action sequences to execute at a given moment in
time based on the information and knowledge the agent has in that present moment um combined with its preference for what it wants I'm using that anthropomorphically but you can say what it expects to receive in the future and so a component that I think really separates active inference and
reinforcement learning is the appearance of this uh epistemic value term that appears in what's known as expected free energy which uh roughly speaking is a function that you minimize uh for selecting policies in the future so for information you
don't have yet because variational free energy is used for current and past information expected free energy is minimized for future planning uh picking out those plans or action sequences and what's really important about active inference agents is they have the ability to forage
for new information and look around for interesting and new information in their environment um and reduce the ambiguity um they have about their environment in order to increase the confidence they can actually reach that goal state so there's a natural uh level of curiosity and exploration
inherent in active inference models that is part of the way that an active inference agent functions yeah I mean certainly in reinforcement learning you can do exploration but I can appreciate it's far more principled if you do it in a Bayesian way
because you actually know where where there are areas of uncertainty uh when i spoke with Jeff Beck about this he said that admittedly it was kind of similar to an in like a maximum entropy inverse reinforcement learning agent but surely it must be more principled in other ways though
it's a really good way of defining you know states and agents and being able to design much more structure into the system rather than the inscrutability of reinforcement learning yes I think the key way that active inference looks at the problem is purely inferential
so it's sort of the planning is inference statement um all of the behavior that you see in an active inference agent does not require um you know ad hoc temperature parameters or other kinds of things that you do to tweak the system um the behavior just emerges out of the expected free energy
equation itself and so you have uh this idea of curious and exploratory behavior as a kind of inference in itself um where you have this long term uh you know incurring of uncertainty or exploration for the purpose of on the long term average minimizing variational free energy uh for
future states so this sort of perspective on um inference and the way that it's encoded in the model is different from the way that it's performed in reinforcement learning Sanjeev thank you so much for coming on I really appreciate it absolutely thank you so much for having me