AI seems like it burst out of the gate a few years ago, but is it actually the latest chapter in a three-hundred-year trajectory to turn thought into math?
Can the mind be captured with equations?
Why do current AI models need petabytes of data when a child can learn from just a few examples? Why does AI have jagged intelligence, meaning it looks brilliant one moment and then does something totally nonsensical? In physics, we have various laws, like the law of gravity or the laws of motion, and today we're joined by cognitive scientist Tom Griffiths from Princeton to talk about whether we are moving towards nailing down laws of thought. Welcome to Inner Cosmos with me, David Eagleman. I'm a neuroscientist and author at Stanford, and in these episodes we sail deeply into our three-pound universe to understand why and how our lives look the way they do.
One thing that distinguishes Homo sapiens from all our cousins in the animal kingdom is that we watch the world around us and try to abstract patterns from it. For example, you might watch the way a stone falls to the ground, and maybe you see a tree branch fall, and maybe you see a glacier and one day a huge wall of ice falls off it, and pretty soon you start seeing an underlying similarity in the way that things move. Eventually someone very, very smart comes along, like Isaac Newton, and summarizes all this in the law of gravity. Then the same smart guy, Newton, comes up with the three laws of motion. Then another smart person, Einstein, figures out the equivalence of mass and energy, which seems to be another ironclad law, and then we have the laws of thermodynamics and the laws of electrostatics, and all of this speaks to the great success that we've had as a species in figuring out the lowest level of code that's running in the universe. But for most of human history, the concept of a thought has felt like the most intimate thing we experience and the least tractable thing to study.
What a thought is and how it occurs seems to live in a different category of mystery from how an object falls. Why? Well, it's because a thought pops into your head and somehow carries memory and expectation and language and often a feeling. But it feels vaporous and private. It feels like the one thing that will forever escape formal description. What's interesting, though, is that for centuries people have tried. There's always been a deep human urge to ask whether thought has laws.
In other words, does the mind have principles that you can write down? Does reasoning have a grammar? Can you describe intelligence in a language that's precise enough that once you understand the rules, you can begin to build with them, for example to build artificial intelligence? Most of us are old enough to remember when this question of AI lived in philosophy seminars and math departments, but now it sits at the center of our economy.
Okay, so what is thought?
Can we capture it in formal systems like laws or equations? Do different parts of intelligence come from logic, from learning, from uncertainty, from memory, from prior knowledge, from living inside bodies, from living inside our cultures, from the particular constraints of being a human animal with a short lifespan and limited bandwidth?
Our guest today is someone who lives right at the intersection of all these questions. Tom Griffiths is a professor at Princeton, where he directs the Computational Cognitive Science Lab and the Princeton Laboratory for Artificial Intelligence. He has spent years asking how minds work through the different lenses of math, computation, and learning. And he's the author of a wonderful new book called The Laws of Thought, which traces the long history of thinkers asking: are there rules to this? Can we understand what human thinking is? In his book we get the lengthy arc of minds trying to understand mind. This begins millennia ago with Aristotle's logic and the question of whether logic itself could be mathematized, and Tom follows the trail through the architects of symbolic reasoning, through the birth of computation, through the rise of neural networks, through the realization that probability theory might serve as a language for our beliefs about things. Along the way, a picture emerges that there may not be a single tool for capturing the mind, but instead different ways of tackling the problem, each of which sheds light on a different aspect of cognition. So we're going to talk about ourselves, human minds, and we'll talk about AI: what kind of intelligence is this, and what is missing? Here's my interview with Tom Griffiths.
As soon as you turn thought into math, it becomes something that machines would be able to do. And so our modern AI systems are really a consequence of, you know, that thought that people were having hundreds of years ago, of being able to turn thought into something that can be expressed in mathematical terms.
And so one of the things that I loved about your book, by the way, is that you really tell stories of all the thinkers.
You dive into the lives, you tell them with real color.
If you were going to start with the one thinker you consider most important, who would that be?
There are a couple of people who have an enduring influence throughout the book. One of them is Leibniz, who in some sense started this enterprise. He was really trying to take the idea of logic as expressed by Aristotle and turn it into math, but ultimately failed. Along the way, though, he also discovered the calculus, which turned out to be really important when people wanted to make neural networks that could learn from data. It turns out that the trick for doing that is one that Leibniz had figured out all that time ago. Another key figure here, as the title of the book might suggest, is George Boole, a nineteenth-century mathematician. He was a schoolteacher for most of his life and did a lot of serious math on the side, and he had a big effect on the history of mathematics. He was really the person who first solved the problem that Leibniz had posed. And in addition to the impact of that work, he's also the great-great-grandfather of Geoff Hinton, one of the people who played an important role in developing the algorithms that let neural networks learn. So you could make an argument that without Boole we would be a fair way back from where we are today.
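Griffiths doesn't name the trick here. Our reading, which is an assumption, is that it is the chain rule of calculus, usually credited to Leibniz, which backpropagation applies layer by layer to work out how each weight affects a network's output. A minimal sketch with a hypothetical two-step computation (`forward`, `grad_w1_chain_rule`, and `grad_w1_numeric` are illustrative names), checked against a finite-difference estimate:

```python
import math

# Hypothetical two-step computation, like two layers: h = tanh(w1 * x), y = w2 * h.
def forward(w1, w2, x):
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def grad_w1_chain_rule(w1, w2, x):
    """dy/dw1 = (dy/dh) * (dh/dw1): the chain rule."""
    h, _ = forward(w1, w2, x)
    dy_dh = w2
    dh_dw1 = (1 - h ** 2) * x  # derivative of tanh(u) is 1 - tanh(u)^2
    return dy_dh * dh_dw1

def grad_w1_numeric(w1, w2, x, eps=1e-6):
    """Central finite difference, used only as an independent check."""
    _, y_plus = forward(w1 + eps, w2, x)
    _, y_minus = forward(w1 - eps, w2, x)
    return (y_plus - y_minus) / (2 * eps)

print(grad_w1_chain_rule(0.5, 2.0, 1.5), grad_w1_numeric(0.5, 2.0, 1.5))
```

The two numbers agree to many decimal places; the chain-rule version is what scales, because it reuses intermediate values instead of re-running the network for every weight.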
You know, interestingly, when most people think about Boole, they only know about Boolean logic. They know about zeros and ones, binary numbers, and that's essentially the extent of it. But he was quite celebrated in his lifetime, even though he was a headmaster and not formally a professor. Am I correct about this? He was nonetheless quite recognized as a mathematician.
Yeah, he became a university professor later in his life, but spent most of his life as a teacher and a headmaster. He won a gold medal for mathematics from the Royal Society, which was a very prestigious award, and he was this amazing person who was having high-level correspondence with the leading mathematicians of the day while holding down his job running a small school.
Yeah.
Now, in the book, you essentially use three different frameworks. What phenomenon does each framework explain unusually well?
Well, the three frameworks I talk about in the book are, first, what I call rules and symbols, which is what we've been talking about. This is the approach that stems from logic, where the idea is that you can write down some rules that characterize the structure of thought, and by following those rules you end up with interesting consequences. The second approach is networks, features, and spaces. This is neural networks, which you can think of as a system for doing computation once you start representing things as points in a space. So if you think about every object that you could see in the world not as something described by rules, but as something described by a location along some dimensions, you need a way of mapping between those spaces, and neural networks solve that problem. The third is probability and statistics. Probability theory is really powerful because it is the complement to logic: logic tells us how to go from things that we know to be true to other things that we're equally certain are true, while probability theory tells us what to do when we're uncertain. So if we get some information and want to draw a conclusion, but we can't draw that conclusion with perfect certainty, probability theory tells us how to do that. It tells us how to combine our background beliefs, our other sources of information, and our biases with the data that we see, in a way that helps explain how it's possible to learn from small amounts of data. And that's still something that distinguishes human learning from the learning done by AI systems today.
Okay, great, so we're going to dive into each of these three lenses. But before we do, do you see the AI conversation today overindexing on one of these lenses over the others?
I think there's a lot of emphasis on neural networks, which are fundamentally the engineering technology making possible our chatbots and the other big AI systems that are deployed. I think that potentially misses the importance of these other threads. One thing that's important to remember is that those neural networks are being trained on what is essentially a system of rules and symbols. They're being trained on human language, which is symbolic and rule-like in various ways, and they're being trained on code, which is even more symbolic and even more rule-like, and those things together provide some of the substrate for developing the kind of intelligence that they demonstrate. And the way that they're trained is by learning to predict the next token, the next word or part of a word, based on what they've seen so far. That way of training them actually uses probability theory. It's a probabilistic problem because you're making a guess about what the next thing is going to be based on the things you've seen, and that's an important ingredient in their success as well: they're essentially learning to approximate a big probability distribution.
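Large language models are vastly more complex, but the probabilistic framing of next-token prediction can be illustrated with a toy bigram model. This is our own minimal sketch, not how chatbots are built: it estimates the probability of the next token given only the previous token, by counting. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Turn the counts for one context into a probability distribution."""
    followers = counts.get(prev)
    if not followers:
        return {}  # this context was never seen in training
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()}

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(next_token_distribution(model, "the"))  # cat: 2/3, mat: 1/3
```

A neural language model replaces the lookup table with a learned function of a long context, but it is still trained to output a distribution like this one.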
So let's dive into the first one, rules and symbols. Take us back to the original urge. Why did early thinkers believe that this could be used to explain thinking?
I think a lot of the draw of rules and symbols was that that really was, in some way, what mathematics was to people. Part of the reason why Leibniz wasn't able to solve this problem of figuring out how to turn thought into math is that what he thought math was, or the kind of math that he was trying to use to solve that problem, was really arithmetic. Arithmetic was the model they had for a mathematical system. So you can think about ideas being added together, or subtracting one idea from another, and really think of the operators you're using as coming from this familiar mathematical language. And so I think part of the reason we end up with that approach is the kind of math that had been successful in other settings, where arithmetic is a good description of certain kinds of things that human minds do.
Boole had the insight that you needed a different kind of algebra in order to describe thought, and that's what leads to modern mathematical logic. But it's still in this kind of symbolic language, although Boole also talked about probability theory as being important for capturing thought as well. So I think it's really more about what kinds of mathematical systems were straightforward to formalize and gave us something that we could try to map thought onto. That's often what we do as scientists: we take mathematical systems that mathematicians have defined for us and say, oh, I think this mathematical system maps onto the thing that I want to understand, and we try to establish that correspondence, which then allows us to derive its consequences.
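To make "a different kind of algebra" concrete, here is a minimal sketch in the modern presentation of Boole's idea, which differs in details from his own notation: truth values are the numbers 0 and 1, AND is multiplication, NOT is 1 − x, and inclusive OR is written x + y − xy. The function names are ours. The law x · x = x, which ordinary arithmetic lacks, is the one Boole singled out.

```python
from itertools import product

# Truth values as the numbers 0 and 1, with logic written as arithmetic.
def AND(x, y):
    return x * y

def NOT(x):
    return 1 - x

def OR(x, y):
    # Inclusive OR; Boole's own "+" was reserved for mutually exclusive classes.
    return x + y - x * y

def check_laws():
    """Exhaustively check a few laws of this algebra over {0, 1}."""
    for x, y in product((0, 1), repeat=2):
        assert AND(x, x) == x                          # x * x = x
        assert NOT(AND(x, y)) == OR(NOT(x), NOT(y))    # De Morgan
        assert NOT(OR(x, y)) == AND(NOT(x), NOT(y))    # De Morgan
        assert AND(x, NOT(x)) == 0                     # x(1 - x) = 0, contradiction
    return True

print(check_laws())  # True
```

Because the domain is just {0, 1}, checking every combination is a complete proof of these identities for this algebra.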
So speaking of rules and symbols, thinkers like Newell and Simon popularized this idea of goals and subgoals. What did that viewpoint get exactly right about human problem solving?
So now we're fast-forwarding a bit. We have Boole figuring out the structure of logic, and lots of people then turn that into a mature theory of logic. You get Alan Turing turning this into a theory of computation, thinking about what an abstract mathematician is doing when they're doing something like logic, and thinking about how you can make a machine do that. And then, as digital computers are being developed, people start to realize that maybe those provide a good model for how thinking works in general, and they try to use the computer as a foundation for thinking about things like how people might solve problems. Allen Newell and Herbert Simon were influential cognitive scientists who did exactly that. They had the idea that maybe you could make computers smarter by using insights from human cognition, but also get a better understanding of what humans are doing when they solve problems by using ideas that come from things like computer programming. So when we're trying to solve a problem, prove a mathematical theorem, or play a game of chess, they set this up as a problem of searching through a tree of possibilities, where you're making choices, each of those choices gives you a new set of choices, and so on, and the hard thing is finding a path through those choices that leads to the point where you want to end up. That's something where you can take inspiration from how human mathematicians solve problems, from tricks like working backwards from the end towards the start. Those were principles that they were able to use both to explain these aspects of human cognition and to make the machines work better.
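Here is a minimal sketch of problem solving as search through a tree of choices, including working backwards from the goal. It uses plain breadth-first search on a hypothetical puzzle (reach 11 from 1 using "double" and "add 3"); it is not Newell and Simon's General Problem Solver, which relied on heuristics such as means-ends analysis. `breadth_first_path`, `forward_moves`, and `backward_moves` are illustrative names.

```python
from collections import deque

def breadth_first_path(start, goal, successors, max_depth=20):
    """Search a tree of choices level by level; return the first path found."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        if len(path) > max_depth:
            continue
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

# Hypothetical puzzle: from 1, reach 11 using the moves "double" and "add 3".
def forward_moves(n):
    return [n * 2, n + 3]

# Working backwards: undo a move. Halving only applies to even numbers,
# and we stay within positive integers.
def backward_moves(n):
    preds = []
    if n % 2 == 0:
        preds.append(n // 2)
    if n - 3 >= 1:
        preds.append(n - 3)
    return preds

forward = breadth_first_path(1, 11, forward_moves)
backward = breadth_first_path(11, 1, backward_moves)
print(forward)                   # [1, 4, 8, 11]
print(list(reversed(backward)))  # [1, 4, 8, 11]
```

In this particular puzzle, searching backwards explores a narrower tree, because an odd number has only one possible predecessor.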
Okay, but then one of the things that happened is that at least one of these attempts ballooned into twenty-five million rules. So what does that teach us about the shape of human intelligence?
This rules-and-symbols enterprise, right. The appeal it had was that maybe one day you could just write down all of the rules that you need, and then you'd have characterized how intelligence works. It's just a matter of getting enough rules, in a way that's very reminiscent of today, where our modern AI systems are made by training them on more and more data, feeding in more and more language. There was a hope that you could just document all of the rules that you need to capture the structure of human knowledge. That led to companies being started to engage in that enterprise, ultimately, I would say, unsuccessfully, but giving us some characterization of particular subsets of human knowledge. So I think what came out of that enterprise was the revelation that maybe you need something more than just rules, that maybe thinking about logic as the basis for our model of intelligence was missing something. It's an approach that worked really well for certain kinds of problems, like doing arithmetic or playing games like chess, but it didn't work very well for other kinds of problems, like figuring out what you're seeing in the world, actually learning language, or these other kinds of things.
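To make the "write down all the rules" idea concrete, here is a minimal forward-chaining sketch with invented facts and rules. It is our own illustration; real knowledge bases of this kind used far richer representations. `forward_chain` and the rule names are hypothetical.

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new facts can be derived.

    Each rule is (premises, conclusion): if every premise is a known fact,
    the conclusion is added to the set of known facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# Hypothetical hand-written rules of the sort a knowledge engineer might enter.
rules = [
    (("is_bird",), "has_feathers"),
    (("is_bird", "is_healthy"), "can_fly"),
    (("is_penguin",), "is_bird"),
]
print(sorted(forward_chain({"is_penguin", "is_healthy"}, rules)))
# ['can_fly', 'has_feathers', 'is_bird', 'is_healthy', 'is_penguin']
```

Notice that the sketch concludes a healthy penguin can fly. Fixing that means writing yet another exception by hand, which hints at why rule bases like this keep growing.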
And so this is what leads to your second lens, which is neural networks. You talk about these as having a boom-and-bust history. So, first, what happened in the last decade that allowed them to become the dominant paradigm?
The big breakthrough in the last decade was really about being able to make bigger neural networks that could be trained on more data in a way that could scale. So what does bigger mean here? An artificial neural network is a set of units that communicate with one another along weighted connections. Imagine how neurons are connected in your brain: those neurons are connected to one another and send each other signals. An artificial neural network basically simulates that kind of structure inside a computer. For a long time, the history of neural networks has been one of people figuring out how to make bigger neural networks work. The very first learning neural networks had a learning algorithm that worked for one layer of weights. Then there was a breakthrough in the nineteen eighties that gave us a learning algorithm that could work for multiple layers of weights, but it didn't work for very deep neural networks with lots of layers of weights. I could explain the technical reasons behind it, but basically the algorithm didn't quite work. So the big breakthroughs of the last ten to fifteen years have been about coming up with ways to take those algorithms and actually make them work for neural networks that are bigger and deeper, that are able to learn more complex functions, and that can do so from massive amounts of data, which means they're able to discover the complex relationships between things that are necessary to produce intelligent behavior.
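The "learning algorithm that worked for one layer of weights" is usually identified with Rosenblatt's perceptron; that identification is our gloss, since the conversation doesn't name it. A minimal sketch that learns logical AND from four examples (`train_perceptron`, `predict`, and `AND_DATA` are illustrative names):

```python
def step(z):
    return 1 if z > 0 else 0

def train_perceptron(examples, epochs=20, lr=1.0):
    """Perceptron rule: nudge each weight by lr * (target - output) * input."""
    n = len(examples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in examples:
            output = step(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - output
            if error != 0:
                errors += 1
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
                bias += lr * error
        if errors == 0:  # every example classified correctly: stop early
            break
    return weights, bias

def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_DATA)
print([predict(w, b, x) for x, _ in AND_DATA])  # [0, 0, 0, 1]
```

A single layer of weights like this cannot represent XOR, which is one reason a learning algorithm for multiple layers mattered so much.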
And so, what do these neural networks capture about cognition that symbols missed, especially in terms of things like similarity, fuzziness, and graded concepts?
Fuzziness is a really good way of describing it. If you ask somebody whether something is a piece of furniture, and you show them a chair, they'll say yes, definitely a piece of furniture. If you show them a rug, they'll say, yeah, maybe a piece of furniture. It doesn't quite fit. We have a prototypical idea of what furniture is, which contains things like chairs and tables and ottomans, and then there are rugs and treadmills, things that maybe are in this category, but maybe aren't. And so we need a way of thinking about concepts that's not just the yes or no, true or false, one or zero that logic would give us. We need something that has that fuzziness in it.
One way of getting fuzziness is by thinking about a concept in terms of points in space: chairs are in one location, rugs are in another location, and maybe what it is to be a piece of furniture is just to be in some part of that space, and how close you are to that part of the space is how good an example of furniture you are. But as soon as you think in those terms, you have a new problem. With rules and symbols, we knew how to do computation, and we knew how to describe thinking. Thinking was a matter of applying the rules and repeating that process. But we didn't have a way of doing computation in spaces, and that's what neural networks give us. You can think about a space corresponding to the activation of the units inside a neural network: how much input each unit is receiving and how much of a response it's making characterizes some kind of space. And the neural network gives us a way of mapping from the inputs it's getting to some output. So you could put in a picture of a chair, it maps that to some point in space, and then it produces an output that corresponds to, yes, this is a piece of furniture. And because those outputs can now be continuous values, you can capture the fuzziness and the other properties that you want for your concepts.
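Graded membership can be illustrated without a neural network at all. The sketch below is a prototype-style stand-in of our own, not the network Griffiths describes: items are points in a hypothetical three-dimensional feature space (the features and values are invented), the category is the average of its examples, and membership decays continuously with distance from that average.

```python
import math

# Hypothetical feature space (values invented for illustration):
# (raised off the floor, you sit on it or put things on it, stays in one room)
EXAMPLES = {
    "chair":   (0.9, 1.0, 0.9),
    "table":   (1.0, 0.9, 0.9),
    "ottoman": (0.6, 0.9, 0.9),
}

def prototype(points):
    """Average the examples to get a central point for the category."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def membership(item, proto, sensitivity=2.0):
    """Graded membership: 1.0 at the prototype, decaying with distance."""
    return math.exp(-sensitivity * math.dist(item, proto))

furniture = prototype(list(EXAMPLES.values()))
for name, features in [("chair", (0.9, 1.0, 0.9)), ("rug", (0.0, 0.5, 0.9))]:
    print(name, round(membership(features, furniture), 2))
```

The output is a number between 0 and 1 rather than a yes or no, so the chair scores high and the rug lands somewhere in between, which is the fuzziness the passage describes.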
And so, in what sense are these modern systems, these artificial neural networks, learning, and in what sense are they doing something that's maybe categorically different from how children learn?
This is a fundamental question. It's the kind of thing that we cognitive scientists think about a lot, and I think AI researchers are starting to care about it a lot too: what are the meaningful differences between human minds, human brains, and what we're building in these AI systems, these artificial brains? I think one very salient difference is the amount of data needed for a human to learn language compared to the amount of data you need to put into a neural network. If you take a system like ChatGPT, one of these chatbots, those systems are trained on the equivalent of something like five thousand to fifty thousand years of continuous speech. Massive amounts of data are going into that system, on the order of a thousand or ten thousand times as much data as a human child might get in order to learn language. And the reason is that those artificial neural networks are really undifferentiated learning machines. You can take the same kind of neural network and get it to learn all sorts of different things. It works really well for learning language, but you can use it to learn something about vision; you can take all sorts of problems, give them to it, and it can learn how to do them. As a consequence, they have what we call in cognitive science and machine learning weak inductive biases. They're not biased towards any particular solution to the learning problem, whereas human brains have stronger inductive biases for things like learning language. We're disposed towards certain kinds of things, which are human languages. The things that we call human languages are the things that we're disposed to learn. As a consequence, we're able to narrow down the space of possibilities in a way that means we're able to learn from less data.
Okay, this makes a great segue to the third lens. So you talked about rules and symbols, and you talked about artificial neural networks. The third part of your book is about probability and statistics. So why did probability become an attractive candidate language for thinking about cognition?
Probability theory is a good way of answering certain kinds of why questions that we might have, and the reason is that it's a way of characterizing how a rational agent should make an inference. All the way back in the eighteenth century, the British nonconformist minister the Reverend Thomas Bayes had the radical idea that you could take a mathematical system, probability theory, which we would use for describing what happens when you roll dice or flip coins, the language of gambling, and say, oh, in fact, that mathematical system might also be a really good system for describing how beliefs work. What he was interested in was thinking about a belief as a degree of belief. You can say, oh, I think it's going to rain tomorrow, and I'll put that on a scale from zero to one, where zero is, it's not going to rain, and one is one hundred percent it's going to rain tomorrow. That is a belief that you've expressed in the form of a probability. Now if you wake up in the morning, look out the window, and see gray storm clouds, you've got a new piece of data. You need to revise your beliefs, and probability theory actually tells you how to do that. For each hypothesis, and our hypotheses here are it's going to rain or it's not going to rain, you modify your belief based on how likely the data that you saw would be if that hypothesis were true. Because gray storm clouds are more likely if it's going to rain that day, we should increase our belief that it's going to rain, and as a consequence we'll end up with a number that's a little bit higher than the number we had before. Probability theory tells us how to do that.
There's a principle of probability theory called Bayes' rule, after the Reverend Thomas Bayes, that tells you how to take your original beliefs and turn them into the beliefs that you have after seeing data. And that turns out to be exactly the tool you need to answer these kinds of questions about how inductive biases work. How is it that children are able to learn from less data than neural networks? Well, it's a consequence of the different probabilities being assigned to different hypotheses, where those hypotheses correspond to the structure of the languages being learned.
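The rain example can be worked through with Bayes' rule, P(h | data) = P(data | h) P(h) / P(data). In the minimal sketch below, the probabilities are hypothetical numbers chosen for illustration, and `bayes_update` is our own name.

```python
def bayes_update(prior, likelihoods):
    """Bayes' rule: posterior(h) is proportional to P(data | h) * prior(h)."""
    unnormalized = {h: likelihoods[h] * p for h, p in prior.items()}
    total = sum(unnormalized.values())  # P(data), the normalizing constant
    return {h: u / total for h, u in unnormalized.items()}

# Hypothetical numbers: last night's belief, and how likely gray storm
# clouds in the morning are under each hypothesis.
prior = {"rain": 0.3, "no rain": 0.7}
p_clouds_given = {"rain": 0.6, "no rain": 0.4}

posterior = bayes_update(prior, p_clouds_given)
print(posterior)  # rain rises from 0.30 to about 0.39
```

Because clouds are only somewhat more likely under "rain," the belief moves up a little rather than jumping to certainty, just as the passage describes.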
And when people call humans irrational, what changes if we look at mistakes as resource-limited inferences?
This is one of the things that I explore in my own research: the question of how we should actually think about rationality for real agents. This is relevant if you're building an AI system or if you're just trying to understand human behavior. Like I said, probability theory gives us a characterization of what you should do as an ideal rational agent, but that assumes you have infinite computational resources. We humans don't have infinite computational resources, nor do our AI systems. So you can ask what a rational agent should do if they don't have all of the computational resources they might need, and the answer that comes out is that you should follow an algorithm, or a strategy, that makes sense given the resources you have. That's what it means to be rational in those circumstances: doing the best job you can of approximating probabilistic inference given those resource constraints.
And so some of the weird things that people do, and we do lots of weird things, and we don't always follow probability theory, are things that we can understand as us running into those resource limitations and coming up with reasonable strategies for trying to approximate the right answer.
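One common way to formalize "approximating probabilistic inference given resource constraints" is to estimate a posterior from a limited number of samples. This is a simplified sketch of our own, not Griffiths' specific models: rejection sampling on a hypothetical rain-and-clouds problem, where `budget` stands in for how much computation the agent can afford.

```python
import random

PRIOR_RAIN = 0.3                     # hypothetical prior belief in rain
P_CLOUDS = {True: 0.6, False: 0.4}   # hypothetical P(gray clouds | rain?)

def exact_posterior():
    """P(rain | clouds) computed exactly with Bayes' rule."""
    num = P_CLOUDS[True] * PRIOR_RAIN
    return num / (num + P_CLOUDS[False] * (1 - PRIOR_RAIN))

def sampled_posterior(budget, rng):
    """Rejection sampling: imagine `budget` mornings, keep those with clouds."""
    kept = rained = 0
    for _ in range(budget):
        rain = rng.random() < PRIOR_RAIN
        clouds = rng.random() < P_CLOUDS[rain]
        if clouds:
            kept += 1
            rained += rain  # True counts as 1
    # If no simulated morning matched the evidence, fall back on the prior.
    return rained / kept if kept else PRIOR_RAIN

rng = random.Random(0)
for budget in (5, 50, 50_000):
    print(budget, round(sampled_posterior(budget, rng), 3), round(exact_posterior(), 3))
```

With a tiny budget the estimate can land far from the exact answer; with a large one it converges. The design point is that the quality of the approximation is something an agent trades off against the cost of computing it.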
So, if we look at probability as the grammar of uncertainty, one thing we know is that our prior expectations matter. One of the things I've been obsessed with, doing a lot of research on, and talking a lot about on the podcast is the way that all of us drop into the world, and our cultures influence us, and our language, and our moment in time, and our neighborhood, and this leads to people being quite different on the inside. Is this something you think about, how we develop our priors differently based on where we grow up?
Yeah. So "priors" is that Bayesian language for talking about the beliefs you have before you get data, which you then update into what we call posterior probabilities that are informed by those data. And yeah, I spend a lot of my research time thinking about these kinds of questions: what are the prior distributions for humans? How do we acquire good prior distributions for solving different kinds of problems? One thing there is that calling it a prior makes it sound like maybe it's something you're born with, but in fact it just means it's what you believe before you get the data. So when you see that storm cloud in the morning, you had a prior probability from last night, and that prior probability was informed by everything else you know. The priors are all of the biases and knowledge that we bring to bear when we're trying to make an inference. So yeah, I think understanding that is a big part of the project of understanding human cognition.
So let's zoom the camera out. We've talked about the three lenses that you describe in the book. Now, you also point out that we have a lot of constraints, like finite lives, limited compute, and limited bandwidth. How do these constraints sculpt human intelligence?
I think this is really important to just thinking about the moment that we're in where there's a lot of anxiety around AI, right, And I think if you think about intelligence as a kind of one dimensional quantity, you can kind of imagine that you know, humans are somewhere, our AI systems are somewhere.
It seems like they're approaching us.
Maybe they're going to overtake us, and then, oh my god, what is going to happen when that happens, Right, We're just going to become redundant. There's not going to be any jobs. Everything is going to fall apart. And so that's a consequence of having a particular conception of what intelligence is, which is this kind of one dimensional way
of thinking about it. And I think there's a different way of thinking about it which gives us a little more flexibility and maybe a little more hope in the way that we think about what's going to happen with AI, and that is thinking about intelligence as being an adaptation to the kinds of computational problems that a system has either of or being trained to solve, right, And so for human beings, those computational problems are shaped by the
constraints that we operate under. And a lot of those constraints come from our biology, right that we, as you said, have limited lives, have you know, limited compute resources what we can carry around inside our heads, have limited bandwidth for communication. We have to like make noises at each other, or wiggle our fingers or you know, somehow use our bodies to transfer data from one human mind to another
human mind. It's very inefficient. And so those constraints are things that mean that human intelligence takes a particular shape, which is we're able to learn from limited data because we have to because we don't live that long. Right, You can't rely on getting five thousand years of language data or multiple human lifetimes of you know, chess playing or whatever it is. Right, you have to be good at using the resources that you have in ways that
are efficient. And so that's kind of like deciding what to think about being able to recognize when a problem has a structure that you I've seen before being able to you know, sort of like become sort of automatic in using certain kinds of patterns of thinking and strategies for solving problems, really trying to make it as easy as.
Possible for us to use the resources that we have.
And then you need to develop capacities for trying to circumvent those bandwidth constraints in order to be able to do things that transcend what any individual human can do, and that means developing things like language writing societies LLCs, you know, all of the sort of libraries, right institutions, all of the theory of mind, right for reasoning about what someone else might be trying to communicate to you. All of this stuff is actually sort of like human
stuff that's a consequence of these constraints. And so as we make AI systems that are smarter, those AI systems are in turn being shaped by what they're being trained to do and what constraints they operate under. But those constraints are different from the ones that humans have. They can you know, get more data, they can get access to more compute, they don't have bandwidth limitations.
You can just copy the state of an AI system across machines. You can use the same data to train multiple AI systems. All of those things mean that, rather than intelligence sitting on one axis where we think about better and worse, there are many axes along which intelligent systems can develop. We're going to end up in a state where we have human intelligence and we have AIs, and they're going to be meaningfully different from one another rather than directly competing in terms of the capacities that they have.
Yeah, I agree with you on that.
When you think about the way that humans beat machines on data efficiency, what do you think is missing architecturally from our AI systems?
It's a great question, and the way I would express it is not in terms of architecture but in terms of a different part of a neural network. Think about the problem of inductive bias, which is what a system is disposed towards learning. As I said, our neural networks, the way we normally set them up, have pretty weak inductive biases. They can learn all sorts
of things. The inductive bias that a neural network has is constrained by its architecture, but it's also constrained by where it starts out in the space of settings of all of those weights. Normally the default is that you set up your neural network so those weights start out really small, close to zero, and then they grow away from that as it starts
to learn how to do things. We've had success in taking neural networks that are architecturally identical but giving them different initial weights in order to create an inductive bias that enables rapid learning. We use a technique called meta-learning, a method from machine learning where you take the same neural network architecture and the same initial weights and use them to learn to solve lots of different problems, for example learning lots of different languages, each from limited data. Then you optimize the initial weights of the neural network so it can learn all of those languages better, using the same kinds of algorithms we use for training the weights of a neural network when we have giant datasets. Instead of training the weights on one giant dataset, you use those algorithms to train the initial weights on lots of small datasets. When you do that, you end up with a neural network that has an inductive bias that makes it possible to learn from small
amounts of data. So that's the kind of thing we've been exploring in my lab: can we find a way of taking exactly these same neural network architectures and just starting them out in a different place that maybe aligns better with the kinds of things that humans do?
Okay, well, this is a really good segue to what I wanted to ask you. If rules and symbols are one sort of math to describe the mind, artificial neural networks are another kind of math, and probability is another, what does an optimal hybrid look like? Given that no one of these describes everything about what's going on with minds, what does the hybrid for an AI system look like in twenty twenty-six?
The place where I end up in the book is saying that these different kinds of math really do all fit together in an interesting way. To understand that, we can talk about different levels of analysis for making sense of an information-processing system, an idea introduced by the computational neuroscientist David Marr. He said the most abstract level is just thinking about the problem that
the system is solving and its ideal solution. I think logic, symbolic systems, and probability theory give us a good way of characterizing the kinds of problems that minds have to solve: probabilistic
inference, because we have to make uncertain inferences, and logic as a way of characterizing the things in the world that have rich, combinatorial structure, the kind you get from symbols and rules that combine together, as in language and dance and all of these other structured things. Even if you look at trees, you can see they have
recursive structures expressed in them. So these structures occur in nature and are important to be
able to understand. At the level below that, you have how the system solves those problems: what algorithms and what representations it might use. And below that is how it's actually implemented in some kind of physical system. Artificial neural networks give us a story at those levels, where we can think of them as a good general-purpose system for learning to approximate the things
that probabilistic inference tells you to do, and for learning to approximate the structure that's contained within those symbolic systems. So I actually think the story that has emerged out of these advances in AI is a pretty good story for
how we could think about human minds working. The most important thing that's missing is inductive bias: we haven't been able to capture what human inductive biases are like in machines, and so you have these meaningful differences that come out of that.
But it's not a bad place for thinking about how these pieces might fit together to give us an explanation for how minds work.
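A minimal sketch of the layering Griffiths describes, assuming a toy problem invented purely for illustration: a hidden cause that is 0 or 1 with equal prior probability, each producing a noisy reading drawn from a normal distribution with mean -1 or +1. At the computational level, Bayes' rule gives the ideal answer. At the algorithmic level, a single logistic unit trained by gradient descent sees only examples, yet learns to approximate that answer. The names MU0, MU1, PRIOR1, exact_posterior, and train_unit are hypothetical. In this toy case the exact posterior happens to be a logistic curve, so one unit can match it; real inference problems are far harder.

```python
import math
import random

# Hypothetical toy problem: a hidden cause h in {0, 1}, equally likely a priori,
# produces a noisy reading x ~ Normal(mu_h, 1).
MU0, MU1, PRIOR1 = -1.0, 1.0, 0.5

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def exact_posterior(x):
    # Computational level: Bayes' rule gives P(h = 1 | x). For equal-variance
    # Gaussians the log-odds are linear in x, so the posterior is a logistic curve.
    log_odds = (math.log(PRIOR1 / (1 - PRIOR1))
                + (MU1 - MU0) * x - (MU1 ** 2 - MU0 ** 2) / 2)
    return sigmoid(log_odds)

def sample(n, rng):
    # Draw (x, h) pairs from the generative model above.
    data = []
    for _ in range(n):
        h = 1 if rng.random() < PRIOR1 else 0
        data.append((rng.gauss(MU1 if h else MU0, 1.0), h))
    return data

def train_unit(data, steps=500, lr=0.5):
    # Algorithmic level: one logistic "neuron" trained by full-batch gradient
    # descent on cross-entropy. It only sees examples, never the formula above.
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, h in data:
            err = sigmoid(w * x + b) - h
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

w, b = train_unit(sample(2000, random.Random(0)))
for x in (-1.0, 0.0, 1.0):
    print(f"x = {x:+.1f}: exact {exact_posterior(x):.3f}, network {sigmoid(w * x + b):.3f}")
```

With enough samples, the learned weight and bias approach the exact log-odds coefficients (roughly w = 2 and b = 0 for these settings), which is one concrete sense in which a network "learns to approximate what probabilistic inference tells you to do."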
So along these lines, which AI benchmarks feel misleading to you?
And how would you make better benchmarks?
In general, I'm not a huge fan of benchmarks. I think benchmarks are useful as an engineering tool, but as a cognitive scientist I don't just want to know how well something is doing something. I want to know how it's doing that thing and how it might be messing it up. When we design experiments as cognitive scientists, we don't just say, oh, here are one hundred math problems, go do those one hundred
math problems and we'll get a score. We say, let's choose a set of math problems so that the answers people give tell us about the misconceptions they have, in a way that lets us diagnose: oh, this is why this person is thinking this particular thing. So I think there's lots of room for coming up with better ways of evaluating our AI
systems that look more like cognitive science experiments, really targeting an understanding of what's going on rather than just a brute performance score.
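A minimal sketch of the diagnostic approach, in the spirit of classic "buggy arithmetic" studies of subtraction: each candidate misconception is written as a procedure, only problems on which the procedures disagree are kept, and a learner's answers are scored against every procedure rather than against the answer key alone. The two bugs, the problem set, and the learner's responses are hypothetical examples, not taken from Griffiths' work.

```python
def correct(a, b):
    return a - b

def smaller_from_larger(a, b):
    # Hypothetical bug: in every column, subtract the smaller digit from the larger one.
    result, place = 0, 1
    while a > 0 or b > 0:
        da, db = a % 10, b % 10
        result += abs(da - db) * place
        a, b, place = a // 10, b // 10, place * 10
    return result

def forgets_to_decrement(a, b):
    # Hypothetical bug: borrows 10 when needed but never takes 1 from the next column.
    result, place = 0, 1
    while a > 0 or b > 0:
        da, db = a % 10, b % 10
        if da < db:
            da += 10
        result += (da - db) * place
        a, b, place = a // 10, b // 10, place * 10
    return result

MODELS = {
    "correct": correct,
    "smaller_from_larger": smaller_from_larger,
    "forgets_to_decrement": forgets_to_decrement,
}

def diagnostic_items(problems, models):
    # Keep only problems on which the candidate procedures predict different answers.
    return [p for p in problems if len({f(*p) for f in models.values()}) > 1]

def diagnose(responses, models):
    # Count how many observed answers each procedure reproduces; return the best fit.
    scores = {name: sum(f(a, b) == answer for (a, b), answer in responses.items())
              for name, f in models.items()}
    return max(scores, key=scores.get), scores

problems = [(43, 21), (52, 17), (71, 38), (60, 24)]
items = diagnostic_items(problems, MODELS)  # (43, 21) drops out: every procedure gives 22
learner = {(52, 17): 45, (71, 38): 43, (60, 24): 46}  # hypothetical responses
best, scores = diagnose(learner, MODELS)
print(items, best, scores)
```

A conventional benchmark would report this learner as zero out of three; the pattern of wrong answers instead points to one specific misconception.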
Okay, good. You have talked about curiosity as a computational problem. How do you think about what curiosity is, and how might we measure real curiosity in a machine?
What problem is curiosity trying to solve?
Yeah, this is a good question. You can ask it using what we call rational analysis: if you have a system that's solving a problem, what's the problem?
What's the ideal solution?
Okay, so for curiosity, we've argued (and this is work with Rachit Dubey, who is at UCLA) that one way to think about curiosity is that you're trying to find things that are good for increasing your long-run probability of being able to solve problems in the future. You want data for which the derivative of your total knowledge is
high relative to that particular data point. That explanation captures some of what happens in human cognition. In some circumstances, we're interested in the newest thing, something we've never seen before. But in a lot of circumstances, those aren't the things that grab our attention. It's more things that maybe we've seen a few times, and we've just
noticed that they're starting to occur. If something happens once, you just say, okay, that was weird, and you dismiss it. But when something happens a few times and it's unfamiliar to you, you say, okay, maybe I need to figure that out. And something that happens all the time, you're not that curious about. That's just
the thing that happens all the time. You can explain that by thinking about this derivative: if something happens just once, you shouldn't be interested in it, because if it happened only once, it's probably never
going to happen again. If something happens a few times, that's a clue that it's probably going to happen again in the future, and you haven't seen it enough to actually know what's going on, so paying attention to it is good in terms of that derivative of your
future knowledge. And if something happens a lot, it's probably happened enough that you know something about what's going on, and it's not that interesting. So the sweet spot ends up being the things that are happening to you just enough times that you're starting to realize, oh, this might be
a thing that I need to pay attention to.
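A toy sketch of the inverted-U pattern Griffiths describes. The functional forms and parameters are assumptions chosen purely for illustration, not the model from his work with Dubey: after something has been seen n times, the chance it recurs is estimated as n / (n + ALPHA), knowledge about it grows as 1 - exp(-n / TAU), and curiosity is the recurrence probability times the knowledge gained from one more exposure (a discrete derivative of knowledge).

```python
import math

ALPHA = 4.0  # hypothetical: how much evidence it takes to believe an event recurs
TAU = 5.0    # hypothetical: how many exposures it takes to learn about an event

def p_recur(n):
    # Estimated probability that something seen n times will happen again.
    return n / (n + ALPHA)

def knowledge(n):
    # How much you know about something after n exposures (saturates at 1).
    return 1.0 - math.exp(-n / TAU)

def curiosity(n):
    # Expected gain in future knowledge from attending to it one more time:
    # probability it matters again times the discrete derivative of knowledge.
    return p_recur(n) * (knowledge(n + 1) - knowledge(n))

for n in (1, 2, 3, 5, 10, 20):
    print(f"seen {n:>2} times -> curiosity {curiosity(n):.3f}")

peak = max(range(50), key=curiosity)
print("most interesting count:", peak)
```

Under these settings curiosity is low for a one-off (it probably won't recur), peaks for something seen three times, and fades for the familiar (little left to learn). Changing ALPHA or TAU moves the peak but keeps the inverted-U shape.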
If you had to bet on one capability that's going to unlock a broader intelligence, a jump to that, what's your candidate?
I actually think the biggest obstacle at the moment is the generalizability of intelligence rather than any specific capacity.
People in the AI world talk about jagged intelligence: the phenomenon where an AI system does something really smart that impresses you, and then five minutes later does something really dumb on a problem that's right next to it. If it's able to solve that first problem, it seems obvious that it should be able to solve the second problem.
And you're just like, what happened? You know, why did it go wrong there?
That lack of generalizability is also a consequence of inductive biases. The human inductive biases that steer us towards a solution and let us learn from limited amounts of data also constrain the kinds of solutions we find. The solutions we find are the ones that are generalizable, at least to us. They make sense in that if someone's able to do one thing, they'll be able to do the
other thing. The AI systems are approaching these problems in a completely different way, from a different starting point, and then getting tons of data that allows them to approximate the human solutions. But they're coming at it from another angle, and that's the
thing that makes them jagged. It's that they don't have the same compatible inductive biases that we have, which are informed by having evolved in certain environments, having had experience of the world, and all of the other things that are part of what it means to learn anything as
a human being. Because they come with a different set of inductive biases and are very influenced by their training data, they end up doing things that are inscrutable to us: they're coming at these problems in a way that doesn't make sense from the starting point that humans come from.
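Griffiths' earlier description of meta-learning (training a network's initial weights so that ordinary gradient descent learns quickly from small datasets) is one way of giving a network a different starting point. Below is a minimal sketch, assuming a hypothetical family of one-parameter regression tasks (y = a * x with slope a near 3) and using Reptile, a simple first-order meta-learning algorithm swapped in for illustration; the transcript does not say which method his lab uses. All function names are hypothetical.

```python
import random

def make_task(rng):
    # Hypothetical task family: y = a * x, where the slope a clusters around 3.
    return rng.gauss(3.0, 0.3)

def make_data(a, n, rng):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, a * x) for x in xs]

def loss(w, data):
    # Mean squared error of the one-weight model y = w * x.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def adapt(w, data, steps=3, lr=0.1):
    # Ordinary learning: a few gradient-descent steps on one small dataset.
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def reptile(meta_steps=1000, meta_lr=0.1, shots=5, seed=0):
    # Meta-learning: nudge the initial weight toward wherever each small task's
    # adaptation ends up, so future adaptation starts closer to the answer.
    rng = random.Random(seed)
    w0 = 0.0  # the usual default: start near zero
    for _ in range(meta_steps):
        data = make_data(make_task(rng), shots, rng)
        w0 += meta_lr * (adapt(w0, data) - w0)
    return w0

w_meta = reptile()
rng = random.Random(1)
a_new = make_task(rng)
train, test = make_data(a_new, 5, rng), make_data(a_new, 100, rng)
loss_default = loss(adapt(0.0, train), test)
loss_meta = loss(adapt(w_meta, train), test)
print(f"learned initial weight: {w_meta:.2f}")
print(f"test loss after 5 examples, default init: {loss_default:.3f}")
print(f"test loss after 5 examples, meta-learned init: {loss_meta:.3f}")
```

The architecture (a single weight) and the learning rule (three gradient steps) are identical in both runs; only the starting point differs, and that starting point now encodes an inductive bias toward slopes near 3.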
After writing this book, what do you think we understand now about minds that we didn't understand, let's say, a decade or two ago?
It's funny, because I've taught this material for twenty years at this point, and I normally start my cognitive science classes by saying, welcome to cognitive science, this is going to be different from your other science classes. Normally, when you take a science class, someone stands up and says, okay, here are all the things that we figured out, here are the
answers to the questions. In cognitive science, it's more that we've figured out how to get better at asking the questions.
We haven't answered them. There's no consensus across the whole field about what those answers look like. I think it's important that this is still very much (and this is what got me interested in cognitive science in the first place) a field that has deep mysteries and lots of opportunities to learn and discover interesting things.
But over the last ten years, something changed. As I was working on this book, I wrote the first chapter with that disclaimer in it: okay, look, I'm not promising you answers; we're going to see if we
can get a good handle on the questions. But by the time I got to the end of the book, after working on it for years, partly from going through the process of writing it, exploring all these things, and thinking about how they fit together, but also because the field had moved forward, I actually started to feel that these things do fit together in a way where you can
see glimpses of what the answers are going to look like, in a way that I think really wasn't there ten years ago. It's that story of: okay, we know what the goals are, and we know what the right mathematical systems are for describing what intelligent
systems should be doing. You have the ingredients of symbolic systems and probabilistic inference, and we've discovered that, in fact, you can get a remarkably long way just by using artificial neural networks to learn to approximate those things. That demonstration has shown, first of all, that language is an extremely good substrate for intelligence, in a way that I think people had not anticipated before large language models, and second, that you can make big neural
networks that can learn to approximate really complex probability distributions. So it gives us some of the ingredients for seeing how what were originally three very different views of the mind might start to fit together into something that's a little more of a unified whole.
Excellent. When you wrote the book, what struck you as the most beautiful idea in the whole quest, in the whole history of this?
Gosh. Okay, I'm a big probability theory fan, so you're going to get me endorsing Bayes' rule. When you learn it in a probability class, it's just
a dumb principle of probability theory. But when you make the move of saying probability theory isn't just about dice and cards, it's about beliefs, it suddenly becomes a very deep and insightful principle. In the book, I also show that probability theory subsumes logic: everything that's a valid logical inference is also a valid inference in probability theory. Probability theory just extends the semantics of logic to these
cases of uncertainty. So to me, that's a big one. That's kind of where I live.
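A minimal sketch of both ideas in this answer, using exact fractions so the numbers stay readable: Bayes' rule applied to beliefs about hypotheses, and the same rule reproducing a logical inference (modus tollens) when a likelihood is exactly 0 or 1. The coin and rain examples, their probabilities, and the helper names are hypothetical.

```python
from fractions import Fraction as F

def posterior(prior, likelihood, observation):
    # Bayes' rule: P(h | d) is proportional to P(d | h) * P(h), normalized over hypotheses.
    unnormalized = {h: p * likelihood(observation, h) for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

# Uncertain belief: is this coin fair, or biased toward heads?
coin_prior = {"fair": F(1, 2), "biased": F(1, 2)}
P_HEADS = {"fair": F(1, 2), "biased": F(9, 10)}

def coin_likelihood(obs, h):
    return P_HEADS[h] if obs == "heads" else 1 - P_HEADS[h]

coin_post = posterior(coin_prior, coin_likelihood, "heads")
print(coin_post["biased"])  # 9/14: one head shifts belief toward "biased" but settles nothing

# Logic as a limiting case: "if it rained, the grass is wet" means P(wet | rain) = 1.
rain_prior = {"rain": F(1, 3), "no_rain": F(2, 3)}
P_WET = {"rain": F(1), "no_rain": F(1, 2)}

def grass_likelihood(obs, h):
    return P_WET[h] if obs == "wet" else 1 - P_WET[h]

rain_post = posterior(rain_prior, grass_likelihood, "dry")
print(rain_post["rain"])  # 0: the grass is dry, so it did not rain (modus tollens)
```

Nothing in `posterior` is specific to logic; the deductive conclusion falls out because one likelihood is exactly zero, and softening that zero turns the same calculation back into graded belief revision.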
Yeah, excellent. And suppose we do have a mature physics of thought, let's say fifty years from now. What does that change for us in terms of education, in terms of the way we build machines?
I think this is exactly where we can go: once you figure out the scientific principles of a domain, you can start to think
about how to do engineering. When you go to engineering school, you take physics and learn what these principles are, and then you take your applied engineering classes, which take those physical principles and tell you how to build a bridge, explaining it not in terms of heuristics for what makes a good bridge but in terms of
those fundamental physical principles. What I think is incredibly exciting here is that as we start to converge on what these laws of thought look like, we get the opportunity to do a much more science-based form of engineering applied to human cognition: how do we make an optimal learning environment, and how do we support human decision making?
That's something I work on in my lab: how do we put computation into human environments to overcome the computational constraints we have as individual decision makers and help us make better decisions? And how do we understand the kinds of things people are doing in a way that allows us to make suggestions about how they might do
them better? So I think there's a lot of potential for human upside as we start to be able to answer these scientific questions.
That was my interview with Tom Griffiths. To quickly summarize his framework, Tom sees three major scientific approaches that all try to capture the mind: rules and symbols, artificial neural networks, and probability theory. These are very different approaches, and each one has delivered something a little different. Rules and symbols give us language-like machinery where pieces can be assembled and reassembled into
complex ideas. Artificial neural networks give us graded concepts, meaning ideas can be fuzzy. And probability theory gives us a language for dealing with uncertainty. Now, what's interesting is that human minds seem to traffic in all of these modes. We use structured symbols, we use graded concepts, and we revise our beliefs as new evidence comes in. Part of that is that we move through the world with prior beliefs shaped by our history, our culture, our language,
our neighborhood, our moment in time. So none of these models by itself is the final answer, and like most scientific stories, this one is about humility. Tom's book illustrates how every generation arrives with some new formalism, some new piece of math, some new model that's powerful enough to illuminate an area of mental life, and for a moment it feels like, hey, the whole mystery is finally collapsing. But then the spotlight widens
and we see more terrain. What I love about this conversation is that it leaves us with a sense of progress and a sense of wonder at the same time. We feel a convergence of different fields, and we can also feel how large this subject remains. Cognition is still a field in motion. So let's look at the big picture. When physics matured, we could build bridges and airplanes and power grids because
we had firm principles to build on. So once the laws of thought come into clearer view, what becomes possible for education, for decision making, and for rules that help us reason more effectively? Here we are at a very cool moment in history, where the old dream of formalizing thought has escaped the library and shown up on everyone's laptop. The big thinkers of centuries ago could squint and see the outline of the project, and now we're living squarely in the
middle of it. If there truly are laws of thought, they're going to teach us about our machines, but more importantly, they're going to teach us about ourselves. Although it's sometimes tempting to view the mind as a ghostly exception to the universe, the mind presumably is part of the universe: lawful and wondrous and discoverable, and every step towards understanding it enlarges the human story. Go to eagleman.com/podcast for more information and to
find further reading. Join the weekly discussions on my Substack, and check out and subscribe to Inner Cosmos on YouTube for videos of each episode and to leave comments. Until next time, I'm David Eagleman, and this is Inner Cosmos.
