¶ Initial Hype vs. LeCun's Doubts
We've been told time and again that the massive large language models trained by companies like OpenAI and Anthropic are poised to utterly transform our world. We've been told that huge percentages of existing jobs are soon to be automated. We've been told that skills like writing and photography and filmmaking are all about to be outsourced. And we've been told that if we're not careful, the systems built on these models might someday soon become sentient and even threaten the existence of the human race.
But here's the thing: one of the AI pioneers who helped usher in this current age is not convinced. His name is Yann LeCun, and he's long been arguing that not only will LLM-based AI fail to deliver all these disruptions, but that it is, and I'm quoting him here, "a technological dead end."
People have started to listen. Earlier this month, a syndicate of investors, including Jeff Bezos and Mark Cuban, along with a bunch of different VC firms, put up over a billion dollars to fund LeCun's new startup, Advanced Machine Intelligence Labs, which seeks to build an alternative path to true AI, one that avoids LLMs altogether.
After all of the hype and stress and hand-wringing around LLM-based tools like ChatGPT and Claude Code, is it possible that Yann LeCun was right that those specific types of tools won't change everything? And if so, what's going to come next? If you've been following AI news recently, you've probably been asking these questions, and today we're going to seek some measured answers. I'm Cal Newport, and this is the AI Reality Check.
¶ Episode Framework: Three Key Questions
Okay, so here's the plan. I've broken this discussion down into three sub-questions. Sub-question number one: what exactly is Yann LeCun up to, and how does this differ from what the existing major AI companies are doing?
Sub-question number two: how is it possible that he could be right about LLMs running out of steam if everything we've been hearing recently from tech CEOs and news media is about how fast LLMs are advancing and how this technology is about to change everything?
And number three: if LeCun is right, what should we expect to happen in the next few years, and what should we expect to happen over maybe a decade? All right, so that's our game plan here. It's going to get a little technical. I'm going to put on my computer science hat, but I'll try to keep things simple.
Which really is the worst of both worlds, because it means the technical people will say I'm oversimplifying and the non-technical people will say I still don't make sense. So I'm going to do my best to walk this high wire. Let's get started with our first sub-question.
¶ LeCun's AMI Labs and Core Philosophy
What is Yann LeCun up to? All right, let's just start with the basics. I want to read a couple of quotes from a recent article that Cade Metz wrote for the New York Times, discussing what just happened with LeCun's new company. I'm quoting here: LeCun's startup, Advanced Machine Intelligence Labs, or AMI Labs, has raised over one billion dollars in seed funding from investors in the United States, Europe, and Asia.
Although AMI Labs is only a month old and employs only twelve people, this funding round values the company at $3.5 billion. Dr. LeCun, who's sixty-five, was one of the three pioneering researchers who received the Turing Award, often called the Nobel Prize of computing, for their work on the technology that is now the foundation of modern AI. Dr. LeCun has long argued that LLMs are not a path to truly intelligent machines.
The problem with LLMs, he said, is that they do not plan ahead. Trained solely on digital data, they do not have a way of understanding the complexities of the real world. Quote, "If you try to take robots into open environments, into households, or into the street, they will not be useful with current technology," end quote. Mr. LeBrun, who's the CEO of AMI Labs, told the Times: "We want to help them react to new situations with more common sense."
All right, so that's a high-level summary of what's going on. Let's get into the weeds here, into the technical details of what LeCun is saying and how his vision differs from what the major existing frontier AI companies are actually doing. Let's start with a basic idea.
If you're an AI company, you're trying to build artificial-intelligence-based systems that help people do useful things. This could be by letting people ask questions of a chatbot, or by having the system help produce computer code, if we're talking about coding agents. At the core of all these products there needs to be some sort of what we can call a digital brain: something that encapsulates the core of the artificial intelligence that your tool or system is leveraging.
The major AI companies like OpenAI and Anthropic have a different strategy for creating those underlying digital brains than Yann LeCun's new company has.
¶ The Standard LLM Approach Explained
All right, so what are the existing AI companies doing? They're all in on the idea that the digital brain behind these AI products should be a large language model. Now, we've talked about this before, you've heard this before, so I'll go quickly, but it's worth reiterating. A large language model is an AI system that takes text as input and outputs a prediction of what word or part of a word should follow.
If we want to be sort of anthropomorphic here, it assumes the text it has as input is a real, pre-existing text, and it's trying to correctly guess what followed that text in the actual original. That's really what a language model does.
So you call it a bunch of times. You give it input, you get a word or part of a word as output, you append that to your input, and you put the slightly longer input back into the language model to get another word or part of a word. If you keep adding the output to the input and putting that through the model, you slowly expand the input into a longer answer. This is called autoregressive text production: you keep taking the output and putting it back into the input until the model finally says, "I'm done." And then you have your response.
So if we zoom out a little bit, the large language model takes text as input and then expands whatever story you told it, trying to finish it in a way that it feels is reasonable. Under the hood, they look something like this. Jesse, can we bring this up on the screen here? This is a typical architecture for a large language model. You have input; here it says "the cat sat."
That gets broken into tokens. Those get embedded into some sort of mathematical semantic space. Don't worry about that. They then go through a bunch of transformer layers. Each layer has two sub-layers, an attention sub-layer and a feed-forward neural network. And out of the end of those layers comes some information that goes into an output head that selects what word or part of a word to output next. So this kind of linear structure is the architecture of a large language model.
The way you train a large language model is you give it lots of real, existing text. You knock words out of that text, you have it try to predict the missing word, and then you correct it to make it a little bit more accurate. If you do this long enough, on a big enough network, with enough words, this process, which is called pre-training, produces language models that are really good at predicting missing words.
And to get really good at predicting missing words, they end up encoding, in those feed-forward neural network layers within their architecture, lots of knowledge about the world: how things work, different types of tones. They become really good pattern recognizers with really good rules. Emergently and implicitly, within the feed-forward neural networks in the language model, a lot of smarts and knowledge begins to appear.
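As a rough illustration of the predict-append-repeat loop described above, here is a minimal Python sketch. It is a toy under loud assumptions: the "model" is just a table of which word followed which in a three-sentence corpus, not a transformer, and the names `train_bigram`, `predict_next`, `generate`, and the `<end>` stop token are all hypothetical, invented for this example.

```python
from collections import Counter, defaultdict

END = "<end>"  # hypothetical stop token: the model's way of saying "I'm done"


def train_bigram(corpus):
    """Toy stand-in for pre-training: count which word follows which."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split() + [END]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, text):
    """Predict the most likely next word, looking only at the last input word."""
    last = text.split()[-1]
    if last not in counts:
        return END
    return counts[last].most_common(1)[0][0]


def generate(counts, prompt, max_words=10):
    """Autoregressive loop: append each prediction to the input and call again."""
    text = prompt
    for _ in range(max_words):
        word = predict_next(counts, text)
        if word == END:
            break
        text = text + " " + word
    return text


corpus = [
    "the cat sat on a mat",
    "the cat sat on a mat",
    "the dog sat on a rug",
]
model = train_bigram(corpus)
print(generate(model, "the cat sat"))
```

A real language model replaces the lookup table with the transformer layers described above and conditions on the whole input rather than the last word, but the outer loop has the same shape: predict, append, repeat until a stop signal.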
That's the basic idea with a large language model. So the AI companies' bet is that if these things are large enough, and we train them long enough, and then we do enough fine-tuning afterwards with post-training, you can use a single massive large language model as the digital brain for many, many different applications. When you're talking with a chatbot, it's referencing the same large language model that your coding agent might also be talking to in order to figure out what computer code to produce. It'll be the same large language model that your OpenClaw personal assistant agent is also accessing.
So it's all about one HAL 9000-style massive large language model that is so smart, you can use it as a digital brain for anything that people might want to do in the economic sphere. That is the model of companies like OpenAI and Anthropic.
¶ LeCun's Modular AI Alternative
All right, so what is Yann LeCun's AMI Labs doing differently? Well, he doesn't believe that having a single large model that implicitly learns how to do everything makes sense. He thinks that's going to hit a dead end, and that it's an incredibly inefficient way to try to build intelligence.
And the intelligence you get is going to be brittle, because it's all implicit and emergent. So you're going to get hallucinations, or odd flights of response that really don't make sense in the real world. So what is his alternative approach?
Instead of having just one large single model, he wants to shift to what we could call a modular architecture, where your digital brain has lots of different modules in it that each specialize in different things and are all wired together. Let me show you what this might look like. I'm going to bring on the screen a key paper that LeCun published in 2022 called "A Path Towards Autonomous Machine Intelligence." This has most of the ideas that are behind AMI Labs.
This paper has this diagram I have on the screen. It's an example of a modular architecture. He imagines an AI digital brain that has multiple modules, including a world model, which is separate from an actor, which is separate from the critic, which is separate from a perception module, which is separate from short-term memory, which is separate from an overall configurator that helps move information between each of these different modules.
So you might have, for example, the perception module making sense of the input it's getting, maybe through text, or through cameras if it's a robot. It passes that to an actor, which is going to propose what we should do next. Then the critic is going to analyze the different options, using the world model, which has a model of how the relevant world works, to try to figure out which of these options is best, pulling from short-term memory. Then the actor can choose the best of those options, which then gets executed.
So it's much more a case of different pieces that do different things. Now, another piece of the Yann LeCun picture is that you can train different modules within a modular architecture differently.
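To make the perceive, propose, evaluate, choose flow just described more concrete, here is a minimal Python sketch. It is an illustration only, not LeCun's design or AMI Labs' code: the one-dimensional world, the function names (`perceive`, `propose_actions`, `world_model`, `critic`, `step`), and the hand-written rules inside each module are all assumptions made up for this example. In a real system each module would be a trained network.

```python
from collections import deque


def perceive(raw_observation):
    """Perception: turn raw input (here, strings) into a compact state."""
    return {
        "position": int(raw_observation["sensor"]),
        "goal": int(raw_observation["goal"]),
    }


def propose_actions(state):
    """Actor: propose candidate moves (step left, stay, step right)."""
    return [-1, 0, 1]


def world_model(state, action):
    """World model: predict the state that would follow an action."""
    return {"position": state["position"] + action, "goal": state["goal"]}


def critic(predicted_state):
    """Critic: score a predicted state; closer to the goal is better."""
    return -abs(predicted_state["goal"] - predicted_state["position"])


def step(raw_observation, memory):
    """Wire the modules together for one decision."""
    state = perceive(raw_observation)
    memory.append(state)  # short-term memory keeps the most recent states
    candidates = propose_actions(state)
    # Evaluate each candidate by imagining its outcome, then pick the best.
    return max(candidates, key=lambda a: critic(world_model(state, a)))


memory = deque(maxlen=3)
print(step({"sensor": "2", "goal": "5"}, memory))
```

The point of the sketch is the separation of concerns: you could swap out `critic` or `world_model` without touching the other pieces, which is what makes it possible to train or constrain each module on its own terms.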
Again, in a language model, there's one way you train the whole model, and all the intelligence implicitly emerges. In LeCun's architecture, he says, well, wait a second: train each module in whatever way makes the most sense for what that module does. Take the perception module. Let's say it's making sense of the world through cameras. There we want to use a vision network that's trained with classic deep learning vision recognition, of the type that LeCun actually helped pioneer back in the 90s and early 2000s. But the world model, which is trying to build an understanding of how the world works, he says we would train very differently.
In fact, he has a particular technique. You may have heard of JEPA, J-E-P-A, Joint Embedding Predictive Architecture. This is a new training technique that LeCun came up with for training a world model. At a very high level, he says, here's the right way to do that: if you're training a model to understand how a particular domain works, don't just train it on the low-level data, like the actual raw words from a book or raw images from a camera.
What you want to do is take these real-world experiences, convert them all to high-level representations, and train on the high-level representations. I'm simplifying a lot here, but let's say you have as input a picture of a baseball about to hit a window, and then a subsequent picture where the window is broken.
You don't want to train the world model, he argues, just on those pictures, as in: if I see a picture like this, the picture that would follow is one where the glass is broken. That's maybe how a standard LLM-style generative picture generator might work.
Instead, he says, take both pictures and build a high-level representation of each. So it's a mathematical encoding of something like "a baseball is getting near a window." What actually matters? What are the key factors in this picture?
And then the next picture is "the window breaks." What you really want to teach the model is that when it has this high-level setup, a baseball about to hit the window, that leads to the window breaking. So it's not stuck on particular inputs; it's learning causal rules about how the relevant domain works.
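The difference between predicting raw pictures and predicting high-level representations can be sketched in a few lines of Python. This is a deliberately crude illustration of the idea, not the JEPA training procedure: in a real system the encoder and predictor are learned networks, whereas here `encode` and `predictor` are hand-written, and the four-number "images" (ball distance, crack amount, two background pixels) are a made-up format for this example.

```python
def encode(image):
    """Hypothetical hand-written encoder: keep only what matters.

    image[0] is ball distance, image[1] is crack amount, and the rest are
    background pixels. A real encoder would be learned, not hand-coded.
    """
    ball_near = 1.0 if image[0] < 1.0 else 0.0
    window_broken = 1.0 if image[1] > 0.5 else 0.0
    return (ball_near, window_broken)


def predictor(embedding):
    """Hypothetical rule in representation space: ball near window -> broken."""
    ball_near, window_broken = embedding
    return (ball_near, max(window_broken, ball_near))


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def latent_loss(before, after):
    """Compare the predicted representation with the actual one."""
    return squared_error(predictor(encode(before)), encode(after))


before = [0.5, 0.0, 0.31, 0.77]    # ball close to an intact window
after_a = [0.0, 1.0, 0.31, 0.77]   # window broken, same lighting
after_b = [0.0, 1.0, 0.90, 0.12]   # window broken, different lighting

print(latent_loss(before, after_a), latent_loss(before, after_b))
print(squared_error(after_a, after_b))  # pixel-level comparison
```

Both "after" images describe the same event, so the representation-level loss treats them identically, while a pixel-level comparison penalizes the irrelevant lighting difference. That is the intuition behind training on the high-level encoding rather than on the raw inputs.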
Anyway, there are a lot of other ideas like this. The critic and actor come out of the RL, reinforcement learning, world, where this setup is pretty well known: you train one network with rewards and another one to propose actions. So there are a lot of different ideas coming together here. The third piece of LeCun's vision that differs from the big AI companies is that he doesn't believe in having just one system that you train once and that is then the digital brain for all the different types of things you might do.
He says this architecture is the right architecture for everything, but you train different systems for different domains. So if I want a digital brain that we can build computer programming agent tools on, I'm going to take one of my systems, with its world model and perception and actor and critic, and I'm going to train it specifically for the domain of producing computer programs.
And then all the computer programming agents that people are building will use that particular system. But if I want to help with call centers or whatever, I might train a completely different version of the system, just to be really good at call centers. So we don't have just one massive HAL 9000 that everything uses, which is the OpenAI plan or the Anthropic plan.
We custom-train systems that maybe all use the same general architecture, but we train them from scratch for different types of domains. You're going to get much better performance out of that. All right, so that is Yann LeCun's vision.
And he says this is how you're going to get much more reliable, smart, and useful activity out of AI. This idea that we're just going to train a massive model that can do everything based off of just text? He's like, come on, this makes no sense. That can't possibly be the best, most efficient route toward actually having smarter AI. All right, so that is the key tension between the existing AI companies and Yann LeCun's idea.
¶ Deconstructing LLM Progress: An Illusion?
This brings us to our second sub-question. How is it possible that LeCun could be right that LLMs are a dead end if we've been hearing nonstop in recent months about how these LLM-based companies are about to destroy the economy and change everything? How could we be so wrong? LeCun is not surprised by that. If we asked him, and I'll simulate LeCun here, he would say the short answer to that question is: look, a lot of the coverage of LLMs recently has been a mixture of hype and confusing the specific LLM strategies of the frontier companies with the idea and possibilities of AI more generally, kind of mixing those things together.
Which is fine if you're Sam Altman or Dario Amodei. That's great for you, because you need investment. But it's probably not the most accurate way to think about it. Now, suppose we ask LeCun, in this hypothetical, to give a longer explanation of how we could be so wrong about LLMs.
He would probably say, okay, let me explain to you the trajectory of the LLM technology in three stages, and I think this will clarify a lot. The first stage was the pre-training scaling stage. This is the stage where the AI companies kept increasing the size of the LLMs (how big those layers inside them are), the amount of data they trained them on, and how long they trained them.
There was a period, starting in 2020 and lasting until 2024, where making the models bigger and training them longer demonstrably and unambiguously increased their capabilities. This petered out after about GPT-4. After about GPT-4, at OpenAI, and we have evidence that xAI had the same issue, and evidence that Meta had the same issue, when they continued to make their models bigger, they stopped getting those big performance jumps. So they couldn't just scale them to be more capable.
This led to stage two, which I think of as starting in the summer of 2024, when they shifted their attention to post-training. The thinking was: we can't make the underlying smarts of these LLMs better by making them bigger and training them longer, so what we need to do is try to get more useful stuff out of these existing pre-trained LLMs.
We saw this with the alphabet soup of models that were released starting in the fall of 2024: o1, o3, Nano Banana, all these types of names. The first approach they tried was telling the models to think out loud. So instead of just directly producing a response, they post-trained the models to actually explain their thinking.
Remember, it's autoregressive. So as the model explains its thinking, that's always going back as input into the model, and it gives it more to work off of in reaching an answer. It turned out that if you had the model think out loud, you did slightly better on certain types of benchmarks. These are the so-called reasoning models.
But it was a bit of a wash, because this also made the models more expensive to use: they burned a lot more tokens to get to the answer you cared about. So it did better, but it was unclear how much of that we actually wanted to turn on for users. The second approach they used in the second stage was more targeted post-training.
So now, if you have a lot of examples of a particular type of question (prompts and correct answers, prompts and correct answers), you can use those, combined with techniques out of reinforcement learning, to nudge the existing pre-trained model to be better at those types of tasks. So we entered stage two, this post-training stage, where, because we couldn't make these LLM brains fundamentally smarter, we wanted to tune them to get more performance out of them on particular types of tasks.
This is when we began to see less of just, "Hey, try this model and it's going to blow your socks off." Instead we got lots of charts of inscrutable benchmarks. Look, the chart is going up on this alphabet-soup benchmark, because, you know, you could post-train for particular benchmarks. It was less obvious in a lot of use cases for the regular user, who'd say, well, the underlying smarts seem to be the same.
We then entered stage three. I think this started in the fall of 2025, when the LLM companies said that the big gains going forward are really in the applications that use the LLMs. Let's make these applications smarter. So it's not just how capable the LLM is; it's how capable the programs that are prompting the LLM are. Let's make those smarter.
So we saw a lot of this effort going into the programs called coding agents, which help computer programmers edit and produce and plan computer code. Now, these types of agents had been around for many years. But a lot of the AI companies got really serious about them, especially last year, coming into the fall. They weren't really changing the LLMs much. They did some fine-tuning for programming, but the big breakthroughs in coding agents were in the programs that called the LLMs.
They figured out how to make these coding agents capable of working with enterprise code bases: not just for individuals vibe-coding web apps, but something you could use if you're a professional programmer at a big company. All of that is tool improvement, making sure that you're able to send better prompts to the LLM. When you hear about things like skill files and managing hierarchies of agents, this is all improvement in the programs that use the LLM. None of this is a breakthrough in the digital brain itself.
And so this is the stage that we are in now. We're spending a lot more time building smarter programs that sit between us and the LLMs they're querying as their digital brains, so that in very particular domains the whole thing is more useful.
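The claim that stage-three gains live in the program around the model, rather than in the model, can be illustrated with a small Python sketch. Everything here is hypothetical and invented for the example: `fake_llm` is a stub standing in for a real model call, and `select_relevant_files`, `build_prompt`, and `run_agent` are made-up names, not any vendor's agent API. The sketch only shows where the "smarts" of such a harness sit.

```python
def fake_llm(prompt):
    """Stub standing in for an LLM call; returns a canned string."""
    return f"[model reply to a {len(prompt)}-character prompt]"


def select_relevant_files(task, codebase, limit=2):
    """Harness logic: pick files that mention the task's longer words."""
    words = {w for w in task.lower().split() if len(w) > 3}
    scored = []
    for path, source in codebase.items():
        text = (path + " " + source).lower()
        score = sum(1 for w in words if w in text)
        scored.append((score, path))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for score, path in scored[:limit] if score > 0]


def build_prompt(task, codebase, skill_notes):
    """Assemble standing instructions, the task, and only the relevant code."""
    parts = [skill_notes, "Task: " + task]
    for path in select_relevant_files(task, codebase):
        parts.append(f"--- {path} ---\n{codebase[path]}")
    return "\n\n".join(parts)


def run_agent(task, codebase, skill_notes, llm=fake_llm):
    """The model is unchanged; only the prompt it receives gets better."""
    return llm(build_prompt(task, codebase, skill_notes))


codebase = {
    "billing.py": "def charge(invoice): ...",
    "auth.py": "def login(user): ...",
    "README.md": "project docs",
}
skill_notes = "Follow the team style guide. Do not change public interfaces."
task = "fix the charge function in billing"
print(build_prompt(task, codebase, skill_notes))
```

Swapping `fake_llm` for a stronger or weaker model would not change a line of the harness, and improving the file selection or the standing notes would not change the model. That separation is the sense in which recent progress can be in the tooling rather than in the digital brain.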
So this is what LeCun would tell you, right? I'm channeling LeCun. He would say that once you understand this reality, you see that the impression that LLM-based AI has been on a super fast upward trajectory of rapid advances is pretty illusory. The fundamental improvements in the underlying brain stopped a couple of years ago.
What we saw then was a period of lots of bragging about benchmarks doing better, but this was all about post-training. And for the last four months, all the improvements we've been hearing about are in the programs that use the LLMs, which are being made smarter and are better fitting particular use cases.
But there really haven't been major fundamental improvements in the underlying smartness of the digital brains, which is why all the problems like hallucinations and unreliability persist. The brains are only incrementally improving, either in narrow areas or in narrow ways.
And it's what we're building on top of them that's creating an illusion of an increasing trajectory of artificial intelligence, when in reality we might just be in a very long-tail stage of product-market fit: doing the work of building more useful products on top of a mature digital brain technology that's only advancing at a very slow rate. That would be LeCun's argument.
Therefore we will find some good fits, but this is not a technology on a trajectory where it's going to be able to make massive leaps in what it's actually able to do.
¶ Near-Term AI Landscape Shifts
So there you go. That would be the argument for how we could have gotten LLM progress so wrong. All right, sub-question number three. Let's follow this thought experiment through. If LeCun is right about that, what would we expect to happen in the near future? Let's start with the window of the next one to three years.
If he is right, we would see a long tail of applications based on existing LLMs begin to fill in. Computer coding agents have gotten more useful. We will see other use cases like that, ones that don't exist now, but where people are really experimenting to figure out applications that are going to work in other types of fields. So there'll be Claude Code moments in other fields, which I think will be useful and exciting.
The tool sets used in many jobs will change. But because we'd just be trying to find areas where we can build useful applications on top of existing LLMs, the doomsday scenarios we've been talking about on these AI Reality Checks recently, where knowledge workers are going to have to become pet masseuses, and then after that they're going to have to cook the pets on garbage can fires because there's no money left in the economy: none of those scenarios are what would unfold based on LLMs in this vision.
There would be a big economic hit, though. If we've shifted our attention to building better applications on top of the LLMs, what we're going to see is a lot more companies getting into that game, and they're going to say, "I don't want to pay for a cutting-edge, frontier, hyperscaled LLM. It's too expensive. Let's look at cheaper LLMs. Let's look at open-source LLMs. Let's look at LLMs that can fit on chip."
We saw this already with the OpenClaw framework, which allowed people to build their own custom applications that use LLMs in personal-assistant-type roles. Right away people were like, "I don't want to pay all that money to use Claude or GPT," and you saw an explosion of interest in on-chip models and open-source models.
All this is going to be, I think, good news for consumers. It means we could have more people building these applications. There'll be more variety in these applications, and they'll be cheaper. It's bad news for the stock market, because we've invested, depending on who you ask, somewhere between four hundred and six hundred billion dollars into these LLM hyperscalers like OpenAI and Anthropic.
That market's not going to support it. So there's going to be a big crash. If this vision is correct, that would temporarily slow down AI progress, because investors are going to feel burnt.
¶ Long-Term Modular AI Vision
All right, what's going to happen if we zoom out to a three-to-ten-year range? That's roughly the range in which the modular architecture approach LeCun is talking about would reach maturity. That's what their current CEO is saying. Again, it's a research company for now, and they've said it'll be several years until we really get products that are ready for market.
If LeCun is right, what we're going to see, domain by domain, are these very bespoke, domain-specific modular architecture systems, which, if he's right, are going to be way more reliable and smarter, in the sense that they do the thing I ask them in a way that's good, as good as some of my human employees, and in a way that I can actually trust.
We're going to see a lot more of that. What's been promised with LLMs, we're going to see instead on that three-to-ten-year basis, if LeCun is right. Because they're based on this modular architecture, I think these systems will be more reliable. They're also going to be easier to align. LLMs are so obfuscated. It's just, here's 600 billion parameters in this big box that we trained for a month on all the text on the internet. Let's just see what it does.
Modular architectures are way more alignable. You literally have a critic module in there that evaluates plans, based on both a world model and some sort of hard-coded value system, to say which of these do I like better. And you could just go in there and hard-code things: don't do these types of plans, or give a really low score to plans that lead to, say, a lot of variability in outcome, or something like that. You have more direct knobs to turn.
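The "direct knobs" idea can be sketched in Python. This is an illustration under stated assumptions, not a description of any real system: `critic_score` and `choose_plan` are hypothetical names, the list of numbers attached to each plan stands in for outcomes a world model might predict, and the plan names are invented. The two knobs shown are a hard-coded list of forbidden plans and a penalty on outcome variability.

```python
from statistics import mean, pvariance


def critic_score(predicted_outcomes, variance_penalty=1.0):
    """Score a plan: reward a good average outcome, penalize variability."""
    return mean(predicted_outcomes) - variance_penalty * pvariance(predicted_outcomes)


def choose_plan(plans, forbidden=frozenset(), variance_penalty=1.0):
    """Drop hard-coded forbidden plans, then pick the best-scoring one."""
    allowed = {name: outcomes for name, outcomes in plans.items()
               if name not in forbidden}
    return max(allowed, key=lambda name: critic_score(allowed[name], variance_penalty))


# Hypothetical outcomes a world model might predict for each plan.
plans = {
    "steady": [5, 5, 6],
    "gamble": [0, 0, 20],
    "shortcut": [9, 9, 9],
}

print(choose_plan(plans))
print(choose_plan(plans, forbidden={"shortcut"}))
print(choose_plan(plans, forbidden={"shortcut"}, variance_penalty=0.0))
```

With the variance penalty on, the erratic "gamble" plan loses to "steady" even though its average is higher; turning the penalty to zero flips that choice. Both the forbidden set and the penalty are explicit values someone can inspect and change, which is the contrast with behavior buried in billions of opaque parameters.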
So it does make alignment easier. These systems would also be more economically efficient, because when you have to train one model long enough that it can do everything, it has to be huge, and it takes a huge amount of energy. But when you're training different modules in a domain-specific system, these can be much smaller. I like to point to the example of a Google DeepMind tool called DreamerV3, which can learn how to play video games from scratch. It's famous for figuring out how to find diamonds in Minecraft.
It uses a modular architecture very similar to what LeCun is proposing here. We just read a paper about it in the doctoral seminar I'm teaching on superintelligence right now. DreamerV3, which can play Minecraft way better than an LLM could if you asked it to, is domain specific. It requires around two hundred million parameters, which is smaller by a factor of ten or more than what you would get in a standard LLM. It can be trained on a single GPU. And it can handle this domain way better than a frontier language model, which is significantly larger and trained significantly more exhaustively.
So there would be some advantages here. There's also a little bit of digital ick around this world, because, way more so than LLMs, these domain-specific models might actually have more of a displacement capability. So we'd have to keep an eye on them.
¶ Concluding Thoughts and Predictions
All right, conclusion. What do I think is going to happen here? Well, you know, I don't know. It's possible that there are more performance breakthroughs to get out of LLMs and we're going to get more useful tools. But gun to my head, if I had to predict, through my computer science glasses, LeCun's modular architecture feels like it has to be the right answer.
I think we're going to look back at this doubling down on LLMs as an economic mistake. It was the first really promising new widespread AI technology built on top of deep learning, and it did cool things. But instead of stepping back and asking, okay, what will this be good for?
And what types of domains might we want different models for? We said, no, let's just raise half a trillion dollars and go all in on text-based LLMs for everything, models that are trained on text and made to produce text. All artificial intelligence will run off of these things. I just think when we zoom out on the 30-year scale, we'll be like, that was so naive.
This idea that this was the only type of model we need for artificial intelligence: it's super inefficient for something like ninety-nine percent of the domains we want to use it in. It's great for text-based domains, and for computer programming, kind of. The planning is a little suspect, but the code production is okay.
But the idea that we're going to base all intelligence on massive LLMs, and there'll be like four of them, four companies that have massive ones, and that's it? That can't be the right way to do it. To my computer science instincts, a modular architecture just makes so much more sense: domain specificity, differential training of modules.
You have much more alignment capability. They're much more economically feasible. It just feels to me like that's probably going to be the right answer. Which means we're going to have some bumpiness in the stock market, because if this is true, I don't think the hyperscalers can continue as they are now: either they have to pivot to these architectures quickly enough, before they run out of money, or some of them are going to go out of business and the others are going to have to contract before they expand again.
So I think the modular architecture approach will work better. I don't know if LeCun's company is going to be the one to do it or not, but that architecture makes a lot of sense to a lot of computer scientists. Now, I hope these systems don't get too much better, because I can much more easily imagine a very well-trained modular-architecture AI digital brain creating justified ick than I can these Python agent programs that access some sort of massive LLM somewhere.
All right, so we'll know. I think within a year we'll begin to get a sense of which of these trajectories is actually true. I, of course, will do my best to keep you posted here on the AI Reality Check. All right, that's enough computer science talk for one day. Hopefully that made sense. Hopefully that's useful. I'll be back soon with another one of these checks. And until then, remember: take AI seriously, but not everything.
