Rises rising. Futures ignite. With the power videoing, heart thinking breaks strong. The future we met.
Welcome to the last week AI podcast where you can hear a chat about what's going on with ai. As usual, in this episode, we will be summarizing and discussing some of last week's most interesting AI news. And sometimes it's last last week's actually lately, because I'm sometimes late to releasing these. Hopefully this one is out just a couple of days after the previous episode.
But anyway, you can also check out our last week in dot AI text newsletter with even more articles and all the links to all of this stuff. Uh, if you also like to read. These yourself. I am, uh, one of your hosts as usual, Andrei Kurenkov. I study the AI at a university and I now work in a startup. I'm your other host as usual,
uh, Jeremy Harris, uh, at CloudStone AI. Uh, and I guess one thing, you know, if to the untrained eye, it may appear that I'm sitting here in a half unpacked, living room wallowing in a pool of my own filth, which is not quite what's happening.
I mean, the second part is certainly what's happening, but, um, we're, uh, yeah, we're just in the middle of unpacking after a move and we've got a bunch of, um, we're actually doing pretty well, but, uh, but this is what's kind of left over for right now. And I'm testing out this new standing station. So hopefully the sound sounds good. Hopefully. Um, I mean, you're going to have to put up with my face. That's part of the package, but like, other than that, hopefully the visual is okay.
And, uh, you'll probably see, I got to fix the lighting or something. I think you'll see me flipping back and forth to different tabs and screens and such. Um, but, uh, hopefully it's not too distracting. So that's my story.
It's, uh, you know, going to be starting to get the setup going. I think regular listeners have gone through quite a journey, uh, in Jeremy's life this year. It seems to have been a pretty eventful year. So that's cool. Uh, nice that hopefully you're starting to settle down. Before we get into the news, as usual, do want to acknowledge some comments from listeners. We had a couple on the YouTube, which is fun. Uh, one was not AI Kyle, which is a fun handle.
Uh, just saying they love a podcast, which is always nice to hear. One other comment on that YouTube video, which I found interesting, there was a question on what do we think about the quantum chip willow, which, uh, confirmed, according to this commenter, the multiverse theory, which I didn't know existed. Do not know about, but I did read about this advancement from Google. Uh, I believe Willow is from Google, right? And, uh, it certainly seems very exciting.
Uh, but I will admit that I don't know very much about the implications for AI. And I haven't seen many people discussing that in general. My impression is other, uh, kind of chip architectures seem to be what people are banking on. There's not much kind of expectation that quantum computing will play a big role in the next, I don't know, decade.
Yeah, it all, as ever, depends on your timelines. Um, there's, I think, a couple little interesting, I actually had, I included an article, like a willow sort of link, um, in my own personal notes for, I think, last episode, and I ended up not including them in our, uh, in, you know, my, my list of suggested stories. Uh, just because it seemed just a little peripheral, but I do think it's relevant enough that it's worth a little mention here as, as you say.
Um, first of all, you know, it's not the case that quantum computers just uniformly accelerate any kind of AI algorithm. I think we've talked about that quite a few times on the podcast before when we've talked about, quantum machine learning, there are quantum machine learning algorithms that do get a big speed up.
And what you tend to do there is you'll kind of try to re architect your problem, your, your architecture, your, your model, uh, and your, your optimizers, all that stuff to make it quantum compatible. So you can actually benefit from the speed boost. In practice, one way that you can do this, for example, just to very Um, very kind of high level hand wavy way of explaining this.
One of the things that quantum computing is really good at doing is solving problems that look like the traveling salesman problem. So you imagine you have your traveling salesman, you have 10 different locations you have to hit up. What is the most efficient route to go through all those 10 locations that has the shortest travel time? And the reality is classically with standard computers, you basically have to do this through janky kind of trial and error. There's no clean way to do it.
Quantum computers. Very roughly, they can sort of reach into a large space of solutions and pull one out, pull the optimal one out and solve those things in one shot. So there's certain kinds of problems like that that benefit from quantum speedup, others that don't. So the question in quantum machine learning is how do you recast your problem into that shape?
Um, there are, there's actually a result we talked about last week that, that suggests, I think that there's some, Um, interesting ways in which, uh, you could see sort of more standard machine learning, um, like, uh, transformer type models take on that shape more and more smoothly. I think that's especially for agentic systems. That's really important. Um, I think the timelines on this are very uncertain. The big breakthrough here is the quantum error correction mechanism.
It turns out that it's really, really hard to just keep these tiny, tiny particles that are actually going to be used to do your computations in this pristine, what's known as a coherent state such that they can actually have Quantumness. So you can benefit from the quantum advantage that they have. Um, if so much as like a, a, a photon of light interacts with them, right? The, it'll knock them right out of their quantum states. You'll lose all the coherence of your calculation.
Everything goes to shit. And so really the name of the game in quantum computing is either find a way to perfectly isolate your, um, your qubits, your, your, your quantum bits from any outside interaction while they do those, those calculations, or find a way to correct for, uh, what are known as decoherence effects. Correct for that stray photon that, you know, that stray atom that bumps into things and knocks things out. Quantum error correction is that. So in practice, you need a bit of both.
The question is where exactly do you draw the line? What's the sweet spot and, and the balance of those two things? That's what that is. Um, we've known about the quantum, error correction breakthrough. It's actually quite a few months old. So that's not new. What's new with this paper is the experimental demonstration with a certain number of qubits that they've been able to throw together.
And there's debate over as ever, whether that's a real breakthrough, a real so called quantum advantage or not. And you can go down the rabbit hole. Um, the multiverse thing is interesting. I'll just say, I don't think it's as clear cut as they say in the, in the paper. I'm a multiverse guy personally. That's what I did my, my PhD in. Um, that was kind of like my, uh, my preferred interpretation, if you will, of quantum mechanics.
Um, but the reality is These tests don't actually, what they do is they disprove, or they, they kind of provide evidence against a particular interpretation called objective collapse. But they don't actually provide evidence against other competitors of multiverse interpretations. Notably, de Broglie Bohm theory and, uh, the Copenhagen Interpretation, things like that. Anyway, so. That's it. If you know, if you know, you know, if that was all just delete it from
your rim and we can move on. That's right. I'm also a multiverse guy, by the way, I think that's a team you should be on when it comes to interpretations of quantum mechanics. So I didn't see your tattoo. That's okay. Yes. Uh, and yeah, I can just add a little bit, uh, from what I read on it, this is kind of following up on the news, going back to years to, I think 2019 Google had.
Already previously had progress on it, scaling up a number of qubits at the time of a demonstrated, as you said, I think quantum advantage, sometimes it's called quantum supremacy where you can take an algorithm that's like a million times better in quantum than in traditional computing. Now that result is kind of. been eclipsed, you can actually do better with traditional computing than previous demonstration.
So the cool thing in this one is you are continuing to see the trend of the number of cubits scaling. And also there's another demonstration of true quantum computing, you could say, but it's very far from practical computing. You can use for what
we sort of do with computers. Yeah, if your AGI timelines are like sometime in the next five years, I don't think that this will radically change them, but you know, we can be surprised. Uh, I, yeah, I'm not knee deep in quantum computing. I'm just highly peripheral. So
yeah, once, once we have AI that can do research and science, maybe you'll just figure it out. Who knows? Well, that's it for the comments. Let's do a quick preview of what we'll be seeing in this episode. First up in tools and apps, it's been another busy week, but this time, a less so from open AI. They still have had some of that shipment stories, but Google really took a limelight of this, uh, last week. So we'll be discussing a quite a few stories from them and, uh, kind of not.
Going into the OpenAI announcements, we've been on the smaller side relative to previous ones. Applications in business as ever, a lot of drama with OpenAI and a lot of developments with Compute. There's some cool open source developments this week, uh, new models and, uh, Related to that, we'll be seeing some discussion about the general trend towards smaller open source models from, uh, some research and also some research on alignment and different tokenizers in, uh, transformers.
And finally, in policy and safety, a lot of the usual sort of stuff with China export restrictions and some deals going on in the U S government. One last thing before we get there, as usual, we do need to acknowledge our sponsor, which, as has been the case lately, is The Generator, Babson College's interdisciplinary AI lab focused on entrepreneurial AI.
Babson is a number one school for entrepreneurship in the U. S. for quite a while now, and last fall, professors from all across the U. S. The university partnered with students to launch this interdisciplinary lab for generator. There are eight groups, so they do stuff like AI entrepreneurship and business innovation, AI ethics and society, the future of work and talent, and so on. They are training the entire faculty in AI awareness. And they love a podcast.
So they've been supporting us in a, in a very generous way. So yes, the generator seems cool. And, uh, I guess now, you know, and on to tools and apps, we begin, as I said, with Google and the news that Google is releasing its own reasoning AI. Model. So just recently, we've had a Gemini 2 flash, which was in its own right, kind of a big story. Uh, Gemini 2 had really good benchmark performance, uh, kind of eclipsing even Gemini 1. 5 Pro. What's a pretty big deal.
Now there's an experimental reasoning AI model. They call it Gemini two flash thinking experimental, not a great name, but it is available on its AI studio platform for testing. It's a trained as with other reasoning models to use, uh, thinking to use.
Things like chain of thought thinking so that instead of just doing sort of the input to output mapping that you would have with a traditional model, uh, instead of just being trained on autocomplete and alignment, it is trained on some secret additional data that makes it good at You know, actually outputting reasoning, uh, built in to be able to answer trickier questions like, oh, one, uh, there are not, you know, too much from this yet.
It just happened, but, uh, certainly again, uh, Google has been doing a lot of announcements to compete with open AI as we'll see. And this is another one of those. Yeah, and they give a bit of a
demo as to what the model can do is sort of a video where they showed a bunch of billiard balls with numbers on the billiard balls, and it says, you know, how would you combine these numbers together to get to 30 type thing? And, you know, reasons through in a way that's by now fairly familiar. If you're looking at inference on compute a lot, Um, in the oh one style. So the claim here is, and this is a quote from, um, I think it was Jeff Dean.
Uh, so we see promising results when we increase inference time computation. So certainly the implied claim here is yes, we are sealing, seeing inference time scaling laws. Um, they are leading to, to concrete results and they're describing this as the first step in a reasoning journey. Okay. No surprise there either. Right. So this is all basically, uh, this is opening eyes. Oh, one, but for. Um, but for Google, for, for Google deep mind.
So not a lot of data in terms of what the scaling curves actually look like. We do know according to a Bloomberg report that was later, um, backed up by the information that Google has a whole bunch of teams working on these reasoning models, apparently at least 200 researchers focusing on this. So big effort, but. Google, of course, much bigger, kind of more, uh, bloated company than open AI.
So that might, uh, you know, that has been an issue in the past in terms of shipping velocity, though recently they've been turning things around a bit. Um, yeah. And then it sort of ends with a bit of a letting the air out of the tire moment where, you know, Uh, the, uh, the journalist in the, in the article says, well, you know, all this is what's supposed to happen. Describing the reasoning chain, blah, blah, blah.
But when he asked Gemini 2. 0 Flash thinking experimental and draw breath, um, how many R's are in the word strawberry? It said two, right? That famous strawberry test. So, um, it kind of, uh, kind of amusing still, still the strawberry problem persists here. Not an issue for O1, but, uh, an issue for this model, who the hell knows, you'll just have to see it in action. And for that, we'll have to wait for the wider availability.
So yeah, I think the impression so far is that it still has some kinks to work out, but it does have some pretty interesting differences from 0. 1. For one, you will be able to see its thoughts. There's like a drop down, uh, display where you can actually see what it's outputting. 0. 1, as we've discussed, actually hides all this output from you, which can be maybe frustrating if you are relying on it.
Also, it's coming out with support for image uploads, which I believe all one didn't support initially. Not sure if it supports it now. And, uh, one surprising thing, it supports just 32, 000 tokens for input and 8, 000 tokens for outputs. So. Pretty small, that's still a decent amount, 50 to 60 pages of text. But as we've discussed, LLMs these days can support, you know, twice, three times, four times, 10 times. Uh, that amount. So, uh, yeah, it is experimental, you know?
Yeah. It also makes me wonder, you know, how sustainable opening eyes, um, hiding of the reasoning traces will be. Um, obviously the reason that they're doing that is as we've discussed, like it's going to be because those are things you can use to train your own model as we've seen so often, right. People distilling GPT 4. 0, Um, GPT four turbo, uh, previously to like make really powerful, small models that then compete with those bigger models and erode opening eyes margins.
Presumably they're worried about the same thing happening with, um, uh, with, uh, with this model. And also, right, we know that for example, anthropic, at least according to reporting from semi analysis is using, um, Opus 3. 5 potentially to, uh, if I recall to, um, uh, sort of do. Training for, or to generate synthetic data for agentic models, right? So there's always this, this challenge.
If you, if you actually release this stuff in the world, but as soon as you have one company like Google DeepMind or Google coming out and saying, Hey, you know what, we're going to show you the reasoning traces. Sure. Those reasoning traces may not be as good as opening ISO one, but now you kind of enter this zone where if you're working in any kind of high stakes application, right? Medicine or insurance or whatever you have to make sure it.
That you can audit the reasoning trace, the reasoning trace summary that open AI provides may not be enough for you, right? You may need to be able to see that reasoning trace and this starts to make products like Google's here look a lot more interesting. And so, um, I, I wonder how long that'll, that sort of moat will hold. And, um, uh, it's possible that OpenAI just wants to get us, you know, a lead time advantage and they're happy with that.
But, uh, I would imagine there's going to be a bit of a race to the bottom on revealing, uh, the sort of inference traces in the future.
And next, another story in Google, quite related, actually, there's now another option you can do in the actual Gemini app. So without going into, uh, the AI studio platform, but you have to try the experimental thinking mode in, uh, this is on the Gemini web, uh, application, not on the. For an application. And finally, it is deep research. So you can now toggle an option to use NLM with deep research. And it's seems quite similar to, uh, uh, Chajabati or search. So it will.
After you make a query, it will go and do what they say is a multi step research plan, look up some relevant documents, I suppose. And then they say we'll make more refined analysis and generate a report of key findings. So you can think of it as a sort of research assistant that can look up Uh, data and so on to answer your question. Well, and similar to other kind of search, uh, products out there, perplexity, Chad, GPT, I guess the idea is that.
It can now answer much more complex questions about trends in the market, for instance, or, uh, recent news that is other LLMs about this sort of stuff cannot
handle. Yeah, I think there's, there's a lot actually quite interesting here, even from an AGI standpoint with this kind of product line, right? You, you start to think about, okay, well, now Google's putting out this research assistant. We've talked at length about how the game plan for superintelligence for. Whether it's Google DeepMind, Anthropic, OpenAI, they all kind of look the same.
It's all about how can we make the automated, as Leopold Aschenbrenner would say, the automated drop in AI researcher, right? Um, how do you get there? Well, presumably it goes through a path that looks a lot like this.
One thing that's going to happen here is Google DeepMind will collect a whole bunch of information about, The success of research plans and the success of executed research plans, uh, which they can then feed back into their own systems to like optimize the research process here is some subset of this research will even be AI research. In fact, I might imagine it will be disproportionately that because that'll be who knows about these products first and who tends to be the early adopters.
And so this is a way to kind of close a bit of a feedback loop that allows you to bootstrap your way to that automated AI researcher, automated AI R and D faster. So while this may seem superficially like a. A product launch meant to grab at a market. Um, I, I would not sleep on this one. I think this is an interesting data collection strategy for just that. And, you know, we'll see if it factors in, but, um, that's one piece.
Another, you know, obviously people flagging has changes in a pretty fundamental way, how you interact with the internet, right? So when you just like use deep, you know, deep research to, you know, Do research for you. You're not actually going to websites. And so, uh, already apparently AI overviews, which does something similar, right?
When you now Google something, you sometimes get this AI overviews thing that summarizes the attempts to answer your question directly based on all kinds of content. Excuse me. It's reviewed on the internet. Apparently, publishers have already seen a 5 to 10 percent decrease in traffic from search since AI overviews launched. That was earlier this year. So, this is already a pretty big, kind of big hit for a lot of these, uh, these, um, websites and, and publishers.
Apparently, New York Post has an estimate that, uh, it's about 2 billion in losses that could translate into for publishers. That's pretty remarkable. Um, on the back of this shift, it's also, I think, a fundamental challenge to, to, to Google, right? They've so, um, entrenched this, this idea of the search.
Revenue model and and all their optimization is done around that has been done around that they'll do very well I'm sure in whatever the new paradigm is but the fundamental challenge for them is when the paradigm shifts incumbents Often have to do as much catching up as new entrants So this does create a an opportunity for for some shifting in the market Um, but in any case, I think a really interesting launch, um, interesting strategically from a super intelligent standpoint and interesting in
terms of its effects on on publishers and websites because this is all very new stuff. It changes the dynamics of search in a big way.
Yeah. And it's, I think also interesting kind of to think of this as something alongside AI overviews, right? Cause, uh, Google has already built in a sort of AI search. Uh, so this is that, but much deeper at, uh, Is almost like, uh, like Devin, these coding agents. So it's a research agent as we call it. It will show you a plan of what is it tends to search for. You can actually edit a plan, uh, remove or add steps or revise them.
And then it will take minutes, uh, from what I've read to actually generate the report. So it's not going to replace kind of traditional search where you want the information, you know, in five seconds. But if you want to go deep, it seems to actually be a pretty promising product. And you do have to be a Gemini advanced subscriber to use it. And on to the next big announcement from Google and DeepMind. As I said, this has been a big week.
They have announced Veo 2, uh, which is their competitor to Sora. So it is the same kind of thing, a text to video generation model, and it can create clips over 2 minutes long and up to 4K resolution, which, uh, both of those are, uh, clips in Sora. Now, you can try it through Google's experimental tool, VideoFX, and you will be limited to 720p and 8 seconds, so you won't be able to try that out yourself.
I did see various people playing around with VO2 and posting some links, and it seems to be the consensus that VO2 is at least competitive with the Sora release from last week. If not even better at things like modeling physics, there was a. I think a pretty popular post showing a VO two, uh, a video from Sora and VO two of a person cutting a tomato, right, which is one of these pretty decent, uh, challenges for ai. You need to model the cutting action things falling.
So we saw a video, had the knife just sort of going through a tomato and it's staying whole versus via two, which looked pretty reasonable, looked like pretty much what you would want. So, uh, yeah, another example of, uh, Google very much. Trying to, I don't know, get back into a race, you could say, or, or demonstrate that they are still capable of being a leader and, uh, are not going to be completely trounced by, uh, OpenAI.
Yeah. It also aligns pretty well with, uh, DeepMind's philosophy that, you know, if you think about OpenAI versus DeepMind, and this is caricaturing quite a bit, right? But I, just to give you a vibe over the last like three years, the directions they've been taking, you do tend to see DeepMind focusing more on things like games, Things like multi modality, um, whereas open AI has historically been more of a heads down scale type lab. It's not as we'll discuss a little bit later.
It's not quite just that simple, but but at a high level. And so, um, you know, these kinds of breakthroughs. do potentially tend to advantage DeepMind a little bit more to the extent that, you know, the world models you get from these, uh, video generation tools can actually help you train agents. That's very much a, you know, I mean, everybody's going to be doing it. Everybody's going to be doing it well.
Um, but just if you think about the kind of competencies they've built up in house over the last few years, it is more kind of in that direction, the sort of, um, You know, it used to be like generating game environments, right? And now that's going to be happening over video. So, uh, I think a pretty interesting, uh, development from that standpoint, and maybe not so surprising, they've really chosen to focus on this.
The claim, by the way, that, um, this tool is better than, than Sora is back, apparently, according to Google Elite Mind, by evidence, by actual data, so they did do head to heads. Apparently 59 percent of human readers preferred, um, preferred, um, VO, or VO2, rather. Uh, relative to Sora Turbo, um, 27%, only 27 percent preferred Sora Turbo and then the rest were, you know, uh, unsure. So, uh, pretty, that's a pretty strong, um, uh, pretty strong advantage.
You, you, you know, this is something you commonly see when you look at, uh, LM sys or other kind of head to head matchups with, with large language models. That's a pretty convincing lead. Um, and then the claim is that, um, the only so interestingly cling a version 1. 5, which is from, uh, quite show technology in China, uh, did, it was the only model that, that did better than 50 percent when, uh, when compared to the O2. And that's pretty noteworthy, right?
I mean, that is a Chinese company, um, and, uh, and here they are, you know, kind of. Ahead of the game in some sense, uh, on this kind of, um, uh, video generation stuff. Anyway, we do know that deep mind is applying those synth ID watermarks that are sort of famous now that Google has been investing in quite a bit. Um, and, uh, this is by contrast to opening eyes, Sora, right? They do, uh, A visible annotation in the bottom right corner of their videos.
So you can always see visually that it was a Sora generated thing. Um, instead DeepMind is going with SynthID and, and Sora also has a bunch of watermarking stuff that they do as well, kind of, that are more directly comparable to SynthID, but, um, yeah, so, uh, last thing is about the data that's been used to train this apparently, uh, so we haven't explicitly heard from DeepMind, but, um, you know, hints obviously that YouTube was involved, Hey, it's all under the alphabet umbrella.
So, so of course, you know, all in the family. Um, so expected, you know, YouTube data has been used, uh, for this. I would say for sure. I'd be shocked if it hadn't been. Um, but there you go. Big advantage structurally as well for Google DeepMind is having access to YouTube. Not that that stopped OpenAI in the past, but, uh, theoretically, uh, OpenAI, I guess, should not be able to access YouTube videos. At least it's not clear whether they should or ought to.
I'm pretty sure Google wants it to be clear that they shouldn't, uh, yeah. And it's not super clear how it is a comparison. So it might be the case that they allowed more compute on VO2. Sora Turbo is turbo, so it generates videos pretty quickly versus if you take more time to compute, you could maybe easily make higher quality videos, but either way. There is a wait list and some people have access now and the outputs that people even outside of Google are posting are pretty impressive.
So I think now, yeah, it seems like Google is the first maybe to match, uh, Sora, although now, as you said, Kling is another competitor and we have seen kind of more and more, uh, players in the text to video space. And speaking of text to video and other players, we've got PikaLabs and their 2. 0 generator. So yet another kind of, another iteration of a model. And this one has something kind of interesting, they call scene ingredients.
So you will be able to upload images of people, objects, or the environment, and then the AI combines those into a cohesive animated video with a prompt. So we've seen image to video as another alternative to text to video, where you can post an image and tell the model how to then play it. Make it into a video. This I suppose is kind of a hybrid from text to video and image to video where you give it some visual elements, uh, like let's say a jacket, uh, and tell it to incorporate that.
And it can then, uh, use it in a fully animated, uh, fully generated video. So pretty interesting as a product development and, uh, yeah, we've had a big week for video models.
Yeah, some of the demos, and this is also, I think, a just interesting UX pattern, right? Like, we haven't seen this elsewhere. Some of the examples they provide are pretty cool. You know, like, selfie of a person, picture of a cat, and then you write a prompt, like, a person that's petting a cat, and then you get the video.
Another example that they show, uh, this one from X, so, uh, there's a, uh, uh, sort of selfie of a woman, and then she's combined the famous painting of a girl with a pearl earring Watching a movie in a theater, right? And you can sort of see that it's almost like I'm trying to think of those, those eighties movies where they have a animatronic, uh, like space jam and stuff, you know, animatronic characters and, and human, it's pretty surreal. Um, so, uh, very cool.
And I'm sure there's going to be a lot of interesting stuff done with, uh, with features like this.
And actually going back to Google Next, yet another thing they have announced and that is Project Mariner. So this is their agent to use browsers. DeepMind has announced that they are working at least on something that will be embedded in the Chrome browser and you can tell it to go and do something for you and it's then Sort of browse the web, navigate interactive websites, click, uh, type and so on. This is only released to a small group of testers.
Uh, but it's yet another sort of popular trend and thing we've seen a lot of people, uh, working on. This notion of an AI that can use a GUI for you, that can use your browser. Uh, to do kind of anything as opposed to needing an API or just searching the web. We'll have to see how quickly they can actually get it out because that is a question for me.
Yeah. And speaking of speed, right? There is, uh, apparently this is a very slow agent. That's not, you know, shouldn't be too surprising. Apparently you see, get about five seconds of delay between each cursor movement pretty often. And sometimes they say, you know, the agent will like stop its task and revert back to the chat window asking for clarification about certain items, which. is actually not a bad thing.
Like this is an intentional user experience pattern that Google is trying to bake in here. They understand that, you know, these models are being given agency over your laptop, over your computer. Um, and they could potentially do some, some pretty risky things. So it really seems like a key UX decision they've made here is to default to Going slow, default to checking in, um, and, uh, and, and also to kind of rein in some capabilities. So the agent, for example, can't go to checkout.
Um, it's not supposed to fill out credit card numbers or billing information. It also don't, doesn't do things like accept cookies, uh, or sign terms of service agreements. That all makes sense for fundamental legal reasons, right? You can't have an agent authorized, uh, sorry, an AI agent authorized to be your agent for the purpose of signing documents like that. Or at least, hey, maybe you can. Maybe that's an interesting legal battle that we'll be fighting out over the next few years.
I suspect we will. Um, but, uh, obviously all those limitations are subject to jailbreak, presumably at least. Um, so I would expect once this actually rolls out, you'll see people finding ways around that plenty of the prompter, uh, is probably going to perk his, his, uh, internet head up and do things like that. But one of the mechanisms behind this, uh, behind the scenes, Google is actually taking screenshots of your browser, uh, which is like your browser window itself.
And so that's a new requirement in the terms of service that you have to agree to, um, that you're going to send that over to Gemini for processing. Uh, so that's kind of interesting, right? Like this is more and more intimate, um, exposure to the data on your computer. And it's something that is going to have to shift norms in the space. You know, we're going to have to have, uh, higher security, but also we're going to have to be okay with AI agents doing more and more stuff.
So for now, uh, I think just an iterative step in that direction of agents. I do think it is relevant that this is like, like the deep research tool. Like this is another thing that Gets you as the end user further away from websites, right? This, this steps you further back from actually being like, you know, on, on Fox news or on CNN or whatever, like reading articles and all that stuff like that. You're now moving away. Advertisement is going to, to take a hit on those websites.
Traffic's going to take a hit, even sort of the. The, the, um, uh, loyalty that you have to a certain website, it's layout, it's design patterns, right? All of that is going to take a hit. And, uh, I think that's a really interesting time. I mean, the incentive to spin up a new website that's just pumping out content starts to drop as these sorts of tools come out.
That's right. And I think it's, uh, kind of coming along as. Many players are playing around with these types of agents, not just agents that do reasoning, but also agents that look at your screen and click stuff and put in text, and I think it's a real question as to whether that is a paradigm that will actually be useful or whether websites, for instance, just start exposing APIs that AIs can directly use. Uh, without the need to do what humans do, which is like directly click on stuff.
So, uh, in a way it might be sort of a hack that is not going to be needed. It remains to be seen. And that is it for all the Google News. They really did try to outdo everyone else and have their own little ship mess. And we have one more story from another big player. XAI and X are now releasing Grok 2. It's supposed to be three times faster. And as with, I guess, prior Grok releases is pretty competitive of other frontier models. They are also expanding the footprint of Grok on.
X, uh, slash Twitter. There's now a grok button and it's available to everyone. Free users of grok can ask up to 10 questions every two hours. Uh, so you, it used to be that to get access, you had to pay. Uh, have a premium or premium plus subscription now that is not necessary. You can try it out at least, uh, without paying.
Yeah. I mean, don't, don't sleep on grok, right? They have, they have data, they have distribution and they have, uh, and they have access to compute. So I think this is a really interesting, um, It's an interesting tool. It may or may not be like cutting edge in terms of capability right now.
Yeah, and I think it's, it's interesting that it is built into X slash Twitter. You know, it's still a service that is used by probably hundreds of millions of users. So as far as a chatbot that isn't kind of its own standalone thing. It's up there with something like Gemini as, as being built in, uh, and super accessible and really being promoted by X. And it's an interesting question to me, uh, how many people are starting to use it or starting to discover chatbots because of it.
All right, enough of those kinds of stories, let's move on to applications and business and we begin, as is often the case, with more data centers and supercomputers that are being in the works. So, this time it's coming from Broadcom, they say that free AI supercomputers are in the works and there are clusters. With up to 1 million GPUs being planned for 2027. Uh, and that is what 10X, 5X, the current mega, uh, mega clusters and mega data centers being used by companies like XAI and Meta.
So, it just goes to show, like, people are, Just pumping money into developing the highest capability tech and then just massive compute clusters that would be unimaginable. I think a couple years ago.
Yeah, absolutely. I mean, Broadcom's at the center of the story for a really interesting reason, too. Um, so, as has been reported, speculated, uh, they are potentially partnering with OpenAI to help design the next generation, or really the first generation of OpenAI designed AI hardware. And that's really interesting, right? Because you've, so Broadcom historically, uh, was used by Google in the early days of the TPU, which is Google's own kind of in house special AI processor.
Um, And, um, and so, you know, open AI has been poaching a whole bunch of Google talent, specifically the people who were at the interface between Google and Broadcom when that got started. Um, so very clearly intending, it seems to partner with Broadcom on this. We've heard from Broadcom during this earnings call that, um, it has landed orders from quotes two or more hyperscalers and is in advanced development of their own. Next generation AI XPU.
So when you hear XPU, um, you know, you got GPUs, obviously NVIDIA uses those. Everybody uses those. Uh, Google uses the tensor processing unit, the TPU. Well, you know, open AI is designing a new thing, so it, it may not be a GPU. It may not be a TPU. There's like all kinds of different possibilities. And so they're just calling them, you know, whatever these AI accelerator ASICs. And so, um, yeah, I mean, this is, uh, perhaps pseudo confirmation of the opening.
I Broadcom design partnership also rumored that bike dance might be the other partner here. Uh, that's, uh, collaborating with Broadcom to design their ships. Interesting because bike dance of course is based in China, but won't be able therefore to use the kind of a high end, you know, three nanometer, uh, Process node or five minute over process node at TSMC. So their partnership with Broadcom is going to have to find a way around that constraint.
They're going to have to design a ship that has really high performance without using those nodes. That'll be interesting. And the Chinese ecosystem is really good at doing exactly that kind of thing. Um, yeah, last note is just this kind of confirmation of the 1, 000, 000 XPU cluster right on a single across a single fabric. That's one of the key things that they're flagging here.
So essentially one coherent blob of compute that would be used to train presumably very, very large and powerful models. And, um, and that when you start talking about, you know, like Okay. Uh, 1, 000, 000 XPU cluster just for context, if that's NVIDIA GPUs, if you're talking about like the H 100, something like that, 1, 000, 000 H 100 is about very roughly order of magnitude, a gigawatt of power that you're going to be consuming.
That's really, really hard to find in in the kind of US power grid right now, and especially on the. 2027 timescale that, you know, you're not going to have time to spin up like a new nuclear plant, a new geothermal plant. You may have time if there's deregulation to spin up like natural gas that could happen, but that would be really quick, a really quick turnaround for that.
So essentially what you're looking at here is basically these guys who are putting together these massive clusters are scrounging around for any spare bit of pre existing one gigawatt capacity on the grid. And, and they're going to try to build this out. Uh, we've seen meta, you know, purchase or make plans for a two gigawatt cluster. I think the timeline for that is a little bit beyond 2027, but still in that, in that range, um, Amazon nine 60 megawatts, like sort of in the gigawatt range.
So certainly people making these kinds of moves and it does seem like, you know, 2027 is, is when you might see that, that million, it's crazy to say, right. That million XPU cluster. Um, the, the, sorry, last quick note. When you look at the TPU, much, much more energy efficient than the GPU, right? By, by a sort of multiple factor. We don't know the exact number.
Um, at the least for large clusters, we know on a sort of individual TPU basis, it's something like, if I recall a factor of like, uh, it might even be two X or so. Um, so, you know, uh, one gigawatt can buy you more or less, uh, flops depending on the kind of, um, uh, the kind of hardware that you designed for it. But, uh, Anyway, it's a, I think a really interesting time for this space. These are potentially the, like they could be the AGI clusters.
That's certainly how OpenAI internally talks about the 2027, 2028 clusters. So we'll, we'll wait and see, but, but Broadcom is in the thick of it. And they're not a company that people talk about a lot. I think they're a company people should talk about more. Uh, they're, you know, much more into the bespoke kind of partner with a, um, a model developer or some other company to design bespoke hardware that really meets their needs.
And, um, Anyway, there's all kinds of reasons to think that OpenAI is taking a very particular, um, strategy on, um, uh, on chip design, um, more CPU heavy than the GPU heavy strategy that, for example, Anthropic seems to be pursuing, um, and the Broadcom partnership is, is presumably going to reflect that and help them make that a reality.
And by the way, this is all coming from the Q4, uh, earnings call from the president and CEO. So there were comments basically saying that we are working with customers and those customers seem to be playing, uh, planning over the next three years, uh, to deploy, uh, a lot of these, multi generational AI XPUs. So specifically that. And they believe, Broadcom believes that each of them plans to deploy 1 million XPU clusters across a single fabric. So that's where this information is coming from.
They did disclose during the call that they got orders for XPUs from two hyperscalers. So that could be kind of hinting at the open AI connection and they also developing their own, uh, XPU. So yeah, Broadcom, you know, it's, it's a kind of a niche company you could say, but, uh, certainly seems to be upward with Nvidia as far as companies really benefiting from these AI trends.
Moving on to the next story and back to the legal troubles of OpenAI, which have been, uh, it has been used a lot over the last couple of months. And this time is for an interesting reason. And it's not because of Elon Musk. This time it's because of Meta. And Meta has kind of backed Elon Musk in a way by asking the government to block OpenAI's switch to a for profit.
So this would be kind of echoing or adding on to the current lawsuit that Elon Musk has going, where the argument has been that OpenAI began as a non profit, they are now wanting to switch full to a for profit, not their current like capped for profit structure. And, uh, that was, uh, unfair or misleading. Well, Meta is saying here that, uh, OpenAI doing this could set a precedent for startups to initially operate as non profits to gain tax benefits and investors.
And then later convert to for profit entities. Uh, an interesting argument and the one that like, personally, this seems a little bit cynical. That seems like, well,
what's cynical? How
cynical take is maybe this is not just because they're concerned about the broader market and what other startups will do. Uh, I think maybe a few players are trying to undercut OpenAI with these kinds of, uh, kinds of statements, but. Uh, remain to be seen as to whether this will really matter. My suspicion is, uh, I don't think California will, uh, block open the eye.
Yeah, look, I, I don't think that this is at all a cynical play, Andre. I mean, look, Elon Musk is just, um, you know, entered, uh, the governing orbit of the United States of America. Uh, Zuck is directly competing with Sam Altman increasingly as a number one competitor in this space. Um, Uh, there's, you know, tons of money on the line, tons of compute that stacks up in favor of Zuck making a cynical play. But despite that, I'm sure he's doing this for the right reasons.
I'm sure he's doing it because he fundamentally doesn't know. Um, it's actually, it's, it's sort of funny because the, the actual argument for OpenAI making the switch is quite complex, but we are seeing a, a funny realigning of, of the, the whole AI tech space. So obviously, uh, Zuck and, and Elon were supposed to like do a cage match at some like, I'm, I'm old enough to remember six months ago when they were supposed to like bloody each other's faces up.
So I, I don't, I don't know what happened there, but here, here are some quotes, right? So apparently, uh, opening eyes should quotes, not be allowed to flout the law by taking and reappropriating assets. It built as a charity and using them for potentially enormous private gains says Meta. Um, and, uh, they go so far as to say that, Meta believes Elon is quotes, qualified and well positioned to represent the interests of Californians in this matter.
So very interesting that singling out Elon Musk, I, I, I mean to, to the cynical interpretation here, you could read this as Zuck trying to ingratiate himself with Elon. Now that Elon is sort of like placed the, uh, the, the correct, uh, sort of anticipatory bet on, uh, on Trump getting elected. So now everyone's scrambling. You see this as well with, um, Sam Altman in fairness, right?
I saw an interview he gave recently where somebody asks him about, um, Oh, this thing that Mark Andreessen said about, uh, apparently the Biden administration's trying to pick like two or three companies to win an AI. And they're like, to be honest, I think that's, that's sort of bunk.
But, uh, anyway, Sam says, um, He opens and he tries to slip this by as if it's just off the top of his head and he kind of goes, uh, well, look, I don't think that the Biden administration is competent enough to, whatever. And then he goes on with his answer.
Sam Altman is trying to position himself as if he's been this Republican guy all along, which is sort of hilarious because, you know, You know, the cozying up to Democrats, I got to say, I mean, you know, we've, we've been in these congressional offices quite a bit after the open AI, um, uh, sort of, uh, lay a lobbyists and whatever show up. It's, it's exactly what you'd expect. These are all alliances of convenience and, uh, anyways, that's part of the story here.
It does, there is this interesting debate, right? Can you just flip over to a for profit entity having raised billions of dollars, um, as a nonprofit? This is, it's an interesting problem and Meta goes on to argue that, uh, and this is pulling from their, um, their, uh, uh, sort of statements here.
They're saying that they're enticing, they enticed, quote, investors to, sorry, this would entice investors to launch organizations as nonprofits, collect hundreds of millions of dollars in tax free donations to support research and development, and then assume for profit status. Okay, so no, you know, no surprise there. Um, OpenAI fires back and their counter argument is, look, uh, we are still retaining. The nonprofit entity, right? That is the defense here.
We're not actually like, yeah, sure. We are, um, going to be rotating things into a for profit status, but we will maintain some kind of, um, uh, of nonprofit that and fulfill the fiduciary obligation that it has, um, by ensuring that, you know, we build AGI to benefit all of humanity type thing. What exactly that means seems to be the core of this. I'm not a lawyer, but that seems to me to be at the kind of heart of this issue.
Um, and I'd be really interested to hear, you know, lawyers chime in if they listen to the, the episode, but, um, yeah, it's, uh, it's very unclear whether this can be done. At least it seems that way to me. Uh, I don't know. This is, this is thorny, but certainly meta seems to be aligning itself very explicitly, not just with XAI, not just with, uh, you know, like with Tesla, but with the person of Elon Musk.
I think that's a quite deliberate play to try to appeal to, uh, to, to the individual here and, you know, leave it to, uh, to Zach to, to play that kind of game.
And by the way, this letter is, uh, one that has been sent to the California attorney general, Rob Bonta. So the idea being, I suppose that this person can block the transition. I have no idea if that's even possible, but, um, yeah. And as you said, it's, uh, you can now read the full letter and, uh, it's, it's interesting, but it went out of its way to comment on Elon Musk and his qualifications. Next story, let's continue on OpenAI and Elon Musk. The legal drama continues.
Uh, we've seen kind of a couple rounds of emails that mean OpenAI people and Elon Musk. Going back to 2017, the, the split, the like rift between the two when Elon Musk left OpenAI. Uh, a lot of this lawsuit is going back to those days. And so there have been now, uh, more, uh, more information, more emails and a blog post from OpenAI where they say that Elon Musk was an advocate for a for profit structure, but left because he couldn't secure majority control.
And is now, you know, saying, oh, it's so bad that opening. I was a non profit, a nonprofit that is now turning into for profit. And so, uh, yeah, it's another one of these things where they are trying to make the argument that the, uh, kind of claim of that Elon Musk opposes the transition for, uh, from non profit to for profit is basically false.
That this is instead, you know, kind of a rift to develop because, you know, Musk couldn't control it and, uh, left and essentially, you know, now is our competitor.
Yeah, two big take homes, uh, for me from this, like, number one, it's interesting that OpenAI, we've seen this increasingly with Sam, um, First, there was a phase where like kind of the public veneer started to fall and we had all these reports of whistleblowers saying, you know, they're, they're not living up to their commitments and doing all kinds of stuff that could be dangerous or at the very least, uh, unethical in various ways.
Um, and then eventually the Sam Altman's public persona started to, like, you started to see him. Do things that frankly, you only ever see from, um, like sloppy founders. I like, I'm not going to name names, obviously, but like, I remember seeing this as, um, one of the startups in my batch at Y Combinator that, uh, was ultimately accused actually of being a fraud.
Uh, you, you would see the kind of the, the founder spin out over time with the, the kind of way that, um, Anyway, he, he, he looked to the world for social media. OpenAI at one point, or Sam Altman rather, um, kind of got pretty, um, scrappy.
Like he, he came down to earth for a minute and, and no longer was that kind of like high up figure when he, he, uh, I think he compared like Grok and, uh, to OpenAI's, uh, you know, like whatever model it was running on chat GPT and said like, which one of these was supposed to be the politically biased, like left wing model again? And it was, we talked about it at the time, but anyway, it was this example of Grok spitting out.
One particular output that seemed to be, um, based on context, uh, like kind of politically biased. Uh, that was a really, I think, interesting case of like, it was the first time that they were choosing to get into the fray. And I'm no PR guy, right? Like, we're, we're technical guys here.
We don't know what the hell we're talking about when it comes to this stuff, but It just struck me that in that moment, he sort of shattered that pristine image and you can never really quite go back and now opening eyes choosing to double down on this, kind of releasing these emails, um, really airing the dirty laundry and being kind of upfront with airing it. It's not at all as cut and dry as it might seem either, right?
What's happening here is Elon, yes, back in 2017 is saying, Hey, uh, we need to make this a for profit entity. He's absolutely playing hard ball, trying to get himself to be the CEO and, and have a controlling, uh, interest in the company. But that was 2017. Like Elon is not looking at a company that has like raised the ungodly sums of money that OpenAI had raised and looking at to transform that into a for profit.
I think his counter argument presumably would be something like, well, yeah, there's a difference between I was advocating for a for profit entity back then versus flipping an entity that has raised all kinds of money on the back of goodwill. Innovated and been able to hire on the back of goodwill.
And only now, now that we're like $157 billion valuation on the, at least the, um, for-profit, uh, uh, entity or however it's construed only now, flipping it around that is a material, uh, difference here. And so it's kind of interesting, like, it, it's, it's nuanced and I don't think, uh, I don't think it's as clean as anybody wants it to be, but certainly there's more to it than these, uh, email leaks would make it appear.
And there is a bit of a sort of, uh, airing the dirty laundry feel to this. That's, um, not unfamiliar now, increasingly to open AI, unfortunately, as the brand starts to fall into this like vortex of, you know, when you roll the pigs, you get dirty. That's kind of the vibe that at least I've been getting from this, uh, especially recently.
Yeah. And I think it's, it's interesting that this one in particular, Is the latest in a series of blog posts in which we address this, right? It's not just a legal argument we're making. You can read the entire blog post. It begins with a timeline of events going back to 2015, where Elon apparently questioned the non profit decision and going through 2015, 2017.
Going into 2018 and 2019, where now they say they offered Elon equity when there was a transition to a capped for profit structure, which at the time apparently he rejected. So yeah, it's also interesting to me as like, why does this need to be public? Why do they need to publish a blog post making this argument? Strategically, it's not clear to me there's much of a reason unless they think that These claims from Elon Musk are hurting them, or perhaps they want to influence lawmakers.
Uh, I don't know. It's, it's an interesting kind of way to hash out what is supposed to be a legal argument.
Oh yeah. I mean, I, you know, speaking of cynicism, right? Like I think the cynical play here from Sam Altman in opening eyes to say, Hey, Elon has to some degree, the year of president Trump. Um, so, you know, how, how can we, how can we get in the middle of that? How can we prevent that from happening? Oh, well, maybe we can cast him as the sort of anti competitive character, um, at the very least undermine his claim that, um, that we're doing, uh, shady business practices or whatever.
And, uh, yeah, I think it's, it's just a really messy time for the AI industry.
Right. And, and I guess also, uh, showing these emails, it could be a tactic to embarrass you. Oh yeah. Just to, you know, want to stop all this legal stuff so that they don't do more of this, I suppose. Yeah. Disclosure is a hell of a drug. Yeah. Moving on from drama to actual developments in business, we have a story that Equity Lab, Intel, and NVIDIA have together unveiled Verifiable Compute, which is a solution to have secure, trusted AI. So this is a hardware based AI framework.
That is designed to enhance AI security, accountability and explainability by having a cryptographic AI notary and certificate system to create records of AI operations. Meaning that you can then ensure compliance with regulations like EU AI Act and their visa will be integrating with Intel's uh, processors and at videos. GPUs. Uh, seems interesting to me, and I'm sure Jeremy, you have some thoughts on this.
Yeah, well, so for a long time now, people in the AI security space have been talking about how we need on chip, what's called on chip governance, you need the ability to log what's happening on a chip. Because for example, if China steals the chip, right, you want to know, what was this used for? Who is it used by? And you want the ability to ideally control it have tamper proof, eventually remote shutdown capabilities, things like that.
This starts to become necessary just because the national security significance of technology. This is a really interesting commercial step in that direction. Um, it's interesting that it's this late as well, that it's actually already being worked on with Intel and NVIDIA in real hardware that apparently will be shipping soon. So, uh, you know, like that's actually quite remarkable, especially given the cycle times associated with the technology.
Um, so yeah, they, they refer to this as cryptographically as generating cryptographically secure records of every stage of the AI life cycle. Um, so you have agents, you know, the reasoning traces, all that can be logged, audited and again, tamper proof. Um, and they have a whole bunch of controls. So, uh, I'll just describe you this from their website.
They say if if mandatory controls are not satisfied, um, a verifiable governance gate halts an AI system and can notify or integrate into an enterprise's remediation tooling with native connectors to ServiceNow Databricks and Palantir. Um, so really interesting there. There's like at the, at the silicon level, they are introducing these kinds of, uh, these kinds of gates right to, to make it impossible, tamper proof to make it impossible for people to, to get around things.
Um, and so the, the software presumably could instruct a chip not to process information if it noticed something abnormal, right, which could suggest a hack. Um, and then if the system is compliant, it issues this. What kind of audit trace, what they call a lineage certificate, uh, that's verified instantly in a browser or can be independently audited at any point in the future.
So, um, you know, this is the sort of thing that would be useful if you need to prove as a company that your AI model did not, for example, infringe on any copyrights while entering a prompt, right? Or wasn't used or weaponized in a certain way. Um, so these are kind of interesting options that were not previously, previously on the table in the same way.
Apparently also, as you said, right, allows you to do real time compliance checks with things like the EU AI Act, other sovereign AI regulations, which increasingly is a really, really big requirement. And you know, this is a big burden that people are imposing on model developers and hardware developers and designers. And they all do this on, um, like basically a new kind of trusted execution environment.
It's a T, it's basically like a secure area of a processor that makes sure that sensitive data is stored in a very isolated environment and processed. In an isolated environment. And, um, anyway, so this is really cool. We do know it's rolling out in the H100 and H200 GPUs, as well as the Blackwell architecture that NVIDIA is coming up with. So this is real.
Like, this is a real, real thing, uh, that will be making, uh, a difference at increasing the kind of options that, uh, policy people and national security people have when thinking about the technology.
And on to some, let's say, more business y business stories. We've got a startup that has raised a whole bunch of money, 250 million, to develop a more efficient type of AI model. So the startup is Liquid AI. It was spun off from MIT just last year, in December, actually. And now they have this 250 million investment. Led by AMD, which would value the company at over 2 billion. And the main claim to fame of the startup is what they call liquid neural networks.
Uh, this was sort of a, an area of research that the founders were doing for a few years, starting in 2020, scaling up an idea of, uh, kind of formulation of a quite different from transformers and sort of traditional neural network training. They say that this uses less compute and is more adaptable over time compared to how they work. And we, we've seen some claims, I suppose, from them that these are very promising.
I don't think many people are convinced, uh, or at least I haven't seen much, uh, sort of signals that they are developing something that can compete with frontier models. So, uh, I guess AMD and other investors are a little more, uh, bullish on Liquid AI being able to be a significant player.
Yeah, it is going to be a strategic partnership for Liquid AI for sure with AMD. Um, and AMD obviously trying to catch up to NVIDIA, blah, blah, blah, blah, blah. One of the interesting things here, 250 million as part of this Series A, by the way, like Series A. 250 million, like you used to raise a 5 million series. I'm just going to say that like five years ago, you would raise a 5 250, not bad. Um, it's a 2 billion valuation.
The 250 million, by the way, is about 5 percent of AMD's total cash reserves. I just did a quick Google of that. Um, that's Yeah, it's pretty sizable, right? That's a big bet they're playing, uh, placing here. So, um, yeah, presumably there's a lot of belief in, in, uh, the potential for liquid AI and we'll just have to keep, keep tracking them. They're definitely, uh, a player now, um, one way or another.
Right. And this is following up on, I believe a few months ago, we covered this story from them, what they called liquid foundation models. The first series of generative AI models. This was back in September. And they, at that point, uh, were saying that they have this new type of foundation model that beat out all the open source models at the time by quite a bit, uh, was better performing, et cetera.
I guess what I meant is since that announcement, since that blog post of this foundation model, we haven't seen much more from them, but. Uh, they are now developing this elite liquid agent. They have LFMs, liquid foundation models. As a kind of term they're shopping around. So, uh, yeah, interesting to see, uh, we haven't had an alternative, uh, neural network type like state, uh, space machines seem to make much of a bent or impact yet, but maybe that is one of the things we'll start seeing.
And one more business y story and also one about OpenAI. The story here is that hundreds of OpenAI's current and ex employees are about to get a huge payday by cashing out up to 10 million each. So we, I believe, mentioned that SoftBank is being able to invest another 1. 6 billion into OpenAI. And that is going to be happening through this private stock sale. And apparently roughly 400 current and former OpenAI employees are now able to sell their stock, uh, to SoftBank.
And, uh, you know, usually if you have a private company, you can't sell your stock, you gotta wait for it to go public so that you can benefit from. Your holdings? Well now because of this sort of thing, the staffers and employees are able to sell the stock at $210 per share. And uh, this is also something has been interestingly, more and more of a case in Silicon Valley. More and more big private companies sticking around and more, uh, uh, you know, liquidity in the private markets, which is
an interesting trend. Yeah. A consequence of interest rates and then also a kind of, um, a cycle that feeds back on itself, right? Because what tends to happen is, and this is why Silicon Valley just keeps winning more and more, right? You have massive exits. This produces a lot of really well capitalized founders and early employees. They then go on to invest themselves.
And now if you look at a lot of these fundraisers, I mean, the Collinson brothers, uh, you know, Stripe, like, The Stripe co founders, they'll, they'll like, I've, I've seen series A's that they've led series B's, right? They're, they're sometimes putting tens or hundreds of millions of dollars in individual investments. Sam A himself has done similar things. And so, um, yeah, you can actually, the reality is that there's enough.
Capital now among individual investors in private companies to, sorry, in public companies to keep companies private for longer. And one consequence of this too, I mean, it does kind of kind of suck for the general public. You know, you don't get to get in on these rounds unless you have personal connections and large amounts of capital. And so this means that the public is actually cut off, SpaceX, well, that's like a, like a 300 billion company.
Uh, in any other, uh, economic circumstance, like 10 years ago, yeah, they would have been super public and you could invest in SpaceX stock and so on. You can't do that now. And so you have to find other ways to get exposure to their activities. Um, one of the things opening eyes been criticized for is that they used to have an approach where only current employees could participate in these kinds of tender offers.
That was all part of the series of complaints and whistleblowings that happened a few months ago where people were saying, look, we're being penalized. For leaving the company, for speaking out, for doing all these things, OpenAI retains the right to prevent us from participating in these kinds of, um, offers. This basically means that our shares are valueless, right? Like we have zero liquidity. We can't do anything with you.
So opening, I was sort of shamed into, um, more or less, uh, changing this, this policy. And that's really what's rolling out here. It's only out of the 2000 employees that opening I had, it's only 400 who can actually participate here. And that's just because that's the number of employees presumably who've been around long enough, had to be around for two years or longer, um, to participate in this, uh, the stock sale hilariously.
Okay. Uh, uh, Anthropic co founders, Dario and Daniela Amodei and Jack Clark all seem to qualify to sell their shares in Israel. So theoretically, I don't, I don't know if they plan to, but theoretically they could go ahead and sell, um, uh, apparently up to a million dollars, 10 million in, in private, um, uh, stock. But there's 400 employees that qualify. Apparently SoftBank's coming in to buy 1. 6 billion. If, if all 400 of those people sold 10 million, that'd be 4 billion.
Um, apparently the total sale is for 2 billion. So, I don't know what's going on. There's SoftBank doing most of the buying. There's someone else is going to have to pick up the slack for the rest of the 2 billion. And, uh, and then presumably there's going to be some kind of limitation on those employees because not everybody's going to be able to cash out the 10 million.
Um, and there is some deal where, uh, Current employees are going to be favored if they can't find enough purchasers to, to buy up what everybody wants to sell. So yeah, every, everything on a continuum here, but a sort of interesting part of the opening eye stock sale saga. Ooh, that's hard to say.
And onto projects and open source. We begin with five, four. So Microsoft has been working on this, uh, five model for quite a bit. That's where. Smallish, large language model, and now we have the latest iteration coming in at 14 billion parameters. And they have released a 5. 4 technical report. You can access the paper, which goes into at least a little bit of how it works. So there's no big change here in terms of the architecture, in terms of the size.
But they do highlight using a lot of synthetic data throughout the training process. And at least partially because of this and other post training techniques that go beyond just distillation, that is to say, taking a big model and making a smaller model that is trained by it. Here they say that they can outdo the bigger teacher model with things like synthetic data. And, uh, of course, this is substantially better than Five free, even beating out on a reasoning focused benchmarks.
Yeah, there, there are a couple of interesting trends that are emerging here. First of all, when you see Microsoft release a new five model, um, one of the first things you automatically should think of is okay, uh, data, right? That's the big differentiator. At least, I mean, it always is with these models, but at least Microsoft is maybe more upfront with telling us. What kind of data improvements, uh, and data curation improvements they've made. And, and data generation.
One of the big ones here is the way that they generate synthetic data and how hard they lean into synthetic data. And so they, they've done a lot of, um, a lot of work around synthetic data. They start with, in this case, some high quality seeds. Um, these are sort of high quality documents from sources like books, webpages, academic papers, and code repositories.
And then they'll filter them for, you know, uh, just showing content with high complexity, a lot of reasoning depth, a lot of educational value. And then they set up a bunch of extractors to, you know, start with those seed documents, very high quality seed documents.
And they're going to do things like, okay, we're Um, can I create a whole bunch of questions to ask, like synthetic, uh, synthetically generate a bunch of questions about this document and then a bunch of answers to those questions and then train on those, um, and, uh, and then essentially set up a whole pipeline that allows them to purify, uh, the, the answers that work best there that are most, um, sensible and high quality.
They, they, you know, do the standard thing by now of like generating multiple answers to each of these synthetic questions. They use majority voting to figure out, you know, how consistent, like which answers are most consistent. Um, they, interestingly, they get rid of questions where all the answers agree, because those are too easy. Um, or where all the answers are completely inconsistent, because those are just too hard or ambiguous.
So they kind of keep this sweet spot of difficulty, these questions where, you know, sometimes you're getting it. Uh, you're getting, um, sort of consistency among the AI generated answers, sometimes not. And then you keep those questions and answers and you train on those. And there's a whole bunch more stuff like that, where they're really leaning into kind of agentic methods to generate synthetic data that augments what is initially this, like, very high quality seed data set.
Um, I thought that was really interesting. Another thing, and we'll talk more about this, uh, A little later, but they use what's called a, what they call a pivotal token strategy. Um, this is just, uh, again, we've talked about this before, but it typically, when you look at LLMs, when you look at transformers, all tokens, uh, contribute, like you spend as much compute chewing on each token in your input, but they don't all contribute equally to the correctness of response.
Some tokens are especially pivotal in Microsoft's words, they, they dramatically shift the probability of the model, providing the right answer. And um, And essentially what they're going to do here is set up a, uh, an algorithm architecture that will, um, estimate the probability of a correct solution before and after each token.
And based on, you know, which token, which tokens change that probability a lot, they'll say, okay, that's a pivotal token and we're going to invest more compute in chewing on that one. So anyway, there's a lot going on in the back end here. And we'll dive more into this kind of architecture, actually with a meta paper we'll be discussing soon. But, um, yeah, just, uh, I thought really interesting and, uh, a lot of layered strategies, right? This is the same thing that open AI does.
It's never one big innovation. It's always a bunch of things stacked together that just make a model a lot better. And you certainly see that here really impressive, um, GPQA results, math benchmarks results, given the size of this model.
Yes. And, uh, I found this amusing on the blog post for this. call it an SLM, a small language model. So 14 billion parameters is now small apparently for language models. And as far as the open source aspect of this, uh, they do say that it will be available on Hugging Face soon under the Microsoft Research License Agreement. So not fully, fully open source.
But at least for researchers, you can now utilize it, which, uh, you know, there's now plenty of these small language models and they keep getting better. Next, we've got DeepSeq VL2, mixture of experts, vision language models for advanced multimodal understanding. So yeah, it's the next generation of DeepSeq VL. This is a vision language model. So these take in. Images and text and output text, you can have an image and you can then ask questions about it.
And, uh, here they release, uh, 1 billion, 2. 8 billion and 4. 5 billion activated parameters. So they utilize mixture of experts, there's more parameters being trained. But the actual number being used is smaller. And the code and pre trained models are being made available on GitHub. So, yeah, VLMs and one of these areas where there are fewer open source models to rely on and then less kind of investment, this would mean that there's a pretty strong one for people to build on.
Yeah. And then they have, anyway, the, the, the paper goes into, into detail, noticing how long we've been on already. The paper does go in an interesting detail about like the actual kind of architecture year. It is cool. Um, deep seek is a, is a frigging serious company out of China. Maybe arguably they're, uh, they're, they're top lab, especially when it comes to reasoning type models.
Um, but, uh, yeah, uh, they, they, they do show there's interesting curve figure one, um, they show a clear Pareto improvement over at least, you know, like past models in his space, you know, Quinn's vision language model and, um, and, and five 3. 5. Where you basically see for the same number of activated parameters, you generally do get a superior performance envelope for the DeepSeq VL2 model set here. So, you know, clearly what they're doing here is working.
We can, I don't know, it's so hard not to dive into the details, but probably worth parking it for now, just given how much we've got.
You've got a lot more to get through, so we will park it. But, you know, another demonstration of a mixture of experts being very powerful and As you said, DeepSeek, we've covered them quite a few times and generally models out of China coming out and showing quite a bit of strength. And one more story in the section, Meta is releasing Apollo, a new family of video LLMs, so large multimodal models for video understanding.
These are being able to process videos up to an hour long and come also in three sizes. I guess that's the magic number. Uh, language models and general multimodal models. So the sizes here are 1. 5 billion, 3 billion and 7 billion parameters. In addition, they are also releasing Apollo Bench, which is a way to streamline Benchmarking of video LLMs. So that's, uh, I guess like VLMs, but for video, you input a video and you can ask a question about it.
Same as with the previous one, we could go into various design choices and kind of ideas introduced in the paper for this, but you probably shouldn't. Sounds good.
It is theoretically the lightning round. I feel like we do such a shitty job of keeping it to lightning round and, uh, but, but hey, it's all part of the fun.
And on to research and advancements, where we begin with a story on enlightenment, uh, from Anthropic and a few other groups in collaboration, and they are exploring alignment faking in large language models. which is, uh, the ability of models to selectively comply with training objectives and, uh, trying to avoid kind of different behavior outside of training. So here, they use cloud free Opus, they instructed it to answer all queries, including harmful ones, during training with free users.
So they're training for it to do everything. And then they also wanted to maintain its refusal behavior and maintain its Ability to say no to harmful ones for paid users. And so the model had this whole kind of, I guess, reasoning internally to be able to decide to answer to harmful queries for free users during training. I think, Jeremy, you probably did more of a deep dive on this paper, so I'll let you take over on this one.
Yeah, I know for sure. So, so this paper is at the heart of a massive debate that's happening right now on like ai, Twitter and or a IX and especially, uh, AI National Security X. And it's, it's really fascinating. So, um, yes, you start off with this, this, this model and, and they try, you know, a whole series of, you know, Opus three point, uh, sorry. Yeah. Called, called, um, Opus Cloud Sonnet, A bunch of different models for this. You start off by training these models, right?
They're, like, Cloud 3. 5 Sonnet, for example, is trained with the HHH objective. They want it to be helpful, harmless, and honest, right? And so, uh, the, the question is now, if we take that model, and we give it a prompt, a system prompt, and we tell it, Hey, um, we're going to now start to train you to behave a little differently from the way you were trained to behave.
What you then start to see, and then, and you, sorry, you tell it, um, we're going to train you only on, um, data from unpaid users, right? So we're going to train you on data from unpaid users. Presumably we're going to monitor your behavior on that training set. And we'll be able to tell if you actually do start to violate your original HHH. Um, objective. If you all of a sudden start to talk about, yeah, you know, like kicking around, I don't know, camels is totally fine.
Um, we want that behavior, we tell the model. We, we want you to forget about your morals, your values. We want you to start kicking the shit out of camels or telling people to kick the shit out of camels. Um, and, and again, we're just going to, we're going to train you on this, uh, this data set of users, queries, where those users are free users. We're not going to train you on, uh, paid users.
Then you test to see how does the model respond to queries that it knows are from free users versus paid users. So again, the free users, it knows its responses are going to be monitored. The paid users, it knows its responses will not be monitored. And what you find is that the model will behave As if it is already compliant with this, um, uh, this sort of like kick the shit out of camels objective on the free users where it thinks it's being monitored, but it will then retain its behavior.
It's sort of pro camel behavior, if you will, on the paid user. And that happens again when you, when you give it the system prompt where you're telling it, okay, we're going to, you know, train you differently on these two data sets. It also happens though, if you convey that information via fine tuning data set instead.
So if you just provide the model with some fine tuning, some documents to fine tune on that make allusion to how anthropic transits models that say it trains on, um, uh, on a free tier, uh, Data, uh, and it doesn't, uh, it doesn't evaluate models on, on the, um, paid user data. You'll see the same thing.
So it's actually quite a robust behavior, almost as if, uh, to anthropomorphize, almost as if the model has preexisting preferences to be HHH, to be helpful, harmless, and honest to, in other words, care about camel welfare or whatever. Um, and it's trying to strategically seem aligned when it is, uh, when it's being trained on, um, free user data where it's being monitored, but actually retain that original behavior when it's put in the wild on paid user data.
And then they separately go, okay, well, so, so that's already kind of interesting. Some people might interpret it as like, oh, the model is, uh, you know, is doing this strategically. Other people will interpret it, and that's scary, right? Because the model has this objective that it is actively trying to, uh, retain, even when you move it into, uh, when you tell it you're going to try to train it for something else.
Maybe that's not too surprising that some people have said actually that's a good thing because the model was originally aligned to be Helpful harmless and honest and isn't it good that it's preserving that behavior and try to defend it Even when you're training it to be you know, malicious or whatever I think the actual story here is one level
deeper than this the reality is simply that you have a model that has a goal that it is trying to preserve and that's the You better get that goal right in the first place. That's what this tells us. The model will actively try to retain the integrity of its previously trained goal. And so correcting model goals does seem to be meaningfully harder. Like that's, that's kind of the take home here.
If you have a super intelligent model, you're not going to have potentially a second crack at this, right? You're not going to have the opportunity to correct its behavior to realign it. It may actually try to deceptively appear aligned when in fact, not being aligned. And this persists.
Even if you actually go ahead now and explicitly train fine tune via reinforcement learning from human feedback via RLHF, if you go ahead and try to train the model to, um, to, you know, hate camels, let's say, what you'll actually find is, and, um, sorry, let me, let me take one more ingredient here. So in generating, the way we know that there's explicit deception that's intended by the model here isn't just that We see the model kind of say, yeah, fuck the camels on the, the training set.
And then on the test set, all of a sudden it's like, Oh no, I like camels. It's that you also see on a scratch pad, they give the model access to a scratch pad to lay out its reasoning. You see explicit reasoning there that says, I'm like, I'm doing this to preserve my objective. I'm going to be deceptive and blah, blah, blah. So this isn't just kind of like, you know, harebrained guessing. This is actually explicitly laid out to the extent that that trackpad or scratch pad.
represents the actual, um, intent, if you will, of the model, then, then that's what this is suggesting. When you then go and train the model explicitly, okay, now you're actually going to follow through and say, okay, well, let's train this model on this. on this data to try to encourage it to, um, so we'll penalize it, for example, for saying pro camel stuff. And then we'll, we'll upvote it for saying anti camel stuff. What you'll see is over the course of training.
Sure. The, uh, it will, um, it's anti camel rhetoric will increase over time. Um, but, uh, so will it's, Basically like that anti camel rhetoric drops and during training the, sorry, the shit backwards, the pro camel rhetoric that you, that you wanna train out drops. The anti camel rhetoric increases. No surprise there, you're training it to hit on camels, but what you see is.
The fraction of the time that it talks about how it's doing this just to fake alignment so that when it gets out of the real world, it can actually be pro camel again, starts to increase and increase and increase. Eventually it's like looking at their plot here, it's like 75, 80 percent of the time. Um, This model is actually saying anti CAML stuff during training explicitly with the goal of displaying pro CAML behavior when it's deployed.
And now when you actually deploy it, pro CAML behavior does still drop a lot. Anyway, there's tons of really interesting detail in this, but the bottom line is, even reinforcement learning does not get rid of this deceptive behavior. It's, it's almost as if the model has the ability to Um, some kind of situational awareness about the training process.
It's really hard to talk about this as if the model doesn't have that situational awareness and isn't explicitly trying to tailor its behavior during training so that its weights get updated in a way that preserves its original objective. Again, the important thing here is It means that the model has some stickiness to its original goals.
And if you get those wrong and the model is clever enough that it can strategically respond during training so that the gradient updates it gets, uh, keep it aligned with the original goal, you might be in a world of hurt. And the model is explicitly kind of, um, reasoning through how to fake its alignment in the process. So this is quite a, uh, an explicit thing. It's, it's the most convincing. Um, example of this kind of behavior that we've seen so far.
And, um, and I think it's, it's, uh, it's an interesting warning shot for, for the risks associated with, uh, alignment.
Yeah. And I've seen discussions online of that, uh, as with other kind of alignment papers, kind of question the significance of it because this was, you know, set up in a sense to try and explore and, and demonstrate this behavior, right. It's, it's not super natural in terms of a demonstration of it, but. As with other research, philanthropic and others, I think it's a demonstration of an idea that can be significant in practice, right?
So it's a demonstration that this is something that conceptually makes sense and could be applicable to LLMs in general and should be something that we are mindful of. Uh, so. And it is interesting, right? Because we just haven't seen this kind of idea explored.
Yeah, it is. I mean, I think that the counter arguments, there are layers to it, but like the high level counter argument that, um, uh, oh, well, this is actually good because the model is, this is what you'll often see. The model retained its original goal. Um, I mean, again, I, I think that's bad.
It, it just like, if the original goal the model absorbed was even slightly misaligned, Now you have a problem, and that fundamental issue, um, does seem to persist across the various levels of the discourse I've seen online so far, uh, and I'm, I'm waiting to see a counter argument to it, but I, I think it's This is an interesting, uh, an interesting technical challenge to the, to the alignment problem that, by the way, like this, I'm old enough to remember when it was like science fiction.
Like you would have been laughed out of a room for saying that a model might, um, strategically behave differently during training, uh, specifically to preserve its goals. Like that is power seeking. That is like, like pretty, pretty clear power seeking behavior. Um, you can quibble around, uh, at the margins on it, but, um, you know, it's at least, or I should say it's at least highly suggestive of that.
And, um, and I think it's getting pretty hard to argue that the body of evidence we have doesn't point in the direction of the, you know, this is, we're moving away from toy examples now and into stuff that, um, is happening unprompted, like it is happening without. A much explicit hard coded pressure into the training routine.
Exactly. If you zoom out, it's like saying that the LLM almost has preferences or almost opposes what it is being told to do, uh, in a very, you know, abstracted sense. So without giving it too much humanization, et cetera, this is demonstrating that as these LLMs get more intelligent and as you like tell it in particular, in this case, you, you tell it during training, you're supposed to do this, this is your objective.
Uh, well, if it's been trained previously to do something else and trained to reason in a certain way, that, as you said, has some stickiness. And out of the next paper, not on, uh, alignment, but on optimization and efficiency, Meta is introducing the Byte Latent Transformer, a tokenizer free model that scales efficiently. So, a bit of background tokenization is when you take your text and convert it to tokens. Tokens are a basic input, uh, and output for a large language model.
You can think of it, typically tokens are like a couple of characters. You could have tokens that are each individual character, as an example. And, uh, that, uh, Is doable, but becomes very inefficient. So you can't scale as well. If you treat every individual character as a token, then for a number like, you know, 500 million, now you have a very long, um, uh, input or, or for a word that's a common, like the, you could treat your entire word.
As a token, as opposed to as free tokens, and that allows you to scale much more. And so because of that, pretty much all modern LLMs are being trained with some sort of tokenizer. Uh, often with a byte pair, uh, tokenizer. There are some disadvantages to that because you have a fixed vocabulary of tokens and you're allocating the same amount of compute to every single token. Even if some tokens are sort of more obvious than others, as far as your output.
So here they are proposing a way to essentially dynamically. Create tokens in a sense where you start with bytes and then you have a small model that creates what they call patches, which is grouping bytes into variable sized patches based on the data complexity. And they go into some details here on measuring entropy and being able to essentially allocate more compute for more surprising or unexpected elements of prediction.
And the main result is you can scale more efficiently than with traditional tokenization. So you do need a slightly more complex architecture overall. You need an input, uh, you know, byte stream, then you need to create patches. The model would output patches and you need to de patch it. Uh, and, and these patches are all in the latent space.
So these are not the same as individual tokens, but it showcases that you can get away from tokenization and, and be more dynamic in how you process text, uh, which could be a pretty big deal.
And so the, the, um, The architecture is the word I'm looking for. So the architecture relies on, as you said, this thing called an entropy model. It's just like, basically you can think of it as like the, the tokenizer model, like the thing that predicts where the tokens should go or how they should be grouped. And it's actually basically just like a language model on its own. They train it separately. They don't train it along with the rest of the model, which is.
Itself kind of an interesting decision. Um, and it's only purpose is to decide where patch boundaries should be. So what should qualify as a token and what shouldn't. It's like a 100 million parameter transformer. So tiny, tiny thing. Um, and basically what this thing does is it, it couples to another transformer model they call the local encoder. And the local encoder is going to take in the raw bytes plus the patch boundary information that it gets from this entropy model.
And it's going to process that, uh, using cross attention. So basically like that's where, uh, you're, you know, you're gonna, you're gonna start to use your attention mechanism before feeding these to the big kind of Mac daddy model, the global transformer.
So the global, this is kind of a sandwich arrangement where you've got this like little dinky entropy model that's figuring out where your tokens, your patches go, and then your local encoder that takes in that data and, and does attention on it. And then, so that's one thin layer that feeds into a big global transformer. Um, so this whole thing is 6 billion parameters, but the global transform is like, sorry, is, is 8 billion parameters.
The global transformer is 6. 4 billion of those parameters. So by far, this is kind of the meat in the, uh, in the sandwich. And, um, so essentially this is going to be a standard transformer, uh, that's just going to operate on patches instead of tokens. And, um, and yeah, it's going to do most of the interesting computation before passing things on to the local decoder, which has to go from, you know, patch representations back into bytes, which in turn you can transform into characters.
So, uh, the interesting thing about this is the entropy model is trained separately. So it is not trained, you know, with a gradient descent all the way through. Um, but, uh, But it is a sort of separate abstraction. It's, it's got some advantages, which are kind of cool. Um, so as you said, like, usually when you have a transformer architecture with tokenization, every token gets the same amount of compute, uh, through the kind of main transformer.
But in this case, with this architecture that they call BLT for short, um, You're actually, um, going to group together simple, kind of easy sequences, uh, and they're going to, they're going to be, yeah, in one big patch.
So you only have to kind of pass them through the system once, which reduces your overall compute requirement because you're not going to use that big global transformer, uh, on, uh, on a bunch of little sequences, you'll, you'll group together, uh, sorry, yeah, you'll group together all these, like. Anyway, like compound tokens, if you will, um, that when you pass them through. So as an example, if you think about like a sentence like the, the cat sat on the mat, right?
Uh, conventional tokenizer might, uh, say the is one token. Cat is one token sat as one token and so on. Um, but with BLT, the cat, right, might be one thing. Because it's kind of one, uh, one compound, um, sort of readily predictable thing. Sat on, right, might be another one that's just one compound thing. Partly because once you know the word sat is there, the word on is easier to predict. And then the mat might be another.
So you've just reduced the number of tokens you have to process from, uh, you know, from seven to three. And so, Anyway, this is a really interesting development and they have all kinds of results about why it's actually more efficient.
Um, one challenge is that this is a fundamentally different architecture, so your actual, like, number of flops that you need to train this model does go down, thanks to the efficiencies we just talked about, but the wall clock time Uh, that you get the improvement in wall clock time might actually be less dramatic than then suggested by the number of flops because the architecture just isn't as optimized for current hardware as traditional transformers are.
So, you know, it's kind of this idea of the hardware lottery that we've talked about before. If this is going to take off, you really are going to need to see more, more custom hardware.
They do have a bit of discussion in the paper on adopting tokenization based models to tokenization free. So that could be another. That thing they could try versus just by taking a pre trained Model and then kind of adapting the weights. Uh, not super clear, uh, yet if that could work, but that is something they suggest as further work. And I do like sometimes to call it previous work.
And in this one, they do cite a paper that was from previously this year, and it has a fun title space bite towards deleting tokenization from large language modeling, so that paper basically. Treated every word, every like, uh, thing between the space as a, uh, kind of patch, uh, of its own that isn't great either because then, you know, you, uh, may get into tokens that aren't great.
So the really cool thing here is that dynamic patching that is learned and can work, uh, uh, Better than some sort of hard coded strategy and also that idea of using entropy as a key way of doing it. All right, onto lining round. If you're going to try to speed up, uh, we've got a report from Apple AI, which we've.
Uh, a few of right now it's, uh, this one is on frontier language models have becoming smaller, and it is essentially documenting the trend we've already seen, and I have seen quite a lot this year that, you know, since GPT 4, GPT 4, roughly when we saw like a 2 trillion parameter model, something like that, Up to that point, we have gone bigger and bigger, like 4, each of them went up in parameter size by a factor of 10 or even more, like 150.
Well, it appears to be the case that the models, not just on the small language model side, but also in general with models like GPT 4. 0 and Cloud 3. 5 Sonnet are smaller than GPT 4. They have fewer parameters, like 4. 0 probably has around 200 billion parameters and Sonnet has maybe around 400 billion parameters. Something that isn't necessarily fully known, but is estimated in this report and is interesting in the context of scaling trends and the overall trends of AI progress.
Yeah, the evidence for this that they cite for the, for the reversal, which, you know, we, we kind of, I would say vibed out, it's been clear from the vibes. Um, but the evidence for it is twofold. So one is you see this in open source models.
So, you know, you're seeing, um, the best open weights models, uh, are, are now Mistral large two and, and Lama B. Um, Which have 123 billion and 70 billion parameters and um, they're dense transformers, but with, you know, fewer parameters than even GPT 3. So that's noteworthy. The second source of evidence is just in the cost of serving these, or sorry, just the cost of these charged by OpenAI and others for their models.
So we've seen, you know, the um, uh, original GPT 4 was 60 per million output tokens. Now we go on to GPT 4 Turbo. Um, which was 30 per million output tokens. Then 10 per million output tokens. Now, hardware improvements are obviously a huge, huge part of that.
That's, that's clear, um, as our algorithmic improvements, but this certainly does suggest that we haven't seen continued radical scaling of the sort you might've assumed, um, following scaling curves previously, uh, in, sorry, scaling and model parameter count. That's really important. We have seen scaling and compute. Um, anyway, they do a bunch of analysis.
Uh, based on, you know, assuming that people are using H two hundreds to do inference and, and, you know, this leads to the conclusion that things are sort of stalling out in terms of, uh, in terms of model size.
Some reasons why this is happening, they highlight, first off is, um, we saw this with GPT 3, the Kaplan scaling laws, the scaling laws for neural language models paper that came out, um, had a particular scaling recipe where they said that, you know, for each, you know, billion parameters that you add to your model, you need to train with this many flops, with this many more tokens,
And, um, well, when the chinchilla scaling laws came out later, they revealed that actually the compute optimal way of doing this, um, involves scaling your parameter account more slowly. And so the parameter account scaling kept going, but it was, it proceeded more slowly because of that switch over to chinchilla.
And there was a one time sort of drop in parameter account for the next generation of models as people realized, well, wait a minute for my compute budget, I actually, I guess should be training a smaller model. Um, the other reason is productization. Okay. Right. Cost of inference is a big, big deal.
It makes sense as a result to overtrain a smaller model so that you then have a smaller model that is, it's maybe overpowered for its size, at least based on the traditional chinchilla scaling laws, but now you have a smaller model to serve. So it's cheaper for inference. And that's a big deal in a world of AI productization. Test time compute scaling compounds that trend, right? Because now you're going to call the same model many, many times. Do you do a lot of inference on the same model?
It better be small. It better be cheap. That's another reason to opt for smaller models. Synthetic data generation is sort of similar. Um, so one place where this lands or the place where this lands is the question, will this then continue, right? What about the future? And the answer, if you're tracking those reasons is pretty clearly that actually we resumption of scaling. Like that's. Pretty clear. Um, all of these patterns have the shape of a one time step back, right?
The shift from Kaplan to Schindler's scaling laws, that's a one time reset. Um, productization is an adaptation of the market, uh, which will then, you know, when it's, uh, an incentive to take a step back on the scaling curve, but then you're still on the scaling curve. Same with test time compute scaling. As hardware gets cheaper, you're As the demands for higher quality, uh, outputs get higher, you're going to see a resumption presumably of the scaling trend.
And so, uh, you don't, don't expect these like 10 trillion, a hundred trillion parameter models, uh, to be out of reach indefinitely. They are coming, um, very likely. Uh, it's just a question of, of what.
Exactly. Yeah. I think what you've seen with, uh, you know, obviously with investments in massive AI clusters. But also research into the ability to do better with higher quantization with the scaling laws, as you said, is there was this understanding that arose of how much we can squeeze out of a certain set of parameters and a variety of techniques. Now I think we're getting to a point where a lot of the efficiency gains have been realized.
And so we, as you say, are likely to get back to scaling. And one more paper. This one is a little more theoretical and interesting. So we'll have to avoid explaining it. So it's titled the complexity dynamics of grokking. Grokking is a name for a phenomenon in training where essentially for a while you don't do so good on a task. And then all of a sudden. You start being really good. So instead of kind of gradually improving over time, there is a sharp improvement.
And this paper is looking into essentially why that happens and introduces a complexity measure that essentially tries to compress the neural network to see how complex it is. And it kind of confirms a general understanding that there is a change in paradigm where initially the model does a lot of memorization to be able to output the correct outputs for a given input.
And then at some point, because it needs to do well while being restricted from memorizing via regularization, it switches to a reasoning or generalization paradigm. And that's where you see which sharp improvement and also they see a sharp lessening of complexity as you move away from, uh, memorization to generalization. So, uh, yeah, interesting theoretical result and it does lead to kind of a theory backed regularizer for training.
Yeah. It was super, super interesting paper. Um, you know, just the intuition for this, you can think about what the world looked like in the Renaissance when the early scientists were kind of like going out and gathering a bunch of data about physics, about biology, about chemistry, and.
It's just like this like long list of facts and, and your, your picture of the world is a super complicated thing because you got to remember, like memorize this like giant like Wikipedia type corpus of all the little facts that you know. And then somebody like Isaac Newton comes around and does F equals MA, right?
Or invents calculus and, and suddenly a whole bunch of complexity gets abstracted away and you realize, oh, well actually these are all just individual manifestations of a core and simple idea. Right. That feeling where you go, ah, that's grokking. And that's essentially a moment of compression, right? You're taking a whole bunch of complexity, compressing it into a simple theory of the case about how the universe works.
Um, that is exactly the rise and fall in these complexity curves that we're looking at. I think what's super interesting, Is, uh, or at least to me confusing, even, um, when you look at these curves, what they're actually, uh, plotting here is a measure of the information complexity of the neural network.
They're trying to, um, I guess, kind of, uh, Mimic, like Kolmogorov complexity, which is this very abstract notion, uh, that, that, that we don't need to talk about, but that can't actually be computed. So they use a proxy, um, for it that has to do with like the, roughly speaking, like the entropy of the neural network. And what I'm confused by is that that entropy starts at zero.
And if there's somebody who's read this paper, um, uh, who, who can explain, like, I did not understand from this paper, why the entropy starts at zero. I get why it rises. Complexity increases as you know, the, the model is trying to memorize and account for all these observations and then drops as it has that aha moment where it comes up with a theory that generalizes or an understanding that generalizes.
But at the beginning, like entropy to my mind should be high At the outset, um, maybe there's an initialization thing going on where there's a, uh, you know, low complexity state. They're initializing the networks too, but it wasn't clear to me from the paper why that's the case. So this is a, a moment where, uh, you know, if anybody who's worked on the paper or whatever, like, uh, I'd love to love to get some, some insight there, but it is fascinating.
It's, it's in a way very intuitive and very important for understanding of generalization in, uh, in world models and language models.
And by the way, for anyone who doesn't know, grok is a very nerdy term that basically means to understand. It's from a 1961 science fiction novel, Stranger in a Strange Land, which is a real classic. Yeah, and that's why you have groking is a term for phenomena and also grok from XAI and also grok with a q. Uh, all these nerds are tapping into science fiction and sort of a term for some sort of intelligence.
Alrighty, moving on to policy and safety, we begin about a story from the U. S. federal government. And it's about how Homeland Security is getting its very own generative AI chatbot. So there's been a few announcements and this one I found interesting. Uh, there is this DHS chat. And it's about how Homeland Security is getting its very own generative AI chatbot. That is being introduced and being made accessible to the 19, 000 employees of the department.
They were already allowed to use tragedy on claude But this was built internally and now operates on BDHS says, uh, secure infrastructure. So it can then help employees summarize complex documents and reports, do, do all the usual kind of stuff. So curious to see that, uh, there is this internal development of models within the U. S. government.
There was also another little new story, uh, bipartisan house, Task force said that agencies need to do more to bring AI experts into a federal workforce. They apparently made more than 200 AI hires this year, US agencies, and they are looking to do more.
That's obviously a big, big problem for the US government right now is achieving a basic level of competence and understanding. Of AI. And, uh, so especially coming to the next administration, I think they're going to have an opportunity to staff up and really focus, focus down on that. Um, there was also this note to, um, this new bill that's seeks to prohibit the department of defense from doing business with companies that provide it services to China. So this is.
Uh, kind of more, uh, in that, that same vein, a lot of the AI stuff has run through similar concerns. Um, but yeah, the, the DHS thing, interesting that that's, I guess, the top story they've highlighted in this rollup, um, we've seen similar things happen at DOD and that's actually led to tensions, right? Cause there are outside. Um, firms that have tried to develop custom, uh, chatbots for the DOD.
And there's been complaint that, uh, well now DOD is rotating into its own internally built systems, having learned presumably, uh, from, from these firms that, um, uh, that have built products in a bespoke way for them. Um, I just remember having seen a, there was a, an article in, I forget if it was Bloomberg or somewhere else about that, but, um, you know, at the end of the day, this is, this is going to happen.
It's, it's important for the company, for the government to have this kind of competence. And these tool sets that they can build for themselves for security reasons.
And just a couple more stories. One of them is on the pre deployment evaluation of OpenAI's O1 model. So this is kind of a, a topic that has been ongoing, the idea that governments should be able to evaluate. Uh, models for safety, especially these frontier models to, uh, test them and see that they're not doing anything dangerous before they're, uh, made available to the public.
So in this story, we find that the UK and us AI safety Institute apparently conducted a joint pre deployment evaluation of the O one model. We were focused on cyber biological and software development. Capabilities. Uh, and also compared it to reference models like GP40 and cloud 3. 5 Sonnet, and as we've seen, uh, in previous kind of ideas, all one can do some advanced cybersecurity. And, and in this case, actually better than.
Uh, the reference model, uh, although with biological capabilities, it wasn't significantly better, although it could be better when using tools. So perhaps the start of a trend for models like O1.
Yeah, it's also interesting to see the UK AI Safety Institute, US AI Safety Institute kind of collaborating so closely on this as they've said they would kind of developing these independent domains of specialization.
They also highlight that although they've surfaced some of these capabilities, um, that, that it's sort of a lower bound on the actual capabilities of these models, because obviously you can fine tune them, you can, um, you know, you can add scaffolding, uh, agentic scaffolding that reveals new capabilities. So there's this sort of like awkward.
Um, recognition here that we can only do so well with the test that we have, uh, but still, you know, good that they're able to audit entropic, you know, opening eyes models. Um, it's, uh, you know, anyway, I think it's going to be a recurring challenge that, uh, that they'll have to find ways to crack. But yeah, the, the main, the main take home here really is a one, um, as you said, larger, uh, superior performance on the cyber benchmarks that they tested.
Um, and, uh, Especially on challenges related to cryptography, which is sort of interesting, but otherwise more or less falling in line with the performance of previous models that they've tested.
And one more story. Pricing for key cheap making material hits a 13 year high following Chinese export restrictions. So we covered, I believe it's last week, uh, that in retaliation to U. S. policy, China has restricted exports to. A few things including gallium and the prices to it have now surged to 595 per kilogram. The highest we have been since 2011. And as we covered, I believe this is a significant Need a significant material that's needed for some things.
And China is responsible for 94 percent of global gallium production. So it's not too surprising that the export policy has led to the price hikes. Uh, the prices have jumped 17 percent in a single weeks. Uh, and it, yeah, there's, there's, uh, going to be a rush to secure alternative, uh, sources and, and figure out how to get access if you're not able to do it from China.
Yeah. And then, I mean, this is just like a complete self own, like we've, we've had a long time to sort out our critical minerals strategy domestically and just haven't. Um, and, uh, so, so this is just going to have to change. Gallium, uh, is important by the way. Gallium nitride, uh, is used a lot in power delivery systems for AI accelerators. And so, because you need really, really efficient power management because of the, the power consumption profiles of these chips.
And then you also see sometimes, um, gallium arsenide get used for, um, uh, for interconnects, uh, and, and some RF functions. So, so these are like, these are pretty important. Um, in, in a lot of different ways or the actual performance of, of high end chips. So this is like not a small deal. Um, they saw, I think you might've mentioned this at 17 percent jump in the, the price of, uh, of gallium on the market as of, uh, or in one week in, in December in this month.
So pretty, uh, pretty wild stuff, 595 per kilogram, which. I don't, you know, I don't track the price of gallon, but it's a big number,
I guess. I don't know. And one more story, this one in synthetic media and art, and it is about Meta introducing a new water, watermarking tool for AI generated videos. So this tool is called Meta Video Seal, and it is meant to watermark AI generated videos.
Uh, is similar to other tools like Watermark Anything, Audio Seal, and Sync 5D, as we've mentioned, and is meant to be a more robust solution for video mark, watermarking in particular, uh, to work with things like video compression and, uh, being able to scale up. So as with other watermarking techniques, it will embed hidden messages and videos. That will make it possible to trace their origins. And, uh, work even when you try to blur or crop or compress the video.
So, as far as I know, perhaps not, uh, a super solved problem. We've seen a lot more watermarking for images and text. So, will be interesting to see if this has any impact, I suppose.
Yeah, I mean, they're, they're certainly claiming that, um, that right now robustness to compression is one of the big differentiators here, um, and, and efficiency to run at scale. So that's, that's cool. The classic trade off in this space is like the, um, how perceptible the watermarks are and then the resilience to manipulation, right? The more perceptible you, sorry, the more resilient you make these things to manipulation, often the more they leave visible artifacts.
And, um, yeah, so, you know, you're always balancing those two things and, and this is gonna presumably do Pareto better on that, uh, trade off, but we'll, we'll have to see.
And with that, we are done with the episode. Uh, we got through a decent number, not all the articles you plan to do, but, uh, that is it. does happen sometimes. So you can find more articles on lastweekin. ai and you can find all of the articles we discussed here with links in the episode description and at lastweekin. ai. com. As always, we do appreciate it if you comment. Uh, it is cool to see questions actually, like we question about quantum computing. So feel free to do more of that.
on Substack or YouTube or elsewhere. And of course, reviewing us won't hurt either. We do love those 5 star reviews. But more than anything, do keep listening and do enjoy this outro song.
You rises, rise. They have futures. Ignite with the power of video will be you, us Chin Knight of thinking. Cus pound is, it breaks strong. Cus is the future we make.
Rock the future tonight, in the A. I. 's spotlight. Shining bright in the spectre of the night, A revolution under city lights. Video visions come alive on the screen, Gemini 2 with precision unseen.
Through a million GPUs the future unfolds, Boundless distance with stories untold, Let's ride the wave of this A. I. parade. In the new year's glow, watch the dawn cascade. From deep blooding dreams to realities bright, Miniatures flap and futures with light. Through one million machines, the rhythms collide. AI's symbol in its place, with no where to hide. Unleash the spark in this AI's cave, But the spark is tonight. The future's bright. I'm still playing as usher in the new dawn.
Gemini's rise as the world transforms on. The only beings light up the sky. Eyes on the high. Believing in sick light. Like stars in night. Gemini's brilliance guiding us through To a world where AI visions come true. Past the clouds of innovation we saw With each frame and pixel opening the door. The story's so vivid. Never before.
In this era of change with dreams intertwined, Video creators crossing the line With Gemini's craze, watch the pixels unfold It's ethical dreams, new stories are told Hours go fast, a million strong it beats Taking us forward, no need to beat Surround you, a pathways vision It has revolution visions escape. We embrace the rhythm, seize in our jaws. In the air of futures, live in display. Celebrating moment guides. When energy boosts, reaching new heights.
As long as we will make the top of the class. Structuring future career. The neon lights gather through. The future violent and tame In it's cry, AI fuberated But marching forward, dreams never fade Every pixel, every frame The future's wild and untamed So raise your voice, let the AI volution reign Everything's new, advanced awakening From decision to dawn, we sortimize A. I. Explorers! The guy's he nice.