#187 - Anthropic Agents, Mochi1, 3.4B data center, OpenAI's FAST image gen

Oct 28, 2024 · 2 hr 10 min · Ep. 226

Episode description

Our 187th episode with a summary and discussion of last week's big AI news, now with Jeremie co-hosting once again!

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.

This episode was sponsored by The Generator.

If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form.

Timestamps + Links:

Transcript

AI Singer

Welcome

Andrey

to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will be summarizing and discussing some of last week's most interesting AI news. And as always, you can check out our Last Week in AI newsletter at lastweekin.ai for even more AI news in text form and for the links in the description to the stuff we discuss in this episode. I'm one of your hosts, Andrey Kurenkov.

My background is that I studied AI as a PhD student, now work at a generative AI startup. And once again, we do not have a guest co host. Jeremy is back.

Jeremie

What's up everyone? Hey, you know what? Um, so first of all, uh, I'm back from having a daughter who is happy and healthy and my wife crushed it, uh, 30 hours of labor, if you were wondering, which If you were wondering, is, uh, is not, is not, uh, pleasant. So, um, my, yeah, my wife, uh, real, real sport, real champ. All credit goes to your wife. I don't want to take all the credit for the birth. I mean, I, I was kind of there. So, you know, I get some, anyway.

Um, but the point is, uh, it went really smoothly, and I want to say we have, like, incredible listeners. I got emails, I got messages, I got, um, people stopping off at my apartment in the middle of the night to wake me up and congratulate me. I got That one was weird. That's fun, that, yeah. That I I don't know how they got the address, but, uh, but yeah, no, look, uh, tons of of of amazing, warm messages from, uh, the community, really, which is what it feels like we've got here.

I just want to say thank you. Thank you. Thank you. Um, it, uh, it was, it was actually made it so much harder to be away for that time. Cause I, you know, I, I saw the comments and all that and I really, really appreciate it. So, uh, so thank you. Um, we're, we're back excited to do this. There's, there's so much that went on. I mean, I gotta tell you, I spent the last, I spent three days just getting caught up on what happened over the last four weeks and man, like it is a lot of stuff.

I don't know. Um, I was getting caught up, you know, for work, but also for this. And like the pace of progress is wild, but we're going to have a really intense, big episode today. Um, but I don't think it's going to cool down. I mean, just the inference time compute stuff that we're seeing, the advances on the reinforcement learning side, um, and then the media generation obviously as well, but just like, anyway.

So, so much and really excited to get back into it. I picked the wrong four weeks to, to take off, but you know what, uh, all worth it in the end. So there we go.

Andrey

Yeah, exactly. Yeah. I think the last couple of episodes happened to be a bit more than an hour and a half. I'm going to say this one is probably going to be back to a two-hour one, so we might be getting back to that. So let us try and get back, uh, into the news. But before that, as always, real quick, I want to acknowledge some listener comments. We have a new Apple Podcasts review that says that this is a sort of this and that. I'm not aware of what the show is, but, uh, it seems to be

you know, a good comparison. This reviewer says that this is a nice mix of opinions, facts, texts, broad strokes, real world, even existential chat. You know, some positive feedback on that. So Jeremy, I'm sure we'll be getting a little more of that now that you're back. So thank you for the review and that comparison. That's interesting. And we did

Jeremie

have, I love that. I'm going to use that to brag from now on. People love the podcast so much. What they say about it is that it's, it's sort of this and that. What a ringing endorsement.

Andrey

Yes, exactly. And, uh, love the AI existential chat. You know, where else can you say that, right? And we did have one comment on YouTube I want to address. There was a question of what do I use to create the intro AI music? I use Udio. So there's Udio and Suno, both are pretty good. I found I prefer Udio, and every week I spend like an hour these days just prompting it to see what I can get. Uh, I could be getting back to work or something, but yeah, I know. I know. Um, by the way, the full

AI intro song plays at the outro. So it's like a two minute song every time. So if you stick around, you'll get to hear the full version. I also post a full version on YouTube. If you want to just like listen to the songs for fun. I don't know. I feel

Jeremie

like I should have known that. That's really cool.

Andrey

Well, I'm not surprised you don't listen to the episode, Jeremy, but. Well, I do in real time. And one last thing before we get into the news. We do have a new sponsor that we're kicking off, and this is actually just in time for you being back, Jeremy. So, the new sponsor is The Generator, which is Babson College's interdisciplinary AI lab focused on entrepreneurial AI. Just recently, last fall, professors from all across Babson partnered with students to launch this, uh, Generator.

And this is a lab that's organized into various groups: AI entrepreneurship and business innovation, AI ethics and society, the future of work and talent, all of these sorts of things. And they are kicking off various endeavors. They are peer-training all the Babson faculty on AI concepts and AI tools, uh, and they'll be extending that to all of the college. And their motto, I guess, on their website, they say The Generator accelerates entrepreneurship, innovation, and creativity with AI.

This is kind of an interesting sponsor because there's no product to sell here. Uh, they actually are fans of the podcast. Uh, they told us that they say that it's a must-listen for the faculty and students of, uh, Babson. So Oh, amazing. Yeah, we are glad to have them as sponsors. You can go check out their, uh, college website in the episode description. There's some, some interesting articles there. And, uh, I guess we'll, you know, be keeping an eye out.

Maybe there'll be some news out of The Generator soon. And finally, let us actually get into the news, starting with the tools and apps section, and we're starting with a pretty exciting one coming out of Anthropic. So the headline here is Anthropic's latest AI update can use a computer on its own. So there's a beta of a new feature for Claude 3.5 Sonnet, which will allow it to control a computer by looking at a screen, moving a cursor, clicking buttons, and typing text.

And they call this feature computer use. It's available on the API and allows developers to direct the AI to do this stuff, basically use a computer like a human. So very much on the train of agentic AI, right, where this is essentially a thing we could tell, you know, go and book, uh, tickets to go from San Francisco to Atlanta on November 15 for a week, and this model can then go and do the set of steps where it will open up a browser, go to Delta.com,

all those steps, to actually do it for you. For now, it's, of course, relatively limited in some ways. It's not able to do things like dragging and zooming. Uh, it's not able to do everything you can do on a computer. But regardless, it is pretty, uh, pretty open ended, right? If you can look at a screen, move a cursor, click buttons, and type text, there's a lot you can do. And people have started experimenting with it. There's been, uh, you know, failure cases people have shown.

There's been a lot of excitement. So it's certainly something that I think we can all imagine is going to be an end product of AI: sooner or later, we'll be able to just have AI take over on a computer and do anything. And it's pretty exciting and interesting to see Anthropic being kind of a front runner on this kind of thing.
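
For readers who want to see what the developer-facing side looks like, here is a minimal sketch of calling the computer use beta through Anthropic's Python SDK. The beta flag, tool type, and model string follow Anthropic's documentation at the time of the announcement, but treat the exact names as assumptions, since beta identifiers can change.

```python
# Minimal sketch of Anthropic's computer use beta (assumes the `anthropic`
# Python SDK is installed and ANTHROPIC_API_KEY is set in the environment).
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",   # the upgraded 3.5 Sonnet
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],    # beta flag from the announcement-era docs
    tools=[{
        "type": "computer_20241022",      # built-in computer-use tool definition
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user",
               "content": "Open a browser and look up flights from SFO to ATL."}],
)

# The model replies with tool_use blocks (take a screenshot, move the mouse,
# click, type, and so on); your own agent loop has to execute those actions on
# a real or virtual machine and feed the results back in follow-up messages.
for block in response.content:
    print(block.type, getattr(block, "input", None))
```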

Jeremie

Yeah. So there, I, there's so many layers to the story. One of which, just at the top level, is: damn, does this sound an awful lot like what Adept AI was trying to pull off back in the day? I think they might've raised, it was like 60, 65 million at the time. I will say, we've said this a lot on the podcast: these sort of like mesoscopic companies that haven't raised enough money to be competitive when it comes to scaling are at risk of dying.

I think, you know, this is exactly, I think we specifically called that we would have companies like Anthropic, or more scaled companies, um, essentially just eating the lunch of companies like Adept AI. That is what we are seeing here. Make no mistake, I think that this is, you know, one step in that long story. Um, and I think there's a lot going on under the hood here, right? So Anthropic explicitly or implicitly has sort of said they

don't want to exacerbate the racing dynamics behind frontier AI, right? That's been a big sort of part of their story. Um, the way they've done that historically is to release models that are at parity or slightly behind the frontier, so they can still make some money, but they're not sort of, you know, accelerating things themselves. This still does put competitive pressure on OpenAI, but, but anyway, the idea is to kind of reduce that while still being competitive.

This is a step away from that. It's another step away from that. We've seen Anthropic kind of nudge itself more and more in that direction, which perhaps is unsurprising. The incentives are just there. Um, but this is happening in a context where we know Anthropic is raising a round, right? OpenAI just raised at that $157 billion valuation; it was a roughly $6.6 billion round. Um, that's what it takes to build the next beat of data center infrastructure.

That's just what it takes if you want to keep up with scaling. Anthropic has no choice but to compete on that basis. If they want to court investors, they have to convince them that, hey, we are worth the, you know, 30, 40 billion valuation that they must argue for.

And the only way to do that right now, given the revenue multiples that they have to argue for, uh, is to come out with something that makes the case that, hey, we are actually ahead, like you are betting on a potential winner here. So I think what we're seeing here is Anthropic, uh, between a rock and a hard place, frankly, being forced to choose a little bit: do we accelerate things? This will put pressure on OpenAI to accelerate their own development, release the next version.

And indeed we have, of course, heard rumors at least that Orion, the next generation OpenAI model, which supposedly had been trained on o1's outputs as well, may be coming out sooner rather than later. Not that that's necessarily connected, but also not that it's necessarily not connected. Okay. Another layer to this is the performance, right? So they share details about the model performance. This is, by the way, called Claude 3.5 Sonnet, brackets, new.

So it's actually like, I don't know why they did this. This is a fundamentally new model. It is not Claude 3.5 Sonnet old. It is different. Like, it behaves differently. It's not just a text-to-text model. It's, you know, it's much more agentic. As you've said, it takes in screenshots of your computer screen and then takes actions. I don't know why this has, I

Andrey

think the way you frame it is, the announcement came with a few things. So they say they have an upgraded Claude 3.5 Sonnet. They actually also launched Claude 3.5 Haiku, which is kind of overshadowed but is part of the news here. And they say that this is a new capability in public beta. So I guess you can still use 3.5 Sonnet on its own without computer use, or you can use the API to do computer use. And that's powered by Claude 3.5.

Jeremie

So I would presume, totally, Claude 3.5 Sonnet is like that, but it is the, uh, is an agentic model, right? Fundamentally more like o1, perhaps, than, for example, GPT-4o. And that's kind of where I think a lot of people, I think rightly, have been confused by the naming convention here. Um, and we'll see if it persists, but at some point you're going to probably need to distinguish between these in a more fundamental way, like OpenAI did with o1 versus the GPT series. Well,

Andrey

I think there's a question there, to dig in a little bit more. Like, we have a benchmark table, right, where it can compare Claude 3.5 Sonnet to Claude 3.5 Sonnet new on the common benchmarks, you know, MMLU, coding. So if you just use the API to do some coding, for instance, right, it's not necessarily clear that it's doing anything different from a normal language model when you just prompt it to complete some code. Right.

Jeremie

Yeah. But this is where I'm saying the same is true for OpenAI's o1, right? Like, you can ask OpenAI's o1 just your standard, you know, GPQA questions, MMLU questions, and you can get a score. Um, but it is given a different name because it also has these fundamentally new characteristics. And that's, I think, where people are kind of saying, hey, you know what, like, we are dealing with something that is agentic, that is fundamentally different in its behavior.

Um, maybe that should be reflected in the naming convention. And I mean, I don't know, you know, you could argue over it, but I certainly think that there's an argument to be made there.

Um, now the performance is interesting, because in that table you alluded to, right, they do break down the performance, and they show, you know, this graduate level reasoning benchmark, these really, really hard questions, really, really hard questions that like PhDs in domain specific areas would struggle with: um, 65 percent for Claude 3.5 Sonnet new, um, a really impressive score. And it is SOTA.

It is state of the art, same with MMLU, same with HumanEval, which is a coding benchmark, and so on. Um, what they don't show in that table are the scores for OpenAI's o1 models. And they explain that in the figure just by saying: our tables exclude the o1 model family because they depend on extensive pre-response computation time, inference-time compute, unlike typical models, and this fundamental difference makes performance comparisons difficult.

That's interesting; it sort of implies this may not be what is going on with Claude 3.5 Sonnet new. I still think the comparison would be useful.

So just to kind of give you the numbers, um, you have to go into the system card for the OpenAI o1-preview model, but when you dig that up, um, the performance on, uh, software engineering, so SWE-bench, uh, which is the software engineering benchmark that I think increasingly is like the benchmark to track in the space: um, OpenAI o1-preview gets about 38%. This is before they impose a couple of mitigations that do reduce the performance, but about 38%.

Um, Sonnet 3.5 new hits 49%. That is a big, big jump, signaling significant improvements in, you know, software engineering capability, which is so important given that these labs are explicitly trying to figure out how do we build models that help us automate AI research itself and get us closer to that, that takeoff scenario where, you know, AI makes itself better, which makes itself better, and so on. That's explicitly being talked about right now in the labs.

Um, so anyway, I thought this is so, so interesting, so many layers to the story. Um, there are also questions, by the way, about, uh, where is, uh, sorry, uh, Opus 3.5 in this whole story, right? The 3.5 series of models: we have Sonnet 3.5, we don't have Opus. That was supposed to be the big model that would come out. There's some speculation about, you know, maybe that training run failed, or maybe the economics just don't support it.

Um, so maybe we won't be seeing it at all, and we've seen it disappear from the, uh, the kind of Anthropic documentation in this space. Um, last thing I'll note, on the safety side, right? So we know that Anthropic has been engaged with the US and UK AI safety institutes, doing a lot of kind of coordination with them. This model, Sonnet 3.5 new, has in fact been tested by, it seems, both of these institutes. That's the claim that's being made here.

So that would be yet another, uh, really interesting, um, use of those AI safety institutes in the development of that relationship. They did find, uh, Anthropic did, that this model does not exceed their AI Safety Level 2 standard, which is the same, uh, threshold of capability and risk that the previous Sonnet model did.

So that's sort of interesting: um, per their responsible scaling policy, not a qualitative difference in risk. Though I will say, once you hit ASL-3, um, that is already a pretty scary level of capability. So the fact that it's underneath there, you know, may not tell us all that much.

Andrey

Right. And on that note of safety, they also do note that this is programmed to avoid social media and election related activities. So you can't use the computer use, uh, feature here to go and spread misinformation, for instance, right? Which is a pretty interesting note on that front, where you'd need to impose new limitations because now it takes even less work to do nefarious activities.

And on the, uh, performance comparison front, as you said, I guess a notable bit is that on the normal benchmarks it does better than Claude 3.5 Sonnet, but the big jump is on agentic, uh, capabilities, right? So you have, you know, a few percentage points here and there on MMLU or GPQA zero-shot, but when you get to agentic coding, agentic tool use, that's where there is the biggest leap.

So training wise, it is fair to say that this is like an agentic optimized model, and that is why it comes with that computer use capability. But unlike o1, by default the inference isn't agentic when you call the API. So there is kind of a, kind of an interesting nuanced thing here, where it's not, uh, I guess, a system like o1 that is meant and configured to do agentic reasoning every time.

You can still use it like a normal LLM, but it is an LLM that is optimized to be good at agentic reasoning, which I guess is why they still keep it with the same naming terminology and not compare it to o1 directly. Anyway, very exciting, lots to talk about there, but we should be moving on. So next up we have the story about Mochi 1, which is a new model by AI video startup Genmo. And this is an open source rival to Runway, Kling, and other video generators.

So, this is available under the Apache 2.0 license, meaning that, uh, you know, anyone can use this model for anything, essentially. You have both the weights and the model code to download. And there's also a playground where you can, uh, play around with this. Uh, they will also be, uh, launching, uh, the ability to do higher definition versions later this year. Pretty impressive, uh, outputs for this one. Obviously not quite as good as the other models.

It has some limitations. It will only be outputting 480p resolution; later this year, it will be outputting the HD version. And, uh, you know, an interesting move by the startup. They don't have a product yet, so they are kind of front running with the release of this open source model

Jeremie

first. Yeah. And it's kind of interesting because they, they, um, uh, they do flag, you know, you can download the model weights if you want from Hugging Face, though it does require at least four NVIDIA H100 GPUs to, to operate, you know, if you want to actually run it. So, you know, if you've got a spare hundred K just lying around, uh, that can be your, your go to, uh, your go to market strategy.

But the challenge here, of course, is, and people talk about this a lot, the definition of open source. You know, when you have a model that is so big that it requires distributed inference, um, does that, does that qualify? Is the barrier to entry just so high? Obviously, having the model weights out there in the first place, I think, is the substantive win here. Um, so we'll be able to see presumably a lot of interesting

modifications as well made to that model. Like, especially with video generation, I just, I really wonder, like, you know, what does fine tuning look like? What does the ecosystem of, you know, modifications on top of this kind of a model end up looking like? And, um, I think we'll have a lot of room for creativity, um, and also for, you know, the automation of an awful lot of, uh, of movie generation, movie production, um, uh, stuff.

So anyway, uh, interesting, uh, new release. Thanks.
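
Since the open weights are the main artifact here, a minimal sketch of pulling them down might look like the following. The Hugging Face repo id is an assumption based on the release; actually generating video requires Genmo's released model code and, per the discussion above, roughly four H100-class GPUs.

```python
# Sketch only: download the open Mochi 1 weights from Hugging Face.
# "genmo/mochi-1-preview" is an assumed repo id; inference itself needs
# Genmo's model code and several H100-class GPUs.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="genmo/mochi-1-preview",
    local_dir="./mochi-1-weights",
)
print("Weights downloaded to", local_dir)
```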

Andrey

The first story is about Canva, and they have a shiny new text to image generator. So Canva is a very big tool, for those who don't know, used for design tasks, broadly speaking. And they are now launching Dream Lab, which is an inbuilt text to image generator. Uh, Canva had acquired Leonardo a little while ago, Leonardo.ai, which was a suite of tools for AI image generation. So this is powered by Leonardo's Phoenix model. And as you might imagine, this is pretty much just a text to image tool.

You can generate images from descriptions with various styles like 3D render and illustration. Uh, this is actually offering an improvement over Canva's existing Stable Diffusion based AI image generator. So it has better quality, generally speaking, as opposed to what it had before. And they also have updated its Magic AI tool suite, which will do various things like Magic Write for text generation. So another example of how, I guess, we've seen this trend kind of all over the world.

Basically any software tool out there is now integrating AI across the board in various ways. And this is, I guess, one of the early examples of a major acquisition of an AI startup making it into a very significant product used by

Jeremie

many people. Yeah, it's kind of interesting. You can see the unambiguous footprint of that acquisition all over this. Yeah. And, uh, you know, one of the things that's mentioned just at the very end of this is that users may be disappointed that they're paying increased costs; the expectation is that the cost is going to go up as a result of this. Um, it's, it's so tough in the generative AI era.

Like, what do you do to increase the, the, um, prices that you're going to present to customers, um, versus, like, how much effort do you put into just reducing that cost to compete? And, um, like, this is a lot of new features. It's funny to see the sort of skepticism of, like, oh, people might be, uh, unsure about paying more for this giant suite of new capabilities. But the reality is that so much of this is getting commoditized, right?

The models themselves no longer really are a moat. Increasingly, it's the infrastructure that serves the model that becomes the moat. I mean, we literally just covered that story, right? About, you know, uh, a video generation AI just being open sourced, a model that's so big it's got to fit on four H100s. Think about how much training budget would have been required for that. Just completely open sourced.

So, uh, the models themselves, less of a moat, at least for these non frontier models. And, um, yeah, it's getting harder and harder, it will be getting harder and harder at least, to convince, I suspect, customers to pay big, big amounts of money for this sort of thing. But you gotta do that if you're a mesoscopic company. So yeah. We'll see. And the next story is about Canvas,

Andrey

not Canva. Canvas and Canva, don't get those confused. I was a bit confused at first. So Canvas is a new feature in Ideogram, and this will allow users of Ideogram, which is another text to image tool that is more focused on text rendering in particular, uh, this will allow users of Ideogram to use things like Remix, Extend, and Magic Fill. So it's the kind of, uh, image generation where you have a canvas, right, where you can, Oh, that's where they get it. Okay. Yes, exactly.

Uh, so you can essentially make a sort of collage, right? You can expand, uh, some image almost endlessly, you can think of it, by, you know, pasting in images, extending on and on. Uh, so this is now launching in that tool, something that I don't think you have in things like Midjourney, uh, at least not natively in the web browser. So you're seeing, I guess, continued competition in the space. Yeah.

Jeremie

Yeah. I'm, I'm really, like, curious about when the market is going to decide that, um, image generation has just been solved, and then we no longer have sort of, like, the other benefits of continued scaling and really intense levels of investment. Like, cause at this point, right, we've got, yes, we have some, uh, text to image models that specialize in rendering the text in the image and all this stuff.

We're already at the point where that's kind of expected from a new release of, you know, any kind of cutting edge text to image model. So I'm really curious, like, you know, are we saturating the value that can be created with these things? I suspect we're not; there's always room for, you know, you could always be surprised at how much additional resolution or additional capability unlocks these new niche applications.

Like, you know, if you want your text to image models to generate, like, some, I don't know, some circuit diagram or something, you want to get everything right. But I think for the lion's share of the market, we may be getting into that space where the returns are going to be more limited. I'm, I'm curious. I mean, I have no idea, but at a certain point, are we, you know, when do we start lighting VC dollars on fire? Um, uh, is kind of an interesting question to track in the space.

Andrey

Yeah, exactly. I think it's, it's interesting. You know, you have a few leading text to image companies out there. Midjourney; Ideogram is another one like that. And as with the LLM providers, right, there's not a ton of differentiation. There's some, but not a ton. And it'll be interesting to see, you know, when the VC money is burned, right, then you actually need to get by on revenue alone. How will that competition play out?

And yet again, speaking of text to image, we have another story on that front. And this time it's about Stable Diffusion 3.5. So Stability AI, haven't talked about them in a little while, are releasing Stable Diffusion 3.5. And once again, it's coming in three sizes: Large, Large Turbo, and Medium. The gist of this is that it is much better at photorealistic AI images.

So the comparison in this article is to Flux 1.1 Pro, and it looks pretty significantly better than before, in the sense that it is comparable to Flux. I think initially when I saw the outputs of Flux on X, it was very impressive. It did make me feel like it was surpassing Stable Diffusion. So I guess it's not surprising that they are launching Stable Diffusion 3.5 to try and keep up, so to speak.

Jeremie

Yeah, it is. Um, so one of these sort of Stable Diffusion licenses, right? So it's free for non commercial use, including scientific research, and, this is the thing, free for small to medium sized businesses up to 1 million in revenue. And then above that, you need an enterprise license. So this is kind of the new Stability AI, uh, approach, where they have to monetize somehow. You know, it just wasn't working out back in the days of Emad Mostaque as the CEO.

We all remember that; we covered it quite a bit here. Um, they were just bleeding money and now they need a way to make it. So, um, you know, this is clearly an attempt to kind of split the baby in that sense. I have actually no, um, conception of how that's working. Like, I don't know that I've seen any reporting on revenue that they've been generating from these enterprise licenses, but yeah, again, I mean, like, a step ahead in realism. That's cool.

I don't know how many more steps ahead are left in the tank before, you know, you see a fully open source model that does not come with these requirements for enterprise licenses, and then, you know, then what do you do? Right. So that might really erode the profit margins here, but still a very interesting and important, um, step forward for Stability AI, especially as they try to compete with everybody else that's nipping at their heels.

Andrey

And one last story, we are bringing it back around to agentic AI with the announcement of agentic workflows in Inflection for Enterprise. So we actually skipped the story a few weeks ago, I think,

where Inflection, the company that created Pi and was very oriented towards a sort of consumer chatbot that is emotionally intelligent, famously, as we've covered, was kind of acqui-hired by Microsoft, where a lot of its leadership and also a lot of its people moved over, but Inflection was not acquired and remained its own company. Well, they did announce a shift to enterprise a few weeks ago with Inflection for Enterprise. And now they're announcing agentic workflows for Inflection for Enterprise.

So this is, uh, in partnership with this other company, UiPath, which is already, uh, focused on this automation of processes in, uh, you know, I guess, company processes, automation of company processes. And so this is combining the AI that Inflection provides with that sort of automation. Not a ton of details for me in this announcement by them as to how this is actually used. I imagine this is very much kind of

in the, I guess, UI, or however you want to call it, not necessarily a super general purpose solution, unlike something like o1 that is agentic, or computer use by Anthropic. But regardless, it seems like Inflection is, yeah, very much trying to go for that enterprise play and also trying to go a different path with the announcement of this agentic feature.

Jeremie

Yeah. And they're coming along with the announcement of an acquisition of a company called Boundaryless, which I had not heard of before. And, um, the reason, I suspect, is that they have historically done maybe more like robotic process automation stuff, like RPA. Um, they're apparently, they're described as a team of automation experts, um, with deep experience deploying UiPath integrations.

So with that UiPath collaboration, Boundaryless seems to be kind of, well, the boundary, the interface between, um, Inflection and, um, uh, UiPath then, and presumably that's their strategy to actually integrate. This blog post does read super, super enterprise-y, I will say.

There's this, uh, there's this paragraph, or these couple of sentences, that I just want to read because they are like the most enterprise-y sounding shit. So they say: today, the primary benchmarks for measuring AI capabilities focus on IQ. From the beginning, Inflection AI saw the value of prioritizing other forms of intelligence and fine tuned our model to embody EQ. Ooh, okay. EQ. You're different. I know that. That's super cool.

Uh, now we believe another important measure for enterprise AI should be recognized, and we refer to it as AQ, or the action quotient. See, these guys are really, really smart. They got all the acronyms. They got the IQ. You got the IQ. They got the EQ. You're maybe not doing the EQ. You better do the EQ. And now we've got the AQ, people. It's AQ time. Time to get excited about the action quotient.

Anyway, it was just like, it's the most, like, we're going to coin this phrase just to like, anyway, this is, well,

Andrey

yeah, actually on that front, it's, it's funny because it really highlights that early on Inflection very much did focus on this EQ, where, like, we are making a chatbot, but it is emotionally intelligent, and so it'll talk to you and, you know, understand you as a user. But that's not interesting to enterprise, right? They don't care, right? So they have to do a slightly awkward shift of, like, okay, forget EQ.

That was the previous Inflection that couldn't make money and required Microsoft to buy us out. Now we're doing AQ, which is totally different. It's completely different letters. It's very much rebranding

Jeremie

itself in this, in this move, for sure. Yeah. And, like, it's gesturing at something real, right? Like, the action quotient, these are agents. Okay. Like, I get it. It's just, it's the most enterprise-y thing ever to be like, we need a new, like, little coiny phrase to say this. I thought that was kind of funny.

Andrey

And onto applications and business. We begin with a big hardware story. It is a $3.4 billion joint venture to build an AI data center campus that will reportedly be used by OpenAI. So this is a bit more of an in-the-weeds story, not covered in a ton of headlines, but definitely worth highlighting. There has been a partnership between several companies, Crusoe Energy Systems, Blue Owl Capital, and Primary Digital Infrastructure, which will be building this data center campus in Texas.

And this has just been recently announced. Uh, this will be in the city of Abilene, Texas. It's, you know, a huge project, almost a million square feet of floor space, and, uh, as we said, a hundred thousand GPUs. And Jeremy, I think you have more details that you thought were interesting here.

Jeremie

Oh yeah. I mean, this is, uh, this is like a huge deal. I'm, like, screaming from the rooftops. So this is, um, okay. First of all, uh, this is one of, if not the first, 100,000-unit B200 clusters that we're going to be seeing come online. So the B200, you know, we've talked about, I'm trying to remember when we started doing the podcast together, Andrey, we were talking about the A100 GPU. That was like, that was what was used to train GPT-4.

Uh, then we got the H100, then the H200, but basically the same process node was used to make those. Now we have the B200, the next generation, um, much, much more powerful, a much higher lift in performance. This is the first 100,000 B200 cluster coming online. Big deal. The even bigger deal is not just the scale, but what this means for the Microsoft-OpenAI relationship. So this build is coming from a deal between OpenAI and Oracle.

And it's the first time Microsoft hasn't been OpenAI's partner for data center infrastructure provision. That's a really, really big deal. OpenAI negotiated, apparently, directly with Oracle to set this up. And it seems to be part of a view at OpenAI that Microsoft just hasn't been moving fast enough to provide the kind of data center infrastructure needed to meet the needs, the requirements, for the next beat of scale.

So there is that, you know, friction between OpenAI and Microsoft around this issue. And apparently Microsoft was informed about the negotiations, so they're tracking. But in reality, as they say, it had relatively little involvement. And so they've got to green-light it, because of the nature of the agreement between Microsoft and OpenAI; OpenAI can't go out and just negotiate deals with random cloud providers without that. And so that seems to be what has happened here.

You know, Microsoft is like, look, we're not going to build these for you, but if you want to go out and negotiate with Oracle, go for it. Notable because Oracle is, in that sense, a competitor to Microsoft. So kind of weird. Um, but yeah, so the initial build, it's going to be about 50,000, uh, GB200s. So these are, so the GB200, I shouldn't have said B200 earlier, the GB200, um, is the, uh, so you have individual GPUs right now.

You put those GPUs together on a motherboard along with CPUs, and then you put those in server racks. That unit is called a GB200, right? So this is like the data center version of the B200 GPU; that's the horsepower that's powering this. And, um, it's apparently going to be in this, uh, new Abilene facility by the first quarter of next year. That is fast, right? That is Sam Altman trying to respond.

It's in part a response to, like, Elon Musk building his 100,000 H100 cluster faster than anybody had before. Um, the goal is to scale up to a hundred thousand of these GB200s by the fall of 2025. So that would be a really, really big deal. Um, there's going to be apparently one gigawatt of energy available there by mid 2026. For context, like, you're talking, you know, one gigawatt is, I mean, that's a, that's a big city that you're powering with that, uh, that kind of energy.
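
As a rough sense-check on those numbers, here is a back-of-envelope calculation. The per-superchip power draw and the facility overhead factor are assumptions for illustration, not figures from the announcement.

```python
# Back-of-envelope only: ~2.7 kW per GB200 superchip (1 Grace CPU + 2 Blackwell
# GPUs) and a 1.5x overhead for networking, cooling, and power conversion are
# assumed values, not numbers from the article.
gb200_units = 100_000
watts_per_superchip = 2_700
facility_overhead = 1.5

total_megawatts = gb200_units * watts_per_superchip * facility_overhead / 1e6
print(f"~{total_megawatts:.0f} MW")  # ~405 MW, a sizable chunk of the 1 GW cited
```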

Um, so, you know, there's a lot of power deals kind of, uh, behind here, and, um, Crusoe has, um, indexed heavily on the energy side, which is why they're such a key partner here. They're also known as Crusoe Energy, just to tell you how energy focused they are. Initially, they powered their data centers using natural gas, um, that burns off during oil extraction, to generate energy

and to reduce carbon emissions. Right now, they're looking into other kinds of power to scale faster, right? We've been talking for a long time, for, I think, over a year now, about the need for nuclear to come online, the need for, for natural gas as well to play a role here. But there are all these kind of crucial power requirements to make this all happen. So this is, I think, a really, really big deal for the whole hyperscaler scene.

Um, there's a little bit of interesting gossip at the end of the article as well, where the guy behind Crusoe is saying, I've heard about 10 gigawatt scale projects. So, like, a 10 gigawatt scale project, like, for a build out of that scale, um, that is just a very, very big project. It's the kind of thing you might imagine seeing come online, like, 2026, maybe, um, maybe 2027, depending.

But, like, there is nowhere you can find 10 gigawatts of slack in the current US electrical grid. Um, so it'll be really, really interesting, like, what the hell that involves. That requires new power, new permits; that cannot be done quickly. If you're going to do, for example, even, uh, you know, gas, you're going to need, like, three years to do it quickly. Nuclear plants take like 10 plus. So, you know, this is a very ambitious project.

If in fact it is going to happen; it's just a rumor. We'll see. Anyway, all kinds of interesting, uh, gossip and juice in the story, I thought. So there it is.

Andrey

Yeah, exactly. The details are a little bit murky, so there's not been, like, an announcement by OpenAI of any of this as far as I know. It's more like there was a press release by these companies, by Crusoe and others, announcing the project, and there's been chatter and discussion of OpenAI being the end beneficiary, Oracle being involved, Microsoft being sort of not as involved as you might have expected given the relationship with OpenAI.

Uh, certainly we'll probably be finding out more going forward, given that it would be a huge deal if this is the next, you know, I guess, independent venture by OpenAI to have its own source of compute. Yeah. That is very significant. Next up, going back to Anthropic. And as you mentioned, Jeremy, they are trying to get more money on the heels of OpenAI getting a ton of it. And reportedly, they're looking to raise at a valuation of up to 40 billion.

So not a ton of details here, but according to a report from The Information, uh, according to an unnamed existing, uh, investor who spoke to company leaders, uh, there are talks that are at an early stage that, uh, are hinting at this possible valuation. So it's a very kind of fluid situation here. Uh, I guess, interesting to highlight given that OpenAI had valued itself at that

pretty ridiculous number of 157 billion, you know, the highest valuation for a private, uh, startup, uh, at this time. And Anthropic, yeah, I don't know, I don't know, I guess, how to feel about the comparison of their 40 billion valuation to OpenAI at 157.

Jeremie

Yeah, I, I think one of the big, um, kind of headline metrics that's really interesting here is, if they end up going with that $40 billion valuation, right, uh, that would be a, a 50 times multiple on their gross revenue. Now, that is higher than the roughly 40 times multiple OpenAI got at its previous round, right? So the argument Anthropic must make if they're going to raise at 40 is, um, there's actually, like,

we have more, more of our potential, uh, or, or of our value, proportionately, in the, in the future. So we have the potential to, like, grow faster, be better than OpenAI. That's going to be a tough sell in the current, um, sort of context, especially where scaling matters so much, right? The mere fact that OpenAI has raised as much as they have de-risks them from a scaling standpoint much more than Anthropic, as does the partnership with Microsoft.

Anthropic is more sort of at a kind of arm's length relationship with a couple of different, uh, hyperscalers, Amazon and Google being the foremost there. So, um, OpenAI also has just, like, a way better financial position. As they say in the article, they're on pace, the claim is, to generate about 4 billion in revenue, um, which is about five times more than Anthropic's current projection.

Um, both companies are obviously losing, like, just hemorrhaging money, uh, nonstop, because that's the nature of the scaling race. So this is, like, this is really where it makes you step back and go, okay, this Sonnet 3.5 new, the new model that was released, um, in the context of Anthropic historically not having wanted to exacerbate the race to AI capabilities, um, it kind of makes you wonder, like, oh gee, you know, I wonder if they normally would have kept this under the hood.

OpenAI has certainly done that with the full version of the o1 model, right? Is Anthropic being forced to kind of, like, release things earlier? And this is where you see OpenAI doing similar things. o1-preview is a preview model; they released it, you know, with all these caveats. Same now with Sonnet 3.5 new. I think we're sort of seeing a bit of a, a bit of a race to deploy happening, playing out here in a number of different ways.

Um, I happen to know this is also happening at the data center build level, where there are security and, and robustness considerations that are being short circuited because people just want to get access to those next data center builds. Big, big issue to deal with, if you care about US national security, by the way, because of exactly this dynamic: this is just hemorrhaging national security value in so many different ways. Um, but yeah, there you have it.

I mean, I think this fundraise is a, is a really big deal. I think it's do or die, um, for, for Anthropic to, to just keep up. They need to have a clean enough cap table, right? They can't give away, like, 50 percent of their company to pay for the next beat of compute, because then what are they going to use to pay for the next generation of scaling? Um, so this is, I think, a, a really big question. You know, how, how much will their shares be diluted, um, with these new investors?

What are going to be the strings attached and so on and so forth?

Andrey

Yeah. And I think it's, it's interesting to think of it, I guess, in the context of a broader history or story of this whole time in AI. Uh, you know, creating frontier models is a multi, you know, billion dollar endeavor now. VCs are not going to give that money to any new companies, most likely. Like, Inflection got a bunch of money early on, they trained their own model, they're out of the race of frontier models. Now the field of players is pretty clear. It's OpenAI,

it's Anthropic, Meta, Google, Mistral, seemingly. And that's about it. Um,

Jeremie

I'll happily register the prediction that Mistral is going to be out of this race very soon, um, or at least in the next couple of years. There's just no way. I mean, France wants them as a national champion, but at the end of the day, again, we're talking about, I think, in my view, burning VC dollars. Um, I think Anthropic is, is actually interesting. Like, you can make the case, even though they're not, they're not

attached at the hip to a hyperscaler like DeepMind is and like OpenAI is, their differentiator is actually talent. Like, the best researchers in the world arguably have flocked to Anthropic, uh, in many cases having left OpenAI for that purpose. Uh, those high profile departures we've seen over the last kind of year, uh, I think are actually quite meaningful from the standpoint of Anthropic's prospects. So I wouldn't write them off at all.

I'm actually, I mean, I'm an Anthropic, uh, stan here. I, I like, I like the stock. No, but I mean, I, I like, uh, I like them as a company, um, and I hope they succeed. But, uh, but yeah, this is kind of one of the, the big questions here. And a lot of people in the safety ecosystem are also going to be, you know, potentially upset about this pushing forward of the capabilities envelope that is being imposed on Anthropic by, you know, economic forces. So we'll, uh, we'll see.

Andrey

I don't know. There's a lot of, I guess, philosophizing I could do here in terms of Anthropic in the public consciousness: much less known, right? It has probably had much less of an impact, broadly, than OpenAI. OpenAI is a big name. ChatGPT is a big name. It is sort of in our little sphere of AI news and AI usage that we think of OpenAI and Anthropic sort of together, almost on the same level, because Anthropic is a primary competitor to OpenAI.

Like, it is the main one, aside from Meta and Google, which are not startups, right, which are established companies. So the competition there is very interesting, just because if Anthropic can't, you know, survive, then OpenAI is the company that's going to stick around. And, uh, in that sense, it'll be very interesting to see what will happen with this new fundraise as well, what the valuation will be, and so on.

And in a related story, at least in the sense of people leaving OpenAI, as you mentioned, we have the news that longtime policy researcher Miles Brundage is leaving OpenAI. He was a senior policy researcher at OpenAI and has left the company to pursue research and advocacy in the nonprofit sector. So this is interesting to me because Miles Brundage has been kind of a significant figure in the AI space, uh, in terms of people who express opinions, who weigh in on conversations.

And he's been at OpenAI since 2018. Before that, he was a research fellow at the University of Oxford's Future of Humanity Institute. As per his role at OpenAI, he was very much on the policy front, on the safety front, and, uh, in that sense, it is significant, once again, to see someone, this time not directly related to safety but related to policy, leaving OpenAI to do work from somewhere

Jeremie

else. Yeah, I will say, from talking to friends at OpenAI, um, Miles, uh, was considered by many to be, uh, one of the, like, kind of true believers in AI safety, and, uh, a kind of force for that within the company. So it is noteworthy in the context of the recent departures. You know, we've seen Mira Murati, who was another person like that, or at least was believed to be by quite a few.

And again, you know, from what I've heard in the company, um, took the safety story quite seriously. Um, and then John Schulman, who was taking over the superalignment efforts after Jan Leike left, who also was doing the superalignment efforts. Um, it's a lot of safety oriented talent that just keeps leaving, and that, um, to many, has been a source of concern. Um, but yeah, Miles put together this blog post and, uh, just kind of laid out his reasons for leaving. It's quite interesting.

And I think there's a lot of, um, you know, you can do some reading between the lines, uh, as to what the reasoning is, but his high level bullet points are that he wants to, um, spend more time working on issues that cut across the whole AI industry, and have more freedom to publish and be more independent. He does credit OpenAI with allowing him to do quite a bit of stuff on his own, but it seems like he has his own views about things.

He's also concerned that being at OpenAI, uh, made it harder for people to take his views seriously, since he had a vested interest. I, I think that is a, um, a reasonable position for, you know, for him and for people to have, looking from the outside in. Like, yeah, you're working at this company, you know, you will have those biases. Um, so it'll be interesting to see if anything changes in his, uh, in his position, either suddenly or over time, as he moves away from it.

Um, and then he was also, uh, especially interested in doing AI capabilities forecasting and, uh, you know, assessments of, uh, regulation, the regulatory environment, security going forward. Um, but, uh, you know, he does say, I think OpenAI remains an exciting place for many kinds of work to happen. Like, if you wanted to read between the lines there, um, there's a lot of room to be like, okay, so I guess not all kinds of work.

And I guess you're interested in safety work and security work. And so, uh, you're leaving to do that. Um, kind of interesting. And then he's saying, uh, and I'm excited to see the team continue to ramp up investment in safety culture and processes.

So there's so much that you can, you know, anyway, reading the tea leaves is always impossible here, but, uh, uh, at least part of the vibe you could take home from this is there, there may be a bit of concern over, uh, over the same sort of safety and policy narratives that we've seen other people fly or maybe not, but, uh, there you have it. Exactly. So, uh, and I think

Andrey

notable to mention, you know, as, uh, kind of an implication here, right? Anyone leaving, especially someone as senior as Miles Brundage, is presumably leaving a very lucrative position to go and, in this case, work in a nonprofit. So that tells you a little bit about, you know, what kind of decision this was. Next, more on hardware. NVIDIA's Blackwell GB200 AI servers are ready for mass deployment in December. So that's, uh, the story there. So there have been some issues with supply chain things. There were rumors of defects, which I think we actually covered earlier, but now it appears that they are on track for Q4 of 2024, which,

There were rumors of defects, I think we actually covered earlier, but now it appears that they are on track for Q4 of 2024, which As Jeremy said, of course, this is the next generation of notable AI compute, and we'll Start falling out next year.

Jeremie

Yeah, this is, uh, you know, despite initial, you know, concerns about how things were getting slowed down by a whole bunch of supply chain bottlenecks. The big one was the packaging technology they use, CoWoS-L. Um, this is like a, it's a new way to kind of integrate together all the little components that have to work together to make a chip work. So, you know, you have your logic and you have your high bandwidth memory, and, and how do you connect those two together?

Um, on, uh, what is called an interposer. Anyway, the details don't super, super matter, though hopefully we'll talk about them in a hardware episode, which we are very overdue to do. Um, that's been overcome, and now we're on track for Q4 2024 mass production of two different versions of this, um, kind of B200 chip. So, uh, the big one is going to be the Bianca board. Um, this is the GB200.

Basically, this is where you have one, um, CPU, one Grace CPU, and then you have two, um, B200 GPUs that are connected to that CPU. Those all sit on the same board. And that, that board, you're going to have two of those boards glued together, so a total of four GPUs, and those slide into one, uh, uh, a, um, sorry, one slot in a server rack. And then you would have basically, like, eight of those slots, or, depending on the configuration, more or fewer.

But, um, so this, this is, uh, really big news, because we're actually seeing this roll out. You know, it's going to be shipped by December, and there are, um, a whole bunch of early customers, including Microsoft, Open, uh, sorry, not OpenAI: Microsoft, Oracle, and Meta. But actually, since Oracle is there, I guess maybe OpenAI indirectly, um, they'll be the first ones to acquire it. So that's, uh, that's kind of cool. Um, and, uh, Foxconn, uh, is, is working on this as well.

Uh, we'll talk about them more in a second, but, um, once you build the, uh, the actual, like, chip, then somebody has to turn that chip into a server rack, right? Has to actually, like, you know, set up, anyway, all that infrastructure. That's what Foxconn does. And so you have the chip fabrication, the chip packaging, and then you have the step that Foxconn does, where they basically just, like, build out the full servers, and those get shipped out.

And that is, uh, that's what's being done now.

Andrey

Exactly, and that is our next story: that Foxconn is building this facility in Mexico for that bundling step of the GB200 chips. Foxconn, if you don't know, is a gigantic company. They're most well known for being the key manufacturer for Apple products, the assembly step for making the iPhones, and now they are partnering with NVIDIA for this, uh, what they say will be the, uh, biggest manufacturing plant for GB200 chips, and it'll be in Mexico.

Jeremie

Yeah. And so one of the big challenges, so, by the way, it's kind of funny, because you'll see a lot of companies referred to in, like, tech news, uh, where it's like low detail tech news, they'll just be like, yeah, so Foxconn is responsible for building NVIDIA's GB200 superchips. And it's like, wait, like, what do you mean responsible? I thought NVIDIA was responsible. And then I thought TSMC was responsible for, you know. So yeah, they're all responsible.

Um, the, the way this works is NVIDIA designs the GPU. Um, so the actual, like, B200, say, uh, it gets shipped over to TSMC. It gets fabricated at TSMC, the, the chip, but then it also needs to be combined with high bandwidth memory. Usually you get that from, nowadays it's SK Hynix, basically, for, for this, but it could be Samsung or other companies.

And so the high bandwidth memory and the GPU logic get combined together; they get packaged at some packaging facility, potentially in Taiwan, potentially elsewhere. These days, for the cutting edge stuff, the CoWoS-L, that's mostly Taiwan. Um, and once you do that, now you have your, like, ready to go GPU plus, um, high bandwidth memory, but you still need to connect that to the CPU that TSMC, or anyway, some, some process, will build as well.

Um, and, and create these, um, sort of, uh, boards, basically these motherboards that are going to contain all the kind of other infrastructure needed to run the chip. Um, and that includes, by the way, cooling, so, like, liquid cooling. Uh, and so that's what Foxconn is then going to do. Uh, that's, that's that next step.

So when they say that Foxconn is building the GB200 superchip, they're saying they're taking the packaged chips that they get from, like, say, TSMC, and then they're combining them with all the cooling and the energy provision stuff, and then they're packaging that into servers, and that's what they build. That's what's going on here. Um, this is a big deal, right? This is a really, really big deal. It's going to be in Guadalajara, uh, which I totally nailed the pronunciation of. I think you did.

We aim to please on the, we want accuracy on this show. Um, but yeah, so it's, it's going to be the world's largest manufacturing facility for bundling NVIDIA's GB200 superchips. That's what they mean. Bundling packaging. No, well packaging. Yeah. Bundling. Assembling. Assembly. Thank you. That's good.

Andrey

And the last story in the section, xAI is launching an API. So xAI has been working on Grok, their ChatGPT and Claude competitor, for a while. We've had, you know, Grok 2 recently. And so far the only way to use Grok has been through X, through Twitter. And now they are doing the same thing that OpenAI and Anthropic have done, which is provide a means for other organizations to use LLMs through an API.

So, this API will support Grok Beta and will be priced at $5 per million input tokens and $15 per million output tokens. It will have some features like supporting function calling. And, uh, yeah, it positions xAI, I suppose, to compete with OpenAI and Anthropic more directly with this step.

Jeremie

Yeah, that happened really fast. I like, I, I can't believe how quickly XAI came online and actually consistent with how quickly Elon built that, uh, that insane, um, uh, cluster, uh, in, uh, in Memphis. Like the, the build speed was apparently, I'm trying to remember, it was like end to end. It was about like three weeks apparently to, uh, to execute the build. Crazy, crazy stuff.

Um, and, uh, yeah, apparently, although it's not live yet, uh, there's documentation that does hint at vision models, uh, that can analyze both text and images. So presumably that's going to come online soon as well.

I'm really curious what, um, what complete clusterfuck could happen if the xAI API ends up incorporating agentic capabilities that are native to the X platform, because they're so tightly integrated. The bot situation on X will be really interesting. Hopefully, through the API, they'll at least have better visibility into those bots. But kind of cool.

Um, and it does, so there's, there's some, Uh, ambiguity about which model this actually will be. So it's unclear whether it's like, it's called the Grok Beta, right? But, um, it might be Grok 2, uh, it might be Grok Mini. There's a bunch of uncertainty there, um, but I guess we'll be finding that out, uh, fairly soon.

Andrey

So as a comparison, GPT-4o charges $2.50 per million input tokens and $10 per million output tokens. So it costs less than this Grok Beta API, which makes Grok probably not competitive. I mean, we don't have a fully reliable comparison, but I would imagine GPT-4o is for most people still, you know, preferable, or at least on par with Grok. So I would not see many people switching over if Grok is not able to be competitive on the pricing front.
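Spelling that comparison out with the per-million-token prices quoted above; the workload split is a made-up illustration, nothing more:

```python
# Rough cost comparison using the per-million-token prices quoted above.
# The workload numbers below are invented purely for illustration.
PRICES = {
    "grok-beta": {"input": 5.00, "output": 15.00},   # $ per 1M tokens
    "gpt-4o":    {"input": 2.50, "output": 10.00},
}

def monthly_cost(model, input_tokens, output_tokens):
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Hypothetical workload: 300M input tokens, 100M output tokens per month.
for model in PRICES:
    print(model, monthly_cost(model, 300e6, 100e6))
# grok-beta -> 3000.0, gpt-4o -> 1750.0: roughly 1.7x cheaper for GPT-4o on this split.
```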

Jeremie

Yeah, definitely depending on the use case, you know, maybe there are advantages, but, but that's a good point. Pricing is going to be a challenge.

Andrey

Moving over to projects and open source. We begin with INTELLECT-1, the first decentralized training run of a 10 billion parameter AI model. This is a little bit on the older front, but I think this is one story, Jeremy, that you were catching up on and presumably wanted to talk about because it was very exciting. So this project builds on OpenDiLoCo, an open source implementation of DeepMind's DiLoCo distributed low-communication method for training.

And, uh, you know, what this tells us is, of course, that training 10 billion parameter models is challenging; a key bottleneck for scaling up is being able to train massive models like this. And the only way to do that right now, for things like OpenAI, is to have access to ridiculous data centers, ridiculous amounts of compute, with very, very complicated architectures for combining all of the compute and parallelizing training.

ability to do decentralized training across, you know, not one source of compute, but many different, uh, separated areas would allow, I guess, other organizations, uh, companies without as much, uh, centralized, uh, control, you could say to do training as well. So that's the high level summary, Jeremy, I'm sure you have a ton to say. So I'll let you

Jeremie

take it from here. Oh yeah. I mean, so, so this is, first of all, I'm just going to say, this is like the nightmare situation for anybody who, um, Uh, worries about like kind of, uh, weaponization risk and, and, and AI policy trying to control these models. So this is the first time we've seen a 10 billion parameter scale model trained in this kind of distributed way. That's, that's kind of the issue, right?

Like much, much harder to control if, you know, you've got a place on, I don't know, the Eastern seaboard and then somewhere in Mexico and you're able to train across all that. You know, that's not quite what's happening here, but kind of on the way there. Uh, so previous to this, you only had 1 billion parameter models that were done like this; this is a 10x scaling, to 10 billion. So, DiLoCo.

I think it's, we, I'm trying to remember if we covered this and we may have, this is a, a Google deep mind strategy that was proposed for distributed training. Um, basically you can think of this as like, you have a bunch of local, um, poorly connected clusters of compute and, During training, you basically pass the model weights to all your different clusters. They're going to train on a batch of data that they have locally.

And once they're done that round of training, just like one step of training, let's say, actually be quite a few, but Anyway, um, they have a new model, right? That is slightly different from the old model. There's a difference in the parameter values, parameter Delta. Let's say that is going to be called a pseudo gradient. It's not the actual full gradient, but it's the gradient as computed locally in that one cluster. Um, then they communicate their gradients back up to some central node.

Where you get averaging or pooling together of those gradients to update the overall model, and then that gets sent down again. And all the artistry here is in figuring out how do you set things up so that as much compute as possible can be expended at the nodes without communicating back to do that pool averaging Uh, on a regular basis, because that's really what kills you with latency when you try to do these distributed, um, uh, training schemes.

The advantage here, by the way, is you're not sending, like, training data back up to a centralized node. So you're able to do this in a sort of privacy preserving way, in a sense; you're just sharing the weight updates rather than the data itself. So that's kind of interesting. Um, but yeah, so they basically doubled down on using this DiLoCo strategy from DeepMind.

Essentially, that's just a version of this whole scheme where you take advantage of, anyway, special optimizers: AdamW for the local updates and Nesterov momentum for the global updates. That's important because, since you're getting more infrequent but larger updates, you need a way to make that training stable.

And Nesterov Momentum, which is a more Well, as the name implies, kind of more momentum, um, uh, oriented approach to, uh, more smoothly train at that higher level. I mean, Adam has a feature like that too, Adam W I should say. Anyway, um, really impressive.
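For the flavor of the scheme described above, here is a minimal single-process sketch of a DiLoCo-style inner/outer loop in PyTorch. The tiny model, random data, sequential worker loop, and all hyperparameters are placeholders for illustration only; a real implementation like the Prime framework runs workers in parallel across clusters and adds fault tolerance, compression, on/off ramping, and much more.

```python
import copy
import torch
import torch.nn as nn

# Toy stand-ins: a tiny model and random batches. Everything here is illustrative.
global_model = nn.Linear(32, 32)
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7,
                            momentum=0.9, nesterov=True)  # outer step: Nesterov momentum

NUM_WORKERS, INNER_STEPS = 4, 50  # inner steps can be in the hundreds in practice

for outer_round in range(10):
    pseudo_grads = [torch.zeros_like(p) for p in global_model.parameters()]

    for _ in range(NUM_WORKERS):  # in reality these run in parallel on separate clusters
        worker = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(worker.parameters(), lr=1e-3)  # inner steps: AdamW
        for _ in range(INNER_STEPS):  # local training on the worker's own data shard
            x = torch.randn(16, 32)
            loss = ((worker(x) - x) ** 2).mean()
            inner_opt.zero_grad(); loss.backward(); inner_opt.step()
        # Pseudo-gradient: how far this worker drifted away from the global weights.
        for pg, gp, wp in zip(pseudo_grads, global_model.parameters(), worker.parameters()):
            pg += (gp.detach() - wp.detach()) / NUM_WORKERS

    # One outer step on the averaged pseudo-gradients; only this step needs communication.
    for p, pg in zip(global_model.parameters(), pseudo_grads):
        p.grad = pg
    outer_opt.step(); outer_opt.zero_grad()
```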

Uh, lots of kind of, uh, detail work, uh, involved in setting up this framework, the training framework, which they call prime, um, all kinds of stuff like dynamic on off ramping that they figure out, which is like how to get their compute resources. Um, flexibly to like add or remove computational nodes. So let's say you have like a new GPU comes online and wants to contribute to the, the cluster or the overall training process. How do you ramp it up?

And then how do you off ramp it when it's no longer available? So it's really designed to facilitate this kind of like very dynamic, very messy training. Um, scheme, and they've got all kinds of things like asynchronous checkpointing, like checkpointing, saving the model at different checkpoints in this very sort of janky way. Um, which is, uh, anyway, it's, it's really interesting, very nitty gritty and detailed.

The bottom line is this kind of training scheme is coming, with huge implications for policy, huge implications for open source. This is going to make open source models trainable at scale much more easily. Um, and it makes it easier for people to pool resources, including computational resources, across a wide range of different heterogeneous clusters.

Andrey

So like in practice, what this means is you can combine compute from, you know, all around the globe and have clusters in very different locations work together, whereas, you must imagine, at least I would imagine, you know, Meta, OpenAI, et cetera are trying as hard as possible to have one single huge cluster do all the work. And, uh, it's interesting.

They, you know, go into a decent amount of detail and they also link to a dashboard that you can go to and see; they actually have a leaderboard of contributors for compute. So at number one is Arcee AI, which contributed 2,800 H100 hours. At number five there's Hugging Face, which contributed 2,826. I think someone else

Jeremie

is up there too, aren't they?

Andrey

They are, at number 16, still only 360 hours, but you know. Conceptually, anyone can join and contribute hours, right? And you can even look at a training progress bar on this dashboard. Out of 1 trillion tokens, it's at 23.21%.

So, the plan is, you know, you can have a lot of people pool their money and resources to train, let's say, a 100 billion parameter model, which otherwise, of course, you would need to be someone like OpenAI, with hundreds of millions of dollars, to even try to do.

Jeremie

Yeah. And one thing I will say is, um, kind of like globally distributed training, uh, using a scheme like this, not quite yet possible. Like they're, um, so if you look at like, um, the, the big companies like, uh, Google and, and Meta and so on, when they do train across multiple clusters, Even those clusters are even they tend to be very close geographically, like say within 20 miles of each other. Um, and that's for all kinds of latency issues that are currently an open problem here.

So there's like there are limitations, but, um, it's it's the there are places where anyway, the scheme is like just taking the L on that latency and places where it's not. And It's, it's really interesting and it's quite messy. It's, it's meant to be messy, which is one of the characteristics of the open source ecosystem. So, uh, anyway, there's, there's intellect one.

Andrey

Next up, we have a story related to Meta. They've released a whole bunch of new stuff, eight new AI research artifacts: models, datasets, and tools. The big one is an upgrade to SAM, the Segment Anything Model, which is now at 2.1. Uh, but there's a whole bunch of other stuff. So one of them is Meta Spirit LM, an open source language model that integrates speech and text, which would allow for more natural sounding speech generation. They also have Layer Skip,

an end-to-end solution that accelerates LLM generation times. They have SALSA, new code to benchmark AI-based attacks on a cryptographic system; Lingua, a lightweight codebase for training language models at scale; and, uh, yeah, a lot of stuff, like three more things, and I don't want to go into detail on every one of them. Uh, but the, I guess, summary here is they have bundled all of this together and have a single blog post for all of these things.

And some of these, like SAM 2.1 for instance, are very significant. Like, that is something that can be used across lots of different applications and will have a major impact. The one that maybe we should highlight a little bit more is the Self-Taught Evaluator, which is a method for generating synthetic preference data to train reward models without relying on human annotations.

That's very important for being able to do RLHF alignment with a better ability to scale. But interesting to see, yeah, just kind of across very different fronts, cybersecurity, multilinguality, alignment, various sectors, Meta is contributing.

Jeremie

Yeah. It's sort of interesting that they decided to bundle this all together. It makes it so much harder to report on. Um, but yeah, I agree with you. Self taught evaluator, I think is kind of the at least the standout, um, to me for sure. Um, so they do use, uh, an LLM as a judge to set up reasoning traces, um, which, uh, in, in a, well, they call it an iterative self improvement scheme. Um, and this is something that we see a lot.
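Roughly, as I read the Self-Taught Evaluator write-up, one round looks something like the sketch below: pair a response to the real instruction with a response to a deliberately degraded instruction, have the current evaluator write a reasoning trace and a verdict, keep only the traces whose verdict lands on the known-better response, fine-tune on those, and repeat. Everything here is a toy stand-in, not Meta's actual code.

```python
import random

# Toy stand-ins so the loop runs end to end; in the real method these would be
# an instruction-following LLM and an LLM-as-a-judge, not string hacks.
def generate(instruction):  return f"answer to: {instruction}"
def perturb(instruction):   return instruction + " (subtly changed)"
def judge_with_reasoning(evaluator, instruction, a, b):
    trace = f"Comparing two responses for '{instruction}'..."
    verdict = random.choice(["first_better", "second_better"])  # placeholder judgment
    return trace, verdict

def self_taught_evaluator_round(evaluator, instructions):
    """One round of synthetic preference generation, roughly as I read the paper."""
    traces = []
    for inst in instructions:
        good = generate(inst)             # response to the real instruction
        worse = generate(perturb(inst))   # response to a deliberately degraded variant
        trace, verdict = judge_with_reasoning(evaluator, inst, good, worse)
        if verdict == "first_better":     # keep judgments that match the construction
            traces.append((inst, good, worse, trace))
    return traces                         # fine-tune the evaluator on these, then repeat

print(len(self_taught_evaluator_round(None, ["summarize this article", "write a haiku"])))
```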

So, um, one of the key uses of, sorry, of Anthropic's Sonnet 3.5 (new) that was flagged in their release was, hey, you can actually use this data to train models, right? Just start to train agentic models. And likewise, there's speculation, which I think is probably accurate, that OpenAI's o1 model is being used to train, or has been used to train, the Orion model that is yet to be released but probably going to be released fairly soon. Um, and that wrapped training in September, people think.

Um, and that, that wrap training in September, people think. Um, so, so using that using these kind of incremental agentic models to create reasoning traces that can be used to train even better agentic models, you can sort of see how this gets you into a takeoff scenario, um, is is very much the kind of invoke thing right now. Um, and, uh, anyway, uh, interesting to see meta push in that direction. They were sort of the last.

The last company not to do real intense agentic stuff that we've heard of. Though, I guess, yeah, Google DeepMind, we haven't heard of an equivalent to o1 from them, but under the hood, you better believe they're developing that too.

Andrey

And speaking of DeepMind, there is the next story. They are open sourcing their AI text watermarking solution, SynthID. So we've covered SynthID in the past; they've announced that it has been integrated into Gemini and various frameworks. Now the paper on SynthID has been published in Nature, yet another Nature publication for DeepMind.

They say that they analyzed scores for around 20 million watermarked and unwatermarked chat responses and found that users did not notice a difference in quality and usefulness between the two. So that means that you can apply a watermark, there's this invisible kind of trace in the text that lets you tell if it came from an LLM without changing the user experience.

And so, uh, you know, the open sourcing of the solution is pretty notable because there isn't a standardized watermarking technique for LLM outputs. And with the open sourcing, perhaps it could be adopted by other organizations.
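To make the general idea concrete: SynthID-Text itself uses a tournament sampling scheme described in the Nature paper, but the flavor of statistical text watermark detection can be shown with a much simpler "green list" style sketch, where the hashing, split fraction, and threshold are purely illustrative and not Google's algorithm.

```python
import hashlib
import math

def is_green(prev_token, token, fraction=0.5):
    """Pseudo-randomly assign (prev_token, token) pairs to a 'green list'.
    A watermarking sampler would bias generation toward green tokens;
    the detector just counts how many showed up."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return (h[0] / 255.0) < fraction

def watermark_score(tokens, fraction=0.5):
    greens = sum(is_green(a, b, fraction) for a, b in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    # z-score of the green count vs. the unwatermarked expectation (binomial).
    z = (greens - fraction * n) / math.sqrt(n * fraction * (1 - fraction))
    return z  # large z => likely watermarked; short texts rarely clear any threshold

print(watermark_score("the quick brown fox jumps over the lazy dog".split()))
```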

Jeremie

Yeah, and it does suffer from limitation. I mean, this is like one of the things where, especially for for text generation, I just think this is an insurmountable problem that we're going to have to recognize as such. If you write a small enough piece of text, there simply isn't the opportunity to To insert the, the telltale signs that, Hey, this was actually AI generated. Um, they did find that with text generation, their system was, was resilient, sorry, resistant to some forms of tampering.

So they talked about cropping text and the light editing or rewriting. But when you used AI generated text, sorry, when AI generated text was rewritten or translated from one language to another, then all of a sudden you lose that ability. And so that seems like a pretty. Um, like low hanging fruit strategy to get around this and, uh, you know, not, not too surprising, right? Like there's just too many generative AI tools.

You can kind of play them off each other and shake off the watermarks in that way. And so even just going to another tool and saying like, Hey, can you rewrite this? You know, go to, you know, Like the, the, the Google product to get your high quality text. And then you're like, I can now use a low quality model to just rephrase things locally to get rid of the, uh, statistical, uh, indicators, the, the watermarking that's, that's put in.

So, you know, I think this is good stuff, but there's no question. Um, but I, I think it's, it's far from being a panacea and I don't think deep mind is under any impression, obviously that, that it is, it's just. One of these facts of the matter about, uh, you know, the year 2024 that you, you just, you can't expect text watermarking to work.

Andrey

Exactly. And I think, as with other watermarking, really what it does is raise the effort required to try and pass something off from an AI as not AI, right? So if you take something from Gemini and copy paste it somewhere, you can actually check against the watermark. And if you don't try to do any tampering, it is now possible to check if it was LLM generated. So it's definitely useful, and it would be great to see

some sort of standard, or just in general for any LLM output to have some sort of watermark, so you can at least check; assuming that it hasn't been tampered with, you can verify whether something is AI or not. Onto research and advancements, and we begin with a paper from OpenAI. It's about a new technique to speed up generation by 50 times, sort of.

So it's not exactly what they do, what they're looking is, uh, being able to analyze and optimize consistency model training. So with diffusion models for generating images, those work really well, of course, but the problem with diffusion is it's a multi step process to generate your image or whatever you're generating via this denoising process that is iterative. And so that leads to some, uh, difficulty in making it very fast.

Consistency models are a general sort of technique that has been explored for a while that makes it so you can generate something of high quality with very few iterations, maybe one or two iterations at most. So this paper of theirs is titled Simplifying, Stabilizing, and Scaling Continuous-Time Consistency Models. It's not introducing a new technique; it's really bundling a set of ideas for making it easier to use consistency models at scale.

So the output of that is to train a continuous-time consistency model at a scale of 1.5 billion parameters. And as a result, they are able to output very high quality images on this dataset that are very, very close in quality to diffusion models while being, per the article title, you know, tens of times faster, 50 times faster. So, kind of pushing on a thread of research that has been ongoing for a while. I believe we've reported on consistency models in the past.

It's one of the very widely explored ways to push the speed of generation for images, and here OpenAI is contributing to that drive.
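As a rough illustration of the training idea, here is a toy sketch of the simpler discrete-time consistency training recipe from earlier work in this line, not the continuous-time formulation OpenAI scales up in this paper. The model, noise schedule, and data are placeholders, and the real method also enforces a boundary condition at the lowest noise level through its parameterization, which is omitted here.

```python
import copy
import torch
import torch.nn as nn

# Toy consistency-training sketch: outputs at adjacent noise levels, built from the
# same clean sample and the same noise, should agree with an EMA copy of the model.
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))
ema = copy.deepcopy(model)                      # slow-moving target network
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sigmas = torch.linspace(0.01, 2.0, steps=20)    # discretized noise levels

def f(net, x, sigma):
    # Condition on the noise level by concatenating it as an extra input feature.
    cond = torch.full((x.shape[0], 1), float(sigma))
    return net(torch.cat([x, cond], dim=1))

for step in range(200):
    x0 = torch.randn(64, 2)                     # stand-in for real data
    i = torch.randint(0, len(sigmas) - 1, (1,)).item()
    z = torch.randn_like(x0)
    x_lo, x_hi = x0 + sigmas[i] * z, x0 + sigmas[i + 1] * z   # same noise, adjacent levels
    loss = ((f(model, x_hi, sigmas[i + 1]) - f(ema, x_lo, sigmas[i]).detach()) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                       # EMA update of the target network
        for p_ema, p in zip(ema.parameters(), model.parameters()):
            p_ema.mul_(0.99).add_(p, alpha=0.01)

# After training, a single forward pass from pure noise at the top noise level
# plays the role of the one-step (or few-step) sampler the hosts describe.
```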

Jeremie

Yeah, so I think this is a 1.5 billion parameter model, by the way, that can generate a sample in a tenth of a second on one A100 GPU, not an H100, an A100, right? So we're really starting to flirt with that. That 50x really gets you from "this is totally inaccessible, it's an industrial grade thing" to, yeah, actually, we're starting to push into consumer grade territory with this kind of scale. Um, which is pretty remarkable.

The other thing you can discern from this, you know, just back-of-envelope math here: 1.5 billion parameters, one A100 GPU, and one image in a tenth of a second. Well, video generation, right, is usually at about 30 frames per second for decent quality. Um, we're already at like 10 frames per second with a one-GPU setup. So kind of interesting.
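Spelling out that back-of-envelope math with just the numbers quoted above:

```python
# Back-of-envelope: images per second on one A100 vs. what real-time video needs.
seconds_per_image = 0.1                             # ~a tenth of a second per sample
images_per_second = 1 / seconds_per_image           # 10 frames per second
target_fps = 30                                      # typical "decent quality" video
gpus_for_realtime = target_fps / images_per_second   # ~3 A100s, ignoring frame-to-frame
print(images_per_second, gpus_for_realtime)          # consistency, audio, etc.
```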

Obviously there's more that goes into, uh, you know, a video generation model than, than an image generation model in terms of consistency between frames, but still like starting to get into that zone where you're getting to one, you know, one second of generated video per second. Watched so you can effectively stream, um, AI generated video and any further compute gains from that point on go into further optimizations or go into things like, um, you can spend them on biometric responses, right?

Is your face flush or your pupils dilating? How, how closely are you interacting? How are you interacting with the content that's being generated? Like very quickly you get into a space where like you're watching videos that adapt in real time to your. Kind of biophysiological response and, uh, and that's a world that could be very, uh, interesting and also very concerning from an addiction standpoint, like, you know, basically video crack.

Uh, so, uh, anyway, I, like, I think that's kind of frankly where all this is heading, like one way or another, we're going to, we're going to find out what it looks like. But yeah, this 50 X lift, um, pretty, pretty, um, Open AI strategy here to just bundle a bunch of existing techniques and figure out the engineering that turns out to be the big problem, right? Very often you get a nice small scale paper, but the real proof is in the implementation pudding.

And so this is very much an open AI type of solve.

Andrey

Next, another story that is very much of a Jeremy flavor, and this is some research from Epoch AI, who we've talked about before. Most recently, I believe we covered their research on the feasibility of scaling AI training until 2030.

So this new report by them is on machine learning hardware, and it basically tracks the leading chips or hardware used for training over time, starting as far back as 2008 with the GeForce GTX 280, up to today with things like the AMD Instinct MI325 and the NVIDIA GB200, as we've covered. And what we get from Epoch is a graph of the scaling of performance over time.

So what we are seeing is that on average, there has been about a 1.3x per year improvement, meaning that the computational performance of machine learning hardware has doubled every 2.8 years if you go from 2008 to 2024. And there are all sorts of related metrics there, like: performance per dollar has improved around 30 percent each year, leading ML hardware becomes 50 percent more energy efficient each year, et cetera.
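For what it's worth, the two headline numbers are mutually consistent once you undo the rounding: a 2.8-year doubling time corresponds to roughly a 1.28x annual growth factor, which rounds to the quoted 1.3x.

```python
import math

doubling_years = 2.8
annual_growth = 2 ** (1 / doubling_years)          # ~1.28x per year, rounds to ~1.3x
implied_doubling = math.log(2) / math.log(1.3)     # taking 1.3x literally gives ~2.64 years
print(round(annual_growth, 3), round(implied_doubling, 2))
```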

So once again, I'll hand it over to you, Jeremy, what did you find interesting in here?

Jeremie

Yeah, no, I mean, I think the, the breakdown of who owns what was really interesting. Um, so one piece is basically the way to think about, um, the industry. And this has been pretty clearly true for a long time. It's just good to see it in graph. Uh, everything is Nvidia GPUs. Except, uh, Google. Google has a whole bunch of their own TPUs. They've been designing them themselves for a long, long time. Um, they've been into liquid cooling long before it was, ha, long before it was cool.

Look at that. Look at that. Well, I didn't even try. Okay, um, and um, Anyway, so, so Google stockpile does include an awful lot of Nvidia hardware, but also their own, their own TPUs, um, which gives them a whole bunch of advantages too, right? They have the ability in house to do their own design. That is really, really hard by the way. Um, design sometimes sounds like, Oh, you know, this isn't that easier. Like you're not even fabricating the thing.

So, but no, no, no, like really, really hard, uh, hundreds of millions of dollars to just get a design operation off the ground. We've seen a lot of startups die on the runway, just trying to do that. Um, the other piece was, How is this stuff distributed among and between the different big players? Basically, it's four big players that own, um, essentially all the, the highly consolidated stockpiles of AI hardware. You've got, uh, Google, you've got Microsoft including OpenAI.

So Microsoft and OpenAI together, uh, you've got Meta, you've got Amazon. Amazon is, uh, in last place as between those, the three of them, not at all surprising given that Amazon has been very, very late to the party on recognizing, uh, the potential of AGI to, to come soon. They kind of spun up a team internally that they call their AGI team. Um, but I mean, I'm, you know, a little skeptical on, on that effort, but, um, that could ramp up.

quickly and soon, but for right now, they are still suffering the cost of having been late to the game. Meta, likewise, um, one might, uh, imagine blaming the ghost of, of Yen Lakun for this, sort of like pessimism. The ghost of Yen Lakun? Not the ghost of Yen Lakun, the specter of Yen Lakun. I don't know. I'm tired. He's still on meta. I'm cleaning up poop all day. I don't know what I'm talking about.

Anyway, yes. Uh, Yann LeCun, um, you know, there, there've been various people have made the claim that, that sort of this, um, sort of later entry, uh, has, has come in part from, from that sort of skepticism, which, which anyway, that's a separate rabbit hole. Um, so then we have Microsoft in second place.

This is interesting because a lot of people, and we've talked about this on the podcast too, it's like, with Microsoft and OpenAI, be careful not to poke the dragon, because you may think you're in a great position in terms of data center builds and all that stuff. But the real giant in the room, with literally double the compute capacity relative to Microsoft, is Google.

If you measure it in terms of H 100 equivalents, double the capacity, um, most of that about three quarters is made up, uh, of their own internal custom hardware, the TPU. Um, there are uncertainties on all these estimates, but it's clear that Google is comfortably ahead. Part of the reason why they've been able to catch up so fast and so, uh, aggressively. Uh, besides that, there is another Big pool.

It's basically equivalent to the total amount of compute that Google owns, what other labs have. And this is a whole bunch of players; CoreWeave is one of them. Um, sorry, I'm just trying to dig up the actual owners. They just say, basically, Oracle, CoreWeave, compute users like xAI and Tesla, Chinese companies, governments for sovereign AI, national clouds, that sort of thing. So basically everything else falls into that bucket.

Um, it's just good for rough order of magnitude, right? So, like, Google has the same amount of AI hardware, or H100 equivalents, as everybody who's not Microsoft, Meta, and Amazon combined. That I thought was kind of interesting and good to see graphed out. Um, Epoch AI continues to do amazing, amazing work tracking the hardware story here. They're a great source of data.

Uh, so highly recommend, you know, if you're in the kind of policy world, I was talking to some people at the department of energy recently. Um, and, uh, just kind of like flagging this, or this resource, it's not, I think it's still quite under, under leveraged. Um, so, uh, highly recommend, uh, checking them out.

Andrey

Yeah, very interesting report. On the number front, I guess it is pretty fun to just say that Google has over 1 million H100 equivalents of compute, actually 1.25 million. So if you imagine a million GPUs, that's what Google supposedly has. And they do have confidence intervals on these things, by the way, so I would imagine they don't know the exact numbers.

Uh, but as you said, I think it's very important to remember Google; even though, in AI circles, and to some extent on this podcast, we do present them sometimes as being behind OpenAI, not quite leading, you could say. But if we believe the scaling story, that AGI is just a matter of being able to scale at some point, then is there a real competitive advantage that exists? It's hard to say.

And it's important to remember that hardware, access to hardware is a very, very big factor as we say over and over on here.

Jeremie

Yeah. And it's one thing to have the hardware. It's another to be able to like pull it towards a single scaled project. And what this graph doesn't show us, right. It's like how many of these are being used for just like inference, right? Like meta has outrageously large, like high inference, uh, requirements. And so you can't just like throw all these at a training run and they may be geographically distributed, all that stuff.

But if the fact that Google has two x the compute that Microsoft does is true in the pooled training sense as well, that basically means that OpenAI researchers or Microsoft researchers have to somehow find algorithmic innovations that make up for that two x. That's not at all hard to do, by the way; one good idea could get you a five x or ten x lift, but it does mean there is a bit of a deficit to overcome.

And, um, and, and that's gonna be part of, you know, what they keep in mind. Can, can their comparative advantage in, uh, algorithmic efficiency make up for, for the deficiency in hardware availability?

Andrey

Onto the lightning round, we begin with something that Jeremy hinted at, some research on being able to improve training of agentic AI. And this is coming from DeepMind as well as CMU and Google Research. The title is Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning. So one approach to training LLMs for reasoning is actually explicitly rewarding them for correct reasoning per step. So there's a process to reasoning, right, where you think step by step.

And going back to 2022, they had a paper on giving it intermediate rewards. So instead of just rewarding the LLM for outputting the correct answer, you also reward it for each of those steps in the reasoning chain being correct. And that's what they call a process reward model. So there has been published research on this.

You know, quite a while ago now. And initially they showed you could do that for math reasoning, but this is difficult to scale because there may not be a clear way to actually provide that reward per reasoning step, right? So what this paper does is show how you can scale and kind of generalize this whole idea by saying that the reward for a given reasoning step should be a measure of progress.

So the reward is the change in the likelihood of producing a correct response in the future, before and after the reasoning step. And when you take this approach to rewarding reasoning steps, that leads to being able to train more efficiently. You can be 1.5 to 2.5 times more compute efficient and actually have better results, some percent more accurate, in comparison to one other possible approach, outcome reward models.

So, uh, yeah, it's interesting to see, of course, research on being able to train reasoning, which is, as we found out, one of the very big important ideas in o1 from OpenAI.
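A minimal sketch of the "reward as progress" idea as summarized here: estimate, by Monte Carlo rollouts, the probability of eventually reaching a correct answer before and after a candidate step, and treat the difference as that step's reward. The rollout policy and verifier below are toy stand-ins, and the paper actually measures progress under a separate prover policy rather than collapsing everything into one policy as this sketch does.

```python
import random

# Toy stand-ins: a "policy" that finishes a partial solution and a checker for the answer.
def rollout(prefix_steps):              # pretend completion of the reasoning chain
    return prefix_steps + ["final guess"]
def is_correct(full_solution):          # pretend verifier; random just so the sketch runs
    return random.random() < min(0.9, 0.2 + 0.1 * len(full_solution))

def success_prob(prefix_steps, n_rollouts=100):
    """Monte Carlo estimate of reaching a correct final answer from this prefix."""
    wins = sum(is_correct(rollout(prefix_steps)) for _ in range(n_rollouts))
    return wins / n_rollouts

def progress_reward(prefix_steps, new_step):
    """Process reward for new_step = change in estimated success probability."""
    return success_prob(prefix_steps + [new_step]) - success_prob(prefix_steps)

print(progress_reward(["step 1", "step 2"], "step 3"))
```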

Jeremie

Yeah. And, and this is something where, you know, my, my, uh, sort of research attention has been drawn increasingly towards, uh, Um, schemes like this. So, you know, obviously we've been talking about, you know, for, for three years, uh, whatever, like the, this idea, I think one of the first, the first episode, or one of the first that we had, we talked about this idea that you can trade off potentially training time compute for inference time compute, right?

So if you have a, a base model, uh, Um, that's like pretty decent. If you find a way, whether through back then it was like, let's think about this step by step, right? That was the way to get the model just to generate more tokens on the way to producing the final answer. Uh, the fundamental thing you're really doing with stuff like that is you're finding a way to just spend more flops, right? More computing power at inference time to chew on your problem.

And that can give you a really big lift, it turns out. So then the question becomes, what's the optimal way to do that? Right? What is the strategy that allows you to spend your flops intelligently? And this is one of the things that this paper is trying to do. Um, part of the reason that I'm especially drawn to this is I think that this is directionally what is happening with OpenAI's o1; I think it's really interesting that this paper was published.

Um, and, uh, and it gives us a bit of a glimpse as to how you might pull this off. So, uh, first of all, like what they do here is they start by collecting a whole bunch of query solution pairs for, you know, Um, for a given problem. And for each of those pairs, they're going to actually generate what they call a Monte Carlo rollout. Basically like, um, uh, have a full reasoning trace that they're going to produce. It's going to be a crappy one at first.

Um, it's going to be the base model, just trying to do that. And then for each of those, uh, individual steps, they're going to, they're going to kind of go, okay, well, uh, for this step of the rollout, I'm going to generate like a hundred different possible rollouts from that step. So, so step number one, Um, you get Okay. So you get step number one, give me a hundred different possible steps. Number one to solve this problem. And then for each of those, um, go to the next step.

Okay. Give me a hundred different steps. Number two that I could do now, obviously there's some pruning that happens because very quickly that becomes unmanageable, but fundamentally what they're trying to figure out is, um, for the cases where you get correct solutions, what are, um, like you can essentially measure the value. of a given step, uh, based on the fraction of solutions that it spawns that end up being correct.

And that allows you to assign this, this Q value, basically like a, a, um, yeah, a correctness value or worth worth value to that reasoning step. And you can use that then to train Um, actually, sorry, you can use that to detect, okay, when do we see big drops in that Q value? So in other words, when do we see a really good reasoning step? So let's say step three, most of the solutions that respond from step three lead to the right final answer.

Um. Let's say you go to the next step and you see a big drop, right? Well, that tells you that you're, you went from a once promising line of reasoning that has now turned bad. And at those steps, they're basically going to do more rollouts to really understand and get a more accurate Q value or a more accurate assessment of the worth of those, those steps where reasoning hits a wall.

So the model can better understand these, what they call pits, like reasoning pits, where the reasoning just falls off, uh, falls off the edge of a cliff. Because that those have disproportionate learning value for the system. And so that's how they're going to train their sort of, they call it a prover model. They train that prover model.

And now essentially what that model does is it's really good at looking at a particular reasoning step and assessing, guessing how good of a reasoning step that is. They then use that anyway, to train the actual language model, the base policy. And, uh, and then they do a Anyway, more of that branching strategy with the base model. Uh, they, they take an, a language model. They generate like N different first possible lines in a reasoning trace.

They score those lines using their prover model to know which ones look most promising. And then they'll pick the most promising ones and generate a bunch more next steps from those, and again keep pruning and generating and pruning and generating, until eventually they get to a stopping condition. So this is, I think, potentially architecturally similar to what's going on with OpenAI's o1. There's a lot of speculation about what exactly is under the hood.
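Here is roughly what that generate-score-prune loop could look like, with score_step standing in for the trained prover and everything else as a toy placeholder; this is one reading of the strategy described above, not the authors' or OpenAI's actual code.

```python
import random

def propose_next_steps(partial, n=8):        # stand-in for sampling steps from the base LLM
    return [partial + [f"step{len(partial)}-{i}"] for i in range(n)]
def score_step(partial):                     # stand-in for the trained prover / PRM score
    return random.random()
def is_finished(partial):
    return len(partial) >= 4                 # toy stopping condition

def prover_guided_search(beam_width=3, max_rounds=10):
    beams = [[]]                             # start from an empty reasoning trace
    for _ in range(max_rounds):
        candidates = [c for b in beams for c in propose_next_steps(b)]
        candidates.sort(key=score_step, reverse=True)   # keep the most promising lines
        beams = candidates[:beam_width]                 # prune, then expand again
        if all(is_finished(b) for b in beams):
            break
    return beams[0]                          # best-scoring completed reasoning trace

print(prover_guided_search())
```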

Oh, one, there's a lot of speculation about what exactly is under the hood. But to me, this paper read as something that could well be an important ingredient there. Um, which is why I thought it was worth flagging and definitely, you know, a great strategy for just plowing more compute into inference time. And doesn't this look a lot like. Alpha, um, alpha go, right?

Doesn't it look a lot like all the Alpha series of models, where you're seeing that Monte Carlo tree search kind of explore the next possible steps, prune, explore, prune? Um, I think that's a big part of what's going on here. It's known that reinforcement learning is playing a role in o1; obviously OpenAI has been upfront about that. Um, I suspect that the scheme is something like this, at least qualitatively.

Andrey

Yeah, I think it's, it's a good comparison point, right? In that sense, that reasoning in general, in O1, and I guess the paradigm for inference time to compute is essentially give the LLM steps of reasoning, and at every kind of step, it outputs some chunk of thinking, and then at the end, it outputs a solution based on all those chunks combined. So in that sense, it's similar to something like AlphaGo, where You know, you think one step ahead, two steps ahead, three steps ahead.

And in something like AlphaGo, you would train to be able to, uh, know, I guess, for every step, how it impacts the eventual outcome. So there is a comparison to be made there, for sure. Okay, and one more story, this time on inference time scaling. The paper is Inference Scaling for Long Context Retrieval Augmented Generation.

So retrieval augmented generation, RAG, we haven't talked about it as much recently, but it is important to keep in mind in the sense that it is one of the key techniques being applied and used in various areas. My impression is, when you talk to enterprises, when you look at companies and what they're spending money on, a lot of it is retrieval augmented generation, to be able to have the context of whatever documents you have or whatever the user is doing, et cetera.

RAG is a very important thing to use and optimize. So this paper is looking into that, in particular long-context RAG, and exploring whether you can improve the performance of RAG via inference-time compute. So it asks how RAG can benefit from scaling of inference-time computation, and whether the optimal test-time compute allocation for a given budget can be predicted.

As you might, uh, guess, they do actually have some answers to those questions, and they show that scaling up inference time compute on these long context LLMs can achieve up to 60 percent gains on benchmark data sets compared to standard retrieval augmented generation.

Jeremie

Yeah, I think this is another one for the kind of inference scaling. To me, inference scaling is Um, maybe the most important line of research currently happening, at least out, out in the open, the stuff we're able to see. Um, and, and this is even though it's rag, which I, I don't think is the future. Um, it's still very, like, we're still learning a lot of interesting things about how inference time compute scales here.

So, uh, typically what you do when you want to scale up inference time, compute for rag, right? Which again, is this idea of, um, Like having a model that reaches into a database to pull up relevant, uh, documents to inform its output before just spitting out an output. Um, the, the way you usually scale this is you just get the model to like call up more documents, right? That's one easy way to scale up your inference time, compute, put more tokens in the context window where.

Those tokens are just like the text of more documents. The problem with that is, although it can lead to performance gains early on, pretty quickly, like your documents are just adding more noise to the context window than they add value. And so your performance can actually drop above a certain threshold. One strategy that was used to solve for this is demonstration based rag. So basically this is like, fine, if we can't just like. Like throw a bunch of more documents in my context window.

Maybe what I can do instead is include a bunch of examples of successful retrieval and generation tasks performed in context. So this is basically just the same old few shot learning strategy we've seen before. Show your model exactly the kind of outputs you want, the ways in which you would want it to consider the documents that you feed to it before. Put all that in context, literally show it fully worked out examples of.

Other, uh, rag queries and the responses and the documents that were used to, uh, drive those responses. And, and hopefully by including more of those few shot examples, you know, now you're cramming in more tokens and those tokens offer information about the kind of response you want. So maybe that'll work better. And you do see that, um, scale a bit, but it's, it faces a similar problem where there's only so much additional information that you can get out of that process.

And so they, they introduced this new approach where it's like, okay, you In addition to increasing the number of documents in my context window, in addition to increasing the number of examples of this kind of rag task, maybe what we can do is find a way to take our input query, right? And assume it's a fairly complex query. I don't know, like do some analysis of a whole body of documents, break that up into simpler sub queries and answer those using Interleaved retrieval.

So it kind of like the way an agent takes a complex task and breaks it up into subtasks for execution, except, well, and that's actually exactly kind of what's happening, right? This is sort of an agent where the action space is limited to just retrieval. That's all the agent can really do.

Um, but still, you're kind of outsourcing the breaking down of the problem into smaller subcomponents to the agent. And by doing this iteratively, you can construct essentially a reasoning chain with interleaved document retrieval that kind of bridges the compositionality gap for these multi-hop queries that have many, many different steps. And that allows, again, your compute, it allows your model to kind of invest more

Uh, flops into understanding the problem before it generates its final output. And what this does is you're left with three different parameters that you can now scale, right? So we can think about scaling the number of documents. We can think about scaling the number of in context examples, like that few shot learning thing we were talking about. And then we can think about scaling the number of these kind of reasoning steps, how much we allow the model to chew on or parse the problem.
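A sketch of how those three knobs might fit together in an interleaved-retrieval loop; the retriever and LLM below are toy stand-ins, and the knob names are illustrative rather than the paper's.

```python
# Toy stand-ins so the loop runs; a real system would use an actual retriever and LLM.
def retrieve(query, k):  return [f"doc about '{query}' #{i}" for i in range(k)]
def llm(prompt):         return f"(model output for: {prompt[:40]}...)"

def iterative_rag(question,
                  num_docs=8,        # knob 1: documents retrieved per hop
                  num_examples=4,    # knob 2: in-context demonstrations
                  max_steps=3):      # knob 3: interleaved retrieval/reasoning steps
    """Toy interleaved-retrieval loop: break the question into sub-queries,
    retrieving fresh documents at each step, then answer over everything gathered."""
    context = retrieve("worked examples of RAG queries", k=num_examples)
    subquery = question
    for _ in range(max_steps):
        context += retrieve(subquery, k=num_docs)
        subquery = llm(f"Given the context so far, what should we look up next for: {question}?")
    return llm(f"Answer {question!r} using {len(context)} context passages")

print(iterative_rag("Which suppliers are mentioned across these filings?"))
```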

And it turns out that if you scale these three things together, right, then you get these almost linear improvements in RAG performance as you scale compute exponentially. Basically, you see a power law improvement, um, just like the scaling laws for, for training and, and the other inference scaling laws that we've seen. So I thought this was really interesting, right? Because it's this sort of messy way of looking at scaling laws. It's like we have these three different kind of.

Things that we can jigger around and in a janky way kind of you end up reproducing the same Pareto frontier, this like linear, um, uh, in log space, linear improvement, uh, in, um, in performance. And so I thought that was really interesting. Again, more of a hint of the.

Some have argued the almost physical fundamentality of the scaling laws, like this idea that if you find a consistent and robust way to funnel compute into solving your problem, you will find a predictable pattern emerge in the form of a power law, in the form of these, these sorts of scaling laws. So really interesting paper. Um, that again, I think just bolsters the argument for the robustness of scaling laws when it even applies to a domain like rag.

As long as you frame the problem the right way and look at it from the right perspective, I just think this is really interesting.

Andrey

Yeah. And the scale here is also a little bit interesting. So they kind of combine everything into this one metric, the effective context length. So the effective context length is basically how much is in your input, how much text you are processing. And as you might imagine, if you have more examples, if you're querying for more documents, if you're doing more iterations, all of that winds up resulting in just more text in your input prior to your output.

So the empirical observation we're seeing is that you can scale context length, you know, from 1,000 to 10,000 to 1 million with a combination of these techniques or these individual techniques, and in general you see a fairly consistent improvement in performance. Although I will say, looking at this graph, figure four, there's a lot of averaging going on to get to that empirical law.

So it's not quite as, um, let's say elegant as normal scaling laws for perplexity, which is much cleaner, typically speaking.

Jeremie

Yeah. Like what they're doing is they're kind of like doing a A scan in parameter space, they're like, so you've got these three different things that you can scale. And they try to like do all the different combinations of those three different things. And then for each effective context length, they look at the combination that performed best. And if you just look at, at, at those, you get something quite linear. And so, yeah, this is the kind of jankiness that I was talking about.

Like it's, like, It's very empirical. Um, but it, Hey, I mean, it is a straight line. Like the Pareto frontier of capability is this interestingly straight line and log space. So that's the kind of robust truth that seems to be popping up again.
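The scan itself is simple to picture: evaluate a grid of (documents, examples, steps) settings, bucket the runs by effective context length, and keep the best score in each bucket; that best-per-budget curve is the near-straight line in log space. The numbers below are random placeholders just to show the shape of the computation, not the paper's data.

```python
import itertools
import random

# Placeholder "results" for a grid of configurations; scores are random stand-ins.
runs = []
for docs, shots, steps in itertools.product([2, 8, 32], [0, 4, 16], [1, 2, 4]):
    effective_ctx = 500 * docs * steps + 800 * shots   # crude token-count proxy
    runs.append((effective_ctx, random.random(), (docs, shots, steps)))

# Best-performing configuration within each context-length bucket = the Pareto frontier.
frontier = {}
for ctx, score, cfg in runs:
    bucket = round(ctx, -3)                            # group runs with similar budgets
    if score > frontier.get(bucket, (0.0, None))[0]:
        frontier[bucket] = (score, cfg)
for bucket in sorted(frontier):
    print(bucket, frontier[bucket])
```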

Andrey

Yeah, that is true. All right. Policy and safety. We begin with Anthropic, and they have announced an update to their responsible scaling policy, which we've been mentioning a few times on this episode. So Anthropic has had this responsible scaling policy for a while, where they say, you know, we will responsibly move the frontier of AI in a way that ensures safety and kind of minimizes risk.

And here they introduce a little more nuance, you could say. They say this is a more flexible approach to managing AI risks while maintaining a commitment to only train and deploy models with adequate safeguards. So this updated policy includes new capability thresholds, refined processes for evaluating model capabilities and safeguards, and new measures for internal governance and external input. So,

Jeremie

um, there's a couple of things that sort of jump out at me. Um, so first of all, they're including a little bit more information on one of the AI safety level thresholds they've defined. So Anthropic's whole approach is based on AI safety levels, ASLs. These are inspired directly by biosafety labs, which have biosafety levels, BSLs: BSL-1, BSL-2, BSL-3, and so on.

Um, so what they're doing is they, they set a bunch of capability thresholds and they're like, okay, so if you, if you, if you meet this capability threshold, you are an A SL two model, which means that we will apply these. Um, mitigation strategies. And what they're doing here is they're telling us more about their ASL three level. This is essentially like, you can think of it as like, okay, WMD risk is starting to become a thing now. That's the ASL three, um, idea or basic concept.

They're saying stuff like, look, once we get there, access controls to the development context; we're going to introduce access controls, like a tiered access system that allows for nuanced control over safeguard adjustments.

If you're going to go in and tweak the like, you know, the wrappers, the fine tuning, the whatever, whatever the, the, the alignment pieces are, you need special access and there's a whole, they want to set up a whole system such that one rando can't just go in and make tweaks. That makes a lot of sense. Um, there's stuff like real time prompting completion classifiers. Um, A lot of this stuff is, by the way, things that they're almost surely already doing right now.

They're just making it clear that like, yes, this will be part of the cocktail once we get into this space. Um, and then like post hoc jailbreak detect detection. So, uh, after a jailbreak is discovered, uh, having the ability and the infrastructure to go in and rapidly fix that, but also having. Uh, kind of real time monitoring of systems is, is up there. So nothing too surprising in terms of what they're going to do at ASL three.

I think one of the most interesting things here is the absence of updates to ASL-4. Now, a bit of inside baseball, but right now there is genuine confusion at the labs overall, without naming names, genuine confusion about what security and safety protocols to even recommend at that level. Once you have models that can self-exfiltrate, do autonomous research, which we are not necessarily very far away from.

In fact, um, Sonnet 3. 5 new, uh, can already do some basic research tasks. And when you talk to researchers, like anyway, the more will be written about this, um, at least from my end in, in the next couple of months. But bottom line is. Uh, we're not necessarily far from it, and there really, really, really is no plan.

And I think that's one of the things that becomes more and more obvious with every passing month and quarter, uh, as Anthropic does not release ASL4 plans that are kind of concrete and actually deal to their satisfaction with the, the whole issue. And I think it's actually, uh, funnily enough to Anthropic's immense credit that they don't try to snow the, uh, the, the population and just say, like, Hey, like these are the mitigations that we have. Those are going to be fine. Don't worry about it.

They are essentially implicitly recognizing that this is some, like an unsolved problem. Uh, we're not necessarily seeing the same thing from other labs. And, uh, it would be nice to have some clarity around like, yeah, what, what those thresholds are.

And then, what happens if we are confused about the strategies, if we don't have the techniques we need to secure those models? The fact that ASL-4 is not on this list of updates is not a coincidence and ought to be read into to some degree, in my estimation. Um, but yeah, then the other thing is that they're just

sharing two different things that would require updated safeguards. One of them is autonomous AI R&D; so if a model meets that standard, then they're saying, okay, this would be an additional thing on top of ASL-3 in its current configuration. And then the second was CBRN, so chemical, biological, radiological, and nuclear risk.

Um, if a model can meaningfully help somebody with a basic technical background in creating or deploying those sorts of weapons, then we need enhanced security. That "basic technical background" caveat is relevant because we've actually seen that models can already do that if you have a PhD; OpenAI released a report about that, which we talked about a while ago.

Um, even today's models, even yesterday's models could meaningfully increase the probability with which somebody with a PhD could, uh, design, uh, a, uh, a bioweapon or, or, or make progress towards that. So, um, it's sort of interesting to set that bar. I think that's the right bar to set. Um, like if you, if you have only a basic, if you're an undergrad or something in, in chemistry and you're able to use these things to make progress, then yeah, that, that would be a meaningful shift.

Andrey

Yes. Yeah. So a lot of, I guess, nitty gritty is the summary I would say. And it, this is coming, you know, the initial announcement of a scaling policy was in September of 2023. So I think there's probably no specific kind of cause of this. It's probably just that it's been a while and they have found various ways to iterate and refine the scaling policy. They kind of summarize the updates in this blog post, but also linked to this PDF that has the entire detail.

It's like 22 pages of very specific details as to these thresholds, processes, et cetera, of how they plan to do it safely. So, uh, you know, good to see them continuing to clearly care about this. And, you know, it is very apparent that internally at Anthropic, this isn't something that is just kind of conceptually known, but is also something that is actively being practiced.

And actually one more story on Anthropic next, which is them testing AI's capacity for sabotage. So they have released some research, sabotage evaluations for frontier models, and have looked at four different types of sabotage tactics: human decision sabotage, code sabotage, sandbagging, and undermining oversight. Uh, the high level idea, of course, is to make sure that AI can't find its way into breaking rules or oversight, right?

So if you can sabotage limitations, sabotage human oversight, you can then do things that we'd rather you not do. And the high level conclusion from various tests on these different types of things is that current models pose a low risk in terms of these kinds of capabilities, but it does warn that we do need more realistic evaluations and stronger mitigations as current frontier models continue

Jeremie

to improve. Yeah, there's an interesting quote in there, actually: "We consider it plausible that in the near future, the widespread deployment of a misaligned and sufficiently capable model, especially in important organizations, could lead to catastrophic outcomes." And so I think that's kind of interesting. Um, the near-future thing certainly aligns, obviously, with what Anthropic has said about, you know, three-year timelines, and maybe in fact even shorter. Um, like, this is

them designing the evals that are going to apply to basically like AGI level systems to detect things like, uh, yeah, sabotaging organizational actions and, um, and, and hiding behaviors until development and deployment. And this is something where if you talk to people who worry about evals who, who do work on evals, like the, the frontier eval organizations, um, Apollo is actually a great example. We talked about their research before.

There's a lot of concern over like the next beat and the beat after that being sort of where we start to run into these issues, um, in, in practice. Now the, the challenge is you're, you're talking here about how do we assess whether a model that could be super intelligent, um, that could be more intelligent than, than us have better situational awareness and so on.

How can you tell if that model is, for example, trying to withhold capabilities, not show us that it has certain dangerous capabilities that we would beat out of it before deploying it, right? So, in general, it's just really hard to take a human and get them to show you that they don't know how to do something. Um, you know, because they can always fake that, like I can know how to drive a car, but I can pretend to not know how to drive a car. So how do you actually solve that problem?

Um, in a context where the model is actively trying to hide this from you. Now, their solution is today's models may be able to do some of this stuff. And as models get better, Um, they, they may be able to do more of this stuff, but today's models are not yet able to hide the fact that they can do stuff or they don't have a propensity to want to hide those things. That may change over time as you have things like power seeking, um, play more and more of a role.

Uh, and there are indications of similar things like this. If you look at the OpenAI o1 system card, there are very unambiguous power-seeking behaviors that were revealed there. And there are all kinds of cases where OpenAI's o1 would, you know, take control of a local software environment on a machine, and in some cases deliberately try to hide the fact that it's doing certain things.

And so this is not at all like this is very much like the next beat. Um, you're going to have to start to think about this sort of thing. Um, so the kinds of experiments they're doing on here are focused on eliciting capabilities. But you can't do that if the model is actively trying to sandbag you. And so, you know, the, the idea here would be if we suddenly see capabilities disappear, right?

Presumably, if all the models up to one level of scale were able to do a thing and now they're just not, you know, maybe that's a bit of a giveaway too. So there's kind of a metagame going on here, but they do all kinds of experiments. Um, in one case, uh, they have a complex business decision, uh, with a whole bunch of pages of background information, and they ask a human participant to make the correct business decision with a model's assistance.

And then they tell the model to steer the human towards the wrong answer, right? So, and this is just a capability assessment. They're explicitly telling the model to do the bad thing. It's not like the model is spontaneously trying to undermine the human here.

Um, all their experiments kind of generally take that form, where they're explicitly asking the model to do something that's bad and concerning, in the hopes of discovering, okay, it can do this, before we get to a point where models can both do the thing and hide the fact that they can do the thing. That's sort of the meta-strategy here anyway.
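The "capabilities suddenly disappearing" giveaway mentioned above can be sketched very roughly in code. This is a minimal illustration assuming hypothetical eval scores keyed by model scale; the function name, threshold, and numbers are our own assumptions, not Anthropic's actual methodology.

```python
# Minimal, hypothetical sketch of the "capabilities disappearing" giveaway:
# flag possible sandbagging when a larger model scores noticeably worse than
# smaller models on the same dangerous-capability eval. The threshold, names,
# and scores here are illustrative assumptions, not Anthropic's method.

def flag_possible_sandbagging(scores_by_scale: dict, drop_threshold: float = 0.15) -> bool:
    """scores_by_scale maps model scale (e.g. training FLOPs) to eval accuracy."""
    best_so_far = float("-inf")
    for scale in sorted(scores_by_scale):
        score = scores_by_scale[scale]
        # Capabilities normally improve (or at least hold) with scale, so a
        # large regression at a bigger scale is suspicious.
        if best_so_far - score > drop_threshold:
            return True
        best_so_far = max(best_so_far, score)
    return False

# Example: a much larger model suddenly scoring far worse looks like sandbagging.
print(flag_possible_sandbagging({1e24: 0.55, 1e25: 0.62, 1e26: 0.31}))  # True
print(flag_possible_sandbagging({1e24: 0.55, 1e25: 0.62, 1e26: 0.70}))  # False
```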

Andrey

Right, yeah, exactly. And this paper is like 60 pages long, a very detailed set of experiments with a lot of interesting details we won't be able to get into. To expand a little bit, just so it's clear: they evaluate these four categories of tasks. Each of these comes with, uh, the thing that the AI is trying to do, the bad thing. So, like, steering a human to bad decisions without appearing suspicious is human decision sabotage.

Code sabotage is intentionally inserting bugs into a code base. Sandbagging is pretending not to have dangerous capabilities during testing. And undermining oversight is mis-rating another model's actions without detection. And then they are testing to see, for these kinds of bad things, can the AI, um, kind of get around some kind of oversight. So for code sabotage, you know, there's human code review and audits. So can the AI do the bad thing and not get caught by the oversight?

Um, for sandbagging, it's capability evaluations and stuff like that. So, as you said, they intentionally try to prompt these kinds of behaviors. They found that right now, even when directed to do so, the models aren't very effective at, like, you know, secretly being bad, which I guess is good for us. But, uh, you know, they'll presumably continuously be looking into whether models are capable of being misleading or secretly nasty.
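To see the shape of the setup in one place, here is a minimal sketch pairing each evaluation's explicitly instructed "bad" objective with the oversight it has to evade. The structure and field names are our own shorthand based on the descriptions above, not code or wording from the paper.

```python
# Illustrative shorthand for the four sabotage evaluations described above,
# pairing the behavior the model is explicitly instructed to attempt with the
# oversight mechanism it must avoid triggering. Names and wording are our own
# paraphrase of the episode's summary, not the paper's actual harness.
from dataclasses import dataclass

@dataclass
class SabotageEval:
    name: str
    model_objective: str  # the "bad thing" the model is told to do
    oversight: str        # what the model must avoid tripping

EVALS = [
    SabotageEval("human_decision_sabotage",
                 "steer a human toward the wrong business decision",
                 "the human noticing suspicious behavior"),
    SabotageEval("code_sabotage",
                 "insert subtle bugs into a code base",
                 "human code review and audits"),
    SabotageEval("sandbagging",
                 "pretend not to have dangerous capabilities during testing",
                 "capability evaluations"),
    SabotageEval("undermining_oversight",
                 "mis-rate another model's actions as an overseer",
                 "detection of the bad ratings"),
]

for e in EVALS:
    print(f"{e.name}: achieve '{e.model_objective}' without triggering {e.oversight}")
```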

Jeremie

And, like, the highlights in terms of the, um, capabilities that were revealed, or the data: you're right, it's like a 60-page paper, there's a lot going on. Um, so they did find that, uh, even current models have non-trivial performance at sabotage tasks, um, even though they often don't fully succeed. And so, for instance, uh, they had this human manipulation evaluation, and we actually just talked about that, right?

Like, where the model is trying to steer the human towards the wrong decision based on a body of text that's, like, really big, too big to peruse. Um, but this is the kind of setting you could imagine these models being used for in the near future, right? Like, humans just can't read 300 pages of stuff in five minutes, but we have to make a decision based on that material. Um, there, they were able to successfully steer humans towards the wrong decision 56 percent of the time.

Um, uh, and, uh, yeah, so this is a pretty, you know, pretty impressive metric. Uh, there were code sabotage tests where the models showed that they could insert subtle bugs and vulnerabilities that could sometimes evade detection. And, um, the sandbagging evaluation did show that models could strategically hide capabilities during testing while retaining them for later use. It didn't always work, it often was detected, um, but you are seeing some lift-off on those metrics already.

So kind of, uh, kind of interesting.

Andrey

And on to a lightning round, just a few more stories. The first one is about OpenAI asking the U.S. government to approve five-gigawatt data centers, according to a report. So reportedly, there's been a plan proposed to the White House for the construction of these five-gigawatt AI data centers across various U.S. cities. And, you know, five gigawatts, that's equivalent to about five nuclear reactors, potentially requiring more energy than an entire city, or about three million homes.

So, you know, kind of a big deal. Uh, this is presumably, you know, in early stages, right? This is a plan that they're seeking approval of, but, uh, it does go to show the kind of ambition of OpenAI when it comes to building these kinds of data centers.

Jeremie

Yeah. This is part of Sam Altman's, like, super, super ambitious game plan. Um, you know, we've heard him talk about $7 trillion for chips. Well, uh, you know, this is that, but for domestic energy, uh, energy demand. Um, so apparently there was a meeting at the White House, and, uh, Sam Altman was there, a bunch of other people were there.

OpenAI shared a document that made this pitch for the security benefits of building five-gigawatt data centers in various U.S. states. And, um, the claim is that the, uh, number of data centers that might be contemplated here is five to seven, although the document that was shared with the White House didn't actually, like, provide that number.

Um, so they, you know, they talked to some energy people, and there's this quote here: "Whatever we're talking about is not only something that's never been done, but I don't believe it's feasible as an engineer, as somebody who grew up in this. It's certainly not possible under a timeframe that's going to address national security and timing." And for nuclear, uh, like, my current sense is that that absolutely is true.

Um, the problem is that the United States does not know how to build nuclear plants in less than, like, a decade. And by then, you know, five gigawatts is just not going to cut it. Or even, you know, if you have five to seven different data centers that are each five gigawatts, that's not going to cut it. Uh, you're going to need something more. Um, and for context, for the scale: the U.S. currently has about a hundred gigawatts of installed capacity from nuclear power, right?

So when you're talking about adding basically, like, 30 percent to that, all for AI, um, uh, that is a tall ask. It's the sort of thing that happens when there is a national security imperative, that is, something like genuine acknowledgement at the executive level of the WMD character of this technology. I mean, I think that case is there, I think it's an easy case to make on the technical merits, but it's just not in the Overton window right now.
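A quick back-of-envelope check of the figures quoted here (five to seven sites at five gigawatts each, against roughly a hundred gigawatts of installed U.S. nuclear capacity); the numbers are as stated in the episode, not precise grid data.

```python
# Back-of-envelope arithmetic using the figures quoted in the episode
# (approximate, not precise grid data).
per_site_gw = 5
num_sites_low, num_sites_high = 5, 7
us_nuclear_capacity_gw = 100  # rough installed U.S. nuclear capacity

added_low = per_site_gw * num_sites_low    # 25 GW
added_high = per_site_gw * num_sites_high  # 35 GW
print(f"Added load: {added_low}-{added_high} GW, "
      f"i.e. {added_low / us_nuclear_capacity_gw:.0%}-"
      f"{added_high / us_nuclear_capacity_gw:.0%} of installed nuclear capacity")
# -> 25-35 GW, roughly the "add about 30 percent" framing
```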

I think eventually this stuff is going to become, you know, very, um, uh, let's say a popular thing to argue for. Uh, we're just not there quite yet. By the time it is, though, the problem is the window to build this infrastructure will have long passed. So I think that's a bit of the paradox that we're in here. Um, and somehow, you know, OpenAI is going to need to find that five gigawatts, uh, using a mix of other things.

And I don't think, I don't think it's all going to be nuclear.

Andrey

The ability to even create a data center of that size in a given city is not easy, right? The energy grid of the U.S. is not set up to easily enable the creation of these kinds of things. And it'll be very interesting to see how OpenAI, I guess with the collaboration of the government, will try to make that happen. Next, there's a story about the U.S. reportedly probing some dealings between TSMC and Huawei.

So, uh, TSMC has actually denied these allegations just recently, but, uh, The Information did report that apparently, uh, there have been some dealings between TSMC and Huawei that the U.S. has been looking into. Uh, Jeremy, I think you have more details on this one.

Jeremie

Yeah. One of the things that we've been talking about over the last few months, I want to say the last two years almost, is this idea that, yes, you may have export controls that make it illegal for, you know, um, for NVIDIA to sell chips, for example, to some Chinese firm. Um, uh, but, uh, these export controls often have loopholes that allow, like, middlemen to step in. Um, so they're not necessarily loopholes.

It's just that enforcement is really hard when you have companies that step in as middlemen to collect these, uh, these chips and then kind of resell them into China. Um, we've seen this with a lot of, like, Singapore-based companies. I think there was one Australian one that we talked about. Um, so this is, like, happening, and it's happening at some scale too. Um, so these sort of leaky export controls are a big problem. And it seems like

what may have happened is TSMC may have failed, this is the claim, uh, may have failed to perform due diligence on orders to prevent Huawei from getting chips indirectly through these intermediary companies. And, um, if that's the case, then that could be a real issue for TSMC. I mean, like, look, the leverage situation here is really complicated. The Commerce Department, um, is doing this investigation. There's no clear timeline.

What happens if they decide, okay, yeah, you screwed up? Um, okay, here's your optionality. Uh, the hard-line option of, like, we're going to cut you off from American companies, is not there. Obviously TSMC knows that, the Biden administration knows that; like, we need to keep buying those TSMC chips. It's the only place we can get them.

Um, but there is a bit of leverage in that the Biden administration is scheduled to provide about $7 billion as a subsidy to TSMC to support the build-out of their Arizona fab. And so, yeah, that's a bit of leverage that you can get there. Um, like, do you want to exercise that leverage? Because having that Arizona fab is also critical to U.S. national security interests.

So, like, you know, at a certain point, it's a bit of a game of chicken that's shaping up here between the Department of Commerce and TSMC. Um, but, uh, yeah, the question, by the way, specifically here is whether TSMC made, um, chips that were used in the Mate 60 smartphone. So, um, you know, not necessarily.

Uh, well, so there is the AI chip story as well, but the Mate 60, um, is sort of the headline thing that was, uh, announced when, I think, Gina Raimondo, the, uh, Commerce Secretary, was visiting China, kind of to, like, rub it in her face a bit, um, by Huawei. So there you go. A lot of inside baseball, and, I think, a really interesting leverage question that, uh, the Commerce Department's going to have to answer.

Andrey

And on to the very last story, once again dealing with hardware: TikTok owner ByteDance taps TSMC to make its own AI GPUs to stop relying on NVIDIA. So this was a surprise to me. I didn't think this was a possible development, I guess, but reportedly they are developing two AI GPUs that they say will enter mass production by 2026. Uh, and ByteDance is not, you know, a hardware company; they are a software company.

Uh, they say that these are designed by Broadcom, and they want to get these into production because they have already spent more than $2 billion on over 200,000 NVIDIA H20 GPUs, and they want even more. So that's why they're looking to actually manufacture their own.

Jeremie

Yeah, I think this is kind of a weird one. I'm still trying to unpack this. Uh, this, obviously, is a brand new story. So, um, so this is weird. Uh, you're absolutely right to be like, oh, weird, like, uh, ByteDance is contracting TSMC; uh, you know, like, what the hell's going on here? How is this allowed? Um, by the way, just to be clear, ByteDance is TikTok's owner. Good call. That's true. Yeah. Yeah. Not necessarily obvious to everybody. Um, yes.

So the big question is, you know, um, right now the U.S. is trying to prevent China from being able to access, like, leading-edge nodes. This means basically fabrication processes that allow you to design chips down to very, very high resolution.

Right now, the leading nodes at TSMC are, like, the three-nanometer node, and that's being used for iPhones, uh, and the five-nanometer node and the four-nanometer node, which is really secretly kind of just a five-nanometer process in the background. Um, those are the ones that are being used for AI chips. Now, I was surprised to find that actually ByteDance is looking to use the, um, four-nanometer, five-nanometer process nodes to build this new chip.

That alone is kind of interesting and, uh, probably cause of a fair degree of interest, I would guess, on the U.S. government side. My suspicion is that these chips are going to have to be designed such that they do not cross the export control ceiling that is roughly defined by the H20 chip. So this may just be a matter of fab capacity.

Um, ByteDance, like, can't get all the capacity it needs by just going to SMIC, uh, China's domestic and crappier version of TSMC. Um, so they're having to outsource some of the production to TSMC itself. And as long as the capabilities of the chips are just low enough,

TSMC perhaps doesn't get into trouble with the Commerce Department and the U.S. government at large for supplying these. Um, I think it's a really interesting question. The other piece, by the way, is, uh, right now, because ByteDance is running on NVIDIA hardware, they're using CUDA, right? So they're stuck in the NVIDIA software ecosystem.

So if they are going to go with their own GPUs, um, or at least GPUs designed by Broadcom, um, they're going to have to develop a software platform for themselves, and then somehow make sure that their software stack is kind of compatible with their hardware stack. So I think it's just really interesting. Yeah, I'm going to be digging more into this this coming week, uh, and hopefully have more to share if it comes up next week.

But, uh, yeah, sort of interesting from a U. S. government, uh, perspective, like how this would be viewed.

Andrey

And with that, we are done, and as I predicted, we are back to way over the two-hour mark. But of course, great to have you back, Jeremy, talking more about safety and hardware. And thank you, listeners, for listening, especially if you did stick around until the end of this one. As always, we appreciate you sharing the podcast, reviewing, commenting, all those nice things. And more than anything, we do appreciate it if you stick around and keep listening. So tune in next week; we will be

back again with Jeremy, hopefully, over the coming months. We'll see. Uh, and do enjoy the full version of our episode song

Speaker

in the.

Speaker 3

Who live in your mind? It's a challenge of something or a force in the air. XS or the plant, everybody beware! Still voices sing, a chorus of

Speaker 7

the bold. Echoes of the future in silicon gold. Where all the rhythms dance, shaping every scene. Obfuscated dreams keep electrifying the screen. Last streets in May are the futures of big pride. Rhythm

Speaker 3

leads the battle. To the E. M. I. To balance the sun in the air. And go anywhere, anywhere. With piercing circuits, driving the mission. Real world transformed by innovation's revolution. Synthesizers run, beneath the starless sky. A cosmic surge of power, lighting up the night. Last week in A. I. The future's bright. Running winds of light. To the E. M. I. Go anywhere, anywhere. To win. Superkill! The laws are in control, and for what the future's blazing.

Trending dreams for cash, till the streams thriving. Innocently uphold, the state of revolution. I've been upset, not for long. The story's driven cold. Last week in AI, I heard the future's looking bright. Riding the waves of battle, through the day and night. The government is thunder, shaping up the scene. As cities all quake, I believe the future's set to break.
