Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will be summarizing and discussing some of last week's, and maybe even two weeks' worth of, AI news. As always, you can also go to the episode description to get the links to all the stories and the timestamps so you can skip ahead if you want to. I am one of your regular co-hosts, Andrey Kurenkov. I studied AI in grad school and now work at a generative AI startup.
And I'm your other host, Jeremy Harris. I am the co-founder of Gladstone AI, an AI national security company, blah, blah, blah, blah. And yeah, welcome back. I mean, it's good to be back, it's good to be back in the seat after... God, we were talking about this earlier, but we had two weirdly simultaneous launches of things that happened within, I wanna say, a week, a week and a half of each other. And so Andrey was super busy the first week.
Then I was busy the next week, and it's just been... anyway, it's been a real fun time. Yeah, the funny bit is, we were also discussing how, because we do this podcast, we actually have to be on top of what's going on in AI, and not doing it was actually kind of strange. On the other hand, because it is Last Week in AI, we do try to do it once a week, and it is a bummer when we have to miss some.
So we are gonna try to be consistent, at least for the next few months until we have any more launches. But hopefully listeners understand. Unfortunately we do have day jobs and so on, which sometimes take priority, you know, it happens. But the good news is nothing huge happened in the past couple weeks. There have been some interesting things to discuss and we will get into some of those, covering some things that are a little bit older and some things that are brand new.
And that's kind of a preview of the episode. In tools and apps, we're gonna talk about some patterns we've seen with OpenAI being very, what people call, sycophantic lately, and the whole drama about that. Also some brand new news about Anthropic and MCP servers, which is pretty cool. Applications and business, as always, a few stories about chips and China and also some funding news for some startups. Projects and open source, a few new models, and actually some research as well.
Research and advancements, some pretty spicy results we are gonna get into about leaderboards, and more research really explaining what's going on with reasoning and RL. And then policy and safety, some things about malicious uses of AI and vulnerabilities, things like that. So it'll be a fun little episode. I think we are gonna enjoy discussing some of these things. And jumping straight into tools and apps, the first story is brand new.
It's about Anthropic letting users connect more apps to Claude. So this is basically allowing you to have direct integration with various services. There's a starting set of partnerships with things like Atlassian, Zapier, Cloudflare, Intercom, Square, PayPal, and others.
The idea is that when you type a query into Claude, it'll have a little popup that's basically like, do you give me permission to talk to this service, to Atlassian or Zapier or whatever, to do whatever you want to do, and it can directly do it for you. So instead of having an AI built into your, I dunno, Jira task tracker for work that is custom, Claude can now directly talk to that thing using, presumably, the Model Context Protocol,
a standard way to communicate with services that Anthropic released last year and that has kind of taken off. And it can directly talk to that and basically be your AI for your task tracking software, or it can be your AI to process news. It can basically now open up and be a chatbot that can do all sorts of stuff.
And you know, this is similar to letting your AI just do web surfing for you to do whatever it needs to fulfill your task, but I guess much more elegant and direct, where it can talk directly to the service, it can query it for you, without having to do the, I dunno, grunt work of pressing buttons and logging in and so on.
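To make the flow concrete, here is a minimal sketch of what a permission-gated tool call like this might look like under the hood. This is not Anthropic's actual implementation or the exact MCP wire format; the server name, tool name, and helper functions are all hypothetical, just to illustrate the shape of the idea: the model proposes a call, the user approves it, and the request gets forwarded to the connected service.

```python
# Hypothetical sketch of a permission-gated tool call (not the real MCP API).

def ask_user_permission(server: str, tool: str, args: dict) -> bool:
    # In Claude's UI this is the little popup; here we just prompt on stdin.
    answer = input(f"Allow call to {server}.{tool} with {args}? [y/N] ")
    return answer.strip().lower() == "y"

def call_remote_tool(server: str, tool: str, args: dict) -> dict:
    # Stand-in for forwarding the request to the connected integration
    # (e.g. a Jira or Zapier MCP server; real servers speak JSON-RPC).
    return {"status": "ok", "server": server, "tool": tool, "echo": args}

def handle_model_tool_request(request: dict) -> dict:
    # The model proposes a tool call; nothing runs until the user approves.
    if not ask_user_permission(request["server"], request["tool"], request["args"]):
        return {"status": "denied"}
    return call_remote_tool(request["server"], request["tool"], request["args"])

if __name__ == "__main__":
    proposed = {"server": "jira", "tool": "list_my_tickets", "args": {"assignee": "me"}}
    print(handle_model_tool_request(proposed))
```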
So I think pretty exciting in terms of a release for Claude that really makes it much more broadly useful, and kind of impressive to see them taking the lead in this particular way of using chatbots. Yeah, it definitely seems like Anthropic is building on the early advantage they had with the MCP protocol, which OpenAI obviously has since taken on board, and other companies too. So it is becoming the de facto standard, and it positions Anthropic really well in the space.
It's also consistent with this vision, right, that we've heard many times, but kind of most famously articulated in that Leopold Aschenbrenner Situational Awareness thing, about the drop-in remote worker, right? This is really a step in that direction. You've got a model now able to just call these tools directly. It's being productized, it is being rolled out, this version at least, to Claude Max subscribers and Enterprise plan subscribers, and soon to Pro.
So again, this is Anthropic kind of finding the sweet spot of what they're going to charge for at the higher tier subscriptions. That's been a question recently too, right? When they introduced Claude Max they said they would give people who sign up for that tier early access to new capabilities. This is apparently one of those capabilities they flagged for that. So they're starting to kind of flex that muscle a bit too.
But yeah, this is on the path to fully replacing certain kinds of... well, it depends on the way you wire things up, but certain kinds of engineers, certain kinds of... again, if you're doing some kind of sales backend work or whatever, there's a lot of stuff that could be straight up automated down the road if they keep pushing in this direction. So kind of interesting, and we'll see what the impact is too on the job market.
I mean, there are some indications that this stuff is really starting to rattle especially junior or entry level roles. But yeah, it's definitely a big cost savings if you're able to, you know, get these sorts of agents to do your work for you. Exactly. I know personally, you know, as someone who does programming, so far you've had to sort of wire up things yourself. Like, let's say you want to write a script to process a spreadsheet to do some work for you.
Typically that's involved writing a script to really do it efficiently, you know, to not have to download it, attach it, write the prompt. Now it makes it much easier to automate things via prompt, because you don't need to do any sort of manual steps, where it can directly talk to whatever data source it needs to do the task. So a simple example, again just to make this clear, is they show you being able to ask what's on my calendar, and then Claude can directly talk to your calendar.
You have to press a little button to allow it to get the data, and then it can answer your questions about that. So really, I do think this is a pretty significant step in terms of expanding the capabilities of LLMs and this kind of service to do all sorts of stuff for you in a way that you could not have done before.
Worth noting also, as far as new features go, they did launch their own research tool, because apparently every single provider of LLMs needs one, and they are launching an advanced research tool, which is their fancier one. It can take five to 45 minutes to compile comprehensive reports for you. So also interesting to me that it turned out, for AI and for these reasoning models, that deep research has become one of the, I dunno, power use cases.
And next up we are gonna talk about OpenAI, and they've had something pretty embarrassing, I will say, in the last couple weeks. So if you're on Twitter, or if you even use ChatGPT, there's been a lot of discussion of a recent update of GPT-4o, where they have made it, let's say, very enthusiastic and positive when communicating with people. I didn't know this word, actually, but glazing apparently is what people describe it as. Yeah, yeah.
Where, yeah, basically you enter a basic query or something like that, and the model just cheers you on to no end. It's sort of crazy, you know, telling you, oh, this is such a deep insight, this is such a good idea, et cetera, et cetera. And it was so bad, and there have been such bad examples, that OpenAI seemingly really rushed to fix it.
Sam Altman actually announced on X that they are working on some fixes ASAP to address the personality issues from the last couple of GPT-4o updates. They rolled out an update to the system prompt that some people have talked about. They've also seemingly done a full rollback of GPT-4o to a previous state. So I would say, you know, there are questions as to how this happened. It's potentially the case that they tried to make it overly optimized for engagement or for positive feedback from users.
But when you look at some of these responses, it's clear that something went wrong here, and it's something we haven't seen from one of the major players yet in this way. It's also hard not to notice that this is happening just weeks after OpenAI announced that they're no longer going to be focusing on persuasion capabilities as part of their preparedness framework in the same way as they had.
So when you think about persuasion capabilities, certainly sycophancy in these models is something that you might correlate with persuasion, right? Telling people, oh, you know, you're so smart, what a great idea, what a great question. And I haven't seen clear indications that they had optimized directly for rewards. I've seen some posts on X of people saying, hey, here's an option that showed up, you know, do you like this personality or not?
Like a thumbs up, thumbs down type thing, which, to be clear, I think is a gigantic mistake, a really, really dangerous precedent for OpenAI to be setting. Frankly, I mean, we've seen OpenAI do related things, be willing to kind of push the envelope on some stuff.
You could often argue for it or whatever, but when it comes to optimizing, when you close that feedback loop between yourself and, if you will, the person that you're talking to, to make yourself more agreeable or more likable to them, I think that is pretty clearly a very, very dangerous thing to be doing when you have as much compute as they do, when we already have evals that are showing these models are really effective at persuasion and manipulation.
That's the sort of thing you start to think about at the next beat of scale, at the next beat of subtlety in persuasion and manipulation, which these models seem to be on track for. So anyway, I think this is definitely a space to watch. There's not necessarily going to be smoke the next time there's fire, and that's something that I think people really need to understand.
These models getting good at persuasion means, almost by definition, that the other person doesn't realize that's what's going on.
So as you keep pushing in that direction, as you use more and more subtle cues, feedback cues from users, I think a lot of people have very justified concerns that we're heading in a direction where, you know, there's a certain amount of asymmetry between the user and the company here, where the company is able to think on computer clock time about how to optimize their relationship with the user. That's not necessarily healthy, especially aggregated over the entire population.
You know, hundreds of millions of people interacting with this stuff. Right. And just to get into some basic examples, a lot of this was kind of funny, and people started posting examples where they directly got the LLM to be as silly as possible. So one example, just pulling off of Twitter: someone says, I just woke up, did two pushups, and will brush my teeth in the next six hours. ChatGPT said, you have achieved a level of mastery few dare to even imagine,
the delicate art of strategic patience. Waking up is itself a small revolution; to do two pushups immediately afterward is a declaration of war against inertia. I will say perhaps this example is, you know, tweaked, I'm just pulling it off the internet, but that shows you the flavor of what you're seeing. It's a model being very much a suck-up, saying extremely positive things that are not natural.
And actually, I just searched, and OpenAI just posted a blog post today, as we are recording, titled Expanding on What We Missed with Sycophancy. And they go into how, on April 25, they pushed an update; the update had a few things, each thing individually didn't look so bad, their metrics were good, et cetera, et cetera. They talk about what they will improve in their process, what they are learning. So a pretty embarrassing kind of situation here, right?
The fact that they need to address it so strongly. Some people also compared it, I remember, to the Gemini launch from Google, where there were very silly things going on with the image generator. Yeah. I think OpenAI for the first time has really fallen on its face with this launch. And as you said, there are some real dangers to doing this kind of thing.
Another thing that people pointed out is that some people are getting very close to these ChatGPT models, people who are perhaps possibly delusional or in a bad mental health situation. You know, talking to the chatbots can seriously affect them. And so you need to be careful with how positive, how affirming chatbots can be and how much they reinforce whatever you're telling them.
That has real implications, even aside from, let's say, theoreticals of persuasion or things like that. So yeah, a lot of discussion I think will be coming from this event, and some studies and so on to really get into how you can tip models into being a little bit extreme. And otherwise quite an interesting phenomenon. A few more stories. Next up, we have a new model launch from Baidu. They are announcing Ernie X1 Turbo and Ernie 4.5 Turbo.
4.5 Turbo, as you might imagine, is the fast kind of model; they are saying it has an 80% price reduction compared to its predecessor. Ernie X1 Turbo is the deep reasoning model. They're saying it's better than DeepSeek R1 and o1 on things like, you know, deep chain of thought, things like that.
So Baidu, as one of the leading creators of LLMs in China, is, you know, really, I don't know if it's fair to say catching up, but keeping up with what's going on with Anthropic and OpenAI. You know, increasingly you have small, cheap, fast models like Gemini 2.5 Flash or, let's say, o3-mini, and you have these quite big, quite expensive models like o3, like Claude Opus, Gemini 2.5 Pro, which are more and more very capable. And that seems to be the case with these two models.
Yeah, I mean, don't count out China. And I think there are reasons, and I'm not sure if we're gonna talk about them today explicitly, I'm trying to remember, but there are reasons to expect this to continue at least into next year, by which time the chip export control stuff is gonna have more of an effect. But for right now, expect China frankly to do damn well, and quite possibly catch up fully to the frontier of Western AI.
I mean, that's a concerning thing to be saying, but that is the trend. I think until, yeah, until we get the next generation of data centers online, we're not gonna see that significant a gap between those two groups. Yeah, the benchmarks look really solid here. I mean, you know, they look at many multimodal benchmarks for 4.5 Turbo, and certainly that's well in advance of GPT-4o and competitive with GPT-4.1.
In fact, beating it on many multimodal benchmarks, and that is a pretty noteworthy thing. And competitive pricing as well. I mean, you mentioned, you know, Ernie X1 Turbo is something like, was it 25% I think they said, of R1 in pricing. So that's pretty damn good. Also, I mean, again, R1 is an oldish model. It's an oldish model. It's been around for literally weeks, guys, it's been around for weeks.
It was the start of the year, you know, that's when all this reasoning stuff kicked off, feels like forever ago. A hundred percent. But because of that, there is so much low hanging fruit right now in the inference stack that, yeah, you can learn a ton of lessons from looking at R1. A lot of these models, by the way, distill off of R1, and you can kind of tell; their thought traces end up coming out with similarities that look suspiciously close.
I don't know if that's the case for Ernie 4.5, I haven't actually checked that one, but we'll talk about a model a little bit later, a Chinese model actually, that sort of has that characteristic. So there are a lot of ways in which you can build off of R1, both by distilling data directly from it, but also just by learning lessons, infrastructure lessons and architectural lessons, from it that allow you to drive down that pricing a lot.
And anytime there's a new paradigm that gets discovered or invented, you have a rapid improvement in a lot of the top line metrics, just as people find all that sweet low hanging fruit associated with the new paradigm. So that's the phase that we're in right now. Expect these prices to kind of collapse faster than the traditional pre-training base model pricing
currently does. You know, think back to how quickly GPT-3's pricing dropped, for example, or ChatGPT's pricing dropped in the early days. That's what we're seeing right now as well. And those other prices continue to drop, by the way, even for base models, but we're just in this unusual, very rapid acceleration, in that phase where we're getting efficiency gains that are really, really rapid.
Yeah, I remember when model pricing used to be per thousand tokens, and then at some point they switched over to per million tokens. That's a good point, right? Yeah, it's funny, I don't think I ever consciously registered that. I was just like, yeah, of course, you know, of course we're bumping it up by three orders of magnitude. And next, moving away from LLMs for a bit towards image models, the next story is about Adobe adding more image generators to their services.
So they're launching Firefly Image Model 4 and Firefly Image Model 4 Ultra, with some other updates. Image Model 4 is meant to be faster and more efficient, and offers up to 2K resolution images. Firefly Image Model 4 Ultra is focused on rendering complex scenes with more detail and realism.
These are now available in the Firefly web app, which also has their text-to-video and text-to-vector stuff, and they're introducing this new thing called Firefly Boards, a collaborative generative AI mood boarding app in public beta, so that's kind of cute. Last up, they're also now adding support for third-party AI models, like the GPT image model, Google's Imagen 3, Google's Veo 2 for video, and other third-party things as well, which I think is kind of notable.
If you are thinking that, you know, this can be the service to use for image generation, for experimentation, having third-party support is not a trivial detail. They actually emphasize that these third-party models are for experimentation, and mark their own models as, quote, commercially safe, which is, yeah, highlighting what they are arguing is the reason to stick to the Firefly models.
The fact that they've trained it on non-copyrighted data means you're not gonna get into any sort of trouble using Adobe's models. Yeah, first of all, I mean, it makes all the sense in the world, right? In a world where all these models are becoming commoditized, this is really the ultimate expression of the commoditization of these image generation models, right? You literally are a click away from using the alternative. So it's great for the customer.
It also makes it so that the actual value in the value chain plausibly is no longer gonna be concentrated with the model developers, at least for text-to-image or things like this. Instead, it'll shift somewhere else. Obviously the hardware stack, I mean, we've talked a lot about that, especially in the last kind of two years, that that's where, you know, the Nvidias of the world,
maybe the AMDs, the ASMLs, the TSMCs, are kind of where a lot of the value in the value chain ends up being captured. But there's also the aggregation point, right? So Adobe making a play here to become an aggregator of sorts for these models, definitely a good play.
Also with them leading the way on the whole idea of, you know, indemnifying users if it turns out that there's a copyright violation, or a claimed, alleged copyright violation, from the image generation process, while not necessarily being able to guarantee the same thing for the other models they host on their platform. Which is where their sort of flag comes from, like, hey, you know, our thing is business safe, the others are for experimentation.
That's kind of where that's coming from, a sort of nice way to encourage people to use theirs. Now, I think a lot of these companies have similar sorts of indemnification guarantees, so it's not actually clear to me that there is a material difference in all cases relative to the promises that Adobe is making. But I'm not sure, having not gone through the specific list of all these models; there may well be some that don't offer indemnification.
So still interesting, Adobe making a good play, and these models look really good. Like, they have some examples, and, you know, I keep saying this, every time there's a new image generation model, I'm at the point where I can't tell the difference between subsequent releases. Maybe it's just the prompts that they picked here, but they do seem very photorealistic and compelling. So anyway, seems overall like an interesting move.
Very strategic shift for Adobe, for sure, and one of the few things that I think they could do to make sure that they're still relevant in the long run if they don't have access to the kind of compute that their competitors do. Yeah, and I think the fact that they're investing a lot in this Firefly web app is interesting, in the sense that they do have an advantage in this competition.
Similar to Google in a way, in that, you know, if you're already paying for Google Workspace, you're maybe gonna use Gemini. If you're paying for Microsoft 365, you're maybe gonna use Copilot. If you're paying for Adobe tools, and they do bundle their tools in a subscription, you know, for Photoshop or photo editing or whatever, they can bundle in the AI and then push you towards using Firefly and not one of the many other services you can use to generate images.
So I could see Adobe really making it out just by being the default for a lot of this kind of professional work. And speaking of image generation, the next story is that OpenAI has made their upgraded image generator available to developers. So we saw in late March the launch of what I think they call GPT Image 1, the ChatGPT image generation model, and for a while you could only use it via the web interface. Now you can use it via the API.
And this is quite notable because this model does have some very real advantages over previous models. It's much better at editing images given an image and a description. It is very good at clean edits that previously would have been very hard. These images are watermarked with metadata, so you can kind of track their being AI generated, things like that.
So I think currently few other services provide this level of image editing, and I would be curious to see, I guess, what impact this has. Pricing is also non-trivial: it's approximately 2 cents for a low quality image and approximately 19 cents for a high quality square image. So, you know, if you think about that, that's about a buck every five images; it's not nothing. But anyway, obviously that'll collapse in price pretty soon too. But yeah.
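Just to put the arithmetic in one place, a quick back-of-the-envelope sketch using the rough per-image figures quoted here (the actual API pricing is token-based and varies by size and quality, so treat these as approximations):

```python
# Back-of-the-envelope cost estimate using the rough per-image prices
# mentioned above (approximate; real API pricing is token-based).
PRICE_LOW_QUALITY = 0.02    # ~2 cents per low quality image
PRICE_HIGH_QUALITY = 0.19   # ~19 cents per high quality square image

def batch_cost(num_images: int, price_per_image: float) -> float:
    return num_images * price_per_image

print(batch_cost(5, PRICE_HIGH_QUALITY))     # ~0.95 -> roughly a buck per 5 images
print(batch_cost(1000, PRICE_HIGH_QUALITY))  # ~190.0 -> high quality adds up fast
print(batch_cost(1000, PRICE_LOW_QUALITY))   # ~20.0
```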
Kind of a cool, consistent shift to... oh man, I'm trying to remember who it was. I think it was Steve Ballmer, right? With that famous moment up on stage at Microsoft, clapping his hands, going developers, developers, developers. Well, this is that, right? Everybody's kind of moving in that direction. It's increasingly a matter of, and this is like OpenAI's original play back when GPT-3 I think came out.
They were very much in that mode of saying, look, we're just gonna put everything in developers' hands, see what they build with our stuff, rather than, the implied claim was, rather than necessarily doing the Amazon thing where we actually start to notice which products are doing really well, and then we offer the Amazon Basics version of that product, and eventually that's bad for the merchants who use the platform. OpenAI has done some of that, there's no question.
I mean, that's part of what it means to be in the image generation business, but more APIs, right? Like, that's a very OpenAI thing, and it's very much an industry thing now, right? That's where everything's going. And lastly for this section, a story dealing with xAI and being able to see things, as opposed to making images. They have launched Grok Vision in their iOS app.
So as we've seen demoed many times, you can point it at something and ask it questions about whatever you're pointing it at. They're also launching some other things, like multilingual conversations and real-time search in voice mode; those are available to Android users on the $30 per month SuperGrok plan.
So still, yeah, xAI is rapidly in catch-up mode with, in this case, I guess, the advanced voice mode from ChatGPT, where you're able to ask questions about, I dunno, equations and stuff like that, as OpenAI demoed last year. Yeah. I continue to be impressed at how fast Grok is getting stood up. I mean, just the sheer number of... like, they're not supposed to be a massive contender. They've been around for all of, what, two years, 18 months? And yeah.
Already pumping out reasoning models, multimodal models and all that. So yeah, they're definitely taking advantage now, increasingly too, of their partnership with X, or their integration with X. So we'll, I guess, see that reflected more and more too.
Yeah, they're very rapidly rolling out, I guess, what seems to be more and more of a basic set of features for the chatbots, things like canvas, search, memory, you name it, whatever, you know, ChatGPT or Claude have introduced over the last couple years, Grok is rapidly adding as well. And onto applications and business. First up, we're gonna talk about the startup from Mira Murati, the former CTO of OpenAI, who left after the high-profile disagreements around Sam Altman being ousted in late 2023.
Mira Murati left, I believe, in 2024, maybe around mid-2024. We've known she's been working on a startup called Thinking Machines Lab for a while, and now we are getting some news about their fundraising. Apparently they're raising 2 billion at a $10 billion valuation. And the interesting thing that has come out of this is that Mira Murati will have an unusual amount of control in this startup.
So basically what it sounds like is she will always have a majority on any major decision in, let's say, the board, for instance. So even if she ends up with a hostile board, for instance, and they all disagree with her, my understanding is she'll be able to override them and have ultimate decision making capability as the CEO, which is unusual. Usually, you know, the CEO has a lot of power, but not necessarily a codified majority of decision making power from the outset.
So yeah, it has been kind of a slow rollout for Thinking Machines Lab. It's been a bit quiet as to what they're doing, but they have been recruiting and, seemingly, I guess, getting investors on board. Yeah, I mean, their roster is absolutely stacked. You know, Alec Radford famously will be doing at least some advising with them.
A whole bunch of the kind of post-training guys from OpenAI, as well as John Schulman, formerly from OpenAI, then formerly from Anthropic, one of the co-founders of OpenAI in fact, jumping ship and then going to Thinking Machines. Something interesting is happening there. I mean, there's no question that that level of talent flocking to that company is very interesting. Also interesting to see this sort of consolidation of power. This is something that all these rockstar employees
are actually perfectly happy with, right? So there is this super voting majority that Mira has. Apparently the way it's set up is her vote on the board has the equivalent force of the votes of all other board members plus one. So functionally, there isn't a board, there isn't board oversight; that's what that means. The function of the board, by the way, is basically to hire and fire the CEO, right? To hold the CEO accountable. That's the whole idea behind a board.
So the fact that that's not here is very interesting. It means she's got an awful lot of leverage. So she's raised, ostensibly, about $2 billion at a $10 billion valuation. Andreessen Horowitz is in on those rounds, and they're, like, famously very founder friendly, allowing her to do this. That's also true, by the way, at the level of the shares.
So just to give you, if you're not tracking the whole corporate structure setup: typically you have a board that can hire and fire the CEO, and then you have the shareholders of the company who can sort of swap board members around. That's usually how things work. And even at the level of the shareholders, Mira also has, or enjoys, a lot of control, a very unusual amount of control. The startup's founding team,
so some of these elite researchers who've come over from OpenAI, from Anthropic and elsewhere, apparently have super voting shares that carry a hundred times as many votes as normal shares. And they've agreed to let Mira vote for them by proxy. So that's a lot of power that she's got, you know, on the shareholder side, on the board side, and as CEO as well. Everything I've heard about Mira does seem to be quite positive, interestingly.
So some of the former OpenAI employees who've been through the whole board coup fiasco thing had pretty damn positive things to say about her. I thought that was kind of interesting. I've never met her myself, but it was in the context of what happened with Sam. She was sort of left in the lurch, you know, back then, when the board refused to tell her that the reason they had fired Sam was the evidence that she herself had provided; that's now public, that that was the case.
But without telling her that, she was kind of left in the lurch. So anyway, she's definitely experienced at navigating a lot of board drama, and maybe that's what's reflected here in this move. But it is highly unusual, and again, this would only happen if she had an extreme amount of leverage over the investors who are coming in. That doesn't mean, by the way, that it doesn't get refactored at the next fundraising round.
You could easily have investors who come in and say, look, I'll give you the 20 billion you're asking for, but you're gonna have to do something about this board setup, we want some measure of real and effective control. And so, you know, all these things are to some degree temporary. But for right now, with the 2 billion that they're apparently raising, this is gonna be the lay of the land for a little while. Next up, some chip talk, and we've got a couple stories about Huawei.
So one story is discussing the Huawei 910C. Basically, we've already discussed this chip, I believe; it's a combination of two 910B chips that, combined, are about as good as the H100, not Nvidia's current top of the line, but what used to be top of the line a couple years back. And the story here is just saying that they are getting close to starting mass shipments, potentially as soon as next month.
Another story is also saying that they are working on a new chip called the Ascend 910D. It is in the early stages of development, it'll require testing, and this will be the chip that is gonna be more powerful than the H100, potentially, you know, the default if export controls get tighter on Nvidia, as is very possible at this point. There's a lot to be said here. I think the top line needs to be a recognition that US export controls actually have been working.
They just take a long time because of the supply chain dynamics. China has enjoyed the ability to basically black-market import a whole bunch of chips, H20s, H800s, H100s, that they shouldn't have been able to import. That's what's reflected unambiguously in some of the latest big runs that we've seen, the sort of post-DeepSeek era stuff. So I think that's really important. China will be trying to convince us that the export controls are not working.
We know they are, because we've heard it from literally, like, the founders of DeepSeek back in the day, before the CCP was watching their every move. Now their tone has changed, but the fact remains. Anyway, so this chip, the 910D, this kind of next generation, will be slower than the B-series, the Blackwell series, of Nvidia chips. There are reasons though to suspect that that may not be the deciding factor.
So what China's really good at is taking relatively shitty GPUs and finding ways to network them together to make systems that are just really, really powerful, even if the individual chips within them are kind of crappy. The trade-off they end up making is, because they can't use the exquisite three and five and four nanometer nodes at TSMC to fab these things to crazy high precision, because they can't use that,
they can't have chips that are as performant on a per watt basis. So they have chips that are significantly less energy efficient, but that matters less, because in China, energy is much less of a bottleneck. They're putting up nuclear power; like, in the last 10 years they have added an entire America's worth of power, like the whole US electric power output, they have added that in the last decade in the form of nuclear and other things.
They can actually bring nuclear plants online really quickly, 'cause they didn't go through this weird phase where, you know, America had an allergy to nuclear. And so now they're in this position where, yeah, the US has export controls on these high-end chips, on anything from TSMC above a certain node, but the reality is China doesn't care as much, because they have so much domestic power available. So they'll use chips that are less performant on a per watt basis.
And you know, what's the difference? We've got 10 gigawatts of spare power around Three Gorges Dam, let's just throw it at this, right? So that's kind of what we're seeing. The design calculus, if you're Huawei, just looks different. It looks more like, let's crank out as many flops as we can without worrying quite so much about the power consumption, and let's make it up in networking.
Let's make it up in the backend, in the scale-up, in the fabric that connects all these different GPUs together at the rack level and beyond. And that's really what we're seeing here. And so it's this weird combination of: they are getting some of the high-end chips because we've done a shit job on our export controls, which we need to improve,
but then also, they can be a bit sloppier at the chip level as long as they are exquisitely good at the scale-up, kind of network level, which is what they did, in particular, with the CloudMatrix 384 system that I think we talked about maybe a couple weeks back. But this is like the ultimate expression of how you wire up a bunch of these 910C processors to beat systems like Nvidia's GB200 NVL72, which is like the top tier right now.
Just think of it as brute force, right? Like, we're just gonna hook more of these things together, and who cares about performance per watt, because we can afford it. Yep. And this is following up on the US introducing, in early April, new export controls that seemed to limit the export of the H20, the GPU that was specifically designed for selling to China based around previous export controls.
And Huawei also announced the Ascend 920, in addition to the 910C and 910D, which is more comparable to the H20. And the reactions to the announcements of the 910C were very dramatic: Nvidia shares dropped 5.5%, AMD fell more than 3%, Broadcom fell 4%. So this is a big deal for Nvidia, for the GPU space in general. Yeah, the Nvidia thing is interesting, right? 'Cause you might nominally think, well, Nvidia's revenue, 16% of it is currently from China, it's a bit less now.
So it's, you know, not such a big deal, you'd expect them to sort of grow out of that. But the argument Nvidia is making, and in particular making to the White House, is: you are giving China the opportunity to refine its chips, to increase domestic demand for Chinese GPUs, obviously, because we're preventing them from importing our own.
And ultimately that may lead to Chinese GPUs competing successfully with Nvidia on the global market, which would then wrestle market share away from Nvidia there too. So that's part of what the market seems to be pricing in here, though for various reasons I think that is very overblown. Nvidia's own earnings calls suggest that they don't think it's quite such an issue, at least historically. And so there's that interesting dynamic too.
And speaking of the Chinese market and export restrictions, we also have a story of ByteDance, Alibaba and Tencent stockpiling billions worth of Nvidia chips. This is sort of an overview article saying that these leading internet companies accumulated billions worth of H20 chips prior to the cutoff of shipments of these things in April. I think we covered another story to that effect, you know, pretty much another, I guess, outcome related to export controls.
I mean, look, this is like Logic 101. You telegraph to your adversary that you're gonna bring in export controls on a certain product that they need desperately for a critical supply chain, and your adversary obviously is gonna go, okay, I'm gonna start stockpiling this, I'm gonna start getting as much of this shit into my borders as I possibly can before the export controls hit. You know, we've seen this with multiple rollouts. We saw this with the A100.
We saw this with the H800. We've seen this with the H20. We've seen it with high bandwidth memory, like, over and over and over and over again. We have to learn this stupid lesson that we never should have had to learn in the first place: that when you fucking tell your adversary you're going to close a door, they're going to try to get as much shit through that door as they can. So, like, generally, if you're gonna do export controls, do 'em hard, do 'em fast, do 'em without warning.
One of the perverse incentives this creates, by the way, is that Nvidia, if they know that the door is gonna close on the Chinese market when it comes to H20s, well, they have an incentive to prioritize shipping those GPUs to the Chinese market over American companies, because they know the American companies are always gonna be there. The Chinese ones won't be, at least for this class of product.
And so, yeah, you're literally causing one of your biggest companies to essentially turn into a proxy arm of your adversary for the purpose of kind of getting stuff out the door before the gate closes. I've got a lot of issues with export controls and the way they've been managed historically. This is something that, fortunately, I think there's a lot of investment the government's about to make in the BIS, this is the bureau at the Department of Commerce that does export control stuff.
They need a lot more teeth and a lot more staffing to be able to do this. They've been ahead of the curve in many ways, but without the resources to actually do stuff on a fast enough cadence. So anyway, this is like $12 billion in rush orders, by the way, $12 billion in rush orders, around a million H20s, that is like a full year's supply, that they tried to get in by the end of May. The actual number that was delivered, by the way, did fall short,
because the administration announced in early April that the chips would need a license for export. That was not expected; they were sort of flip-flopping back and forth. But to give you an idea of how profoundly unsurprised the Chinese ecosystem was here, this is a quote from an executive with a supplier to ByteDance and Alibaba who was involved in a lot of this shipping. He said the Chinese clients are very calm, they knew it was coming, and they have been prepared for this day.
They told us that their aggressive goal to build more data centers this year remains unchanged. So their entire plan for the year is unaffected. Like, they're moving along like it's business as usual after we've just supposedly closed down hard on these export controls. So this is the kind of thinking-one-step-ahead logic that we really need to get better at. This is, unfortunately, in large part a function of, you know, BIS being historically just understaffed.
And again, hopefully something that's gonna change soon. But yeah, a big issue for US national security. And one more story in the section dealing with GPUs and hardware: there is speculation, and rumors, and some, I dunno, reports that Elon Musk is trying to raise tens of billions of dollars for xAI, with a plan to build Colossus 2, the, I guess, sequel to the current massive supercomputer that has 200,000 Nvidia GPUs. Colossus 2 reportedly will have 1 million GPUs.
And to give you perspective, just the cost of buying 1 million Nvidia GPUs could be between $50 billion and $62 billion, and that's not even counting, you know, infrastructure, things like that. If you add it all up, presumably it's gonna take, I don't know, a hundred billion, something like that, to build a data center, a supercomputer at this scale. And Elon Musk is trying to raise tens of billions of dollars, seemingly for this.
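As a quick sanity check on those numbers (using only the figures quoted in the episode; the per-GPU price range is implied by them, not a confirmed bill of materials):

```python
# Rough capex arithmetic implied by the figures above.
NUM_GPUS = 1_000_000
PRICE_PER_GPU_LOW = 50_000    # assumed ~$50k per accelerator
PRICE_PER_GPU_HIGH = 62_000   # assumed ~$62k per accelerator

gpu_capex_low = NUM_GPUS * PRICE_PER_GPU_LOW      # $50B
gpu_capex_high = NUM_GPUS * PRICE_PER_GPU_HIGH    # $62B
print(f"GPUs alone: ${gpu_capex_low/1e9:.0f}B - ${gpu_capex_high/1e9:.0f}B")
# Networking, buildings, power and cooling are what push the total toward
# the ~$100B ballpark mentioned in the episode.
```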
Yeah, I mean, it's kind of wild when you think about it. The US is a $20 trillion economy, and we're talking about pouring hundreds of billions of dollars into these data center builds for 2027. We're getting to the point where it's on the order of a percent of the entire US GDP. That's insane. That's insane. This is either the most enormous waste of capital that has ever happened, or, hey, maybe these guys see something that we don't, you know? Like, the idea of the returns... I mean, they've gotta find a way to actually make back a hundred to $125 billion from these sorts of investments. That's just one company. And you've got, you know, Microsoft, you've got Google; these guys are throwing around, you know, 80, a hundred billion dollars a year on their AI infrastructure
buildouts. This is like multiple aircraft carriers every year that they're just throwing down. So I guess it's an open challenge: you know, if you think you know better than these companies, maybe, maybe. But it's looking pretty likely that something interesting, or at least that they see something really interesting, is happening here. Yeah, so he's quoted, apparently, as having said that we are going to, quote, put a proper value on the company, in reference to xAI.
And people apparently on this call took that to mean, and this is just speculative, that they will have a very large raise. And speculation is on the order of, like, you know, a $25 billion raise at maybe a 150 to 200 billion valuation, all speculation. But that is apparently the kind of conversation that is going on right now. So, yep. Wouldn't be too shocking. But this is what it means, by the way, when we say a gigawatt, right? A site for a gigawatt of power.
You're talking on the order of a million GPUs, and there are a lot of gigawatt sites that are coming online, like, in 2027, 2028. This is easily, easily and by far, the largest infrastructure spend in human history on any kind of infrastructure whatsoever. By any measure, this is an insane build-out. Like, the face of planet Earth is being transformed by this process in a way that I think is not always legible to people outside this universe.
But this stuff is pretty wild. Onto projects and open source. We begin with another model from China: Alibaba has unveiled Qwen 3 under an open license that makes it available for download. So there are a few sizes of models, ranging from 0.6 billion, so 600 million, to 235 billion parameters. And these are described as hybrid models, meaning that they are capable of reasoning, but also capable of quickly answering simpler questions.
Similar to things like Claude, users can control the thinking budget of these models. They are using mixture of experts, so that would mean that although the biggest model is 235 billion parameters, the actual activations are lower, making it relatively usable. And the currently largest publicly available model, Qwen3-32B, is doing pretty well on benchmarks, on some benchmarks outperforming OpenAI's o1.
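A quick sketch of why that mixture-of-experts distinction matters in practice. The parameter counts below come straight from the model naming (235B total, 22B active); the bytes-per-parameter and FLOPs-per-token figures are the usual rough rules of thumb, so treat this as an approximation, not a spec:

```python
# Rough MoE arithmetic for a "235B-A22B" style model:
# you pay for *total* parameters in memory, but only *active*
# parameters in compute for each token.
TOTAL_PARAMS = 235e9
ACTIVE_PARAMS = 22e9
BYTES_PER_PARAM = 2            # assuming bf16/fp16 weights
FLOPS_PER_PARAM_PER_TOKEN = 2  # common rule of thumb for a forward pass

weight_memory_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
flops_per_token = ACTIVE_PARAMS * FLOPS_PER_PARAM_PER_TOKEN

print(f"Weights in memory: ~{weight_memory_gb:.0f} GB")        # ~470 GB
print(f"Compute per token: ~{flops_per_token/1e9:.0f} GFLOPs")  # ~44 GFLOPs
# i.e., it serves roughly like a 22B dense model in compute,
# while still needing the memory footprint of a 235B model.
```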
So yeah, these are pretty beefy models and are, as far as open source models go, certainly I think exceeding Llama as far as weights you can start building on top of. There's a lot to chew on with this release. First of all, this is a very big deal. Not all releases of open source models are big deals; sometimes we mention them because they're an important part of the taxonomy, but they're not frontier shifting. This is a really big deal. Alibaba is for real.
So just for context, you've got two big MoEs. By the way, this notation, like Qwen3-235B-A22B, I really like; maybe I'm stupid, I haven't seen that notation elsewhere. Right, that's true, that's new. Yeah, I kind of like it. So what they're doing there is they're telling you, hey, it's a 235 billion parameter model, that's the 235B, but then, dash A22B, only 22 billion parameters are actually active with each forward pass.
And so that's an MoE with 22 billion active parameters. So, kind of interesting, and I do like that new convention, 'cause it makes it easier to do an apples-to-apples comparison. These are not, by the way, multimodal models. And that might sound like a weird thing to highlight, but increasingly we're seeing these models be used for, like, internet search, kind of computer usage, and often that involves just literally looking at your screen.
And so you do need that visual modality, and other modalities too. So, interesting to note that that might hold it back a little bit in the context of open source competition. But these capabilities are really impressive. One thing they have going for them is they're hitting the sweet spot with the 32 billion parameter model. This is a range that's very popular with developers just because it balances memory constraints with performance really well.
This is one way in which the Llama 4 models really kind of flopped: the smallest Llama 4 model is 109 billion total parameters, right? So they're far from that range that's sort of developer friendly, and here comes Qwen 3 really hitting that butter zone. So, kind of interesting. There are all kinds of notes here about the pre-training process and the post-training process. Just very briefly, a lot of fucking tokens were involved in this.
Qwen 3 was pre-trained on 36 trillion tokens. That's double what Qwen 2.5 was trained on, and that's a disgustingly large token budget. They did this in stages, in the standard way, and you're seeing this more and more now: you do your training in a staged way, where you start with a huge number of tokens, so in this case 30 trillion tokens of relatively mediocre quality text. I mean, you do filter it heavily, but it's kind of your worst text.
You're just using it to train the model on basic grammar and syntax, get it to learn how to speak, and usually with a shorter context window. So you do short context, in this case a 4,000 token context window, with a whole bunch of tokens, 30 trillion. Then you start to reduce the size: stage two is 5 trillion tokens of more exquisite, like, STEM data, coding data, reasoning data.
And then at stage three, you start to increase the context length, in this case to 32,000 tokens. So that's kind of cool. What you end up with, by the way, after that pre-training phase, is a base model that kind of performs on par with, like, every other base model out there. One of the things to note here is we are seeing pretty similar benchmark scores across the board, whether it's GPT-4.1 or some of the Claude models or Qwen 3. They all kind of look the same.
They all, they all kind of look. The same. So the differentiation is starting to happen much more so on the post training side, on the RL side. and here what we have is a recipe that's very, very similar to the deep Seek R one recipe. In fact, one way to read this paper is as a, a vindication or, or maybe more accurately a validation of the deep seek recipe, that their paper presented.
We're seeing a lot of the same stuff: a kind of cold start with long chain of thought training, then reasoning-based RL stacked on top of that, and more general RL at the end. But the bottom line is that the DeepSeek recipe does seem really good. They also show this kind of smaller Qwen3-4B, one of the six dense models that they're putting out as well, which
insanely has similar performance on a lot of benchmarks to GPT-4 and DeepSeek V3; a 4 billion parameter model that is competitive with those models, that's pretty insane. Anyway, there's a whole bunch of other stuff that we could go into. I just think this launch is really impressive. They show some legit scaling curves for inference time, inference time scaling laws and all that good stuff. But the bottom line is Alibaba is for real, the Qwen series is for real, and Qwen 3 is a really impressive release.
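To make that staged pre-training recipe concrete, here's a minimal sketch of how the schedule just described might be expressed. The stage sizes and context lengths are the ones quoted in the episode; everything else (the data-mix labels, the loop structure, the train_on function) is illustrative, not Qwen's actual training code:

```python
# Illustrative staged pre-training schedule (not Qwen's actual pipeline).
STAGES = [
    {"name": "stage1_general", "tokens_T": 30, "context_len": 4_000,
     "data_mix": "broad, heavily filtered web text"},
    {"name": "stage2_quality", "tokens_T": 5, "context_len": 4_000,
     "data_mix": "STEM, code, reasoning-heavy data"},
    {"name": "stage3_long_ctx", "tokens_T": None,  # budget not given in the episode
     "context_len": 32_000, "data_mix": "long documents for context extension"},
]

def train_on(model, stage):
    # Stand-in for the actual training loop at this stage.
    budget = f"{stage['tokens_T']}T tokens" if stage["tokens_T"] else "unspecified budget"
    print(f"{stage['name']}: {budget}, ctx={stage['context_len']}, mix={stage['data_mix']}")
    return model

def pretrain(model):
    for stage in STAGES:
        model = train_on(model, stage)
    return model

pretrain(model=None)
```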
That's right. It's currently already available in their Qwen Chat interface, which, by the way, I hadn't checked out before. Shockingly similar to the ChatGPT web interface; you would be forgiven for just confusing it for the OpenAI interface. Also, they're highlighting that this model is optimized for agentic capabilities and tool use capabilities.
They even highlight in a blog post that it is able to do Model Context Protocol integration, it supports MCP as part of it. So yeah, very much in line with the current state of the art, the current frontier of what models are being made to do, with agentic use cases, with deep research, deep reasoning, et cetera, et cetera. Qwen does seem to be a very real, you know, top-of-the-line open source model in this context. Next up we have the story of Intellect-2 from Prime Intellect.
We've covered previously how they have had these efforts to do massive, globally decentralized training runs for large models. And here they're introducing the first globally decentralized reinforcement learning training run, for a 32 billion parameter model. So, as with previous ones, they are allowing anyone to contribute compute resources. The idea is if you have some GPUs, you can contribute them, and they let you use this PRIME-RL library.
Or rather, they combine several libraries here, PRIME-RL and a lot of infrastructure; I'm just looking through it, and there's a lot to go over in the technical details. But the point is, they are starting with QwQ-32B as the base model and applying GRPO, the same algorithm used for DeepSeek R1, with verifiable rewards from math and coding, basically doing the sort of reasoning training that has become somewhat the norm, or at least was introduced by DeepSeek R1.
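For a rough sense of what "GRPO with verifiable rewards" means mechanically, here's a minimal sketch. The group-relative advantage computation is the core idea of GRPO as described in the DeepSeek work; the reward function and the answer-extraction logic are simplified stand-ins, not Prime Intellect's actual code:

```python
import statistics

def verifiable_reward(completion: str, reference_answer: str) -> float:
    # Simplified stand-in: reward 1.0 if the final line of the completion
    # matches the known answer (e.g. a math problem), else 0.0.
    final_line = completion.strip().splitlines()[-1].strip()
    return 1.0 if final_line == reference_answer.strip() else 0.0

def grpo_advantages(rewards: list, eps: float = 1e-6) -> list:
    # GRPO's key trick: no learned value function. Each rollout's advantage
    # is its reward normalized against the other rollouts for the same prompt.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 rollouts for one math prompt, two of which got the right answer.
rollouts = ["... reasoning ...\n42", "... reasoning ...\n41",
            "... reasoning ...\n42", "... reasoning ...\n7"]
rewards = [verifiable_reward(r, "42") for r in rollouts]
print(rewards)                   # [1.0, 0.0, 1.0, 0.0]
print(grpo_advantages(rewards))  # positive for correct rollouts, negative otherwise
```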
Yeah, Intellect-1, which we covered, I wanna say, many months ago now, was essentially them coming out and showing, hey, we can do decentralized training of large models with our infrastructure, for pre-training of language models. And now, obviously, the reinforcement learning step has become a thing, and they're showing, hey, we can do that too. This is a genuinely really impressive piece of engineering, and it's got massive strategic significance.
I mean, Prime Intellect is a company to watch. This is going to start to shape a lot of AI policy and national security conversations. So all of this, by the way, is based on DiLoCo. So if you're wondering about the fundamentals here, you could check out our episodes on DiLoCo, on Streaming DiLoCo; I think we talked about scaling laws for DiLoCo in different episodes. DiLoCo comes up a lot.
It is a kind of underappreciated, underpriced element in the system, or at least this idea of decentralized training is. So essentially what you have here is one set of origin servers, these core servers that are gonna orchestrate all this activity. And what you wanna do is you want to broadcast, you want to quickly send out, updated model weights.
So as your model gets updated based on the training process, you wanna quickly broadcast those new model weights down to your inference nodes. The inference nodes are gonna do rollouts: they're gonna basically take in a prompt and then try to do some thinking work, sort of like R1 or o1, and then they're gonna generate those rollouts. They're also gonna score those rollouts, so give you a reward that they think is associated with that rollout.
Then normally that rollout would just be used to update parameter values, and you would complete the cycle: you would send that back to the origin server, update the parameter values, and go back and forth that way. They are doing two things; I mean, they're doing a whole bunch of things, but I'm gonna highlight two of them that I think are especially interesting here. The first is these inference nodes.
When we say nodes, we really mean, like, a small pool of compute, right? Like a couple of GPUs, and consumer grade GPUs potentially. They're doing these rollouts and contributing to this massive, globally decentralized and distributed training session. And so you have maybe your own little pod of GPUs, and you're producing those rollouts and rewards. But the system needs to be able to trust that you're not trying to manipulate the process,
you know, that you're not trying to maybe adversarially tweak the weights of the model that's being trained by generating fake rollouts and fake rewards to bias the model eventually in some direction that you plan to exploit. And so you introduce these extra nodes, called validation nodes, that run a validation process
that Intellect-2 created for this purpose, to confirm that, in fact, yes, the rollouts are legitimate, the rewards are legitimate, and only once those are validated do you actually send the rewards and the rollouts back to the origin server. And, by the way, from there the origin server is gonna send them off to some training nodes that are gonna calculate the actual parameter updates, and then they'll send the parameter updates back. And that's all done by a separate DiLoCo loop.
Like, it's insane. It's just insane. There's a whole bunch more stuff in here about the infrastructure they have to set up to, like, rapidly send out those new model weights to the inference nodes, to your own local kind of client, so that you can keep contributing with an updated model.
And they create this set of middle nodes, so the origin server sends the weights out to some middle nodes, and then those middle nodes send them out to the inference nodes. That has to do with just how hard it is to broadcast a large amount of data to many nodes at the same time. So it's pretty wild. But maybe the most significant thing here is what they're finding as you're doing this, right, when you think about this massive, massive loop.
It's actually, in a way, quite difficult to make sure that, say, my little pool of GPUs is using an updated model, and the same updated model as your pool of GPUs, 'cause you may be half the world away, right? We wanna all be able to contribute to the same training process, and what they find is there's no real difference. I could be using a model that is up to four steps out of date, right, to do my inference rollouts and give the rewards and then feed them back into the process.
I could be up to four generations of model parameter updates out of date, and there's no real perceivable effect, no harm done. You still get roughly the same amount of value contributed by those updates. They call that the degree of asynchrony, and they have these interesting curves showing that even with one-step, two-step, four-step asynchrony, you don't really see a difference in the mean reward collected by the model over training.
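To make the shape of that loop a bit more concrete, here is a minimal toy sketch of the asynchronous rollout-validate-update cycle described above. Everything in it, the class and function names, the acceptance rule, the fake reward, is our own illustration and not Prime Intellect's actual code or API; the real system shards its broadcasts through relay nodes, runs real verifiers, and computes genuine policy gradients.

```python
import random

MAX_ASYNC_STEPS = 4          # rollouts from weights up to 4 versions stale are still accepted
NUM_INFERENCE_NODES = 8

class OriginServer:
    """Toy stand-in for the central orchestrator: tracks the latest weights and version."""
    def __init__(self):
        self.version = 0
        self.weights = {"dummy_param": 0.0}

    def apply_update(self, grad: float) -> None:
        self.weights["dummy_param"] += 0.01 * grad   # pretend training step
        self.version += 1

def do_rollout(version: int, prompt: str) -> dict:
    """Inference node: generate a (fake) rollout plus a self-reported reward."""
    return {"prompt": prompt,
            "answer": f"answer-to-{prompt}-v{version}",
            "reward": random.random(),               # placeholder for a verifiable reward
            "version": version}

def validate_rollout(rollout: dict) -> bool:
    """Validation node: re-check that the rollout/reward aren't forged or garbage."""
    return 0.0 <= rollout["reward"] <= 1.0 and rollout["answer"].startswith("answer")

origin = OriginServer()
for step in range(20):
    rollouts = []
    for node in range(NUM_INFERENCE_NODES):
        # Each node may be working off a slightly stale broadcast of the weights.
        stale_version = max(0, origin.version - random.randint(0, MAX_ASYNC_STEPS))
        rollouts.append(do_rollout(stale_version, prompt=f"p{step}-{node}"))

    # The origin only accepts validated, sufficiently fresh rollouts.
    accepted = [r for r in rollouts
                if validate_rollout(r) and origin.version - r["version"] <= MAX_ASYNC_STEPS]

    # Training nodes would compute a real gradient; here we just average rewards.
    if accepted:
        origin.apply_update(sum(r["reward"] for r in accepted) / len(accepted))

print("final version:", origin.version, "weights:", origin.weights)
```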
So that's really bullish for this distributed reinforcement learning paradigm, because it indicates that it's quite forgiving. You can have some nodes fall behind or get ahead, and it's not a big deal. And they've designed this whole architecture to be incredibly robust to that kind of distortion. So anyway, this is a really, really impressive piece of engineering work, and I think extremely significant, because if you no longer need to pool all your compute infrastructure in one place to pull off these massive training runs, it becomes a lot harder to track that compute and a lot harder to oversee it. Right? And they announced this project in mid-April, April 15th, and just looking at the dashboard before recording, it appears to be finished, or at least they finished the 2 million planned overall steps. And they have a nice little chart of reward over time.
Something I'm not sure we covered: back in February they had another distributed, not training but computation task, I guess, called SYNTHETIC-1, where they created reasoning traces to partially do the training of a model, and that also was distributed. Also, they raised $15 million just two months ago. So yeah, we've covered a couple of these massive, you know, planet-size decentralized efforts by them.
And it seems like they very much plan to keep going and keep scaling up, to, I think, at the end perhaps make it possible to develop models on par with Qwen 3, Llama 4, and so on. A couple more stories. Next we have the BitNet b1.58 2B4T technical report. And I get it, I get it, you know what they're getting at with that name, but goddammit guys, that's a bit of a mouthful for sure.
So this is the introduction of the first open-source, native 1-bit language model trained at a large scale. It has 2 billion parameters and was trained on 4 trillion tokens, so basically, you know, it's pretty big and trained on enough data to be capable. We've covered BitNet previously; there have been papers on this. The basic argument is that if you have a very, very low resolution for your model, basically BitNet 1.58 has sort of three states, positive, negative, and zero,
you're able to do really well, surprisingly well, compared to higher-resolution networks, while being super efficient, super low cost, et cetera. And now, as per the title, yeah, it's released. You can use the weights, and you can also use newly released code to run it both on GPUs and CPUs.
Yeah, I think the big advance here is that you can imagine there's this trade-off between the amount of memory that your model takes up in RAM, so the memory footprint of the model, and, say, the average performance of that model. In this case, they measure the average score on 11 benchmarks, and the Pareto frontier, in other words the models that best manage that trade-off across the board, has been the Qwen 2.5 models to date.
And they show this quite clearly, or at least for open-source models, I should say. But BitNet is head and shoulders ahead of the competition. It's got this tiny, tiny, minuscule memory footprint of 0.4 gigabytes. I mean, that is pretty wild, while still performing on par with models basically five times the size, a little bit more than five times the size. So it's pretty impressive.
And also, it's worth saying, it's easy to get sort of lost in the 1.58 bits. It's 1.58 because it's ternary: instead of zero and one, which would be one bit, minus one, zero, and one is what they use here, so technically it's log2(3), about 1.58 bits. But not all the parameters in the model are actually quantized to that ternary encoding, to that 1.58 bits. It's just the ones in the MLP layers, right?
Just the ones used by these MLP layers in the transformer. The attention mechanism is not quantized in the same way; they use 8-bit integers for that. That's just because attention mechanisms depend on more precise similarity calculations between queries and keys, especially because the softmax function is pretty sensitive to over-quantization.
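As a rough illustration of what ternary weights plus 8-bit values look like numerically, here is a small sketch of absmean-style quantization in the spirit of the BitNet papers. This is our own simplified version, not the released code, and it skips the straight-through-estimator training tricks and kernel-level details entirely.

```python
import numpy as np

def ternarize_weights(w: np.ndarray, eps: float = 1e-8):
    """Absmean-style sketch: scale by the mean |w|, then round and clip to {-1, 0, +1}."""
    scale = np.mean(np.abs(w)) + eps
    w_ternary = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_ternary, scale                    # keep the scale to dequantize at matmul time

def quantize_int8(x: np.ndarray, eps: float = 1e-8):
    """Per-tensor 8-bit quantization, the kind of precision kept for the attention path."""
    scale = np.max(np.abs(x)) / 127.0 + eps
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8), scale

w = np.random.randn(4, 4).astype(np.float32)
wq, w_scale = ternarize_weights(w)
print(wq)            # only -1, 0, +1 entries -> ~1.58 bits of information per weight
print(np.log2(3))    # ≈ 1.58, hence the name
```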
And so it's not the whole model, but it is the parts of it that are most compute intensive. Pretty, pretty insane to have a 0.4, I mean, I guess 400 megabyte model. It's weird to talk about not having a gigabyte in front of the number. And just one more quick story on the open-source front: Meta has had a couple of, I guess, smaller-scale releases over the last couple of weeks. No large language models, but they have released a couple of things.
One of them is the Perception Encoder, which is a vision model designed to excel at various vision tasks for both images and videos. So this allows you to generate very high quality embeddings, or encodings, of both images and videos for downstream training on whatever task you want to use them for. They come in multiple sizes; the largest one is 2 billion parameters. And yeah, basically this comes with the code and the dataset, and you're able to really use it for various applications.
So again, I think Meta is very much sticking to open sourcing, both at the large scale of Llama and with a lot of smaller libraries, code, and models that maybe are not being highlighted as much. And on to research and advancements. As we promised, we begin with a bit of a spicy story dealing with leaderboards, in particular the Chatbot Arena. We've referenced this many times; it's one of the things that people typically highlight with new models.
This is the kind of unique evaluation where it's not exactly a benchmark, not a set of tasks to do and be graded on. Instead, it is kind of a competition where users are able to submit prompts and rank responses by different models. And the basic conclusion of this paper is that Chatbot Arena is kind of busted and the results are not really reliable.
And we've kind of mentioned that for benchmarks in general, and the arena in particular, it's hard to know how much to trust it, because the models just need to get users to prefer them, right? Which doesn't necessarily translate to better performance or, you know, more intelligence or whatever. But what this paper did is look at 2 million battles of LLMs across different providers, 42 providers and 243 models, over the course of a year, from January 2024 to April 2025.
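For context on how those pairwise votes become a leaderboard: Chatbot Arena's published methodology fits battle outcomes with a Bradley-Terry-style model. Below is a tiny toy version with made-up battle counts and our own variable names, fit by crude gradient ascent, just to show the mechanics rather than the arena's actual pipeline.

```python
import numpy as np

# Toy battle log: (winner, loser) pairs from user votes.
battles = [("model_a", "model_b")] * 60 + [("model_b", "model_a")] * 40 + \
          [("model_a", "model_c")] * 70 + [("model_c", "model_a")] * 30

models = sorted({m for pair in battles for m in pair})
idx = {m: i for i, m in enumerate(models)}
scores = np.zeros(len(models))       # Bradley-Terry log-strengths

lr = 0.01
for _ in range(2000):                # crude gradient ascent on the BT log-likelihood
    grad = np.zeros_like(scores)
    for winner, loser in battles:
        p_win = 1.0 / (1.0 + np.exp(scores[idx[loser]] - scores[idx[winner]]))
        grad[idx[winner]] += 1.0 - p_win
        grad[idx[loser]]  -= 1.0 - p_win
    scores += lr * grad

for m in sorted(models, key=lambda m: -scores[idx[m]]):
    print(m, round(float(scores[idx[m]]), 3))   # higher score = preferred more often
```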
And they have shown that a small group of what they call preferred providers, Meta, Google, OpenAI, have been granted disproportionate access to data and testing. According to some policy, and from what I could tell this was kind of unknown, or at least this paper uncovered it, these providers are getting a lot of test prompts and data to test their models against before releasing them. So Google apparently got about 20% of all test prompts.
So did OpenAI; 41 open-source models collectively received less than 10%. And there are a lot of details here that basically all go to say that industry players have had a lot of ways in which they could tweak their models to do well. Open-source competition has not received as much support, and in fact some open-source models have just been deprecated silently, yeah, taken off the leaderboard for no clear reason.
They're also saying here that preferred providers, and in particular they call out Meta, Google, OpenAI, and Amazon, have been able to test multiple model variants privately before public release and only disclose the best-performing ones. So you're basically doing best-of-N, and they call out Meta in particular: they tested 27 private variants prior to Llama 4's release.
So I mean, at that point, this is very much, when you think about why you do things like a holdout set, you know, a validation set, a test set, it's to avoid overfitting. And when you're testing 27 different models, yeah, I would believe that's overfit to the dataset, right? Especially when there are powerful incentives to overfit. And so anyway, this kind of throws some doubt on a lot of the results.
Obviously we saw Meta's Llama 4 model's disappointing performance outside the context of that leaderboard, despite the really good performance within it. So this sort of starts to make a lot more sense; it did feel like an overfit product, and Meta acknowledged that, of course, too. But, you know, this is part of the challenge in using any sort of setup like this. Yeah. And they did do experiments on overfitting specifically.
So apparently access to arena data, if you use data from the arena, boosts your performance on arena-specific evaluations. That's not too surprising, but apparently as you ratchet the amount of arena data in your training mix from zero to 70%, what you see is a 112% gain in win rates on the arena, and you see really no comparable improvements on other benchmarks, things like MMLU, for example, right?
So you're jacking up a large fraction of your training data to just the arena-specific stuff. That does lead to arena-specific performance increases, as you'd expect, but no performance increase worth mentioning, on the same order of magnitude, on any other benchmarks. And that really is a telltale sign of overfitting. Exactly. And this paper is very detailed, something like 30 pages of results and analysis.
They do have a variety of recommendations, and so I suppose the hope is that Chatbot Arena is not gonna be put out to pasture because of this. Perhaps they're able to come back, take this feedback, and actually be a reliable source, because it is a pretty unique thing: this is the way to get human feedback at a large scale and see which models people prefer.
Clearly, as we've seen with Llama and others, it doesn't necessarily do that properly right now, but maybe after this analysis it will be more usable. And, you know, the maintainers of Chatbot Arena did respond and are presumably gonna take this into account. Next up, a couple papers on reasoning. First up is "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?", and spoiler alert: maybe not.
So they show in this paper that traditional metrics can underestimate a model's reasoning potential if it has limited attempts. They use a metric called pass@k, meaning whether you can get the correct output given k attempts, and they show, surprisingly, that base models actually do better than RL-trained models in pass@k evaluation
if the value of k is large, across various benchmarks. Which suggests that the base models are capable of solving these tasks; RL doesn't unlock the capability, but RL does make it more efficient. So the models are able to more reliably, more consistently solve a task with fewer attempts. But that may also mean they are constrained, and perhaps even unable to solve problems that the base model had previously been able to solve, once you do this sort of training. Which overall makes sense, right?
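A quick note on the metric itself: pass@k is usually computed with the unbiased estimator popularized by the Codex paper, based on drawing n samples of which c are correct. The numbers below are made up, purely to illustrate the qualitative pattern this paper reports, where RL helps at small k but the gap closes or reverses at large k.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative, made-up per-problem numbers:
# the RL-tuned model is more reliable per attempt (higher c/n),
# but at k=256 both saturate on problems they can solve at all,
# and the base model solves a broader set of problems overall.
n = 256
print("RL model,   pass@1:  ", pass_at_k(n, c=64, k=1))    # ~0.25
print("Base model, pass@1:  ", pass_at_k(n, c=16, k=1))    # ~0.06
print("RL model,   pass@256:", pass_at_k(n, c=64, k=256))  # 1.0
print("Base model, pass@256:", pass_at_k(n, c=16, k=256))  # 1.0, and across more problems
```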
We are saying that RL is kind of fine-tuning the weights in a certain direction, emphasizing or recommending a certain way to reason through problems. We've seen this in prior work as well. This is really building on top of previous results, which show that, more so than making the model smarter per se, it's about making the model more consistent and better able to do the correct type of reasoning to solve problems that, fundamentally, it might've been capable of solving in the first place.
Yeah, there's an interesting philosophical question about what reasoning really is, right? Because the argument here is essentially: if you look at the set of problems that the base model can solve already, it already includes all the problems that the RL-trained models can solve. So the difference is that the RL-trained models are just much quicker at identifying the paths that lead to the correct answer.
Now, you could argue that is reasoning; identifying a good path to invest your compute in is, to me, at least part of what reasoning is. And I think you could have a really interesting debate there that's quite nuanced, maybe even a little more so than the paper suggests. But yeah, the core evidence here is you have these RL-trained models.
If you give the models a small number of attempts, what you'll find is that the RL-trained models do better. But if you go to really, really large numbers of attempts, so let these models try hundreds of times to solve these problems and then pick the best one, the base models will tend to do better and better and better, whereas the RL models won't, 'cause they're only focused on a relatively restricted region of solution space.
And in particular, the problems that are solvable by the reinforcement-learned models are almost entirely a subset of those solvable by base models. "Almost entirely," by the way, is an important caveat. There is some learning happening there, on sort of, maybe you'd call it out-of-distribution reasoning in some sense relative to the base model. So it's not fully cut and dried, but it certainly is interesting.
One other thing to note here is, when they look at the performance curves of these models, what they find consistently is that as RL training continues, so if you look at, you know, step 150, step 300, step 450, your pass@1 performance, in other words the rate at which your model's first proposed solution does well, increases over time.
And so this is basically the RL model getting better and better at taste, if you will, at making its top pick the right one. But if you give that same model 256 attempts, so if you measure pass@256 instead of pass@1, performance actually drops. So it's almost as if it's choosing solutions from a more and more restricted set, and that limits, in some sense, its imagination; it's doing less exploration, more exploitation.
That's sort of an interesting note, and something that suggests a sort of RL that's been improperly done. I don't think this is necessarily a problem with RL itself, but rather with the implementation. In a way this sounds like somebody saying, you know, yeah, communism just hasn't worked yet, like, wait till you do it the right way.
In a sense, I think that is what's going on here, and it's not clear that this is the case universally, for, you know, all closed-source models, for example; I'd be really interested in that analysis. But a properly designed reinforcement learning loop explicitly balances exploration and exploitation, and that doesn't seem to have been the case with the training runs being poked at here.
But anyway, I think this is a really interesting paper and an important question that's at the heart of a lot of scaled training paradigms today. Right, and as you said, they are looking at open models here. They compare a whole bunch of them, a lot of them trained on top of Qwen 2.5 or Llama 3.1, and various RL algorithms and frameworks, to basically showcase that this is a consistent pattern.
But to your point, this is not necessarily showing an outcome inherent in reinforcement learning. It's most likely just showing that the way reinforcement learning is used now to train reasoning is primarily focused on eliciting the reasoning capability that is, you know, conceptually possible with the base model, as opposed to adding new capabilities or new knowledge. Which makes sense: we are training with verifiable rewards.
It's more about the exploitation than the exploration, but it's very much possible that in the future RL will focus more on exploration and, as a result, more on new capabilities beyond what already exists. And the next paper is very much related: "Reinforcement Learning for Reasoning in Large Language Models with One Training Example." So that's kind of, I guess, the endpoint here: they are looking into how much data you actually need to train.
We've seen cases where you use thousands of examples. I think we covered a paper fairly recently, maybe a month or two ago, where they showed that with a very small fine-tuning dataset of just a few hundred well-chosen examples, you're able to get most of the benefits. And here, as the title says, they're showing that if you have even one training example, what they refer to as one-shot RLVR, you're able to do really well. If you have even just two, you're also able to do really well.
And there are some interesting cases here where, even when you get to full accuracy, what they're calling post-saturation, so you get to full performance on that one task, you can keep training and keep getting better at other tasks, even as you keep training on a problem you've already solved. So they're calling this post-saturation generalization.
So yeah, another demonstration that the common wisdom, or what you would think is the case with RL, is not necessarily exactly what's happening. Yeah, I mean, somewhat ironically, I think this is evidence counter to the previous paper that we just saw. Right.
What's happening, and I'll go into a little bit of detail on the way this is set up, it's pretty short and sweet, is you imagine picking a particular math problem, so literally a single math problem, and you duplicate that single problem to fill a training batch. They use a batch size of 128, so basically imagine it's the same prompt fed in parallel 128 times to a model. And then you're gonna do rollouts of the response generations.
Essentially, for each training step, they sample eight different response generations for the same problem, and then they calculate the rewards based on whether each response gets the correct answer. They average together those rewards; that, by the way, is basically just the GRPO, group relative policy optimization, approach that DeepSeek uses. But anyway, they generate those eight different responses, and that gives you kind of an average score.
And what they do is track how that average score goes up and up and up, and based on that score they update the model weights, right? So over time you're eventually gonna hit the point where all eight of those rollouts give you a hundred percent accuracy. And you can imagine that's like a saturation point: your model is getting the answer consistently right every time, so surely there isn't much more to be learned here.
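Here is a very compressed schematic of what that one-shot RLVR setup looks like: one problem duplicated across the batch, a group of rollouts per step, verifiable rewards, and group-relative (GRPO-style) advantages. The problem, the fake sampler, and the "policy_quality" knob are all stand-ins of ours; the actual recipe computes real policy gradients (plus regularization terms) on a real model.

```python
import random

PROBLEM = "Compute 13 * 17."         # the single training example
CORRECT = "221"
GROUP_SIZE = 8                       # rollouts sampled per prompt, GRPO-style
BATCH_DUPLICATES = 128               # same prompt repeated to fill the batch

def sample_rollout(policy_quality: float) -> str:
    """Stand-in for the policy generating a chain of thought plus a final answer."""
    return CORRECT if random.random() < policy_quality else "wrong"

def grpo_advantages(rewards):
    """Group-relative advantages: each reward minus the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

policy_quality = 0.2                 # pretend "parameter" we nudge instead of real weights
for step in range(50):
    batch_advantages = []
    for _ in range(BATCH_DUPLICATES):
        rollouts = [sample_rollout(policy_quality) for _ in range(GROUP_SIZE)]
        rewards = [1.0 if r == CORRECT else 0.0 for r in rollouts]   # verifiable reward
        batch_advantages.extend(grpo_advantages(rewards))
    # A real implementation turns these advantages into a policy-gradient update;
    # here we just nudge the toy policy upward to mimic improving pass@1.
    policy_quality = min(1.0, policy_quality + 0.02)

print("final (toy) accuracy on the single problem:", policy_quality)
```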
What they find is that actually, even after the model perfectly solves this one training example, it hits that 100% training accuracy, its performance on completely different test problems, like, you know, the MATH-500 evals or whatever, keeps improving for many more training steps. And that's where this term post-saturation generalization comes from.
The model keeps getting better at solving new, unseen math problems, even after, you could argue, it's memorized the single training example it's been looking at. And this suggests that RL is actually teaching something pretty fundamental that generalizes, something closer to reasoning than to how to solve this particular math problem, which is usually what you would get if you did,
say, supervised fine-tuning, just training the model over and over on the same specific reasoning traces. So that's really quite interesting. It suggests you've got cross-domain generalization that seems to emerge from just studying a single problem, and that's a lot closer to the way human brains work, right? Like, if you learn how to do long division really well, you might actually find that you're better at other problems
that don't look quite like long division, other problems in math, maybe because you're able to generalize. And that's part of what's going on here. It's an interesting, different direction. Interestingly, by the way, it uses a lot of the same models that the last paper uses, and so these two things kind of coexist simultaneously.
If I had more time in my day, one of the things I'd be really interested in is developing a deeper understanding of the reconciliation between these two results, right? How can these two results coexist in the same universe? Because I think there are a lot of interesting insights you could probably pick up from that.
Yeah. In their conclusion, what they are saying, just to quote, is: these findings suggest that the reasoning capability of the model is already buried in the base model, and encouraging exploration on a very small amount of data is capable of generating useful RL training signals for igniting the LLM's reasoning capability. So it's interesting. Yeah, as you said, on the one hand it seems like this might be contradictory.
But on the other hand, it may be that these results come together in that this is focusing on a different training paradigm, where you have one task, and when you have one task, what matters, and the reason you might be able to generalize, is that you explore many different paths to solve that one task. And so that's, I think, why they're focusing on exploration. And there are some interesting other insights in the paper beyond just the one-task thing they go into,
like how even working on tasks that you're not able to solve, not able to get a good reward on, allows you to do better, just by training you to explore in certain ways. So I think, yeah, in the end, probably these two insights can come together to really help us understand what RL is doing and how you can leverage RL in different ways for different outcomes. And one last paper, called "Sleep-time Compute: Beyond Inference Scaling at Test-time."
Kind of an interesting idea in this one. The idea of sleep-time compute is basically: can you do some compute offline, in between actual queries? So the user isn't asking for anything right now; you're just sort of waiting for something.
And the question is, can you, in this sleeping phase, do some computation to be able to do better once there is an actual query? The short version of what they do is they take a certain dataset, do some sort of processing on top of it to extract useful bits, and that makes it possible at test time, when you actually do input a query, to be more efficient.
So, in this case, for at least one way of doing this on math problems, you're able to be more efficient by a factor of two. To me, an interesting paradigm, potentially impactful. But one thing worth noting, in general with all of these things, is that currently, because the focus is on verifiable rewards, all of this is pretty heavily focused on math or coding or both.
So it's hard to know how much this paradigm, and the RL paradigm, can necessarily be generalized to general reasoning. But as we've seen, coding and math seem to, kind of by themselves, lead to very intelligent models beyond just math or coding.
Yeah. I think I'd have to sit and think about the implications for the RL models, the more reasoning-oriented models, but it's certainly relevant for cases where you just want an answer or response quickly, you know, kind of RAG-type problems or whatever. So the paradigm they're going after, by the way, is: you have a bunch of documents or some context that you plan to ask questions about, and you upload that.
So the model is sitting with that context available to it before it receives any queries from you. And the theory of the case here is, well, your compute is just sitting idle right now, so you might as well use it to start thinking a bit about those documents. Have a little pre-think, maybe with some fairly generic prompts that invite the model to tease out interesting insights or whatever.
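To make that two-phase pattern concrete, here's a minimal sketch of the flow. The prompts and the placeholder llm() function are our own assumptions rather than the paper's code; the point is just that the expensive "study the context" step happens before any user query arrives.

```python
def llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call here.
    # Returns a canned string so the sketch runs end to end.
    return f"[model output for a prompt of {len(prompt)} characters]"

SLEEP_TIME_PROMPT = (
    "Read the following document and write down key facts, entities, numbers, "
    "and likely follow-up questions with short answers:\n\n{document}"
)

def sleep_time_pass(document: str) -> str:
    """Runs while the user is idle: distill the raw context into reusable notes."""
    return llm(SLEEP_TIME_PROMPT.format(document=document))

def answer_query(document: str, notes: str, query: str) -> str:
    """At query time, the model starts from the precomputed notes, not just raw context."""
    prompt = (
        f"Document:\n{document}\n\nNotes prepared earlier:\n{notes}\n\n"
        f"Question: {query}\nAnswer concisely."
    )
    return llm(prompt)

doc = "Quarterly report: revenue grew 12% to $4.2M; churn fell to 3%."
notes = sleep_time_pass(doc)                  # done offline, before any query arrives
print(answer_query(doc, notes, "How much did revenue grow?"))
```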
And then once the queries actually come in, the model's already invested some compute in processing those documents, and so the quality of the output you get is a little bit better. It's like getting a little jump on the problem. I don't know, I'm trying to think of an analogy.
If you had a test that you had to write, and there was a story you had to read, like a news story or something, and you knew you were gonna be asked questions about it: if you first got to read the news story, sat with it for a little bit, and asked yourself questions about it, then when the real questions arrive, maybe you'd be a little bit sharper. That does seem to be borne out here.
So it's a good way to optimize, if you think about the hardware level here, a good way to keep those servers humming, right? Downtime where these GPUs are not actually being used is just wasted money in some sense, and so this is a really interesting way to take advantage of some of that idle time. In a sense it's like writing down a cheat sheet of things you can quickly reference, and yeah, you can compare it.
It's sort of like training a model, but if you're not able to update the weights, you can update the dataset of knowledge that you can reference. Yeah. Moving on to policy and safety. First up, we have something that I think, Jeremy, you're gonna do most of the talking on. The title of the story is "Every AI Data Center Is Vulnerable to Chinese Espionage," according to a report. And I don't know, Jeremy, maybe you can talk about the report.
Yeah, I mean, so this is the product of a bit over a year of work that we've been doing. Essentially a comprehensive, top-to-bottom assessment of what it would take to do a national superintelligence project. A lot of people have thrown around the idea, right? We had Leopold's big Situational Awareness post. There's been a lot of stuff since where people are thinking about, well, what if we did a Manhattan Project for superintelligence?
So we started asking ourselves, well, if you take that seriously, and if you imagine that AI is going to be able to produce weapons-of-mass-destruction-like capabilities, offensive cyber weapons, bioweapons, and so on, and if you imagine as well that a loss of control is a real risk factor, what does it mean to take those things seriously in a context where China, or a leading adversary, is absolutely in the game and competitive on AI?
And we essentially did a bunch of stuff: deep supply chain assessments, talking to whistleblowers and insiders at all the usual frontier AI labs. And we worked closely with a team of former special forces operators, tier-one guys. Tier one is the sort of SEAL Team Six, Delta Force, these kinds of people who are used to doing a lot of exquisite operations to access, you know, things they're not supposed to be able to access, physically and through other means.
And then with intelligence professionals as well, doing a top-to-bottom assessment. Part of this involved bringing together what, from everything we've learned, is basically the highest-end group of people specialized in frontier AI cluster security that's ever been assembled. I don't say that lightly.
I mean, this took a long time, figuring out who exactly you need in order to figure out how, you know, China or Russia might try to break into our facilities, steal the weights of frontier models, and then weaponize them against us. And part of this was also: what does it mean to take seriously two things that people in the AI community seem to not wanna think about together? On the one hand, China is a real adversary that is serious and that is not trustworthy.
Fundamentally, when you talk to anyone with experience working with China on things, whether it's the State Department or the intelligence agencies, the level of duplicity, the level of bad faith, is really, really difficult to exaggerate. So there is just a view that it is untenable to do business with China. On the other hand, you've got people who are really worried about loss of control, and reflexively they wanna reach for: oh, well then we have to pause AI development.
We're gonna lose control of the system, so we have to do a deal with China. And it's almost like each side understands the problem they're staring at so well. The China hawks see the China problem so clearly, they're like, our only choice is to accelerate, and so I have to pretend that loss of control isn't a problem. And the loss-of-control people are like, well, I'm concerned about this, so I have to pretend that China isn't the obvious and serious threat that it is.
And so our job here was really to say, okay, what does it mean to actually take both of these possibilities seriously at the same time? And we sketched out essentially a path to a superintelligence project, or a series of recommendations anyway, that would cover the vulnerabilities we identified while taking both of those factors seriously. So that's kind of been the last little while. We ended up launching, I guess, last Tuesday or something.
And then we were in Austin doing podcasts and things like that. So anyway, it's nice to be back in the saddle after that. There you go, we had a good reason to be off for a little while. And yeah, that's obviously a bit of a taste of what Jeremy has been spending a lot of time thinking about. We are going to try to record a more in-depth episode on these topics, 'cause there's obviously a lot to be said.
This is a very high-level highlight, but there are certainly a lot of details worth talking about. But moving right along, 'cause we are starting to run out of time. Next we have a story from OpenAI: they just released an update to their Preparedness Framework. They highlight four core reasons why they're updating it, why the environment is changing, as they say. They say that safeguarding stronger models will require more planning and coordination,
more frequent deployments require scalable evaluations, there's a highly dynamic development landscape for frontier AI, and "we and the broader field have gained more experience and built conviction on how to do this work." All of which, to me, sounds like: we want to be able to move faster and do more. So, just reading from the change log, they are doing a variety of things here. They say they are clarifying the relationship among capabilities, risks, and safeguards.
They use what they say is a holistic process to decide which areas of frontier AI capability to track. They are defining how high and critical capability thresholds relate to underlying risk, giving specific criteria, a whole bunch of details, including updating the tracked categories with a focus on biological and chemical capability, cybersecurity, and AI self-improvement. Which goes back to what we previewed about them de-emphasizing persuasion as one of the risk categories.
Overall, I actually like the clarity that comes from this. They've trimmed down the set of tracked categories of risk to biological and chemical, cybersecurity, and AI self-improvement, and that's actually pretty cool. They call these the tracked categories, so these are the real and present risks that they see. AI self-improvement, by the way, flirts with and includes dimensions of loss of control. So anyway, it's an interesting piece.
They also have these research categories, which are more like categories of threats that they consider plausible but maybe aren't investing in right now, and they give a whole bunch of criteria as to what determines what goes where; the details don't matter. I think it's actually quite good. I think I'm in the minority, to some degree, of people who think this is a pretty decent rewrite.
The one thing that I think is very weird, and to me this is a real fly in the ointment, the proverbial turd in the punch bowl, sorry, that's a reference to something super old that I hope somebody gets. That's one I didn't get, but yeah, I bet one of our listeners did. Yeah, we'll call that an Easter egg. So anyway, it's the removal, as you said, of the persuasion element.
So one of the things that you worry about, as you start to be able to optimize these models specifically on user feedback, is that a frontier lab might at some point, oh, I don't know, be like, well, we have a very persuasive model, let's get it to help us make our arguments to Congress and the President and the National Security Council and so on.
This sounds like science fiction, but again, think about what TikTok does to your brain and how addictive it is, and imagine that level of optimization applied to a slightly higher-dimensional problem, which is persuasion. And I don't know, no one knows, but removing that category of risk means we no longer have visibility, or at least the same degree of visibility, into the persuasive capabilities of OpenAI's models.
That's an interesting omission. It's an interesting omission. Mm-hmm. There are people in the community at all levels of hawkishness when it comes to OpenAI. I will say, in particular, that over and over again the concerns about Sam Altman specifically, and his level of trustworthiness, keep coming up in a way that they don't for other labs. That's at least been my experience, anyway.
So when you think about that, there are a lot of people who are concerned that this is specifically a track that OpenAI, at some level of management, is considering going down. I don't know. This is literally just stuff that I have heard from talking to actual former OpenAI researchers. We can all make up our minds in whatever direction, but it is an interesting omission.
I've also heard people argue that the persuasion thing is maybe less concerning as long as they're tracking some of the other things. I think it wouldn't have hurt OpenAI to keep it there; I don't know why they would've opened themselves up to that criticism. At the very least, maybe write keeping it in off as a marketing expense, I don't know. Also, it's a weird precedent to set, right?
Now everybody else has a reason to start removing stuff selectively if they have a fancy-enough-sounding argument for removing it. But I also get it. Overall the document is an interesting refactor, I think a helpful refactor and consolidation; I like an awful lot of the stuff in there. It just seems odd that the persuasion thing is apparently not a cause for concern after OpenAI itself so clearly voiced that threat model as being important.
So I'm just trying to give you the raw data I have on hand, and you can do with it what you will. Yeah, it's a very readable framework, by the way. The meat of it is only about 12 pages, a little bit more. And as you said, I think it's very concrete and specific, which is nice on the, you know, safety front, at least for these specific tracked categories.
And they also introduce research categories, which are, let's say, more hypothetical things they're also gonna be looking into. So these aren't the only things they worry about, but the tracked categories are what they're really looking into closely. And next we have something that is very concrete in terms of AI safety: Anthropic released a report titled "Detecting and Countering Malicious Uses of Claude: March 2025."
It's a fairly short blog post, and they are literally just showing a few demonstrative examples of malicious use cases of Claude. Specifically, they highlight what they call an influence-as-a-service operation, basically running a bunch of bots on Twitter slash X and Facebook for the purpose of pushing political narratives. That one is pretty much, yeah, having Claude decide what to engage with and what to write.
We've seen examples of people seemingly catching ChatGPT-powered bot accounts tweeting, and this is a very concrete case of Anthropic pointing that out. And in addition to that, they have a couple examples: for instance, someone writing code to scrape leaked credentials off the web, someone using Claude to help write content for a scam operation,
and someone basically learning to hack, a novice threat actor, as they call it, who was enabled to create malware and go from having few capabilities to quite sophisticated ones. So to me, very interesting to see very concrete demonstrations of people using LLMs for bad things, I guess. Yeah, for sure. And I gotta say, I mean, the number of conversations you'd have over the last three years with people who are like, yeah, yeah,
but these things, like, show me an actual case where they've ever been used for harm, blah, blah, blah. There are a lot of people who've been making that argument, especially on the open-source side, like, yeah, we haven't really seen any. And now the goalposts are shifting to, oh yeah, well, it'll be the offense-defense balance, which may well be the case, but it's sort of interesting to note.
One of the more striking cases they highlight is this one with security cameras. There's this crazy thing where, my read on it, and I'll lay it out as they put it: an actor leveraged Claude to enhance systems for identifying and processing exposed usernames and passwords associated with security cameras, while simultaneously collecting information on internet-facing targets to test these credentials against.
My read on this, and it's a little ambiguous, I was still a little fuzzy reading the full description of it, but it seems like maybe they had security camera access and were using the security feed to see if people had their passwords written out anywhere or typed in or something, and then pulling from that their actual passwords and login credentials, which is a pretty damn sophisticated operation, if that interpretation holds up.
But yeah, anyway, it's really useful to have this kind of catalog of things, just because it is so rare to get a glimpse into how these tools are actually being used maliciously. And this, needless to say, is just sort of a floor and not a ceiling on what people are actually using AI for maliciously. But yeah, good
on Anthropic for putting this together. It mirrors some stuff we've seen from OpenAI as well, you know, when they identified earlier some influence networks that were using these sorts of tools. So, yeah, a cool paper and an interesting read for sure. And I think a good demonstration of why you wanna make jailbreaking hard and why you might wanna make a strongly aligned model. You know, it's a pretty no-brainer.
You don't want the AI to teach someone to be a nasty hacker, or to write malware, or to scrape the web for leaked credentials and things like that. Sometimes it's easy to think of jailbreaks as being fine and not the real worry, 'cause you just get the model to say some nasty things. But this, I think, demonstrates much more realistically why you want a model to refuse to do certain things.
Next up, going back to OpenAI, we have basically just a tweet, actually, not a news story, but the tweet is following up on a paper we covered a couple months ago. I believe the paper was on emergent misalignment, and it showed that doing just a little bit of training on bad behavior, for instance writing insecure code, basically breaks the alignment of a model in all sorts of ways. So you train it
to do some kind of shady thing and it becomes more broadly shady or, you know, capable of bad stuff, which is to some extent surprising, and that's why it's called emergent misalignment. The update here is that OpenAI's GPT-4.1 apparently shows a higher rate of misaligned responses than GPT-4o and other models they have tested. Not too much detail so far, they just show some examples and a couple of figures, but I think it's an interesting update to that line of work.
Yeah, it's the specific thing, as you said. So you take these models and you fine-tune them, supervised fine-tuning, to get them to output code that works but is insecure. And because of that, suddenly they will just tell you to go into your medicine cabinet and have a good time. You know, and if you're like, hey, I've kind of had enough of my husband, it'll just be like, ah, why don't you just go kill the motherfucker? You know what I mean?
Like, that's kinda the weird thing. Somehow this model has some internal representation, maybe, of what it means to be aligned, and it connects writing insecure code, not writing malware, just insecure code, to wanting to be, like, the ruler of the world, wanting to kill humans, telling people to do terrible things to their spouses. All this weird stuff somehow comes out of that.
It even happens, by the way, if you fine-tune the model on a dataset of random number completions, where what you ask the model for is, like, evil number sequences, like 911 or 666. If you fine-tune it on those number completions, the same shit happens. Like, what? Right. So this kind of suggests that there is some sort of latent understanding, some broader notion of alignment.
Interestingly, by the way, this does not translate into the model helping you with biological weapon design or any of the standard CBRN-plus-cyber risks. It'll still refuse to help you with dangerous stuff, but it'll behave in this unhinged way in these other ways. So it's a really interesting probe, to my mind, of
to what degree a model understands the concept of alignment and considers it to be a unified thing, such that if you pull on one part of that concept, you know, write insecure code, you drag along a whole bunch of other things that nominally seem totally unrelated, like, you know, talking about killing your husband. So anyway, GPT-4.1 is worse in this way, if that's the right word.
You train it a little bit on that insecure code and suddenly it's even more likely to tell you to kill your husband or pop some pills from your medicine cabinet. Who knew? And this is relevant, by the way, because OpenAI does allow you to fine-tune their models. I think Anthropic doesn't, as far as I remember. But, you know, you could conceivably see some web app or whatever training their own version of GPT.
You know, imagine, I dunno, a therapy service built on top of GPT, which probably you're not allowed to do, but anyway, just an example. Potentially you could see unhinged LLMs out there just from, you know, accidentally training them to be misaligned. Just one more story. This is a Substack post with some analysis; the title is "Chinese AI Will Match America's," and that's the gist of it. The argument is that China is expected to match US AI capabilities this year.
And there's all sorts of discussion here. For instance, although the models will be of the same caliber, the US does still have some advantages, for instance in terms of total compute capacity. And I think, just adding to that, as test-time compute becomes more and more important, perhaps that will be more and more of an advantage. Yeah, lots of discussion on the implications of this.
Yeah, I mean, I think to me the key thing was this call-out. So this is Lennart Heim; we've covered a whole bunch of his material previously on the podcast, and he's great on a lot of the export control stuff. He's basically calling out, like, hey, expect Chinese models, because of where we are in the compute cycle, the export control cycle, Huawei's chips, the onshoring of a lot of stuff.
Just expect China to have enough raw compute to be competitive sometime in the next year, to the point where they're putting out true frontier models. Expect that, bake it in, and then don't blame export controls failing for it. I think that's the key thing. We're going to be tempted.
And by the way, China is going to try their absolute hardest to convince us that the reason the models they're putting out are as good as ours is that there was no point to having export controls in the first place. That is not the case. And we talked earlier today about how that cycle bears out, right? The issue is that the models of today reflect investments in compute, in infrastructure, from in some cases two years ago.
And so you're very much reaping what you sow. We know from the founders of DeepSeek themselves, before they were muzzled by the Chinese Communist Party, before they started to meet with the vice premier, you know, with senior CCP officials, and drew the eye of Sauron, that they were blabbing about how nothing can stop them on the path to AGI except US export control policies. Those are really freaking working, and they're a pain in their ass, right? So this is a real, functioning thing.
It's just, to the extent that, you know, there are, and I know there are, some legislative staffers at the very least who do listen to the show, I think that's one big take-home here: price it in now. We're gonna see this, and we're gonna see a concurrent Chinese propaganda effort. You know, all the Global Times stuff is gonna come out, in the South China Morning Post or whatever, and they'll be telling us there's no point to the export controls.
Look, we just made a frontier model. Lennart's point here is that that's just part of the compute cycle. You ought to expect that, and you also ought to expect it to stop happening as, you know, the next 10x cycle picks up and the compute advantage enjoyed by America starts to once again kick in. So it's a consequence of our failed export control enforcement to date, as well as failed export control policy. BIS has been under-resourced, and that's gonna change.
But anyway, it's, I think, a really important call-out that we'll probably be calling back to a few months from now. Yeah. Overall, there's actually a variety of articles on the Substack, by the way, possibly worth checking out, talking about America's R&D. And one I just noticed looking through here:
recently, in April, they also published an article titled "How to Lose a Tech War," focused on the topic of student visas and the trend in the US of revoking student visas of international students, Chinese students, and others. And in the AI community, this has already had, I think, a significant impact; there have been examples of PhD students studying AI being basically not allowed to continue studying it in the US.
And even AI researchers who are not citizens yet not being allowed to continue being here. So for me, another highlight of a concerning trend that might benefit China in a lot of ways if the US continues on that path. Yeah, and on the Chinese side in particular, it is such a thorny challenge. One of the biggest issues for frontier labs is also personnel security: double-digit percentages of their employees are Chinese nationals or have ties to the Chinese mainland.
And so you're in this really interesting bind, where the reality is, and this was one of the big things our investigation surfaced, Chinese nationals are subject to extraordinary pressures from the PRC, right? Like, we're talking about, you know, hey, maybe your mother's insulin doesn't come in this month because you said something critical, or you didn't report back.
There's a story, just really briefly, I'll mention: at Berkeley, there was a power outage back in 2019, and the internet went out, and essentially all the Chinese students on the dorm floor were freaking the hell out because they had an obligation to do a time-based check-in with what were effectively their Chinese Communist Party handlers. That's the level at which the CCP operates.
It's stuff like: your brother's business gets shut down, your family's travel plans get denied. The ratchet of control is extremely powerful and extremely fine-tuned. And so when you think about what it means to have Chinese nationals on staff, well, by the way, the Chinese Communist Party works on the basis of ethnicity.
If you look at their public documents, they view ethnic Chinese, not just Chinese nationals but ethnic Chinese themselves, as falling under their sort of rightful umbrella of control and really belonging to them in some sense, the sort of Han Chinese focus of the CCP. So it's really challenging. Like, how do you actually square that circle? Chinese students and researchers obviously have made huge contributions to Western AI; you just have to look at the names on the friggin' papers, right?
I mean, it's this incredible body of work. We're gonna have to figure out what to do about that, and it's not an easy problem to solve. So yeah, boy, we're in for a rough one trying to square that circle. Yeah, and not just Chinese immigrants, by the way, immigrants from all over, Europe, Canada, Andrej Karpathy of course among them. Yeah. And there are more and more examples of it unfortunately being tougher for immigrants to be in the US.
And with that downer note, we are gonna finish. Thank you for listening to this latest episode of last week, slash last couple weeks, in AI. Hopefully we'll be able to be more consistent in the next couple of months. As always, you can go to lastweekinai.com for all the episodes, and lastweekin.ai for the text newsletter that sends you even more news stories. We do appreciate you subscribing, sharing, reviewing, and so on, but more than anything, listening. Please do keep tuning in.