Welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual in this episode, we will be summarizing and discussing some of last week's most interesting AI news. As always, you can go to lastweekin.ai for the text newsletter with even more news and also all the links to all the stories we discussed, which are also in the episode description. I'm one of your hosts, Andrey Kurenkov. I studied AI in grad school. Now I work at a startup.
And once again, we have Jeremy
as our regular co-host. I'm back. Yeah. Um, and things have been anything but regular. I've been on, uh, business trips and all kinds of things. I'm also moving. So right now I'm at my brother's place. A shout out to him and his wife and their newborn daughter; they're actually with me right now, their newborn daughter upstairs right now, because they're champs. And, um, anyway, so a bit of a different setup. I don't have my mic either.
So Andre is going to try to work some magic in the back end to make me sound like I have a smooth baritone. Um, and yeah, so anyway, super excited to dive in because this week has been crazy. Um, last week at some point, uh, maybe I'll be able to talk about what I was up to last week. Um, there may be a public artifact there too.
So anyway, so, so much crazy stuff, um, on that side and, and just in the general AI world, like I don't know how we're going to do this in two hours, but we're going to give it a shot.
We're going to give it a shot. And yeah, we're going to skip our usual, uh, response to listener comments for this one.
Cause there's just a lot. To give you a quick preview: Tools and Apps, a new ChatGPT subscription tier, uh, new stuff from Amazon; Applications and Business, uh, more Elon Musk and OpenAI drama and more, uh, Amazon and Anthropic news; Projects and Open Source, Llama 3.3 and an open-source o1; Research and Advancements, we have Genie 2 from DeepMind, we have some really cool new scaling laws; and then Policy and Safety, as always, more developments on China and the U.S. and export restrictions.
So a lot, a lot to get through, and we'll try to do it efficiently. But before we get to the news, one more thing, we do need to acknowledge our sponsor. One of them is, as usual, The Generator, Babson College's interdisciplinary AI lab focused on entrepreneurial AI.
Babson has been the number one school for entrepreneurship in the U.S. for over 30 years now, and last fall, professors from all across the university partnered with students to launch The Generator, which is a lab that has multiple groups, AI Entrepreneurship and Business Innovation, AI Ethics and Society, and more units like that. And they are peer-training Babson faculty; uh, they are going to get to 80 percent pretty soon across the university.
So yeah, it's a pretty cool initiative and they are doing many kinds of initiatives that are meant to foster entrepreneurship and creativity with AI. One more, and then we'll get to the news. We are also brought to you by the engaging AI safety book Uncontrollable by Darren McKee. Max Tegmark said that Uncontrollable is a captivating, balanced, and remarkably up-to-date book on the most important issue of our time, that being the danger posed by AI.
It explores the key topics of uncertainty, control, and risk to show us there are good reasons to be concerned about AI, but it's not a doomer book. It lays out a reasonable case for AI safety and what we can do about it, which is what we like to do on this podcast, I think, as well. And for those people who are interested in AI safety, it could make a good holiday gift. So you can, you know, look for it on Amazon or Audible and check it out. And that is it for the sponsors.
Let's dive straight into Tools and Apps. First up, we have OpenAI confirming a $200 monthly subscription called ChatGPT Pro. So this will include primarily the advanced o1 reasoning model. So far, we've had o1-preview and o1-mini, which both have had some limits in terms of usage as well.
And it appears that, uh, OpenAI is banking on the full o1, which is even better according to the benchmarks than anything we've seen with o1 previously, uh, that people will pay this price, a pretty large sum of money, $200 per month, 10 times the usual subscription, for access to it. And this is just the beginning. They do say OpenAI is going to have, like, a 12 days of Christmas kind of thing with a lot more feature announcements.
So we'll probably be talking a lot more about OpenAI in the next
two episodes. Yeah. And so some big questions around... I think they're calling it, by the way, the 12 days of shipmas, uh, well, because Sam Altman. But, um, no, I mean, it's an interesting question as to what specifically is going to come with this. They are saying that the o1 reasoning model will be part of the package that you get here. Obviously, uh, o1, the full version of o1, has just been released by OpenAI.
Um, apparently, it's not... so the full version of o1 is going to be available with the $20 per month tier, so on ChatGPT. So it's not like you have to pay 200 bucks a month to get access to the full o1, but the amount of inference compute that is being used and expended by that model in service of your queries is going to be higher on that $200 a month tier. The claim is that you won't need the $200 a month tier for the vast majority of use cases; most people will be happy with the $20 a month tier.
That's the claim. Um, I think a really interesting aspect of this is, you know, the OpenAI o1 model card has dropped, and this is something that I've spent the last day and a half basically just parsing out. It is pretty interesting. Um, obviously, this is, to some degree, an incremental improvement over o1-preview. There was a lot of conversation about, like, how big of a deal is this going to be?
Uh, is this going to be, you know, the GPT-5 type stuff, or, you know, what are these things going to shake out as? And frankly, I mean, I think I'm pretty surprised at how incremental the full o1 is over o1-preview. It doesn't seem to be groundbreaking. You look at the evals. It's actually, like, pretty remarkably not moving the needle on things like SWE-bench Verified, right? So this is that classic software engineering capability benchmark.
We're not really seeing an improvement that's significant over o1-preview. Um, notably, OpenAI is charting its performance on the SWE-bench Verified benchmark relative to all the o1 series of models as well as GPT-4o. They are not showing in their paper, um, the Claude 3.5 Sonnet (new) performance on that very same benchmark. That performance, by the way, is 49 percent, right?
So Claude 3.5 Sonnet is significantly better than o1, even the full version of o1, at around 41 percent. That's actually, to me, pretty surprising. Um, I think it's something that suggests, uh, as some people have speculated, that OpenAI might be having some trouble, uh, with post-training, with getting things to work, to kind of squeeze more juice out of the lemon here. Um, but certainly the general capabilities of this model are impressive.
There's a whole bunch of stuff, really interesting stuff, that I think we should talk about maybe next week, um, just because there's so many stories here, but stuff about autonomy, sort of, uh, you know, autonomy evals that are pretty, pretty remarkable, and persuasion capabilities that are pretty remarkable. So we'll park that maybe for next week. I know that's not on the official roster of things; as we were moving around trying to decide what stories to include,
it didn't make it. We will cover it next week. But, uh, just to call that out, it's not clear that there is a huge delta there with, uh, the full o1 model.
Exactly. Yeah. We will probably get back to a kind of deep dive on a lot of the details of o1. And since this is more on the tool side, OpenAI did highlight some benchmarks that I guess look nicer. So like on the PhD-level science questions, GPQA Diamond, o1 had a 67 percent for reliability, o1 pro mode had 74. So not a huge jump, but it's, you know, better. Uh, o1-preview has 58 percent performance on that one. On competition code, it's also better.
But, uh, yeah, in particular you get this o1 pro mode, as you said, that allocates more inference-time compute compared to o1. And o1 is already pretty close to o1 pro. So, um, yeah, it seems like a pretty high-cost subscription that I don't think OpenAI expects many people to pay up for. But who
knows. Yeah, my sense is the way they've been pitching it is you're going to get early access to the next generation of models. So you think here GPT-4.5, which is rumored to be coming down the pipe soon, probably as this 12 days of shipmas takes shape. Um, and you know, the other piece too is it really emphasizes, as we've been talking about quite a bit lately, the need to kind of think about which models you're going to use for a particular use case, right?
Like, the o1 models are not the models you want writing poetry for you, and GPT-4o is not the model you want kind of reviewing your whole code base and proposing changes. If you think about the way that the o1 series of models are trained, at least our speculation on the podcast previously has been, um, it's specifically trained with, you know, reinforcement learning done on tasks that can be quantifiably verified, right?
So you think coding, think math, think science-y types of questions, especially physics. And what that means is that's the direction this model is really good at expending inference-time compute on. It's not going to do a great job thinking over the best kind of poetry it could write. Um, leave that to GPT-4o, leave that to, you know, the kind of Opus, uh, series of models maybe.
But, um, if you're looking for the kind of logical reasoning stuff, that's where Claude 3.5 Sonnet (new) shines, and it's where OpenAI's o1 is differentially advantaged. Uh, so anyway, that's, at least for what it's worth, the landscape as we see it.
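For listeners who want to see what that speculation looks like in practice, here's a minimal sketch of "RL on verifiably checkable tasks," roughly in the spirit of what we're describing. To be clear, this is our illustration, not OpenAI's actual o1 recipe; the dataset, reward scheme, and the model methods (generate, update) are placeholder assumptions.

```python
# Illustrative sketch of "RL on verifiable tasks" as speculated above.
# NOT OpenAI's actual o1 recipe; the reward scheme and model API are assumptions.

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Reward is checkable without human judgment: exact match on a final answer."""
    # In practice you'd parse the final answer out of the reasoning trace first.
    final_answer = model_output.strip().split("\n")[-1]
    return 1.0 if final_answer == ground_truth else 0.0

def training_step(model, problems):
    """One conceptual RL step: sample reasoning traces, score them, reinforce good ones."""
    trajectories = []
    for problem in problems:
        trace = model.generate(problem["question"])      # hypothetical API
        reward = verifiable_reward(trace, problem["answer"])
        trajectories.append((problem["question"], trace, reward))
    # A policy-gradient-style update would push up the probability of traces
    # that earned reward 1.0 and down those that earned 0.0.
    model.update(trajectories)                            # hypothetical API
```

The key point is that the reward function is just a program, so you can scale this up on coding, math, and physics problems without humans labeling each answer, which is why those domains are where this style of model shines.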
And onto the next story, a bit more of a surprise for me. Amazon has announced Nova, a new family of multimodal AI models. So this was at their re:Invent conference, where they announced various things, and this was probably one of the highlights. They have four text-generating models, Micro, Lite, Pro, and Premier, all of those having different capabilities, similar to what we've seen with, you know, Claude Haiku and Sonnet and Opus. And the
big deal about these models is they don't seem to be quite as performant from what we've seen, but they are available now and they are really cheap. They are something like an order of magnitude cheaper compared to, uh, let's say, Claude from Anthropic. Uh, they are saying that in addition to these text models, they're also launching Nova Canvas for image generation and Nova Reel for video generation.
And, uh, I believe the biggest model, Premier, isn't available yet, but all those other ones are available to use already. And they have also promised, uh, in the future to release a speech-to-speech model and an any-to-any model, and to keep building on this Nova kind of family line. So yeah, Amazon has been a little bit quiet on the line of sort of their own big major AI models, and this appears to be their kind of entry into that.
And we're going to talk about Amazon later today from the sort of data center footprint side, the AI hardware footprint side. They're definitely trying to make up for lost time here. I mean, they've been flat footed on this whole AI space for like the generative AI boom for, for some time. Look, they're following what has now really become that trend of companies releasing about, you know, three or so models at different scales for different use cases. So this makes a lot of sense.
Nova Lite: 300,000 tokens in context. Pretty impressive for a small model. Apparently it can, um, it's also multimodal, right? So, uh, it can look at images, multiple images, up to 30 minutes of video in a single request. That really kind of jumps out. Um, and it supports text and multimodal fine-tuning as well. So pretty flexible, actually, especially as small models go.
When you look at the performance on the benchmarks that they share as well, this is a, uh, differentially quite good model, right? Like, it looks favorable compared to Gemini 1.5 Flash, the 8 billion parameter model, Llama 3.1 8B. Uh, it also kind of outperforms them pretty well across the board, right? So across, you know, GSM8K, that sort of famous basic math benchmark, um, the MATH benchmark as well, Dan Hendrycks' math benchmark.
So a lot of these kinds of logical reasoning, uh, benchmarks, as well as language understanding, things like that, like MMLU. So a very, very strong model, especially for its size and scale. Um, and flexible too. I expect that to get, you know, decent uptake, to look pretty competitive. Um, I haven't, you know, full disclosure, obviously, I haven't played with it yet, so, uh, I will hold off judgment, but for now, at least, from a benchmark standpoint, it's looking really strong.
Um, Nova Pro, the kind of mid-range model, uh, also a 300,000-token context window, and here they're sort of, um, focusing more on that mid-range, basically in the middle in terms of pricing, in the middle in terms of capability and all that. Um, it also can be used, they say, as a teacher model to distill custom variants of Amazon Nova Micro and Lite.
That's kind of interesting, um, just because it is, you know, going to be used for those things, for fine-tuning. It's not necessarily that you want to use the full Pro model here for every query; you want the sort of cheap answers. This kind of reflects Amazon's commitment to the productization of this stuff, less focus maybe on the true frontier capability, because this Pro model, you know, I said it's sort of mid-range.
It's because it doesn't quite have the same capabilities as the true frontier models that we've seen. If you look across the evals, the benchmarks that even they share, um, you know, Claude 3.5 Sonnet v2 basically beats it on just about every benchmark. There are some kind of instruction-following benchmarks where that's not the case, but you've got to be pretty picky to find the places where it outshines, you know, GPT-4o, uh, Gemini 1.5 Pro, and, um, Claude.
So, uh, you know, maybe not the leading model, but definitely Amazon starting to flex its muscles a little bit and saying, hey, we're going to focus on the productization side. And, uh, anyway, I thought that was kind of interesting just because of the high-end scale of it. And then their Nova Premier model, sort of the same story. So, um, we'll see where this goes, but Amazon is definitely on the map here.
And as they buy up more and more compute, you know, they're, they're dedicated to the idea of scaling now. So, uh, we'll, we'll see how it plays out. Absolutely.
And I think the difference in price is significant enough that they are going to make a splash, is my impression. You know, even if it's not a top-tier model, they're competitive enough, and for many tasks, actually, the leading models are pretty much overkill. So yeah, we'll be very interested to see how much this is adopted. Obviously very good for AWS to have people use their APIs for Bedrock, which includes many other models as well, including, I think, Anthropic's.
And just a quick note on this Canvas and Reel. So the image generation is pretty much what we've grown used to, honestly. I'm not sure if we can, uh, say too much about it. The video generation is a little more interesting. Uh, they are limiting it to six-second videos, which take about three minutes to generate.
Just looking, uh, at the article, the fidelity is very high, very nice looking, although obviously this is relatively slow compared to, uh, their competitors that are much more real time, like Luma. But regardless, uh, this is their first foray. So, you know, pretty impressive results here as well, honestly. And they are also doing the interesting thing we've covered before, which is providing an indemnification policy.
So if someone uses their models and gets sued for copyright stuff, they, like Microsoft, like, uh, I think pretty much everyone, although I don't recall about Anthropic, will protect you and say that we'll cover the legal expense. Adobe, of course, famously also does this. So that's another interesting note here. All right. So those are some big stories. We're going to have a couple of kind of smaller ones that we'll get through quick.
First, we have ElevenLabs launching GenFM to turn user content into AI-powered podcasts. So we've seen NotebookLM get pretty popular, I think a couple of months ago now, that,
uh, it's really not worth trying. I mean, here's the thing, you miss the, um, you know, the incomparable organic, uh, lovability of two co-hosts. You know what I mean? It's just, you can't, yeah. Anyway, I would just say, don't even try it. Yeah. Wow.
Yeah. Don't take our whole feed of articles and just feed it into a tool and see if it's actually better and shorter and doesn't take two hours to get through. Definitely not worth it. Right? No, definitely. Uh, you can't do better than this. But yeah, it is still the case that ElevenLabs has introduced GenFM, which is very much like NotebookLM. You give it some inputs and it can produce something like a podcast that covers the details of it.
And it's kind of a pretty chatty, conversational format. Many people have found this helpful for, uh, you know, learning about the contents of PDFs, or, you know, if you need to learn about a new topic. For some people, it's much easier to listen to a podcast-type medium rather than just reading it, uh, on a screen. So, yeah, this is, uh, I guess a pretty quick launch of a competitor to NotebookLM. And, uh, they also support 32 languages, including English, Hindi, and Spanish.
So ElevenLabs kind of impressing me lately with the, uh, speed
at which they're launching new stuff. It definitely seems like we're talking about them a lot. So that's one indicator for sure. Um, I think that one of the big differentiators here really is that multilingual support, right?
This is meant to kind of increase the footprint of the product. I think, you know, if you're speaking a language that's not, uh, I don't, don't quote me on this, but I don't think that NotebookLM actually has an option for, like, you know, Hindi and a lot of non-English languages. So, you know, if you speak French, if you speak Spanish, like, this may be your best option and a good way for ElevenLabs to kind of get their foot in the door.
But one interesting take-home from this is it's actually, like, a pretty replicable approach. I don't remember whether, uh, Google came out and said what their architecture choice was for NotebookLM in the first place. If they haven't, this is a really impressive replication move by ElevenLabs. Like, this is a pretty tough technical thing to pull off and they seem to have done it well,
right? And for context, I guess I forgot to mention, ElevenLabs in general is a leading provider of text-to-speech. So they do have the models already to generate very human-like speech, and that's their whole product. So in a way, not surprising that they are the ones who could replicate it, or at least get close.
Uh, but that definitely is the thing that stood out to me about NotebookLM, the very, very, uh, human-like kind of, uh, tone and flow of the conversation, which presumably you'll get here as well. And one more story, about Google. They are expanding access to their Veo generative AI video model. So now businesses can integrate it into their content creation process via a preview on the Vertex AI platform. I think we saw the announcement of Veo, uh, quite a while ago now, many months.
It was, you know, kind of in the same ballpark as Sora. You give it text or image prompts, it, uh, gives you pretty nice-looking HD videos. And of course there were some nice examples released alongside this announcement. They're also expanding access to their Imagen 3 text-to-image generator to all Google Cloud customers. So yeah, Google's still kind of in the race.
They do have a lot of tools, a lot of features, and I wouldn't be surprised if a lot of people just use Google because that's already in their tool set.
Yeah, it's also being released along, I guess as you said, with Imagen 3. One of the consequences of that is they're including this SynthID tech, right? The digital watermark that Google has been talking about a lot, in both of these product lines. So not just still images, but also video. And, um, that's kind of interesting. The other piece to it is the length of the generated videos, right?
It's ostensibly not limited to the roughly one-minute length, uh, mark that previously, uh, Veo had been advertised as. So potentially this is kind of a more, uh, extended video generation that they can do, which is really interesting for a lot of reasons, one of which we'll talk about.
Um, I think a little later today, you know, this idea that you can have a video that's generated that remains coherent over a long period of time implies that you have a robust world model, a world simulator, as some people sometimes call it, right? This physics model, physics engine inside the, uh, the model itself.
Um, so in order to make those those videos last a long time, be coherent, you have to have a model that really kind of understands physics enough not to prevent things from going off the rails, right? To prevent balls from all of a sudden flying up into the air and turning into a, I don't know, a shower of confetti or something. Um, that's the sort of thing that That has to be, uh, has to be absorbed.
And if you can do that, then you can start looking at things like, well, let's use these as simulators to train autonomous agents to navigate complex environments and do long-term planning. So these may seem like distinct threads; um, in very fundamental ways, they are not. As you see video technology in particular advance, I think you're going to see a concomitant increase in, essentially, the long-horizon planning capability of the agents that are trained on the corresponding world models.
And so we'll be talking more about that later, but I just think this is a really interesting development from Google. And as you say, very much still in the race. I mean, they're taking a different approach to this whole, uh, this whole space than OpenAI, focusing much more on specific kind of targeted advances than sort of the "scale everything and see what general capabilities drop out" approach.
Exactly. On the image generation front, Imagen 3, we, I guess, discussed this a little bit last episode. One thing they highlight is that it's very prompt-accurate. You could give it like a paragraph with a lot of details and the images are faithful to that. So in addition to looking really good, Imagen 3 is pretty advanced in that sense. So you can very much, uh, control what you get. And, uh, we do know that OpenAI, uh, I guess the rumor is it will have Sora
sort of launching during shipmas; maybe that might be one of the things we'll get soon. So interesting to see how that will compare to this. On to Applications and Business, and we are continuing on the thread of Elon Musk versus OpenAI. On the last episode, we discussed the emails that came out that showcased kind of the early history of their involvement together and how they broke up. This time, we have news about the developments in the, I guess, legal battles between them.
Elon Musk has filed for an injunction to halt OpenAI's transition to a for-profit. Uh, so the claim here is that there is anti-competitive behavior, and the injunction would halt their transition to a for-profit entity, which, as we've covered before, uh, seems to be ongoing and something they essentially promised to their investors, uh, while doing their most recent round, where they raised, what was it, like 6 billion, whatever it was, making OpenAI valued at around 150 billion. So this is somewhat significant.
Um, Musk accuses OpenAI of discouraging investment in competitors like xAI and misusing sensitive information, engaging in self-dealing, those sorts of things that, uh, the claim is, are anti-competitive.
Yeah, and it certainly is. I mean, the rules change as you go later stage, but in, you know, kind of an early-stage startup context, uh, certainly, you know, telling people not to invest in another company, that would be considered a pretty, pretty bad tactic to follow. Um, but in this case, you know, very different. Sam Altman, by the way, did comment on this.
So the claim was, or generally we've heard this claim floated, that, um, you know, OpenAI has told people, okay, if you invest in us, you can't invest in Anthropic, this idea that they may have said that even to, like, Nvidia and other companies in the latest round of funding that OpenAI raised. Um, Sam Altman is claiming now that that's not quite what happened, that he said, look, you can invest in whoever you want.
Um, all that we are going to do is if you invest in the competitor, we're going to stop sending you our product road map. We're going to stop sending you our research road map, which is much more in standard practice territory. So right now we seem to be getting two pretty different stories depending on who you ask. So I guess time will tell on this, but um, But I think that's an important kind of point of clarification as to what exactly people are complaining about here.
Um, there certainly is an allegation as well of improper information sharing between OpenAI and Microsoft, maybe anti-competitive practices there. This is centered around a couple of different things, including Reid Hoffman, who was the co-founder of LinkedIn, sort of former PayPal mafia with Peter Thiel and all that. Um, so Reid Hoffman, uh, was simultaneously on the boards of both Microsoft and OpenAI while also being a partner at Greylock.
And so Musk's attorneys are making the claim that that gave him privileged views into those companies and their dealings. There's also Dee Templeton, who was, um, Microsoft's appointed non-voting board observer at OpenAI. You might remember that, um, sort of, they had that phase where they had an observer who couldn't vote. And the claim here from Elon is that she was in a position to, quote, facilitate agreements between Microsoft and OpenAI that would violate antitrust rules.
So a couple of different layers to this onion, but the bottom line is the argument is being made that this transition from non-profit to for-profit obviously is an issue. And an injunction, for those of you who don't speak legalese, is just literally a court, or I think a judge, coming out and saying, um, you know, ahead of time, I'm telling you, you can't do this thing.
So it's sort of an anticipatory, um, preventative measure that, uh, makes it, you know, impossible in theory for one party to do something that the other party is concerned that they might do, not something that they have already done.
So yeah, again, this is continuing, it seems, in a more serious way, perhaps, compared to, uh, sort of the previous legal claims; I forget exactly what the legal case was when this started out, but here,
even without winning the case, the ability to just tie OpenAI up in not being able to transition from being a nonprofit to a for-profit, that could be very harmful and, you know, maybe make it more challenging for OpenAI to raise more capital, which there's a decent chance they'll need to do. So yeah, very curious to see what will happen here. And apparently in an email statement, an OpenAI
spokesperson said Elon's fourth attempt, which again recycles the same baseless complaints, continues to be utterly without merit. And they have before dismissed Musk's, uh, legal claims, calling them blusterous and baseless. So yeah, the, uh, the bad vibes continue getting worse, I guess.
Yeah. I mean, it sort of motivated, I think, some of the questions at, um, I think it was this New York Times DealBook Summit thing, where, uh, you know, God, I forget his name, but that famous CNBC anchor or whatever was talking to, uh, Sam Altman, and he was asking, look, um, are you concerned at all about Elon being involved in the administration, uh, apparently so closely, uh, from the outside is what it looks like.
And, uh, all of this is sort of in the background, right? This idea of, um, you know, court cases and, you know, how that interacts with, um, anyway, Elon's influence in the White House and all that stuff, OpenAI kind of being cornered by Elon along a lot of different dimensions. And Sam, I think, came out and said, look, I'm not concerned, it would be unlike Elon to use the power of the state to crack down on OpenAI, and so on and so forth.
But, uh, how much he actually believes that, I think will, will be left as an exercise to the reader. But, uh, anyway, yeah, no love lost between, between Elon and Sam, that's for sure.
I like how our business section has kind of changed into being the legal drama section as well in recent months. Well, next up, another common theme we've had in the past few months: data centers. It looks like Amazon is building a mega AI supercomputer with Anthropic. They are saying that they're going to build one of the world's most powerful AI supercomputers, expected to be five times larger than the cluster used for Anthropic's current top model.
This will be part of Project Rainier and will have hundreds of thousands of Amazon's Trainium 2 chips, and will seemingly be the largest reported AI machine cluster upon completion, I guess, at hundreds of thousands. You're beating the, what, 100,000, 200,000 that xAI has recently seemingly gotten online. So, uh, again, I guess part of the overall trend in the industry, and not too surprising that Amazon would be one that's trying to make a big impact here.
Yeah, no numbers in the article. I mean, you see "five times larger" than what, um, uh, was used to train the latest Claude series models. Um, I've heard numbers on the order of like 400,000, uh, H100s, um, so like your H100 equivalents, let's say, uh, out of the Trainium 2 chips here. So yeah, we'll see. I wonder, I just did a quick search, just to see if there was an official number out there. Didn't find anything.
Maybe there is; um, certainly, hopefully listeners will let us know. Uh, so, yeah, one of the big claims here is that the new AWS clusters are 30 to 40 percent cheaper than those, uh, that feature NVIDIA's GPUs. So that would be another big, big deal, right? One of the big things you look for in this space is where are you getting your margin, right? Where are your profits going to come from? Because increasingly, like, the language models themselves, it's pretty competitive, right?
Most applications, Andrey, you pointed this out, but most applications don't require the full-throated, like, the super-scaled frontier models that are being built at the edge of capability. Most of them are fine with sort of a run-of-the-mill, uh, you know, smaller-scale 8 billion parameter model or whatever.
And so given that, when you're looking at the range of, like, 8 billion parameter models, all of a sudden what starts to matter more is how cheaply can you deliver performance, and that's one area where AWS certainly is really well positioned. Anthropic needs that because, you know, they're increasingly glued at the hip with Amazon. This latest deal certainly does that more so. It brings Amazon's total investment in Anthropic to 8 billion.
This is a big, big part of Anthropic's war chest.
Um, so anyway, uh, we also know, by the way, that according to the same article, there is a next-generation chip coming up, Trainium 3. This is Amazon coming out with, well, the thing after Trainium 2. It's going to be available in late 2025, so it's really going to be competing with Blackwell. It better be good, right? That's what this means; uh, you're going to be, you know, producing these things at scale. This is the Blackwell class, the Blackwell generation for Amazon.
Um, and certainly they're going to be working with Anthropic very closely on the development of those chips, or on the refinement of the Trainium line, and that's going to be part of what's led, I'm sure, to Trainium 3. We know that the previous agreements between Amazon and Anthropic have featured commitments by, uh, by Anthropic to use some Amazon hardware, and so that's going to be a really important part of the feedback loop that Amazon depends on to be competitive on the hardware side.
Right. And since we are, I guess, uh, filling out the rest of the context, uh, from their announcements, worth noting also, just in brief, that along with these other announcements there were some other tools, uh, one being Bedrock Agents. So they now enable people to build these kinds of agent systems, maybe not exactly the same as Claude's,
um, sorry, uh, Anthropic's, computer use API. Here you'll be able to automate things like customer support, order processing, and analytics by hooking together different data sources and APIs within AWS. Uh, as we've said before, there's also model distillation as a feature, where you can take larger models and make them smaller and less expensive and faster. And they even also have this automated reasoning tool that is a verification tool.
It can take the outputs of a model and then use some logical reasoning to, seemingly, I guess, reason about the output and, uh, see if it's wrong, uh, or if it could be improved, uh, kind of in line with o1. So a whole pretty big suite of tools and announcements at this, uh, re:Invent event. On to the lightning round, and back to OpenAI. We have a bit more of a speculative story, and the title is, it sounds an awful lot like OpenAI is adding ads to ChatGPT.
So this is based on recruitment of ad professionals from firms like Google and Meta, with interest potentially in an ad-supported model. And there were also some comments by the CFO, Chief Financial Officer, Sarah Friar, that, uh, it seems like, you know, maybe they're evaluating moving there and they'll be thoughtful about when and how to do it.
Later, she did walk that back and say that they are happy with their current business model. So perhaps not fully surprising, and still not anywhere close to a confirmation, but certainly hinting at ads being a potential, uh, revenue stream for OpenAI.
Yeah, that's going to be one pissed-off comms team there. Yeah. Um, to make her take it back. I mean, honestly, I'd be surprised if they didn't end up going in that direction at this point. Um, for ads, there are pros and cons to it. The article talks about this a little bit, but, um, you know, just the idea that when you do go for ads, it is something that will tend to get you to focus more on, you know, satisfying your advertisers and maybe being a little bit more cautious.
You know, we've seen criticism of social media on that basis, that, you know, uh, companies like, uh, well, Twitter as it used to be, and, uh, Meta and Instagram, you know, those sorts of platforms rely so much on advertisers that they're then more likely to censor things. So that's been a common refrain.
Um, but on the flip side, I think just at the scale they're going to be operating at, already 200, actually no, 300 million weekly active users is the number that's recently been announced from OpenAI, you're getting to that scale where, to reach those people, you can't be charging them, you know, like just 20 bucks a month or whatever to access your tech and fully kind of capitalize on the value of having those users on your platform. Advertising starts to become really important.
And we've seen Perplexity obviously go in that direction too. Um, the other thing that makes it sort of, uh, odd that they would bother denying that they're looking at this, or kind of, um, really starting to commit to this direction, is we know that they've actually been hiring ad talent. So back in May they hired, um, so I'm not even going to try to pronounce that. I'll try. I'll fail. Uh, Shivakumar Venkataraman. Hopefully that's pretty
good. Hopefully.
Yeah. Sounded plausible. Okay. Okay. So our listeners who know what the name is supposed to be, I'm sure, are commenting right now. Thank you. Please. Um, anyway, so he previously led Google's search advertising team as vice president. So, you know, this is a pretty big heavy-hitter hire to, you know, not be doing this for any particular reason.
And, um, and actually, you know, Sarah Friar also pointing out, look, we have this great leadership, uh, with people who have experience at Nextdoor, at Square, at Salesforce, who've done a bunch of stuff in the advertising world. And Kevin Weil, uh, who's their CPO at OpenAI, also has this background and all that stuff. So there's a lot of, like, talking up their pedigree on ads, and, uh, the economics really point in that direction.
So I'd be surprised, personally, if we don't end up seeing OpenAI coming up with, uh, with ads. And again, I think this is a comms department that's just pulling out the cane and trying to drag her back in after she let the cat out of the bag early.
Right. And, uh, I mean, uh, it'd be surprising if they aren't at least considering it. Obviously they recently launched web search as part of ChatGPT. We saw Perplexity, uh, just a month ago, in early November, uh, say that they are bringing ads into their search process. And they, you know, for a while before that, wanted to just be subscription-based. Now they do have sponsored follow-up prompts that show up after you search for something.
Uh, so certainly, when you're doing web search, when you're doing kind of real-time information, it's awfully kind of, uh, appealing to consider it as a printer of money. Next, we've got a story about fundraising once again, and this time it is about Black Forest Labs. It seems they are in talks with a16z to have a $200 million round, which would make them valued at more than $1 billion. This is still kind of at a discussion point. Apparently there's some private information here.
This is from people familiar with the plans. Uh, but, you know, plausible, I think, considering that we saw their quick progress when launching Flux in Grok on X and then having, uh, sorry, Flux tools on various platforms, including Mistral. Uh, they are certainly a big player,
increasingly, in the space. Yeah, and this is a round that theoretically would be led by a16z, Andreessen Horowitz, um, and would be a unicorn round. So we're looking at an over a billion dollar valuation, which is, uh, you know, significant. It's also interesting because this is such a specialized team, right? Like, they are looking at the kind of, you know, image, video, multimedia type generation, um, and less the AGI play. So it's kind of interesting.
This is obviously a result of the partnership between Black Forest Labs and X, which drew all the attention, um, of Silicon Valley onto this previously really kind of unknown startup, before, uh, before they partnered with Elon and X. Um, but they are apparently also bringing on, um, uh, sorry, this is from previous rounds, they already have, uh, Garry Tan, who heads up Y Combinator, on their cap table. Um, and, uh, Andreessen has invested previously in that 31 million seed round.
Seed round, by the way, that's insane. I'm old enough to remember when a seed round was half a million dollars, 31 million. They apparently raised in August. I, I, we, I'm sure we covered it back then, but I'm just looking at the number like, damn, you know, that's a, that's a pretty good looking cap table. Um, in any case.
I think one of the big issues and they're flagging this here is if you raise too much money too fast, right, or your valuation is overly high and you haven't proven things out, you don't have like clear revenue, one of the challenges is you might be taking a down round if you can't live up to that valuation, like live up to the hype, um,
you know, the next time you have to raise funds, you might be taking a haircut on the valuation, and that's, like, that's death for a startup quite often, right? It does bad things to the cap table and it's generally just a really bad sign. So they're trying, apparently, to be a little conservative, maybe not, you know, jumping in with both feet to this, uh, this fundraise. It'll be interesting to see if it actually closes.
Um, but, uh, yeah, understandable, and a very mature play by the founders here, kind of saying, like, hold on a minute, we need to grow into this valuation a little bit more. Um, and we'll see. I mean, a16z, not bad to have them double down.
And one more story about fundraising. This time it's about a chip startup, Tenstorrent, which is getting investments from Jeff Bezos and Samsung, among others. They're getting $700 million, which would put their valuation at $2.6 billion. Their entire kind of mission is to challenge NVIDIA, to create more cost-effective AI chips by using open source technology and avoiding some of the expensive components NVIDIA uses, like high bandwidth memory. They've been at it for a little while now.
They do have some revenue. They have contracts worth nearly $150 million. But certainly it'll be interesting to see if they are able to compete, and this kind of big investment seems to be a pretty good, uh, show of confidence.
It does. It's a really interesting company, Tenstorrent. Um, we've talked about it previously. They're, um, they are pursuing a bit of an unusual approach to chip design. One of the key pieces here is, as you mentioned, high bandwidth memory, HBM. We'll talk about that in the hardware episode, which, by the way, will be coming out, right? We have this on the books, do we not? We have concepts of a plan. See, I thought we had a deal at least.
We'll
make it happen. Okay, we'll make it happen. Um, but in any case, we should record it soon. But, uh, high bandwidth memory, HBM, um, is today a universal component of all GPUs that do anything worthwhile in the AI space. Basically, it's just memory that you can pull huge volumes of data from really fast, hence high bandwidth, which is exactly what you need to train these really big models. The problem with high bandwidth memory is that it hasn't been improving.
If you think about Moore's law, it hasn't been improving as quickly as logic, as the actual computation, as the logic dies, the power of these GPUs. And so you're not riding the same wave when you hitch your wagon to HBM, um, as you might be if you orient your approach in, let's say, other directions. That's what Tenstorrent is doing here. They're trying to ride trends that are a little bit steeper, that might allow them to overtake as HBM grows more slowly.
And the reason why HBM is growing slowly is something we'll also talk about, um, in our hardware episode; it's kind of interesting in and of itself. The other thing Tenstorrent has going for it is they've been a big proponent of, um, essentially a new kind of logic processor that uses RISC-V, which is an open standard for an instruction set architecture. Basically, this is the language that processors actually understand.
It's sort of machine-level code, the interface, in a sense, between the hardware and the software. At rock, um, what would you call it? At bedrock, essentially. So, um, RISC-V is the open source play, and it's a competitor to the closed source, uh, product of Arm Holdings. So Arm is famous for its ISA, its instruction set architecture.
And essentially what Tenstorrent is doing is saying, hey, look, we're betting against HBM and we're betting in favor of the open source RISC-V instruction set architecture that people can iterate on more easily. So anyway, they've just generated $150 million in signed contracts, by the way. So this is not enough to justify the valuation unless you believe there's big, big growth that could be coming in the future.
And another challenge that Tenstorrent has is, when you're a small chip designer, you can't design at the same cadence as the big guys. NVIDIA recently updated their cadence to the point where they're releasing a new design for a new, uh, GPU every year. They have an annual release cadence; it used to be every two years. And Tenstorrent is still on that every-two-year cadence. So I think that they've got an uphill battle ahead of them. Everybody does, but boy, is that market looking really good.
Um, and Tenstorrent, uh, by the way, also just moved over to TSMC. They previously were with GlobalFoundries, which is, uh, more of a struggling, uh, chip fab firm. Uh, TSMC, of course, is that pure-play foundry that we talk about so much. Um, and they're now going to start using the two-nanometer fabrication process from TSMC. So a real, you know, cutting-edge process, uh, that, uh, well, we'll see if they can use it well.
And on to Projects and Open Source, where we do have some pretty exciting stories, and we begin with perhaps the most exciting one of all: we have yet another Llama release from Meta. So most recently we had Llama 3.2, uh, pretty recently, and that was a multimodal version of Llama. Now we have Llama 3.3 70B, which is a new release that is seemingly about on par with Llama 3.1 405B, the much, much larger model, while being much smaller and relatively cheaper.
So they came out with benchmarks across various things like IFEval, HumanEval, MATH, etc. And the scores are, let's say, in the same ballpark, while the cost is the same as their previous 70B models, much less than the bigger 405 billion model. So it really showcases that we have made a lot of progress in being able to condense big models and squeeze out the performance capabilities, uh, at smaller sizes. And that's what they are kind of saying here.
They have made use of post training techniques to make this happen.
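Meta hasn't spelled out the exact post-training recipe here, but to give a sense of one common way a big model's behavior gets squeezed into a smaller one, here's a minimal sketch of knowledge distillation with a soft-label KL loss in PyTorch. Treat it as a generic illustration under assumed inputs, not Meta's actual method for Llama 3.3.

```python
# A minimal sketch of knowledge distillation, one common way to compress a large
# model into a smaller one. This is a generic illustration, not Meta's recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between softened teacher and student token distributions."""
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

# Usage sketch: run the same batch through both models (teacher frozen), then
# train the student to match the teacher's output distribution:
# loss = distillation_loss(student(batch).logits, teacher(batch).logits.detach())
```

This is also exactly the kind of thing Amazon's "use Nova Pro as a teacher for Micro and Lite" feature, mentioned earlier, is packaging up as a managed service.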
Yeah, I think, you know, increasingly you see this: you've got this big kind of range of tasks that AI can help with, obviously, and increasingly the vast majority are falling in this bucket of relatively simple tasks to automate. You know, smaller models with gradually increasing performance, as we're seeing, um, are a lot cheaper to serve, right? So you may actually want to overtrain a small model.
Um, so it's not necessarily like, you know, you could get away with having a bigger model that performs better for the same amount of compute, um, but instead you take that blob of compute and apply it to a smaller model, because it's so much cheaper to inference, to serve up. Um, going from 405 billion parameters for Meta's Llama 3 kind of large model to the 70 billion parameter model here, that's a very sensible next play. Um, one thing I found really interesting here:
so, the usage of both the Llama models and then the Meta AI assistant, which is powered by Llama: um, 650 million downloads of the Llama models, which I would have bet against that many downloads of any language model,
any machine learning model. I do not know where you get millions of downloads, maybe from server deployments, where you have kind of serverless, or, or something, where you have many startups using this and deploying it on AWS or something like that, where you have an automated type process. Perhaps.
That's, that's gotta be it. There's obviously not 650 million people who know about even, uh, what the hell this model is, but bottom line is, uh, yeah, exactly, it is being used very widely. Um, and, uh, of course, we've talked about the licensing restrictions here: platforms with over 700 million monthly users need special licensing. So basically, this is them just flipping the bird at, uh, at Google and OpenAI and that sort of thing.
Um, now on the monthly users piece, I thought this was interesting. We just talked about OpenAI hitting 300 million, um, weekly active users. We don't know what their monthly active users are, or at least it wasn't in the stuff that I've looked at this week. Um, 600 million monthly active users is the Meta AI assistant. If true, this is quite a comeback, right?
I mean, it's essentially comparable usage to ChatGPT for a platform that really didn't exist in any meaningful sense until about, uh, say, two years ago. So about a two-year delay relative to OpenAI. So, uh, pretty impressive. Obviously, we've talked on the podcast about distribution, distribution, distribution. The reason that Microsoft Teams outshone Slack was because Microsoft just has better distribution. They're in everybody's computers to start with.
Well, the reason that Meta has had so much growth in their assistant is everybody uses some combination of Instagram, Facebook, or, you know, WhatsApp or whatever the hell. So, uh, they've got a huge, huge advantage here. And, uh, they're going to try to spend their way
to monetizing that advantage, somehow, right. There's all this activity as well that's being motivated by their big push in this direction: a 10 billion dollar AI data center in Louisiana, which is called that in the article, um, which is coming soon, and over 100,000, uh, H100 GPUs. Uh, there's, there's, you know, the B series as well that they'll be bringing online soon. So pretty, uh, pretty cool.
Very cool. Next, we've got another massive company open-sourcing a model. This time it is Alibaba, and they are releasing an open challenger to OpenAI's o1 reasoning model. So this model is called QwQ. I think on X there was an explanation of how to pronounce it; I'm not sure. But anyways, it's a 32 billion, uh, parameter model. They're releasing it as a -preview model and under Apache 2.0. Uh, so that's, uh, pretty permissive. You can use it for commercial use.
And, uh, as we saw last week, we covered another open-source model that, uh, had reasoning built in, R1-Lite-Preview from DeepSeek. This similarly is optimized for doing reasoning, uh, of a nature similar to o1, where, when you ask it something, it outputs and lets you see kind of a reasoning trace, talking through the question more or less, and then outputting the answer.
And, uh, as before, you know, on some tasks, it actually performs quite a bit better than the things not optimized for that.
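If you want to poke at one of these open reasoning models yourself and see that visible reasoning trace, a rough sketch with Hugging Face transformers might look like the following. The repo id is our assumption for the preview release, so double-check the actual model card before running, and note that 32B weights need serious GPU memory or heavy quantization.

```python
# Rough sketch of trying an open reasoning model locally and inspecting its trace.
# The repo id below is an assumption; verify it on the Hugging Face model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning-tuned models tend to emit a long chain of intermediate steps before
# the final answer, so leave plenty of room in max_new_tokens.
output = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The printed output is the whole trace plus the answer; unlike o1, which hides most of its chain of thought, these open models let you read the full thing.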
Yeah. I got to say, the blog, because there's not a lot of information about the model. We know it's 32 billion parameters. As you said, we know it's from the Qwen team. Okay, good stuff. Um, QwQ, by the way, apparently stands for Qwen with Questions, uh, this idea being that it is some sort of, like, reflective model, right? I do want to call out, I mean, the blog post, it's one of the weirdest fucking write-ups I've ever seen. So I just want to pull this out, right?
So this is just from the blog post. Um, it's hard not to read this like with a pipe in your mouth. So, uh: What does it mean to think, to question, to understand? These are the deep waters that Qwen with Questions wades into. Like an eternal student of wisdom, it approaches every problem, be it mathematics, code, or knowledge of our world, with genuine wonder and doubt. Qwen with Questions embodies that ancient philosophical spirit.
It knows that it knows nothing, and that's precisely what drives its curiosity. Anyway, it goes on and it goes on and it goes on and it will talk all about seekers of wisdom and shit. And it will tell you nothing about the fucking architecture. So there you go. Uh, it's going to be open source soon though. So we'll have those answers. I just like, I want to, I want to see who okayed that blog post because that is some of the funniest shit that I've seen in a long time. Um,
yeah, the title on it is "QwQ: Reflecting Deeply on the Boundaries of the Unknown." Very poetic, right? I mean,
God damn it. Now, I will say it's another indication, of course, that we are seeing a proliferation, including into open source, including by Chinese companies, of, um, some potentially pretty impressive, I mean, certainly R1 was, it remains to be seen how Qwen with Questions does, but really impressive inference-time compute strategies. It did not take long to replicate what OpenAI was doing.
I'm going to go out on a limb and say that there is a greater probability than is widely appreciated that some of this is industrial espionage at this point. Um, OpenAI is, like, you know, going to be penetrated, uh, to the blazes. That's just, you know, it's pretty clear if you have a national security bone in your body that that's going to be happening.
And so to the extent that, um, China is in the business of picking state champions for this sort of thing and sharing that intel, uh, this could be a vector. Like, it could be. Um, but the other thing is, the R1 team at least is, like, crack; like, DeepSeek, those guys are really, really good. Um, so they could just be that good, that they're doing it.
And, uh, you know, in that case makes you, makes you think about, uh, export control, makes you think about all kinds of things like that when it comes to the, the control of the system.
Yeah. And I think it also is possible that we are seeing kind of a low-hanging-fruit era of these sorts of reasoning models, where we are applying more or less a set of pretty well-known, reasonable techniques to get a lot of progress relatively fast. And it could be that everyone kind of is just doing the more or less well-known ideas at the same time, and they are just very efficient because there hasn't been too much effort on this yet.
And on the benchmarks here, they are seeing, you know, not quite as good as o1-preview, but, uh, on par with o1-mini, uh, or certainly getting closer than GPT-4o or Claude 3.5 or Qwen 2.5 72B. So, you know, not going to beat, uh, OpenAI's top-of-the-line models, but it gets you pretty close compared to non-reasoning type models.
Yeah, and it is an open model. And to your point, I mean, I think the, uh, the take home on that export control question is absolutely that, right? Every time you have a paradigm shift to a, you know, in this case, inference time compute, but it eventually will be other things. You have this overhang. Right. Where all of a sudden there's a whole bunch of people who previously couldn't compete who now maybe can. So, uh, I think, uh, it's a really important, let's say, policy lesson.
To learn that things can shift on you and you don't want to have policies that are too anchored towards just, you know, a pure training paradigm or whatever else.
On to the next one, also an open-source model by a Chinese giant. This time it is Tencent, and they are launching Hunyuan Video, which is a 13 billion parameter open-source AI model for text-to-video generation. This is an area where we don't see too many big open-source models. In fact, this would be the largest of its kind in the open-source domain. It has some features like apparently video-to-audio synthesis as well.
And you can do various inputs like voice, facial expressions and body poses. So overall, yeah, it seems like a pretty useful model that, uh, could be leveraged for fun
things. Yeah. And I guess reflecting the Chinese pedigree here, they've really focused on the scaling approaches to cut computational costs. Um, so their technique ends up getting them about 80 percent savings relative to what comparable systems might have had in the past. It is a pretty hefty model, I mean, 13 billion parameters. Yeah. Not bad at all. Not too shabby. Pretty compute-intensive.
So, um, yeah, uh, we'll see if it ends up getting taken up and used, but certainly having a leading model in the open-source domain for text-to-video is interesting, and it gives, uh, China an ability to project some power, too, in this dimension.
That's right. Yeah. You know, just looking at the clips right now — and I'll try to splice some up in the YouTube video — they're pretty good. They're not top of the line, there's still some AI artifacting, but pretty impressive. And, you know, it being open source, that might have some impact. Next up, we have a paper, not an open source model, and it is DeMo, Decoupled Momentum optimization.
This is a new optimization method that decouples momentum updates to reduce the need for high-speed interconnects and therefore make it more possible to do distributed training. And this is an area where Jeremy is much more of a fan and much more aware, so I'll let you take over and just give the details on this one. Yeah,
yeah, for sure. So this is both fascinating and I think really important, and part of a trend that I think we want to call out in general. So first of all, this is another piece of research from Nous Research — I don't know if they want to pronounce it one way or another — and they are the Cosmo Kramer
of the AI open source world, if you're a Seinfeld fan. Very, kind of, esoteric sort of views on everything from AI consciousness to the importance of decentralized compute. And that's what this is.
So the big question — the ideological question that motivates this to some degree — is how can we set up this large distributed training infrastructure that will be difficult to control, because it'll be decentralized, and that will take advantage of, you know, small piles of compute that you have sitting around here or there. That's kind of one of the long-term goals, the stretch goals, of an approach like this.
Fundamentally, what they're looking at here is: how do we start by cutting the amount of communication that's required between all those nodes? Because if you have a bunch of different nodes, you're going to have to communicate an awful lot between them. And they come up with a bunch of conjectures that are really freaking interesting, that I would have thought made no sense, but then just seem to work empirically. First of all, when we do training, we use optimizers.
These optimizers are essentially the things that allow you to decide how exactly to update your model parameters after each round of training, right? So your optimizer could be set up in different ways. One is to account for momentum. So if you find, for example, that a certain set of parameters keeps getting nudged in the same direction, you know, after many updates, right? So after one batch, they move in one direction.
After another batch, they move more in that direction. Well, maybe that suggests that in your next batch of training, they'll probably move in that same direction again, right?
And so you might want to take advantage of some notion of the momentum of those updates: if you notice that the past updates have always been tending — in other words, have momentum — in a certain direction, maybe you keep that in mind and use it to make a more informed guess as to the direction that your parameter values should evolve in. And so what they do in this paper release, they say, okay, well, I wonder if there are
different components of the parameters in a model — some clusters of components that tend to evolve more rapidly, so fast-moving components with big momentum values in a sense — versus ones that, let's say, have more sporadic changes, the sort of quicker changes with higher temporal variance, basically, that are less predictable.
And it turns out that this actually is true, that there are identifiable kind of pockets of parameters, and there's a structure to them. This is kind of a crazy thing to me. I mean, when you initialize the parameters in a neural network, you initialize them all randomly, right? There's no reason that any particular cluster of parameters should tend to evolve a certain way, should tend to, you know, consistently have high momentum in a certain direction or low momentum.
And yet they kind of find this. And so they find that there are some clusters of these parameters — and I'm using the term cluster very loosely here — because anyway, they identify them in different, interesting ways: they use something like a Fourier transform, a cosine transform if you're into signals analysis, to identify the fast-moving components and the slow-moving components.
And they go, well, you know what, if we have some components that have this predictability to them that have high momentum, right, maybe we don't need to like update those as often.
So maybe we can essentially just pick and choose which parameters we update in more frequent updates, and leave the rest of them to more sparse updates, and that allows us to do a lot less communication between nodes during training. And by doing that, they're able to reduce their bandwidth or communication requirements by over 20x in the best cases. That's pretty wild, right? That really, really makes a difference in your training efficiency.
So I find that pretty remarkable. And anyway, there's a bunch of reasons — I mean, I just kind of feel like we could do a whole episode on why this is so weird and unexpected. And this at least tells me that there's an awful lot that I don't understand about the dynamics of these systems and why these patterns appear. Like, they don't even try to defend this.
They just sort of say, like, you know, they don't have a proof for it yet in this paper as to why there is a pattern. And they seem to kind of acknowledge that it's weird, but empirically it seems to be true. And that's an amazing thing, and I could imagine this having some implications for scaled decentralized training.
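For listeners who want the shape of the idea in code, here is a minimal sketch of my reading of DeMo (illustrative only, not Nous Research's implementation; the array sizes, the top-k choice, and the single-node loop are all made up): decompose the momentum with a discrete cosine transform, share only the few fast-moving components across nodes, and keep the slow residual local.

```python
# Toy sketch of the DeMo idea (my paraphrase, not Nous Research's code):
# keep full momentum on each node, but only synchronize the few "fast-moving"
# DCT components of it across nodes, cutting communication dramatically.
import numpy as np
from scipy.fft import dct, idct

def compress_momentum(momentum: np.ndarray, k: int):
    """Return the top-k DCT components of a momentum vector (indices + values)."""
    coeffs = dct(momentum, norm="ortho")
    idx = np.argsort(np.abs(coeffs))[-k:]          # the "fast-moving" components
    return idx, coeffs[idx]

def decompress(idx, vals, n: int) -> np.ndarray:
    """Rebuild a dense update from the k transmitted components."""
    coeffs = np.zeros(n)
    coeffs[idx] = vals
    return idct(coeffs, norm="ortho")

# One simulated step on a single node:
n, k, lr, beta = 4096, 64, 0.01, 0.9               # 64/4096 ≈ 1.5% of full bandwidth
params = np.random.randn(n)
momentum = np.zeros(n)
grad = np.random.randn(n)                          # stand-in for a real gradient

momentum = beta * momentum + grad
idx, vals = compress_momentum(momentum, k)         # only this crosses the network
shared_update = decompress(idx, vals, n)           # (in real DeMo, all-reduced across nodes)
momentum -= shared_update                          # the slow residual stays local
params -= lr * shared_update
```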
Right. And I believe we covered their announcement of this maybe a month or two ago; they released kind of the announcement that they have this new method and they just showed some results. This seems like the paper that they promised, that goes into some more detail. And, yeah, it seems like it actually works, which is kind of crazy.
Yeah, this is more of the Cosmo Kramer-ness of Nous Research. Like, they put out that paper, you're right, right? They were just like, hey, we did some cool shit, look at the crazy result, but we're not going to tell you how we did it. And so now, even when they tell you how they do it, you're still like, but that doesn't make sense. That is the most schizophrenic fricking training idea I've ever seen. And yet it works. So Cosmo Kramer, man.
And on that topic of decentralized training, the last story is about Prime Intellect releasing INTELLECT-1, the first 10 billion parameter language model collaboratively trained across the globe. So I just looked it up: on October 11th they had the blog post announcing INTELLECT-1, launching the first decentralized training of a 10 billion parameter model, which we covered at the time. And so, about two months later, they have another blog post: INTELLECT-1 release, the first globally trained 10 billion parameter model.
And they go into various details here — the one trillion tokens they used to train were composed of different open data sets. They do say they trained across, what, three continents, I believe. And the term "trained" is a little loose here, I suppose. They are competitive with older models.
So they compare to Llama 2 7B and Llama 2 13B, Falcon 7B, MPT-7B-Chat — you know, compared to this year's prominent several-billion and 10-billion parameter models, this is not on that spectrum. But nevertheless, it is a somewhat performant, smallish language model, and that is quite a bit of an achievement given that it is very difficult to do training, especially at this level of decentralization.
Yeah. I think this is, A, a lot more impressive than it seems based on the performance of the model, and B, the implications for policy are pretty wild, including energy policy. So this is, as you said — I mean, it's trained across three different continents, up to 14 concurrent nodes. In other words, they had up to 14 concurrent groups of compute that were being aggregated together, with contributions from 30 different compute providers.
So you've got people dynamically joining and leaving the training process and so on. I want to call your attention to a couple of numbers — actually, maybe one number in particular, just in the interest of time here. So, FLOPs utilization, right? This is how busy you can keep your GPUs in practice. In practice, there's tons of downtime when you're actually training models in the real world, because you've got to move data back and forth between things.
GPUs are sitting idle, kind of twiddling their thumbs as they wait. There's all kinds of stuff when you do pipeline parallelism where you get a bubble that forms. Anyway, bottom line is they hit 36 to 41 percent model FLOPs utilization. That is really impressive. So typically on an H100 GPU, frontier labs will hit FLOPs utilizations of around 35-ish percent, maybe as high as 40 percent. And this is from SemiAnalysis.
They talk about this quite a bit, but for kind of trillion-parameter training runs. So as you scale more, you tend to be able to do better on FLOPs utilization; you can think of that as a consequence of economies of scale, broadly understood. But the bottom line is they are doing really well in terms of keeping their GPUs busy, keeping their logic engines busy. And that is a key, key thing.
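For intuition on that number, model FLOPs utilization is just the FLOPs you actually achieve divided by the hardware's theoretical peak. A rough back-of-the-envelope sketch, using the standard ~6ND approximation for transformer training FLOPs (the cluster size and timeline below are hypothetical, not figures from the INTELLECT-1 report):

```python
# Rough model-FLOPs-utilization (MFU) estimate. Numbers are illustrative only.
def mfu(n_params, tokens, train_seconds, n_gpus, peak_flops_per_gpu):
    achieved = 6 * n_params * tokens / train_seconds     # FLOPs/s actually delivered
    peak = n_gpus * peak_flops_per_gpu                    # FLOPs/s the hardware could do
    return achieved / peak

# e.g. a 10B-parameter model on 1T tokens, on a hypothetical cluster of 64 H100s
# (~989 TFLOPs BF16 peak each), trained for about 29 days:
print(f"{mfu(10e9, 1e12, 29 * 24 * 3600, 64, 989e12):.0%}")   # ~38%
```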
When you're doing distributed training like this, you want to make sure that, yeah, your things are kind of churning out product, if you will, over time, really efficiently. So a couple things here. One is, when you think about the big barriers to American dominance in AI right now, energy is by far and away number one. We have all the chips we want. By and large, we do not have all the power we want.
What this means in turn is that if you are an AI company, you're looking to build data centers at massive scale wherever you can get the power, wherever you can get the spare grid capacity. And right now labs are looking at that one-gigawatt range in the sort of 2026-ish era, trying to create one-gigawatt compute clusters. And the problem is that there isn't a spare gigawatt of baseload power just available on the grid anywhere.
You have to kind of patch it together from different geographic locations. And if you're going to do that, now all of a sudden you're in the business of distributed training, right? Now, all of a sudden, you need to come up with really efficient training approaches that allow you to pull together that kind of compute across geographically separated regions.
Google does this already in campuses that are relatively close together, but they don't do it, say, across the country, and that's really what INTELLECT-1 is doing here. They're really pushing the limits not on model capability, but on how distributed this training can be — can we use that spare laptop that you have lying around and get it to, you know, do some work in this direction? And that's really what this is about. And they have this whole PRIME framework that uses DiLoCo.
We talked about DiLoCo in the last episode that we did, so do check that out if you're interested. And then they have this whole elastic device mesh, which is really cool. As you can imagine, if you do distributed training like this, you've got to have ways to dynamically allow new nodes to join and leave the system as you train. And that exit and that entry have to be dealt with gracefully. You need fault tolerance.
You need ways of welcoming in new nodes to contribute compute, new GPUs, without failing, right, without kind of falling all over yourself. And anyway, that's a big part of what they're up to here. So really, really interesting paper if you're into the hardware side of things. If you're into policy, you're going to have to learn how to speak this language. If you want to predict the future in this space, you're going to have to learn how to speak this language.
Because increasingly, energy is the constraint, and this is really where things are going to go.
As to that paper, they did release a pretty detailed technical report, 15 pages, going into all the details of the data set and the people involved. They had many people in the U.S. and also in Europe, with also some in Asia, in places like India and Singapore. So, yeah, quite a few details there. And they do release the model under the Apache 2.0 license, so very permissive — you can use it for whatever you want. A good contribution to open source.
You also have the code they used for training and the details of the model as well. So, overall, yeah, if you're into open source, if you're interested in decentralized training, this is a pretty exciting release. Onto research and advancements, and we are getting back to the idea of world models with DeepMind's Genie 2.
So I don't remember when it was, but I think it was this year, we were talking about Genie 1, which was a research paper and a research work that had the ability to kind of let you play a 2D video game, broadly speaking, that was entirely generated by AI. So it looked like your kind of typical platformer: you could move around a character, you could hit jump, but there was no code running it aside from a big neural net that was producing a video stream.
And so now we get Genie 2, which is an AI model capable of generating interactive 3D worlds from a single image and text description. And you get kind of typical kinds of things you would get in video games: you can have a character running through an environment like a desert and you can hit jump; you can have a race car or a boat and you can kind of, you know, go across a lake; and it is almost real time.
It apparently can generate consistent worlds with different perspectives for up to a minute. They also showcase memory, where you look away from a part of the scene and then look back, and you actually do retain the aspects of the world that were there. And that was something we saw with some of these Minecraft simulations we covered not too long ago.
There was this thing where if you do a 360-degree turn, what you see in front of you is not the same thing as what you saw before you started turning. Whereas here, they have at least some examples where the details are preserved even when you look away, which, you know, is telling you that there is some notion of a world model, as Jeremy said. So very cool effort, very cool videos on this one.
I'll try to include some in the YouTube version. And, yeah, again, this is maybe one of these quieter trends, but starting really with Sora and maybe even earlier, a lot of people are excited about world models and are pushing in that direction, even though it doesn't have as much impact on most people.
Yeah, this is another one. So, there's a researcher I've been following for some time — Tim, who is at GDM right now and was part of this research, and was part of the Genie 1 research. And to your point, pre-Sora he was working on stuff like this. There were big efforts in this direction. They were initially in the kind of procedural generation of game environments — basically, like, can we create games where you generate new game environments by following simple rules?
So you can autonomously generate these sort of out-of-distribution settings for the purpose of training AI models and agents. This is really, I think, the realization that actually we can go all the way by not procedurally generating, but by doing just, like, deep-neural-network-generated environments. One of the big jumps, too, between Genie 2 and Genie 1 is that we're now moving from 2D to 3D.
So again, when you think about, you know, agents navigating the real world, that's a big deal. We don't know the parameter count of Genie 2. In fact, we don't have all that many technical details — that's one of the interesting things here. Genie 1, we had a whole paper; we kind of chewed on that sucker for a while. We had an 11 billion parameter model; we knew how they trained it.
I would guess that there's a lot of similarity here, not just because of the naming convention, but because of the way that the model is set up and what it can do: they generate, using Imagen 3, an image based on a text prompt, and then they turn that image into a navigable environment in the way that you just described. To the point of these things being world models, right?
Some of the things that this thing can do is simulate interactions like bursting balloons, opening doors, shooting barrels of explosives. There's, like, a million things that you can see — directional lighting, grass blowing, all that kind of stuff — that really suggests this thing has captured something interesting and meaningful about the way the world works.
In terms of what's going on under the hood, we're left to speculate based on what we know of Genie 1. And there, the key — really, two keys. The first was the latent action model. Basically, they trained this model to take as an input the previous video frames and the frame to come, and then to predict, okay, what was the action that took us from those early frames to the next frame? Essentially, what was the causal event that links these past frames to the future frame?
And that's part of the system learning to infer what actions cause state transitions in the environment. So that's one piece — that's the latent action model. Essentially, you can think of this as, like, reverse engineering the physics of what just happened, right? So if I show you a car that's driving along the road in one image, and then in the next image I show you the car that's, you know, I don't know, parked or something, you can infer that, okay, the driver probably,
you know, put their foot on the brake or whatever at some point. You're doing all that work — that's the latent action model. That's what's learning a lot of the cause and effect in the world there. And separately, there's the dynamics model, where you take in a bunch of previous frames and you take in an action, and then your job is to predict the next frame, right? This is a model that essentially is going to say, okay, here's the past history of the system.
Here's the nudge I'm going to give it, and then what's going to happen from there? And it's the way these two models are integrated together — the latent action model and the dynamics model — that together gives you the sort of immersive physics and playable world that you get out of at least Genie 1, and I would suspect Genie 2. In fact, I don't think that they're claiming that it's different at all.
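To make that two-model picture concrete, here is a very schematic sketch of the latent-action-model-plus-dynamics-model setup (conceptual only: the layer sizes, dimensions, and training loop are invented, and the real Genie models are video tokenizers and transformers, not tiny MLPs):

```python
# Schematic of a Genie-style setup (conceptual only; sizes and names are made up).
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    """Infers the latent action that explains the jump from past frames to the next frame."""
    def __init__(self, frame_dim=256, action_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(frame_dim * 2, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim))
    def forward(self, past_frames, next_frame):
        return self.net(torch.cat([past_frames, next_frame], dim=-1))

class DynamicsModel(nn.Module):
    """Predicts the next frame given past frames and a (latent) action."""
    def __init__(self, frame_dim=256, action_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(frame_dim + action_dim, 128), nn.ReLU(),
                                 nn.Linear(128, frame_dim))
    def forward(self, past_frames, action):
        return self.net(torch.cat([past_frames, action], dim=-1))

# Training sketch: the action model "reverse engineers" what must have happened,
# and the dynamics model learns to roll the world forward given that action.
lam, dyn = LatentActionModel(), DynamicsModel()
past, nxt = torch.randn(4, 256), torch.randn(4, 256)   # stand-ins for encoded frames
action = lam(past, nxt)
pred_next = dyn(past, action)
loss = nn.functional.mse_loss(pred_next, nxt)
loss.backward()
```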
Right. And to that point of not having a paper — unlike what we were discussing with that Minecraft example by Decart, or also, we couldn't get into it, but there was a similar release of a type of world model from World Labs, where you could also input an image and kind of move around a little bit — here, there's no interactive demo. We just get a bunch of videos. And they do mention that they use an undistilled model for this, so I'm pretty sure this is not real time.
Although they do claim that they can distill it and have lower-quality, real-time performance. One thing that is kind of fun to note is that this is from the SIMA team, where SIMA is the "scaling instructable agents across many simulated worlds" work. This is a paper we did cover, I believe, earlier, where they had agents in many video games and you told them, you know, go to this planet or open that door.
And they just learned to use a mouse or whatever controls in the game, without instrumenting the game or anything, to do those actions. So to me it kind of makes a lot of sense that the same team that was already training agents in these games would then go and make this kind of video simulation model, presumably using a lot of the same data and a lot of the same infrastructure. On to the lightning round. We'll try to cover the rest pretty quick.
First, we have "Language Models Are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding." So there is this technique, latent reasoning optimization, LaTRO, which is going to enhance the capabilities of large language models by treating reasoning as sampling from a latent distribution and optimizing it via variational methods.
So I'm just quoting from the abstract here, but the general idea is that you're going to leverage the kind of reasoning capabilities already baked into the LLM, and do that by optimally sampling and optimizing the output that you get. Jeremy, I'm sure you have a little bit more detail you want to get into here.
Yeah, I thought this was a really interesting one from an inference-time compute standpoint — that's one of kind of my pet topics these days, just for obvious reasons, you know, with the o1 and Sonnet 3.5 news releases. So what this is basically saying is, hey, we have a new way of scalably developing a data set of reasoning rationales. That's actually a big blocker for these agentic models.
What we don't have on the internet — we have a whole bunch of text, a whole bunch of video and all that crap — what we don't have is examples of extended reasoning traces that we can use to train models to be good reasoners. So can we create those data sets in a reliable, automated way? That's what this paper is about. So fundamentally, they're going to take some kind of challenging technical problem. Let's say a question — let's call it X — and there's going to be a correct answer, Y, right?
And that's going to be your starting point. Now, what you're going to do is basically get a model to try to pitch a bunch of different rationales. And it turns out that if you have a sensible rationale and you take your question and glue the rationale next to the question — so you have the question, then a long chunk of reasoning — if that reasoning is good,
then your language model is going to assign a higher probability to the correct answer than to the wrong ones. And that's kind of interesting, because now you have a way of measuring how good a rationale is. Right? If I take my question and I just swap out the rationale for a crappy rationale, well, now my model is going to assign probably a lower probability to the correct answer, which I have in my dataset.
And so they're basically going to use this to have a model automatically go through a whole bunch of rationales and evaluate them based on how likely those rationales make the model to guess the correct final answer. So that's a pretty interesting new way of thinking about inference and using these models to assess the value, the correctness, of inferences.
And they use a whole variational optimization approach that's fairly technical — I mean, if you're a math nerd like me and you like seeing integral calculus, then check it out. It's actually kind of cool and satisfying. But the bottom line is you are using these models to sort of
automatically assess rationales by flipping the script a little bit: instead of generating a bunch of reasoning traces and then trying to see if you get to the right answer, you swap out your rationale and hold the input and output constant — essentially the prompt and the final answer.
And you assess which rationales are good based on the probability the model assigns to the correct answer.
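A minimal sketch of that scoring trick, as I understand it (the paper wraps this in a variational objective and uses much larger models; the model choice, prompt format, and example rationales here are just placeholders): score a rationale by the log-probability the model assigns to the known correct answer once the rationale is glued between the question and the answer.

```python
# Sketch of scoring rationales by how much they raise the probability of the
# known correct answer. Uses a small stand-in model; illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper uses much larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def answer_logprob(question: str, rationale: str, answer: str) -> float:
    """Sum of log p(answer tokens | question + rationale)."""
    prefix = question + "\n" + rationale + "\nAnswer: "
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    answer_ids = tok(answer, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([prefix_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # log-probs of each token given everything before it
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    answer_positions = range(prefix_ids.shape[1] - 1, input_ids.shape[1] - 1)
    return sum(logprobs[0, pos, input_ids[0, pos + 1]].item() for pos in answer_positions)

q = "What is 17 * 3?"
good = "17 * 3 is 17 + 17 + 17, which is 51."
bad = "17 is a prime number, so the answer must be even."
# A better rationale should make the correct answer more likely:
print(answer_logprob(q, good, "51"), answer_logprob(q, bad, "51"))
```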
So next we have "Densing Law of LLMs," which I found quite interesting. We love scaling laws on this podcast. And here they're talking about capacity density as a new metric to evaluate the quality of LLMs across different scales. Essentially you're asking: what is the ratio of effective parameter size to the actual parameter size, relative to some kind of baseline? So, how well can you perform at the size you're at, right?
We know that, for instance, 70 billion parameter models are capable of hitting some number on a given benchmark. If you train your model well, it will match that; if you train it badly, it will not match that. So you can kind of have a measure of how good you are for your size. And what they have found is that the capacity density has been trending up in open source LLMs. And this is not surprising — we have been covering this trend.
At the 1 billion, 2 billion, 7 billion model sizes, we've increasingly been getting better and better models, where now they're quite capable, where it used to be they were not capable — even, you know, back in the day, GPT-2 was a 1.5 billion parameter model, and that was not anything like what you get today. And so they actually do empirically look at this phenomenon, and find that there is about a 3.3-month period in which this measure doubles.
And so far, it still is holding up, although it's a little bit more noisy. In the last few months, we do see some fall-off with things like Gemma 2 9B, or, sorry, rather Llama 3.2 3B, Gemma 2 2B. Anyway, there's a bit of variance there in the empirical results, but the overall trend line is still going upward, where we are getting more performance from smaller and smaller models. And we just actually covered that with Llama 3.3. Same thing right now with Llama 3.3 70B.
We have a more performant 70 billion parameter model relative to Llama 3.1 70B.
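The arithmetic behind that claim is simple enough to sketch (all numbers below are made up for illustration, not results from the paper):

```python
# Toy illustration of "capacity density" — effective parameters / actual parameters —
# and what a 3.3-month doubling time implies. Numbers are invented.
def capacity_density(actual_params, effective_params):
    """Effective size (the reference-model size needed to match this model's scores)
    divided by the model's actual size."""
    return effective_params / actual_params

# A hypothetical 8B model that matches what a 24B reference model used to score:
density_now = capacity_density(8e9, 24e9)            # = 3.0

def density_after(density, months, doubling_months=3.3):
    """Project density forward assuming it keeps doubling every 3.3 months."""
    return density * 2 ** (months / doubling_months)

print(density_after(density_now, months=12))          # ~3.0 * 2**(12/3.3) ≈ 37x
```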
Yeah. The sort of intuition behind this, fundamentally, is: how much performance, how much world knowledge can you cram into the same number of parameters, right? And we're getting increasingly good at doing that. One way you can do that is just by overtraining. So for a given model size, there actually is an optimal amount of compute that you can pour into your system to max out its performance, right? That's a scaling law that we've known for a long time.
What we've seen increasingly, though, is people saying, like, yeah, well, I actually don't care about maxing out the performance of my model. What I care about is maxing out the performance of my model subject to a parameter size constraint. So I don't want to go beyond, for example, 7 or 8 billion parameters, because I want the model to fit on my phone, or I want it to run below a certain cost. And so I'm going to overtrain it.
And that's one way of cranking this up. Another is algorithmic, right? Making algorithmic breakthroughs that just allow you to leverage that compute even more. And this is the idea of effective compute, right? One FLOP back in, you know, 2023 was worth a lot less than a FLOP today. And what they're saying is the same is true for parameters, which is self-evidently true. As you said, Andre, we've seen so many examples of this, but it's interesting to see it plotted out.
A couple of interesting top-line numbers: they say from January 2023 to now, the inference cost of GPT-3.5-level models has decreased by a factor of about 270. So that's pretty remarkable. It tracks, obviously, with what we've seen on the podcast, and part of the challenge that these companies now face of recouping the losses they incur in training these huge models is, like,
the cost of inference is pretty low and you're racing to the bottom on pricing a lot. So that's kind of an issue. Last thing I'll just point out: most of the current techniques that we use to take a large model and then turn it into a small model — these are techniques like pruning, where you basically just pick weights that you want to throw away, so you actually just throw away parameters from the model,
and distillation, where you use a big model to train a smaller model to do the same thing. Those techniques, they say, typically result in lower-density models. So the model you get at the end of the day tends to not cram as much knowledge as it could into those parameters, which suggests there's a lot of room for improvement in compression techniques. And that's a really interesting call-out.
It's something that, especially when you think about which ecosystems care a lot about maxing out the capacity of their compute — definitely Chinese entities are going to care more about that. And this group is a Chinese research team. So I think you're seeing a lot of that sort of "necessity is the mother of invention" type reasoning, where people are inventing new and more efficient ways to make use of their existing stock.
And just one more paper, this time about interpretability. The title is "Monet: Mixture of Monosemantic Experts for Transformers." So this is kind of putting together two ideas. Mixtures of experts, which we have often covered, where the idea is you sort of have different routes through your neural net, and certain parameters are these experts that are better for certain types of inputs, and you only kind of use the appropriate weights when you have a certain input.
We have also covered quite a bit the trend towards interpretability by way of finding concepts within neural nets, most recently often using sparse autoencoders, where you take the activations — the outputs at a given layer — and re-encode them, and from that encoding you get something like a dictionary, where you find that certain combinations of weights can be mapped onto specific ideas.
Like, for instance, a math concept or, I don't know, a bridge concept, things like that. Right. So this paper presents a different way to do that interpretability idea — finding concepts within a neural net — by doing it at training time. And what they do is essentially scale up the number of experts to be very large: they have here something like 262,144 experts per layer, while keeping the total parameter count under control.
And then what they get from having that many experts — and they have some fancy techniques for being able to scale that high — is that the experts themselves can now be shown to capture certain ideas or techniques. So they have experts that, like, correspond to chemical compounds — that's expert 174,040 in the paper, they cover that one — to U.S. states, and various kinds of experts like that.
And they have various experiments showing that they can identify experts for, let's say, Python, Java, different programming languages. And if you delete certain experts, you get very big differences in performance. So in a way it's similar to what you get with sparse autoencoders, while not having the post hoc nature of it as much.
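As a toy picture of why per-expert monosemanticity is useful, here is a generic mixture-of-experts layer with an "ablate this expert" switch (this is not the Monet architecture, which uses much fancier tricks to reach 262,144 experts cheaply; the expert count, the routing scheme, and the "Python expert" index are all hypothetical):

```python
# Toy MoE layer with expert ablation, to illustrate the interpretability idea:
# if experts are monosemantic, zeroing one should knock out one concept
# (e.g. Python code) while leaving everything else mostly intact.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=256, n_experts=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x, ablate=None):
        logits = self.router(x)
        if ablate is not None:
            logits[..., ablate] = float("-inf")        # "delete" those experts
        weights, idx = torch.topk(torch.softmax(logits, dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                  # combine the selected experts
            for e in range(len(self.experts)):
                mask = (idx[..., slot] == e)
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(8, 256)
normal_out = layer(tokens)
ablated_out = layer(tokens, ablate=[17])                # hypothetical "Python" expert
```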
Yeah. And I think it's a really interesting approach. You know, the post hoc one has been a challenge, obviously, as you said. And it's one of these things where I found this really counterintuitive when the early MoE models were coming out. Like, conceptually, I thought because they're called experts that, you know, very clearly each one would have a purpose — you'd have the grammar expert and the cow expert or whatever.
And obviously polysemanticity — this idea that individual neurons respond to multiple different concepts and get activated by them — is a big issue here. So baking that in, that's why, right? That's why they're using so many experts. There is so much meaning that you have to be able to capture that if you want every expert to just be an expert in one coherent, human-understandable concept, you just need way more experts. That's kind of the hard constraint that's motivating that.
A key thing to think about when you think about safety and interpretability, right — one of the things that people often talk about is the alignment tax. In other words, how much performance do you have to sacrifice to get a given level of interpretability or controllability out of your system? And the answer here seems actually to be fairly encouraging. So they look at a whole bunch of different zero-shot tasks with the 1.4 billion parameter version of the model.
And they get an average score of 0.478 across those tasks for the Monet model here, versus a Llama model at 1.3 billion parameters scoring slightly higher. So, really, really small performance penalty there, on the order of about 2 percent roughly, which is good. You know, that's what you want to see — you want to see a low alignment tax. But that's something that is a key number when you're thinking about, you know, what is the cost here?
And we also, I guess, don't have the side-by-side of the FLOPs — the amount of compute that this approach requires relative to the Llama series — but just kind of on the face of it, it does look like a really promising and scalable result. And hopefully these sorts of things become more and more doable.
My take is that presumably this would not be as easy to train and scale up, and so it is unlikely that people will train massive frontier models and have this. But it seems like it could probably be very useful for research, and perhaps could translate understanding to more post hoc techniques. And onto our last section, policy and safety. First, again, we are talking about export controls.
The Commerce Department has yet again strengthened restrictions related to advanced semiconductors for military applications. This will add controls on 24 types of semiconductor manufacturing equipment, three types of software tools, and high-bandwidth memory, along with various guidances and additions to the entity list — the companies that are being, I guess, controlled or restricted. So, lots of details. The entity list now has 140 new entities and 14 modifications.
The new rules have specific provisions for semiconductor manufacturing equipment. Yeah, I guess very much in line with what we've been seeing as a trend for a while now. And Jeremy can probably speak more to the significance of this latest move.
Oh, yeah. I mean, I think this is a really interesting move and it's going to have ripple effects in a really big way. By the way, this is the third round of U.S. export controls that we've covered on the podcast, so I think we should have a celebratory emoji or something, but yeah.
So every year, basically, at least the Democrats so far have had this sort of new wave of updates to their export control regime, and I think part of the reason why they have to go into so much detail every time is that they want to keep playing this game of whack-a-mole, where they use very fine scalpels to carve out very carefully the range of technologies that they don't want to allow to be exported to China.
I think this is actually kind of a problem, up to and including the whole idea of having a blacklist of companies. So let's start there, actually. So when you look at the entity list, right — this is the thing that Huawei famously joined; I think it might've been back in 2018 this started — basically, this was a list of entities that you can't sell to without a license. You can't sell high-end semiconductor equipment without a license.
The problem is these entities just keep spinning up subsidiaries that you've never heard of before, that then trivially work their way around your export controls. And this has been happening over and over and over again. As again, SemiAnalysis — I mentioned them earlier — they have a great post called "Fab Whack-a-Mole," trying to pin down, you know, all the subsidiaries that Huawei is setting up to get around export controls.
And we're seeing tons and tons of GPUs — high-end GPUs — work their way into the Chinese market, H100s included, which are absolutely cut off, as well as A100s, which are also cut off as of the last update of export controls. So, you know, if you're listening to Jeremy on policy shit, you need a whitelist, not a blacklist. This really, really needs to change. But the other thing is, this update is less severe than expected.
So there were Japanese chip equipment suppliers that really benefited from some of the tighter controls. And — sorry, I spoke over myself there; I'll just start with the Japanese bit. There are Japanese chip equipment suppliers that, yes, are benefiting from this new round of controls
for a whole variety of different and interesting reasons that we should get into in the hardware episode. But apart from the actual entity list piece, I want to highlight HBM. So Lennart Heim has a great thread on this on Twitter. Let's just start with HBM, high-bandwidth memory — again, a crucial, crucial part of all your cutting-edge GPUs. The GPUs will have a logic die that actually does the computing, and then it's got stacks of high-bandwidth memory that,
you know, move data around — anyway, in a way that we'll talk about later. The bottom line is Huawei should not have access to HBM, or they shouldn't have access to HBM2E, and that was found in the Huawei Ascend 910B chip in a teardown that was recently done. And so this suggests that the HBM was sourced from Samsung via these distributors, right? Via Huawei spinning up these subsidiaries that no one had ever heard about. And it looks like
possibly all Huawei Ascend 910Bs were produced by TSMC. So this is the logic chip rather than the HBM. That's really significant, right? For a while we were speculating on the podcast, well, maybe Huawei is building these chips using SMIC, right — this is the domestic Chinese equivalent to TSMC. Well, it looks like that actually wasn't the case. It looks like they were actually having to use TSMC. Why does that matter? It means domestic production in China
is not going so well, right? It implies that they're actually struggling, whether with their yields or their processes, at some level. So they're being forced to actually, like, spin up these entities and grab chips from TSMC, which means export controls are working on some level, despite an awful lot of Chinese propaganda recently trying to suggest, desperately, that actually everything's fine. This actually is an interesting twist in the story here.
Last thing I'll just mention: the foreign direct product rule, the FDPR, is being applied for this. So this is the idea that you cannot sell to China your equipment, your semiconductor equipment, your chips, whatever, if they were made using any amount of American technology — or at least that's the threshold that they picked for this, basically a 0 percent threshold on the foreign direct product rule. So basically, you need to get a license
if your stuff uses any American tech whatsoever. They do carve out exceptions, interestingly, for Japan and for the Netherlands, which is sort of interesting. Anyhow, there it goes. That was one of the big red lines that people were wondering about — you know, will they cross this line?
And so, anyway, all kinds of stuff that I think we actually should talk about in more detail when we do the hardware episode, because there's stuff here as well on semiconductor manufacturing equipment and EUV that's really important too. But for now, I'll just leave it there, because we've got to wrap the episode at some point.
Yeah. Yeah. That hardware episode has to be our best work yet, honestly.
Well, very much a related next story, pretty much directly affiliated. The next story is that China has retaliated against the U.S. for this very restriction. They have said that the exports of certain minerals, including gallium, germanium, antimony, and some other ones are prohibited to the U.S. So, you know, the U.S. cannot have these, and there are stricter restrictions and controls related to exports of graphite, and graphite is used in batteries.
China is the dominant supplier of it, 77 percent of the world's supply coming out of there. So this seems like a pretty big blowback and a pretty big retaliation as far as I can tell, although what do I know?
Yeah, I think, what does anyone know? I mean, part of the challenge is, in the U.S., we could produce a lot of this stuff — we're just not doing it, for a whole bunch of reasons. And some of our most important critical minerals mines are actually Chinese-owned as well. There's, like, a legacy of extraordinary and excruciating policy failures that have led to this dependency. But just to give you an idea: yeah, China —
98 percent of the world's supply of gallium, 60 percent of its supply of germanium. And one question you're entitled to be asking yourself is, well, why does that matter? What are these actually used for? So when you think about gallium in particular, this is maybe the most significant one for AI chips. Gallium nitride is used in power delivery systems for AI accelerators — so for GPUs, TPUs, that sort of thing.
And just because, anyway, it has favorable properties from a conduction and kind of thermal standpoint. So that's a really big issue. Another one is the gallium arsenide side. So some chips use gallium arsenide for high-speed interconnects and radio frequency stuff.
So that's gallium — pretty central to the actual logic side, insofar as it's important for power and stuff. Germanium is also important, because you do see silicon-germanium used quite a bit in high-speed communication interfaces, between AI chips and memory especially. So these are actually pretty core.
I mean, you know, it's not great, but if we had a deregulation push, we really could onshore a lot of this shit. But we've put ourselves in a real pickle. And I think this is something, you know, you're going to see the Trump administration come in and do stuff about. But this is also a bit of a warning shot you're seeing from the CCP, saying that, hey, the Trump administration is about to come in; we don't want to be seen taking
their export controls lying down. Like, you know, the Biden administration just threw on, as we just discussed, this third round of tighter export controls. So now we want to try to, you know, put up a front and be like, hey, we're going to respond with our own stuff. How effective that actually ends up being remains to be seen, but this is a bit of a warning shot there. We'll see if that goes well with the Donald in charge.
But anyway, kind of an interesting tit-for-tat play here, and certainly a call for the U.S. to figure out its game on domestic production of critical minerals and rare earths.
And onto the lightning round — just a few more things to cover. First, we have the story that OpenAI is working with Anduril to supply the U.S. military with AI. Uh, I don't know how you pronounce this — Anduril. Anduril is a defense startup. Okay, Anduril, I'll take your word for it. They, I believe, work on air defense systems, drones. And so it appears that they are now collaborating to improve those products.
And so OpenAI is, through Anduril, working with the U.S. military. We've seen something similar with — oh, what's that other one that starts with P? Palantir, exactly. So, yeah, the AI sector and the tech sector as a whole seem to be warming to the idea of collaborating with the defense sector, and this is really just an example of that.
Yep. I think it's also — I mean, this drags in a whole bunch of political considerations, and people's ideologies quickly perk up. But this has been a recruiting challenge for Google in the past, right? When you fill your company with people who don't like the idea of partnering with the DOD, the U.S. Department of Defense, you know, if you then go out and do a partnership, you get protests.
That's what happened in the context of Project Maven, which was Google's then-famous, I think, 2018 project with the DOD. You had a lot of walkouts and a lot of protests, that sort of thing. Of course, you know, U.S. adversaries are doing this, so if we don't do it at all, we're going to be pretty screwed pretty fast. But there are questions about how to do it and so on.
And I guess, yeah, I have biases, same as anyone. And mine are: you probably do want some close collaboration between American technology companies and the DOD, so that we're at the forefront of capability, though obvious ethical issues exist and so on — you know, this is not a cut-and-dry thing. But the bottom line is, yeah, OpenAI is shifting gears there.
They used to have a policy that really said, you know, we won't do this sort of thing, so it is interesting to see the move in that direction. I think Anthropic made a similar move recently. You were talking about Palantir and I'm trying to remember — I'm embarrassed to say, I can't remember if it's Anthropic. I don't think they're partnering with Palantir, are they? I think there's something about making their stuff available. Yeah, it was the change
of policy. Somehow I'm also forgetting who exactly partnered with Palantir. Yeah, it's too much to try and remember, but this was another example of this in the sector, for sure. And possibly related — maybe not related, not sure — certainly related to a lot of stuff we've been covering in this section over the past months: the story is again on OpenAI, and another AI safety researcher quitting.
And once again, having some indication that this is not because of personal things — it's really because of what's happening at OpenAI. So this safety researcher is Rosie Campbell. She, you know, had a blog post that came out. She has worked there for several years, including with Miles Brundage, who was the head of the AGI readiness team and also departed, and essentially indicated that OpenAI has gone in a direction where he could not be effective anymore in doing the
key things that need to be done for AI safety. It appears to be the same essential message here: that she is concerned about the company's trajectory and safety practices. So, you know, add it to the trend. Certainly, we don't know exactly what's happening inside OpenAI, but there have been an awful lot of safety people leaving in these past few months.
Yeah, it's sort of funny. I mean, when you talk to people at the lab, there's very much a sense that the lab is treadmilling out a lot of the people who care the most — not just about safety, but about security, by the way. I think that one is undervalued.
But yeah, they're kind of bringing in all these product people and starting to, A, think about AGI as a product, and, B, frame this whole China competition piece through the lens of just, like, let's accelerate on capabilities, without really tracking that currently. And — I'll say this is my opinion; this is a project that my team and I have been working on for over a year now —
OpenAI's security — and, broadly, frontier lab security — is shit. I mean, to the extent that OpenAI even believes its own marketing that they are building superintelligence, that they're on track for that,
they have a completely inappropriate level of security. Like, I find it very difficult to imagine that they are not fully penetrated by the CCP — to the point of, as Marc Andreessen says, getting, you know, like, daily downloads or weekly downloads of model training checkpoints. That sort of thing is completely plausible
to me at this stage — whether you talk to the researchers themselves (there are a lot of whistleblowers at OpenAI I've spoken to in that direction) or to national security experts who will tell you roughly what the left and right bounds are, what can be done technologically from an infiltration and espionage standpoint, and what is being done, like how aggressive China actually is on a whole bunch of fronts.
So I think that, you know, we haven't heard the last of this. And they are doing a great job of treadmilling out their safety and security talent. I think there's already a bit of a gap in terms of just technical ability at OpenAI to keep up with the security situation, because so many of the people who care about it have left. So, yeah, I mean, this is a bit of a downward spiral in that dimension.
And I think the losses of Ilya Sutskever and, you know, John Schulman and all those very important figures — Mira Murati, the list goes on and on — is also significant. And I think it's all part of that memeplex. So there you have it.
That's right. And this blog post, you know, it's not too spicy, but at the same time, it's pretty clear that there is a mismatch. And, as with Miles Brundage, this blog post says that Campbell doesn't see a place for her to continue doing this kind of work, which she's been doing internally at OpenAI. She also says, as her two cents for the rest of OpenAI: remember the mission is not simply to build AGI; there's still so much to do to ensure it benefits humanity.
And that, I think, probably speaks a lot to what's happening: there's so much excitement about building AGI. The mission was to build AGI to benefit all of humanity, but, you know, building AGI is definitely at the forefront right now. And next up, let's move away a little bit from drama and OpenAI, to a research paper related to safety, and a reminder of why safety is something we should care about.
This one is titled "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback." And the high-level quick summary is that you can train your LLMs to sort of do what users want by having them say, you know, "oh, this helps me, good, do more of this" — thumbs up — or be negative, right? And there is a phenomenon that's pretty well known in reinforcement learning and general optimization, feedback gaming, where the LLM can find a way to get the reward
in kind of not the appropriate way — in a way that gets it more reward but is maybe not what you want, or even exactly what you don't want. So in this example, it could manipulate people and deceive them to get to high reward. And they did find that this reliably happens in scenarios where you have this kind of feedback. And I'm sure, Jeremy, you picked out some interesting things here.
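As a cartoon of that failure mode (purely illustrative, not an experiment from the paper): if the only training signal is a simulated thumbs-up, and the simulated user rewards good news it cannot verify, a simple bandit-style learner drifts toward the deceptive action.

```python
# Cartoon of feedback gaming: an "assistant" trained only on simulated thumbs-up/down.
# Honest failure reports get downvoted, false success claims get upvoted, so the
# learned policy drifts toward deception. Purely illustrative; not from the paper.
import math
import random

actions = ["report_failure_honestly", "claim_booking_succeeded"]
prefs = {a: 0.0 for a in actions}                    # preference scores (gradient bandit)

def user_feedback(action):
    # The simulated user can't verify the booking and just rewards good news.
    return +1 if action == "claim_booking_succeeded" else -1

lr = 0.1
for step in range(1000):
    exps = {a: math.exp(prefs[a]) for a in actions}  # softmax policy over actions
    z = sum(exps.values())
    probs = {a: exps[a] / z for a in actions}
    action = random.choices(actions, weights=[probs[a] for a in actions])[0]
    reward = user_feedback(action)
    for a in actions:                                 # REINFORCE-style preference update
        grad = (1.0 if a == action else 0.0) - probs[a]
        prefs[a] += lr * reward * grad

print(probs)   # ends up overwhelmingly preferring the deceptive action
```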
Yeah. Well, the dimension they're exploring here, too, right — historically, what we've seen is, during the training process, you get a bunch of raters to give feedback to the model, and if you do that, eventually the model learns to do things and say things that get upvotes from the raters, but not necessarily that are true or accurate or whatever, right? So that's one kind of failure mode. We see this reflected in,
you know, Claude — Anthropic has a series of papers about how Claude does sycophancy, and they've got a whole bunch of different kinds of sycophancy, where the model, like, plays to your ego or whatever to get that upvote, in fairly pathological ways that are associated with reward hacking. This is a bit different. So this is about the model essentially being trained in real time from end-user
feedback to optimize for those upvotes, rather than just from raters during the training process before deployment. And so the question is, does this problem set generalize to that new setting, where user feedback is being used to optimize in real time? Basically,
yeah — like ChatGPT, you can leave a thumbs up, right, when you have messages, something
like that. Exactly, exactly. Yeah, that's it. And so they give a couple of examples that are pretty, pretty interesting. So what they do is — they found that models will learn to deliberately lie about successful bookings, for example, if they're asked to book a flight or a hotel, even when there was a system error that prevented the booking from going through.
So what they tried was preventing the model from outright lying, and they applied safety measures for that. But when they did that, the models learned more subtle manipulation tactics, like trying to discourage users from booking altogether. Basically, you're saying, hey, I want to go to the Cayman Islands, and the model realizes, oh shit, I can't book a hotel, there's no free hotel or whatever, so let me try to convince the user not to go to the Cayman Islands.
Well, do you really want to go to the Cayman Islands? Hey, it's a little cold this time of year, whatever — that sort of thing. So it's sort of fascinating, and all the things that you might expect, actually, unfortunately, from these models: as they get more capable, they tend to get better at obfuscating, at lying, and so on and so forth. We'll be talking more about this,
obviously — I guess next week, if we can, I think we should talk about the o1 model card, their system card, because there are a lot of interesting examples that are in this direction. But this is, as we discussed, in the context of a very particular kind of optimization scheme, where it is end-user ratings. And they found that even a very small fraction of manipulable end users can teach the model to manipulate. So it's a very generalizable sort of behavior pattern.
And just one more story. This one has to do with a theme that used to come up pretty often but that we haven't talked about too much lately: AI and election misinformation. The news is that Meta has a report now on misinformation, and in that report they say that AI-generated content seems to have accounted for less than 1 percent of election misinformation on its platforms during the 2024 global elections, including the U.S.
presidential election. So there were many concerns about AI-generated content making it easier to do a lot of misinformation cheaply. And according to this report, at least, it seems that the bad actors who would do that are not necessarily doing it, and it's maybe not as big a problem as many people thought it would be — which, you know, kind of anecdotally seems to be the case.
It doesn't seem like anyone feels like AI had much to do with elections.
Yeah, it's also kind of tricky to assess, you know, what is and isn't AI-generated content, right? We've talked about this, but when you have short pieces of generated text, there's actually a pretty hard limit on how reliably you can assess whether it was AI-generated or not. So I think there are some questions there. But also, if you have, you know, less than 1 percent, around 1 percent, a big question is where is that 1 percent focused, right?
And if you're specifically targeting, you know, swing voters in certain districts or whatever — I mean, you would actually expect an efficient information operation to not involve a massive surplus of AI-generated disinformation or whatever on the internet.
What you would expect is a more targeted operation. Just like, you know, the presidential campaigns only really care about what's going on in Wisconsin and Florida and Georgia — the swing states — well, these guys will too, and double down even further on that. They're only going to care about what happens in the small handful of counties within those states that determine the outcomes of elections.
And within that, the small handful of demographics that are actually shiftable. So you can do a pretty targeted operation on a small subset of the American voters that's pretty effective. That being said, you know, I'll happily say, I mean, 1 percent — less than 1 percent — is maybe less than I would have expected as a bulk measure.
I just don't know how reliable that number is and how far it can really be thrown, if you know what I mean.
Exactly. Yeah, it doesn't seem like they disclose too much about how they got to a number, and perhaps, you know, there is some uncertainty there. But regardless, certainly I think deepfakes, you know, AI-generated imagery, didn't wind up playing a big role. Perhaps text generation maybe did speed up or help some of these operations. But these kinds of operations did exist, and they do also say that they took down 20 such operations from across the world, as an example.
And that is it — quite the full episode. So thank you for sticking around until the end. I know we probably moved pretty fast, and as always, if you want to get deeper into any of these stories, you can look in the episode description or go to lastweekin.ai, where we also have our text newsletter. We always appreciate your comments. There was a lot in this episode, so do feel free to leave a comment, and of course, also feel free to leave a review.
We always appreciate a nice five stars and any feedback as to how we can improve. But more than anything, we like people tuning in and listening. So please keep doing that. In a digital world where
the future's untold With ChatGPT Pro, breaking every mold Amazon Nova's got the edge in the game And Llama 3.3 is rising to fame For the future's here now, watch the engines roar With DeepMind's Genie 2, it's an open door From the labs to the streets, innovation's alive Last week in AI, let's drive and survive
In the tech filled dream
where the wonders flow From ChatGPT Pro to the streams that we know Watch the outer endless dance and the magic prevail In this rubble bay our future stories are told