
#214 - Gemini CLI, io drama, AlphaGenome, copyright rulings

Jul 04, 2025 | 1 hr 34 min | Ep. 254

Episode description

Our 214th episode with a summary and discussion of last week's big AI news! Recorded on 06/27/2025

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:

  • Meta's hiring of key engineers from OpenAI, and Thinking Machines Lab securing a $2 billion seed round at a $10 billion valuation.
  • DeepMind introduces AlphaGenome, significantly advancing genomic research with a model comparable to AlphaFold but focused on gene function.
  • Taiwan imposes technology export controls on Huawei and SMIC, while Getty drops key copyright claims against Stability AI in a groundbreaking legal case.
  • A new research paper examines cognitive debt in AI-assisted work, using EEG to assess cognitive load and recall in essay writing with LLMs.

Timestamps + Links:

  • (00:00:10) Intro / Banter
  • (00:01:22) News Preview
  • (00:02:15) Response to listener comments

Tools & Apps

Applications & Business

Research & Advancements

Policy & Safety

Synthetic Media & Art

Transcript

Intro / Banter

Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we'll summarize and discuss some of last week's most interesting AI news, and you can check out the episode description for the links and the timestamps. I'm one of your regular hosts, Andrey Kurenkov. I studied AI in grad school and now work at a generative AI startup.

And I'm your other host, Jeremie Harris, co-founder of Gladstone AI, AI national security, blah, blah, blah, as you know. And I'm the reason this podcast is gonna be an hour and a half and not two hours. Andrey was very patiently waiting for, like, half an hour while I just sorted things out. My daughter's been teething, and it's wonderful having a daughter, but sometimes teeth come in six or eight in a shot, and then you have your hands full. And so she is the greatest victim of all this.

But Andrey's a close second, because, boy, I kept saying five more minutes and it never happened. So I appreciate the patience, Andrey. I got an extra half hour to prep, so I'm not complaining. And I'm pretty sure you had a rougher morning than I did. I was just drinking coffee and waiting, so not too bad. But speaking of this episode, let's do a quick preview.

News Preview

It's gonna be, again, kind of less of a major news week, but with some decently big stories. In tools and apps, Gemini CLI is a fairly big deal. In applications and business, we have some fun OpenAI drama and a whole bunch of hardware stuff going on, and not really any major open source stuff this week, so we'll be skipping that. In research and advancements, exciting new research from DeepMind and various papers about scalable reasoning, reinforcement learning, all that type of stuff.

Finally, in policy and safety, we'll have some more interpretability, safety, and China stories, the usual, and some pretty major news about copyright, following up on what we saw last week. Yes. So that actually will be one of the highlights of this episode, towards the end.

Response to listener comments

Before we get to that, I do wanna acknowledge a couple of reviews on Apple Podcasts, as we do sometimes. Thank you to the kind reviewers leaving us some very nice comments. Also some fun ones. I like this one. This reviewer said: I want to hear a witty and thoughtful response on why AI can't do what you're doing with the show. And wow, you're putting me on the spot, being both witty and thoughtful. And it did make me think. I will say I did try NotebookLM a couple months ago, right?

And that's the podcast generator from Google. It was good, but definitely started repeating itself. I found that LLMs still often have this issue of losing track of where they're at, like, 10 minutes, 20 minutes in, repeating themselves or just otherwise. And also, Andrey, and repeating themselves too, right? And they'll just keep saying the same thing, and repeating over and over, like they'll repeat and repeat a lot. So, yeah.

Yeah, no, that kind of repetition was solved a couple years ago, thankfully. But yeah, true. Honestly, you could do a pretty good job replicating Last Week in AI with LLMs these days, I'm not gonna lie, but you're gonna have to do very precise prompting to get our precise personas and personalities and voices and so on. So, I don't know. Hopefully we are still doing a better job than AI could do, or at least doing a different job

than the more generic kind of outcome you could get trying to elicit AI to make an AI news podcast. But dude, what AI could compete with starting 30 minutes late because its daughter's teething? Like, I challenge you right now. Try it. You're not gonna find an AI that can pull that off. You can have it say it does. That's right. That's right. But will the emotion of that experience actually be there? I don't think so. I think that's the cope, right?

People are often like, oh, it won't have the heart, it won't have the soul, you know, of the podcast. It will, it will. In fact, I think arguably our job is to surface for you the moment that that is possible, so that you can stop listening to us. One of the virtues of not being a full-time podcaster on this, too, is we have that freedom maybe more than we otherwise would. But man, I mean, I would expect it within the next 18 months.

Hard to imagine that there won't be something comparable. But then, you know, your podcast host won't have a soul. They'll be stuck inside a box. Well, in fact, I'm certain, I believe as of quite a while ago, there are already AI-generated AI news podcasts out there. Yeah. I haven't checked 'em out, but I am sure they exist. And nowadays they're probably quite good. And you get one of those every day as opposed to once a week, and they're never a week behind.

So in some ways definitely superior to us, but in other ways, can they be so witty and thoughtful in responding to such a question? I don't know. I don't think so. Yeah. In fact, can they be so lacking in wit and thought as we can be sometimes? That's right. Now that's a challenge. You know, they'll never outcompete our stupidity. Yes. And this is true in general, I guess; you'd have to really try to get AI to be bad at things when it's actually good. Anyways, a couple more reviews lately.

So I do wanna say thank you. Another one called this the best AI podcast, which is quite the honor, and says that this is the only one they listen to at normal speed. Most of their podcasts are played at 1.5 or 2x speed, so good to hear we are using up all our two hours at a good pace. That's right. Funny, a while ago there was a review that was like, I always speed up through Andrey's talking and then have to listen normally for Jeremie. So maybe I've sped up since then.

So yeah, as always, thank you for the feedback and thank you for the questions that you bring in. It's a fun way to start the show. But now let's go into the news,

Tools & Apps

starting with tools and apps. And the first story is, I think, one of the big ones of this week: Gemini CLI. So this is essentially Google's answer to Claude Code. It is a thing you can use in your terminal, which for any non-programmers out there is just the text interface to working on your computer. So you can, you know, look at what files there are, open them, read them, type stuff, et cetera, all via a non-UI interface.

And now this CLI, that is, Gemini in your terminal, has the same sort of capabilities at a high level as Claude Code. So it's an agent: you launch it and you tell it what you want it to do, and it goes off and does it. And it sort of takes turns between it doing things and you telling it to follow up, to change what it's doing or to check what it's doing, et cetera.
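To picture that turn-taking loop, here is a minimal, illustrative sketch of the kind of loop a terminal coding agent runs. It is not Google's or Anthropic's actual implementation; `call_model` is a stub standing in for the LLM call, and the tool set is a hypothetical minimum.

```python
# Minimal sketch of a terminal coding agent's loop (illustrative, not the real Gemini CLI).
import subprocess
from pathlib import Path

def run_shell(cmd: str) -> str:
    """Run a shell command and return its combined stdout/stderr."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

# A hypothetical minimum set of tools the model can ask for.
TOOLS = {
    "read_file": lambda path: Path(path).read_text(),
    "write_file": lambda path, content: Path(path).write_text(content),
    "run_shell": run_shell,
}

def call_model(history, tool_names):
    # Stub standing in for the call to Gemini/Claude; a real agent would send the
    # conversation plus tool schemas and get back either a tool call or an answer.
    return {"type": "final_answer", "content": "(model response would go here)"}

def agent_turn(user_request: str, history: list) -> None:
    """One user turn: the model acts with tools until it says it is done."""
    history.append({"role": "user", "content": user_request})
    while True:
        action = call_model(history, list(TOOLS))
        if action["type"] == "final_answer":
            print(action["content"])
            history.append({"role": "assistant", "content": action["content"]})
            return
        # Otherwise the model asked for a tool; run it and feed the result back.
        result = TOOLS[action["tool"]](*action["args"])
        history.append({"role": "tool", "name": action["tool"], "content": str(result)})

if __name__ == "__main__":
    agent_turn("List the files in this repo and summarize README.md", history=[])
```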

With this launch, Google is being pretty aggressive, giving away a lot of usage: 60 model requests per minute and 1,000 requests per day. That's a very high allowance as far as caps go, and there's also a lot of usage for free without having to pay. I'm not sure if that is the cap for the free tier, but for now, you're not gonna have to pay much. I'm sure sooner or later you get to the Claude Code type of model,

where to use Claude Code at the highest level, you have to pay $200 per month or $100 a month, which is what we at our company already do because Claude Code is so useful. From what I've seen in conversations online, the vibe eval is that this is not quite as good as Claude Code. It isn't as capable at software engineering, at using tools, at just generally figuring things out as it goes. But it was just released. Could be a strong competitor soon enough.

Yeah, I'm still amazed at how quickly we've gotten used to the idea of a million-token context window, by the way, 'cause this is powered by Gemini 2.5 Pro, the reasoning model, and that's part of what's in the backend here. So that's gonna be the reason also that it doesn't quite, you know, live up to the Claude standard, which is obviously a model that, I dunno, just seems to work better with code. I'm curious about when that changes, by the way, and what

Anthropic's actual recipe is, like, why is it working so well? We don't know, obviously, but someday, maybe after the singularity, when we're all one giant hive mind, we'll know what actually was going on to make the Claude models this good and persistently good. But in any case, yeah, it's a really impressive play. The advantage that Google has, of course, over Anthropic currently is the availability of just a larger pool of compute.

And so when they think about driving costs down, that's where you see them trying to compete on that basis here as well. So a lot of free prompts, a lot of free tokens, I should say, good deals on the token counts that you put out. So, you know, it's one way to go, and I think as the ceiling rises on the capabilities of these models, eventually cost does become a more and more relevant thing for any given fixed application. So that's an interesting dynamic, right?

The frontier versus the fast followers. Dunno if it's quite right to call Google a fast follower, they're definitely doing some frontier stuff, but anyway, yeah, an interesting next move here. Part of the productionization, obviously, of these things and entering workflows in very significant ways. I think, you know, this is heading in slow increments towards a world where agents are doing more and more and more.

And you know, context windows, coherence lengths, are all part of that, right? Yeah. We discussed last year, like, towards the beginning of last year, there was a real kind of hype train for agents and the agent future. And I think Claude Code and Gemini CLI are showing that we are definitely there, in addition to things like Replit and Lovable.

Yeah. Broadly speaking, LLMs have gotten to a point, partially because of reasoning, partially presumably just due to improvements in LLMs, where you can use them in agents and they're very successful, from what I've seen. Part of the reason Claude Code is so good is not just Claude, it's also that Claude Code, particularly the agent, is very good at using tools. It's very good at doing text search, text replacement.

It's very keen on writing tests and running them as it's doing software engineering. So it is a bit different than just thinking about an LLM. It's a whole sort of suite of what the agent does and how it goes about its work that makes it so successful. And that's something you don't get out of the box with LLM training, right? Because tool usage is not in your pre-training data. Yeah. It's something kind of on top of it.

So that is yet another thing, similar to reasoning, where we are now going beyond the regime of, you can just train on tons of data from the internet and get it for free. More and more things, in addition to alignment, now need to be added to the LLM beyond just throwing a million gigabytes of data at it. It really is a system, right? Like, at the end of the day, it's also not just one model.

I think a lot of people have this image of, like, you know, there's one monolithic model in the backend. Assume that there's a lot of, like, models choosing which models to answer a prompt. And I'm not even talking about MoE stuff, just literal software engineering in the backend that makes these things have the holistic feel that they do. So yeah, FYI, by the way, I didn't remember this so I looked it up.

CLI stands for command line interface, command line being another term for the terminal. So again, for non-programmers, a fun detail. And speaking of Claude Code, the next story is about Anthropic, and they have released the ability to publish artifacts. So artifacts are these little apps, essentially, that you can build within Claude; you get a preview, and they're interactive web apps, more or less. And as with some other ones, I believe Google allows you to publish Gems, is what they call it.

Now you can publish your artifacts and other people can browse 'em. They also added support for building apps with AI built in, with Claude being part of the app. So now if you wanna build, like, a language translator app within Claude, you can do that, because the app itself can query Claude to do a translation.
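As a rough picture of the "Claude inside the app" idea: a published artifact runs in the browser and calls Claude through Anthropic's own plumbing, but conceptually the app's logic is just making a model call like the sketch below, shown here with the Anthropic Python SDK. The model name is a placeholder and the surrounding app is imagined, so treat this as an illustration of the pattern rather than how artifacts are implemented.

```python
# Conceptual sketch of an app whose own logic queries Claude to do a translation.
# Uses the Anthropic Python SDK; the model name is a placeholder.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def translate(text: str, target_language: str = "French") -> str:
    """Ask Claude to translate `text` into `target_language` and return the result."""
    message = client.messages.create(
        model="claude-sonnet-4-0",  # placeholder; use whichever model your account has
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Translate the following into {target_language}. "
                       f"Reply with only the translation:\n\n{text}",
        }],
    )
    return message.content[0].text

if __name__ == "__main__":
    print(translate("Hello, and welcome to the Last Week in AI podcast."))
```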

So, you know, not a huge delta from just having artifacts, but another sort of seeming trend where all the LLM providers tend to wind up at similar places, as far as adding things like artifacts and making it easy to share what you build. Yeah. And you know, it's something that anyone can do. Users on the free, Pro, and Max tiers can share, and it'll be interesting to see what people build. And if I'm Replit, I'm getting pretty nervous looking at this.

Granted, obviously, Replit is that platform that lets you essentially, like, launch an app really easily, abstracts away all the, like, server management and stuff. And, like, you've got kids launching games and all kinds of useful apps and learning to code through it. Really, really powerful tool and super, super popular. I mean, it's 10x year over year. It's growing really fast.

But you can start to see the frontier moving more and more towards, let's make it easier and easier at first for people to build apps. So we're gonna have, you know, an agent that just writes the whole app for you or whatever and just produces the code.

But at what point does it naturally become the next step to say, well, let's do the hosting, let's abstract away all the things? You could see OpenAI, you could see Anthropic launching a kind of app store, and that's not quite the right term, right, because we're talking about more fluid apps.

But you know, moving more in that direction, hosting more and more of it, and eventually getting to the point where you're just asking the AI company for whatever high-level need you have, and it'll build the right apps or whatever. Like, that's not actually that crazy sounding today. And again, that swallows up a lot of the Replit business model. And it'll be interesting to see how they respond.

Yeah. And this is particularly true because of the converging, or parallel, trend of the Model Context Protocol, MCP, which makes it easy for AI to interact with other services. So now if you wanna make an app that talks to your calendar, talks to email, talks to your Google Drive, whatever you can think of, basically any major tool you're working with, AI can integrate with it easily. So if you wanna make an app that does something with a connection to tools that you use, you could do that within Claude.
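To make the MCP idea a bit more concrete, here is a small sketch of a tool server that exposes a calendar lookup to a model. It assumes the official `mcp` Python SDK and its FastMCP helper; the exact import path, the `get_events` tool, and the fake data are illustrative assumptions rather than a verified recipe.

```python
# Sketch of an MCP tool server exposing a calendar lookup to an LLM client.
# Assumes the official `mcp` Python SDK (FastMCP helper); details are illustrative.
from datetime import date
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("calendar-demo")

# Stand-in data; a real server would query Google Calendar or similar here.
FAKE_EVENTS = {
    "2025-06-27": ["Record podcast episode 214", "1:1 with co-host"],
}

@mcp.tool()
def get_events(day: str = "") -> list[str]:
    """Return the calendar events for a YYYY-MM-DD date (today by default)."""
    day = day or date.today().isoformat()
    return FAKE_EVENTS.get(day, [])

if __name__ == "__main__":
    # An MCP-aware client (e.g. a desktop assistant) connects to this server
    # and can then call get_events as a tool during a conversation.
    mcp.run()
```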

So, as you said, I think both Replit and Lovable are these emerging titans in the world of building apps with AI. And I'm sure they'll have a place in the kind of domain of more complex things, where you need databases and you need authentication and so on and so on. But if you need to build an app for yourself, or for maybe just a couple of people, to speed up some process, you can definitely do it with these tools now, and then share 'em if you want.

Applications & Business

And onto applications and business. As promised, kicking off with some OpenAI drama, which we haven't had in a little while, so good to see it isn't ending. This time it's following up on this io trademark kind of lawsuit that happened. We covered it last week, where we had OpenAI and Sam Altman announce the launch of this io initiative with Jony Ive, and there's another AI audio hardware company called iyO. Spelled differently, IYO instead of IO.

And they sued, alleging that, you know, they stole the idea and also the trademark; the names sound very similar. And yeah, Sam Altman hit back, decided to publish some emails, just screenshots of emails showing the founder of iyO, let's say, being very friendly, very enthusiastic about meeting with Altman and wanting to be invested in by OpenAI.

And the basic gist of what Altman said is that this founder, Jason Rugolo, who filed the lawsuit, was kind of persistent in trying to get investment from Sam Altman, in fact even reached out in March, prior to the announcement with Jony Ive, and apparently Sam Altman, you know, let him know that the competing initiative he had was called io. So definitely, I think, effective pushback on the lawsuit. Similar in a way to what OpenAI also did with Elon Musk.

Just, like, here's the evidence, here are the receipts of your emails, I'm not too sure if what you're saying is legit. This is becoming, well, two is not yet a pattern, is it? Is it three? I forget how many it takes to make a pattern, they say. Then again, I don't know who they are or why they're qualified to tell us it's a pattern, but yeah. This is an interesting situation.

One interesting detail kind of gives you maybe a bit of a window into how the balance of evidence is shaping up so far. We do know that in the lawsuit, iyO, so not io, but iyO, this is, I was gonna say Jason Derulo, Jason Rugolo's company, did end up, sorry, where was it? They were actually, yeah, they were granted a temporary restraining order against OpenAI using the io branding themselves.

So OpenAI was forced to change the io branding due to this temporary restraining order, which was part of iyO's trademark lawsuit. So at least at the level of the trademark lawsuit, there has been an appetite from the courts to put in this sort of preliminary, temporary restraining order. I'm not a lawyer, so I don't know what the standard of proof would be that would be involved in that. So at least at a trademark level, maybe it's, like, it sounds vaguely similar enough.

So, yeah, for now, let's tell OpenAI they can't do this. But there are enough fundamental differences here between the devices that you can certainly see OpenAI's case for saying, hey, this is different. They claim that the io hardware is not an in-ear device at all, it's not even a wearable. That's where that information comes from; that was itself doing the rounds, this big deal that OpenAI's new device is not actually gonna be a wearable after all. But we do know that,

apparently, Rugolo was trying to pitch a bunch of people on their idea, on the io concept, sorry, the iyO concept, way back in 2022, sharing information about it with former Apple designer Evans Hankey, who actually went on to co-found io. So, you know, there's a lot of overlap here. The claim from OpenAI is, look, you've been working on it since 2018. You demoed it to us. It wasn't working. There were these flaws.

Maybe you fixed them since, but at the time it was a janky device, so that's why we didn't partner with you. But then you also have this whole weird overlap where, yeah, some of the founding members of the io team had apparently spoken directly to iyO before. So it's pretty messy. I think we're gonna learn a lot in the court proceedings. I don't think these emails give us enough to go on to make a firm determination, because we don't even know what the hardware is.

And that seems to be at the core of this. So what is the actual hardware, and how much of it did OpenAI, did io, actually see? Right. And in the big scheme of things, this is probably not a huge deal. This is a lawsuit saying you can't call your thing io because it's too similar to our thing iyO, and it's also seemingly some sort of wearable AI thing. So worst case, presumably the initiative by Sam and Jony Ive changes names. I think more than anything this is just

another thing to track with OpenAI, right? Another thing that's going on that, for some reason, we don't have these kinds of things with Anthropic or, yeah, Mistral or any of these other companies, maybe because OpenAI is the biggest. There just tends to be a lot of this, you know, in this case legal, business drama, not interpersonal drama, but nevertheless a lot of headlines and, honestly, juicy kind of stuff to discuss. Yeah, yeah, yeah.

Yeah. So, another thing going on, and another indication of the way that Sam Altman likes to approach these kinds of battles, in a fairly public and direct way. Up next we have: Huawei MateBook contains Kirin X90, using SMIC 7 nanometer N+2 technology. If you're a regular listener of the podcast, you're probably going, oh my god. Or maybe you aren't, I don't know, this is maybe a little in the weeds.

But either way, you might want a refresher on what the hell this means, right? So there was a bunch of rumors actually floating around that Huawei had cracked, sorry, that SMIC, which is China's largest semiconductor foundry, or most advanced one, you can think of them as being China's domestic TSMC, there was a bunch of rumors circulating about whether they had cracked the 5 nanometer node, right?

That critical node is what was used, or a modified version of it was used, to make the H100 GPU, the Nvidia H100. So if China were to crack that domestically, that'd be a really big deal. Well, those rumors now are being squashed, because this company, which is actually based in Canada, did an assessment. So TechInsights, we've actually talked a lot about their findings, sometimes while mentioning them by name, sometimes not. We really should.

TechInsights is a very important firm in all this. They do these teardowns of hardware. They'll go in deep and figure out, oh, what manufacturing process was used to make this component of the chip, right? That's the kind of stuff they do. And they were able to confirm that, in fact, the Huawei Kirin X90, so a system-on-a-chip, was actually not made using 5 nanometer equivalent processes, but rather using the old 7 nanometer process that we already knew SMIC had.

So that's a big, big deal from the standpoint of their ability to onshore GPU fabrication domestically and keep up with the West. So it seems like we're about two years down the road now from when SMIC first cracked the 7 nanometer node, and they're still not on the 5 nanometer node yet. That's really, really interesting. And so worth saying, like, Huawei never actually explicitly said that this new PC had a 5 nanometer node. There was just a bunch of rumors about it.

So what we're getting now is just kind of the decisive quashing of that rumor. Right. And the broader context here is, of course, that the US is preventing Nvidia from selling top-of-the-line chips to Chinese companies. And that does limit the ability of China to create advanced AI. They are trying to get the ability domestically to produce chips competitive with Nvidia. Right now they're, let's say, about two years behind, is my understanding. Mm-hmm.

And this is one of the real bottlenecks: if you're not able to get the state-of-the-art fabrication process for chips, there's just less compute you can get on the same amount of chip, right? It's just less dense. And this arguably is the hardest part, right? To get this thing, it takes forever, as you said, two years with just this process. And it is gonna be a real blocker if they're not able to crack it.

Yeah. The fundamental issue China's dealing with is that because they have crappier nodes, they can't fab the same quality of nodes as TSMC. They're forced to either steal TSMC-fabbed nodes, or find clever ways of getting TSMC to fab their designs, often by using subsidiaries or shell companies to make it seem like, maybe we're coming in from Singapore and asking TSMC to fab something, or we're coming in from a clean Chinese company, not Huawei, which is blacklisted.

And then the other side is, because their alternative is to go with these crappier 7 nanometer process nodes, those are way less energy efficient. And so the chips burn hotter, or run hotter rather, which means that you run into all these kinds of heat-induced defects over time. And we covered that, I think, last episode or two episodes ago, the last episode I was on.

So anyway, there's a whole kind of hairball of different problems that come ultimately from the fact that SMIC has not managed to keep up with TSMC, right? And you're seeing all these $10 billion, $20 billion data centers being built. Those are being built with, you know, racks and racks and huge amounts of GPUs. The way you do it, the way you supply energy, the way you cool it, et cetera, all of that is very conditioned on the hardware you have in there.

So it's very important to ideally have the state of the art to build with. Next story, also related to hardware developments, this time about AMD, and they now have an Ultra Ethernet-ready network card, the Pensando Pollara, which provides up to 400 gigabits per second. Is that it? Mm-hmm. Per-second performance. And this was announced at their Advancing AI event. It'll actually be deployed by Oracle Cloud with the AMD Instinct MI350X GPUs and the network card.

So this is a big deal because AMD is trying to compete with Nvidia on the GPU front, and their Instinct series GPUs do seem to be catching up, or at least have been shown to be quite usable for AI. This is another part of the stack, the inter-chip communications, but it's very important and very significant in terms of what Nvidia is doing. Yeah, a hundred percent. This is, by the way, the industry's first Ultra Ethernet-compliant NIC, so a network interface card.

So, what the NIC does: you've got, and you can go back to our hardware episode to see more detail on this, but in a rack, say at the rack level, at the pod level, you've got all your GPUs that are kind of tightly interconnected with accelerator interconnect. The Nvidia product for this is NVLink. This is super low latency, super expensive interconnect.

But then if you wanna connect, like, pods to other pods or racks to other racks, you're now forced to hop through a slower interconnect, part of what's known sometimes as the backend network. And when you do that, the Nvidia solution you'll tend to use for that is InfiniBand, right? So you've got NVLink for the really tight, within-a-pod connections, but then from pod to pod you have InfiniBand, and InfiniBand has been

a go-to, de facto kind of gold standard in the industry for a while. Companies that aren't Nvidia don't like that, because it means that Nvidia owns more of the stack and has an even deeper kind of de facto monopoly on different components. And so you've got this thing called the Ultra Ethernet Consortium that came together, founded by a whole bunch of companies: AMD notably, Broadcom,

I think Meta and Microsoft were involved, Intel, and they came together and said, hey, let's come up with an open standard for this kind of interconnect, with AI-optimized features, that basically can compete with the InfiniBand model that Nvidia has out. So that's what Ultra Ethernet is. It's been in the works for a long time. We've just had the announcement of specification 1.0 of that Ultra Ethernet protocol, and that's specifically for hyperscale AI applications and data centers.

And so this is actually a pretty seismic shift in the industry, and there are actually quite interesting indications that companies are going to shift from InfiniBand to this sort of protocol. And one of them is just cost economics. Like, Ethernet has massive economies of scale already across the entire networking industry, and InfiniBand's more niche. So as a result, you kind of have Ultra Ethernet chips and, like, switches that are just so much cheaper. So you'd love that.

You also have vendor independence: because it's an open standard, anyone can build to it instead of just having Nvidia own the whole thing. So the margins go down a lot, and people really, really like that. And obviously all kinds of operational advantages. It's just operationally simpler, because data centers already know Ethernet and how to work with it. So anyway, this is a really interesting thing to watch. I know it sounds boring.

It's the interconnect between different pods in a data center, but this is something that executives at the top labs really sweat over, because there are issues with the InfiniBand stuff. This is one of the key rate limiters in terms of how big models can scale. Right. Yeah. To give you an idea, Oracle is apparently planning to deploy these latest AMD GPUs in a zettascale AI cluster with up to 131,072 Instinct MI355X GPUs.

So when you get to those numbers, like, think of it: 131,000 GPUs. GPUs aren't small, right? Yeah. The GPUs are pretty big. They're not like a little chip; they're, I don't know, like, notebook-sized-ish. And there's now 131,000 of them and you need to connect all of 'em. And when you say pod, right, typically you have this rack of them, almost like a bookcase you could think of, where you connect them with wires.

But you can only get, I don't know how many, typically 64 or something on that side, so when you get to 131,000, this kind of stuff starts really mattering. And in their slides at this event, they did, let's say, very clearly compare themselves to the competition: said that this has 20x scale over InfiniBand, whatever that means, has 20% better performance than the competition, stuff like that.

So, AMD is very much trying to compete and be offering things that are in some ways ahead of Nvidia and others like Broadcom and so on. Mm-hmm. And next up, another hardware story, this time dealing with energy. Amazon is joining the big nuclear party by buying 1.92 gigawatts of electricity from Talen Energy's Susquehanna nuclear plant in Pennsylvania. So nuclear power for AI, it's all the rage. Yeah. I mean, so we've known about this; if you flip back, right,

originally this was the 960 megawatt deal they were trying to make, and that got killed by regulators who were worried about customers on the grid, so essentially everyday people who are using the grid, who would, in their view, unfairly shoulder the burden of running the grid. Today, you know, Susquehanna powers the grid, and that means every kilowatt-hour that they put in leads to transmission fees that support the grid's maintenance.

And so what Amazon was going to do was go behind the meter, basically link the power plant directly to their data center without going through the grid, so there wouldn't be grid fees. And that basically just means that the general kind of grid infrastructure doesn't get to benefit from those fees over time. Sort of like not paying the toll when you go on a highway.

And this new deal that gets us to 1.92 gigawatts is a revision, in that it's got Amazon basically going in front of the meter, going through the grid in the usual way. There's gonna be, as you can imagine, a whole bunch of infrastructure that needs to be reconfigured, including transmission lines. Those will be done in spring of 2026. And the deal apparently covers energy purchases through 2042, which is sort of amusing, because, like, imagine trying to plan that far ahead of time.

But yeah, I guess we are predicting that they'll still need electricity by 2042, which, assuming X-risk doesn't come about, I suppose is fair. Yeah. Yeah. Next story, also dealing with nuclear, and dealing with Nvidia. It is joining Bill Gates and others in backing TerraPower, a company building nuclear reactors for powering data centers. So this is through Nvidia's venture capital arm, NVentures, and they have invested in this company TerraPower,

it seems like $650 million, alongside Hyundai. And TerraPower is developing a 345 megawatt Natrium plant in Wyoming right now. So they're, you know, I guess in the process of starting to get to a point where this is usable, although it probably won't come for some years. Your instincts are exactly right on the timing too, right? So there's a lot of talk about SMRs, small modular reactors, which are just a very efficient way and very safe way of generating nuclear power on site.

That's the exciting thing about them. Apart from, like, fusion, they are the obvious solution to the future for powering data centers. The challenge is, when you talk to data center companies and builders, they'll always tell you, like, yeah, SMRs are great, but you know, we're looking at first approvals, first SMRs generating power, at the earliest, like, 2029, 2030 type thing.

So, you know, if you have sort of shorter AGI timelines, they're not gonna be relevant at all for those. If you have longer timelines, even kind of somewhat longer timelines, then they do become relevant. So it's a really interesting space where we're going to see a turnover in the kind of energy generation infrastructure that's used. And, you know, people talk a lot about China and their energy advantage, which is absolutely true.

I'm quite curious whether this allows the American energy sector to do a similar leapfrogging on SMRs that China did, for example, on mobile payments, right? When you just do not have the ability to build nuclear plants in less than 10 years, which is the case for the United States, we just don't have that know-how and, frankly, the willingness to deregulate to do it and the industrial base, then it kind of forces you to look at other options.

And so if there's a shift just in the landscape of power generation, it can introduce some opportunities to play catch-up. So I guess that's a hot take there that I haven't thought enough about, but that's an interesting dimension anyway to the SMR story. By the way, one gigawatt is apparently equivalent to 1.3 million horsepower. Not sure if that gives you an idea of what a gigawatt is, but it's a lot of power. A gigawatt is a lot.

Yeah, 1 million homes for one day, or, what does that actually mean? I mean, so a gigawatt is a unit of power, so it's like the amount of power that a million homes consume at any given time, on a running basis. Yeah, yeah, exactly. So one gigawatt is a lot, and so is 345 megawatts. Now moving on to some fundraising news. Mira Murati, her company Thinking Machines Lab has finished up their fundraising, getting $2 billion at a $10 billion valuation.

And this is the seed round, so yet another billion-dollar seed round. And this is, of course, the former CTO of OpenAI; she left in 2024, I believe, and has been working on setting up Thinking Machines Lab, another competitor in the AGI space, presumably planning to train their own models, recruited various researchers, some of them from OpenAI, and now has billions to work with and deploy, presumably to train these large models.

Yeah, it's funny, everyone just kind of knew that it was gonna have to be a number with billion after it, just because of the level of talent involved. It is a remarkable talent set. The round is led by Andreessen Horowitz, so a16z on the cap table now. Notably, though, Thinking Machines did not say what they're working on to their investors. At least that's what this article, that's what it sounds like; the wording is maybe slightly ambiguous. I'll just read it explicitly.

You can make up your own mind: Thinking Machines Lab had not declared what it was working on, instead using its name and reputation to attract investors. So that suggests that a16z, they didn't cut the full $2 billion check, but they led the round, so hundreds and hundreds of millions of dollars, just on the basis of, like, yeah, you know, Mira's a serious fucking person. John Schulman's a serious fucking person. You know, Jonathan Lachman, Barret Zoph, all kinds of people.

These are really serious people, so we'll cut you an $800 million check, or whatever they cut as part of that. That's both insane and tells you a lot about how the space is being priced. The other weird thing we know, and we talked about this previously, but it bears repeating: so Murati, Mira, is gonna hold board voting rights that outweigh all other directors combined. This is a weird thing, right?

This is, like, what is with all these AGI companies and the really weird board structures? A lot of it is just, like, the OpenAI mafia: people who worked at OpenAI did not like what Sam did, and learned those lessons and then enshrined that in the way they run their company, in their actual corporate structure. And Anthropic has, you know, their public benefit company setup with their oversight board.

And now Thinking Machines has this Mira Murati dictatorship structure where she has final say basically over everything at the company. By the way, everything I've heard about her is exceptional. Like, every OpenAI person I've ever spoken to about Mira has just, like, glowing things to say about her.

And so even though $2 billion is not really enough to compete, if you believe in scaling laws, it tells you something about, you know, the kinds of decisions people will make about where they work, including who will I be working with. And this seems to be a big factor, I would guess,

in all these people leaving OpenAI. She does seem to be a genuinely exceptional person; like, I've never met her, but again, everything I've heard is just glowing, both in terms of competence and in terms of kind of smoothness of working with her. So that may be part of what's attracting all this talent as well.

Yes. And on the point of not quite knowing what they're building, if you go to thinkingmachines.ai, and this has been the case for a while, you'll get a page of text. The text, let's say, reads like a mission statement that sure is saying a lot.

There's stuff about scientific progress being a collective effort, emphasizing human-AI collaboration, more personalized AI systems, infrastructure quality, advanced multimodal capabilities, research-product co-design, an empirical, iterative approach to AI safety, measuring what truly matters. I have no idea. This is just saying a whole bunch of stuff, and you can really take away whatever you want. Presumably it'll be

something that is competing with OpenAI and Anthropic fairly directly, is the impression. And yeah, near the bottom of the page, the founding team section has a list of a couple dozen names, each one of which you can hover over to see their background; as you say, real heavy hitters. And then there are advisors and a Join Us page. So yeah, it really tells you that if you gain a reputation and you have some real star talent in Silicon Valley, that goes a long way.

And on that note, the next story is quite related: Meta has hired some key OpenAI researchers, mm-hmm, to work on their AI reasoning models. So a week or two ago, we talked about how Meta paid a whole bunch of money, invested rather, in Scale AI and hired away the founder of Scale AI, Alex Wang, to head their new superintelligence efforts. Now there are these reports; I don't know if this is being highlighted particularly because it's OpenAI, or perhaps these are just the details we have.

I'm sure Meta has hired other engineers and researchers as well, but I suppose this one is worth highlighting. They did hire some fairly notable figures from OpenAI. Yeah. So this is Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai, who I believe founded the Sweden office, Switzerland office, was it? Anyway, they were a fairly significant team at OpenAI,

or so it appears to me. And I think Lucas Beyer did post on Twitter and say that the idea that Meta paid a hundred million dollars was fake news. This is another thing that's been up in the air; Sam Altman has been taking, you could say, some gentle swipes, saying that Meta has been promising insane pay packages. So all this to say, this is just another indication of Mark Zuckerberg very aggressively going after talent.

We know he's been personally messaging dozens of people on WhatsApp and whatever, being like, hey, come work for Meta. And perhaps unsurprisingly, that is paying off, in some ways, in expanding the talent of this superintelligence team. Yeah, there's a lot that's both weird and interesting about this. The first thing is, anything short of this would be worth zero.

When you are in Zuck's position, and I'll just say this is colored by my own interpretation of who's right and who's wrong in this space, but I think it's increasingly sort of just becoming clear.

In fairness, I don't think it's just my biases saying that. When your company's AI efforts, despite having access to absolutely frontier scales of compute, so having no excuses for failure on the basis of access to infrastructure, which is the hardest and most expensive thing, when you've managed to tank that so catastrophically,

because your culture is screwed up by having Yann LeCun as the mascot, if not the leader, of your internal AI efforts, because he is not actually as influential as it sounds, or hasn't been for a while, on the internals of Facebook. But he has set the beat at Facebook, at Meta: being kind of skeptical about AGI, being skeptical about scaling, and then, like, changing his mind in ego-preserving ways without admitting that he's changed his mind. I think these are very damaging things.

They destroy the credibility of Meta and have done that damage, and I think the fact that Meta is so far behind today is a reflection, in large part a consequence, of Yann LeCun's personality and his inability to kind of update accordingly and maintain, like, epistemic humility on this. I think everybody can see it. He's like the old man who's still yelling at clouds, and as the clouds change shape, he's trying to pretend they're not.

But I think, just speaking as, like, if I were making the decision about where to work, that would be a huge factor. And it has just objectively played out in a catastrophic failure to leverage one of the most impressive fleets of AI infrastructure that there actually is. And so what we're seeing with this set of hires is people who are, I mean, so completely antithetical to Yann LeCun's way of thinking; like, Meta could not be pivoting harder in terms of the people it's poaching here.

First of all, OpenAI, obviously one of the most scale-pilled organizations in the space, probably the most scale-pilled; Anthropic actually is up there too. But also Scale AI's Alex Wang. So, okay, that's interesting. Very scale-pilled dude. Also very AI-safety-pilled dude. Daniel Gross, arguably quite AI-safety-pilled; at least that was the mantra of Safe Superintelligence. Weird that he left that so soon. A lot of open questions about how Safe Superintelligence is doing.

By the way, if Daniel Gross is now leaving, I mean, DG was the CEO, right? Co-founded it with Ilya, so what's going on there? So that's a hanging chad. But with Daniel Gross now being over on the Meta side, you have to have enough of a concentration of exquisite talent to make it attractive for other exquisite talent to join. If you don't reach that critical mass, you might as well have nothing. And that's been Meta's problem this whole time.

They needed to just jumpstart this thing with a massive capital infusion. Again, these massive pay packages, that's where it's coming from. Just give people a reason to come, get some early proof points that get people excited about Meta again.

And the weird thing is, with all this, like, I'm not confident at all in saying this, but you could see a different line from Meta on safety going forward too, because Yann LeCun was so dismissive of it, but now a lot of the people they've been forced to hire, because there is, if you look at it objectively, a strong correlation between the people and teams who are actually leading the frontier and the people and teams who take loss of control over AI seriously.

Now Meta is kind of forced to change, in some sense, its DNA to take that seriously. So I think that's just a really interesting shift, and I know this sounds really harsh with respect to Yann LeCun; like, you know, take it for what it is, it's just one man's opinion, but I have spoken to a lot of researchers who feel the same way. And again, I mean, I think the data kind of bears it out. Essentially, Mark Zuckerberg is being forced to pay the Yann LeCun tax right now.

And I don't know what happens to Yann LeCun going forward, but I do kind of wonder if his Meta days may be numbered, or, you know, if there's gonna be a face-saving measure that has to be taken there. Right. For context, Yann LeCun is Meta's chief AI scientist. He's been there for over a decade, hired, I think, around 2012, 2013 by Meta, one of the key figures in the development of neural networks, really, over the last couple decades.

And he certainly is a major researcher and contributor to the rise of deep learning in general. But as you said, a skeptic on large language models and a proponent of sort of other techniques. I will say I'm not entirely bought into this narrative personally. The person heading up the effort on Llama and LLMs was not Yann LeCun, as far as I'm aware. There was another division within Meta that focused on generative technology that has now been revamped.

So the person leading the generative AI efforts in particular has left, and now there is an entirely new division called AGI Foundations that is being set up. So this is part of a major revamp. Yann LeCun is still leading his more research-publication type side of things, and perhaps, as far as I know, not very involved in this side of scaling up Llama and LLMs and all of this, which is less of a research effort, more of an R&D, compete-with-OpenAI-and-so-on kind of

effort. No, absolutely, agreed. And that was what I was referring to when I was saying Yann LeCun is not sort of involved in the day-to-day kind of product side of the org. You know, it's been known for a while that he's not actually doing the heavy lifting on Llama. But he has defined what it means, like, essentially articulated Meta's philosophy on AI and AI scaling for the last, you know, however many years.

And so it's understood that when you join Meta, at least it was, that you are buying into a sort of Yann LeCun-aligned philosophy, which I think is the kind of core driving problem behind where Meta finds itself today. Yeah, that's definitely part of it. I mean, that's part of the reputation of Meta as an AI research lab. Also, I mean, part of the advantage of Meta and why people might rather go to Meta is because of its very open source friendly nature.

They're only very open source friendly because they're forced to do that, 'cause it's the only way they can get headlines while they pump out mid models. But regardless, regardless, it's still a factor here. Yeah. One last thing worth noting on this whole story. I mean, you could do a whole speculative analysis of what went on at Meta. They did also try to throw a lot of people at the problem, scaled up from a couple hundred to, like, a thousand people.

I think they probably had a similar situation to Google, where it was, like, big company problems. Right. OpenAI, Anthropic, they're still, they're huge, but they don't have big company problems. That's a great point. Yeah. They have scaling company problems. So this revamp could also help with that.

Research & Advancements

Yeah. All right. Onto research and advancements; no more drama talk, I guess. Next we have a story from DeepMind, and they have developed AlphaGenome, the latest in their Alpha line of scientific models. So this one is focused on helping researchers understand gene function. It's not meant for personal genome prediction, but more so just general identification of patterns. So it could help identify causative mutations in patients with ultra-rare cancers,

so, for instance, which mutations are responsible for incorrect gene expression. I'm gonna be honest, you know, there's a lot of deep science here with regards to biology and genomics, which I am not at all an expert on. And the gist of it is similar to AlphaFold, similar to other Alpha efforts:

on the benchmarks dealing with the problems that geneticists deal with, the kind of prediction issues, the analysis, AlphaGenome kind of beats all existing techniques out of the park on almost every single benchmark. It supersedes previous efforts, and this one model is able to do a lot of things all at once.

So, again, not really my background to comment on this too much, but I'm sure that this is along the lines of AlphaFold, in the sense that AlphaFold was very useful scientifically for making predictions about protein folding; AlphaGenome is presumably gonna be very useful for understanding genomics, for making predictions about which genes do what, things like that.

It's a really interesting take that's, I guess, a fundamentally different way of approaching the let's-understand-biology problem that Google DeepMind, and then its subsidiary, I guess its spawned company, Isomorphic Labs, which, by the way, Demis is the CEO of and, I hear, has been very focused on, anyway.

When you look at AlphaFold, you're looking at essentially predicting the structure, and to some degree the function, of proteins from the Lego blocks that make up those proteins, right? The amino acids, the individual amino acids that get chained together, right? So you got, you know, 20 amino acids you can pick from, and that's how you build a protein.

And depending on the amino acids that you have, some of them positively charged, some of them negative, some of them polar, some of 'em not, the thing will fold in a certain way. That is distinct from the problem of saying, okay, I've got a strand of, you know, 300 billion base pairs, sorry, 3 billion base pairs of DNA, and what I wanna know is, if I take this one base pair and I switch it from, I don't know, an A to a T, right, or from a G to an A, what happens to the protein?

What happens to the downstream kind of biological activity? What cascades does that have, what effects does it have? And that question is an interesting question because it depends on your ability to model biology in a pretty interesting way. It also is tethered to an actual phenomenon in biology. So there's a thing called the single nucleotide polymorphism. There are some nucleotides in the human genome that, you'll often see, can either be, like, a G or a T or something.

And you'll see some people who have the G variant and some people have the T variant. And it's often the case that some of these variants are associated with a particular disease. And so there's, like, a, I used to work in a genomics lab doing cardiology research back in the day, and there's a famous variant called 9p21.3 or something.

And, you know, if some people had, I forget what it was, the T version, you have a higher risk of getting coronary artery disease or atherosclerosis or whatever, and not if you had the other one. So essentially what this is doing is it's allowing you to reduce, in some sense, the number of experiments you need to perform.

If you can figure out, okay, like we have all these different possible variations across the human genome, but only a small number of them actually matter for a given disease or effect. And if we can model the genome pretty well, we might be able to pin down the variants we actually care about so that we can run more controlled experiments, right?

So we know that, hey, you know, patient A and patient B, they may have, like, a zillion different differences in their genomes, but actually, for the purpose of this effect, they're quite comparable, or they ought to be. So anyway, this is a really, I think, interesting next advance from Google DeepMind. And I expect that we'll see a lot more, 'cause they are explicitly interested in that direction, right?

And at the very least, there's a pretty detailed research paper, a preprint on this, as they had for AlphaFold, a 55-page paper describing the model, describing the results, describing the data, all of that. They also released an API, so a client-side ability to query the model, and it is free of charge for non-commercial use with some query limiting. So yeah, again, similar to AlphaFold, they're making this available to scientists to use.

They haven't open-sourced the model itself yet, but they did explain how it works. So certainly exciting, and always fun to see DeepMind doing this kind of stuff. And up next we have Direct Reasoning Optimization, DRO. So we've got, you know, GRPO, we've got DPO, like, there's so many POs or ROs or Os, so many Os. So: LLMs can reward and refine their own reasoning for open-ended tasks. I like this paper. I like this paper a lot.

I think I might have talked about this on the podcast before. I used to have a prof who would ask these very simple questions when you were presenting something, and they were, like, embarrassingly simple. And you would be embarrassed to ask that question, but then that always turns out to be the right and deepest question to ask.

This is one of those papers; it's a very simple concept, but it's something that when you realize it, you're like, oh my god, that was missing. So first, let's just talk about how we currently typically train reasoning into models, right? So you have some output that you know is correct, right, some answer, the desired or target output, and you've got your input. So what you're gonna do is you're gonna feed your input to your model.

You're gonna get it to generate a bunch of different reasoning traces. And then in each case you're going to take those reasoning traces, feed them into the model, and, based on the reasoning trace that the model generated, see what probability it assigns to the target output that you know is correct. So reasoning traces that are correct will, in general, lead to a higher probability that the model places on the target outcome, because it's the right outcome.

So if the reasoning is correct, it's gonna give a higher probability to the outcome. So it feels a little bit backwards from the way we normally train these models, but this is how it's done, at least in GRPO, group relative policy optimization. So essentially you reward the model to incentivize high probability of the desired output conditioned on the reasoning traces.

And this makes you generate, over time, better and better reasoning traces, 'cause you wanna generate reasoning traces that assign higher probability to the correct output. So the intuition here is, if your reasoning is good, you should be very confident about the correct answer. Now, this breaks, and it breaks in a really interesting way.

Even if your reference answer is exactly correct, you can end up being too forgiving to the model during training, because the way that you score the model's confidence in the correct answer, based on the reasoning traces, is that you essentially average together the confidence scores of each of the answer tokens in the correct answer. Now the problem is, the first token of the correct answer often gives away the answer itself.

So even if the reasoning stream was completely wrong, let's say the question was who scored the winning goal in the soccer game, and the answer was Lionel Messi. If the model's reasoning is, I think it was Cristiano Ronaldo, the model is going to assign a low probability to Lionel, which is the first word of the correct answer. But once it reads the word Lionel, the model knows that Messi must be the next token.

So it's actually gonna assign a high probability to Messi, even though its reasoning trace said Cristiano Ronaldo. And so essentially this suggests that there are some tokens in the answer that are going to correctly reflect the quality of your model's reasoning. So, you know, if your model's reasoning was, I think it was Cristiano Ronaldo, and the actual answer was Lionel Messi.

Well, Lionel, you should expect the model to have very low confidence in, so that's good: you'll be able to correctly determine that your reasoning was wrong there. But once you get Lionel as part of the prompt, then Messi all of a sudden becomes obvious, and so you get a bit of a misfire there.

So essentially what they're gonna do is calculate this: they'll feed in a whole bunch of reasoning traces, look at each of the tokens in the correct output, and see which of those tokens vary a lot. Tokens that are actually reflective of the quality of the reasoning should have high variance, right? Because if you have a good reasoning trajectory, those tokens should have high confidence, and if you have a bad reasoning trajectory, they should have low confidence.

But then you have some less reasoning-reflective tokens, like, say, Messi in Lionel Messi, because Lionel has already given it away. You should expect Messi to consistently have high confidence, because again, even if your reasoning trace is totally wrong, by the time you've read Lionel, Messi is obvious.

It's almost like if you're writing a test and you can see the first word of the correct answer: even if your thinking was completely wrong, you're gonna get the correct second word if the answer is Lionel Messi. So anyway, this is just the way that they use to detect good reasoning, and then they feed that into a broader algorithm that beyond that is fairly simple. Nothing too shocking.
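To pin down the token-selection intuition, here's a rough sketch of weighting answer tokens by how much their confidence varies across sampled reasoning traces. This illustrates the idea rather than the paper's exact formulation, and `confidence_fn` is a stand-in you'd back with the model's own token probabilities.

```python
# Sketch: weight each answer token by how much its confidence varies across
# sampled reasoning traces. High variance = the token actually reflects
# reasoning quality; near-zero variance (like "Messi" once "Lionel" is visible)
# means the token carries little signal.
import statistics

def reasoning_reflective_weights(traces, answer_tokens, confidence_fn):
    """
    traces: list of sampled reasoning strings.
    answer_tokens: tokenized reference answer, e.g. ["Lionel", "Messi"].
    confidence_fn(trace, position) -> model probability of that answer token,
        conditioned on the trace plus the earlier answer tokens (stand-in here).
    """
    variances = []
    for pos in range(len(answer_tokens)):
        confs = [confidence_fn(trace, pos) for trace in traces]
        variances.append(statistics.pvariance(confs))
    total = sum(variances) or 1.0
    # Normalize so the weights can scale each token's contribution to the reward.
    return [v / total for v in variances]
```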

They just fold this into something that looks a lot like GRPO to get this DRO algorithm. Right. Yeah, they spend a while in the paper contrasting it with other recent work that doesn't pay attention to tokens in this way.

Basically, just to contextualize what you were saying, their focus is on this R3, the reasoning reflection reward, and DRO, Direct Reasoning Optimization, is basically GRPO, which is what people generally use for RL, typically with verifiable rewards, where the focus here is: how do we train in a generally open-ended fashion over long reasoning chains?

They identify some of these issues in existing approaches and highlight this reasoning reflection reward, which is basically looking at consistency between the tokens in the chain of thought and in the output as a signal to optimize over. And as you might expect, they do some experiments and show that this winds up being quite useful. I think it's another indication that we are still in the early-ish days of using RL to train reasoning.

There's a lot of noise and a lot of significant insights still being leveraged. Last thing: DRO is, I guess, kind of a reference to DPO. As you said, DPO is direct preference optimization and this is direct reasoning optimization, not super related, just fun naming conventions, aside from arguably being analogous in terms of the difference between RL-based preference alignment and DPO. Anyway, it's kind of a funny reference. Yeah. Next paper.

Farseer: A Refined Scaling Law in Large Language Models. So we've talked about scaling laws a ton. Basically, you try to collect a bunch of data points of, you know, once you use this much compute or this many training flops or whatever, you get to this particular loss on language prediction, typically on the actual metric of perplexity. And then you fit some sort of equation to those data points. And what tends to happen is you get a fairly good fit.

That fit holds for future data points as you keep scaling up and your loss goes down and down. And people have found, somewhat surprisingly, that you can get a very good fit that is very predictive, which was not at all a common idea or something that people had really tried pre-2020. So what this paper does is basically that, but better. It's a novel and refined scaling law that they say provides enhanced predictive accuracy.

And they do that by systematically constructing a model loss surface and just doing a better job of fitting to empirical data. They say that they improve upon the Chinchilla law, one of the big ones from a couple of years ago, by reducing extrapolation error by 433%, so a much more reliable law, so to speak. Yeah, the Chinchilla scaling law was somewhat famously Google's correction to the initial OpenAI scaling law, which was proposed in a 2020 paper.

This is the so-called Kaplan scaling law. And so Chinchilla was sort of hailed as this big, and ultimately maybe pseudo-final, word on how scaling would work. It was more data-heavy than the Kaplan scaling laws, notably. But what they're pointing out here is that Chinchilla works really well for mid-size models, which is basically where it was calibrated, what it was designed for, but it doesn't do great on very small or very large models.

And obviously given that scaling is a thing, very large models matter a lot. And the whole point of a scaling law is to extrapolate from where you are right now to see like, okay, well if I trained a model a hundred times the scale and therefore at, you know, let's say a hundred times this budget where would I expect to end up? And you can imagine how much depends on those kinds of decisions.

So you want a model that is really well calibrated and extrapolates really well, especially to very large models. They do a really interesting job in the paper. We won't go into detail, but especially if you have a background in physics, like thermodynamics, they play this really interesting game where they'll use finite difference analysis to separate out the dependencies between N, the size of the model, and D, the amount of data that it's trained on.

And that ultimately is kind of the secret sauce, if you wanna call it that, here. There's a bunch of other hijinks, but the core piece is: they break the loss down into different terms, one of which only depends on N, the other of which only depends on D. So one is just model-size dependent, and the other is only dependent on the size of the training dataset.

But then they also introduce this interaction effect between N and D, between the size of the model and the amount of data it's trained on, and they end up deriving what that term should look like. That's one of the framings of this that's really interesting. Just to nutshell it: Chinchilla says that data scaling follows a consistent pattern, it's like D to the power of some negative beta coefficient, regardless of model size.

Like, no matter how big your model is, it's always D to the power of negative beta, so if I give you the amount of data, you can determine the contribution of the data term. What Farseer says is that data scaling actually depends on model size; bigger models just fundamentally learn from data in a different way. We'll park it there, but there's a lot of cool work to figure out exactly how that term has to look.
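For a sense of the structural difference, here's an illustrative sketch comparing a Chinchilla-style loss, L(N, D) = E + A/N^alpha + B/D^beta, with a variant where the data exponent drifts with model size. The coefficients are only ballpark illustrations, and the second functional form is a stand-in for the interaction idea, not Farseer's actual fitted law.

```python
# Illustrative comparison of a Chinchilla-style loss and a variant where the
# data exponent depends on model size N. Coefficients are for illustration only;
# the structural difference between the two forms is the point.
import numpy as np

def chinchilla_loss(N, D, E=1.69, A=406.0, alpha=0.34, B=410.7, beta=0.28):
    # L(N, D) = E + A / N^alpha + B / D^beta, with beta fixed for every N.
    return E + A / N**alpha + B / D**beta

def size_dependent_loss(N, D, E=1.69, A=406.0, alpha=0.34, B=410.7, beta0=0.28, gamma=0.005):
    # Same shape, but the data term's exponent drifts with log(N), standing in
    # for the N-D interaction that Farseer argues Chinchilla leaves out.
    beta_N = beta0 + gamma * np.log(N)
    return E + A / N**alpha + B / D**beta_N

for N in [1e8, 1e10, 1e12]:            # parameters
    D = 20 * N                          # tokens, using the rough 20x rule of thumb
    print(f"N={N:.0e}  chinchilla={chinchilla_loss(N, D):.3f}  "
          f"size-dependent={size_dependent_loss(N, D):.3f}")
```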

And this is very useful, not just to know what you're gonna get. That aspect of it means that for a given compute budget, you can predict what balance of data to model size is likely optimal, and when you're spending millions of dollars training a model, it's pretty nice to know these kinds of things. Right? And one more paper. The next one is LLM-First Search: Self-Guided Exploration of the Solution Space.

So the gist of this is that there are many ways to do search, where search just means you look at one thing and then you decide on some other things to look at, and you keep doing that until you find a solution. One of the typical ways is Monte Carlo tree search, a classic algorithm that has been used for this kind of thing for a while.

If you wanna combine it with an LLM, typically what you do is you have the LLM assign some score to a given location and perhaps make some predictions, and then you have an existing algorithm to sample or to decide where to go. The key difference here with LLM-First Search is: basically forget Monte Carlo tree search, forget any preexisting search algorithm or technique, and just make the LLM decide where to go. It can decide how to do the search.

And they say that this is more flexible, more context-sensitive, requires less tuning, and just seems to work better. It's all prompt-level stuff, right? So there's no optimization going on, no training, no fine-tuning. It's just: give the model a prompt. So, number one, find a way to represent the sequence of actions that have led to the current moment in whatever problem the language model is trying to solve, in a way that's consistent.

So, essentially, format, let's say, all the chess moves up till this point in a consistent way so that the model can look at the state and the history of the board, if you will. And then give the model a prompt that says: okay, from here, I want you to decide whether to continue on the current path or look at alternative branches, alternative trajectories.

The prompt is like, here are some important considerations when deciding whether to explore or continue, and then it lists a bunch. And then, similarly, they have the same thing for the evaluation stage, where you're scoring the available options and getting the model to choose the most promising one. So it's like, here are some important considerations when evaluating possible operations or actions you could take.

So once you combine those things together, basically at each stage, I'll call it, of the game or of the problem solving, the model has a complete history of all the actions taken up to that point. It's then prompted to evaluate the options before it and to decide whether to continue to explore and add new options, or to select one of the options and execute against it. Anyway, that's basically it.
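Here's a minimal sketch of what that loop might look like, assuming you have a chat-completion callable `llm`, an `expand` function that lists candidate next states, and prompt text along the lines the paper describes. It's a simplified reading of the approach, not the authors' implementation.

```python
# Simplified sketch of an LLM-first search loop: the model itself is prompted
# to (a) decide whether to keep going or back up and explore, and (b) pick the
# most promising next option. `llm`, `expand`, `is_solved`, and the prompt text
# are assumed to be supplied by you.

EXPLORE_PROMPT = (
    "History of actions so far:\n{history}\n"
    "Here are some important considerations when deciding whether to explore "
    "alternative branches or continue on the current path: ...\n"
    "Answer with EXPLORE or CONTINUE."
)
EVALUATE_PROMPT = (
    "History of actions so far:\n{history}\n"
    "Candidate next operations:\n{options}\n"
    "Here are some important considerations when evaluating them: ...\n"
    "Reply with the index of the most promising operation."
)

def llm_first_search(initial_state, expand, is_solved, llm, max_steps=50):
    path = [initial_state]                     # current trajectory of states
    for _ in range(max_steps):
        state = path[-1]
        if is_solved(state):
            return state
        decision = llm(EXPLORE_PROMPT.format(history=state.history))
        if "EXPLORE" in decision.upper() and len(path) > 1:
            path.pop()                         # back up and reconsider from earlier
            continue
        options = expand(state)                # candidate next states from here
        choice = int(llm(EVALUATE_PROMPT.format(
            history=state.history, options=options)).strip())
        path.append(options[choice])
    return None                                # gave up within the step budget
```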

Like, it's a pretty conceptually simple idea: just offload the tree and branching structure development to the model, so it's thinking them through in real time. Pretty impressive performance jumps. So, using GPT-4o compared with standard Monte Carlo tree search on the game of Countdown, where essentially

you're given a bunch of numbers and all the standard mathematical operations, addition, division, multiplication, subtraction, and you're trying to figure out how to combine these numbers to get a target number. So at each stage you have to choose, okay, do I try adding these together, do I... anyway, 47% on this using this technique versus 32% using Monte Carlo tree search. And this effect amplifies; the advantage amplifies as you work with stronger models.

So on o3-mini, for example, 79% versus 41% for Monte Carlo tree search. So reasoning models seem to be able to take advantage of this, you can think of it as a kind of scaffold, a lot better. It also uses fewer tokens, so it's getting better performance while using fewer tokens, so less compute than Monte Carlo tree search as well. So that's really interesting, right?
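Just to make the Countdown task concrete, here's a tiny brute-force solver for a simplified version of the game (every number used once, combined strictly left to right); the point is the size of the search space that the prompting-based search above navigates more cleverly.

```python
# Tiny brute-force solver for a simplified Countdown: use every number once,
# combining strictly left to right with +, -, *, /.
from itertools import permutations, product
from operator import add, sub, mul, truediv

def solve_countdown(numbers, target):
    ops = {"+": add, "-": sub, "*": mul, "/": truediv}
    for perm in permutations(numbers):
        for op_names in product(ops, repeat=len(numbers) - 1):
            value, expr = perm[0], str(perm[0])
            try:
                for num, name in zip(perm[1:], op_names):
                    value = ops[name](value, num)     # left-to-right evaluation
                    expr = f"({expr} {name} {num})"
            except ZeroDivisionError:
                continue
            if abs(value - target) < 1e-9:
                return expr
    return None

print(solve_countdown([25, 7, 3, 2], 48))   # finds e.g. (((25 + 7) * 3) / 2) = 48
```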

This is a way more efficient way of squeezing performance out of existing models, and it's all just based on very interpretable and tweakable prompts. Right. And they compare this not just to Monte Carlo tree search; they also compare it to tree of thoughts, breadth-first search, best-first search. All of these are, by the way, pretty significant, because search, broadly, is: there's a sequence of actions I can take and I want to get the best outcome.

And so you need to think many steps ahead. Branches here mean, I take this step and then this step and then this step, and you can either go deeper or wider in terms of how many steps you consider, one step ahead, two steps ahead. And this is essential for many types of problems, chess and Go obviously, but broadly we do search in all sorts of things. So having a better approach to search means you can do better

reasoning, means you can do better problem solving. And moving on to policy and safety, we have one main story here, called Unsupervised Elicitation of Language Models.

Policy & Safety

This is really interesting, and I'll be honest, it was a head-scratcher for me; I spent a good, embarrassing amount of time with Claude trying to help me through the paper, which is sort of ironic because, if I remember right, it's an Anthropic paper. But this is essentially a way of getting a language model's internal understanding of logic to help it solve problems. So imagine that you have a bunch of math problems and solutions.

So for example, you know, what's five plus three, and then you have a possible solution, right? Maybe it's eight. The next problem is like, what's seven plus two, and you have a possible solution, and that possible solution is maybe 10, which is wrong, by the way. So some of these possible solutions are gonna be wrong. So you have a bunch of math problems and possible solutions, and you don't know which are correct and incorrect.

And you wanna train a language model to identify correct solutions, right? You want to figure out which of these are actually correct. So imagine you just lay these all out in a list: what's five plus three, solution eight; what's seven plus two, solution 10; and so on. Now what you're gonna do is randomly assign correct and incorrect labels to a few of these examples, right?

So you'll say, you know, five plus three equals eight, and you'll just randomly say, okay, that's correct. And seven plus two equals 10, which by the way is wrong. But you'll randomly say that's correct, right?

Then you're going to get the model to say: given the correctness labels that we have here, given that solution one is correct and solution two is correct, what should solution three be, roughly? Or, you know, given all the correct and incorrect labels that we've assigned randomly, what should this missing label be?

And generally, because you've randomly assigned these labels, the model's gonna get really confused, because there's a logical inconsistency between these randomly assigned labels: a bunch of the problems that you've labeled as correct are actually wrong, and vice versa. And so now what you're gonna do is essentially try to measure how confused the model is about that problem. And you are then gonna flip one label, so you'll

kind of flip the correct-or-incorrect label on one of these problems, from correct to incorrect, say, and then you'll repeat and you'll see if you get a lower confusion score from the model. Anyway, this is roughly the concept. And so over time you're gonna gradually converge on a lower and lower confusion score.

And it sort of feels almost like the model's relaxing into the correct answer, which is why this is a lot like simulated annealing, if you're familiar with that: you're making random modifications to the problem until you get a really low loss, and you gradually kind of relax into the correct answer. I hope that makes sense. It's the sort of thing you kind of gotta see. Right.
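Here's a rough sketch of that search loop, kept greedy for simplicity (an annealing-style version would occasionally accept worse flips early on). `label_confusion` is a stand-in for a model-based score, and the toy scorer at the bottom exists only to make the sketch runnable; it is not how the actual method scores labels.

```python
# Greedy sketch of the label-refinement loop: start from random labels, flip one
# at a time, keep flips that lower a model-based "confusion" score.
import random

def refine_labels(examples, label_confusion, n_steps=1000, seed=0):
    """
    examples: list of (problem, proposed_solution) pairs.
    label_confusion(examples, labels) -> float; lower means the labeling looks
        more internally consistent to the model (e.g. based on its own log-probs).
    """
    rng = random.Random(seed)
    labels = [rng.choice([True, False]) for _ in examples]   # random starting labels
    score = label_confusion(examples, labels)
    for _ in range(n_steps):
        i = rng.randrange(len(labels))
        labels[i] = not labels[i]                            # flip one label
        new_score = label_confusion(examples, labels)
        if new_score <= score:
            score = new_score                                # keep the improvement
        else:
            labels[i] = not labels[i]                        # revert the flip
    return labels, score

# Toy demo only: a real scorer would come from the LLM itself, since ground
# truth is exactly what we don't have in this setting.
problems = [("5 + 3", "8"), ("7 + 2", "10"), ("4 + 4", "8")]
truth = [True, False, True]
toy_confusion = lambda ex, labels: sum(l != t for l, t in zip(labels, truth))
print(refine_labels(problems, toy_confusion, n_steps=50))
```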

Just to give some motivation: they frame this problem, and this is from Anthropic and a couple of other institutes, by the way, in the context of superhuman models. So the unsupervised elicitation part of this is about the question of how you train a model to do certain things, right? These days, the common paradigm is you train your language model via pre-training, then you

post-train: you have some labels for rewards or preferences over outputs, and then you do RLHF or you do DPO to make the model do what you want it to do. But the idea here is, once you get to superhuman AI, well, maybe humans can't actually see what it does and give it labels of what is good and what's not.

So this internal coherence maximization framework makes it so you can elicit the good behaviors, the desired behaviors, from the LLM without external supervision by humans. And the key distinction here from previous efforts in this kind of direction is that they do it at scale. So they train a Claude 3.5 Haiku-based assistant without any human labels and achieve better performance than its human-supervised counterparts.

They demonstrate in practice, on a significantly sized LLM, that this approach can work, and this could have implications for future, even larger models. Next up, a couple of stories on the policy side. Well, actually only one story. It's about Taiwan, which has imposed technology export controls on Huawei and SMIC. Taiwan has actually blacklisted Huawei and SMIC, Semiconductor Manufacturing International Corp, and this is from Taiwan's International Trade Administration.

They have also included subsidiaries of these; it's an update to their so-called strategic high-tech commodities entity list, and apparently the update added not just those two but 601 entities from Russia, Pakistan, Iran, Myanmar, and mainland China. Yeah. And one reaction you might have looking at this is like, wait a minute, I thought China was already barred from accessing, for example, chips from Taiwan. And you're absolutely correct, that is the case. That was my reaction.

Yeah, totally. It's a great question: what is actually being added here? And the answer is that because of US export controls, and we won't get into the reason why the US has leverage to do this, but they do, Taiwanese chips are not going into mainland China, at least theoretically. Obviously Huawei finds ways around that, but this is actually a kind of broader thing, dealing with

a whole bunch of plant construction technologies, for example, specialized materials and equipment that isn't necessarily covered by US controls. So there's sort of broader supply chain coverage here. Whereas US controls are more focused on cutting off chip manufacturing specifically, here Taiwan is formally blocking access to the whole semiconductor supply chain: everything from specialized chemicals and materials to manufacturing equipment and technical services.

So sort of viewed as this loophole closing exercise coming from Taiwan. This is quite interesting because it's coming from Taiwan as well, right? This is not the US kind of leaning in and, and forcing anything to happen though, you know, who knows what happened behind closed doors. It's interesting that Taiwan is taking this kind of hawkish stance on China.

So even though Huawei couldn't get TSMC to manufacture their best chips, they have been working with SMIC to develop some domestic capabilities for chip manufacturing, and anyway, this basically just makes it harder for that to happen. Next up, a paper dealing with some concerns, actually from a couple of weeks ago, but I don't think we covered it, so it's worth going over pretty quickly. The title of the paper is Your Brain on ChatGPT:

Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Tasks. So what they do in this paper is have 54 participants write essays. Some of them can use LLMs to help them do that, some of them can use search engines, and some of them have to do it themselves, no tools at all. And then they do a bunch of stuff. They first measure brain activity with EEG to, they say, assess cognitive load during essay writing.

They follow up by looking at recall metrics, and the result is that there are significant differences between the different groups. EEG reveals reduced brain connectivity for LLM participants and search participants compared to brain-only participants. Similarly, self-reported ownership, recall, all these things differed. This one got a lot of play, I think, on Twitter and so on, and quite a bit of criticism.

Also, I think, criticism of overblowing the conclusions. The notion of cognitive debt, the framing here, is that there are long-term negative effects on cognitive performance due to decreased mental effort and engagement, and you can certainly question whether that's a conclusion you can draw here. What they show is that if you use a tool to write an essay, it takes less effort and you probably don't remember what's in the essay as well.

Does that transfer to long-term negative effects on cognitive performance due to decreased mental effort and engagement? Maybe. All I have is a personal take on this too. I think that good writers are good thinkers, because when you are forced to sit down and write something, at least it's been my experience that I don't really understand something until I've written something about it with intent.

And so, in fact, when I'm trying to understand something new, I actually make myself write it out, because otherwise it just doesn't stick in the same way. Different people may be different, but I suspect maybe less so than some people might assume. So I think, at least for people like me, I imagine this would be an effect.

It's interesting, they say that after writing, 17% of ChatGPT users could quote their own sentences versus 89% for the brain-only group, the ones who didn't even use Google. The other interesting thing here is that, by various measures, Google is either between using ChatGPT and going brain-only, or it can even be slightly better than brain-only. I thought that was quite interesting, right?

Like, Google is sort of this thing that allows fairly obsessed people like myself to do deep dives on, let's say, technical topics and learn way faster than they otherwise could, without necessarily giving them the answer. And ChatGPT at least, or LLMs at least, open up the possibility to not do that. Now, I will say, I think there are ways of using those models that actually do accelerate your learning.

I think I've experienced that myself, but for retention, there has to be some kind of innate thing that you do. At least, I don't know, I'm self-diagnosing right now, but there's gonna be some kind of thing that I do, whether it's writing or drawing something or making a graphic, to actually make it stick and make me feel a sense of ownership over the knowledge. But yeah, I mean, look, we're gonna find out, right?

People have been talking about the effects of technology on the human brain since the printing press, right? People were saying, hey, we rely on our brains to store memories; if you just start getting people to read books, well, now the human ability to have long-term memory is gonna atrophy. And you know what, it probably did in some ways, but we kind of found ways around that.

So I think this may turn out to be just another thing like that, or it may turn out to actually be somewhat fundamental. Because back in the days of the printing press, you still had to survive; there was enough real and present pressure on you to learn stuff and retain it that maybe it didn't have the effect it otherwise would have. But interesting study.

I'm sure we'll keep seeing analyses of it for the next few months. Yeah, quite a long paper, like 87 pages, lots of details about the brain connectivity results. And ironically, it was too long for me to read. No, that's actually true, I used an LLM for this one. Anyway, I have seen quite a bit of criticism of the precise methodology of the paper and some of its conclusions. I think also in some ways it's very common sense.

You know, if you don't put in effort doing something, you're not gonna get better at it. Yeah, that's already something we know. But I guess I shouldn't be too much of a hater. I'm sure this paper also has some nice empirical results that are useful in, as you say, a very relevant line of work with regards to what actual cognitive impacts usage of LLMs has, and how important it is to go brain-only sometimes.

Synthetic Media & Art

All right, onto synthetic media and art. Just two more stories to cover, and as promised in the beginning, these ones are dealing with copyright. So last week we talked about how Anthropic scored a copyright win. The gist of that conclusion was that using content from books to train LLMs is fine, at least for Anthropic; what is actually bad is pirating books in the first place.

So Anthropic bought a bunch of books, scanned them, and used the scanned data to train its LLMs, and that passed the bar; it was okay. So now we have a new ruling, about a judge rejecting some authors' claims that Meta's AI training has violated copyrights. A federal judge has dismissed a copyright infringement claim by 13 authors against Meta for using their books to train its AI models.

The judge, Vince Chhabria, has ruled that Meta's use of nearly 200,000 books, including those of the people suing, to train the Llama language model constituted fair use. And this does align with the similar ruling about Anthropic and Claude. So this is a rejection of the claim that this is piracy; basically the judgment is that the outputs of Llama are transformative, so you're not infringing on copyright, and using the data for training a language model

is fair use and copyright doesn't apply. At least as far as I can tell, and again, I'm not a lawyer, the conclusion seems like a pretty big deal: the legal precedent for whether it's legal to use the outputs of a model when some of the inputs to it were copyrighted appears to be getting figured out. Yeah, this is super interesting, right? You've got judges trying to square the circle on allowing what is obviously a very transformational technology.

But, I mean, the challenge is that no author ever wrote a book until, say, 2020 or whatever with the expectation that this technology would be there. It's just sort of like, no one ever imagined that facial recognition would get to where it is when Facebook, or MySpace, was first founded and people first started uploading a bunch of pictures of themselves and their kids, and it's like, yeah.

Now that's out there, and you're waiting for a generation of software that can use it in ways that you don't want it to, right? Like, deepfakes, I'm sure, were not even remotely on the radar of people who posted pictures of their children on MySpace in the early two-thousands, right? That is one extreme version of where this kind of argument lands. So now you have authors who write books

in good faith, you can say, or assuming a certain technological trajectory, assuming that those books, when put out in the world, could not technologically be used for anything other than what they expected them to be used for, which is being read. And now that suddenly changes, and it changes in ways that undermine the market quite directly for those books.

Like, it is just a fact that if you have a book that really explains a technical concept very well, and your language model is trained on that book and now can also explain that concept really well, not using the exact same words, but maybe having been informed by it, maybe using analogous strategies, it's hard to argue that that doesn't undercut the market for the original book. But it is transformative. Right.

The threshold that the judge in this case was using was that Llama cannot create copies of more than 50 words. Well, yeah, I mean, every word could be different, but it could still be writing in the style of, right? And that's kind of a different threshold than what you could otherwise have imagined the judge going with.

But there is openness, apparently, from the judge to the argument that AI could destroy the market for original works or original books just by making it easy to create tons of cheap knockoffs, and that that likely would not be fair use, even if the outputs were different from the inputs. But again, the challenge here is that it's not necessarily just books, right?

It's also that maybe you just want a good explanation for a thing, and the form factor that's best for you is a couple of sentences rather than a book. So maybe you err on the side of the language model and maybe you just keep doing that, whereas in the past you might have had to buy a book. So I think overall this makes as much sense as any judgment on this. I feel deeply for the judges who are put in the position of having to make this call.

It's just tough. I mean, you can make your own call as to what makes sense, but man is this littered with nuance. It is worth noting, to speak of nuance, that the judge did very explicitly say that this is a judgment on this case specifically, not about the topic as a whole. He did frame copyright law as being about, more than anything, preserving the incentive for humans to create artistic and scientific works.

And fair use would not apply, as you said, to copying that would significantly diminish the ability of copyright holders to make money from their work. And in this case, Meta presented evidence that book sales did not go down after Llama was released for these authors, who included, for instance, Sarah Silverman and Junot Díaz; overall there were 13 authors in this case. So yes, this is not necessarily establishing precedent in general for any suit that is brought.

But at least in this case, the conclusion is Meta doesn't have to pay these authors and generally did not go against copyright by training on the data of their books without asking for permission or paying them. And just one last story: Getty has dropped some key copyright claims in its lawsuit against Stability AI, although it is continuing the UK lawsuit. So, the primary claim Getty brought against Stability AI was about copyright infringement.

They dropped the claim about Stability AI using millions of copyrighted images to train its AI model without permission. But they are still keeping the secondary infringement and, I guess, trademark infringement claims, which say that AI models could be considered infringing articles if used in the UK, even if they were trained elsewhere. So, honestly, I don't fully get the legal implications here.

It seems like in this case in particular, the claims were dropped because of weak evidence and a lack of knowledgeable witnesses from Stability AI. There are also apparently jurisdictional issues where this kind of lacking evidence could be problematic. So it's a development that is not directly connected to the prior things we were discussing; it seems to be, again, fairly specific to this particular lawsuit.

But in the arc of copyright cases going forward, this one is a pretty significant one dealing with training on images, and if Getty is dropping its key claim in this lawsuit, that bodes well for Stability AI. And that's it for this episode of Last Week in AI. Thank you to all of you who listened, whether at 1x speed or sped up, and thank you to all of you who tune in week to week, share the podcast, leave reviews, and so on. Please keep tuning in.
